# Ranking Toxic Comments Using an LSTM Network, Maybe?

So while Ray is trying the more efficient approach of Naive Bayes classification, I'm going to try training an entire LSTM to solve this problem... in a Jupyter notebook... so uh... yeah.

In [1]:
# Import
import tensorflow as tf
import pandas as pd
import numpy as np
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Embedding, GRU, Dense, Dropout
from tensorflow.keras.losses import CategoricalCrossentropy
from sklearn.preprocessing import OneHotEncoder

My plan is to use two LSTM networks to encode the two comments, then output a one-hot vector with two values: one for "greater" and one for "less". The dataset only tells us which of two comments is more toxic, not how toxic each one is, so the network can only learn that pairwise comparison, and its output reflects that.

Something like this:

![Toxicity LSTM Network](toxicity-lstm-network.svg)

I think the size of each layer can be determined by some hyperparameter tuning (or just guess & check, since I'm not Google and don't have infinite compute).
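With limited compute, "guess & check" can at least be systematic. Here is a minimal grid-search sketch over the keyword arguments of `create_model` (defined further down). The search-space values and the `train_and_score` callable are hypothetical placeholders, not tuned settings.

```python
from itertools import product

# Hypothetical search space over create_model()'s keyword arguments
SEARCH_SPACE = {
    "embed_units": [64, 128],
    "recur_units": [32, 64],
    "dense_units": [32, 64],
    "dropout_rate": [0.1, 0.3],
}

def grid(space):
    """Yield every combination in `space` as a dict of keyword arguments."""
    keys = list(space)
    for values in product(*(space[key] for key in keys)):
        yield dict(zip(keys, values))

def grid_search(space, train_and_score):
    """
    Try every configuration and keep the best one.

    train_and_score is a hypothetical callable: config dict -> score,
    where higher is better (e.g. validation accuracy).
    """
    best_config, best_score = None, float("-inf")
    for config in grid(space):
        score = train_and_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

print(sum(1 for _ in grid(SEARCH_SPACE)))  # 16 configurations
```

A real `train_and_score` might call `tf.keras.backend.clear_session()`, build `create_model(vocab_size, **config)`, fit it on `train_dataset`, and return `model.evaluate(test_dataset, verbose=0)[1]` (the accuracy, since the model is compiled with `metrics=['accuracy']`). Sixteen combinations means sixteen full trainings, so on one GPU trying a random subset (e.g. with `random.sample`) may be more realistic.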

But first we need to download and extract the data. This requires the Kaggle CLI (`kaggle`), plus `unzip` if you're on macOS/Linux or `7z` if you're on Windows.

In [2]:
%%script false --no-raise-error
%%bash
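# Download and extract data (run if on Linux/macOS).
# The `%%script false` line above disables this cell; delete that line to run it.
# (Windows has no `false` program, hence the message below, but the cell still does nothing.)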
rm -rf data
mkdir data
cd data
kaggle competitions download --force -c jigsaw-toxic-severity-rating
unzip jigsaw-toxic-severity-rating.zip

Couldn't find program: 'false'


In [3]:
%%cmd
rmdir /S/Q data
mkdir data
cd data
kaggle competitions download --force -c jigsaw-toxic-severity-rating
7z.exe x jigsaw-toxic-severity-rating.zip

Microsoft Windows [Version 10.0.19043.1415]
(c) Microsoft Corporation. All rights reserved.

(env) C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle>rmdir /S/Q data

(env) C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle>mkdir data

(env) C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle>cd data

(env) C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\data>kaggle competitions download --force -c jigsaw-toxic-severity-rating
Downloading jigsaw-toxic-severity-rating.zip to C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\data


100%|##########| 6.72M/6.72M [00:00<00:00, 17.2MB/s]




(env) C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\data>7z.exe x jigsaw-toxic-severity-rating.zip

7-Zip 19.00 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2019-02-21

Scanning the drive for archives:
1 file, 7041334 bytes (6877 KiB)

Extracting archive: jigsaw-toxic-severity-rating.zip
--
Path = jigsaw-toxic-severity-rating.zip
Type = zip
Physical Size = 7041334

Everything is Ok

Files: 3
Size:       28845412
Compressed: 7041334

(env) C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\data>
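Both cells above boil down to "download with the Kaggle CLI, then unzip". Here is a cross-platform sketch in plain Python that uses the standard library's `zipfile` instead of `unzip`/`7z`. It assumes the `kaggle` CLI is installed and authenticated (an API token in `~/.kaggle/kaggle.json`), and `fetch_data` is a hypothetical helper name. Unlike the shell cells, it doesn't delete `data` first; `extractall` simply overwrites existing files.

```python
import subprocess
import zipfile
from pathlib import Path

COMPETITION = "jigsaw-toxic-severity-rating"

def fetch_data(data_dir="data", download=True):
    """Download (optionally) and extract the competition zip; return the extracted names."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    zip_path = data_dir / f"{COMPETITION}.zip"
    if download:
        # Same Kaggle CLI call as the shell cells above
        subprocess.run(
            ["kaggle", "competitions", "download", "--force", "-c", COMPETITION],
            cwd=str(data_dir), check=True)
    # zipfile ships with Python, so no unzip/7z is needed
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(data_dir)
        return sorted(zf.namelist())
```

Calling `fetch_data()` from a notebook cell should then work the same on Windows, macOS, and Linux.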

Now that we have our data, we can write a function that loads it and converts it into a TensorFlow dataset for our model. The `worker` column is the ID of the annotator who compared the comments. It's not really important, so we drop it.

Each row gives a pair of comments (`less_toxic`, `more_toxic`). For every row we keep the pair as-is and also add the reversed pair (a, b and b, a). This way the model sees both the "greater than" and the "less than" case; otherwise every label would be the same, and the network could just learn to always output one class.

Then we need to encode our comments into a form the network can understand. We can do this with `tf.keras.layers.TextVectorization`, which scans the dataset to build a vocabulary of tokens (by default whitespace-separated words, not characters) and maps each comment to a sequence of integer token IDs.
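To see what `TextVectorization` does with the settings used below (`standardize=None`, `ragged=True`), here is a toy run on two made-up comments. Index 0 is reserved for padding and index 1 for out-of-vocabulary tokens, so any token never seen during `adapt()` maps to 1. The toy strings are illustrative only.

```python
import tensorflow as tf

# Two toy "comments" shaped (N, 1), the same layout input_data() feeds in
toy = tf.constant([["you are nice"], ["you are not nice at all"]])

# Same settings as the notebook: no standardization, ragged integer output
encoder = tf.keras.layers.TextVectorization(standardize=None, ragged=True)
encoder.adapt(toy)

vocab = encoder.get_vocabulary()
print(vocab[:2])  # ['', '[UNK]']: padding and out-of-vocabulary tokens

# "very" was never seen during adapt(), so it maps to index 1
ids = encoder(tf.constant([["you are very nice"]]))
print(ids.to_list())

# standardize=None keeps case, so "Nice" is a different (unseen) token
print(encoder(tf.constant([["Nice"]])).to_list())  # [[1]]
```

The last line is why `standardize=None` matters: "Idiot" and "idiot!" end up as separate vocabulary entries, which inflates the vocabulary size.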

In [4]:
def input_data(train_frac=0.7, shuffle=200, batch=200, repeat=3, display=False):
    """
    Extract and preprocess data for trainer
    
    :param shuffle: size of groups to shuffle rows in
    :param batch: size of batches to segment data into
    :param repeat: number of times to repeat dataset
    :param display: if true, print a sample of data
    """
    # Pull data from csv
    print('Pulling data...')
    csv_data = pd.read_csv('data/validation_data.csv')
    csv_data = csv_data[['less_toxic', 'more_toxic']]
    
    # Our inputs are labeled as "more toxic" and "less toxic",
    # but we want to pass in both without revealing which is which,
    # since figuring that out is the network's job. So we create two
    # copies, one with the order swapped, and name both columns
    # "seq_a" and "seq_b". The original order (seq_b is the more toxic
    # comment) gets the label 'greater', and the swapped order (seq_b
    # is the less toxic one) gets 'less'. Therefore, the network sees
    # both cases for each pair in the dataset and can train on both.
    print('Generating labeled data...')
    labeled_data_greater = csv_data.copy()
    labeled_data_greater.rename(
        columns={'less_toxic': 'seq_a', 'more_toxic': 'seq_b' }, 
        inplace=True)
    labeled_data_greater['label'] = 'greater'
    labeled_data_less = csv_data.copy()
    labeled_data_less.rename(
        columns={ 'more_toxic': 'seq_a', 'less_toxic': 'seq_b' }, 
        inplace=True)
    labeled_data_less['label'] = 'less'
    labeled_data = pd.concat([ labeled_data_greater, labeled_data_less ])
    labeled_data = labeled_data.sample(frac=1)
    
    # Now we convert each comment into a sequence of integers using
    # Keras's TextVectorization preprocessing layer, which takes a string
    # and returns an array of integers encoding its words (with
    # standardize=None it just splits on whitespace, so case and
    # punctuation are kept). The layer first has to scan the dataset to
    # build its vocabulary. Because of the swap above, seq_a already
    # contains every comment, so adapting on it alone is enough.
    print('Encoding inputs...')
    seq_a = tf.constant(labeled_data['seq_a'].values.reshape(-1,1))
    seq_b = tf.constant(labeled_data['seq_b'].values.reshape(-1,1))
    encoder = tf.keras.layers.TextVectorization(
        standardize=None,
        ragged=True)
    encoder.adapt(seq_a)
    vocab_size = encoder.vocabulary_size()
    if display:
        print('Vocab size:', vocab_size)
    seqint_a = encoder(seq_a)
    seqint_b = encoder(seq_b)
    
    # Then we can one-hot encode our labels using scikit-learn's 
    # OneHotEncoder class. Since we know our labels ahead of time, 
    # I figured we wouldn't need to fit it. HOWEVER, scikit-learn 
    # disagrees: it still expects us to call fit on the data 
    # anyway... so yeah.
    print('Encoding labels...')
    label_encoder = OneHotEncoder(
        categories=[['greater', 'less']], 
        handle_unknown='ignore')
    label_array = labeled_data['label'].values.reshape(-1, 1)
    label_encoder.fit(label_array)
    labels = label_encoder.transform(label_array)
    labels = tf.constant(labels.toarray())
    
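    # Create dataset. The rows were already shuffled by pandas above, so
    # a plain take/skip gives a random, non-overlapping train/test split.
    # (Splitting *after* repeat() would put copies of training batches in
    # the test set, and shuffle() reshuffles on every pass, so take/skip
    # on a shuffled dataset is not a stable split.)
    print('Creating dataset...')
    dataset = tf.data.Dataset.from_tensor_slices(
        ((seqint_a, seqint_b), labels))
    
    # Split into training and testing data. Note that a pair and its
    # swapped copy can still end up on opposite sides of the split.
    print('Splitting into training and testing...')
    train_num = int(train_frac * len(labeled_data))
    train_dataset = dataset.take(train_num)
    test_dataset = dataset.skip(train_num)
    
    # Shuffle, batch and repeat the training data; just batch the test data
    train_dataset = train_dataset.shuffle(shuffle).batch(batch).repeat(repeat)
    test_dataset = test_dataset.batch(batch)
    
    # Display a sample
    if display:
        print('Dataset spec:', train_dataset)
        for (sample_a, sample_b), output in train_dataset.take(1):
            print('Sample input sequence A:', sample_a[0])
            print('Sample input sequence B:', sample_b[0])
            print('Sample output labels:', output[0])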
    
    # Return training data, testing data, and vocab size
    print('All done :-)')
    return train_dataset, test_dataset, vocab_size
    
# Run input data function to test it out
train_dataset, test_dataset, vocab_size = input_data(display=True)

Pulling data...
Generating labeled data...
Encoding inputs...
Vocab size: 97647
Encoding labels...
Creating dataset...
Dataset spec: <RepeatDataset shapes: (((None, None), (None, None)), (None, 2)), types: ((tf.int64, tf.int64), tf.float64)>
Sample input sequence A: tf.Tensor([  417   167  6872  2993  7660 43622], shape=(6,), dtype=int64)
Sample input sequence B: tf.Tensor(
[ 3299 83209  1469    28    28    15    40 19753     5  3634    39   264
    30     4  7004    32   638    24   462  1847   272    45     2 82892
    98   190    28 27332   395    61  1069     8     2   187     3    20
    29     2  3299 83210    39    64     8  4503  2669   832    16     4
  2329 91941], shape=(50,), dtype=int64)
Sample output labels: tf.Tensor([0. 1.], shape=(2,), dtype=float64)
Splitting into training and testing...
All done :-)


After this, we build the model with Keras, train it, and then validate it on the test set.

The original problem calls for ranking comments by toxicity, so we can use this network as a comparator function to sort the list of comments. So, first, the network. I'm writing a function that builds a model from a set of hyperparameters; this will be used for the optimization step.
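As for the comparator idea, here is a minimal sketch of turning a pairwise model into a sort key with `functools.cmp_to_key`. `predict_pair(a, b)` is a hypothetical callable returning the model's `(p_greater, p_less)` for a pair, using this notebook's labeling ('greater' means comment b is the more toxic one). Wiring it to the real model would also need the fitted `TextVectorization` encoder, which `input_data` doesn't currently return. The toy predictor below is a stand-in, not the model.

```python
from functools import cmp_to_key

def make_comparator(predict_pair):
    """Wrap a pairwise (p_greater, p_less) predictor as a cmp-style function."""
    def compare(a, b):
        # Ask about both orderings, since the model was trained on both
        g_ab, l_ab = predict_pair(a, b)
        g_ba, l_ba = predict_pair(b, a)
        b_more_toxic = g_ab + l_ba
        a_more_toxic = l_ab + g_ba
        if b_more_toxic > a_more_toxic:
            return -1  # a is less toxic, so it sorts first
        if a_more_toxic > b_more_toxic:
            return 1
        return 0
    return compare

def rank_comments(comments, predict_pair):
    """Sort comments from least to most toxic according to predict_pair."""
    return sorted(comments, key=cmp_to_key(make_comparator(predict_pair)))

def toy_predict_pair(a, b):
    # Stand-in for the model: pretend longer comments are more toxic
    return (1.0, 0.0) if len(b) > len(a) else (0.0, 1.0)

print(rank_comments(["ccc", "a", "bb"], toy_predict_pair))  # ['a', 'bb', 'ccc']
```

Each comparison costs two model calls and a sort makes O(n log n) comparisons. A learned comparator also isn't guaranteed to be transitive, so the resulting order can depend on the input order; scoring each comment by how many pairwise comparisons it "wins" avoids that, at O(n²) model calls.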

At the last minute, I decided to use a GRU network instead of an LSTM.

If you don't like that decision, I will stuff you in the crust.

In [5]:
def create_model(vocab_size, embed_units=128, recur_units=64, dense_units=64, dropout_rate=0.1):
    """
    Create model using parameters
    """
    # A LSTM network
    input_a = Input((None,), name='input_a')
    embed_a = Embedding(vocab_size, embed_units, name='embed_a')(input_a)
    recur_a = GRU(recur_units, name='recur_a')(embed_a)
    drop_a = Dropout(dropout_rate, name='drop_a')(recur_a)
    
    # Sequence B encoder (GRU)
    input_b = Input((None,), name='input_b')
    embed_b = Embedding(vocab_size, embed_units, name='embed_b')(input_b)
    recur_b = GRU(recur_units, name='recur_b')(embed_b)
    drop_b = Dropout(dropout_rate, name='drop_b')(recur_b)
    
    # Concatenation and dense layers
    concat = tf.concat([ drop_a, drop_b ], axis=1, name='concatenate')
    dense = Dense(dense_units, activation='relu', name='dense')(concat)
    drop_d = Dropout(dropout_rate, name='drop_d')(dense)
    output = Dense(2, activation='softmax', name='labels')(drop_d) # 2 labels in the output layer
    
    # Final model configuration
    model = Model([input_a, input_b], output)
    model.summary()
    model.compile(
        optimizer='adam',
        loss=CategoricalCrossentropy(),
        metrics=['accuracy'])
    return model

# Hooray, model created
tf.keras.backend.clear_session()
model = create_model(vocab_size)

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_a (InputLayer)           [(None, None)]       0           []                               
                                                                                                  
 input_b (InputLayer)           [(None, None)]       0           []                               
                                                                                                  
 embed_a (Embedding)            (None, None, 128)    12498816    ['input_a[0][0]']                
                                                                                                  
 embed_b (Embedding)            (None, None, 128)    12498816    ['input_b[0][0]']                
                                                                                              

Now we fit the model.

In [6]:
model.fit(train_dataset, epochs=1)



 31/634 [>.............................] - ETA: 1:59:13 - loss: 0.6892 - accuracy: 0.5410

ResourceExhaustedError: 2 root error(s) found.
  (0) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[200,1000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node gradient_tape/model/recur_a/transpose/transpose
 (defined at C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\optimizer_v2\optimizer_v2.py:464)
]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

	 [[gradient_tape/model/recur_a/RaggedToTensor/strided_slice/_180]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) RESOURCE_EXHAUSTED:  OOM when allocating tensor with shape[200,1000,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node gradient_tape/model/recur_a/transpose/transpose
 (defined at C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\optimizer_v2\optimizer_v2.py:464)
]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_9422]

Errors may have originated from an input operation.
Input Source operations connected to node gradient_tape/model/recur_a/transpose/transpose:
In[0] gradient_tape/model/recur_a/TensorArrayUnstack/TensorListStack:	
In[1] gradient_tape/model/recur_a/transpose/InvertPermutation:

Operation defined at: (most recent call last)
>>>   File "c:\program files\python38\lib\runpy.py", line 193, in _run_module_as_main
>>>     return _run_code(code, main_globals, None,
>>> 
>>>   File "c:\program files\python38\lib\runpy.py", line 86, in _run_code
>>>     exec(code, run_globals)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel_launcher.py", line 16, in <module>
>>>     app.launch_new_instance()
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\traitlets\config\application.py", line 846, in launch_instance
>>>     app.start()
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\kernelapp.py", line 677, in start
>>>     self.io_loop.start()
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\tornado\platform\asyncio.py", line 199, in start
>>>     self.asyncio_loop.run_forever()
>>> 
>>>   File "c:\program files\python38\lib\asyncio\base_events.py", line 570, in run_forever
>>>     self._run_once()
>>> 
>>>   File "c:\program files\python38\lib\asyncio\base_events.py", line 1859, in _run_once
>>>     handle._run()
>>> 
>>>   File "c:\program files\python38\lib\asyncio\events.py", line 81, in _run
>>>     self._context.run(self._callback, *self._args)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\kernelbase.py", line 461, in dispatch_queue
>>>     await self.process_one()
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\kernelbase.py", line 450, in process_one
>>>     await dispatch(*args)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\kernelbase.py", line 357, in dispatch_shell
>>>     await result
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\kernelbase.py", line 652, in execute_request
>>>     reply_content = await reply_content
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\ipkernel.py", line 353, in do_execute
>>>     res = shell.run_cell(code, store_history=store_history, silent=silent)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\ipykernel\zmqshell.py", line 532, in run_cell
>>>     return super().run_cell(*args, **kwargs)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\IPython\core\interactiveshell.py", line 2768, in run_cell
>>>     result = self._run_cell(
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\IPython\core\interactiveshell.py", line 2814, in _run_cell
>>>     return runner(coro)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\IPython\core\async_helpers.py", line 129, in _pseudo_sync_runner
>>>     coro.send(None)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\IPython\core\interactiveshell.py", line 3012, in run_cell_async
>>>     has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\IPython\core\interactiveshell.py", line 3191, in run_ast_nodes
>>>     if await self.run_code(code, result, async_=asy):
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\IPython\core\interactiveshell.py", line 3251, in run_code
>>>     exec(code_obj, self.user_global_ns, self.user_ns)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\AppData\Local\Temp\ipykernel_10760\1211049778.py", line 1, in <module>
>>>     model.fit(train_dataset, epochs=1)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\engine\training.py", line 1216, in fit
>>>     tmp_logs = self.train_function(iterator)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\engine\training.py", line 878, in train_function
>>>     return step_function(self, iterator)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\engine\training.py", line 867, in step_function
>>>     outputs = model.distribute_strategy.run(run_step, args=(data,))
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\engine\training.py", line 860, in run_step
>>>     outputs = model.train_step(data)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\engine\training.py", line 816, in train_step
>>>     self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 530, in minimize
>>>     grads_and_vars = self._compute_gradients(
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 583, in _compute_gradients
>>>     grads_and_vars = self._get_gradients(tape, loss, var_list, grad_loss)
>>> 
>>>   File "C:\Users\Anshul Kharbanda\Documents\GitHub\toxicitykaggle\env\lib\site-packages\keras\optimizer_v2\optimizer_v2.py", line 464, in _get_gradients
>>>     grads = tape.gradient(loss, var_list, grad_loss)
>>> 

Function call stack:
train_function -> train_function
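The run above died on sequence length. Judging by the `RaggedToTensor` node in the trace, each ragged batch is padded to its longest comment before the GRU, here 1000 tokens, so a single embedding output of shape [200, 1000, 128] in float32 is 200 × 1000 × 128 × 4 = 102,400,000 bytes (about 98 MiB), and the GRU's backward pass keeps several tensors of that size per branch. The embedding tables are large too: 97,647 × 128 = 12,498,816 weights each, matching the summary. Below is a minimal sketch of two common mitigations, assuming it's acceptable to drop rare words and the tail of very long comments: cap the vocabulary with `TextVectorization`'s `max_tokens`, and truncate each ragged sequence. `MAX_TOKENS`, `MAX_LEN`, `build_encoder`, and `truncate` are hypothetical names and values, not tested settings.

```python
import tensorflow as tf

MAX_TOKENS = 20000  # hypothetical vocabulary cap (counts padding and [UNK])
MAX_LEN = 256       # hypothetical per-comment token cap

def build_encoder(texts, max_tokens=MAX_TOKENS):
    # Keep only the most frequent tokens; everything else maps to [UNK] (index 1)
    encoder = tf.keras.layers.TextVectorization(
        max_tokens=max_tokens, standardize=None, ragged=True)
    encoder.adapt(texts)
    return encoder

def truncate(seqs, max_len=MAX_LEN):
    # Keep at most max_len tokens from the start of each row of a RaggedTensor
    return seqs[:, :max_len]
```

Inside `input_data`, `build_encoder(seq_a)` would replace the `encoder = ...` / `adapt` lines, and `encoder(seq_a)` / `encoder(seq_b)` would be wrapped in `truncate(...)`; the returned `vocab_size` then shrinks accordingly. Lowering the existing `batch` argument (e.g. `input_data(batch=50)`) also reduces peak memory, at the cost of more steps per epoch.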
