# Using Deep Learning for Essay Score Prediction - Transformers
## Overview

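This notebook fine-tunes a pretrained BERT model for text regression: it predicts six essay scores (cohesion, syntax, vocabulary, phraseology, grammar, conventions) from the essay text.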

#### Approach
- Split this data set into two sets - one for training our DL model, and one for evaluation  
- Use Keras to create a BERT model with multiple layers. We will train this model in a CPU environment  
- Evaluate and test the model on the test set and look at a few individual examples

In [89]:
import numpy as np
import pandas as pd
import os, re, time
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn import preprocessing

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)

from multiprocessing import cpu_count
print(cpu_count())

16


In [90]:
# %pip install pandarallel
import multiprocessing

num_processors = multiprocessing.cpu_count()
print(f'Available CPUs: {num_processors}')

from pandarallel import pandarallel
pandarallel.initialize(nb_workers=num_processors-1, use_memory_fs=False, progress_bar=True)

Available CPUs: 16
INFO: Pandarallel will run on 15 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.


In [91]:
import tensorflow as tf
from tensorflow import keras

from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.layers import (
    Activation, BatchNormalization, Bidirectional, Convolution1D, Dense, Dropout,
    Embedding, Flatten, GlobalAveragePooling1D, GlobalMaxPool1D, GlobalMaxPooling1D,
    GRU, Input, LeakyReLU, LSTM, MaxPooling1D, SpatialDropout1D, TextVectorization,
    concatenate,
)
from tensorflow.keras.models import Model, Sequential

In [92]:
df_proc=pd.read_csv('../00_gcp_data/preprocessed-essay.csv')
# df_proc.head()

In [93]:
pd.options.display.max_colwidth=None
df_proc[['corrected_text','lemmatized_text']].sample(2)

Unnamed: 0,corrected_text,lemmatized_text
866,opinion people seek guidance experts authorities life important matter instead making their decisions people may agree personally strongly agree opinion important believe asking sharing ideas lead making better decisions three reasons seeking guidance experts authorities better making decisions addition confusion one common reasons people end making bad decision lot people might know going make good decision probably first time going situation never position make decisions young confused going well until asked someone help started making desitionsions also leads better communicate others gives better understanding making decision others helps began speak decisions closest friends helped make feel going alone always made sure decision understand decision taking even making next move honestly helped lot furthermore makes life less stressful talked stressing instead bottling like stress relive example big heavy backpack full books inside get way take books backpack feels less heavy would feel relive talking stressing make decision hard conclusion make decision understand communicate feel relived decision end day would feel good decision sure people around three reasons talking experts authorities life important,opinion people seek guidance expert authority life important matter instead make their decision people may agree personally strongly agree opinion important believe ask share idea lead make well decision three reason seek guidance expert authority well make decision addition confusion one common reason people end make bad decision lot people might know go make good decision probably first time go situation never position make decision young confuse go well until ask someone help start make desitionsions also lead well communicate others give well understand make decision others help begin speak decision closest friend help make feel go alone always make sure decision understand decision take even make next move honestly help lot furthermore make life less 
stressful talk stress instead bottle like stress relive example big heavy backpack full book inside get way take book backpack feel less heavy would feel relive talk stress make decision hard conclusion make decision understand communicate feel relive decision end day would feel good decision sure people around three reason talk expert authority life important
740,students enjoy summer vacation educators feels students retain information easily return fall summer break long also think students point view see students point view feel educators make students break yearround school week break summer week spring fall much time spending family students also responsibility first reason spend time family school taking time back home students comes everyday day home homework make sure everything ready next day school life even projects work student worry forget parent really effect give enough truth parent whats going life anything happening educator feels student breaks short first reason make students break yearround school week break summer week spring fall finally second reason students also responsibility student also life things summer spring winter break students try help parent around use free ever want stress much homework projects student also believe everything learn school worthy information come back forget thing learned school last reason although people feel summer break long students retain information easily return fall people believe wise fact remain educators make students break yearround school week break summer week spring fall much time spending family students also responsibility remove summer spring winter break,student enjoy summer vacation educator feel student retain information easily return fall summer break long also think student point view see student point view feel educator make student break yearround school week break summer week spring fall much time spending family student also responsibility first reason spend time family school take time back home student come everyday day home homework make sure everything ready next day school life even project work student worry forget parent really effect give enough truth parent whats go life anything happen educator feel student break short first reason make student break yearround school week break summer week spring fall finally second reason 
student also responsibility student also life thing summer spring winter break student try help parent around use free ever want stress much homework project student also believe everything learn school worthy information come back forget thing learn school last reason although people feel summer break long student retain information easily return fall people believe wise fact remain educator make student break yearround school week break summer week spring fall much time spending family student also responsibility remove summer spring winter break


### Setting variables and helper functions

In [94]:
## Setting text and target variables

textVar=df_proc['lemmatized_text']
targetVar=df_proc[["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]]

In [115]:
X = textVar.values
Y = targetVar.values

train_samples, test_samples, train_targets, test_targets = train_test_split(X,Y, test_size = 0.20, random_state = 1010)

### Cost function and hyperparameters

In [98]:
BATCH_SIZE = 30

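# Longest essay measured in whitespace-separated words. This is not the WordPiece
# token count, and bert-base-uncased accepts at most 512 positions.
MAX_LEN = max(len(x.split()) for x in df_proc['lemmatized_text'])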
print(MAX_LEN)

551


### Importing transformers

In [99]:
# import transformers
import torch
from transformers import BertTokenizer, TFBertModel

AUTO = tf.data.experimental.AUTOTUNE

In [100]:
bert_path = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(bert_path)

In [101]:
def encode(input_text):
    # Tokenize a batch of texts, padding or truncating each one to MAX_LEN token ids
    inputs = tokenizer.batch_encode_plus(list(input_text), padding='max_length',
                                         max_length=MAX_LEN, truncation=True)
    return inputs
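
To make the effect of `padding='max_length'` and `truncation=True` concrete, here is a minimal, tokenizer-free sketch of what happens to a single sequence of token ids. The helper `pad_or_truncate` is a hypothetical name used only for illustration, and the pad id of 0 is assumed to match the `[PAD]` id of `bert-base-uncased`. The real tokenizer also keeps the closing `[SEP]` token when it truncates, which this simplified version does not. The sketch also builds the attention mask that the tokenizer returns alongside `input_ids`.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (input_ids, attention_mask), each exactly max_len long.

    Hypothetical helper for illustration; not part of the transformers API.
    """
    clipped = list(token_ids[:max_len])            # truncation: drop ids past max_len
    n_pad = max_len - len(clipped)                 # padding needed to reach max_len
    input_ids = clipped + [pad_id] * n_pad
    attention_mask = [1] * len(clipped) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([101, 2034, 102], max_len=5)
long_ids, long_mask = pad_or_truncate([101, 2034, 8605, 2092, 102], max_len=3)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

Only `input_ids` is passed on to the model in this notebook; the attention mask is what would let BERT ignore the padded positions.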

In [181]:
# creating the tokenized training dataset
train_input = encode(train_samples)['input_ids']
train_data_ds = (
    tf.data.Dataset
    .from_tensor_slices((train_input,train_targets))
    .repeat()
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

In [182]:
train_data_ds

<PrefetchDataset element_spec=(TensorSpec(shape=(None, 551), dtype=tf.int32, name=None), TensorSpec(shape=(None, 6), dtype=tf.float64, name=None))>

In [153]:
train_targets

array([[2.5, 2.5, 2.5, 2. , 2.5, 2. ],
       [2.5, 2. , 2. , 2.5, 2.5, 2.5],
       [3.5, 2.5, 3. , 3.5, 2.5, 2.5],
       ...,
       [3. , 3.5, 3.5, 3. , 3. , 3.5],
       [2.5, 2.5, 2.5, 2.5, 2.5, 3. ],
       [3. , 2.5, 3. , 2.5, 2.5, 2.5]])

In [119]:
# creating the tokenized testing dataset
testing_input = encode(test_samples)['input_ids']

test_data_ds = (
    tf.data.Dataset
    .from_tensor_slices((testing_input,test_targets))
    .batch(BATCH_SIZE)
    .prefetch(AUTO)
)

In [183]:
test_data_ds

<PrefetchDataset element_spec=(TensorSpec(shape=(None, 551), dtype=tf.int32, name=None), TensorSpec(shape=(None, 6), dtype=tf.float64, name=None))>

---
### Creating the Baseline BERT Model

In [184]:
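# Custom error function MCRMSE: mean column-wise root mean squared error

def MCRMSE(y_true, y_pred):
    # Targets arrive as float64; match the prediction dtype before subtracting
    y_true = tf.cast(y_true, y_pred.dtype)
    # axis=0 averages over the batch, giving one MSE per target column
    colwise_mse = tf.reduce_mean(tf.square(y_true - y_pred), axis=0)
    return tf.reduce_mean(tf.sqrt(colwise_mse), axis=-1, keepdims=True)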
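
As a cross-check for the loss, MCRMSE can be computed without TensorFlow: take the RMSE of each target column over all samples, then average those column RMSEs. The sketch below is a plain-Python reference under that definition. The function name `mcrmse_reference` and the toy inputs are illustrative and not part of the notebook.

```python
import math


def mcrmse_reference(y_true, y_pred):
    """Mean of the per-column RMSE for two equally shaped lists of rows."""
    n_rows = len(y_true)
    n_cols = len(y_true[0])
    column_rmse = []
    for j in range(n_cols):
        # Mean squared error of column j across all rows (samples)
        mse = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)) / n_rows
        column_rmse.append(math.sqrt(mse))
    return sum(column_rmse) / n_cols


# Toy example: the first column is off by 3 in every row, the second is exact
y_true = [[0.0, 0.0], [0.0, 0.0]]
y_pred = [[3.0, 0.0], [3.0, 0.0]]
print(mcrmse_reference(y_true, y_pred))  # column RMSEs are 3.0 and 0.0, mean 1.5
```

On the same toy input, averaging the squared errors within each row instead (a per-sample RMSE) gives about 2.12, so the reduction axis matters when comparing against a reported MCRMSE.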


In [185]:
def create_model():
    bert_encoder = TFBertModel.from_pretrained(bert_path )
    input_word_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_word_ids")

    embedding = bert_encoder(input_word_ids)[0]
    x = tf.keras.layers.GlobalAveragePooling1D()(embedding)
    x = tf.keras.layers.LayerNormalization()(x)
    # Output layer without an activation function because this is a regression task
    output = tf.keras.layers.Dense(6)(x)

    model = tf.keras.models.Model(inputs=input_word_ids, outputs=output)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss=MCRMSE,
                  metrics=[MCRMSE])

    return model
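
As far as the code shows, `GlobalAveragePooling1D` receives no mask here, so it averages the encoder output over all `MAX_LEN` positions, padding included. The plain-Python sketch below contrasts that with a masked mean. `mean_pool` and the toy vectors are illustrative assumptions, and real BERT outputs at padded positions are generally not zero.

```python
def mean_pool(token_vectors, attention_mask=None):
    """Average token vectors, optionally skipping positions whose mask is 0."""
    if attention_mask is None:
        attention_mask = [1] * len(token_vectors)   # unmasked: every position counts
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask removes every position")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Two real tokens followed by two padded positions (toy 2-dimensional vectors)
vectors = [[2.0, 4.0], [4.0, 8.0], [0.0, 0.0], [0.0, 0.0]]
mask = [1, 1, 0, 0]

print(mean_pool(vectors))        # average over all four positions
print(mean_pool(vectors, mask))  # average over the two real tokens only
```

With short essays and a long `MAX_LEN`, the unmasked average is dominated by padded positions, which is one reason a masked mean (or the `[CLS]` vector) is often preferred for pooling.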

In [186]:
model= create_model()
model.summary()

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.


Model: "model_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_word_ids (InputLayer)  [(None, 551)]            0         
                                                                 
 tf_bert_model_11 (TFBertMod  TFBaseModelOutputWithPoo  109482240
 el)                         lingAndCrossAttentions(l            
                             ast_hidden_state=(None,             
                             551, 768),                          
                              pooler_output=(None, 76            
                             8),                                 
                              past_key_values=None, h            
                             idden_states=None, atten            
                             tions=None, cross_attent            
                             ions=None)                          
                                                          

In [134]:
(train_samples.shape[0]//BATCH_SIZE)

104
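
The value above is the number of full batches in the training split. Because `train_data_ds` uses `.repeat()`, the dataset never ends on its own, so `steps_per_epoch` is what defines an epoch. A small sketch of the arithmetic follows. The helper name is hypothetical, and the sample count of 3128 is a made-up number chosen only so that floor division gives 104.

```python
import math


def steps_per_epoch(n_samples, batch_size, drop_remainder=True):
    """Number of batches per epoch; hypothetical helper for illustration."""
    if drop_remainder:
        return n_samples // batch_size          # full batches only
    return math.ceil(n_samples / batch_size)    # include the final partial batch


n_samples = 3128   # assumed for illustration, not taken from the data
print(steps_per_epoch(n_samples, 30))                        # 104
print(steps_per_epoch(n_samples, 30, drop_remainder=False))  # 105
```

With a repeated dataset and floor division, the leftover samples are not lost: they roll over into the first batch of the next epoch.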

In [169]:
X_train = np.array([s for s in train_input])
X_train

array([[  101, 25732,  3198, ...,     0,     0,     0],
       [  101,  2228, 12731, ...,     0,     0,     0],
       [  101,  2092,  2449, ...,     0,     0,     0],
       ...,
       [  101,  2034,  8605, ...,     0,     0,     0],
       [  101,  5646, 18373, ...,     0,     0,     0],
       [  101,  3166,  6798, ...,     0,     0,     0]])

In [187]:
from tensorflow.keras.callbacks import EarlyStopping

callback = tf.keras.callbacks.EarlyStopping(monitor='MCRMSE', patience = 2 ,restore_best_weights=True)

history = model.fit(train_data_ds,
                    # train_data_ds is already batched, so the batch size comes from the dataset
                    steps_per_epoch=train_samples.shape[0] // BATCH_SIZE,
                    epochs=3,
                    verbose=1,
                    callbacks=[callback])

Epoch 1/3


InvalidArgumentError: Graph execution error:

Detected at node 'model_11/tf_bert_model_11/bert/embeddings/Gather_1' defined at (most recent call last):
    File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/opt/conda/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
      app.launch_new_instance()
    File "/opt/conda/lib/python3.10/site-packages/traitlets/config/application.py", line 1043, in launch_instance
      app.start()
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 725, in start
      self.io_loop.start()
    File "/opt/conda/lib/python3.10/site-packages/tornado/platform/asyncio.py", line 195, in start
      self.asyncio_loop.run_forever()
    File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
      self._run_once()
    File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
      handle._run()
    File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
      self._context.run(self._callback, *self._args)
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 513, in dispatch_queue
      await self.process_one()
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 502, in process_one
      await dispatch(*args)
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 409, in dispatch_shell
      await result
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/kernelbase.py", line 729, in execute_request
      reply_content = await reply_content
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/ipkernel.py", line 422, in do_execute
      res = shell.run_cell(
    File "/opt/conda/lib/python3.10/site-packages/ipykernel/zmqshell.py", line 540, in run_cell
      return super().run_cell(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3009, in run_cell
      result = self._run_cell(
    File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3064, in _run_cell
      result = runner(coro)
    File "/opt/conda/lib/python3.10/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
      coro.send(None)
    File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3269, in run_cell_async
      has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
    File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3448, in run_ast_nodes
      if await self.run_code(code, result, async_=asy):
    File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
      exec(code_obj, self.user_global_ns, self.user_ns)
    File "/var/tmp/ipykernel_9191/1784808068.py", line 5, in <module>
      history = model.fit(train_data_ds,
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 1023, in train_step
      y_pred = self(x, training=True)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 561, in __call__
      return super().__call__(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/functional.py", line 511, in call
      return self._run_internal_graph(inputs, training=training, mask=mask)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/functional.py", line 668, in _run_internal_graph
      outputs = node.layer(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/training.py", line 561, in __call__
      return super().__call__(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1087, in run_call_with_unpacked_inputs
    File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 1114, in call
      outputs = self.bert(
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_tf_utils.py", line 1087, in run_call_with_unpacked_inputs
    File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 788, in call
      embedding_output = self.embeddings(
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1132, in __call__
      outputs = call_fn(inputs, *args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 96, in error_handler
      return fn(*args, **kwargs)
    File "/opt/conda/lib/python3.10/site-packages/transformers/models/bert/modeling_tf_bert.py", line 223, in call
      position_embeds = tf.gather(params=self.position_embeddings, indices=position_ids)
Node: 'model_11/tf_bert_model_11/bert/embeddings/Gather_1'
indices[0,512] = 512 is not in [0, 512)
	 [[{{node model_11/tf_bert_model_11/bert/embeddings/Gather_1}}]] [Op:__inference_train_function_437792]
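
The final message, `indices[0,512] = 512 is not in [0, 512)`, comes from the position-embedding lookup: `bert-base-uncased` has 512 position embeddings (indices 0 to 511), while the inputs here are 551 tokens long. The sketch below reproduces that bound check in plain Python and shows one possible way to cap the sequence length. Both helper names are hypothetical, and 512 is assumed from the error message rather than read from the model config.

```python
MAX_POSITIONS = 512  # assumed from the error message: valid indices are [0, 512)


def first_invalid_position(seq_len, max_positions=MAX_POSITIONS):
    """Return the first position index without an embedding row, or None."""
    return max_positions if seq_len > max_positions else None


def capped_max_len(longest_sequence, max_positions=MAX_POSITIONS):
    """Cap the padded sequence length at what the encoder can index."""
    return min(longest_sequence, max_positions)


print(first_invalid_position(551))  # 512, the index named in the error
print(first_invalid_position(512))  # None, a 512-token input fits
print(capped_max_len(551))          # 512
```

In the notebook this would correspond to something like `MAX_LEN = min(MAX_LEN, 512)` before calling `encode`. Since `encode` already passes `truncation=True`, essays longer than the cap would then be clipped rather than rejected. In transformers the limit is also exposed as `config.max_position_embeddings` on the loaded model.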