**Lab 12 - Attention-based BiLSTM Classifier for Sentiment Analysis**

Last week we built BiLSTM classifiers for sentiment analysis using several different embedding techniques. In today's lab we add an Attention layer to those models.

In [2]:
import pandas as pd # To read data from CSVs directly into dataframes
import numpy as np # For doing a data type cast
from tensorflow import keras # As usual, we will use keras as a front end to TF
from tqdm import tqdm # This is an iterator that makes nice progress bars
import nltk # For doing some preprocessing on our string data
nltk.download('stopwords')
from nltk.corpus import stopwords # We will do some cleanup of the text
from nltk.tokenize import sent_tokenize, word_tokenize # Cleanup will start with tokenization
import re # We may use some regex in the cleanup


nltk.download('punkt') # We need this parser - Colab does not have it by default

# Utility functions from TF
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import losses
from tensorflow.keras import preprocessing
from tensorflow.keras import utils
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization

# Some Keras utility functions and sequential layers
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Embedding, LSTM, SpatialDropout1D, Bidirectional, Dropout

# For building our own Word2vec model
from gensim.models import Word2Vec

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


We will use some Yelp reviews that have been widely distributed and used as the basis of competitions at Kaggle. The data files are hosted on a Dropbox account for convenience of downloading. If you would like to review a copy of this dataset, try: https://www.kaggle.com/ilhamfp31/yelp-review-dataset. If the Dropbox download does not work, try getting the dataset from Kaggle and uploading it to the VM that is running your notebook. Note that a couple of lines of code for loading the test data from the Kaggle contest have been commented out in the next (hidden) code block. There is so much data in the training set that we can just grab a random sample of its instances for validation.

In [3]:
#@title
# Retrieving the test data from Dropbox is commented out for now.
# How might this test data be useful?

#!wget https://www.dropbox.com/s/nnuxlff1dlgtjf0/YelpTest.csv?dl=1
#test_data = 'YelpTest.csv?dl=1'

In [4]:
# We are using the ! to access a terminal command. The colab image
# comes with the wget utility which can download files from URLs.

# This one is 400 MB, so it takes a moment
!wget https://www.dropbox.com/s/70skwugmktk0idf/YelpTrain.csv?dl=1
train_data = 'YelpTrain.csv?dl=1'


--2023-11-21 18:39:56--  https://www.dropbox.com/s/70skwugmktk0idf/YelpTrain.csv?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.18, 2620:100:6018:18::a27d:312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/70skwugmktk0idf/YelpTrain.csv [following]
--2023-11-21 18:39:57--  https://www.dropbox.com/s/dl/70skwugmktk0idf/YelpTrain.csv
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc1e30455df0bcd74447b34ccaf6.dl.dropboxusercontent.com/cd/0/get/CH8rqu6G9YAAk_LoW4jJ3MtKg3YfUVeOdEqJnnBuosb4yjjar_zd36nX-yLlTHCSZABJvSmctbzEXszFuvhiCNNl-fOIeBcoMYLcsRcuCf3udRMXOu0GquM3UDCxzxxbc14SPW1HWhWCRG2bnqSr45PY/file?dl=1# [following]
--2023-11-21 18:39:57--  https://uc1e30455df0bcd74447b34ccaf6.dl.dropboxusercontent.com/cd/0/get/CH8rqu6G9YAAk_LoW4jJ3MtKg3YfUVeOdEqJnnBuosb4yjjar_zd36nX-yLlTHCSZABJvSmctbzEXszFuvhiCNNl-fOIeBcoM

This dataset has already been divided into training and test sets, probably because it was used in a Kaggle contest. In a contest setting, analysts cannot generate their own train/test splits because results from different splits would not necessarily be comparable. For the sentiment variable, negative polarity is class 1 and positive polarity is class 2.

In [5]:
# Pandas gives a simple way of reading a CSV directly into a dataframe.
train = pd.read_csv(train_data, names = ['sentiment', 'text'] )

# We don't need this unless we plan to use the test data.
#test  = pd.read_csv(test_data,  names = ['sentiment', 'text'] )

In [6]:
training_size = 50000 # This cuts the data down to save some processing time
# Once you have a preferred model, it would be a good idea to raise this to a
# higher value such as 100,000 and retrain.

train = train[:training_size]


A cleaning function can easily be applied across all of the texts in a pandas dataframe text variable. Let's define a custom function to pass to apply(). Remember that a function like this has to be interpreted each time it is called, so it is important to make it as efficient as possible.

In [7]:
my_stops = stopwords.words('english')
stop_pat = r'\b(?:{})\b'.format('|'.join(my_stops))

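def ReturnCleanText(text):
    text = text.lower() # Normalize case so the stopword pattern matches
    # Replace runs of non-word characters, and underscores, with a single space
    text = re.sub(r"\W+|_", ' ', text)

    # Delete stopwords; the spaces around them remain, hence the gaps in the cleaned text
    return re.sub(stop_pat, '', text)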

# Now use the apply() method to run the function on each text
train['clean_text'] = train['text'].apply(ReturnCleanText)
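
To see what this cleaning step does to a single review, here is a minimal, self-contained sketch. It assumes a tiny hypothetical stop list (`demo_stops`) in place of the NLTK list so that it runs without any downloads, and `clean_text_demo` is an illustrative copy of the function above rather than a replacement for it.

```python
import re
import pandas as pd

# Hypothetical mini stop list standing in for stopwords.words('english')
demo_stops = ['the', 'a', 'is', 'and', 'was']
# Compile the pattern once so it is not rebuilt for every row
demo_pat = re.compile(r'\b(?:{})\b'.format('|'.join(demo_stops)))

def clean_text_demo(text):
    text = text.lower()                  # normalize case
    text = re.sub(r"\W+|_", ' ', text)   # punctuation and underscores become spaces
    return demo_pat.sub('', text)        # delete stopwords, leaving their spaces behind

demo = pd.Series(["The food was GREAT, and the service is fast!"])
cleaned = demo.apply(clean_text_demo)

print(repr(cleaned[0]))      # the raw result still contains runs of spaces
print(cleaned[0].split())    # splitting on whitespace recovers the kept tokens
```

The leftover runs of spaces are harmless here because the later tokenization steps split on whitespace.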


# Set up our first LSTM model:

We are now going to pre-process our text and configure an LSTM model with a trainable embedding layer. This first block is just a brief experiment to test out the Keras TextVectorization layer. We use the results just to understand what the vectorization process looks like. Take a peek at https://www.tensorflow.org/api_docs/python/tf/keras/layers/TextVectorization for more information on this approach.

In [8]:
# Let's do a test of the TextVectorization encoder
max_features = 2000 # This is quite a limited vocabulary size

# The vocabulary can have unlimited size or be capped, depending on the
# class instantiation options for this layer; if there are more unique values
# in the input than the maximum vocabulary size, the most frequent terms
# will be used to create the vocabulary. This suggests that if the vocab will
# be capped, we would need to normalize and do stop word removal.

Encoder = keras.layers.experimental.preprocessing.TextVectorization( max_tokens = max_features)
# By default this lowercases and strips punctuation

Encoder.adapt(train['clean_text'].values) # Build the vocabulary from the cleaned reviews

vocab = np.array(Encoder.get_vocabulary())
print(vocab[:20])

example ="Always a great example for showing fun results!"
print(Encoder(example).numpy())
print(" ".join(vocab[Encoder(example).numpy()]))

['' '[UNK]' 'n' 'food' 'place' 'good' 'like' 'get' 'one' 'time' 'would'
 'service' 'back' 'great' 'go' 'really' 'even' 'ni' 'us' 'never']
[  28    1   13 1749    1 1839  319 1844]
always [UNK] great example [UNK] showing fun results
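
The capping behaviour described in the comments above can be imitated in plain Python. This is only a sketch of the idea, not the TextVectorization implementation: `build_vocab` and `encode` are hypothetical helpers, and the two reserved slots ('' for padding and '[UNK]' for unknown words) are modelled on the vocabulary printed above.

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    """Keep the most frequent words, after two reserved slots."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    # Two slots are reserved, so only max_tokens - 2 real words fit
    kept = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ['', '[UNK]'] + kept

def encode(text, vocab):
    """Map each token to its index; unknown tokens map to index 1."""
    index = {word: i for i, word in enumerate(vocab)}
    return [index.get(tok, 1) for tok in text.lower().split()]

toy_texts = ["good food good place", "good service", "bad food"]
toy_vocab = build_vocab(toy_texts, max_tokens=4)
toy_ids = encode("good bad food", toy_vocab)

print(toy_vocab)
print(toy_ids)
print(" ".join(toy_vocab[i] for i in toy_ids))
```

With a cap this tight, the rarer words fall out of the vocabulary and come back as '[UNK]', which is the same effect seen with "a" and "for" in the real example.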


In [17]:
#
# Exercise 11.5: Find out the length of the vocab array. Add a comment about
# the size of the vocabulary. Is it large enough?
print(len(vocab))
#

2000


In [10]:
train['clean_text'].max() # On strings, max() returns the alphabetically last review, not the longest one
# The review shown below happens to have 32 tokens, so treat it only as a rough guide to setting max_message.

'zumba tony brandon ryry   fantastic  recently moved back  phoenix  san diego  even though     zumba    years every class  different  guys  fun energetic motivating  great play lists  easy  follow even    new  zumba  love  classes '
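
If the goal is to measure message length rather than alphabetical order, the token counts can be computed directly. The sketch below uses a small made-up Series (`reviews`) so that it is self-contained; on the lab data the same calls could be applied to `train['clean_text']`.

```python
import pandas as pd

# Toy stand-in for train['clean_text']
reviews = pd.Series([
    "zumba love classes",
    "food was great but service was slow today",
    "ok",
])

lex_max = reviews.max()                        # alphabetically last string
token_counts = reviews.str.split().str.len()   # number of whitespace tokens per review
longest = reviews[token_counts.idxmax()]       # the review with the most tokens

print("Alphabetical max:", lex_max)
print("Longest review:  ", longest)
print("Token counts:    ", token_counts.tolist())
```

Looking at the whole distribution of `token_counts` (for example with `describe()`) is one way to choose a padding length that covers most reviews without wasting space.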

In [11]:
# Here's the real text pre-processing approach we are taking for this notebook.
# We use the Keras Tokenizer to prepare our text.

max_features = 2000 # As used in the next line, this only keeps the 2000 most frequent words
max_message = 35 # A relatively short message

tokenizer = Tokenizer(num_words = max_features, ) #https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/text/Tokenizer
tokenizer.fit_on_texts(train['clean_text'].values)
X = tokenizer.texts_to_sequences(train['clean_text'].values)

# It is important to measure the longest message before setting this
X = pad_sequences(X, padding = 'post' ,maxlen=max_message)
Y = pd.get_dummies(train['sentiment']).values

vocab_size = X.max() + 1 # Token indices run from 1 to num_words-1; index 0 is reserved for padding
vocab_size # Show the vocab size: Should match max_features from above

2000
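
The two array-building steps in this cell can be illustrated on toy data. The sketch below is an approximation written in NumPy, not the Keras code: `pad_post` is a hypothetical helper that pads at the end and, assuming the Keras default of `truncating='pre'`, drops tokens from the front of sequences that are too long. The one-hot step uses the same `pd.get_dummies` call as above.

```python
import numpy as np
import pandas as pd

def pad_post(sequences, maxlen, value=0):
    """Pad at the end; truncate from the front (assumed Keras default)."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = seq[-maxlen:]          # keep only the last maxlen tokens
        out[row, :len(kept)] = kept   # remaining positions stay at the pad value
    return out

toy_X = pad_post([[5, 3], [7, 2, 9, 4, 1]], maxlen=4)

# Labels 1 (negative) and 2 (positive) become columns 0 and 1
toy_labels = pd.Series([1, 2, 2, 1])
toy_Y = pd.get_dummies(toy_labels).values.astype("int32")

print(toy_X)
print(toy_Y)
```

Note that a long review loses its opening words under this truncation setting, which is worth keeping in mind when `max_message` is much shorter than the typical review.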

In [12]:
from sklearn.model_selection import train_test_split

# Not entirely clear why we are doing a train/test split on the training data
# when we already have a test data set.
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.25, random_state = 42)


In [19]:
#
# Exercise 11.6: Show the shape attributes for X_train, X_test, Y_train, Y_test
print("X_train shape: ",X_train.shape)
print("X_test shape: ",X_test.shape)
print("Y_train shape: ",Y_train.shape)
print("Y_test shape: ",Y_test.shape)
#

X_train shape:  (37500, 35)
X_test shape:  (12500, 35)
Y_train shape:  (37500, 2)
Y_test shape:  (12500, 2)


# Training with Keras default Embedding Layer

### Keras Embedding Layer:

Embedding layers in Keras are trained just like any other layer in a network architecture: Embedding weights will be trained using backprop to minimize the loss function by using the selected optimization method.

Think about how this trained Embedding layer in Keras would be similar or different to a pretrained vector model like word2vec. By training the embeddings yourself, you minimize the loss function in an effort to accurately predict the two sentiments in the Yelp data. Will the learned embeddings capture complete word semantics?

More Here:
1. https://stats.stackexchange.com/questions/324992/how-the-embedding-layer-is-trained-in-keras-embedding-layer
2. https://stats.stackexchange.com/questions/270546/how-does-keras-embedding-layer-work
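
Mechanically, an embedding layer is a trainable lookup table. The NumPy sketch below shows the forward pass only, under the assumption of a toy 6-word vocabulary and 4-dimensional vectors; the names `E` and `batch` are illustrative and no training happens here.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_vocab_size, toy_dim = 6, 4

# The embedding matrix: one row of weights per vocabulary index
E = rng.normal(size=(toy_vocab_size, toy_dim))

# A batch of 2 padded sequences, each 3 token indices long
batch = np.array([[2, 5, 0],
                  [1, 1, 3]])

# Looking up embeddings is just row indexing
embedded = E[batch]                      # shape (2, 3, 4)

# The same result via one-hot vectors times the weight matrix,
# which is why the rows can be trained by backprop like any dense weights
one_hot = np.eye(toy_vocab_size)[batch]  # shape (2, 3, 6)
via_matmul = one_hot @ E

print(embedded.shape)
print(np.allclose(embedded, via_matmul))
```

Only the rows that appear in a batch receive gradient during training, so rare words end up with vectors that have been updated very few times.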

In [14]:
embid_dim = 100
lstm_out = 64


model = keras.Sequential()
model.add(Embedding(max_features, embid_dim, input_length = X.shape[1]))
model.add(Bidirectional(LSTM(lstm_out, dropout=0.2)))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
# model.add(Dense(64, activation = 'relu'))
model.add(Dense(2, activation = 'softmax')) # Why do we need this dense layer?
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 35, 100)           200000    
                                                                 
 bidirectional (Bidirection  (None, 128)               84480     
 al)                                                             
                                                                 
 dense (Dense)               (None, 128)               16512     
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 2)                 258       
                                                                 
Total params: 301250 (1.15 MB)
Trainable params: 301250 (1.15 MB)
Non-trainable params: 0 (0.00 Byte)
____________________
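
The parameter counts in the summary above can be reproduced by hand. The helper functions below are hypothetical names introduced for this sketch; they encode the usual counting formulas for embedding, LSTM, and dense layers.

```python
def embedding_params(vocab, dim):
    # One dim-wide vector per vocabulary index
    return vocab * dim

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * ((input_dim + units + 1) * units)

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit
    return n_in * n_out + n_out

emb = embedding_params(2000, 100)
bilstm = 2 * lstm_params(100, 64)   # forward and backward LSTMs
dense_hidden = dense_params(128, 128)
dense_out = dense_params(128, 2)
total = emb + bilstm + dense_hidden + dense_out

print(emb, bilstm, dense_hidden, dense_out, total)
```

Two thirds of the parameters sit in the embedding table, which is why freezing or pre-training that layer changes the training problem so much.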

In [15]:
batch_size = 128
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
history = model.fit(X_train, Y_train, epochs = 4, batch_size=batch_size, verbose = 1, validation_data =(X_test, Y_test))

#
# Take careful note of the training history. How is the validation accuracy
# changing across epochs?
#

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


# Training with GloVe 100D Embeddings

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. For more info: https://nlp.stanford.edu/projects/glove/

In [24]:
embedding_width = 100

# Now grab the GloVe embeddings we will need: Takes about a minute to download.
# Then it takes about another minute to fill the data structure.
# Note that the zip file with the embeddings is hosted in Dropbox. If this does
# not work, it could be downloaded from the GloVe website and uploaded to the
# file store for the VM running this notebook.

#!wget https://www.dropbox.com/s/ewfdwppopt3pild/glove.twitter.27B.100d.txt.zip?dl=1
#!unzip glove.twitter.27B.100d.txt.zip?dl=1


!wget https://www.dropbox.com/s/ewfdwppopt3pild/glove.twitter.27B.100d.txt.zip?dl=1
!unzip glove.twitter.27B.100d.txt.zip?dl=1
from google.colab import drive
drive.mount('/drive')


print("Loading word embeddings...")
embedding_vector = dict() # Initialize an empty dictionary
embedding_dir = '/drive/My Drive/Colab Notebooks/IST 664/glove.twitter.27B.100d.txt'



# A with-block closes the file automatically, even if parsing fails part way through
with open(embedding_dir, encoding="utf8") as f:
    for line in f:
        values = line.split() # Split the line on white space
        word = values[0] # This is the word, so use it as the key
        coefs = np.asarray(values[1:], dtype='float32') # Here are the values for each dimension of the vector for this word
        embedding_vector[word] = coefs # Add to the dictionary

print('Loaded %s word vectors.' % len(embedding_vector))


--2023-11-21 19:08:41--  https://www.dropbox.com/s/ewfdwppopt3pild/glove.twitter.27B.100d.txt.zip?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.3.18, 2620:100:6018:18::a27d:312
Connecting to www.dropbox.com (www.dropbox.com)|162.125.3.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /s/dl/ewfdwppopt3pild/glove.twitter.27B.100d.txt.zip [following]
--2023-11-21 19:08:42--  https://www.dropbox.com/s/dl/ewfdwppopt3pild/glove.twitter.27B.100d.txt.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 404 Not Found
2023-11-21 19:08:42 ERROR 404: Not Found.

unzip:  cannot find or open glove.twitter.27B.100d.txt.zip?dl=1, glove.twitter.27B.100d.txt.zip?dl=1.zip or glove.twitter.27B.100d.txt.zip?dl=1.ZIP.

No zipfiles found.
Drive already mounted at /drive; to attempt to forcibly remount, call drive.mount("/drive", force_remount=True).
Loading word embeddings...
Loaded 1193514 word vectors.


At this point we have all of the GloVe vectors for more than a million vocabulary words. In the code above, however, we have constrained the vocabulary we are using to train the embedding layer of our LSTM model. So if we are going to substitute these GloVe embedding weights instead of training our own weights, we need to grab just the weights we need for our vocab. We will use a short for loop to fill a weight matrix that will later be passed into our model.

In [25]:
# Fill our matrix with zeroes. A vector of zeroes will be the default
# if we don't match a token with the GloVe vocabulary.
embedding_matrix = np.zeros((vocab_size,embedding_width))

# The Tokenizer keeps indices 1 .. num_words-1, so loop up to num_words (exclusive)
for i in range(1, tokenizer.num_words):

  word = tokenizer.index_word[i] # This is the string value we want to use for the lookup
  embedding_value = embedding_vector.get(word) # This does the dictionary lookup
  if embedding_value is not None:
    embedding_matrix[i] = embedding_value

embedding_matrix.shape

(2000, 100)
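
It can be useful to know which vocabulary words did not find a pre-trained vector, since those rows stay at zero. The sketch below repeats the filling loop on toy data and records the misses; `toy_index_word`, `toy_glove`, and `missing` are made-up names for illustration, and the two-dimensional vectors are placeholders rather than real GloVe values.

```python
import numpy as np

# Toy stand-ins for tokenizer.index_word and the GloVe dictionary
toy_index_word = {1: 'food', 2: 'good', 3: 'zzyzx'}
toy_glove = {
    'food': np.array([1.0, 0.0], dtype='float32'),
    'good': np.array([0.0, 1.0], dtype='float32'),
}
toy_num_words = 4   # indices 0..3, with 0 reserved for padding

toy_matrix = np.zeros((toy_num_words, 2))
missing = []

for i in range(1, toy_num_words):
    word = toy_index_word[i]
    vector = toy_glove.get(word)
    if vector is None:
        missing.append(word)      # row i stays all zeros
    else:
        toy_matrix[i] = vector

print(toy_matrix)
print("Words without a pre-trained vector:", missing)
```

A word whose row is all zeros contributes nothing to the model when the embedding layer is frozen, so a long `missing` list is a signal to revisit the cleaning steps or the choice of pre-trained vectors.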

In [27]:
#Now let's add an Attention layer, code adapted from https://tinyurl.com/attention664
from keras import backend as K
from keras import initializers

class Attention(tf.keras.layers.Layer):
    def __init__(self):
        # Nothing special to be done here
     #   self.init = initializers.get('glorot_uniform')
        super(Attention, self).__init__()

    def build(self, input_shape):
        # Define the shape of the weights and bias in this layer
        #

        # 12.1 Please explain here the shape size for the weights and for the bias
        # w has shape (128, 1): one weight per BiLSTM output feature (2 directions x 64 units).
        # b has shape (35, 1): one bias per time step. Total: 128 + 35 = 163 parameters.

        # 12.2 The shapes are read from input_shape = (batch, time steps, features)
        self.w=self.add_weight(shape=(input_shape[-1],1), initializer= "random_normal", trainable = True)
        self.b=self.add_weight(shape=(input_shape[1],1), initializer="zeros", trainable = True)

        super(Attention, self).build(input_shape)

    def call(self, x):
        # x is the BiLSTM output tensor, shape (batch, 35, 128) in our case
        # Below is the main processing done on every forward pass
        # K is the Keras Backend import
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x,self.w)+self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights (softmax over the time steps)
        alpha = K.softmax(e)
        # Restore the trailing axis so alpha broadcasts against x
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)

        return context
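
To check the shapes without building a model, the forward pass of this layer can be traced in NumPy. This is a sketch under the assumption of random inputs with the lab's dimensions (35 time steps, 128 features); `attention_forward` is a hypothetical function that mirrors `call()` step by step.

```python
import numpy as np

def attention_forward(x, w, b):
    """x: (batch, T, F), w: (F, 1), b: (T, 1) -> context (batch, F), alpha (batch, T)."""
    e = np.tanh(x @ w + b)                 # alignment scores, (batch, T, 1)
    e = np.squeeze(e, axis=-1)             # (batch, T)
    e = e - e.max(axis=1, keepdims=True)   # stabilize the softmax numerically
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    context = (x * alpha[..., None]).sum(axis=1)   # weighted sum over time steps
    return context, alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 35, 128))          # stand-in for the BiLSTM output
w = rng.normal(scale=0.05, size=(128, 1))
b = np.zeros((35, 1))

context, alpha = attention_forward(x, w, b)
print(context.shape, alpha.shape)
print("Parameters:", w.size + b.size)
```

Each row of `alpha` sums to 1, so the context vector is a weighted average of the 35 per-step BiLSTM outputs. Because the bias has one entry per time step, the layer is tied to a fixed sequence length.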



Now let's train an LSTM model using basically the same architecture as above, but in this case we will substitute our GloVe vectors into the embedding structure AND set the embedding layer to not be trainable. So whatever the vectors from GloVe happen to "say" about the meaning of a particular word, that's what we are sticking with for our model.

In [28]:

# lstm_out = 64 # Note that this value has been set above. Uncomment if you want
# to use a different value.

embed_dim = embedding_width # This needs to match the width of our gloVe vectors

model = keras.Sequential()

# The big difference here is that the embedding values are set to not trainable
model.add(Embedding(vocab_size, embed_dim, input_length =X.shape[1], weights = [embedding_matrix] , trainable = False))
model.add(Bidirectional(LSTM(lstm_out, dropout=0.2, return_sequences=True)))
model.add(Attention())
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
#model.add(Dense(64, activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
model.summary()


#12.3 Please leave comments here explaining the size of the parameters in the output table

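"""
Embedding Layer (embedding_1):

Output Shape: (None, 35, 100)
Param: 200,000 = 2,000 vocabulary rows x 100 dimensions (frozen here, so non-trainable)

Bidirectional LSTM Layer (bidirectional_1):

Output Shape: (None, 35, 128); return_sequences=True gives one 128-wide vector per time step
Param: 84,480 = 2 directions x 4 gates x (100 inputs + 64 recurrent + 1 bias) x 64 units

Attention Layer (attention):

Output Shape: (None, 128); the 35 time steps are collapsed into one weighted sum
Param: 163 = 128 weights + 35 biases

Dense Layer (dense_2):

Output Shape: (None, 128)
Param: 16,512 = 128 x 128 weights + 128 biases

Dropout Layer (dropout_1):

Output Shape: (None, 128)
Param: 0 (dropout has nothing to learn)

Dense Layer (dense_3):

Output Shape: (None, 2)
Param: 258 = 128 x 2 weights + 2 biases
"""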

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 35, 100)           200000    
                                                                 
 bidirectional_1 (Bidirecti  (None, 35, 128)           84480     
 onal)                                                           
                                                                 
 attention (Attention)       (None, 128)               163       
                                                                 
 dense_2 (Dense)             (None, 128)               16512     
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense_3 (Dense)             (None, 2)                 258       
                                                      

In [29]:
batch_size = 64
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
history = model.fit(X_train, Y_train, epochs = 5, batch_size=batch_size, verbose = 1, validation_data =(X_test, Y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [30]:
# Now repeat the model training, but allow the GloVe embedding weights
# to be updated this time. Note that this version is a plain BiLSTM without
# the Attention layer. What do you think may happen inside the model?

model = keras.Sequential()

model.add(Embedding(vocab_size, embid_dim, input_length =X.shape[1], weights = [embedding_matrix] , trainable = True))
model.add(Bidirectional(LSTM(lstm_out, dropout=0.2)))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
#model.add(Dense(64, activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
model.summary()

model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
history = model.fit(X_train, Y_train, epochs = 5, batch_size=batch_size, verbose = 1, validation_data =(X_test, Y_test))

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 35, 100)           200000    
                                                                 
 bidirectional_2 (Bidirecti  (None, 128)               84480     
 onal)                                                           
                                                                 
 dense_4 (Dense)             (None, 128)               16512     
                                                                 
 dropout_2 (Dropout)         (None, 128)               0         
                                                                 
 dense_5 (Dense)             (None, 2)                 258       
                                                                 
Total params: 301250 (1.15 MB)
Trainable params: 301250 (1.15 MB)
Non-trainable params: 0 (0.00 Byte)
__________________

At this point you have examined three different models. Before moving on, review the training history for each model including the training time per epoch, the movement of the loss function in the training and validation samples, and the final validation accuracy. Given everything you have learned about deep learning for text processing, you should be able to describe in some detail what is happening inside these models and give a clear explanation of the trade-offs that are illustrated here.
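
One way to organize that comparison is to reduce each training history to a few numbers. The sketch below assumes the dictionary layout that Keras stores in `history.history` when `metrics=['accuracy']` and validation data are supplied; `summarize_history` is a hypothetical helper, and the values in `placeholder_history` are made-up numbers for illustration only, not results from this notebook.

```python
def summarize_history(history_dict):
    """Report the best validation epoch and the final train/validation gap."""
    val_acc = history_dict["val_accuracy"]
    best = max(range(len(val_acc)), key=val_acc.__getitem__)
    return {
        "best_epoch": best + 1,                      # epochs are 1-indexed in the log
        "best_val_accuracy": val_acc[best],
        "final_gap": history_dict["accuracy"][-1] - val_acc[-1],
    }

# Made-up placeholder values, NOT results from this notebook
placeholder_history = {
    "accuracy":     [0.70, 0.80, 0.90],
    "val_accuracy": [0.72, 0.78, 0.76],
}

summary = summarize_history(placeholder_history)
print(summary)
```

A validation accuracy that peaks early while training accuracy keeps climbing, as in the placeholder numbers, is the usual sign of overfitting; in the lab you would pass `history.history` from each `model.fit` call instead.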

# Training with Gensim/Word2Vec Pre-trained and Trained Embeddings
Reference: https://machinelearningmastery.com/develop-word-embedding-model-predicting-movie-review-sentiment/

As you know from previous classes and labs, **Word2Vec** is not one thing but rather a family of model architectures and optimizations that can be used to learn word embeddings from large datasets. In particular, there are two training variations that are important:

1. Continuous Bag-of-Words Model which predicts the middle word based on surrounding context words. The context consists of a few words before and after the current (middle) word. This architecture is called a bag-of-words model as the order of words in the context is not important.

2. Continuous Skip-gram Model which predicts words within a certain range before and after the current word in the same sentence.

As with other pre-trained embeddings, we have to choose the dimensionality of the vectors we will use. For consistency, we will use d=100 in the example below.

More here:
1. https://jalammar.github.io/illustrated-word2vec/
2. https://www.tensorflow.org/tutorials/text/word2vec
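
The difference between the two training variations described above comes down to which training examples are generated from a sentence. The sketch below builds those examples for a toy sentence; `cbow_pairs` and `skipgram_pairs` are hypothetical helpers, and the real Gensim training adds steps (such as subsampling and negative sampling) that are not shown.

```python
def cbow_pairs(tokens, window):
    """CBOW: the surrounding context words are used to predict the middle word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window):
    """Skip-gram: the middle word is used to predict each context word."""
    return [(target, c)
            for context, target in cbow_pairs(tokens, window)
            for c in context]

toy_sentence = ['food', 'was', 'really', 'great']
cbow = cbow_pairs(toy_sentence, window=1)
skip = skipgram_pairs(toy_sentence, window=1)

print(cbow)
print(skip)
```

CBOW produces one example per position with the context pooled together, while skip-gram produces one example per (word, neighbor) pair, so skip-gram yields more training examples from the same text.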

In [31]:
# First, we will construct a long list out of all of the messages
# in our dataset. This will serve as the basis for training a custom
# word2vec model with Gensim.

sentences =[]

# tqdm is an iterator that provides an animated progress bar
for t in  tqdm(range(len(train['clean_text']))):
    text = nltk.word_tokenize(train['clean_text'][t])
    sentences.append(text)

100%|██████████| 50000/50000 [00:25<00:00, 1941.98it/s]


In [32]:
# Takes about 1 minute for 50,000 sentences. For the sg= argument, use either 0 or 1.
# The default, 0, selects CBOW; Skip-gram must be requested explicitly by passing 1.

w2v_model = Word2Vec(sentences, vector_size = 100, min_count=2, sg = 0)

In [33]:
words = list(w2v_model.wv.index_to_key)
print('Vocabulary size: %d' % len(words))

# We can optionally save our model to a file, though this is not
# needed for the code below.

#filename = 'embedding_word2vec.txt'
#w2v_model.wv.save_word2vec_format(filename, binary=False)

Vocabulary size: 35010


In [34]:
type(w2v_model)

gensim.models.word2vec.Word2Vec

In [35]:
# Just as we did for the GloVe model, we will now fill a custom embedding
# matrix with weights from our custom word2vec model.
embedding_matrix = np.zeros((vocab_size,embedding_width))
oov_errors = 0

for i in range(1, tokenizer.num_words): # Indices 1 .. num_words-1 are the ones the Tokenizer keeps

  word = tokenizer.index_word[i] # This is the string value we want to use for the lookup

  # Here we need to catch KeyErrors so that we can proceed with the loop
  # OOV words are simply left with weights of 0
  try:
    embedding_matrix[i] = w2v_model.wv[word]
  except KeyError:
    oov_errors += 1


print("Out of vocab errors: ", oov_errors)

Out of vocab errors:  2


In [36]:
# Now, just as previously, create a basic LSTM model where the embedding
# layer is populated with the weights from our custom word2vec model.

model = keras.Sequential()
model.add(Embedding(vocab_size, embid_dim, input_length =X.shape[1], weights = [ embedding_matrix] , trainable = False))
model.add(Bidirectional(LSTM(lstm_out, dropout=0.2)))
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
#model.add(Dense(64, activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 35, 100)           200000    
                                                                 
 bidirectional_3 (Bidirecti  (None, 128)               84480     
 onal)                                                           
                                                                 
 dense_6 (Dense)             (None, 128)               16512     
                                                                 
 dropout_3 (Dropout)         (None, 128)               0         
                                                                 
 dense_7 (Dense)             (None, 2)                 258       
                                                                 
Total params: 301250 (1.15 MB)
Trainable params: 101250 (395.51 KB)
Non-trainable params: 200000 (781.25 KB)
___________

As before, it is important to look at the details of this model configuration and compare to the previous models you have created in this notebook. In particular, make sure to compare the overall model size and the number of trainable parameters.

In [37]:
batch_size = 128
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
history = model.fit(X_train, Y_train, epochs = 5, batch_size=batch_size, verbose = 1, validation_data =(X_test, Y_test))

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Now give some thought to these custom word2vec vectors. In particular, think about generalizability of this model. How do you think this model would generalize to messages with different vocabulary?

In [None]:
#Whether the model can understand and work well with different words in messages
#depends on how good the custom word representations are, how well they understand the meanings of words,
#and how flexible the model is. To make it even better at understanding various words,
# we can use techniques like fine-tuning, adjusting data, and applying special methods during training.
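
A rough way to anticipate generalization problems is to measure how much of a new message falls outside the known vocabulary. The sketch below is a simple illustration with a made-up four-word vocabulary; `oov_rate` is a hypothetical helper, and on the lab data the known words could be taken from the 2,000 most frequent Tokenizer entries.

```python
def oov_rate(message, known_words):
    """Return the share of tokens not in known_words, plus the unknown tokens."""
    tokens = message.lower().split()
    unknown = [t for t in tokens if t not in known_words]
    rate = len(unknown) / len(tokens) if tokens else 0.0
    return rate, unknown

# Toy vocabulary standing in for the words the model was trained on
toy_known = {'food', 'great', 'service', 'place'}

rate, unknown = oov_rate("Great sushi place amazing omakase", toy_known)
print(f"Out-of-vocabulary rate: {rate:.0%}")
print("Unknown tokens:", unknown)
```

A high rate means most of the message never reaches the LSTM, no matter how good the embeddings of the remaining words are.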