## Aim:
To implement a model which takes a sentence as an input and finds the most appropriate emoji to be used along with it.

_Example:_

In [None]:
#This package will help us convert text to emoji
import emoji

sentence = "I'm tired of using tensorflow!"
print(f"{sentence} {emoji.emojize(':disappointed:',use_aliases=True)}")

In many emoji interfaces, you need to remember that ❤️ is the "heart" symbol rather than the "love" symbol. But using word vectors, you'll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate words in the test set to the same emoji even if those words don't even appear in the training set. This allows you to build an accurate classifier mapping from sentences to emojis, even using a small training set.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

In [None]:
#Loading in the data
train = pd.read_csv('data/train_data.csv', names = ["sentence","emoji","unk1","unk2"])
test = pd.read_csv('data/tesss.csv', names = ["sentence","emoji","unk1","unk2"])

X_train, Y_train = train['sentence'].values, np.asarray(train['emoji'].values,dtype=int)
X_test, Y_test = test['sentence'].values, np.asarray(test['emoji'].values, dtype=int)

We have a very tiny dataset. `X_train` contains 132 sentences and `Y_train` corresponding integers for suitable emojis. Let's build the emoji dictionary mapping, matching integer to appropriate emoji. 

In [None]:
emoji_dictionary = {"0": "\u2764\uFE0F",    # :heart: prints a black instead of red heart depending on the font
                    "1": ":baseball:",
                    "2": ":smile:",
                    "3": ":disappointed:",
                    "4": ":fork_and_knife:"}

Let's create a function to convert the integer to an emoji. 

In [None]:
def label_to_emoji(label):
    """
    Converts a label (int or string) into the corresponding emoji code (string) ready to be printed
    """
    return emoji.emojize(emoji_dictionary[str(label)], use_aliases=True)

In [None]:
for i in range(5):
    print(X_train[i], label_to_emoji(Y_train[i]))

## Baseline Model
The input of the model is a string corresponding to a sentence (e.g. "I love you"). In the code, the output will be a probability vector of shape (1,5), which you then pass to an argmax step to extract the index of the most likely emoji.
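
As a minimal sketch of that last step, the snippet below turns a vector of class scores into probabilities and picks the most likely class. The score values are made up for illustration and do not come from the trained model.

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

# hypothetical raw scores for the 5 emoji classes (illustrative values only)
scores = np.array([0.2, 1.5, 3.1, -0.4, 0.9])

probs = softmax(scores)                   # probabilities that sum to 1
predicted_class = int(np.argmax(probs))   # index of the most likely emoji

print(probs.shape, predicted_class)
```

The resulting index is what `label_to_emoji` expects as its argument.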

To get our labels into a format suitable for training a softmax classifier, let's convert $Y$ from its current shape $(m, 1)$ into a "one-hot representation" of shape $(m, 5)$, where each row is a one-hot vector giving the label of one example.

In [None]:
#Function to convert to one-hot-encoding
def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)]
    return Y

In [None]:
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)

In [None]:
#What the one-hot encoded vector looks like
index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])

### GloVe

The first step is to convert an input sentence into its word vector representations, which are then averaged together. We will use pretrained 50-dimensional GloVe embeddings trained on the Wikipedia 2014 + Gigaword 5 corpus. Let's start by writing a function to parse the word embeddings.

In [None]:
def read_glove_vecs(glove_file):
    #opening the glove file
    with open(glove_file, "r", encoding="utf-8") as f:
        #creating an empty set for unique words
        words = set()
        #creating a dictionary to map words to their vectors
        word_to_vec_map = {}
        #going through the text file line by line (each line is a word and its vectors)
        for line in f:
            #removing any white spaces and splitting the line into individual components
            line = line.strip().split()
            #extracting the word
            current_word = line[0]
            #adding to the set
            words.add(current_word)
            #adding to the dictionary along with its vector representation
            word_to_vec_map[current_word] = np.array(line[1:], dtype=np.float64)
            
        #starting an index counter
        i = 1
        #creating a dictionary to map words to an index
        words_to_index = {}
        #creating a dictionary to map index to the words
        index_to_words = {}
        #iterating through the list of unique words
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i+=1
    
    #returning word-index maps and the word2vec map
    return words_to_index, index_to_words, word_to_vec_map

In [None]:
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

Based on the above, we've ended up loading:
- word_to_index - map of 400,001 words to their valid integer indexes
- index_to_word - dictionary mapping of indexes to their valid words in the vocabulary
- word_to_vec_map - dictionary mapping of words to their GloVe vector representation

In [None]:
#Taking a look
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])

### Average Word2Vec Sentence Representation 
Let's write a function to extract the average of the GloVe vector map for a sentence. The function should do the following:
1. Convert every sentence to lower case and split it into individual words.
2. For every word, the function extracts the corresponding GloVe representation and then averages all these values.

In [None]:
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Aim:
        Function to extract the average of the GloVe vector map for a sentence.
    Arguments:
        sentence - [str] one example of a sentence
        word_to_vec_map - dictionary of words mapped to their GloVe vectors
    Returns:
        average - average vector encoding information about the sentence, 
                  numpy array of shape (GloVe vector size,)
    """
    
    #splitting and standardizing the sentence words
    words = [i.lower() for i in sentence.split()]
    
    #size of the GloVe vector features for each word
    m = word_to_vec_map['the'].shape[0]
    
    average = np.zeros((m,))
    for w in words:
        #we check if the word is part of the vocabulary
        try:
            average += word_to_vec_map[w]
        #else we initialize a random weight vector from a normal distribution
        except KeyError:
            average += np.random.normal(scale=0.6,size=(m,))
    
    average /= len(words)
    
    return average

In [None]:
sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)

Let's create the actual model now.

In [None]:
#defining the softmax function
def softmax(x):
    """Compute softmax values for each set of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

### Predict

In [None]:
#defining the function to predict the emoji from a sentence
def predict(X, Y, W, b, word_to_vec_map):
    '''
    Aim: 
        Given X (sentences) and Y (emoji indices), predict emojis and compute
        the accuracy of the model over the given set.
        
    Arguments:
        X - input data containing sentences, numpy array of shape (m, None)
        Y - labels, containing index of the label emoji, numpy array of shape (m,1)
        
    Returns:
        pred - numpy array of shape (m,1) with your predictions
    
    '''
    
    m = X.shape[0]
    pred = np.zeros((m,1))
    
    for j in range(m): #loop over the examples
        #lower-case the jth sentence (sentence_to_avg splits it into words)
        words = X[j].lower()
        
        #average words' vectors
        avg = sentence_to_avg(words, word_to_vec_map)
        
        #forward propagation
        z = np.dot(W, avg) + b
        a = softmax(z)
        
        pred[j] = np.argmax(a)
        
    print(f"Accuracy: {np.mean((pred[:] == Y.reshape(Y.shape[0],1)[:]))}")
    return pred

def print_predictions(X, pred):
    print()
    for i in range(X.shape[0]):
        print(X[i], label_to_emoji(int(pred[i, 0])))

### Model
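
The training loop below applies per-example stochastic gradient descent to a softmax classifier. As a sketch of where its update expressions come from, write $x$ for the averaged sentence vector and $y$ for the one-hot label (symbols introduced here for illustration only):

$$
z = Wx + b, \qquad a = \mathrm{softmax}(z), \qquad \mathcal{L} = -\sum_{k} y_k \log a_k
$$

$$
\frac{\partial \mathcal{L}}{\partial z} = a - y, \qquad
\frac{\partial \mathcal{L}}{\partial W} = (a - y)\,x^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b} = a - y
$$

These correspond to `dz`, `dW`, and `db` in the code. Each parameter is then moved by `learning_rate` times its gradient, in the direction that lowers the loss.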

In [None]:
def model(X, Y, word_to_vec_map, learning_rate, num_iterations, print_every=100):
    
    #random seed
    np.random.seed(42)
    
    #number of training examples
    m = Y.shape[0]
    #number of classes
    n_y = len(emoji_dictionary)
    #dimensions of the glove vector
    n_h = word_to_vec_map['the'].shape[0]
    
    #Initialize the parameters using Xavier initialization
    W = np.random.randn(n_y, n_h)/np.sqrt(n_h)
    b = np.zeros((n_y,))
    
    #Convert Y to one hot encoded vector
    Y_oh = convert_to_one_hot(Y, n_y)
    
    costs = []
    #Optimization
    for t in range(num_iterations): #loop over the number of iterations
        for i in range(m): #loop over the training examples
            
            #get the sentence word2vec representation
            avg = sentence_to_avg(X[i], word_to_vec_map)
            
            #forward propagate the average through the softmax layer
            #linear step
            z = np.dot(W, avg) + b
            #non-linear step
            a = softmax(z)
            
            #compute the cross entropy loss 
            cost = -np.sum(np.multiply(Y_oh[i], np.log(a)))
            #compute the gradients
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1,n_h))
            db = dz
            
            #update parameters with stochastic gradient descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
        
        costs.append(cost)
        if t % print_every == 0:
            print(f"Epoch {t} Cost: {cost}")
            pred = predict(X,Y,W,b,word_to_vec_map)
            
    plt.plot(range(num_iterations), costs)
    plt.xlabel("Iterations"); plt.ylabel("Cost")
    plt.show()
    return pred, W, b

In [None]:
pred, W, b = model(X_train, Y_train, word_to_vec_map, learning_rate=0.01, num_iterations=500, print_every=100)

Let's examine model performance on the test set.

In [None]:
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)

Random guessing would achieve about 20% accuracy given that there are 5 classes, so this is pretty good performance after training on only 132 examples.

In the training set, the algorithm saw the sentence "I love you" with the label ❤️. You can check, however, that the word "adore" does not appear in the training set. Nonetheless, let's see what happens if you write "I adore you."

In [None]:
X_my_sentences = np.array(["i adore you", "i love you", "funny lol", "lets play with a ball", "food is ready", "not feeling happy",
                          "i miss elvis", "help save me!"])
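#labels are available only for the first six sentences
Y_my_labels = np.array([[0], [0], [2], [1], [4],[3]])
n_labeled = Y_my_labels.shape[0]

#accuracy is computed on the labeled sentences only
pred = predict(X_my_sentences[:n_labeled], Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences[:n_labeled], pred)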

Amazing! Because "adore" has an embedding similar to that of "love", the algorithm has generalized correctly even to a word it has never seen before. Words such as "heart", "dear", "beloved" or "adore" have embedding vectors similar to "love", and so might work too.
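
One way to probe this similarity claim is cosine similarity between two word vectors. The helper below is a hypothetical sketch (the name `cosine_similarity` and the toy vectors are not part of the notebook); the toy inputs only exercise the function and are not GloVe values.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# toy 3-dimensional vectors, made up purely to exercise the function
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.0])    # same direction as u
w = np.array([-3.0, 0.0, 1.0])   # orthogonal to u

print(cosine_similarity(u, v), cosine_similarity(u, w))
```

With the GloVe map loaded, a call such as `cosine_similarity(word_to_vec_map["adore"], word_to_vec_map["love"])` could be compared against an unrelated pair of words.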

Note though that it doesn't get "not feeling happy" correct. This algorithm ignores word ordering, so is not good at understanding phrases like "not happy."
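
The order-blindness is easy to see in a small sketch: averaging gives the same vector for any permutation of the same words. The 2-dimensional embeddings and the helper `average_embedding` are invented for illustration and stand in for GloVe and `sentence_to_avg`.

```python
import numpy as np

# toy 2-dimensional embeddings, invented for illustration (not GloVe values)
toy_vectors = {
    "not": np.array([-1.0, 0.5]),
    "feeling": np.array([0.2, 0.1]),
    "happy": np.array([1.0, 1.0]),
}

def average_embedding(sentence, vectors):
    # mean of the word vectors; word positions play no role in the result
    return np.mean([vectors[w] for w in sentence.lower().split()], axis=0)

a = average_embedding("not feeling happy", toy_vectors)
b = average_embedding("happy feeling not", toy_vectors)

print(np.allclose(a, b))  # the two word orders are indistinguishable
```

Any classifier that sees only this average therefore cannot tell where "not" appeared in the sentence.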

### Confusion Matrix
Printing the confusion matrix can also help understand which classes are more difficult for the model. A confusion matrix shows how often an example whose label is one class ("actual" class) is mislabeled by the algorithm with a different class ("predicted" class).
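
As a small sketch of the idea, the snippet below builds a confusion matrix with NumPy from hypothetical labels (three classes, six examples, all values made up) and derives per-class recall from it.

```python
import numpy as np

# hypothetical labels for six examples over three classes (illustrative only)
y_actual = np.array([0, 0, 1, 1, 2, 2])
y_predicted = np.array([0, 1, 1, 1, 2, 0])
n_classes = 3

# rows are actual classes, columns are predicted classes
confusion = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(confusion, (y_actual, y_predicted), 1)

# diagonal entries are correct predictions; off-diagonal entries are mistakes
recall_per_class = confusion.diagonal() / confusion.sum(axis=1)

print(confusion)
print(recall_per_class)
```

Rows with a low diagonal share point to the classes the model finds hardest.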

In [None]:
#function for the confusion matrix
def plot_confusion_matrix(y_actu, y_pred, title='Confusion matrix', cmap=plt.cm.gray_r):
    
    df_confusion = pd.crosstab(y_actu, y_pred.reshape(y_pred.shape[0],), rownames=['Actual'], colnames=['Predicted'], margins=True)
    
    df_conf_norm = df_confusion / df_confusion.sum(axis=1)
    
    plt.matshow(df_confusion, cmap=cmap) # imshow
    #plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(df_confusion.columns))
    plt.xticks(tick_marks, df_confusion.columns, rotation=45)
    plt.yticks(tick_marks, df_confusion.index)
    #plt.tight_layout()
    plt.ylabel(df_confusion.index.name)
    plt.xlabel(df_confusion.columns.name)


In [None]:
print(Y_test.shape)
print('           '+ label_to_emoji(0)+ '    ' + label_to_emoji(1) + '    ' +  label_to_emoji(2)+ '    ' + label_to_emoji(3)+'   ' + label_to_emoji(4))
print(pd.crosstab(Y_test, pred_test.reshape(56,), rownames=['Actual'], colnames=['Predicted'], margins=True))
plot_confusion_matrix(Y_test, pred_test)

- Even with only 132 training examples, you can get a reasonably good model for Emojifying. This is due to the generalization power that word vectors give you.

- This algorithm will perform poorly on sentences such as "This movie is not good and not enjoyable" because it doesn't understand combinations of words--it just averages all the words' embedding vectors together, without paying attention to the ordering of words.


## LSTM Model
Let's build an LSTM model that takes as input word sequences. This model will be able to take word ordering into account.

### Padding

Most deep learning frameworks require that all sequences in the same mini-batch have the same length. This is what allows vectorization to work: If you had a 3-word sentence and a 4-word sentence, then the computations needed for them are different (one takes 3 steps of an LSTM, one takes 4 steps) so it's just not possible to do them both at the same time.

The common solution to this is to use padding. Specifically, set a maximum sequence length, and pad all sequences to the same length. For example, if the maximum sequence length is 20, we could pad every sentence with "0"s so that each input sentence is of length 20. Thus, the sentence "i love you" would be represented as $(e_i, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})$, with 17 zero vectors following the three word vectors. In this example, any sentences longer than 20 words would have to be truncated. One simple way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set.
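
A minimal sketch of this padding scheme, assuming sentences have already been converted to lists of word indices. The helper name `pad_sequences_with_zeros` and the index values are hypothetical.

```python
import numpy as np

def pad_sequences_with_zeros(sequences, max_len):
    """Right-pad (or truncate) lists of word indices to exactly max_len."""
    padded = np.zeros((len(sequences), max_len), dtype=int)
    for row, seq in enumerate(sequences):
        trimmed = seq[:max_len]               # drop anything past max_len
        padded[row, :len(trimmed)] = trimmed  # remaining slots stay 0
    return padded

# hypothetical word indices for a 3-word and a 6-word sentence
batch = pad_sequences_with_zeros([[5, 12, 7], [3, 9, 4, 8, 2, 6]], max_len=5)

print(batch)
```

The shorter sentence is filled with zeros on the right and the longer one is cut to five indices, so both rows fit in one array.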

### Embedding Layer and Building the Vocabulary
The embedding layer maps positive integers (indices corresponding to words) into dense vectors of fixed size (the embedding vectors). It can be trained or initialized with a pretrained embedding. Here we will create an embedding layer and initialize it with the GloVe 50-dimensional vectors loaded earlier in the notebook. Because our training set is quite small, we will not update the word embeddings but will instead leave their values fixed.

The `Embedding()` layer takes an integer matrix of size `(batch size, max input length)` as input. This corresponds to sentences converted into lists of indices (integers), as shown in the figure below.

The largest integer (i.e. word index) in the input should be no larger than the vocabulary size. The layer outputs an array of shape (batch size, max input length, dimension of word vectors).
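
The shape behaviour described above can be sketched with plain NumPy indexing, which acts like an embedding lookup. The sizes and random weights here are toy assumptions, not the GloVe matrix.

```python
import numpy as np

vocab_size, emb_dim = 6, 4            # toy sizes; row 0 is reserved for padding
rng = np.random.default_rng(0)
weights = rng.normal(size=(vocab_size + 1, emb_dim))
weights[0] = 0.0                      # padding index maps to the zero vector

# two sentences already converted to indices and zero-padded to length 3
indices = np.array([[2, 5, 0],
                    [1, 3, 4]])

embedded = weights[indices]           # integer indexing acts as the lookup

print(embedded.shape)                 # (2, 3, 4)
```

Each integer is replaced by its row of the weight matrix, which is why the largest index must stay within the number of rows.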

The first step is to convert all the training sentences into lists of indices, and then zero-pad all these lists so that their length is the length of the longest sentence.

In [None]:
#Getting the length of the longest sentence
maxLen = max(len(sentence.split()) for sentence in X_train)
print(f"Length of the longest sentence: {maxLen} words")

In [None]:
def sentences_to_indices(X, word_to_index, max_len):
    '''
    Aim:
        Convert an array of sentences into an array of indices corresponding to words in the sentences.
        The output shape should be such that it can be fed to the embedding layer.
    '''
    
    m = X.shape[0] #number of training examples
    X_indices = np.zeros((m,max_len), dtype=int) #create an array of shape (batch_size, max_len)
    for i in range(m):
        sentence = [j.lower() for j in X[i].split()]
        
        k = 0
        
        for word in sentence:
            X_indices[i,k] = word_to_index[word]
            k+=1
            
    return X_indices

In [None]:
#Test
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1,word_to_index, max_len = 5)
print("X1 =", X1)
print("X1_indices =", X1_indices)

Now we want to create the word2vec weight matrix for the embedding layer. We're going to use the GloVe pretrained word embeddings for our vocabulary.

In [None]:
#creating a zero matrix with rows = length of the vocabulary
#and columns = word embedding dimension
embedding_dimension = word_to_vec_map['the'].shape[0]
weights_matrix = np.zeros((len(word_to_index)+1, embedding_dimension))

words_not_found = {}

for word,index in word_to_index.items():
    try: 
        weights_matrix[index,:] = word_to_vec_map[word]
    except KeyError:
        weights_matrix[index,:] = np.random.normal(scale=0.6, size=(embedding_dimension, ))
        words_not_found[word] = 1
        
print(f"Number of words not in GloVe word embeddings map: {len(words_not_found)}")
print(f"Shape of Weights Matrix: {weights_matrix.shape}")

Now, we create the embedding layer. Since our training set is quite small, and every word in our data is covered by the pretrained GloVe vectors, we're going to make our word embeddings non-trainable.

In [None]:
import torch
import torch.nn as nn

def create_emb_layer(weights_matrix, non_trainable=False):
    weights_matrix = torch.from_numpy(weights_matrix)
    num_embeddings, embedding_dim = weights_matrix.size()
    emb_layer = nn.Embedding(num_embeddings, embedding_dim)
    emb_layer.load_state_dict({'weight': weights_matrix})
    if non_trainable:
        emb_layer.weight.requires_grad = False

    return emb_layer, num_embeddings, embedding_dim

Let's go over all the different mappings we have so far:

1. word_to_vec_map - map of all words in the GloVe word embeddings file with their corresponding vectors
2. word_to_index - GloVe vocabulary words (str) mapped to a unique index (int)
3. index_to_word - GloVe vocabulary word indices (int) mapped to the word (str)
4. vocab - vocabulary of unique words in GloVe data
5. weights_matrix - matrix of words in our vocabulary data mapped to their corresponding GloVe word embedding vectors

### Creating the PyTorch DataLoaders

In [None]:
#converting words in each sentence to indices from word_to_index
train_X = sentences_to_indices(X_train, word_to_index, maxLen)
test_X = sentences_to_indices(X_test, word_to_index, maxLen)
# train_y = convert_to_one_hot(Y=Y_train,C=5)
# test_y = convert_to_one_hot(Y=Y_test,C=5)
print(f"Shape of training set: {train_X.shape}")
print(f"Shape of test set: {test_X.shape}")

In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader

# create Tensor datasets
train_data = TensorDataset(torch.from_numpy(train_X), torch.from_numpy(Y_train))
test_data = TensorDataset(torch.from_numpy(test_X), torch.from_numpy(Y_test))

batch_size = 33
train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

print(f"Train tensor size: {len(train_loader.dataset)}")
print(f"Test tensor size: {len(test_loader.dataset)}")

In [None]:
# obtain one batch of training data
dataiter = iter(train_loader)
sample_x, sample_y = next(dataiter)

print('Sample input size: ', sample_x.size()) # batch_size, seq_length
print('Sample input: \n', sample_x)
print()
print('Sample label size: ', sample_y.size()) # batch_size
print('Sample label: \n', sample_y)

In [None]:
# First checking if GPU is available
train_on_gpu=torch.cuda.is_available()

if(train_on_gpu):
    print('Training on GPU.')
else:
    print('No GPU available, training on CPU.')

In [None]:
import torch.nn as nn
import torch.nn.functional as F
import torch

class EmojiRNN(nn.Module):
    """
    The RNN model that will be used to predict an emoji for a sentence.
    """

    def __init__(self, weights_matrix, output_size, hidden_dim, n_layers, drop_prob=0.5):
        """
        Initialize the model by setting up the layers.
        """
        super(EmojiRNN, self).__init__()
        
        self.embedding, self.num_embeddings, self.embedding_dim = create_emb_layer(weights_matrix, True)
        
        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim
        self.drop_prob = drop_prob
        #LSTM layers
        self.lstm1 = nn.LSTM(self.embedding_dim, self.hidden_dim, n_layers, 
                             dropout=self.drop_prob, batch_first=True)
        self.lstm2 = nn.LSTM(self.hidden_dim, self.hidden_dim, n_layers,
                            dropout=self.drop_prob, batch_first=True)
        
        #dropout layer
        self.dropout = nn.Dropout(self.drop_prob)
        
        # linear and softmax layers
        self.fc = nn.Linear(self.hidden_dim, self.output_size)        
        self.softmax = nn.Softmax(dim=1)
        
    def forward(self, x, hidden):
        """
        Perform a forward pass of our model on some input and hidden state.
        """
        # embeddings: (batch_size, seq_len, embedding_dim)
        embeds = self.embedding(x)
        # stacked LSTMs: (batch_size, seq_len, hidden_dim)
        lstm1_out, hidden1 = self.lstm1(embeds, hidden)
        lstm2_out, hidden2 = self.lstm2(lstm1_out, hidden)
        # keep only the output of the last time step: (batch_size, hidden_dim)
        last_out = self.dropout(lstm2_out[:, -1, :])
        # fully-connected layer producing class scores: (batch_size, output_size)
        logits = self.fc(last_out)
        # return raw scores (nn.CrossEntropyLoss applies log-softmax itself)
        # together with the hidden state
        return logits, hidden2
    
    
    def init_hidden(self, batch_size):
        ''' Initializes hidden state '''
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM
        weight = next(self.parameters()).data
        
        if (train_on_gpu):
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())
        
        return hidden



In [None]:
# Instantiate the model w/ hyperparams
vocab_size = len(word_to_index)+1 # +1 for the 0 padding + our word tokens
output_size=5
hidden_dim = 128
n_layers = 32

net = EmojiRNN(weights_matrix=weights_matrix, output_size=output_size, hidden_dim=hidden_dim, n_layers=n_layers)

print(net)

In [None]:
# loss and optimization functions
lr=0.001

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

In [None]:
# training params

epochs = 50 # 2 should be enough for this task

print_every = 3
clip=5 # gradient clipping

# move model to GPU, if available
if(train_on_gpu):
    net.cuda()

# train for some number of epochs
for e in range(epochs):
    counter = 0
    # initialize hidden state
    h = net.init_hidden(batch_size)

    # batch loop
    for inputs, labels in train_loader:
        counter += 1
        if(train_on_gpu):
            inputs, labels = inputs.cuda(), labels.cuda()

        # Creating new variables for the hidden state, otherwise
        # we'd backprop through the entire training history
        h = tuple([each.data for each in h])

        # zero accumulated gradients
        net.zero_grad()

        # get the output from the model
        output, h = net(inputs, h)

        # calculate the loss and perform backprop
        loss = criterion(output, labels)
        loss.backward()

        
        # `clip_grad_norm_` helps prevent the exploding gradient problem in RNNs / LSTMs.
        nn.utils.clip_grad_norm_(net.parameters(), clip)
        optimizer.step()

        # loss stats
        if counter % print_every == 0:

            print("Epoch: {}/{}...".format(e+1, epochs),
                  "Step: {}...".format(counter),
                  "Loss: {:.4f}...".format(loss.item()))

## Keras Model

In [None]:
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation, Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)


In [None]:
def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4). 
    
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary containing the each word mapped to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    
    m = X.shape[0]                                   # number of training examples
    
    # Initialize X_indices as a numpy matrix of zeros of shape (m, max_len)
    X_indices = np.zeros((m, max_len))
    
    for i in range(m):                               # loop over training examples
        
        # Convert the ith training sentence in lower case and split is into words. You should get a list of words.
        sentence_words = [w.lower() for w in X[i].split()]
        
        # Initialize j to 0
        j = 0
        
        # Loop over the words of sentence_words
        for w in sentence_words:
            # Set the (i,j)th entry of X_indices to the index of the correct word.
            X_indices[i, j] = word_to_index[w]
            # Increment j to j + 1
            j += 1
            
    
    return X_indices


In [None]:
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 50)
    
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))
    
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define the Keras embedding layer with the correct input/output sizes and keep it non-trainable (trainable=False).
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)

    # Build the embedding layer, it is required before setting the weights of the embedding layer. Do not modify the "None".
    embedding_layer.build((None,))
    
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    
    return embedding_layer


In [None]:
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])


In [None]:
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype='int32')
    
    # Create the embedding layer pretrained with GloVe Vectors (≈1 line)
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)   
    
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a batch of sequences.
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through another LSTM layer with 128-dimensional hidden state
    # Be careful, the returned output should be a single hidden state, not a batch of sequences.
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
    
    
    return model


In [None]:
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()


In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


In [None]:
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)


In [None]:
model.fit(X_train_indices, Y_train_oh, epochs = 50, batch_size = 32, shuffle=True)
