# Emojify! 

We are going to use word vector representations to build an Emojifier. 

Have you ever wanted to make your text messages more expressive? This emojifier app will help you do that. So rather than writing "Congratulations on the promotion! Lets go out for dinner. Play ball!" the emojifier can automatically turn this into "Congratulations on the promotion! 😄 Lets get dinner and talk. 🍴 Play ball! ⚾"

This model inputs a sentence (such as "Let's go see the baseball game tonight!") and finds the most appropriate emoji to be used with this sentence (⚾️). The advantage of using word vectors is that if the training set explicitly relates only a few words to a particular emoji, the algorithm will be able to generalize and associate words in the test set to the same emoji even if those words don't even appear in the training set. This allows us to build an accurate classifier mapping from sentences to emojis, even using a small training set. 

We first build a baseline model (Emojifier-V1) using word embeddings, then build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM.

In [14]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import csv

%matplotlib inline

## 1 - Baseline model: Emojifier-V1

Let's start by building a simple baseline classifier. 

We have a tiny dataset (X, Y) where:
- X contains sentences (strings)
- Y contains a integer label between 0 and 4 corresponding to an emoji for each sentence
- 0 - baseball, 1 - smile, 2 - disappointed, 3 - fork and knife, 4 - angry

First let us define some utility functions which we'll be using.

In [66]:
def read_csv(filename):
    """Read csv data, preprocess the data and return numpy arrays X for sentences and Y for labels"""
    phrase = []
    emoji = []

    with open (filename, encoding='utf-8') as csvDataFile:
        csvReader = csv.reader(csvDataFile)

        for row in csvReader:
            phrase.append(row[0].replace('\t','').replace(',','').replace('.','').replace("'",""))
            emoji.append(row[1])

    X = np.asarray(phrase)
    Y = np.asarray(emoji, dtype=int)

    return X, Y

In [16]:
def convert_to_one_hot(Y, C):
    """Converts vector Y to one-hot vector of dimension Y,C"""
    Y = np.eye(C)[Y.reshape(-1)]
    return Y

In [17]:
def read_glove_vecs(glove_file):
    with open(glove_file, 'r',encoding='UTF-8') as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
        
        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map

In [18]:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

In [19]:
def label_to_emoji(emo_find):
    """Converts the integers label emo_find to unicode emoticon"""
    emo_unicode, emo_id = read_csv('data/emoji_labels.csv')
    emo_dict=dict(zip(emo_id, emo_unicode))    
    return EMOJI_UNICODE[emo_dict[emo_find]]

In [20]:
EMOJI_UNICODE = {     
    u':baseball:': u'\U000026BE',
    u':smile:': u'\U0001F604',
    u':disappointed:': u'\U0001F61E',
    u':fork_and_knife:': u'\U0001F374',
    u':angry:': u'\U0001F620'
}

In [21]:
print(EMOJI_UNICODE[':baseball:'])

⚾


Let's load the dataset using the code below. We split the dataset between training (171 examples) and testing (48 examples).

In [22]:
X_train, Y_train = read_csv('data/train_emoji_data.csv')
X_test, Y_test = read_csv('data/test_emoji_data.csv')

Run the following cell to print sentences from X_train and corresponding labels from Y_train. Change `index` to see different examples. 

In [23]:
index = 1
print(X_train[index], label_to_emoji(Y_train[index]))

I am so happy for you 😄


The input of the model is a string corresponding to a sentence (e.g. "I love you). In the code, the output will be a probability vector of shape (1,5), that you then pass in an argmax layer to extract the index of the most likely emoji output.

To get our labels into a format suitable for training a softmax classifier, lets convert $Y$ from its current shape  current shape $(m, 1)$ into a "one-hot representation" $(m, 5)$, where each row is a one-hot vector giving the label of one example, You can do so using this next code snipper. Here, `Y_oh` stands for "Y-one-hot" in the variable names `Y_oh_train` and `Y_oh_test`: 


In [24]:
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)

Let's see what `convert_to_one_hot()` did. Feel free to change `index` to print out different values. 

In [25]:
index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])

1 is converted into one hot [ 0.  1.  0.  0.  0.]


The first step is to convert an input sentence into the word vector representation, which then get averaged together. We are using pretrained 200-dimensional GloVe embeddings to get the word vector representation. Please download the Glove embeddings from https://nlp.stanford.edu/projects/glove/ and add to data folder when you use as the file is bigger than github's upload size limit.

In [26]:
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.200d.txt')

We've loaded:
- `word_to_index`: dictionary mapping from words to their indices in the vocabulary (400,001 words, with the valid indices ranging from 0 to 400,000)
- `index_to_word`: dictionary mapping from indices to their corresponding words in the vocabulary
- `word_to_vec_map`: dictionary mapping words to their GloVe vector representation.

To get a better idea of this check the below cell.


In [27]:
word = "cucumber"
index = 289846
print("the index of", word, "in the vocabulary is", word_to_index[word])
print("the", str(index) + "th word in the vocabulary is", index_to_word[index])

the index of cucumber in the vocabulary is 113317
the 289846th word in the vocabulary is potatos


In [28]:
def sentence_to_avg(sentence, word_to_vec_map):
    """
    Converts a sentence (string) into a list of words (strings). Extracts the GloVe representation of each word
    and averages its value into a single vector encoding the meaning of the sentence.
    
    Arguments:
    sentence -- string, one training example from X
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 200-dimensional vector representation
    
    Returns:
    avg -- average vector encoding information about the sentence, numpy-array of shape (200,)
    """
    words = [i.lower() for i in sentence.split()]
    avg = np.zeros((200,))
    
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg / len(words)
    
    return avg

A sample avg sentence vector:

In [29]:
avg = sentence_to_avg("Morrocan couscous is my favorite dish", word_to_vec_map)
print("avg = ", avg)

avg =  [  5.02950000e-01  -2.20525000e-02   3.67925167e-01  -1.77369333e-01
   1.07486167e-01  -2.74276833e-01  -2.79050000e-01   1.33223833e-01
   1.96040000e-01  -5.13750000e-01  -2.43606300e-01  -3.19843333e-02
   3.22711667e-01   1.41161833e-01  -1.05260000e-01  -1.85755000e-01
   1.85501667e-01   1.58335000e-01   2.43765000e-02  -3.08233333e-03
   6.08083333e-02   1.26911500e+00   2.60788333e-01   1.37140433e-01
   2.91643000e-01  -1.62423167e-01   2.57763333e-02   2.90396500e-01
  -1.42969167e-01   8.32928333e-02  -2.85766667e-03   1.27052500e-01
   4.80670000e-02   4.10241667e-02  -3.22298333e-01   1.64100000e-01
  -1.87378833e-01  -1.83015500e-01  -1.31448667e-01  -1.53432500e-01
  -6.41886033e-02   1.17544500e-01   8.00196500e-02  -1.56911667e-01
  -4.21799333e-01   1.77143167e-01   2.62843333e-01  -1.83010000e-02
   1.39649167e-02   4.10620000e-01   9.92183333e-02  -1.77695000e-01
  -7.30453333e-02   4.29467108e-01   2.12993500e-01  -1.49915833e-01
  -2.47896667e-02  -1.04195

Assuming here that $Yoh$ ("Y one hot") is the one-hot encoding of the output labels, these are the equations we need to implement in the forward pass and to compute the cross-entropy cost:
$$ z^{(i)} = W . avg^{(i)} + b$$
$$ a^{(i)} = softmax(z^{(i)})$$
$$ \mathcal{L}^{(i)} = - \sum_{k = 0}^{n_y - 1} Yoh^{(i)}_k * log(a^{(i)}_k)$$

In [30]:
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):
    """
    Model to train word vector representations in numpy.
    
    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 6, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    W -- weight matrix of the softmax layer, of shape (n_y, n_h)
    b -- bias of the softmax layer, of shape (n_y,)
    """
    
    np.random.seed(1)
    
    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 200                                # dimensions of the GloVe vectors 
    
    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    
    # Optimization loop
    for t in range(num_iterations):                       # Loop over the number of iterations
        for i in range(m):                                # Loop over the training examples
            
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)

            # Compute cost using the i'th training label's one hot representation and "A" (the output of the softmax)
            cost = -np.sum(np.multiply(Y_oh[i], np.log(a)))
            
            # Compute gradients 
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
        
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b

In [31]:
def predict(X, Y, W, b, word_to_vec_map):
    """
    Method to predict the check the accuracy of the model and return the predictions.
    
    Arguments:
    X -- input data, numpy array of sentences as strings, of shape (m, 1)
    Y -- labels, numpy array of integers between 0 and 6, numpy-array of shape (m, 1)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    learning_rate -- learning_rate for the stochastic gradient descent algorithm
    num_iterations -- number of iterations
    
    Returns:
    pred -- vector of predictions, numpy-array of shape (m, 1)
    """
    pred = []; acc = 0
    for i in range(Y.shape[0]):
        avg = sentence_to_avg(X[i], word_to_vec_map)
        z = np.dot(W, avg) + b
        a = softmax(z)
        a = np.where(a == np.max(a), 1, 0)
        a = np.flatnonzero(a)
        label = a[0]
        pred.append(label)
        if(label==Y[i]):
            acc = acc + 1
    print('Accuracy:' + str(acc/Y.shape[0]))
    return [pred]

Run the next cell to train the model and learn the softmax parameters (W,b). 

In [32]:
pred, W, b = model(X_train, Y_train, word_to_vec_map)

Epoch: 0 --- cost = 1.48729077389
Accuracy:0.42105263157894735
Epoch: 100 --- cost = 0.250806943323
Accuracy:0.9883040935672515
Epoch: 200 --- cost = 0.114635343064
Accuracy:0.9941520467836257
Epoch: 300 --- cost = 0.0711357803188
Accuracy:0.9941520467836257


Our model has pretty high accuracy on the training set. Lets now see how it does on the test set. 

In [33]:
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)

Training set:
Accuracy:1.0
Test set:
Accuracy:0.8958333333333334


Our test set also yielded accuracy close to 90% which is pretty good. Check some random sentences and see how the model performs.

In [34]:
X_my_sentences = np.array(["funny lol", "lets play with a ball", "food is ready", "not feeling happy", "Do not try to piss me off"])
Y_my_labels = np.array([[1], [0], [3],[2],[4]])

pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)

for i in range(Y_my_labels.shape[0]):
    print(X_my_sentences[i] +' '+ label_to_emoji(pred[0][i]))

Accuracy:1.0
funny lol 😄
lets play with a ball ⚾
food is ready 🍴
not feeling happy 😞
Do not try to piss me off 😠


Printing the confusion matrix can help us understand which classes are more difficult for your model. A confusion matrix shows how often an example whose label is one class ("actual" class) is mislabeled by the algorithm with a different class ("predicted" class). 

In [35]:
print('           '+ label_to_emoji(0)+ '  ' + label_to_emoji(1) + '  ' +  label_to_emoji(2)+ '  ' + label_to_emoji(3)+' ' + label_to_emoji(4))
print(pd.crosstab(Y_test, np.array(pred_test).reshape(Y_test.shape[0],), rownames=['Actual'], colnames=['Predicted'], margins=True))

           ⚾  😄  😞  🍴 😠
Predicted  0   1   2  3  4  All
Actual                         
0          8   0   0  0  0    8
1          0  12   0  0  1   13
2          0   0  11  0  1   12
3          0   0   1  7  0    8
4          0   0   2  0  5    7
All        8  12  14  7  7   48


Anger and sadness are being confused in our model

## 2 - Emojifier-V2: Using LSTMs in Keras: 

Let's build an LSTM model that takes as input word sequences. This model will be able to take word ordering into account. Emojifier-V2 will continue to use pre-trained word embeddings to represent words, but will feed them into an LSTM, whose job it is to predict the most appropriate emoji. 

Run the following cell to load the Keras packages.

In [36]:
import numpy as np
np.random.seed(0)
from keras.models import Model
from keras.layers import Dense, Input, Dropout, LSTM, Activation
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.initializers import glorot_uniform
np.random.seed(1)

Using TensorFlow backend.


Here is the Emojifier-v2  implemented:

<img src="images/emojifier-v2.png" style="width:700px;height:400px;"> <br>


In this model, we want to train in Keras using mini-batches. However, most deep learning frameworks require that all sequences in the same mini-batch have the same length. This is what allows vectorization to work: If you had a 3-word sentence and a 4-word sentence, then the computations needed for them are different (one takes 3 steps of an LSTM, one takes 4 steps) so it's just not possible to do them both at the same time.

The common solution to this is to use padding. Specifically, set a maximum sequence length, and pad all sequences to the same length. For example, of the maximum sequence length is 20, we could pad every sentence with "0"s so that each input sentence is of length 20. Thus, a sentence "i love you" would be represented as $(e_{i}, e_{love}, e_{you}, \vec{0}, \vec{0}, \ldots, \vec{0})$. In this example, any sentences longer than 20 words would have to be truncated. One simple way to choose the maximum sequence length is to just pick the length of the longest sentence in the training set. 


In [37]:
maxLen = max(len(max(X_train, key=len).split()), len(max(X_test, key=len).split()))
print(maxLen)

16


In Keras, the embedding matrix is represented as a "layer", and maps positive integers (indices corresponding to words) into dense vectors of fixed size (the embedding vectors). It can be trained or initialized with a pretrained embedding. In this part, we create an [Embedding()](https://keras.io/layers/embeddings/) layer in Keras, initialize it with the GloVe 200-dimensional vectors loaded earlier in the notebook. Because our training set is quite small, we will not update the word embeddings but will instead leave their values fixed. But in the code below, we'll show you how Keras allows you to either train or leave fixed this layer.  

The `Embedding()` layer takes an integer matrix of size (batch size, max input length) as input. This corresponds to sentences converted into lists of indices (integers).

The first step is to convert all your training sentences into lists of indices, and then zero-pad all these lists so that their length is the length of the longest sentence.  

In [38]:
def sentences_to_indices(X, word_to_index, max_len):
    """
    Converts an array of sentences (strings) into an array of indices corresponding to words in the sentences.
    The output shape should be such that it can be given to `Embedding()` (described in Figure 4). 
    
    Arguments:
    X -- array of sentences (strings), of shape (m, 1)
    word_to_index -- a dictionary containing the each word mapped to its index
    max_len -- maximum number of words in a sentence. You can assume every sentence in X is no longer than this. 
    
    Returns:
    X_indices -- array of indices corresponding to words in the sentences from X, of shape (m, max_len)
    """
    
    m = X.shape[0] 
    X_indices = np.zeros((m, max_len))
    
    for i in range(m):
        sentence_words = [w.lower() for w in X[i].split()]           
        j = 0 
        
        for w in sentence_words:
            X_indices[i, j] = word_to_index[w]
            j += 1
            
    return X_indices

Run the following cell to check what `sentences_to_indices()` does.

In [39]:
X1 = np.array(["funny lol", "lets play baseball", "food is ready for you"])
X1_indices = sentences_to_indices(X1,word_to_index, max_len = 6)
print("X1 =", X1)
print("X1_indices =", X1_indices)

X1 = ['funny lol' 'lets play baseball' 'food is ready for you']
X1_indices = [[ 155345.  225122.       0.       0.       0.       0.]
 [ 220930.  286375.   69714.       0.       0.       0.]
 [ 151204.  192973.  302254.  151349.  394475.       0.]]


Let's build the `Embedding()` layer in Keras, using pre-trained word vectors. After this layer is built, we will pass the output of `sentences_to_indices()` to it as an input, and the `Embedding()` layer will return the word embeddings for a sentence. 

In [40]:
def pretrained_embedding_layer(word_to_vec_map, word_to_index):
    """
    Creates a Keras Embedding() layer and loads in pre-trained GloVe 50-dimensional vectors.
    
    Arguments:
    word_to_vec_map -- dictionary mapping words to their GloVe vector representation.
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    embedding_layer -- pretrained layer Keras instance
    """
    
    vocab_len = len(word_to_index) + 1                  # adding 1 to fit Keras embedding (requirement)
    emb_dim = word_to_vec_map["cucumber"].shape[0]      # define dimensionality of your GloVe word vectors (= 200)
    
    # Initialize the embedding matrix as a numpy array of zeros of shape (vocab_len, dimensions of word vectors = emb_dim)
    emb_matrix = np.zeros((vocab_len, emb_dim))
    
    # Set each row "index" of the embedding matrix to be the word vector representation of the "index"th word of the vocabulary
    for word, index in word_to_index.items():
        emb_matrix[index, :] = word_to_vec_map[word]

    # Define Keras embedding layer with the correct output/input sizes, make it trainable. Use Embedding(...). Make sure to set trainable=False. 
    embedding_layer = Embedding(vocab_len, emb_dim, trainable=False)

    # Build the embedding layer, it is required before setting the weights of the embedding layer.
    embedding_layer.build((None,))
    
    # Set the weights of the embedding layer to the embedding matrix. Your layer is now pretrained.
    embedding_layer.set_weights([emb_matrix])
    
    return embedding_layer

In [41]:
embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
print("weights[0][1][3] =", embedding_layer.get_weights()[0][1][3])

weights[0][1][3] = -0.11844


Now let us feed the embedding layer we created and feed it to an LSTM network. We use the following in Keras: `Input(shape = ..., dtype = '...')`, [LSTM()](https://keras.io/layers/recurrent/#lstm), [Dropout()](https://keras.io/layers/core/#dropout), [Dense()](https://keras.io/layers/core/#dense), and [Activation()](https://keras.io/activations/).

In [42]:
def Emojify_V2(input_shape, word_to_vec_map, word_to_index):
    """
    Function creating the Emojify-v2 model's graph.
    
    Arguments:
    input_shape -- shape of the input, usually (max_len,)
    word_to_vec_map -- dictionary mapping every word in a vocabulary into its 50-dimensional vector representation
    word_to_index -- dictionary mapping from words to their indices in the vocabulary (400,001 words)

    Returns:
    model -- a model instance in Keras
    """
    
    # Define sentence_indices as the input of the graph, it should be of shape input_shape and dtype 'int32' (as it contains indices).
    sentence_indices = Input(input_shape, dtype='int32')
    
    # Create the embedding layer pretrained with GloVe Vectors
    embedding_layer = pretrained_embedding_layer(word_to_vec_map, word_to_index)
    
    # Propagate sentence_indices through your embedding layer, you get back the embeddings
    embeddings = embedding_layer(sentence_indices)   
    
    # Propagate the embeddings through an LSTM layer with 128-dimensional hidden state
    X = LSTM(128, return_sequences=True)(embeddings)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X trough another LSTM layer with 128-dimensional hidden state
    X = LSTM(128, return_sequences=False)(X)
    # Add dropout with a probability of 0.5
    X = Dropout(0.5)(X)
    # Propagate X through a Dense layer with softmax activation to get back a batch of 5-dimensional vectors.
    X = Dense(5)(X)
    # Add a softmax activation
    X = Activation('softmax')(X)
    
    # Create Model instance which converts sentence_indices into X.
    model = Model(inputs=sentence_indices, outputs=X)
          
    return model

In [60]:
model = Emojify_V2((maxLen,), word_to_vec_map, word_to_index)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 16)                0         
_________________________________________________________________
embedding_5 (Embedding)      (None, 16, 200)           80000200  
_________________________________________________________________
lstm_7 (LSTM)                (None, 16, 128)           168448    
_________________________________________________________________
dropout_7 (Dropout)          (None, 16, 128)           0         
_________________________________________________________________
lstm_8 (LSTM)                (None, 128)               131584    
_________________________________________________________________
dropout_8 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 5)                 645       
__________

We now need to compile it and define what loss, optimizer and metrics we are want to use. Compile your model using `categorical_crossentropy` loss, `adam` optimizer and `['accuracy']` metrics:

In [61]:
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

In [62]:
X_train_indices = sentences_to_indices(X_train, word_to_index, maxLen)
Y_train_oh = convert_to_one_hot(Y_train, C = 5)

We now fit the Keras model on `X_train_indices` and `Y_train_oh`. We will use `epochs = 30` and `batch_size = 32`.

In [63]:
model.fit(X_train_indices, Y_train_oh, epochs = 30, batch_size = 32, shuffle=True)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


<keras.callbacks.History at 0xaa3d71a1d0>

In [64]:
X_test_indices = sentences_to_indices(X_test, word_to_index, max_len = maxLen)
Y_test_oh = convert_to_one_hot(Y_test, C = 5)
loss, acc = model.evaluate(X_test_indices, Y_test_oh)
print("Test accuracy = ", acc)

Test accuracy =  0.916666666667


Our test accuracy is over 91.5% which is a really good number. With more training data and hyperparameter tuning this can be further improved. Let us see the mislabelled sentences.

In [65]:
C = 5
y_test_oh = np.eye(C)[Y_test.reshape(-1)]
X_test_indices = sentences_to_indices(X_test, word_to_index, maxLen)
pred = model.predict(X_test_indices)
for i in range(len(X_test)):
    x = X_test_indices
    num = np.argmax(pred[i])
    if(num != Y_test[i]):
        print('Expected emoji:'+ label_to_emoji(Y_test[i]) + ' prediction: '+ X_test[i] + label_to_emoji(num).strip())

Expected emoji:😠 prediction: This girl is messing with me😞
Expected emoji:😠 prediction: This stupid grader is not working 😞
Expected emoji:😞 prediction: go away😠
Expected emoji:😠 prediction: Get lost now !😄


Looks like we are having trouble with very small sentences and between distinguishing anger and sadness. 
We can test any random sentence and see the emoticon now.

In [550]:
# Change the sentence below to see your prediction. Make sure all the words are in the Glove embeddings.  
x_test = np.array(['lets play'])
X_test_indices = sentences_to_indices(x_test, word_to_index, maxLen)
print(x_test[0] +' '+  label_to_emoji(np.argmax(model.predict(X_test_indices))))

lets play ⚾


## Conclusion

As you can see vector representations of words(word embeddings) in LSTM's can be used with good accuracy for emoticon generation. This model can be used for any emoticon generation even with less data but handling ambiguity might be a problem in that case as we have observed.

## Acknowledgments

Inspired by Coursera Sequence Models course assigment