# Emoji generation

This project implements a model that takes a sentence as input and finds the most appropriate emoji to use with it. The input sentence is first classified by its sentiment, and that classification is then used to pick the emoji.
For example:

**Input:** <i>Lets get coffee and talk.</i>

**Output:**(☕️)

We use the pretrained GloVe embedding matrix of size (400000, 50): it holds embeddings for 400,000 words, and each word embedding is a vector of length 50. Because of these embeddings, even with a small training set the algorithm can generalize and associate words in the test set with the right emoji, even when those words never appear in the training set.
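The generalization claim rests on related words having nearby embedding vectors. A minimal sketch of that idea follows, using cosine similarity. The three-dimensional vectors and the `toy_vectors` name are made-up illustrations, not real GloVe values, and `cosine_similarity` is a helper written for this example rather than part of the project.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional "embeddings" (hypothetical values, NOT real GloVe vectors).
toy_vectors = {
    "coffee": np.array([0.9, 0.1, 0.0]),
    "tea":    np.array([0.8, 0.2, 0.1]),
    "sad":    np.array([0.0, 0.1, 0.9]),
}

sim_related = cosine_similarity(toy_vectors["coffee"], toy_vectors["tea"])
sim_unrelated = cosine_similarity(toy_vectors["coffee"], toy_vectors["sad"])

# Related words point in nearly the same direction, unrelated ones do not.
print(sim_related > sim_unrelated)
```

If "tea" never appears in the training set but "coffee" does, a classifier working on vectors like these would still tend to treat the two words alike.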


### Dataset Used

We have a tiny dataset (X, Y):

- X contains 188 sentences (strings)
- Y contains an integer label between 0 and 4 for each sentence, corresponding to an emoji

Of these, 132 labeled sentences are used for training the model and the remaining 56 for testing.

We worked on two ways to generate emoji. The first is our baseline model (***emoji_gen.ipynb***), in which every sentence is converted into its embedding representation (each word in the sentence is represented by a vector of size 50). We then average the word embedding vectors of the sentence and train a softmax classifier on that average.

<img src="images/image_1.png" style="width:900px;height:300px;">
<caption><center> <strong>Figure 2:</strong> Baseline model.</center></caption>


The second model (***emoji_gen_lstm.ipynb***) uses an LSTM. We first convert every sentence into its corresponding word embedding vectors and then train the model. Each input word is fed to an LSTM cell, which passes its activation value (what it has learned so far) to the next LSTM cell in the same layer and to the next LSTM layer.

We use 2 LSTM layers. The first layer is ***LSTM(128, return_sequences=True)*** and the second is ***LSTM(128, return_sequences=False)***. We use dropout as a regularization technique, which reduces overfitting to the training data by spreading the weights out over the network.
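To illustrate why the first layer returns the full sequence and the second returns only its last state, here is a rough NumPy sketch of a two-layer LSTM forward pass. It is not the Keras implementation used in the notebook: the gate layout, the `lstm_layer` and `init_params` helpers, the random weights, and the random stand-in for an embedded sentence are all assumptions made for illustration, and dropout is left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_layer(inputs, params, return_sequences):
    """Run one LSTM layer over `inputs` of shape (timesteps, n_in)."""
    W, U, b = params            # W: (4*n_h, n_in), U: (4*n_h, n_h), b: (4*n_h,)
    n_h = U.shape[1]
    h = np.zeros(n_h)           # hidden state passed from one time step to the next
    c = np.zeros(n_h)           # cell state (long-term memory)
    outputs = []
    for x_t in inputs:
        gates = W @ x_t + U @ h + b
        i = sigmoid(gates[0:n_h])            # input gate
        f = sigmoid(gates[n_h:2 * n_h])      # forget gate
        o = sigmoid(gates[2 * n_h:3 * n_h])  # output gate
        g = np.tanh(gates[3 * n_h:])         # candidate cell values
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    # Either every hidden state (one per word) or only the final one.
    return np.stack(outputs) if return_sequences else h

def init_params(rng, n_in, n_h):
    """Small random weights (illustrative only, not trained)."""
    return (rng.normal(scale=0.1, size=(4 * n_h, n_in)),
            rng.normal(scale=0.1, size=(4 * n_h, n_h)),
            np.zeros(4 * n_h))

rng = np.random.default_rng(0)
timesteps, n_embed, n_hidden = 10, 50, 128
sentence = rng.normal(size=(timesteps, n_embed))  # stand-in for embedded words

layer1 = init_params(rng, n_embed, n_hidden)
layer2 = init_params(rng, n_hidden, n_hidden)

seq = lstm_layer(sentence, layer1, return_sequences=True)   # one vector per word
final = lstm_layer(seq, layer2, return_sequences=False)     # one vector per sentence
print(seq.shape, final.shape)
```

The second layer needs one input per time step, so the first layer has to return its whole sequence. The classifier needs one vector per sentence, so the second layer returns only its last hidden state.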

The final layer is a softmax layer that outputs the sentence classification. A function, ***label_to_emoji()***, uses the `emoji` library to convert the integer label to its corresponding emoji.

<img src="images/emojifier-v2.png" style="width:700px;height:400px;"> <br>

<caption><center> <strong>Figure 3:</strong> A 2-layer LSTM sequence classifier. </center></caption>

The LSTM model is better than the baseline model (the softmax trained on averaged word embedding vectors) because the baseline relies only on the sentiment of the individual words in the sentence. It should also take into account the order in which the words appear, and the LSTM model does. For example,

<b><u>For Baseline Model:</u></b>

***Input:*** <i>Not feeling happy.</i>

***Output:***(😄)

<b><u>For LSTM Model:</u></b>

***Input:*** <i>Not feeling happy.</i>

***Output:***(😞)
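The baseline's blind spot can be shown directly: averaging discards word order, so any reordering of the same words produces the same input to the classifier. A small sketch with made-up two-dimensional vectors (the values and the `average_embedding` helper are illustrative assumptions, not the project's code):

```python
import numpy as np

# Toy 2-dimensional embeddings (hypothetical values, not real GloVe vectors).
toy_vectors = {
    "not":     np.array([-0.2, 0.1]),
    "feeling": np.array([0.0, 0.3]),
    "happy":   np.array([0.9, 0.4]),
}

def average_embedding(sentence, vectors):
    """Mean of the word vectors in `sentence`; word order is discarded."""
    words = sentence.lower().split()
    return np.mean([vectors[w] for w in words], axis=0)

original = average_embedding("not feeling happy", toy_vectors)
shuffled = average_embedding("happy feeling not", toy_vectors)

# Both sentences collapse to the same averaged vector.
print(np.allclose(original, shuffled))
```

Because "happy" dominates the average, a single "not" barely shifts the result, which is consistent with the baseline's output above.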

<b><i>Importing Python libraries</i></b>

In [27]:
import numpy as np
from emo_utils import *
import emoji
import matplotlib.pyplot as plt

%matplotlib inline

<b><i>Reading the dataset into train and test variables</i></b>

In [15]:
X_train, Y_train = read_csv('data/train_emoji.csv')
X_test, Y_test = read_csv('data/tesss.csv')

In [16]:
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

(132,)
(132,)
(56,)
(56,)


<b><i>Number of words in the longest training sentence</i></b>

In [17]:
maxLen = len(max(X_train, key=len).split())
maxLen

10

<b><i>Converting the train and test labels to one-hot vectors</i></b>

In [18]:
Y_oh_train = convert_to_one_hot(Y_train, C = 5)
Y_oh_test = convert_to_one_hot(Y_test, C = 5)
print(Y_oh_train.shape)

(132, 5)


In [19]:
index = 50
print(Y_train[index], "is converted into one hot", Y_oh_train[index])

0 is converted into one hot [1. 0. 0. 0. 0.]
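`convert_to_one_hot` comes from `emo_utils` and its body is not shown here. One common way to write such a helper is sketched below, on the assumption that it behaves like indexing into an identity matrix. The name `one_hot` is hypothetical and chosen to avoid implying this is the project's actual implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer labels to rows of a (len(labels), num_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes)[labels]

example = one_hot([0, 3, 4], num_classes=5)
print(example)
```

Each row contains a single 1 at the position of its label, which is the form the cross-entropy cost in the model expects.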


<b><i>Function to read and store the GloVe embeddings</i></b>

In [20]:
def read_glove_vecs(glove_file):
    with open(glove_file, 'r', encoding="utf8") as f:
        words = set()
        word_to_vec_map = {}
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
        
        i = 1
        words_to_index = {}
        index_to_words = {}
        for w in sorted(words):
            words_to_index[w] = i
            index_to_words[i] = w
            i = i + 1
    return words_to_index, index_to_words, word_to_vec_map

<b><i>Reading the GloVe embeddings</i></b>

In [21]:
word_to_index, index_to_word, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

In [23]:
word = "cucumber"
index = 289846
print( word_to_index[word])
print(index_to_word[index])

113317
potatos


In [24]:
print (len(word_to_vec_map))

400000


<b><i>Function that averages the embedding vectors of the words in a sentence</i></b>

In [25]:
def sentence_to_avg(sentence, word_to_vec_map):

    words = sentence.lower().split()

    # Initialize the average word vector, should have the same shape as your word vectors.
    avg = np.zeros((50,))
    
    for w in words:
        avg += word_to_vec_map[w]
    avg = avg/len(words)
    
    return avg
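As written, `sentence_to_avg` raises a `KeyError` for any word missing from `word_to_vec_map` and divides by zero on an empty sentence. A possible variant that tolerates both is sketched below. The name `sentence_to_avg_safe`, the `dim` parameter, and the choice to skip unknown words are assumptions for illustration, not behavior of the original notebook.

```python
import numpy as np

def sentence_to_avg_safe(sentence, word_to_vec_map, dim=50):
    """Average the vectors of known words; return zeros if none are known."""
    words = sentence.lower().split()
    # Keep only words that have an embedding (assumption: unknown words are skipped).
    vectors = [word_to_vec_map[w] for w in words if w in word_to_vec_map]
    if not vectors:
        return np.zeros((dim,))
    return np.mean(vectors, axis=0)

# Tiny stand-in map (hypothetical values, not real GloVe vectors).
toy_map = {"good": np.ones(50), "day": np.zeros(50)}
avg = sentence_to_avg_safe("Good day zzzunknown", toy_map)
print(avg[:3])
```

Skipping unknown words changes the denominator of the average, so whether this is preferable to failing loudly depends on how noisy the input text is expected to be.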

<b><i>Prediction function for predicting the labels once the weights and bias are trained</i></b>

In [38]:
def predict(X, Y, W, b, word_to_vec_map):

    m = X.shape[0]
    pred = np.zeros((m, 1))
    
    for j in range(m):                       # Loop over training examples
        
        # Split jth test example (sentence) into list of lower case words
        words = X[j].lower().split()
        
        # Average words' vectors
        avg = np.zeros((50,))
        for w in words:
            avg += word_to_vec_map[w]
        avg = avg/len(words)

        # Forward propagation
        Z = np.dot(W, avg) + b
        A = softmax(Z)
        pred[j] = np.argmax(A)
        
    print("Accuracy: "  + str(np.mean((pred[:] == Y.reshape(Y.shape[0],1)[:]))))
    
    return pred
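`predict` relies on a `softmax` function imported from `emo_utils`, which is not shown. A numerically stable version could look like the sketch below. The name `stable_softmax` is hypothetical, and the max-subtraction trick is an assumption about how such a helper might be written.

```python
import numpy as np

def stable_softmax(z):
    """Softmax of a 1-D score vector, shifted by its max to avoid overflow."""
    shifted = z - np.max(z)   # does not change the result, keeps exp() bounded
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Large scores would overflow a naive exp(z) but are fine after shifting.
scores = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(scores)
print(probs.sum(), np.argmax(probs))
```

The predicted label is the index of the largest probability, which is what `np.argmax(A)` picks out in `predict`.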

<b><i>Model function</i></b>

In [31]:
def model(X, Y, word_to_vec_map, learning_rate = 0.01, num_iterations = 400):

    np.random.seed(1)

    m = Y.shape[0]                          # number of training examples
    n_y = 5                                 # number of classes  
    n_h = 50                                # dimensions of the GloVe vectors 
    
    # Initialize parameters using Xavier initialization
    W = np.random.randn(n_y, n_h) / np.sqrt(n_h)
    b = np.zeros((n_y,))
    
    # Convert Y to Y_onehot with n_y classes
    Y_oh = convert_to_one_hot(Y, C = n_y) 
    # Optimization loop
    for t in range(num_iterations):                       
        for i in range(m):                                
            
            # Average the word vectors of the words from the i'th training example
            avg = sentence_to_avg(X[i], word_to_vec_map)

            # Forward propagate the avg through the softmax layer
            z = np.dot(W, avg) + b
            a = softmax(z)

            # Compute cost using the i'th training label's one hot representation and "a" (the output of the softmax)
            cost = -sum(Y_oh[i,:]*np.log(a))
            
            # Compute gradients 
            dz = a - Y_oh[i]
            dW = np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))
            db = dz

            # Update parameters with Stochastic Gradient Descent
            W = W - learning_rate * dW
            b = b - learning_rate * db
        
        if t % 100 == 0:
            print("Epoch: " + str(t) + " --- cost = " + str(cost))
            pred = predict(X, Y, W, b, word_to_vec_map)

    return pred, W, b
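The gradient lines in `model` (`dz = a - Y_oh[i]`, `dW`, `db`) follow from differentiating the cross-entropy cost through the softmax. A short derivation, writing `Y_oh[i]` as $y$ and assuming $y$ is one-hot so that its entries sum to 1:

```latex
\begin{aligned}
z &= W\,\mathrm{avg} + b, \qquad
a_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad
\mathcal{L} = -\sum_k y_k \log a_k \\
\frac{\partial a_k}{\partial z_j} &= a_k\,(\delta_{kj} - a_j) \\
\frac{\partial \mathcal{L}}{\partial z_j}
  &= -\sum_k \frac{y_k}{a_k}\, a_k\,(\delta_{kj} - a_j)
   = -y_j + a_j \sum_k y_k
   = a_j - y_j \\
\frac{\partial \mathcal{L}}{\partial W} &= (a - y)\,\mathrm{avg}^{\top}, \qquad
\frac{\partial \mathcal{L}}{\partial b} = a - y
\end{aligned}
```

The outer product $(a - y)\,\mathrm{avg}^{\top}$ has shape $(5, 50)$, matching `np.dot(dz.reshape(n_y,1), avg.reshape(1, n_h))` in the code.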

In [32]:
pred, W, b = model(X_train, Y_train, word_to_vec_map)

Epoch: 0 --- cost = 1.9520498812810072
Accuracy: 0.3484848484848485
Epoch: 100 --- cost = 0.07971818726014807
Accuracy: 0.9318181818181818
Epoch: 200 --- cost = 0.04456369243681402
Accuracy: 0.9545454545454546
Epoch: 300 --- cost = 0.03432267378786059
Accuracy: 0.9696969696969697


In [33]:
print("Training set:")
pred_train = predict(X_train, Y_train, W, b, word_to_vec_map)
print('Test set:')
pred_test = predict(X_test, Y_test, W, b, word_to_vec_map)

Training set:
Accuracy: 0.9772727272727273
Test set:
Accuracy: 0.8571428571428571


<b><i>Here we can see that the baseline model gives the wrong emoji for the example below</i></b>

In [37]:
X_my_sentences = np.array(["not feeling happy"])
Y_my_labels = np.array([[3]])

pred = predict(X_my_sentences, Y_my_labels , W, b, word_to_vec_map)
print_predictions(X_my_sentences, pred)


not feeling happy 😄
