## Narural Language Processing

### Import Librarires

In [0]:
import numpy as np
from tqdm import tqdm_notebook as tn

    Steps:
    1. First we need to initialize the parameters based on the input, output and hidden layer dimension.
    2. Then we calculate the cost or loss by forward propagation.
    3. Then we calculate the gradients of the cost function w.r.t the parameters in backward propagation.
    4. Then we update the parameters based on the gradients and learning rate. (Gradient Descent algorithm)

### Parameter Initialization 

In [0]:
def initialize_parameters(input_dim, hidden_dim, output_dim):

  ## Two weight matrix and two bias vectors
  W1 = np.random.randn(input_dim, hidden_dim)
  b1 = np.zeros((1,hidden_dim))
  W2 = np.random.randn(hidden_dim, output_dim)
  b2 = np.zeros((1,output_dim))

  ## Create a dictionary for the parameters
  parameters = {"W1": W1, "b1" : b1, "W2": W2, "b2" : b2}

  return parameters

### Forward Propagation

In [0]:
def forward_propagation(X, parameters):

    ## We need the parameters to calculate the values
    W1, b1, W2, b2  = parameters["W1"], parameters["b1"], parameters["W2"], parameters["b2"]

    ## Forward Proapagation

    Z1 = X.dot(W1) + b1

    ## tanh activation for hidden layer
    A1 = np.tanh(Z1)

    Z2 = A1.dot(W2) + b2
    
    ## Softmax activation for output layer
    exp_scores = np.exp(Z2)
    A2 = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)            
 
    fp_output = {"A1": A1,"A2": A2}
    return fp_output

### Backward Propagation

In [0]:
def backward_propagation(X, Y, fp_output, parameters):

    ## Closed form solution is used
    ## Need A1 and A2 obtained in forward prop
    A1 = fp_output["A1"]
    A2 = fp_output["A2"]

    W2 = parameters["W2"]

    ## Calculated Gradients
    dZ2 = A2 - Y
    dW2 = (A1.T).dot(dZ2)
    db2 = np.sum(dZ2, axis=0, keepdims=True)
    dZ1 = dZ2.dot(W2.T) * (1 - np.power(A1, 2))
    dW1 = np.dot(X.T, dZ1)
    db1 = np.sum(dZ1, axis=0)

    ## Dictionary to keep gradients for different parameters
    gradients = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

    return gradients


### Update Parameters (Gradient Descent)

In [0]:
def update_parameters(parameters, gradients, learning_rate):

  ## Learning rate is provided for GD
  lr = learning_rate

  ## Need parameters and gradient values
  W1, b1, W2, b2  = parameters["W1"], parameters["b1"], parameters["W2"], parameters["b2"]
  dW1, db1, dW2, db2  = gradients["dW1"], gradients["db1"], gradients["dW2"], gradients["db2"]

  ## Update the parameters
  W1 = W1 - lr*dW1
  b1 = b1 - lr*db1
  W2 = W2 - lr*dW2
  b2 = b2 - lr*db2
  
  ## Create a dictionary for updated parameters
  updated_parameters = {"W1": W1, "W2": W2, "b1" : b1, "b2" : b2}

  return updated_parameters

    Here we created a function to calculate cost.
    Cost function is categorical cross-entropy.
    It is calculated using the predicted values by the model and true labels.

### Cost Function

In [0]:
def calculate_cost(A2, Y):

  ## Cross entropy loss for a single input
  cost = -np.sum(np.multiply(Y, np.log(A2)))
  cost = np.squeeze(cost)

  return cost

    Here we  created a function to calculate accuarcy.
    It returns mismatches along with accuracy if the print_mismatch is set to be true.

### Accuracy Metric

In [0]:
def accuracy_metric(actual, predicted, print_mismatch=False):
  
  mismatch={}
  correct = 0
  for i in range(len(actual)):
    if actual[i] == predicted[i]:
        correct += 1
    else:
        if print_mismatch == True:
          actual_word= list_of_words[np.argmax(actual[i])]
          predicted_word= list_of_words[np.argmax(predicted[i])]
          mismatch[actual_word] = predicted_word
          
  accuracy= correct / float(len(actual)) * 100.0
  if print_mismatch==True:
    return(accuracy, mismatch)
  else:
    return(accuracy)  

### Model Function

    The model function takes input, output, parameters and learning rate as input.
    It just assembles the functions.
    1. Forward Propagation
       Get the predicted output and the cost.
    2. Backward Propagation
       Get the gradients.
    3. Update the parameters.

    Returns the parameters and cost as output.

In [0]:
def model(X, Y, parameters,learning_rate):
   
    fp_output = forward_propagation(X, parameters)
    A2= fp_output["A2"]

    cost = calculate_cost(A2, Y)

    gradients = backward_propagation(X, Y, fp_output, parameters)

    parameters = update_parameters(parameters, gradients, learning_rate)

    return parameters,cost

### Function for Predicting

    Here we write a function to predict the output for an unknown case.

    It returns the probability corresponding to each output class if we set prob as true.

    It returns the index where the we get the maxumum probabiliy if we set index as true.

    It returns predicted word if we set word as true.

    It returns the output vector if all the above is set to false as in the default.

In [0]:
def predict(X, parameters,index= False, word=False,prob=False):
    fp_output = forward_propagation(X, parameters)
    A2= fp_output["A2"]
    yhat = A2
    yhat = np.squeeze(yhat)
    
    out=np.argmax(yhat)

    output=[0 for i in range(len(yhat))]
    output[out]=1

    if index==True:
      return out

    if prob==True:
        return(yhat)

    if word==True:
        return(list_of_words[out])

    return output

### Function for training
(Using Stochastic Gradient Descent)

    This function update parameters after calculating loss after feeding each input.

    One epoch gets completed after all the input is fed to the network.

    Then we do the above step for given no of iterations.

    The function prints cost and accuracy after 100 epochs.

    Here if we set the print_mismatch argument true, then it prints the cases where the predicted word doesn't match with the actual word.

    The function finally returns the learned parameters which will be used during testing

In [0]:
def train(encoded_words,label,hidden_layer_size,num_of_iters,learning_rate, print_mismatch= False):

    input_size = len(encoded_words[0])
    output_size= len(label[0])
    parameters= initialize_parameters(input_size, hidden_layer_size, output_size)
    actual = label
  
    ##  Loop for no of iterations
    for k in tn(range(0, num_of_iters+1), unit= " Epoch"):
        index=0
        cost=0
        
        predicted =[]

        for i in encoded_words:

            X=np.array([i])
            Y=np.array([label[index]])
              
            trained_parameters,cost = model(X, Y, parameters,learning_rate)
            parameters= trained_parameters  

            cost+=cost
            index+=1

        for i in encoded_words:
          i=np.array([i])
          predicted.append(predict(i, trained_parameters))

        if print_mismatch== True:
            if k==num_of_iters :
              accuracy, mismatch = accuracy_metric(actual,predicted,print_mismatch=True)
            else:
              accuracy = accuracy_metric(actual,predicted,print_mismatch=False)

        else:
            accuracy = accuracy_metric(actual,predicted,print_mismatch=False)

        ## Print the result after each 100 iterations
        if(k%100 == 0):
            print('Cost after iteration, {:d}: {:f}'.format(k, cost))
            print('Accuracy after iteration, {:d}: {:f}'.format(k, accuracy))

        if (k==num_of_iters) and print_mismatch== True:
          print("\n")
          print("Mismatches:",mismatch)
          
    return trained_parameters      

### Example with a simple case


    One simple example is shown to show whether the model works or not.

    This is nothing but the XOR gate.

    When we give (0,0),(0,1),(1,0),(1,1) as input we expect the output as 0,1,1,0 respectively.

    But here I have kept two nodes in the output. First node correspond to zero and second node correspond to 1 i.e. if we get [1,0] as ouput then in true sense the output is zero and 1 if we get [0,1] as output. We just need to take argmax for the desired result

In [0]:
inp=[[0,0],[0,1],[1,0],[1,1]]
out=[[1,0],[0,1],[0,1],[1,0]] 

In [0]:
learned_parameters = train(inp, out, 4, 1000,0.1)

HBox(children=(IntProgress(value=0, max=1001), HTML(value='')))

Cost after iteration, 0: 8.295265
Accuracy after iteration, 0: 25.000000
Cost after iteration, 100: 0.082929
Accuracy after iteration, 100: 100.000000
Cost after iteration, 200: 0.036422
Accuracy after iteration, 200: 100.000000
Cost after iteration, 300: 0.022591
Accuracy after iteration, 300: 100.000000
Cost after iteration, 400: 0.016132
Accuracy after iteration, 400: 100.000000
Cost after iteration, 500: 0.012438
Accuracy after iteration, 500: 100.000000
Cost after iteration, 600: 0.010066
Accuracy after iteration, 600: 100.000000
Cost after iteration, 700: 0.008421
Accuracy after iteration, 700: 100.000000
Cost after iteration, 800: 0.007218
Accuracy after iteration, 800: 100.000000
Cost after iteration, 900: 0.006302
Accuracy after iteration, 900: 100.000000
Cost after iteration, 1000: 0.005583
Accuracy after iteration, 1000: 100.000000



### Prediction for the test case of XOR

In [0]:
X_test=[[0,0],[0,1],[1,0],[1,1]]
for i in X_test:
  X = np.array([i])  
  y_predict = predict(X, learned_parameters,prob=False)
  # Print the result
  print("Xor value for input {} is: {:d}".format(X,np.argmax(y_predict)))

Xor value for input [[0 0]] is: 0
Xor value for input [[0 1]] is: 1
Xor value for input [[1 0]] is: 1
Xor value for input [[1 1]] is: 0


    The model works. We don't have other test examples here.

### Training and testing for actual task
For Five length input(26 dimensional vector) and 128 length output(one hot vector for words)

    Here we do the main task.
    Steps:

    1. First We take 128 five length words.
       We make all the letters in each word in lower case.
    2. Then we create 26 dimensional vector for each word based on the letters.
       Here we create a 26 dimensional zero vector. Then we put 1 in the position corresponding to a letter.
    3. Then we create a input matrix (which will be fed as input in the main function) which keeps encoded vectors for each word
    4. Then we create the 128 dimensional output vector.
       It is one hot encoding of the 128 words.
  

## Create Input

### List of Five letter words

In [0]:
list_of_words=["seven","world","about","again","heart","pizza","water","happy","sixty","board","month","Angel","death","green","music","fifty","three","party","piano","Kelly","mouth","woman","sugar","amber","dream","apple","laugh","tiger","faith","earth","river","money","peace","forty","words","smile","abate","house","alone","watch","lemon","South","erica","anime","after","santa","women","admin","Jesus","stone","blood","megan","thing","light","David","cough","story","power","India","point","today","Sarah","anger","Night","glory","April","candy","puppy","above","phone","chris","vegan","forum","Jason","Irish","birth","other","grace","queen","pasta","plant","Jacob","smart","knife","magic","jelly","black","media","honor","cycle","truth","zebra","train","bully","chain","brain","mango","under","dirty","Eight","fruit","kevin","panda","truck","field","bible","radio","dance","voice","smith","sorry","Paris","being","lover","never","royal","Venus","metal","Henry","penny","north","bread","daily","paper","beard","alive","place","chair"]

In [0]:
print("No of words:",len(list_of_words))

No of words: 128


In [0]:
alphabet= "abcdefghijklmnopqrstuvwxyz"
dic_alphabet= {alphabet[i]:i for i in range(len(alphabet))}

In [0]:
print(dic_alphabet)

{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9, 'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14, 'p': 15, 'q': 16, 'r': 17, 's': 18, 't': 19, 'u': 20, 'v': 21, 'w': 22, 'x': 23, 'y': 24, 'z': 25}


### Encoding of a word

In [0]:
def encoded_word(word):

    output=[0 for i in range(26)]
    
    for i in word:
        if i not in dic_alphabet.keys():
            print("Wrong Input")
            print(i)
        else:
            output[dic_alphabet[i]]+=1
    
    output= [x / len(word) for x in output]      
    
    return output        

### Convert words to lower case

In [0]:
for i in range(len(list_of_words)):
        list_of_words[i]=list_of_words[i].lower()

In [0]:
input_dic = {i:list_of_words[i] for i in range(len(list_of_words))}

### Input to pass in the train function

In [0]:
input_matrix= []
for i in range(len(list_of_words)):
    input_matrix.append(encoded_word(list_of_words[i]))

### Create Label

In [0]:
label=[[0 for i in range(128)] for j in range(128)]
for i in range(len(list_of_words)):
    label[i][i]=1

### Main model train

    Here we train our model.
    Input matrix and label which are created in the previous step are given as input. 
    We also give no of iterations as input and learning rate.
    It also print the mismatches in the training if print_mismatch is set to be true.

In [0]:
trained_parameters= train(input_matrix, label, 10, 2000, 0.1, print_mismatch=True)

HBox(children=(IntProgress(value=0, max=2001), HTML(value='')))

Cost after iteration, 0: 13.816395
Accuracy after iteration, 0: 0.781250
Cost after iteration, 100: 0.746036
Accuracy after iteration, 100: 97.656250
Cost after iteration, 200: 0.231450
Accuracy after iteration, 200: 97.656250
Cost after iteration, 300: 0.129274
Accuracy after iteration, 300: 97.656250
Cost after iteration, 400: 0.087786
Accuracy after iteration, 400: 97.656250
Cost after iteration, 500: 0.065785
Accuracy after iteration, 500: 97.656250
Cost after iteration, 600: 0.052307
Accuracy after iteration, 600: 97.656250
Cost after iteration, 700: 0.043263
Accuracy after iteration, 700: 97.656250
Cost after iteration, 800: 0.036802
Accuracy after iteration, 800: 97.656250
Cost after iteration, 900: 0.031969
Accuracy after iteration, 900: 97.656250
Cost after iteration, 1000: 0.028226
Accuracy after iteration, 1000: 97.656250
Cost after iteration, 1100: 0.025247
Accuracy after iteration, 1100: 97.656250
Cost after iteration, 1200: 0.022823
Accuracy after iteration, 1200: 97.6562

### Testing

    Steps:
    1. We take the test examples.
    2. We take the actual words (that we want as output) corresponding to the examples taken.
    3. Then we predict the words using the predict function based on the learned parameters.
    4. Then we check where the correct and wrong prediction occured.
    5. Then we calculate the test accuracy.
       The test accuracy highly depends on the test data.

### Test Data

In [0]:
test_words= ["hapqy","todey","chrid","applo","knofe","porty","matel","chais","redio","naver","cykle","pland","hodor","dence","poyer","piaza","momth","grean","rivar","kight","traen"]
actual_words = ["happy","today","chris","apple","knife","forty","metal","chain","radio","never","cycle","plant","honor","dance","power","pizza","month","green","river","night","train"]

### For Predicting Words

In [0]:
predicted_words=[]
for i in test_words:
  encode_word=encoded_word(i)
  X=np.array([encode_word])
  predicted= predict(X, trained_parameters,index= True)
  predicted_words.append(input_dic[predicted])


### Check for the Correct and wrong prediction

In [0]:
for i in range(len(actual_words)):
  if actual_words[i]== predicted_words[i]:
    print("Test word: {:s} , Actual: {:s} , predicted: {:s} ---- Correct Prediction".format(test_words[i],actual_words[i],predicted_words[i]))
  else:
    print("Test word: {:s} , Actual: {:s} , predicted: {:s} ---- Wrong Prediction".format(test_words[i],actual_words[i],predicted_words[i]))

Test word: hapqy , Actual: happy , predicted: happy ---- Correct Prediction
Test word: todey , Actual: today , predicted: lover ---- Wrong Prediction
Test word: chrid , Actual: chris , predicted: chris ---- Correct Prediction
Test word: applo , Actual: apple , predicted: apple ---- Correct Prediction
Test word: knofe , Actual: knife , predicted: knife ---- Correct Prediction
Test word: porty , Actual: forty , predicted: forty ---- Correct Prediction
Test word: matel , Actual: metal , predicted: metal ---- Correct Prediction
Test word: chais , Actual: chain , predicted: jesus ---- Wrong Prediction
Test word: redio , Actual: radio , predicted: voice ---- Wrong Prediction
Test word: naver , Actual: never , predicted: never ---- Correct Prediction
Test word: cykle , Actual: cycle , predicted: kelly ---- Wrong Prediction
Test word: pland , Actual: plant , predicted: alone ---- Wrong Prediction
Test word: hodor , Actual: honor , predicted: honor ---- Correct Prediction
Test word: dence , Act

### Calculate the test accuracy

In [0]:
test_accuracy= accuracy_metric(actual_words, predicted_words)
print("Test Accuracy:",test_accuracy)

Test Accuracy: 66.66666666666666


    Test accuracy depends on the given test examples. For some simple cases the model fails. For some cases it is able to predict the output correctly.
    