### TASK-3 NLP 

<br><b>Filename: <font color='red'>svm_model.ipynb</font></b> ---> defines the implementation pipeline: splitting the dataset into train and test sets (input features and output labels for each), fitting an SVM model on the training set, and predicting the labels of the records in the test set.
<hr/>
This notebook defines the following functions (described in the same order as they are defined in the notebook cells below):
<ol>
    <li><b>feed_to_svm( x,y,x_test ): </b> Given the training set features 'x' and training set output labels 'y', an SVM model 'model' is fitted and then applied to the test set features 'x_test' to generate the predicted labels.</li>
    <li><b>make_train_test_sets( x,y ):</b> Given the dataset features for each record stored in 'x' and their corresponding labels in 'y', this function splits them into training and test sets using scikit-learn's train_test_split with test_size=0.10 (a 90/10 split).</li>
    <li><b>svm_main( x,Y,categories ):</b> The driver function for the SVM model implementation pipeline. This function accepts the dataset already separated into features 'x' and labels 'Y', along with the list of primary categories 'categories'. It then splits the data into train and test sets and predicts the test set labels using the functions described above.</li>
</ol>
<b>NOTE:</b> Training set features and test set features refer to the 50-dimensional feature representation (obtained in compute_embeddings.ipynb) of each preprocessed record description in the dataset. The labels refer to the records' respective primary categories.



### CELL #1: importing required modules

In [2]:
from ipynb.fs.full.bag_of_words import *
import numpy as np
import pandas as pd
import re
import scipy as sp
from sklearn import svm
from sklearn.model_selection import train_test_split

### CELL #2: defining feed_to_svm( x,y,x_test ):
<br>Function description in the top cell
<br>This function does the following sequence of operations:
<ol>
    <li>Define an SVM model</li>
    <li>Fit the SVM on the given training set features and labels.</li>
    <li>Compute the labels for records in test set.</li>
    <li>Return these computed results</li>
</ol>

In [None]:
def feed_to_svm(x,y,x_test):
    
    model = svm.SVC() #--------------------------------------- STEP-1
    
    model.fit(x, y) #--------------------------------------- STEP-2
    
    r = model.predict(x_test) #--------------------------------------- STEP-3
    
    return r #--------------------------------------- STEP-4
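
A minimal usage sketch of the fit-then-predict steps above. The synthetic 50-dimensional clusters, the seed, and the sample counts are assumptions made up for illustration; they stand in for the embeddings from compute_embeddings.ipynb. The function is repeated here only so that the block runs on its own.

```python
import numpy as np
from sklearn import svm

def feed_to_svm(x, y, x_test):
    model = svm.SVC()              # default hyperparameters
    model.fit(x, y)                # learn from training features and labels
    return model.predict(x_test)   # one predicted label per test record

rng = np.random.default_rng(0)     # fixed seed, illustration only

# Two synthetic, well-separated clusters of 50-dimensional vectors
x_train = np.vstack([rng.normal(-2.0, 1.0, size=(40, 50)),
                     rng.normal(2.0, 1.0, size=(40, 50))])
y_train = np.array([0] * 40 + [1] * 40)
x_new = np.vstack([rng.normal(-2.0, 1.0, size=(5, 50)),
                   rng.normal(2.0, 1.0, size=(5, 50))])

predicted = feed_to_svm(x_train, y_train, x_new)
print(predicted.shape)             # one label per row of x_new
```

The return value is a NumPy array, which is why svm_main() later calls tolist() on it before storing it in a dataframe.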

### CELL #3: defining make_train_test_sets( x,y ):
<br>Function description in the top cell
<br>This function does the following sequence of operations:
<ol>
    <li>Store the dataset features and their corresponding labels in a single dataframe.</li>
    <li>Split the dataframe into train (90%) and test (10%) dataframes, with shuffling, using scikit-learn's train_test_split.</li>
    <li>Reset the indices of the train and test dataframes.</li>
    <li>Return train set features, train set labels, test set features and test set labels.</li>
</ol>

In [None]:
def make_train_test_sets(x,y):
    
    #--------------------------------------- STEP-1 STARTS HERE
    
    df=pd.DataFrame()
    df['x'] = x
    df['y'] = y
    
    #--------------------------------------- STEP-1 ENDS HERE
    
    train, test = train_test_split(df, test_size=0.10, shuffle=True) #----------------------- STEP-2
    
    #--------------------------------------- STEP-3 STARTS HERE
    
    train = train.reset_index(drop=True)
    test = test.reset_index(drop=True)
    
    #--------------------------------------- STEP-3 ENDS HERE
    
    '''print("----------- TRAIN SET ----------")
    print(len(train))
    print(train)
    print("----------- TEST SET ----------")
    print(len(test))
    print(test)'''
    return train['x'],train['y'].tolist(),test['x'],test['y'].tolist() #------------------------ STEP-4
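
For comparison, the same 90/10 split can be sketched by passing the feature and label arrays to train_test_split directly, without the intermediate dataframe. This is an alternative, not what the notebook does. The random_state and stratify arguments are additions that are not in the original call: the first makes the shuffle reproducible, the second keeps the class proportions equal in both sets. The data below is a hypothetical stand-in for the real embeddings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 50))   # hypothetical 50-dimensional features
labels = np.array([0, 1] * 50)          # hypothetical balanced labels

x_train, x_test, y_train, y_test = train_test_split(
    features, labels,
    test_size=0.10,      # same split size as in the cell above
    shuffle=True,
    random_state=42,     # assumption: fixed seed for repeatable splits
    stratify=labels,     # assumption: preserve class balance
)
print(x_train.shape, x_test.shape)      # (90, 50) (10, 50)
```

Because the arrays are split in one call, the features and labels stay aligned and no index reset is needed.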


### CELL #4: defining svm_main( x,Y,categories ):
<br>The driver function for the SVM model implementation pipeline
<br>Function description in the top cell
<br>This function does the following sequence of operations:
<ol>
    <li>Create 2 lists 'l' and 'l_test' to store the train and test set features.</li>
    <li>Convert each label in 'Y' to the index of its matching entry in 'categories'. A label that is not one of the primary categories shortlisted for the task produces no index.</li>
    <li>Call make_train_test_sets() to split the dataset into train and test sets.</li>
    <li>Pass the training set features and labels, together with the test set features, to feed_to_svm().</li>
    <li>Store the actual and predicted labels in the CSV file output_files/svm_results.csv and return them as a dataframe.</li>
</ol>

In [None]:
def svm_main(x,Y,categories):
    '''print("========================= BEFORE: ")
    print(len(x))
    print(len(Y))'''
    #print(x)
    #print(Y)
    
    print("----------------------------------- SVM MODEL STARTS.....")
    
    #--------------------------------------- STEP-1 STARTS HERE
    l=[]
    l_test = []
    #--------------------------------------- STEP-1 ENDS HERE
    
    y = []
    
    #--------------------------------------- STEP-2 STARTS HERE
    for i in range(len(Y)):
        for j in range(len(categories)):
            if categories[j] == Y[i]:
                y.append(j)
    #--------------------------------------- STEP-2 ENDS HERE
    
    '''print("========================== AFTER: ")
    print(len(x))
    print(len(Y))'''
    
    y = np.array(y)
    
    x,y,test_x,test_y = make_train_test_sets(x.tolist(),y.tolist()) #-------------- STEP-3
    
    #--------------------------------------- POPULATING THE FEATURE LISTS
    for i in range(len(x)):
        #print("i = ",i)
        l.append(x[i].tolist())
    for i in range(len(test_x)):
        l_test.append(test_x[i].tolist())
    #---------------------------------------
    
    X = np.array(l)
    y = np.array(y)
    x_test = np.array(l_test)
    
    r = feed_to_svm(X,y,x_test) #--------------------------------------- STEP-4
    
    #print("r: ",type(r))
    
    #--------------------------------------- STEP-5 STARTS HERE
    r = r.tolist()
    results = pd.DataFrame(columns=['actual','predicted'])
    results['actual'] = test_y
    results['predicted'] = r
    results.to_csv("output_files/svm_results.csv",index=True)
    
    print("----------------------------------- SVM MODEL ENDS.....")
    return results
    
    #--------------------------------------- STEP-5 ENDS HERE
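
The label-to-index loop in STEP-2 skips any label that is missing from 'categories' but leaves 'x' untouched, so the features and labels can end up with different lengths. A hedged sketch of one way to keep them aligned follows. The helper name encode_labels and the sample category names are hypothetical and not part of the notebook.

```python
import numpy as np

def encode_labels(features, labels, categories):
    """Map each label to its index in `categories`.

    Records whose label is not in `categories` are dropped together
    with their feature rows, so both outputs have the same length.
    """
    index_of = {name: i for i, name in enumerate(categories)}
    keep = [i for i, label in enumerate(labels) if label in index_of]
    encoded = np.array([index_of[labels[i]] for i in keep])
    return np.asarray(features)[keep], encoded

categories = ["Clothing", "Footwear"]        # hypothetical category names
labels = ["Footwear", "Toys", "Clothing"]    # "Toys" is not shortlisted
features = np.arange(6).reshape(3, 2)        # three toy feature rows

kept_x, kept_y = encode_labels(features, labels, categories)
print(kept_x.tolist(), kept_y.tolist())      # [[0, 1], [4, 5]] [1, 0]
```

The dictionary lookup also replaces the nested loop over 'categories' with a single pass over the labels. If 'categories' contained duplicate names, the dictionary would keep only the last index for each name, which differs from the loop in STEP-2.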