# Lab 10 - Part 2 - Deploying and Serving Models
In this lab we will experiment with deploying a model as a pipeline with Flask.
This lab was adapted from: https://www.analyticsvidhya.com/blog/2020/04/how-to-deploy-machine-learning-model-flask/

We’ll work with a Twitter dataset in this section. Our aim is to detect hate speech in Tweets. For the sake of simplicity, we say a Tweet contains hate speech if it has a racist or sexist sentiment associated with it. We will create a web page that contains a text box in which users can search for any text.

### Please note that sentiment analysis is a text classification problem. If you adapt this code base for your coursework, your front-end interface will need to be adapted to show the tags obtained for the labelled sequence of tokens in the test input.

Let’s start by importing some of the required libraries.

In [None]:
# importing required libraries
import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

Next, we will read the dataset and view the top rows:

In [None]:
data = pd.read_csv('dataset/twitter_sentiments.csv')

In [None]:
data.head()

## Challenge 01

In [None]:
# TODO: Print the shape of imported dataset.


In [None]:
data.label.value_counts()

Now, we will divide the data into train and test using the scikit-learn train_test_split function. We will take only 20 percent of the data for testing purposes. We will stratify the data on the label column so that the distribution of the target label will be the same in both train and test data:

In [None]:
# TODO: Divide the dataset into train and test set.
train, test = ...

In [None]:
train.shape, test.shape

In [None]:
train.label.value_counts(normalize=True)

In [None]:
test.label.value_counts(normalize=True)
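
To see what stratification does, here is a minimal sketch on a small, made-up DataFrame (the `toy` data and its column values are assumptions for illustration, not the lab dataset). With an 80/20 class imbalance, both splits should keep roughly the same class proportions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 40 rows with label 0 and 10 rows with label 1.
toy = pd.DataFrame({
    "tweet": [f"tweet {i}" for i in range(50)],
    "label": [0] * 40 + [1] * 10,
})

# Hold out 20 percent and stratify on the label column.
toy_train, toy_test = train_test_split(
    toy, test_size=0.2, stratify=toy["label"], random_state=0
)

# Both splits should show the same label proportions as the full toy data.
print(toy_train["label"].value_counts(normalize=True))
print(toy_test["label"].value_counts(normalize=True))
```

Without `stratify`, a small test set drawn from imbalanced data can end up with very few (or no) examples of the minority class.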

## Challenge 02

Now, we will create a TF-IDF vector of the tweet column using the TfidfVectorizer and we will pass the parameter lowercase as True so that it will first convert text to lowercase. We will also keep max features as 1000 and pass the predefined list of stop words present in the scikit-learn library.

First, create a TfidfVectorizer object and fit it on the training data tweets:

In [None]:
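# Newer scikit-learn versions expect stop_words to be "english", a list, or None, so convert the frozenset.
tfidf_vectorizer = TfidfVectorizer(lowercase=True, max_features=1000, stop_words=list(ENGLISH_STOP_WORDS))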

In [None]:
# TODO: Fit the TF-IDF Vectorizer on the training data tweets.


Use the fitted vectorizer to transform the train and test data tweets:

In [None]:
# TODO: Transform the train and test data tweets.
train_idf = ...
test_idf  = ...
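
The effect of `lowercase`, `max_features`, and stop-word removal can be seen on a tiny corpus. This is only a sketch with made-up sentences and `max_features=5` (both are assumptions for illustration); the vocabulary is learned from the training texts only and then reused for the test texts:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy texts, not taken from the lab dataset.
toy_train_tweets = [
    "I love this sunny day",
    "I hate rainy days",
    "Sunny days are the best",
]
toy_test_tweets = ["What a sunny day"]

# Keep at most 5 terms and drop scikit-learn's built-in English stop words.
toy_vectorizer = TfidfVectorizer(lowercase=True, max_features=5, stop_words="english")

# Learn the vocabulary and IDF weights from the training texts only.
toy_vectorizer.fit(toy_train_tweets)

# Both matrices share the same columns because they use the same fitted vocabulary.
toy_train_idf = toy_vectorizer.transform(toy_train_tweets)
toy_test_idf = toy_vectorizer.transform(toy_test_tweets)

print(sorted(toy_vectorizer.vocabulary_))
print(toy_train_idf.shape, toy_test_idf.shape)
```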

## Challenge 03

Now, we will create an object of the Logistic Regression model.

Remember: our focus is not on building a very accurate classification model but on seeing how we can deploy this predictive model to get results.

In [None]:
model_LR = LogisticRegression()

In [None]:
# TODO: Fit LR model on train_idf and pass training labels.


In [None]:
# TODO: Predict train labels
predict_train = ...

In [None]:
# TODO: Predict test labels
predict_test = ...

In [None]:
# f1 score on train data
f1_score(y_true = train.label, y_pred = predict_train)

In [None]:
# f1 score on test data
f1_score(y_true= test.label, y_pred= predict_test)

Let’s define the steps of the pipeline:

Step 1: Create a TF-IDF vector of the tweet text with 1000 features as defined above

Step 2: Use a logistic regression model to predict the target labels

When we call fit() on a pipeline object, both steps are executed in order. After training, we call predict(), which applies the fitted TF-IDF transformation and then uses the trained model to generate the predictions.

Read more about scikit-learn pipelines in this comprehensive article: [Build your first Machine Learning pipeline using scikit-learn](https://www.analyticsvidhya.com/blog/2020/01/build-your-first-machine-learning-pipeline-using-scikit-learn/)!

In [None]:
# list(...) because newer scikit-learn versions reject a frozenset for stop_words
pipeline = Pipeline(steps=[('tfidf', TfidfVectorizer(lowercase=True,
                                                     max_features=1000,
                                                     stop_words=list(ENGLISH_STOP_WORDS))),
                           ('model', LogisticRegression())])

In [None]:
pipeline.fit(train.tweet, train.label)

In [None]:
pipeline.predict(train.tweet)
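
One way to convince yourself that the pipeline really runs both steps is to compare it with the manual two-step approach. The sketch below uses a handful of made-up tweets and labels (assumptions for illustration only) and checks that both routes give the same predictions:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data, not the lab dataset.
toy_tweets = [
    "what a lovely morning",
    "you people are horrible",
    "great match today",
    "those people are disgusting",
]
toy_labels = [0, 1, 0, 1]

# Manual route: fit the vectorizer, then fit the model on the TF-IDF features.
manual_vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
manual_model = LogisticRegression()
manual_features = manual_vectorizer.fit_transform(toy_tweets)
manual_model.fit(manual_features, toy_labels)
manual_pred = manual_model.predict(manual_vectorizer.transform(toy_tweets))

# Pipeline route: a single fit() call performs both steps in order.
toy_pipeline = Pipeline(steps=[
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("model", LogisticRegression()),
])
toy_pipeline.fit(toy_tweets, toy_labels)
pipeline_pred = toy_pipeline.predict(toy_tweets)

# The two routes should agree on every prediction.
same_predictions = np.array_equal(manual_pred, pipeline_pred)
print(same_predictions)
```

The practical benefit for deployment is that the pipeline accepts raw text, so the web app only needs to load one object.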

Now, we will test the pipeline with a sample tweet:

In [None]:
text = ["Virat Kohli, AB de Villiers set to auction their 'Green Day' kits from 2016 IPL match to raise funds"]

In [None]:
pipeline.predict(text)

We have successfully built the machine learning pipeline and we will save this pipeline object using the dump function in the joblib library. You just need to pass the pipeline object and the file name:

In [None]:
from joblib import dump

In [None]:
dump(pipeline, filename="text_classification.joblib")

This will create a file named "text_classification.joblib". Now, we will open another Python file and use the load function of the joblib library to load the pipeline model.

Let’s see how to use the saved model:

In [None]:
import pandas as pd
from joblib import load

In [None]:
text = ["Virat Kohli, AB de Villiers set to auction their 'Green Day' kits from 2016 IPL match to raise funds"]

In [None]:
pipeline = load("text_classification.joblib")

In [None]:
pipeline.predict(text)

It's now time to run the pipeline (i.e. data featurisation and model prediction) and make calls from a web page!

The following command will start the Flask app as a Python process. Ideally you would run this from a command line rather than from the notebook.

In [None]:
!python get_sentiment.py

Now that this is running, go to http://127.0.0.1:5000 or http://localhost:5000 and try it out.

To stop the process just interrupt the kernel.
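
The contents of `get_sentiment.py` are not shown in this notebook, so the following is only a minimal sketch of how a Flask app might wrap a fitted pipeline. The `create_app` helper, the `/predict` route, and the JSON request format are all assumptions made for illustration and may differ from the script provided with the lab. A tiny toy pipeline stands in for the saved model so that the example runs on its own:

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def create_app(pipeline):
    """Build a Flask app around any fitted scikit-learn text pipeline (hypothetical helper)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expect a JSON body such as {"text": "some tweet"} (assumed format).
        payload = request.get_json(silent=True) or {}
        text = payload.get("text", "")
        if not text:
            return jsonify({"error": "missing 'text'"}), 400
        # The pipeline handles featurisation and prediction in one call.
        label = int(pipeline.predict([text])[0])
        return jsonify({"text": text, "label": label})

    return app


# Toy stand-in for the saved model; in the lab you would instead use
# joblib's load("text_classification.joblib").
toy_pipeline = Pipeline(steps=[
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("model", LogisticRegression()),
])
toy_pipeline.fit(
    ["what a lovely morning", "you people are horrible",
     "great match today", "those people are disgusting"],
    [0, 1, 0, 1],
)

app = create_app(toy_pipeline)

# Exercise the route with Flask's built-in test client instead of starting a server.
with app.test_client() as client:
    response = client.post("/predict", json={"text": "great match today"})
    print(response.status_code, response.get_json())
```

To serve this sketch for real, you would call `app.run(port=5000)` from a script run on the command line.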