# Lab 10 - Deploying and Serving Models
In this lab we will experiment with deploying a model as a pipeline with Flask.
This lab was adapted from [here](https://www.analyticsvidhya.com/blog/2020/04/how-to-deploy-machine-learning-model-flask/).

We’ll work with a Twitter dataset in this section. Our aim is to detect hate speech in Tweets. For the sake of simplicity, we say a Tweet contains hate speech if it has a racist or sexist sentiment associated with it. We will create a web page with a text box where users can enter any text and get a prediction for it.

### Please note that sentiment analysis is a text classification problem. If you adapt this code base for your coursework, your front-end interface will need to be adapted to show the tags obtained for the labelled sequence of tokens in the test input.
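As a rough illustration of that adaptation, the sketch below turns a list of tokens and their predicted tags into an HTML fragment that a front end could display. The function name `render_tagged_tokens`, the example tokens, and the tag names are all hypothetical; your coursework model will define its own tag set.

```python
from html import escape


def render_tagged_tokens(tokens, tags):
    """Return an HTML fragment showing each token with its tag as a subscript."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    parts = []
    for token, tag in zip(tokens, tags):
        # escape() prevents user-supplied text from being interpreted as HTML
        parts.append(
            f'<span class="token">{escape(token)}<sub>{escape(tag)}</sub></span>'
        )
    return " ".join(parts)


# Hypothetical tokens and tags, for illustration only
example_html = render_tagged_tokens(
    ["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"]
)
print(example_html)
```

A sequence-labelling model returns one tag per token, so the page has to show a list of (token, tag) pairs rather than a single label for the whole input.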

Let’s start by importing some of the required libraries.

In [1]:
# importing required libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

Next, we will read the dataset and view the top rows.

In [2]:
data = pd.read_csv('dataset/twitter_sentiments.csv')

In [3]:
data.head()

| | id | label | tweet |
|---|---|---|---|
| 0 | 1 | 0 | @user when a father is dysfunctional and is s... |
| 1 | 2 | 0 | @user @user thanks for #lyft credit i can't us... |
| 2 | 3 | 0 | bihday your majesty |
| 3 | 4 | 0 | #model i love u take with u all the time in ... |
| 4 | 5 | 0 | factsguide: society now #motivation |


In [4]:
data.shape

(31962, 3)

In [5]:
data.label.value_counts()

label
0    29720
1     2242
Name: count, dtype: int64

Now, we will divide the data into train and test using the scikit-learn train_test_split function. We will take only 20 percent of the data for testing purposes. We will stratify the data on the label column so that the distribution of the target label will be the same in both train and test data:

In [6]:
train, test = train_test_split(data, test_size=0.2, stratify=data['label'], random_state=21)

In [7]:
train.shape, test.shape

((25569, 3), (6393, 3))

In [8]:
train.label.value_counts(normalize=True)

label
0    0.929837
1    0.070163
Name: proportion, dtype: float64

In [9]:
test.label.value_counts(normalize=True)

label
0    0.929923
1    0.070077
Name: proportion, dtype: float64

Now, we will create a TF-IDF vector of the tweet column using TfidfVectorizer. We pass lowercase=True so that the text is first converted to lowercase, set max_features=1000 to keep at most 1000 features, and pass stop_words='english' to use the predefined list of English stop words in scikit-learn.

First, create the TfidfVectorizer object and fit it on the training tweets:

In [13]:
tfidf_vectorizer = TfidfVectorizer(lowercase=True, max_features=1000, stop_words='english')

In [14]:
tfidf_vectorizer.fit(train.tweet)

Use the fitted vectorizer to transform the train and test tweets:

In [15]:
train_idf = tfidf_vectorizer.transform(train.tweet)
test_idf  = tfidf_vectorizer.transform(test.tweet)
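
To see what these parameters do, here is a small sketch on a made-up corpus (the sentences are invented for illustration and are not from the lab dataset). It shows that stop words are dropped, that `max_features` caps the number of columns, and that text not seen during fitting is mapped onto the same fixed set of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration only
docs = ["The cat sat on the mat", "The dog sat on the log", "Cats and dogs"]

vec = TfidfVectorizer(lowercase=True, max_features=3, stop_words="english")
vec.fit(docs)  # learns the vocabulary and IDF weights from these documents only

# Transform the fitted documents plus one sentence that was never seen in fit()
X = vec.transform(docs + ["An unseen sentence about a cat"])

print(sorted(vec.vocabulary_))    # the (at most 3) terms that were kept
print("the" in vec.vocabulary_)   # stop words are removed before counting
print(X.shape)                    # one row per document, one column per kept term
```

This is why the vectorizer is fitted on the training tweets only: the test tweets are then described with exactly the same columns the model was trained on.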

Now, we will create an object of the Logistic Regression model.

Remember – our focus is not on building a very accurate classification model but instead to see how we can deploy this predictive model to get the results.

In [16]:
model_LR = LogisticRegression()

In [17]:
model_LR.fit(train_idf, train.label)

In [18]:
predict_train = model_LR.predict(train_idf)

In [19]:
predict_test = model_LR.predict(test_idf)

In [20]:
# F1 score on train data
f1_score(y_true=train.label, y_pred=predict_train)

0.4865731462925852

In [21]:
# F1 score on test data
f1_score(y_true=test.label, y_pred=predict_test)

0.45499181669394434
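
For reference, `f1_score` with its default settings scores only the positive class (label 1) and combines precision and recall as their harmonic mean. With about 93% of the tweets labelled 0, accuracy can look high even when many hate-speech tweets are missed, which is why F1 is used here. The sketch below recomputes the metric by hand on made-up labels; the helper name `f1_from_labels` and the data are illustrative only.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is taken to be zero in this sketch
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up, imbalanced labels: 8 negatives and 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                        # 0.8
print(f1_from_labels(y_true, y_pred))  # 0.5
```

Here one positive is found, one is missed, and one negative is flagged by mistake, so precision and recall are both 0.5 even though 8 of the 10 predictions are correct.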

Let’s define the steps of the pipeline:

Step 1: Create a TF-IDF vector of the tweet text with 1000 features as defined above

Step 2: Use a logistic regression model to predict the target labels

When we call fit() on a pipeline object, both steps are executed in order: the vectorizer is fitted and applied to the input text, and the model is then trained on the resulting features. After training, predict() applies the same transformation to new text and uses the trained model to generate the predictions.

Read more about scikit-learn pipelines in this comprehensive article: [Build your first Machine Learning pipeline using scikit-learn](https://www.analyticsvidhya.com/blog/2020/01/build-your-first-machine-learning-pipeline-using-scikit-learn/)!

In [24]:
pipeline = Pipeline(steps= [('tfidf', TfidfVectorizer(lowercase=True,
                                                      max_features=1000,
                                                      stop_words= 'english')),
                            ('model', LogisticRegression())])

In [25]:
pipeline.fit(train.tweet, train.label)

In [26]:
pipeline.predict(train.tweet)

array([0, 0, 0, ..., 0, 0, 0], dtype=int64)
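
As a sanity check on what the pipeline does, the following sketch compares the manual route (vectorize, then fit the model) with the pipeline route on a tiny made-up data set. The texts and labels are invented for illustration; the point is only that both routes produce the same predictions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration only (not from the lab dataset)
texts = [
    "good morning everyone", "what a lovely day", "i hate you",
    "you are awful", "lovely morning", "awful hate",
]
labels = [0, 0, 1, 1, 0, 1]

# Manual route: fit the vectorizer, then fit the model on its output
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
features = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(features, labels)
manual_pred = clf.predict(vectorizer.transform(texts))

# Pipeline route: one object that takes raw text in fit() and predict()
pipe = Pipeline(steps=[
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("model", LogisticRegression()),
])
pipe.fit(texts, labels)
pipe_pred = pipe.predict(texts)

print((manual_pred == pipe_pred).all())
```

The benefit for deployment is that a single object accepts raw text, so the web app does not need to know about the vectorizer at all.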

Now, we will test the pipeline with a sample tweet:

In [27]:
text = ["Virat Kohli, AB de Villiers set to auction their 'Green Day' kits from 2016 IPL match to raise funds"]

In [28]:
pipeline.predict(text)

array([0], dtype=int64)

We have successfully built the machine learning pipeline. Next, we will save the pipeline object using the dump function from the joblib library. You just need to pass the pipeline object and the file name:

In [29]:
from joblib import dump

In [30]:
dump(pipeline, filename="text_classification.joblib")

['text_classification.joblib']

This creates a file named “text_classification.joblib”. Now, we will open another Python file and use the load function of the joblib library to load the pipeline model.

Let’s see how to use the saved model:

In [31]:
import pandas as pd
from joblib import load

In [32]:
text = ["Virat Kohli, AB de Villiers set to auction their 'Green Day' kits from 2016 IPL match to raise funds"]

In [33]:
pipeline = load("text_classification.joblib")

In [34]:
pipeline.predict(text)

array([0], dtype=int64)
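
The round trip can also be checked in a self-contained way. The sketch below saves a toy pipeline to a temporary directory, loads it back, and confirms that the restored object has the same steps and predictions. The data and the file name `toy_pipeline.joblib` are made up for illustration. Note that joblib files are pickle-based, so only load files you trust, and preferably load them with the same scikit-learn version that saved them.

```python
import os
import tempfile

from joblib import dump, load
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration only
texts = ["lovely morning", "what a lovely day", "i hate you", "awful hate"]
labels = [0, 0, 1, 1]

pipe = Pipeline(steps=[
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("model", LogisticRegression()),
]).fit(texts, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_pipeline.joblib")  # hypothetical file name
    dump(pipe, path)        # serialise the fitted pipeline to disk
    restored = load(path)   # read it back as a new Python object

same_predictions = (pipe.predict(texts) == restored.predict(texts)).all()
print(list(restored.named_steps))
print(same_predictions)
```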

In [35]:
data[data.label == 1]

| | id | label | tweet |
|---|---|---|---|
| 13 | 14 | 1 | @user #cnn calls #michigan middle school 'buil... |
| 14 | 15 | 1 | no comment! in #australia #opkillingbay #se... |
| 17 | 18 | 1 | retweet if you agree! |
| 23 | 24 | 1 | @user @user lumpy says i am a . prove it lumpy. |
| 34 | 35 | 1 | it's unbelievable that in the 21st century we'... |
| ... | ... | ... | ... |
| 31934 | 31935 | 1 | lady banned from kentucky mall. @user #jcpenn... |
| 31946 | 31947 | 1 | @user omfg i'm offended! i'm a mailbox and i'... |
| 31947 | 31948 | 1 | @user @user you don't have the balls to hashta... |
| 31948 | 31949 | 1 | makes you ask yourself, who am i? then am i a... |


It’s now time to run the pipeline (i.e. data featurisation and model prediction) and make calls from a web page!

The following commands install Flask and then start the Flask app by running a Python script from the notebook. Ideally you would run the script from a command line, not from the notebook.

In [37]:
!pip install flask


In [40]:
!python get_sentiment.py

^C


Now that this is running, go to http://127.0.0.1:5000 or http://localhost:5000 and try it out.
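
The contents of `get_sentiment.py` are not shown in this notebook. As a rough idea of what such a script could contain, here is a minimal sketch of a Flask app that serves a text box and returns the pipeline's prediction. Everything in it is an assumption made for illustration: the `create_app` factory, the `/predict` route, the form field name `text`, and the JSON response format are hypothetical and may differ from the actual lab script.

```python
from flask import Flask, jsonify, request

# Minimal HTML form; the field name "text" is an assumption of this sketch
FORM_HTML = """
<form action="/predict" method="post">
  <input type="text" name="text" placeholder="Enter a tweet">
  <input type="submit" value="Classify">
</form>
"""


def create_app(pipeline):
    """Build a Flask app around any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/", methods=["GET"])
    def index():
        # Serve the page with the text box
        return FORM_HTML

    @app.route("/predict", methods=["POST"])
    def predict():
        text = request.form.get("text", "")
        # The pipeline expects a list of documents and returns one label per item
        label = int(pipeline.predict([text])[0])
        return jsonify({"text": text, "label": label})

    return app


def main():
    # joblib is imported here so create_app() can be tested without a saved model
    from joblib import load

    pipeline = load("text_classification.joblib")
    create_app(pipeline).run(host="127.0.0.1", port=5000)

# To start the server with `python get_sentiment.py`, call main() under an
# `if __name__ == "__main__":` guard at the bottom of the file.
```

Passing the pipeline into a factory function keeps model loading separate from request handling, so the routes can be exercised with Flask's test client and a stand-in model before the real one is loaded.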

#### To stop the process just interrupt the kernel.

### Alternatives to Flask:
[Streamlit](https://streamlit.io/)

[Sample Code - Git Repo](https://github.com/alphagov/govuk-datalabs-streamlit-NER)

[Sample Code - TDS tutorial](https://towardsdatascience.com/build-a-named-entity-recognition-app-with-streamlit-f157672f867f)

or 

[Mercury](https://runmercury.com/)

[Sample Project](https://towardsdatascience.com/build-elegant-web-apps-right-from-jupyter-notebook-with-mercury-78d9ebcbbcaf)