Overview: Deploying Your ML Model: PaaS vs. IaaS
--

The deployment of machine learning models is the process of making your models available in production environments, where they can provide predictions to other software systems. Models only start adding value once they are deployed to production, which makes deployment a crucial step.

> When it comes to deployment, you need to decide whether to use a Platform as a Service **(PaaS)** or Infrastructure as a Service **(IaaS)**.

> A **PaaS** can be great for prototyping and for businesses with lower traffic. Eventually, as the business grows and/or traffic increases, you are likely to move toward IaaS.

> Regarding **IaaS**: there are plenty of offerings from AWS, Google, Microsoft, etc. If you've never deployed anything before, I'd recommend starting with a small Flask app on a PaaS such as Heroku. (Heroku once hosted small projects, such as Django apps, for free, but it ended its free dynos in November 2022.)

If your applications are containerized, deployments on most platforms/infrastructure tend to be easier. 

> Containerization also gives you the option to use a container orchestration platform (Kubernetes is now the de facto standard) to rapidly scale the number of containers as demand shifts.

**Make sure your deployments go through a Continuous Deployment (CD) pipeline.**

An example set of components involved in the whole deployment lifecycle:

![Paas_vs_Iaas](images/Paas_vs_Iaas.png)

Simple Example
--
Why and How Should I Deploy an ML Model?
--

Consider the following situation:

You have built a super cool machine learning model that predicts whether a particular transaction is fraudulent. Now, a friend of yours is developing an Android application for general banking activities and wants to integrate your model into the app to flag fraudulent transactions.

But your model is written in Python, while your friend is building the application in Java. Does that mean your model cannot be integrated into your friend's application?

Fortunately, you have the power of APIs. This situation is one of many where turning your machine learning model into an API is extremely important (in simple words, you are deploying your ML model as a web API). Many companies now look for data scientists who can do this.

**Options to implement Machine Learning models**

Most of the time, the real use of a machine learning model lies at the heart of an intelligent product, which may be a small component of a recommender system or an intelligent chatbot. This is where getting the model out of an experiment and into the product can seem very difficult.

For example, most ML practitioners use R or Python for their experiments, but the consumers of those models are often software engineers who use a completely different technology stack. There are two ways to solve this problem:

1. **Rewrite the whole code in the language the software engineers work in.** This may seem like a good idea, but the time and energy required to replicate intricate models would be largely wasted. Many languages, such as JavaScript, have far fewer mature ML libraries than Python. It is wise to avoid this approach.

2. **API-first approach**: Web APIs make it easy for applications written in different languages to work together. If a frontend developer needs your ML model to build an ML-powered web application, they only need the URL endpoint where the API is served.

> **What are APIs?**

> Essentially, APIs are very much like web applications, but instead of returning a nicely styled HTML page, they tend to return data in a standard data-exchange format such as JSON or XML. Once developers have the desired output, they can style it however they want.
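As a small illustration of the difference, where a web page would send back HTML, an ML API might send back a short JSON document. The sketch below uses only Python's standard `json` module; the field names `model` and `prediction` are hypothetical, chosen just for this example.

```python
import json

# Hypothetical response an ML API might send back for a fraud-detection request
response_body = json.dumps({"model": "fraud-detector", "prediction": [0, 1]})
print(response_body)  # {"model": "fraud-detector", "prediction": [0, 1]}

# Any client (a Java app, a JavaScript frontend, ...) can parse this text back into data
data = json.loads(response_body)
print(data["prediction"])  # [0, 1]
```

Because the payload is plain text in a standard format, the client's language does not matter; only the agreed field names do.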

There are also many popular ML APIs, for example:

> IBM Watson's ML APIs, which are capable of the following:

1. Machine Translation - Helps translate text in different language pairs.

2. Message Resonance – To find out the popularity of a phrase or word with a predetermined audience.

3. Question and Answer - This service provides direct answers to queries, backed by primary document sources.

4. User Modelling – Predicts social characteristics of a person from a given text.

> Google Vision API is also an excellent example which provides dedicated services for Computer Vision tasks.

Most cloud providers, as well as smaller ML-focused companies, provide ready-to-use APIs. These cater to developers and businesses that lack ML expertise but want to add ML to their processes or product suites.

Popular ML tools suited to web development include Dialogflow, Microsoft's Cognitive Toolkit, and TensorFlow.js.

**Web-app architecture** (very important to understand this first)

![basic_ML_deployment_diagram](images/basic_ML_deployment_diagram.png)

1. The user (on the left) is using a browser, which runs only JavaScript, HTML, and CSS. That's the frontend.


2. The frontend can make calls to a backend server to get results, which it may then process and display. The backend server should respond to the frontend's requests as quickly as possible, but the backend may need to talk to databases, third-party APIs, and microservices. The backend may also produce slow jobs, such as ML jobs. To always stay responsive to the frontend, the web server puts such requests into a queue, where they are picked up and executed by workers. This way, we separate the ML jobs from the web server logic.

**Now, let's talk about distributed web app architecture.**

3. In general, we want to run as many backend instances as possible, for scalability. That's why there are bubbles coming out of 'server' in the diagram above; they represent 'more of these'. So each instance has to remain stateless: it finishes handling each HTTP request completely and keeps nothing in memory between requests, because **a client's first request might go to one server, and a subsequent request to another.**


4. A long-running endpoint is bad: it would tie up one of our servers (say, doing some ML task), leaving it unable to handle other users' requests. We need to keep the web server responsive and have it hand off long-running tasks, with some kind of shared persistence so that when the user checks progress or requests the result, any server can report it. Also, jobs and parts of jobs should be able to run in parallel on as many workers as there are resources for.


5. The answer is a first-in, first-out (FIFO) queue. The backend simply enqueues jobs. Workers pick jobs out of the queue and process them, performing training or inference, and store the models or predictions in the database when done.
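A minimal, single-process sketch of this pattern, using Python's standard `queue.Queue` (which is FIFO) and two worker threads. The `fake_model` function, the jobs 562 and 563, and the `results` dict are hypothetical stand-ins for the real model, real requests, and the shared database; in production the queue and database would live outside the web server rather than in one Python process.

```python
import queue
import threading

jobs = queue.Queue()            # FIFO: jobs come out in the order they were put in
results = {}                    # stand-in for the shared database of predictions
results_lock = threading.Lock()

def fake_model(features):
    # Hypothetical placeholder for a slow ML inference call
    _day, units_sold = features
    return units_sold * 2

def worker():
    while True:
        job_id, features = jobs.get()   # blocks until a job is available
        if job_id is None:              # sentinel value: no more work
            jobs.task_done()
            break
        prediction = fake_model(features)
        with results_lock:
            results[job_id] = prediction
        jobs.task_done()

# The backend only enqueues jobs; it never runs the model itself
jobs.put((562, ("Wednesday", 10)))
jobs.put((563, ("Thursday", 7)))

# Two workers pull from the same queue in parallel
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for _ in workers:
    jobs.put((None, None))  # one stop sentinel per worker, queued behind the real jobs
for w in workers:
    w.join()

print(results)  # e.g. {562: 20, 563: 14}; key order depends on which worker finishes first
```

Because the sentinels are queued after the real jobs, FIFO ordering guarantees every job is taken before any worker is told to stop.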

How the Above Architecture Works When Deploying an ML Model
--

1. The backend server receives a request from the user's web browser. The request is in JSON format, but semantically it says something like: “Tomorrow is Wednesday and we sold 10 units today. **How many customer support calls should we expect tomorrow?**”

2. The backend pushes the job {Wednesday, 10} into a queue (somewhere decoupled from the backend itself). The queue replies: “Thanks, let's refer to that as Job ID 562.”

3. The backend replies to the user: “I'll do that calculation. It has ID 562. Please wait.” **The backend is then free to serve other users.**

4. The user’s web browser starts displaying a ‘please wait’ spinner.

5. Workers (at least the ones not currently processing another job) constantly poll the queue for jobs. The workers probably run on another server, but they can also be different threads or processes on the same computer. Workers might have GPUs, whereas the backend server probably does not need them.

6. Eventually, a worker will pick up the job, removing it from the queue, and process it (e.g. run {Wednesday, 10} through the ML model). It’ll save the prediction to a database. Imagine this step takes 5 minutes.

7. Meanwhile, the user’s web browser is polling the backend every 30 seconds to ask if job 562 is done yet. The backend checks if the database has a result stored at id=562 and replies accordingly.

8. After a little more than five minutes, the user's next poll finds the result, and the backend serves it up.
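These eight steps can be simulated in one Python process. In this sketch (all names are hypothetical), `Backend` instances hold no state of their own: `submit` only enqueues the job and returns its ID, and `poll` only reads the shared results store, so a poll can be answered by a different instance than the one that accepted the job. A real deployment would replace the in-memory queue and dict with external services, the 0.1-second sleep with the actual model, and the polling loop with HTTP requests from the browser.

```python
import itertools
import queue
import threading
import time

job_queue = queue.Queue()           # shared queue, decoupled from every backend instance
results_db = {}                     # shared "database": job_id -> prediction
next_job_id = itertools.count(562)  # hypothetical ID generator, starting at the text's job 562

class Backend:
    """A stateless backend instance: all state lives in the shared queue and database."""

    def submit(self, features):
        job_id = next(next_job_id)
        job_queue.put((job_id, features))  # step 2: enqueue the job
        return job_id                      # step 3: reply at once with the job ID

    def poll(self, job_id):
        return results_db.get(job_id)      # step 7: None means "not done yet"

def worker():
    job_id, (_day, units_sold) = job_queue.get()  # step 6: pick up and remove the job
    time.sleep(0.1)                                # stands in for a slow model
    results_db[job_id] = units_sold * 3            # hypothetical prediction, saved to the DB
    job_queue.task_done()

server_a, server_b = Backend(), Backend()
job_id = server_a.submit(("Wednesday", 10))   # the first request lands on server A
threading.Thread(target=worker, daemon=True).start()

result = None
while result is None:                          # the browser's polling loop
    result = server_b.poll(job_id)             # a later poll may land on server B
    time.sleep(0.05)

print(job_id, result)  # 562 30
```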

**Finally: how do we implement all this?**

Flask: A Web Services Framework in Python
---

A web service is a form of API that is hosted on a server and consumed over a network. The terms web API and web service are generally used interchangeably.

Flask is a web service development framework in Python. It is not the only one; there are a couple of others, such as Django, Falcon, and Hug.

> Note: Flask is one of the easiest frameworks to start development with.

**Major Steps :**

1. `pip install flask`


2. Code your ML model according to your business problem. Say you want to predict the weather (a regression problem) or classify emails as good or spam (a classification problem).


3. Now save (persist) this model. Technically speaking, you serialize the model; in Python, this is called **pickling**.

> `import joblib` (older tutorials use `from sklearn.externals import joblib`, which was removed in scikit-learn 0.23)

> `joblib.dump(lr, 'model.pickle')`

> This saves the fitted model `lr` to a file called model.pickle.

4. The model (here, a Logistic Regression classifier named `lr`) is now persisted. You can load it back into memory with a single line of code; loading the model back into your workspace is known as deserialization.

> `lr = joblib.load('model.pickle')`

5. Now, use Flask to serve your persisted model. You will do the following two things:

**a. Load the already persisted model into memory when the application starts.**

**b. Create an API endpoint that takes input variables, transforms them into the appropriate format, and returns predictions.**
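Under the hood, joblib builds on Python's pickle protocol (joblib is typically more efficient for objects that hold large NumPy arrays). The serialize/deserialize round trip from steps 3 and 4 can be sketched with the standard library's `pickle` module alone; `ThresholdModel` is a hypothetical toy stand-in for a trained scikit-learn model such as `lr`, and the file goes to a temporary directory.

```python
import os
import pickle
import tempfile

class ThresholdModel:
    """Hypothetical toy classifier: predicts 1 when the input exceeds a learned threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, xs):
        return [1 if x > self.threshold else 0 for x in xs]

model = ThresholdModel(threshold=0.5)   # pretend this came out of training

path = os.path.join(tempfile.mkdtemp(), "model.pickle")
with open(path, "wb") as f:
    pickle.dump(model, f)               # serialize ("pickle") the model to disk

with open(path, "rb") as f:
    restored = pickle.load(f)           # deserialize it, e.g. when the API starts

print(restored.predict([0.2, 0.9]))     # [0, 1]
```

Note that unpickling needs the model's class to be importable, which is one reason the API process must have the same libraries (and versions) installed as the training environment.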


Your sample input to the API will look like the following JSON list of records (assume we are building a classifier on the Titanic dataset):

```json
[
    {"Age": 85, "Sex": "male", "Embarked": "S"},
    {"Age": 24, "Sex": "female", "Embarked": "C"},
    {"Age": 3, "Sex": "male", "Embarked": "C"},
    {"Age": 21, "Sex": "male", "Embarked": "S"}
]
```

and your API will return output like the following:

```json
{"prediction": [0, 1, 1, 0]}
```

The predictions denote the survival statuses where 0 represents No and 1 represents Yes.
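Since every prediction depends on each record carrying the same three fields, it can help to check the payload before it reaches the model. A minimal stdlib sketch; `REQUIRED_FIELDS` and `validate_records` are hypothetical names, not part of Flask or scikit-learn.

```python
REQUIRED_FIELDS = {"Age", "Sex", "Embarked"}   # the features the Titanic model was trained on

def validate_records(payload):
    """Return a list of error messages for a JSON list of passenger records (empty if valid)."""
    if not isinstance(payload, list):
        return ["payload must be a JSON list of records"]
    errors = []
    for i, record in enumerate(payload):
        if not isinstance(record, dict):
            errors.append(f"record {i} is not a JSON object")
            continue
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            errors.append(f"record {i} is missing {sorted(missing)}")
    return errors

print(validate_records([{"Age": 85, "Sex": "male", "Embarked": "S"}]))  # []
print(validate_records([{"Age": 3, "Sex": "male"}]))  # ["record 0 is missing ['Embarked']"]
```

An endpoint could return these messages with an HTTP 400 status instead of passing incomplete data to the model.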

**Let's write a function `predict()` that does this:**

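```python
from flask import Flask, jsonify, request
import joblib
import pandas as pd

app = Flask(__name__)
lr = joblib.load('model.pickle')  # (a) load the persisted model once, at startup

@app.route('/predict', methods=['POST'])
def predict():
    json_ = request.json               # parse the JSON list of records in the request body
    query_df = pd.DataFrame(json_)     # one row per record
    query = pd.get_dummies(query_df)   # one-hot encode the categorical columns
    prediction = lr.predict(query)
    # .tolist() turns NumPy integers into plain Python ints, which jsonify can serialize
    return jsonify({'prediction': prediction.tolist()})
```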

**Trainer will discuss each and every step of the above code in class**

**Putting it all together:**
(assuming you are working with the Titanic dataset)

1. You loaded the Titanic dataset and selected the four features.


2. You did the necessary data preprocessing.


3. You built a Logistic Regression classifier and serialized it.


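4. You also serialized the list of columns seen during training. A query can produce fewer one-hot columns than the training data did (for example, if every passenger in a request is male, there is no `Sex_female` column), so persisting the training columns lets the API rebuild the exact column layout the model expects.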


5. You then wrote a simple API using Flask that predicts whether a person survived the shipwreck, given their age, sex, and port of embarkation.


6. Let's put all the code in one place so that we understand it better. It is also good programming practice to keep your ML model code and your Flask API code in separate .py files.

`model.py` should look like this:

```python
# Import dependencies
import pandas as pd
import joblib  # `from sklearn.externals import joblib` was removed in scikit-learn 0.23
from sklearn.linear_model import LogisticRegression

# Load the dataset in a dataframe object and include only four features as mentioned
url = "TitanticTraining.csv"
df = pd.read_csv(url)
include = ['Age', 'Sex', 'Embarked', 'Survived']   # Only four features
df_ = df[include].copy()   # copy, so preprocessing doesn't modify a view of df

# Data Preprocessing
# Collect the categorical (non-numeric) columns and fill missing values in the numeric
# ones, since LogisticRegression cannot handle NaN (filling with 0 is a simple assumption)
categoricals = []
for col in df_.columns:
    if pd.api.types.is_numeric_dtype(df_[col]):
        df_[col] = df_[col].fillna(0)
    else:
        categoricals.append(col)

# Apply one-hot encoding over all categoricals
df_ohe = pd.get_dummies(df_, columns=categoricals)

# Logistic Regression classifier
dependent_variable = 'Survived'
x = df_ohe[df_ohe.columns.difference([dependent_variable])]
y = df_ohe[dependent_variable]
lr = LogisticRegression()
lr.fit(x, y)

# Save your model
joblib.dump(lr, 'model.pickle')
print("Model dumped!")

# Load the model that you just saved
lr = joblib.load('model.pickle')

# Saving the data columns from training
model_columns = list(x.columns)
joblib.dump(model_columns, 'model_columns.pickle')
print("Model columns dumped!")
```

Your `api.py` should look like this:

```python
# Dependencies
import sys

import joblib
import pandas as pd
from flask import Flask, request, jsonify

# Your API definition
app = Flask(__name__)

lr = None             # set in __main__ once model.pickle is loaded
model_columns = None  # set in __main__ once model_columns.pickle is loaded

@app.route('/predict', methods=['POST'])
def predict():
    if lr is not None:
        try:
            json_ = request.json
            print(json_)
            query = pd.get_dummies(pd.DataFrame(json_))
            # Add training columns missing from this query (filled with 0); drop unseen ones
            query = query.reindex(columns=model_columns, fill_value=0)

            prediction = lr.predict(query).tolist()  # plain Python ints, JSON-serializable

            return jsonify({'prediction': prediction})

        except Exception:
            return jsonify({'Error': 'Seems input is incomplete or not in JSON'}), 400
    else:
        print('Train the model first')
        return 'No model here to use'

if __name__ == '__main__':
    try:
        port = int(sys.argv[1])  # This is for a command-line input
    except (IndexError, ValueError):
        port = 12345  # If you don't provide any port the port will be set to 12345

    lr = joblib.load("model.pickle")  # Load "model.pickle"
    print('Model loaded')
    model_columns = joblib.load("model_columns.pickle")  # Load "model_columns.pickle"
    print('Model columns loaded')

    app.run(port=port, debug=True)
```
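The `reindex` line in api.py is what fixes the column mismatch: `pd.get_dummies` only creates a column for each category it actually sees, while the model expects every column it was trained on. The stdlib sketch below imitates what `get_dummies` followed by `reindex(columns=model_columns, fill_value=0)` does for each record; `one_hot` and `align` are hypothetical helpers, and the `model_columns` list is an assumed example of what model.py would save, not actual training output.

```python
def one_hot(record):
    """Mimic pd.get_dummies for one record: string values become '<column>_<value>' = 1."""
    encoded = {}
    for column, value in record.items():
        if isinstance(value, str):
            encoded[f"{column}_{value}"] = 1
        else:
            encoded[column] = value
    return encoded

def align(encoded, model_columns):
    """Mimic reindex(columns=model_columns, fill_value=0): missing -> 0, unseen -> dropped."""
    return [encoded.get(column, 0) for column in model_columns]

# Assumed training columns (sorted, as columns.difference() returns them)
model_columns = ["Age", "Embarked_C", "Embarked_Q", "Embarked_S", "Sex_female", "Sex_male"]

query = [{"Age": 21, "Sex": "male", "Embarked": "S"}]
rows = [align(one_hot(r), model_columns) for r in query]
print(rows)  # [[21, 0, 0, 1, 0, 1]]
```

The query alone produces only `Age`, `Sex_male`, and `Embarked_S`; aligning against the saved training columns restores the missing ones as zeros, so the model always receives the same six inputs in the same order.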

Now we can test this API with **an API client called Postman**. Make sure that model.py and api.py are in the same directory, run model.py first (it writes model.pickle and model_columns.pickle), and then start api.py.

From **Postman**, send a **POST** request to http://localhost:12345/predict
and provide the JSON data as the request body:

```json
[
    {"Age": 85, "Sex": "male", "Embarked": "S"},
    {"Age": 24, "Sex": "female", "Embarked": "C"},
    {"Age": 3, "Sex": "male", "Embarked": "C"},
    {"Age": 21, "Sex": "male", "Embarked": "S"}
]
```

and your API will output the following:

```json
{"prediction": [0, 1, 1, 0]}
```
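If you prefer a script to Postman, the same request can be sent with Python's standard library. A minimal sketch, assuming api.py is running locally on the default port 12345; `build_predict_request` is a hypothetical helper name, and the actual send is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

def build_predict_request(records, url="http://localhost:12345/predict"):
    """Build a POST request whose body is the JSON-encoded list of records."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},  # so Flask's request.json parses the body
        method="POST",
    )

passengers = [
    {"Age": 85, "Sex": "male", "Embarked": "S"},
    {"Age": 24, "Sex": "female", "Embarked": "C"},
]
req = build_predict_request(passengers)

# Uncomment to send it while api.py is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Setting the `Content-Type` header matters: Flask only treats the body as JSON when the request declares it as `application/json`.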

References (recommended reading)
--
For a general overview, see this blog post (worth reading before an interview):
https://christophergs.github.io/machine%20learning/2019/03/17/how-to-deploy-machine-learning-models/

To deploy a simple ML model, see the steps here:

https://medium.com/analytics-vidhya/how-to-deploy-simple-machine-learning-models-for-free-56cdccc62b8d

or here:

https://www.datacamp.com/community/tutorials/machine-learning-models-api-python