# Machine Learning - Deployment


## Question 1 - Install Pipenv

What version of pipenv did you install?
Use `--version` to find out.

In [1]:
!pipenv --version

pipenv, version 2023.10.3


## Question 2 - Use Pipenv to install Scikit-Learn version 1.3.1

What's the first hash for scikit-learn you get in Pipfile.lock?

- sha256:0c275a06c5190c5ce00af0acbb61c06374087949f643ef32d355ece12c4db043
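The hash can also be read programmatically, since `Pipfile.lock` is a JSON file. The snippet below is a minimal sketch: the helper name `first_hash` and the trimmed `sample_lock` content are illustrative assumptions, and only the first hash comes from the answer above.

```python
import json


def first_hash(lock_text, package, section="default"):
    """Return the first hash listed for `package` in Pipfile.lock content."""
    lock = json.loads(lock_text)
    return lock[section][package]["hashes"][0]


# Hypothetical, heavily trimmed Pipfile.lock content for illustration only.
sample_lock = json.dumps({
    "default": {
        "scikit-learn": {
            "hashes": [
                "sha256:0c275a06c5190c5ce00af0acbb61c06374087949f643ef32d355ece12c4db043",
                "sha256:<placeholder for the remaining hashes>",
            ],
            "version": "==1.3.1",
        }
    }
})

print(first_hash(sample_lock, "scikit-learn"))

# With a real lock file in the working directory:
# with open("Pipfile.lock") as f:
#     print(first_hash(f.read(), "scikit-learn"))
```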

## Get the dictionary vectorizer and a logistic regression model


In [5]:
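# Each `!` command runs in its own subshell, so a shell variable set with
# `!PREFIX=...` is gone by the next line and wget fails with "Scheme missing".
# Define PREFIX as a Python variable instead; IPython expands {PREFIX} in `!` commands.
PREFIX = "https://raw.githubusercontent.com/DataTalksClub/machine-learning-zoomcamp/master/cohorts/2023/05-deployment/homework"
!wget {PREFIX}/model1.bin
!wget {PREFIX}/dv.bin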


## Question 3 - Let's use these models!

- Write a script for loading these models with pickle
- Score this client:

```
{"job": "retired", "duration": 445, "poutcome": "success"}
```

What's the probability that this client will get a credit?

- 0.162
- 0.392
- 0.652
- 0.902

In [28]:
import pickle

# Load the DictVectorizer
with open('dv.bin', 'rb') as f:
    dv = pickle.load(f)

# Load the Logistic Regression model
with open('model1.bin', 'rb') as f:
    model = pickle.load(f)

# New data for prediction
new_data = {"job": "retired", "duration": 445, "poutcome": "success"}

# Transform the new data using the DictVectorizer
transformed_data = dv.transform([new_data])

# Predict using the loaded model
probabilities = model.predict_proba(transformed_data)

# break down the probabilities into yes or no score
no_score, yes_score = probabilities[0]

# get the class labels from the model
no_label, yes_label = model.classes_

for i, probs in enumerate(probabilities):
    print(f"Prediction {i+1}:")
    for label, prob in zip(model.classes_, probs):
        print(f"Class {label}: Probability {prob:.3f}")

# Print the probability scores for each class
print(f'Probability the client would get a credit: {yes_label} - {yes_score:.3f}   {no_label} - {no_score:.3f}')

Prediction 1:
Class no: Probability 0.098
Class yes: Probability 0.902
Probability the client would get a credit: yes - 0.902   no - 0.098
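The scoring steps above could be wrapped in a small reusable function. This is only a sketch: `predict_single` is a hypothetical helper, and it assumes `dv` and `model` expose `transform`, `predict_proba`, and `classes_` the way the loaded objects do in the cell above.

```python
def predict_single(client, dv, model, positive_label="yes"):
    """Return the probability of `positive_label` for one client dict."""
    X = dv.transform([client])
    probabilities = model.predict_proba(X)[0]
    # Look the column up by label instead of assuming "yes" is the second class.
    index = list(model.classes_).index(positive_label)
    return float(probabilities[index])


# Usage with the objects loaded above:
# predict_single({"job": "retired", "duration": 445, "poutcome": "success"}, dv, model)
```

Looking the class up by label keeps the function correct even if the class order in `model.classes_` differs from what the unpacking in the cell assumes.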


https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations


## Question 4 - Now let's serve this model as a web service

- Install Flask and gunicorn (or waitress, if you're on Windows)
- Write Flask code for serving the model
- Now score this client using requests:

```python 
url = "YOUR_URL"
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
requests.post(url, json=client).json()
```
What's the probability that this client will get a credit?

- 0.140
- 0.440
- 0.645
- 0.845

## Run pipenv shell

- Activate the pipenv shell in the working directory

```bash
cd your_project_directory
pipenv shell

```

- Install Flask and Gunicorn

```bash
pipenv install flask gunicorn
```

- Write the Flask app, start it, and send the prediction request


#### API Code

```python
# app.py - Your Flask application

import os
import pickle

from flask import Flask, request, jsonify

app = Flask(__name__)

# Load the DictVectorizer
with open('dv.bin', 'rb') as f:
    dv = pickle.load(f)

file_path = 'model1.bin' if os.path.exists('model1.bin') else 'model2.bin' 

# Load the Logistic Regression model
with open(file_path, 'rb') as f:
    model = pickle.load(f)

# Define the prediction endpoint and route
@app.route('/predict', methods=['POST'])
def predict():
    # get the json payload
    data = request.get_json()
    print("data",data)
    
    # Transform the new data using the DictVectorizer
    transformed_data = dv.transform([data])

    # Process the data, make predictions using the model, and return the results
    probabilities = model.predict_proba(transformed_data)

    # break down the probabilities into yes or no score
    no_score, yes_score = probabilities[0]

    # get the class labels from the model
    no_label, yes_label = model.classes_    
    return jsonify({'yes': yes_score, 'no': no_score})
    
# load the app
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000)
```
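The endpoint above passes the JSON payload straight to the vectorizer. A basic payload check could run first; the sketch below is one way to do it. The helper `validate_client` and the choice of required fields are assumptions taken from the example client in the question, not part of the original app.

```python
# Field names and types assumed from the example client in the question.
REQUIRED_FIELDS = {"job": str, "duration": (int, float), "poutcome": str}


def validate_client(data):
    """Return a list of problems found in a client payload (empty if none)."""
    if not isinstance(data, dict):
        return ["payload must be a JSON object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int, so reject it explicitly.
        elif isinstance(data[field], bool) or not isinstance(data[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems


print(validate_client({"job": "unknown", "duration": 270, "poutcome": "failure"}))
print(validate_client({"job": "unknown"}))
```

Inside `predict()`, a non-empty result could be returned with a 400 status instead of calling the model.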

In [43]:
# Make the request to make the prediction
import requests

# 0.0.0.0 is a bind address for the server; clients should connect to localhost
url = 'http://localhost:8000/predict'
client = {"job": "unknown", "duration": 270, "poutcome": "failure"}
response = requests.post(url, json=client)

# Check the response status code
if response.status_code == 200:
    # If the response status is 200 (OK), print the JSON response
    json_response = response.json()
    print(f"JSON Response: {json_response}")    
else:
    # If the response status is not 200, print an error message
    print("Error:", response.status_code)


JSON Response: {'no': 0.8603105294764318, 'yes': 0.13968947052356817}
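The returned "yes" probability rarely matches an answer option exactly, so the closest option is the one to pick. A minimal sketch, with `closest_option` as a hypothetical helper:

```python
def closest_option(probability, options):
    """Return the answer option with the smallest distance to `probability`."""
    return min(options, key=lambda option: abs(option - probability))


# "yes" probability from the response above and the options from Question 4.
print(closest_option(0.13968947052356817, [0.140, 0.440, 0.645, 0.845]))  # prints 0.14
```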


## Question 5 - Docker Image

- Download the base image `svizor/zoomcamp-model:3.10.12-slim` with the `docker pull` command.

So what's the size of this base image?

- 47 MB
- 147 MB
- 374 MB
- 574 MB

You can get this information by running `docker images`; it is in the "SIZE" column.

```
svizor/zoomcamp-model       3.10.12-slim   08266c8f0c4b   4 days ago      147MB
```

### Dockerfile

Now create your own Dockerfile based on the image we prepared.

It should start like this:

```dockerfile
FROM svizor/zoomcamp-model:3.10.12-slim
# add your stuff here
```

Now complete it:

- Install all the dependencies from the Pipenv files
- Copy your Flask script
- Run it with Gunicorn

After that, you can build your docker image.


## Question 6 - Run the solution in a Docker container
Let's run your docker container!

After running it, score this client once again:

```python
url = "YOUR_URL"
client = {"job": "retired", "duration": 445, "poutcome": "success"}
requests.post(url, json=client).json()
```
What's the probability that this client will get a credit now?

- 0.168
- 0.530
- 0.730
- 0.968

### Dockerfile code

```dockerfile
# Use the base image
FROM svizor/zoomcamp-model:3.10.12-slim

# Set the working directory
WORKDIR /app

# Copy the Pipenv files to the container
COPY Pipfile Pipfile.lock /app/

# Install pipenv and dependencies
RUN pip install pipenv
RUN pipenv install --system --deploy

# Copy the Flask script to the container
COPY app.py /app/

# Expose the port your Flask app runs on
EXPOSE 8000

# Run the Flask app with Gunicorn
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8000", "--workers", "4"]

```
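After the container starts, Gunicorn may need a moment before it accepts requests. The sketch below polls the port before scoring; `wait_for_port` is a hypothetical helper, and it assumes the container's port 8000 is published on localhost (for example with `docker run -p 8000:8000`, which is not shown in the original). It only checks that the port accepts TCP connections, not that the app itself is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connection succeeded: something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet; wait and retry.
            time.sleep(interval)
    return False


# Usage, assuming the service is published on localhost:8000:
# if wait_for_port("localhost", 8000):
#     ...send the request with requests.post(...)
```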

In [45]:
# Score the client against the API running in the Docker container

client =  {"job": "retired", "duration": 445, "poutcome": "success"}
response = requests.post(url, json=client)

# Check the response status code
if response.status_code == 200:
    # If the response status is 200 (OK), print the JSON response
    json_response = response.json()
    print(f"JSON Response: {json_response}")    
else:
    # If the response status is not 200, print an error message
    print("Error:", response.status_code)


JSON Response: {'no': 0.27306305364457695, 'yes': 0.726936946355423}
