## **Activity 18.01**

# Train and Deploy a Model on the Glass Dataset Using Flask

### Import the pandas, pickle, and joblib packages, RandomForestClassifier from sklearn.ensemble, and train_test_split from sklearn.model_selection:

In [29]:
import pandas as pd
import joblib
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

In [30]:
file_url = 'https://raw.githubusercontent.com/fenago/DSBook/main/Chapter%204/glass.csv'

In [31]:
df = pd.read_csv(file_url)

### Extract the 'Type' response variable using the .pop() method and save it into a variable called y:

In [34]:
y = df.pop('Type')

### Create a list called cat_columns containing only the columns of type 'object' using the dtype attribute and print its content:

In [35]:
cat_columns = [col for col in df.columns if df[col].dtype == 'object']
cat_columns

[]
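The list is empty because every feature column in the glass data is numeric (float64). As a minimal sketch of what the same comprehension returns when a text column is present, the hypothetical `toy_df` below (not part of the glass data) adds an assumed `supplier` column with an explicit object dtype:

```python
import pandas as pd

# Hypothetical toy frame: one numeric column and one object (string) column
toy_df = pd.DataFrame({
    "RI": [1.52, 1.51],
    "supplier": pd.Series(["a", "b"], dtype=object),
})

# Same comprehension as in the cell above: keep only object-typed columns
toy_cat_columns = [col for col in toy_df.columns if toy_df[col].dtype == 'object']

# select_dtypes is a built-in pandas alternative that gives the same list here
toy_cat_columns_alt = toy_df.select_dtypes(include="object").columns.tolist()

print(toy_cat_columns)
```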

### Split the df DataFrame and the y Series into training and test sets using the train_test_split function with the parameters test_size=0.33 and random_state=8:

In [37]:
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.33, random_state=8)
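To illustrate what these two parameters do, here is a small sketch on hypothetical toy data (100 made-up rows, not the glass data). It checks the resulting split sizes and shows that repeating the call with the same random_state selects the same rows:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows with one feature and a binary label
toy_X = pd.DataFrame({"feature": np.arange(100)})
toy_y = pd.Series(np.arange(100) % 2, name="label")

# First split
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.33, random_state=8)

# Second split with identical parameters
toy_X_train2, toy_X_test2, _, _ = train_test_split(
    toy_X, toy_y, test_size=0.33, random_state=8)

# 33% of 100 rows go to the test set, the remaining 67 to the training set
print(len(toy_X_train), len(toy_X_test))

# The same seed yields the same test rows
print(toy_X_test.index.equals(toy_X_test2.index))
```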

In [38]:
column_categories = {}

### Iterate through cat_columns and populate the dictionary with the column name and the list of categories using the .astype() method and the .cat.categories attribute:

In [39]:
for col in cat_columns:
  column_categories[col] = X_train[col].astype('category').cat.categories

### Save column_categories and cat_columns into files called categories_data.pkl and categorical_columns.pkl respectively using the pickle.dump() method:

In [40]:
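# Use context managers so that each file is closed once it has been written
with open("categories_data.pkl", "wb") as f:
  pickle.dump(column_categories, f)
with open("categorical_columns.pkl", "wb") as f:
  pickle.dump(cat_columns, f)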

### Create a function called apply_categories that takes a DataFrame and a dictionary as inputs. It imports CategoricalDtype from pandas.api.types, iterates through the dictionary, and converts each column (key) to the corresponding list of categories (value) using the .astype() method and CategoricalDtype:

In [41]:
def apply_categories(input_df, cat_dict):
  from pandas.api.types import CategoricalDtype
  for col, cat in cat_dict.items():
    input_df[col] = input_df[col].astype(CategoricalDtype(categories=cat))
  return input_df

### Apply this function on X_train and column_categories and save the result in a new DataFrame called X_train_cat. Print the data type of its columns using the .dtypes attribute:

In [42]:
X_train_cat = apply_categories(X_train, column_categories)
X_train_cat.dtypes

RI    float64
Na    float64
Mg    float64
Al    float64
Si    float64
K     float64
Ca    float64
Ba    float64
Fe    float64
dtype: object

### Perform one-hot encoding on the categorical columns using the pd.get_dummies() function and save the result into a new variable called X_train_final:

In [43]:
X_train_final = pd.get_dummies(X_train_cat, columns=cat_columns)
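Since the glass data has no categorical columns, this step leaves the DataFrame unchanged. The sketch below uses hypothetical toy data with an assumed `supplier` column to show why the saved categories matter: a single new record is encoded with the same dummy columns as the training data, even though it contains only one of the categories. The function here works on a copy of its input, which is a small variation on the version defined earlier.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

def apply_categories_copy(input_df, cat_dict):
    # Same conversion as apply_categories, applied to a copy of the input
    output_df = input_df.copy()
    for col, cat in cat_dict.items():
        output_df[col] = output_df[col].astype(CategoricalDtype(categories=cat))
    return output_df

# Hypothetical training data with three supplier categories
toy_train = pd.DataFrame({
    "RI": [1.52, 1.51, 1.53],
    "supplier": pd.Series(["a", "b", "c"], dtype=object),
})
toy_categories = {
    "supplier": toy_train["supplier"].astype("category").cat.categories
}

# A single new record that contains only category "b"
toy_record = pd.DataFrame({
    "RI": [1.50],
    "supplier": pd.Series(["b"], dtype=object),
})

toy_encoded = pd.get_dummies(
    apply_categories_copy(toy_record, toy_categories), columns=["supplier"])

# All three dummy columns are present although the record only holds "b"
print(toy_encoded.columns.tolist())
```

Without the fixed categories, the record would be encoded with a single supplier_b column and would no longer match the columns the model was trained on.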

### Instantiate a RandomForestClassifier with random_state=8 and train it with the training sets using the .fit() method. Save the model into a file called model.pkl using the joblib.dump() method:

In [None]:
rf_model = RandomForestClassifier(random_state=8)
rf_model.fit(X_train_final, y_train)
joblib.dump(rf_model, "model.pkl")
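As a quick sanity check of the save step, the following sketch trains a small forest on hypothetical random data, writes it to a temporary file (the file name `toy_model.pkl` is an assumption), reloads it, and confirms that both copies predict the same labels:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 60 rows, 3 features, label derived from the first feature
rng = np.random.default_rng(8)
toy_features = rng.normal(size=(60, 3))
toy_labels = (toy_features[:, 0] > 0).astype(int)

toy_model = RandomForestClassifier(n_estimators=10, random_state=8)
toy_model.fit(toy_features, toy_labels)

# Save to a temporary directory and load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_model.pkl")
    joblib.dump(toy_model, toy_path)
    restored_model = joblib.load(toy_path)

# The reloaded model should reproduce the original predictions
same_predictions = np.array_equal(
    toy_model.predict(toy_features), restored_model.predict(toy_features))
print(same_predictions)
```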

### Import the socket, threading, requests, json, and numpy packages, the Flask class, and the jsonify and request functions from the flask package:

In [46]:
import socket
import threading
import requests
import json
from flask import Flask, jsonify, request
import numpy as np

### Create a new Flask app and save it into a variable called app:

In [47]:
app = Flask(__name__)

### Load the pre-trained model from the model.pkl file using joblib.load() and save it into a variable called trained_model. Load the saved dictionary from categories_data.pkl using pickle.load() and save it into a variable called var_means. Load the list of categorical columns from categorical_columns.pkl and save it into a variable called cat_cols:

In [None]:
trained_model = joblib.load("model.pkl")
# Use context managers so that each file is closed once it has been read
with open("categories_data.pkl", "rb") as f:
  var_means = pickle.load(f)
with open("categorical_columns.pkl", "rb") as f:
  cat_cols = pickle.load(f)

### Create an API endpoint for the /api path that accepts only POST requests and calls a function called predict(). This function reads the JSON received using the request.get_json() method, transforms it into a DataFrame, applies the apply_categories() function to it with var_means, performs one-hot encoding with pd.get_dummies(), predicts the outcome with trained_model, converts the prediction from a numpy array to a string with np.array2string(), and then converts it to JSON with jsonify():

In [49]:
@app.route('/api', methods=['POST']) 
def predict(): 
  data = request.get_json() 
  df_test = pd.DataFrame(data, index=[0]) 
  df_test_clean = apply_categories(df_test, var_means) 
  df_test_final = pd.get_dummies(df_test_clean, columns=cat_cols) 
  prediction = trained_model.predict(df_test_final) 
  str_pred = np.array2string(prediction) 
  return jsonify(str_pred) 
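An endpoint of this shape can be exercised without starting a server by using Flask's built-in test client. The sketch below is self-contained and makes several assumptions: `toy_app`, `toy_model`, and the two-column toy data are hypothetical stand-ins for the app and model above, and the categorical steps are omitted because the toy data is purely numeric.

```python
import numpy as np
import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in model trained on two made-up numeric features
toy_X = pd.DataFrame({"RI": [1.51, 1.52, 1.53, 1.54],
                      "Na": [13.0, 13.5, 14.0, 14.5]})
toy_y = pd.Series([1, 1, 2, 2])
toy_model = RandomForestClassifier(n_estimators=10, random_state=8)
toy_model.fit(toy_X, toy_y)

toy_app = Flask(__name__)

@toy_app.route('/api', methods=['POST'])
def toy_predict():
    data = request.get_json()                  # parsed JSON body as a dict
    df_test = pd.DataFrame(data, index=[0])    # one-row DataFrame
    prediction = toy_model.predict(df_test)    # numpy array with one label
    return jsonify(np.array2string(prediction))

# The test client sends the request in-process, so no port is opened
with toy_app.test_client() as client:
    response = client.post('/api', json={"RI": 1.515, "Na": 13.2})
    status_code = response.status_code
    body = response.get_json()

print(status_code, body)
```

The body is a string such as "[1]" because the prediction array is converted with np.array2string() before being passed to jsonify().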

### Create a new thread for running your Flask app using the threading.Thread class with the following parameters: target=app.run, kwargs={'host':'0.0.0.0','port':80}. Start it with the .start() method:

In [None]:
flask_thread = threading.Thread(target=app.run, kwargs={'host':'0.0.0.0','port':80})
flask_thread.start()
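A thread started with target=app.run keeps serving until the process ends. If a server that can be stopped from the notebook is preferred, one alternative (a different technique from the cell above, shown here only as a sketch) is Werkzeug's make_server, which returns a server object with a shutdown() method. The `demo_app`, the `/ping` route, and the use of port 0 (let the operating system pick a free port) are all assumptions made for this illustration.

```python
import threading
import urllib.request

from flask import Flask, jsonify
from werkzeug.serving import make_server

demo_app = Flask(__name__)

@demo_app.route('/ping')
def ping():
    return jsonify("pong")

# Port 0 asks the operating system for any free port
demo_server = make_server('127.0.0.1', 0, demo_app)
demo_port = demo_server.server_address[1]

# Serve requests in a background thread
demo_thread = threading.Thread(target=demo_server.serve_forever, daemon=True)
demo_thread.start()

# Call the running server; the empty ProxyHandler bypasses any configured proxy
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(f"http://127.0.0.1:{demo_port}/ping") as resp:
    reply = resp.read().decode()

# Stop the server and wait for the thread to finish
demo_server.shutdown()
demo_thread.join()

print(reply)
```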

In [None]:
record = X_test.iloc[0,].to_json()
record

### Create a dictionary called headers with the following key-value pairs: 'content-type': 'application/json', 'Accept-Charset': 'UTF-8'. Get the IP address of the host using the socket.gethostname() and socket.gethostbyname() functions and save it into a new variable called ip_address:

In [52]:
headers = {'content-type': 'application/json', 'Accept-Charset': 'UTF-8'}
ip_address = socket.gethostbyname(socket.gethostname())
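On some machines the host name cannot be resolved, in which case socket.gethostbyname() raises an error. A minimal sketch of a defensive variant follows; `resolve_local_ip` is a hypothetical helper, and falling back to the loopback address 127.0.0.1 is an assumption that only makes sense when the client and the Flask app run on the same machine:

```python
import socket

def resolve_local_ip(default="127.0.0.1"):
    """Hypothetical helper: return the host's IPv4 address, or a fallback."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        # Name resolution failed (socket.gaierror is a subclass of OSError)
        return default

demo_ip_address = resolve_local_ip()
print(demo_ip_address)
```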

In [None]:
r = requests.post(f"http://{ip_address}/api", data=record, headers=headers)
r.text
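Because predict() passes the output of np.array2string() to jsonify(), the response body is a JSON string containing the printed array rather than a JSON list. The sketch below shows one way to turn such a body back into integer labels. It does not call the server: `simulated_prediction` and `simulated_body` are hypothetical values built locally to mimic the response.

```python
import json

import numpy as np

# Hypothetical prediction, encoded the same way as in predict()
simulated_prediction = np.array([2])
simulated_body = json.dumps(np.array2string(simulated_prediction))

# Step 1: decode the JSON string, which yields the printed array, e.g. "[2]"
decoded = json.loads(simulated_body)

# Step 2: strip the brackets and convert each token to an integer
labels = [int(token) for token in decoded.strip("[]").split()]

print(decoded, labels)
```

With a live response, the same two steps would be applied to r.text in place of simulated_body.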