## Import Required Libraries

In [1]:
%pip install kmodes

Collecting kmodes
  Downloading kmodes-0.11.1-py2.py3-none-any.whl (19 kB)
Installing collected packages: kmodes
Successfully installed kmodes-0.11.1


In [10]:
import pickle

import pandas as pd
from kmodes.kprototypes import KPrototypes  # class of the pickled clustering model

## Load and Review Test Data

In [5]:
X=pd.read_csv("test.csv")

### Data Preprocessing starts

In [6]:
# Drop the ID columns and the secondary product-category columns, which are not used as features
X.drop(labels=['User_ID', 'Product_ID', 'Product_Category_2', 'Product_Category_3'], axis=1, inplace=True)


In [7]:
# Preview the first rows of the remaining feature columns
X.head()

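| | Gender | Age | Occupation | City_Category | Stay_In_Current_City_Years | Marital_Status | Product_Category_1 |
|---|---|---|---|---|---|---|---|
| 0 | M | 46-50 | 7 | B | 2 | 1 | 1 |
| 1 | M | 26-35 | 17 | C | 0 | 0 | 3 |
| 2 | F | 36-45 | 1 | B | 4+ | 1 | 5 |
| 3 | F | 36-45 | 1 | B | 4+ | 1 | 4 |
| 4 | F | 26-35 | 1 | C | 1 | 0 | 4 |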


In [8]:
# Encode the ordinal Age bands as integers 1-7 and map the open-ended "4+" stay duration to 5
dic_to_replace = {"Age": {"0-17": 1, "18-25": 2, "26-35": 3, "36-45": 4, "46-50": 5, "51-55": 6, "55+": 7},
                  "Stay_In_Current_City_Years": {"4+": 5}}
X.replace(dic_to_replace, inplace=True)
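The replacement above turns the ordinal `Age` bands into integers that preserve their order, and maps the open-ended `"4+"` stay duration to 5 so the column can later be cast to `int64`. A small, self-contained check of that mapping; the sample rows are made up for illustration and are not taken from `test.csv`:

```python
import pandas as pd

# Same mapping as in the cell above.
dic_to_replace = {
    "Age": {"0-17": 1, "18-25": 2, "26-35": 3, "36-45": 4,
            "46-50": 5, "51-55": 6, "55+": 7},
    "Stay_In_Current_City_Years": {"4+": 5},
}

# Illustrative rows; when read from CSV, both columns hold strings.
sample = pd.DataFrame({
    "Age": ["46-50", "26-35", "55+"],
    "Stay_In_Current_City_Years": ["2", "4+", "0"],
})

encoded = sample.replace(dic_to_replace)
# Make the integer dtypes explicit instead of relying on replace() downcasting.
encoded["Age"] = encoded["Age"].astype("int64")
# "4+" is now 5; the remaining digit strings are converted to integers here.
encoded["Stay_In_Current_City_Years"] = encoded["Stay_In_Current_City_Years"].astype("int64")
print(encoded)
```

Casting `Age` explicitly is a defensive choice: recent pandas versions deprecate the automatic downcasting that `replace` performs on object columns.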

In [9]:
X['Stay_In_Current_City_Years'] = X['Stay_In_Current_City_Years'].astype('int64')
# Needed only for clustering: cast the integer-coded categorical columns to object dtype
# so they are handled as categories rather than as numbers
X['Marital_Status'] = X['Marital_Status'].astype('object')
X['Occupation'] = X['Occupation'].astype('object')
X['Product_Category_1'] = X['Product_Category_1'].astype('object')
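The `categorical=[0, 2, 3, 5, 6]` argument used when predicting refers to column *positions*, so it silently breaks if the column order ever changes. A sketch that derives the positions from column names instead, using the column order shown in the `X.head()` output above; `X_demo` is a hypothetical stand-in for `X`:

```python
import pandas as pd

# Column order after the preprocessing above (from the X.head() output).
columns = ["Gender", "Age", "Occupation", "City_Category",
           "Stay_In_Current_City_Years", "Marital_Status", "Product_Category_1"]
X_demo = pd.DataFrame(columns=columns)

# Columns the clustering model should treat as categorical.
categorical_cols = ["Gender", "Occupation", "City_Category",
                    "Marital_Status", "Product_Category_1"]

# Look up each column's position by name instead of hard-coding the indices.
categorical_idx = sorted(X_demo.columns.get_loc(c) for c in categorical_cols)
print(categorical_idx)  # [0, 2, 3, 5, 6]
```

In the notebook, the same expression evaluated on `X` could be passed as `categorical=categorical_idx`.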

### Data Preprocessing ends

### Determine each record's cluster, then predict with that cluster's model

In [11]:
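# Load the pickled k-prototypes clustering model and assign each test record to a cluster
with open('clusteringModel.sav', 'rb') as f:
    loaded_model = pickle.load(f)
# Positions of the categorical columns: Gender, Occupation, City_Category, Marital_Status, Product_Category_1
clusters_label = loaded_model.predict(X, categorical=[0, 2, 3, 5, 6])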
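`predict` assigns each row to its nearest cluster prototype. In Huang's k-prototypes formulation, which the `kmodes` package implements, the default distance to a prototype is the squared Euclidean distance over the numerical columns plus a weight gamma times the number of mismatched categorical columns. A pure-Python sketch of that assignment rule; the prototypes and the gamma value below are made up for illustration, whereas the fitted model learns its own:

```python
# Hypothetical prototypes: (numerical part, categorical part).
# Numerical part:   [Age, Stay_In_Current_City_Years]
# Categorical part: [Gender, Occupation, City_Category, Marital_Status, Product_Category_1]
PROTOTYPES = [
    ([5.0, 2.0], ["M", 7, "B", 1, 1]),
    ([4.0, 4.5], ["F", 1, "B", 1, 5]),
    ([3.0, 0.5], ["M", 17, "C", 0, 3]),
]
GAMMA = 0.5  # hypothetical weight for categorical mismatches


def kprototypes_distance(num, cat, prototype, gamma):
    """Squared Euclidean distance on numerical features plus gamma times categorical mismatches."""
    proto_num, proto_cat = prototype
    num_part = sum((a - b) ** 2 for a, b in zip(num, proto_num))
    cat_part = sum(a != b for a, b in zip(cat, proto_cat))
    return num_part + gamma * cat_part


def assign_cluster(num, cat, prototypes=PROTOTYPES, gamma=GAMMA):
    """Return the index of the closest prototype."""
    distances = [kprototypes_distance(num, cat, p, gamma) for p in prototypes]
    return min(range(len(distances)), key=distances.__getitem__)


# First test row: Age band 46-50 -> 5, two years in the current city.
print(assign_cluster([5, 2], ["M", 7, "B", 1, 1]))  # 0
```

Gamma balances the two parts: a larger value makes categorical mismatches dominate the assignment, a smaller one lets the numerical columns decide.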

In [12]:
X['clusterNo'] = clusters_label
# One-hot encode the categorical columns, dropping the first level of each (0/1 integer columns)
X = pd.get_dummies(X, columns=['Gender', 'Marital_Status', 'City_Category', 'Occupation', 'Product_Category_1'],
                   drop_first=True, dtype=int)
X.head()
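`pd.get_dummies(..., drop_first=True)` derives the dummy columns from the values present in `X`, so the test features only line up with the training features if both files contain the same categories. One way to make the columns stable is to give the column a categorical dtype with a fixed list of categories, because `get_dummies` then creates a column for every category, including unobserved ones. A sketch on a toy frame; the category list 1 to 20 for `Product_Category_1` is an assumption based on the dummy columns used in this notebook:

```python
import pandas as pd

# Assumed full set of Product_Category_1 values seen during training (1..20).
ALL_PRODUCT_CATEGORIES = list(range(1, 21))

# Toy test rows that only contain categories 1, 3 and 5.
test = pd.DataFrame({"Product_Category_1": [1, 3, 5, 3]})

# Fix the categories so that unseen values still get an all-zero column.
test["Product_Category_1"] = pd.Categorical(
    test["Product_Category_1"], categories=ALL_PRODUCT_CATEGORIES
)
dummies = pd.get_dummies(test, columns=["Product_Category_1"],
                         drop_first=True, dtype=int)

print(dummies.shape)  # (4, 19): categories 2..20, category 1 dropped
```

With fixed categories, `drop_first` always drops the first category in the list rather than whichever value happens to sort first in the test file.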

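| | Age | Stay_In_Current_City_Years | clusterNo | Gender_M | Marital_Status_1 | City_Category_B | City_Category_C | Occupation_1 | Occupation_2 | Occupation_3 | Occupation_4 | Occupation_5 | Occupation_6 | Occupation_7 | Occupation_8 | Occupation_9 | Occupation_10 | Occupation_11 | Occupation_12 | Occupation_13 | Occupation_14 | Occupation_15 | Occupation_16 | Occupation_17 | Occupation_18 | Occupation_19 | Occupation_20 | Product_Category_1_2 | Product_Category_1_3 | Product_Category_1_4 | Product_Category_1_5 | Product_Category_1_6 | Product_Category_1_7 | Product_Category_1_8 | Product_Category_1_9 | Product_Category_1_10 | Product_Category_1_11 | Product_Category_1_12 | Product_Category_1_13 | Product_Category_1_14 | Product_Category_1_15 | Product_Category_1_16 | Product_Category_1_17 | Product_Category_1_18 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 2 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3 | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 4 | 5 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 5 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 3 | 1 | 2 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |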


In [13]:
# The test data has no rows with Product_Category_1 values 19 or 20, so get_dummies did not create
# these columns; add them explicitly with all values set to 0 so the features match the training data
X['Product_Category_1_19'] = 0
X['Product_Category_1_20'] = 0
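Adding the two missing columns by hand works for this particular file, but a test set that lacks other categories (or whose first category differs, which changes what `drop_first` removes) would still be misaligned. A more general sketch reindexes the test features to the training column list, filling absent dummy columns with 0 and dropping unexpected ones. `training_columns` is hypothetical here and would have to be saved at training time; if the per-cluster models are scikit-learn estimators fitted on a DataFrame, they also expose it as `feature_names_in_`:

```python
import pandas as pd

# Hypothetical column list saved when the per-cluster models were trained.
training_columns = ["Age", "Gender_M", "Product_Category_1_2",
                    "Product_Category_1_3", "Product_Category_1_19"]

# Toy test features: one training column is missing, one is unexpected,
# and two columns are in a different order than during training.
test_features = pd.DataFrame({
    "Age": [5, 3],
    "Gender_M": [1, 0],
    "Product_Category_1_3": [0, 1],
    "Product_Category_1_2": [1, 0],
    "Unexpected_Column": [1, 1],
})

# Same columns in the same order as training; missing dummies become 0.
aligned = test_features.reindex(columns=training_columns, fill_value=0)
print(aligned)
```

Reindexing also restores the training column order, which matters for models that check feature names or rely on column positions.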

### Load each cluster's model and use it for prediction

In [14]:
# Predict for the data points assigned to cluster 0
cluster0 = X[X['clusterNo'] == 0].drop(['clusterNo'], axis=1)
with open('model_cluster0.sav', 'rb') as f:
    model_cluster0 = pickle.load(f)
pred_cluster0 = model_cluster0.predict(cluster0)
print("Predicted Values are : ")
pd.DataFrame(pred_cluster0, columns=["Purchase"]).head()

Predicted Values are : 


| | Purchase |
|---|---|
| 0 | 16358.629551 |
| 1 | 11725.354842 |
| 2 | 13060.377765 |
| 3 | 11725.354842 |
| 4 | 15293.746316 |


In [15]:
# Predict for the data points assigned to cluster 1
cluster1 = X[X['clusterNo'] == 1].drop(['clusterNo'], axis=1)
with open('model_cluster1.sav', 'rb') as f:
    model_cluster1 = pickle.load(f)
pred_cluster1 = model_cluster1.predict(cluster1)
print("Predicted Values are : ")
pd.DataFrame(pred_cluster1, columns=["Purchase"]).head()

Predicted Values are : 


| | Purchase |
|---|---|
| 0 | 7269.301781 |
| 1 | 2612.524831 |
| 2 | 5750.47045 |
| 3 | 12089.469671 |
| 4 | 11103.859964 |


In [17]:
# Predict for the data points assigned to cluster 2
cluster2 = X[X['clusterNo'] == 2].drop(['clusterNo'], axis=1)
with open('model_cluster2.sav', 'rb') as f:
    model_cluster2 = pickle.load(f)
pred_cluster2 = model_cluster2.predict(cluster2)
print("Predicted Values are : ")
pd.DataFrame(pred_cluster2, columns=["Purchase"]).head()

Predicted Values are : 


| | Purchase |
|---|---|
| 0 | 10183.97087 |
| 1 | 2638.525531 |
| 2 | 16255.805711 |
| 3 | 13230.849107 |
| 4 | 5440.034011 |
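The three cells above repeat the same steps for each cluster, and each shows its predictions detached from the rows they belong to. A sketch that loops over the clusters and writes every prediction back to its original row. The `models` dictionary is assumed to map a cluster number to a fitted model with a `predict` method; `ScaledAgeModel` is a toy stand-in, not one of the trained models:

```python
import pandas as pd


def predict_by_cluster(X, models, cluster_col="clusterNo"):
    """Predict each row with its cluster's model; return a Series aligned with X's index.

    Rows whose cluster has no entry in `models` stay NaN.
    """
    predictions = pd.Series(index=X.index, dtype="float64", name="Purchase")
    for cluster_no, model in models.items():
        mask = X[cluster_col] == cluster_no
        if not mask.any():
            continue  # no rows fell into this cluster
        features = X.loc[mask].drop(columns=[cluster_col])
        predictions.loc[mask] = model.predict(features)
    return predictions


class ScaledAgeModel:
    """Toy stand-in for a trained regressor: predicts Age times a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def predict(self, features):
        return features["Age"].to_numpy() * self.factor


X_toy = pd.DataFrame({"Age": [5, 3, 4], "clusterNo": [0, 2, 0]})
models = {0: ScaledAgeModel(1000.0), 1: ScaledAgeModel(2000.0), 2: ScaledAgeModel(3000.0)}
print(predict_by_cluster(X_toy, models))
```

In the notebook, `models` could be filled by loading `model_cluster0.sav`, `model_cluster1.sav` and `model_cluster2.sav` with `pickle.load` inside `with open(...)` blocks, and the returned Series attached to the test rows as a `Purchase` column.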


### Cloud Deployment (AWS Elastic Beanstalk)

Once training is complete, the trained model has to be exposed as an API that users can consume. At prediction time, the saved models are loaded first and then used to make the predictions. The same app is then deployed to the cloud platform.

#### Flask App

Since the model is exposed to clients as a web application, we build that application with the Flask framework.
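Whatever the web layer looks like, the app has to turn the submitted form, where every value arrives as a string, into a row preprocessed the same way as `test.csv` in the notebook before the clustering model and the per-cluster model can be applied. A minimal sketch of that conversion, assuming the form field names match the dataset's column names (a hypothetical choice); in a Flask view the input mapping would typically be `request.form`:

```python
import pandas as pd

AGE_CODES = {"0-17": 1, "18-25": 2, "26-35": 3, "36-45": 4,
             "46-50": 5, "51-55": 6, "55+": 7}
STAY_CODES = {"4+": 5}


def form_to_row(form):
    """Convert submitted form values (all strings) into a one-row DataFrame
    with the notebook's preprocessed column order and types."""
    stay = form["Stay_In_Current_City_Years"]
    row = {
        "Gender": form["Gender"],
        "Age": AGE_CODES[form["Age"]],
        # Integer-coded categoricals are cast to int so they compare equal to
        # the values seen in training (the string "7" is not equal to 7).
        "Occupation": int(form["Occupation"]),
        "City_Category": form["City_Category"],
        "Stay_In_Current_City_Years": int(STAY_CODES.get(stay, stay)),
        "Marital_Status": int(form["Marital_Status"]),
        "Product_Category_1": int(form["Product_Category_1"]),
    }
    df = pd.DataFrame([row])
    # Same object-dtype treatment as in the notebook's preprocessing cell.
    for col in ["Marital_Status", "Occupation", "Product_Category_1"]:
        df[col] = df[col].astype("object")
    return df


# Hypothetical submission matching the first row of the test data.
sample_form = {"Gender": "M", "Age": "46-50", "Occupation": "7", "City_Category": "B",
               "Stay_In_Current_City_Years": "2", "Marital_Status": "1",
               "Product_Category_1": "1"}
print(form_to_row(sample_form))
```

Keeping this conversion in one function makes it easier to confirm that the web app and the notebook apply identical preprocessing.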


Create the project structure, as shown below:
<img src="flask.PNG" width= "1000">

#### Deployment to AWS Elastic Beanstalk


The web UI, where the user enters the feature values:

Application URL: http://blackfridaypurchaseprediction-env.eba-qxmu82rp.us-east-1.elasticbeanstalk.com/

<img src="Inputs.PNG" width= "1000">

Prediction for the inputs above:

<img src="Output.PNG" width= "1000">