<a href="https://colab.research.google.com/github/psar0006/Reluv/blob/main/Reluv_End_to_End_Price_Estimator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Preface
This notebook demonstrates how to clean and preprocess data so that it is ready to use as training data. It then shows how to build a model and how to add more data to it, and finally how to export the model so that it can be deployed on a website.

# Setup
Note: The first part of the setup shows how to load files into this notebook, so make sure the files are placed in the right directory (folder) and that you know their file paths.

In [1]:
# First, mount Google Drive

from google.colab import drive
drive.mount('/content/gdrive')

# Mounting lets the notebook read files that are stored on the user's Google Drive

Mounted at /content/gdrive


In [2]:
# Install the ML library and its dependencies

!pip install Jinja2==3.1.2
!pip install pycaret 

# Note: After pycaret is installed, a "Restart runtime" button appears; press it

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting Jinja2==3.1.2
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Installing collected packages: Jinja2
  Attempting uninstall: Jinja2
    Found existing installation: Jinja2 2.11.3
    Uninstalling Jinja2-2.11.3:
      Successfully uninstalled Jinja2-2.11.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
flask 1.1.4 requires Jinja2<3.0,>=2.10.1, but you have jinja2 3.1.2 which is incompatible.
Successfully installed Jinja2-3.1.2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting jinja2<3.2,>=2.11.1
  Using cached Jinja2-2.11.3-py2.py3-none-any.whl (125 kB)
Installing collected packages: jinja2
  Attempting uninstall: jinja2
    Found existing installation: Jinja2 3.1.2
    Uninstalli

In [3]:
# Next, import the essential libraries

import pandas as pd # Reads in the data and supports data manipulation
import numpy as np # Adds support for large, multi-dimensional arrays and matrices, along with high-level mathematical functions that operate on them
import matplotlib.pyplot as plt # Used to make plots
import seaborn as sns # Used to make statistical plots

In [4]:
# Read the file

df = pd.read_excel('gdrive/Shared drives/Reluv/Products-Export-ALL-PRODUCTS.xlsx', sheet_name=0)


#Note: This notebook uses only this file to create a model. The original model I developed incorporated other datasets.

# Cleaning/Pre-processing

In [5]:
#Check column names
for column_headers in df.columns: 
    print(column_headers)

Title
Content
Short Description
SKU
Categories
Regular Price
Product Type
Product Brand
Product Colour
Product Condition
Product Fabric
Product Length
Product Occasion
Product Size
Product Sleeve Length
Parent Product ID
Stock Status
Stock
Image Filename
Unnamed: 19
Product Visibility
ID
Product categories
Unnamed: 23
Unnamed: 24


In [6]:
#Dropping unnecessary columns

df = df.drop(columns=['Content', 'Short Description', 'SKU', 'Product Type', 'Product Colour', 'Parent Product ID', 'Stock Status', 'Image Filename', 
                      'Unnamed: 19', 'Product Visibility', 'Unnamed: 23', 'Unnamed: 24','Product Fabric', 'Product Length', 'Product Size', 'Product Sleeve Length', 
                      'ID', 'Title', 'Stock', 'Product categories'])

#These columns are dropped simply because they are not used in the model I have created
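
An equivalent way to end up with the same columns is to list what to keep rather than what to drop, which tends to hold up better if a future export gains extra `Unnamed:` columns. Here is a minimal sketch on a made-up three-row frame (the values are invented for illustration and are not from the Reluv export):

```python
import pandas as pd

# Toy frame standing in for the export; all values are invented.
raw = pd.DataFrame({
    "Title": ["Blouse", "Jeans", "Dress"],
    "Categories": ["Tops", "Jeans", "Dresses"],
    "Regular Price": [25.0, 30.0, 40.0],
    "Product Brand": ["Zara", "Lee", "Seed"],
    "Product Condition": ["Like New", "Gently Used", "Like New"],
    "Product Occasion": ["Casual", "Casual", "Work"],
    "Unnamed: 19": [None, None, None],
})

# Columns the model uses (same five as in the notebook).
keep = ["Categories", "Regular Price", "Product Brand",
        "Product Condition", "Product Occasion"]
kept = raw[keep]

# Dropping every other column gives the same frame.
dropped = raw.drop(columns=[c for c in raw.columns if c not in keep])
print(kept.equals(dropped))
```

Selecting by name also fails loudly with a `KeyError` if an expected column is missing, which can be a useful early warning.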

In [7]:
#Rename columns
df = df.rename(columns={"Regular Price": "Price", "Product Brand":"Brand", "Product Condition":"Condition", "Product Occasion":"Occasion", "Categories":"Category"})

In [8]:
#Recoding the Condition column: 'New Without Tags' is treated as 'Like New' and the stray value 'Long' is set to missing
df['Condition'] = df['Condition'].replace({'New Without Tags':'Like New', 'Long':np.nan})

In [9]:
#Checking for null values

df.isnull().sum()

#Checking these counts is important because they show whether any data is missing

Category       2
Price          9
Brand        242
Condition     26
Occasion     332
dtype: int64

In [10]:
#Drop null values
df = df.dropna()

#This is done so that the training data has no null values
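
Note that `dropna()` with no arguments removes a row if any one of its columns is missing, so with the counts above up to 611 rows (2 + 9 + 242 + 26 + 332) could be lost, fewer if several nulls fall on the same row. A small sketch on invented data shows how to measure the loss before committing to it:

```python
import numpy as np
import pandas as pd

# Invented data: three of the four rows have exactly one missing value.
toy = pd.DataFrame({
    "Category": ["Tops", None, "Jeans", "Dresses"],
    "Price": [25.0, 15.0, np.nan, 40.0],
    "Brand": ["Zara", "Lee", "Seed", None],
})

null_counts = toy.isnull().sum()   # missing values per column
rows_before = len(toy)
clean = toy.dropna()               # drops a row if ANY column is missing
rows_lost = rows_before - len(clean)
print(null_counts.to_dict(), rows_lost)
```

If losing that many rows is a concern, `dropna(subset=[...])` restricts the check to chosen columns.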

In [11]:
#Merging Category Names
df['Category'] = df['Category'].replace({'Tops|Plus':'Tops +', 'Handbag':'Handbags', 'Shorts And Skirts':'Shorts', '#REF!':'Dresses', 'Dress':'Dresses',
                                                         'Dresses|Plus':'Dresses +', 'Pants|Clothing':'Pants', 'Jackets - Premium|Clothing':'Jackets - Premium', 'Dresses|Clothing':'Dresses', 
                                                         'Beach wear|Clothing':'Beach Wear', 'Belts|Clothing':'Belts', 'Jumpers|Clothing':'Jumpers', 'Activewear|Clothing':'Activewear', 
                                                         'Crossbody bags|Handbags':'Crossbody Bags', 'Shoulder bags|Handbags':'Shoulder Bags', 'Clutch|Handbags':'Clutch', 
                                                         'Dresses|Premium':'Dresses - Premium', 'Jeans|Premium':'Jeans - Premium', 'Tops|Premium':'Tops - Premium'})

df['Category'] = df['Category'].replace({'Tops +':'Tops', 'Pants +':'Pants', 'Dresses +':'Dresses', 'Jumpers +':'Jumpers', 'PlusTops +':'Tops', 'Jeans +':'Jeans', 'Skirts +':'Skirts'
, 'Jackets - Premium':'Jackets', 'Tops - Premium':'Tops', 'Crossbody bags':'Bags', 'Shoulder bags':'Bags', 'Jackets +':'Jackets', 'Tote bags':'Bags', 'Clutch':'Bags', 'Clutch ':'Bags',
'Dresses - Premium':'Dresses', 'Coats and Jackets +':'Outerwear', 'Shorts +':'Shorts', 'Jeans - Premium':'Jeans', 'Bucket bags':'Bags', 'Satchels':'Bags', 'Backpacks':'Bags', 'Messenger bags':'Bags',
'outerwear':'Outerwear', 'tops':'Tops'})
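
Because `replace` with a dictionary matches whole values exactly, variants such as `'Clutch '` (trailing space) need their own entry unless the text is normalised first. The sketch below uses a much shorter, hypothetical mapping than the one above to show the idea:

```python
import pandas as pd

cats = pd.Series(["Tops|Plus", "Clutch ", "tops", "Dresses|Clothing", "Jeans"])

# Hypothetical mapping, far shorter than the notebook's.
mapping = {
    "Tops|Plus": "Tops",
    "Clutch": "Bags",
    "tops": "Tops",
    "Dresses|Clothing": "Dresses",
}

# Strip stray whitespace first so 'Clutch ' and 'Clutch' share one entry.
merged = cats.str.strip().replace(mapping)
print(merged.tolist())
```

Values that are not in the mapping (here `'Jeans'`) pass through unchanged.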


In [12]:
# Removing brands that will no longer be listed

df = df[~df['Brand'].isin(['Angel Biba', 'Best and Less', 'Big W', 'Boohoo', 'Cotton On', 'Crossroads', 'Dotti', 'Factorie', 'Harris Scarfe', 'Katies', 'K Mart', 'Millers', 'Nasty Gal',
                              'Rivers', 'Rockmans', 'Seduce', 'Shein', 'Sunny Girl', 'Supre', 'Suzanne Grae', 'Target', 'Temt'])]
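
The filter works by building a boolean mask with `isin` and negating it with `~`. One thing to keep in mind is that the match is exact and case-sensitive, as this small sketch on invented rows illustrates:

```python
import pandas as pd

items = pd.DataFrame({
    "Brand": ["Zara", "Shein", "Trenery", "Target", "shein"],
    "Price": [20.0, 5.0, 25.0, 8.0, 6.0],
})
excluded = ["Shein", "Target"]  # a small subset of the notebook's list

# ~ negates the mask, keeping rows whose brand is NOT in the excluded list.
kept = items[~items["Brand"].isin(excluded)]
print(kept["Brand"].tolist())
```

The lowercase `'shein'` row survives the filter, so it may be worth cleaning brand spellings before excluding brands rather than after.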

In [21]:
#Fixing Brand Names and Occasion Names
df_2 = df.copy()
df_2['Brand'] = df_2['Brand'].replace({'Banna Republic':'Banana Republic'})
df_2['Brand'] = df_2['Brand'].replace({'& other stories':'& Other Stories', 'Atmos&Here':'Atmos & Here'})
df_2['Brand'] = df_2['Brand'].replace('&','and', regex = True)
df_2['Brand'] = df_2['Brand'].replace({'Maxand;Co':'Max and Co', 'Mand;S Collectio blouse':'M and S Collection', 'Honeyand;Beau':'Honey and Beau', 'Atmosand;here':'Atmos and Here',
                                               'Milkand;Honey':'Milk and Honey', 'Hand;M':'H and M'})
df_2['Brand'] = df_2['Brand'].replace('and;','and', regex = True)
df_2['Brand'] = df_2['Brand'].replace("Don'T Ask Amanda","Don't Ask Amanda")
df_2['Brand'] = df_2['Brand'].replace({"SASS and BIDE":"Sass and Bide","and Other Stories":"And Other Stories","and Standard":"And Standard", 'Jbrand': 'J Brand'})
df_2['Occasion'] = df_2['Occasion'].replace({'Cocktail &party':'Cocktail and Party'})
df_2['Occasion'] = df_2['Occasion'].replace({'Cocktail &amp;party':'Cocktail and Party'})
df_2['Brand'] = df_2['Brand'].replace({'grace and co':'Grace and Co', 'Grace and CO':'Grace and Co', 'ts':'TS', 'Addidas':'Adidas'})
df_2['Brand'] = df_2['Brand'].replace('andamp;',' and ', regex = True)
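# Title-case both columns. Note: str.title() also turns 'TS' into 'Ts' and "Don't" into "Don'T"
df_2['Brand'] = df_2['Brand'].str.title()
df_2['Occasion'] = df_2['Occasion'].str.title()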
df_2


#Retain Brands with more than 10 items of clothing listed
df_main = df_2.groupby('Brand').filter(lambda x : len(x)>10)
df_main

,Category,Price,Brand,Condition,Occasion
0,Tops,25.0,Ts,Like New,Smart Casual
1,Tops,15.0,Ts,Like New,Smart Casual
2,Tops,17.5,Ts,Like New,Smart Casual
3,Pants,25.0,Ts,Gently Used,Smart Casual
4,Tops,10.0,Ts,Like New,Smart Casual
...,...,...,...,...,...
6630,Jumpers,20.0,Anthea Crawford,Like New,Cocktail And Party
6633,Outerwear,30.0,Anthea Crawford,Like New,Work
6634,Tops,25.0,Trenery,Like New,Casual
6636,Skirts,30.0,Alannah Hill,Like New,Cocktail And Party
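
Two behaviours in the cell above are worth seeing in isolation: `str.title()` capitalises the first letter of every word and lowercases the rest (which is why `TS` shows up as `Ts` in the output), and `groupby(...).filter(...)` keeps or drops whole groups. A short sketch on invented values:

```python
import pandas as pd

# str.title() capitalises after every non-letter, including apostrophes.
brands = pd.Series(["TS", "Don't Ask Amanda", "sass and bide"])
titled = brands.str.title()
print(titled.tolist())

# filter() keeps every row of a group when the function returns True for it.
toy = pd.DataFrame({"Brand": ["A", "A", "A", "B"], "Price": [1, 2, 3, 4]})
frequent = toy.groupby("Brand").filter(lambda g: len(g) > 2)
print(frequent["Brand"].unique().tolist())
```

If acronyms or apostrophes matter for matching brands later, applying specific corrections after the title-casing step is one option.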


# EDA (Exploratory Data Analysis)

This section helps us understand the data better by looking for underlying patterns and relationships between variables.

In [None]:
#Importing plotly for interactive plots
!pip install plotly
import plotly.express as px
import plotly.figure_factory as ff

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
#Plotting Price Distribution
Price = df_main['Price']
Price_list = Price.values.tolist()
hist_data = [Price_list]
group_labels = ['Price']
fig = ff.create_distplot(hist_data, group_labels)
fig.show()

In [None]:
#Plotting the average price of each brand
avg_price = df_main.groupby('Brand')['Price'].mean().reset_index()
fig = px.bar(avg_price, x='Brand', y='Price', color='Brand', text_auto='.2s', labels={'Price':'Average Price $'}, height=400)
fig.show()
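
An average on its own hides how many listings it is based on. As a possible extension (not part of the original analysis), the count can be computed alongside the mean with named aggregation; the data below is invented:

```python
import pandas as pd

toy = pd.DataFrame({
    "Brand": ["Zara", "Zara", "Zara", "Trenery"],
    "Price": [20.0, 30.0, 40.0, 25.0],
})

# One row per brand with both the mean price and the number of items.
summary = (toy.groupby("Brand")
              .agg(mean_price=("Price", "mean"), n_items=("Price", "size"))
              .reset_index())
print(summary)
```

The `n_items` column could then be passed to `px.bar` as hover data, so that brands with few listings are easy to spot.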

In [None]:
#Plotting the average price of each type of clothing
avg_price_category = df_main.groupby('Category')['Price'].mean().reset_index()
fig1 = px.bar(avg_price_category, x='Category', y='Price', color='Category', text_auto='.2s', labels={'Price':'Average Price $'}, height=400)
fig1.show()

In [None]:
#Plotting the average price of each type of occasion
avg_price_occasion = df_main.groupby('Occasion')['Price'].mean().reset_index()
fig2 = px.bar(avg_price_occasion, x='Occasion', y='Price', color='Occasion', text_auto='.2s', labels={'Price':'Average Price $'}, height=400)
fig2.show()

In [None]:
#Plotting the average price of each type of condition
avg_price_condition = df_main.groupby('Condition')['Price'].mean().reset_index()
fig3 = px.bar(avg_price_condition, x='Condition', y='Price', color='Condition', text_auto='.2s', labels={'Price':'Average Price $'}, height=400)
fig3.show()

# Machine Learning/Building Model

This section has two parts. The first part demonstrates how to build, optimise and save a machine learning model using PyCaret (an automated ML library). The second part looks at how to add more data and re-train the model.

## Part 1

In [13]:
#Import PyCaret, an automated ML library
from pycaret.regression import *

In [22]:
#Split the dataset into training and test sets
df_train = df_main.sample(frac=0.8, random_state=786) # Splits the data 80% train and 20% test
df_test = df_main.drop(df_train.index)

df_train['Price']=np.sqrt((df_train['Price'])) # Note: The price in both the training and test sets is square-root transformed
df_test['Price']=np.sqrt((df_test['Price']))

df_train.reset_index(drop=True, inplace=True)
df_test.reset_index(drop=True, inplace=True)

print('Data for Modeling: ' + str(df_train.shape))
print('Unseen Data For Predictions ' + str(df_test.shape))

Data for Modeling: (2521, 5)
Unseen Data For Predictions (630, 5)
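
The split above relies on the row index: `sample` picks 80% of the rows and `drop(df_train.index)` removes those index labels to leave the rest. This only gives a clean partition when the index labels are unique. A minimal sketch on an invented ten-row frame:

```python
import pandas as pd

toy = pd.DataFrame({"Price": [4.0, 9.0, 16.0, 25.0, 36.0,
                              49.0, 64.0, 81.0, 100.0, 121.0]})

train = toy.sample(frac=0.8, random_state=786)  # 8 of the 10 rows
test = toy.drop(train.index)                    # the remaining 2 rows

# The two sets share no index labels and together cover every row.
print(len(train), len(test), set(train.index).isdisjoint(test.index))
```

Checking `toy.index.is_unique` before splitting is a cheap safeguard when the frame was built by concatenating several files.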


In [23]:
#Initialises the experiment and does further pre-processing of the data
session_1 = setup(df_train, target = 'Price', session_id=1, log_experiment=False, experiment_name='Cases_1')

,Description,Value
0,session_id,1
1,Target,Price
2,Original Data,"(2521, 5)"
3,Missing Values,False
4,Numeric Features,0
5,Categorical Features,4
6,Ordinal Features,False
7,High Cardinality Features,False
8,High Cardinality Method,
9,Transformed Train Set,"(1764, 120)"


INFO:logs:create_model_container: 0
INFO:logs:master_model_container: 0
INFO:logs:display_container: 1
INFO:logs:Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=True, features_todrop=[],
                                      id_columns=[], ml_usecase='regression',
                                      numerical_features=[], target='Price',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_numerical=None,
                                numeric_strategy='...
                ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                ('cluster_all', 'passthrough'),
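
The "Transformed Train Set" shape of (1764, 120) can be read in two steps: 1764 is roughly 70% of the 2521 rows, which matches PyCaret's default `train_size` of 0.7, and the growth to 120 columns is consistent with the four categorical features being one-hot encoded. The sketch below uses `pd.get_dummies` on invented rows to illustrate one-hot encoding in general; it is not PyCaret's internal implementation:

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["Tops", "Pants", "Tops"],
    "Brand": ["Zara", "Trenery", "Zara"],
    "Condition": ["Like New", "Gently Used", "Like New"],
})

# Each distinct value becomes its own indicator column.
encoded = pd.get_dummies(toy)
print(encoded.shape)
print(list(encoded.columns))
```

Three columns with two distinct values each become six indicator columns, so the column count grows with the number of distinct brands, categories and so on.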
              

In [16]:
# Instantiating additional scikit-learn models (note: they are not passed to compare_models() in the next cell)
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import KNeighborsRegressor

svr_model = SVR()
NN_model = MLPRegressor()
KNN = KNeighborsRegressor()

In [24]:
#Compares all models and outlines which are the best models
best_model = compare_models()

,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)
br,Bayesian Ridge,0.5486,0.6057,0.7718,0.5915,0.1257,0.1125,0.039
ridge,Ridge Regression,0.549,0.6158,0.7775,0.5884,0.1256,0.1124,0.02
huber,Huber Regressor,0.5368,0.6123,0.7764,0.5868,0.1259,0.1096,0.195
rf,Random Forest Regressor,0.5589,0.6844,0.8179,0.5532,0.1332,0.1149,1.535
gbr,Gradient Boosting Regressor,0.563,0.6804,0.8169,0.5516,0.1325,0.116,0.303
omp,Orthogonal Matching Pursuit,0.6233,0.78,0.8777,0.4731,0.1431,0.1287,0.022
dt,Decision Tree Regressor,0.6074,0.8527,0.916,0.4447,0.1473,0.1246,0.037
et,Extra Trees Regressor,0.6073,0.8576,0.9196,0.4362,0.1471,0.1247,1.617
knn,K Neighbors Regressor,0.688,1.0947,1.0254,0.3142,0.1573,0.1395,0.084
lightgbm,Light Gradient Boosting Machine,0.6702,1.2938,1.1069,0.2515,0.1566,0.1331,0.144


INFO:logs:create_model_container: 17
INFO:logs:master_model_container: 17
INFO:logs:display_container: 2
INFO:logs:BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
              compute_score=False, copy_X=True, fit_intercept=True,
              lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=300,
              normalize=False, tol=0.001, verbose=False)
INFO:logs:compare_models() succesfully completed......................................
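
Because the target was square-root transformed before `setup`, the errors in the table above are in square-root-of-dollar units rather than dollars. For readers who want to see what the columns mean, here is a plain-Python sketch of MAE, MSE, RMSE, R2 and MAPE on three invented values (RMSLE is left out for brevity; `regression_metrics` is a hypothetical helper, not a PyCaret function):

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}

metrics = regression_metrics([4.0, 5.0, 6.0], [4.5, 5.0, 5.5])
print(metrics)
```

In the PyCaret table each figure is an average over the cross-validation folds, which is why the reported RMSE is not exactly the square root of the reported MSE.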


In [25]:
#Create Bayesian Ridge Model
br  = create_model('br')

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.6098,0.6994,0.8363,0.7687,0.1287,0.1197
1,0.6154,0.9446,0.9719,0.6331,0.1824,0.1182
2,0.5543,0.5518,0.7428,0.202,0.1274,0.1237
3,0.5113,0.4764,0.6902,0.7432,0.1106,0.1026
4,0.5432,0.7473,0.8645,0.709,0.1238,0.1083
5,0.5607,0.6101,0.7811,0.532,0.1189,0.1127
6,0.5879,0.6914,0.8315,0.6342,0.1253,0.1153
7,0.5014,0.3963,0.6295,0.763,0.1069,0.1049
8,0.4963,0.475,0.6892,0.3857,0.1132,0.105
9,0.5057,0.4643,0.6814,0.544,0.1195,0.115


INFO:logs:create_model_container: 18
INFO:logs:master_model_container: 18
INFO:logs:display_container: 3
INFO:logs:BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, alpha_init=None,
              compute_score=False, copy_X=True, fit_intercept=True,
              lambda_1=1e-06, lambda_2=1e-06, lambda_init=None, n_iter=300,
              normalize=False, tol=0.001, verbose=False)
INFO:logs:create_model() succesfully completed......................................


In [26]:
# Create Ridge Model
ridge = create_model('ridge')

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.6141,0.718,0.8473,0.7626,0.1276,0.1193
1,0.6181,0.9813,0.9906,0.6188,0.1828,0.1183
2,0.551,0.5451,0.7383,0.2116,0.1271,0.1232
3,0.5138,0.4874,0.6982,0.7372,0.1101,0.1023
4,0.5495,0.7915,0.8897,0.6918,0.1246,0.1083
5,0.5617,0.6182,0.7863,0.5258,0.1194,0.1131
6,0.5824,0.6859,0.8282,0.6371,0.1254,0.1149
7,0.4997,0.3944,0.628,0.7642,0.1069,0.1047
8,0.4927,0.4686,0.6845,0.394,0.1124,0.1044
9,0.5069,0.467,0.6834,0.5414,0.1201,0.1155


INFO:logs:create_model_container: 19
INFO:logs:master_model_container: 19
INFO:logs:display_container: 4
INFO:logs:Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=1, solver='auto', tol=0.001)
INFO:logs:create_model() succesfully completed......................................


In [27]:
# Creates Huber Model 
huber = create_model('huber')

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.5682,0.6502,0.8064,0.785,0.1255,0.1126
1,0.5998,0.9528,0.9761,0.6299,0.1839,0.1149
2,0.5429,0.5542,0.7444,0.1986,0.1268,0.1208
3,0.5196,0.5111,0.7149,0.7245,0.1133,0.103
4,0.5189,0.7499,0.866,0.708,0.1225,0.1031
5,0.5433,0.6309,0.7943,0.5161,0.1193,0.108
6,0.581,0.7222,0.8498,0.6179,0.127,0.113
7,0.4982,0.4147,0.644,0.752,0.1082,0.1036
8,0.4862,0.4622,0.6799,0.4022,0.1117,0.1022
9,0.5098,0.4743,0.6887,0.5343,0.1206,0.1151


INFO:logs:create_model_container: 20
INFO:logs:master_model_container: 20
INFO:logs:display_container: 5
INFO:logs:HuberRegressor(alpha=0.0001, epsilon=1.35, fit_intercept=True, max_iter=100,
               tol=1e-05, warm_start=False)
INFO:logs:create_model() succesfully completed......................................


In [28]:
#Optimising Bayesian Ridge Model
tuned_br = tune_model(br)

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.6112,0.7033,0.8386,0.7675,0.1291,0.1201
1,0.6145,0.94,0.9695,0.6349,0.1823,0.1181
2,0.5541,0.5504,0.7419,0.2041,0.1271,0.1235
3,0.5121,0.4781,0.6915,0.7423,0.111,0.1029
4,0.5407,0.731,0.855,0.7153,0.1213,0.1071
5,0.5573,0.6057,0.7783,0.5354,0.118,0.1116
6,0.5873,0.6936,0.8328,0.633,0.1252,0.115
7,0.502,0.397,0.6301,0.7626,0.1068,0.1047
8,0.497,0.4767,0.6904,0.3835,0.1135,0.1051
9,0.5104,0.4708,0.6861,0.5377,0.121,0.1162


INFO:logs:create_model_container: 21
INFO:logs:master_model_container: 21
INFO:logs:display_container: 6
INFO:logs:BayesianRidge(alpha_1=0.001, alpha_2=0.05, alpha_init=None, compute_score=True,
              copy_X=True, fit_intercept=False, lambda_1=0.15, lambda_2=0.1,
              lambda_init=None, n_iter=300, normalize=False, tol=0.001,
              verbose=False)
INFO:logs:tune_model() succesfully completed......................................


In [29]:
#Optimising Ridge Model
tuned_ridge = tune_model(ridge)

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.6124,0.7034,0.8387,0.7674,0.1281,0.1197
1,0.6148,0.955,0.9773,0.6291,0.1823,0.1178
2,0.552,0.5451,0.7383,0.2117,0.1267,0.1232
3,0.5143,0.4826,0.6947,0.7399,0.1107,0.1028
4,0.5454,0.753,0.8678,0.7068,0.1217,0.1072
5,0.5575,0.6104,0.7813,0.5318,0.1182,0.1115
6,0.5833,0.6892,0.8302,0.6354,0.1251,0.1145
7,0.4973,0.3933,0.6271,0.7648,0.1066,0.1041
8,0.4954,0.4736,0.6882,0.3875,0.1131,0.1049
9,0.511,0.4714,0.6866,0.5371,0.1212,0.1164


INFO:logs:create_model_container: 22
INFO:logs:master_model_container: 22
INFO:logs:display_container: 7
INFO:logs:Ridge(alpha=0.59, copy_X=True, fit_intercept=False, max_iter=None,
      normalize=False, random_state=1, solver='auto', tol=0.001)
INFO:logs:tune_model() succesfully completed......................................


In [30]:
#Optimising Huber Model
tuned_huber = tune_model(huber)

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.5788,0.6578,0.8111,0.7825,0.1263,0.1146
1,0.5977,0.9288,0.9638,0.6392,0.1823,0.1148
2,0.5497,0.5614,0.7493,0.1881,0.1274,0.1225
3,0.5192,0.5018,0.7084,0.7295,0.1129,0.1037
4,0.5167,0.7159,0.8461,0.7212,0.1197,0.1026
5,0.547,0.6154,0.7845,0.528,0.118,0.1088
6,0.5758,0.6882,0.8296,0.6359,0.1246,0.1124
7,0.5027,0.4086,0.6392,0.7556,0.1077,0.1046
8,0.4845,0.4568,0.6758,0.4092,0.1112,0.1021
9,0.5104,0.4723,0.6872,0.5362,0.1209,0.1154


INFO:logs:create_model_container: 23
INFO:logs:master_model_container: 23
INFO:logs:display_container: 8
INFO:logs:HuberRegressor(alpha=0.0001, epsilon=1.9, fit_intercept=False, max_iter=100,
               tol=1e-05, warm_start=False)
INFO:logs:tune_model() succesfully completed......................................


In [31]:
# Evaluates the optimised BR model, providing residuals, prediction error, etc.
evaluate_model(tuned_br)

INFO:logs:Initializing evaluate_model()
INFO:logs:evaluate_model(estimator=BayesianRidge(alpha_1=0.001, alpha_2=0.05, alpha_init=None, compute_score=True,
              copy_X=True, fit_intercept=False, lambda_1=0.15, lambda_2=0.1,
              lambda_init=None, n_iter=300, normalize=False, tol=0.001,
              verbose=False), fold=None, fit_kwargs=None, plot_kwargs=None, feature_name=None, groups=None, use_train_data=False)


interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…

In [32]:
# Evaluates the optimised Ridge model, providing residuals, prediction error, etc.
evaluate_model(tuned_ridge)

INFO:logs:Initializing evaluate_model()
INFO:logs:evaluate_model(estimator=Ridge(alpha=0.59, copy_X=True, fit_intercept=False, max_iter=None,
      normalize=False, random_state=1, solver='auto', tol=0.001), fold=None, fit_kwargs=None, plot_kwargs=None, feature_name=None, groups=None, use_train_data=False)


interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…

In [33]:
# Evaluates the optimised Huber model, providing residuals, prediction error, etc.
evaluate_model(tuned_huber)

INFO:logs:Initializing evaluate_model()
INFO:logs:evaluate_model(estimator=HuberRegressor(alpha=0.0001, epsilon=1.9, fit_intercept=False, max_iter=100,
               tol=1e-05, warm_start=False), fold=None, fit_kwargs=None, plot_kwargs=None, feature_name=None, groups=None, use_train_data=False)


interactive(children=(ToggleButtons(description='Plot Type:', icons=('',), options=(('Hyperparameters', 'param…

In [34]:
#This creates a voting regressor, which is the model used for the price predictor
blend = blend_models(estimator_list = [tuned_br, tuned_ridge, tuned_huber])

Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,0.6,0.6785,0.8237,0.7756,0.1275,0.118
1,0.6088,0.937,0.968,0.6361,0.1821,0.1168
2,0.5515,0.5503,0.7418,0.2042,0.1269,0.123
3,0.5146,0.482,0.6943,0.7402,0.1111,0.103
4,0.534,0.7292,0.8539,0.716,0.1207,0.1056
5,0.5511,0.6075,0.7794,0.534,0.1178,0.1101
6,0.5815,0.6861,0.8283,0.637,0.1247,0.1138
7,0.5001,0.394,0.6277,0.7644,0.1066,0.1044
8,0.4903,0.4664,0.6829,0.3968,0.1123,0.1037
9,0.5094,0.4692,0.685,0.5392,0.1208,0.1158


INFO:logs:create_model_container: 24
INFO:logs:master_model_container: 24
INFO:logs:display_container: 9
INFO:logs:VotingRegressor(estimators=[('br',
                             BayesianRidge(alpha_1=0.001, alpha_2=0.05,
                                           alpha_init=None, compute_score=True,
                                           copy_X=True, fit_intercept=False,
                                           lambda_1=0.15, lambda_2=0.1,
                                           lambda_init=None, n_iter=300,
                                           normalize=False, tol=0.001,
                                           verbose=False)),
                            ('ridge',
                             Ridge(alpha=0.59, copy_X=True, fit_intercept=False,
                                   max_iter=None, normalize=False,
                                   random_state=1, solver='auto', tol=0.001)),
                            ('huber',
                             HuberRegresso
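
The log shows that `blend_models` produced a scikit-learn `VotingRegressor`. With no weights given, a voting regressor predicts the plain average of its members' predictions. The sketch below checks this directly with scikit-learn on synthetic data; the estimators use default settings, not the tuned hyperparameters from this notebook:

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import BayesianRidge, HuberRegressor, Ridge

# Synthetic regression data (not the Reluv dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

voter = VotingRegressor(estimators=[
    ("br", BayesianRidge()),
    ("ridge", Ridge()),
    ("huber", HuberRegressor()),
])
voter.fit(X, y)

# Average the fitted members' predictions by hand and compare.
manual = np.mean([est.predict(X) for est in voter.estimators_], axis=0)
print(np.allclose(voter.predict(X), manual))
```

Averaging tends to smooth out the individual models' errors, which fits the fold scores above sitting between those of the three tuned models.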

In [35]:
# Reports metrics showing how the blended model performs on the unseen test set
pred = predict_model(blend, data=df_test)



,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Voting Regressor,0.4959,0.4552,0.6747,0.7297,0.114,0.1071


In [36]:
# Shows the actual price vs the estimated price
pred['Price']=np.square((pred['Price'])) # Note: Squaring undoes the square-root transformation applied to the training and test prices
pred['Label']=np.square((pred['Label']))
pred

,Category,Price,Brand,Condition,Occasion,Label
0,Pants,25.0,Ts,Gently Used,Smart Casual,23.495079
1,Tops,15.0,Ts,Like New,Smart Casual,20.661192
2,Tops,15.0,Ts,Like New,Smart Casual,20.661192
3,Jumpers,20.0,Zara,Like New,Smart Casual,20.529598
4,Jeans,25.0,Bettina Liano,Gently Used,Casual,21.779738
...,...,...,...,...,...,...
625,Tops,20.0,Trenery,Like New,Casual,25.734209
626,Tops,40.0,Lululemon,Like New,Casual,22.528701
627,Tops,10.0,Decjuba,Like New,Casual,19.794023
628,Jumpers,20.0,Anthea Crawford,Like New,Cocktail And Party,55.502893
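
The test-set MAE of 0.4959 reported earlier is on the square-root scale, so it cannot be read as dollars. Once prices and predictions are squared back, a dollar-scale error can be computed. The sketch below uses only the first five rows shown in the table above, so the result is illustrative rather than representative of the whole test set:

```python
import math

# First five rows of the table above (actual price, predicted label).
actual = [25.0, 15.0, 15.0, 20.0, 25.0]
predicted = [23.495079, 20.661192, 20.661192, 20.529598, 21.779738]

# Round trip: the model is trained on sqrt(price); squaring recovers dollars.
sqrt_scale = [math.sqrt(p) for p in actual]
restored = [s ** 2 for s in sqrt_scale]

# Mean absolute error in dollars for these five rows.
mae_dollars = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(round(mae_dollars, 2))
```

Applying the same calculation to the full `pred` frame would give a dollar-scale figure that is easier to explain to non-technical stakeholders.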


In [37]:
#Finalises the model (re-fits it on the full dataset passed to setup, including the hold-out portion)
final_blend = finalize_model(blend)


In [38]:
#Saves the model pipeline so that it is ready to be used for deployment
save_model(final_blend,'Base_Model_Pipeline')


Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[], ml_usecase='regression',
                                       numerical_features=[], target='Price',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_strategy='...
                                                             lambda_init=None,
                                                             n_iter=300,
                                                             normalize=False,
                                                             tol=0.001,
           

In [39]:
# Saves the model so that it can be re-trained later
import pickle
filename = 'Base_Model'
with open(filename, 'wb') as f: # The with-block closes the file once the model is written
    pickle.dump(final_blend, f)
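
To pick the model up again later, the pickled file is read back with `pickle.load`. A minimal round-trip sketch, using a small dictionary as a stand-in for `final_blend` and a temporary directory so that no real file is overwritten:

```python
import os
import pickle
import tempfile

# Stand-in object; in the notebook this would be final_blend.
model_stub = {"name": "Base_Model", "weights": [0.1, 0.2, 0.3]}

path = os.path.join(tempfile.mkdtemp(), "Base_Model")

# Write the object, closing the file automatically.
with open(path, "wb") as fh:
    pickle.dump(model_stub, fh)

# Read it back; the restored object equals the original.
with open(path, "rb") as fh:
    restored = pickle.load(fh)

print(restored == model_stub)
```

Pickle files should only be loaded from trusted sources, and loading a real model generally requires compatible library versions to be installed.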

## Part 2

### Pre-Processing

In [40]:
# Load the newly provided data
stock_database_April = pd.read_excel('gdrive/My Drive/Reluv Project/Stock Database April 22.xlsx', sheet_name=0)
stock_database_May = pd.read_excel('gdrive/My Drive/Reluv Project/Stock Database May 22.xlsx', sheet_name=0)
stock_database_July = pd.read_excel('gdrive/My Drive/Reluv Project/Stock Database July 22.xlsx', sheet_name=0)
stock_database_April_Shoes = pd.read_excel('gdrive/My Drive/Reluv Project/Stock Database_SHOES_ APRIL-22.xlsx', sheet_name=0)
# Merges the April, May and July databases (the shoes database is loaded above but not included here)
stock_database = pd.concat([stock_database_April, stock_database_May, stock_database_July])
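
By default `pd.concat` keeps each input's original index, so the combined frame can contain repeated labels. That matters for any later step that selects or drops rows by label, such as the `sample` and `drop` split used in Part 1. A sketch on invented data:

```python
import pandas as pd

april = pd.DataFrame({"Sale Price": [20.0, 30.0]})
may = pd.DataFrame({"Sale Price": [25.0]})

stacked = pd.concat([april, may])                        # index: 0, 1, 0
renumbered = pd.concat([april, may], ignore_index=True)  # index: 0, 1, 2

# Dropping label 0 removes BOTH rows that carry it.
after_drop = stacked.drop(0)
print(stacked.index.tolist(), renumbered.index.tolist(), len(after_drop))
```

Passing `ignore_index=True`, or calling `reset_index(drop=True)` afterwards, gives every row a unique label.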


In [None]:
#Check column names
for column_headers in stock_database.columns: 
    print(column_headers)

Product Code
Seller  
Bin 
Title
SECTION
Sale Price
Commission [%]
Cost 
Category
Style
Brand
Size
Colour
Condition
Length
Occasion
Bust (cm)
Length (cm)
Waist (cm)
Rise (cm)
Thigh diam. (cm)
Long Description
Season
Materials
Material (shell) 
Material (inner)
Condition.1
Tags, separate by comma
Tags
Concatenate Tags (automated don’t touch)
Unnamed: 30
Unnamed: 31
Unnamed: 32
Unnamed: 33
Unnamed: 34
Unnamed: 35
Unnamed: 36
Unnamed: 37
Unnamed: 38
Unnamed: 39
Unnamed: 40
Unnamed: 41
Unnamed: 42
Unnamed: 43
Unnamed: 44
Unnamed: 45
Unnamed: 46
Unnamed: 47
Unnamed: 48
Unnamed: 49
Unnamed: 50
Unnamed: 51
Unnamed: 52
Unnamed: 53
Images
Unnamed: 55
Unnamed: 56


In [41]:
# Keep only the columns that the model uses
stock_database = stock_database[['Sale Price', 'Category', 'Condition', 'Occasion', 'Brand']]

In [42]:
# Check Null values of new data
stock_database.isnull().sum()

Sale Price    854
Category      853
Condition     855
Occasion      856
Brand         892
dtype: int64

In [43]:
#Drop null values
stock_database = stock_database.dropna()

#This is done so that the training data has no null values
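
The null counts for the five columns are very close to each other (853 to 892), which suggests, though the output alone does not prove it, that most of them come from entirely blank spreadsheet rows rather than scattered gaps. Counting nulls per row is one way to check; the frame below is invented:

```python
import numpy as np
import pandas as pd

sheet = pd.DataFrame({
    "Sale Price": [20.0, np.nan, np.nan, 30.0],
    "Category": ["Tops", None, None, "Jeans"],
    "Brand": ["Zara", None, None, None],
})

nulls_per_row = sheet.isnull().sum(axis=1)
n_cols = sheet.shape[1]

fully_empty = int((nulls_per_row == n_cols).sum())        # blank rows
partially_missing = int(((nulls_per_row > 0) &
                         (nulls_per_row < n_cols)).sum())  # real gaps
print(fully_empty, partially_missing)
```

If most rows turn out to be fully empty, dropping them loses no information, and only the partially missing rows need a decision.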

In [44]:
#Rename columns
stock_database = stock_database.rename(columns={"Sale Price": "Price"})
# Merge df_2 and stock_database
merged_df = pd.concat([df_2, stock_database]) # Note: df_2 was established earlier

In [45]:
merged_df['Occasion'].unique()


array(['Smart Casual', 'Casual', 'Summer Dress', 'Cocktail And Party',
       'Little Black Dresses', 'Work', 'Formal', 'Casual|Work', 'Mid',
       'Short', 'Long', 'Cocktail &party', 'work', 'casual',
       'Summer Dress '], dtype=object)
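
The list above contains the same occasion under several spellings (`'work'` and `'Work'`, `'Summer Dress '` with a trailing space, `'Cocktail &party'`). One possible clean-up, sketched on a handful of these values, is to strip whitespace, fix known variants, then title-case:

```python
import pandas as pd

occasions = pd.Series(["work", "casual", "Summer Dress ",
                       "Cocktail &party", "Casual"])

cleaned = (occasions.str.strip()
                    .replace({"Cocktail &party": "Cocktail and Party"})
                    .str.title())
print(cleaned.tolist())
```

Values such as `'Mid'`, `'Short'` and `'Long'` look like garment lengths entered in the wrong column; how to treat them is a judgement call that this sketch does not cover.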

In [46]:
merged_df


Unnamed: 0,Category,Price,Brand,Condition,Occasion
0,Tops,25.0,Ts,Like New,Smart Casual
1,Tops,15.0,Ts,Like New,Smart Casual
2,Tops,17.5,Ts,Like New,Smart Casual
3,Pants,25.0,Ts,Gently Used,Smart Casual
4,Tops,10.0,Ts,Like New,Smart Casual
...,...,...,...,...,...
239,Dresses,25.0,ts,Like New,casual
240,Dresses,30.0,ts,Like New,Cocktail &party
241,Shorts,25.0,ts,Like New,casual
242,Skirts,25.0,virtuelle,Like New,casual


In [148]:
#Merging Category Names
merged_df['Category'] = merged_df['Category'].replace({'Tops|Plus':'Tops +', 'Handbag':'Handbags', 'Shorts And Skirts':'Shorts', '#REF!':'Dresses', 'Dress':'Dresses',
                                                         'Dresses|Plus':'Dresses +', 'Pants|Clothing':'Pants', 'Jackets - Premium|Clothing':'Jackets - Premium', 'Dresses|Clothing':'Dresses', 
                                                         'Beach wear|Clothing':'Beach Wear', 'Belts|Clothing':'Belts', 'Jumpers|Clothing':'Jumpers', 'Activewear|Clothing':'Activewear', 
                                                         'Crossbody bags|Handbags':'Crossbody Bags', 'Shoulder bags|Handbags':'Shoulder Bags', 'Clutch|Handbags':'Clutch', 
                                                         'Dresses|Premium':'Dresses - Premium', 'Jeans|Premium':'Jeans - Premium', 'Tops|Premium':'Tops - Premium'})

merged_df['Category'] = merged_df['Category'].replace({'Tops +':'Tops', 'Pants +':'Pants', 'Dresses +':'Dresses', 'Jumpers +':'Jumpers', 'PlusTops +':'Tops', 'Jeans +':'Jeans', 'Skirts +':'Skirts'
, 'Jackets - Premium':'Jackets', 'Tops - Premium':'Tops', 'Crossbody bags':'Bags', 'Shoulder bags':'Bags', 'Jackets +':'Jackets', 'Tote bags':'Bags', 'Clutch':'Bags', 'Clutch ':'Bags',
'Dresses - Premium':'Dresses', 'Coats and Jackets +':'Outerwear', 'Shorts +':'Shorts', 'Jeans - Premium':'Jeans', 'Bucket bags':'Bags', 'Satchels':'Bags', 'Backpacks':'Bags', 'Messenger bags':'Bags',
'outerwear':'Outerwear', 'tops':'Tops', 'Shorts and skirts':'Shorts', 'Crossbody Bags':'Bags', 'Handbags':'Bags'})


merged_df['Occasion'] = merged_df['Occasion'].replace({'casual':'Casual', 'work':'Work', 'Mid':'Casual', 'Long':'Casual', 'Casual|Work':'Smart Casual', 'Cocktail &party':'Cocktail And Party', 
                                                       'Short':'Casual', 'Summer Dress ':'Summer Dress'}) 
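
Many of the entries in the mappings above differ only in capitalisation or trailing whitespace. A possible way to shorten such mappings is to normalise the strings first and map only the genuine renames afterwards. This is a sketch on toy values modelled on the `Occasion` variants shown earlier, not the notebook's own method.

```python
import pandas as pd

# Toy values modelled on the variants seen in the 'Occasion' column
occasion = pd.Series(['casual', 'Summer Dress ', 'work', 'Cocktail &party', 'Casual'])

# Strip stray whitespace and unify capitalisation first ...
normalised = occasion.str.strip().str.title()

# ... so the explicit mapping only has to cover genuine renames
normalised = normalised.replace({'Cocktail &Party': 'Cocktail And Party'})

print(sorted(normalised.unique()))
```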

In [149]:
# Remove brands that are no longer accepted for listing

merged_df = merged_df[~merged_df['Brand'].isin(['Angel Biba', 'Best and Less', 'Big W', 'Boohoo', 'Cotton On', 'Crossroads', 'Dotti', 'Factorie', 'Harris Scarfe', 'Katies', 'K Mart', 'Millers', 'Nasty Gal',
                              'Rivers', 'Rockmans', 'Seduce', 'Shein', 'Sunny Girl', 'Supre', 'Suzanne Grae', 'Target', 'Temt'])]

In [150]:
merged_df['Brand'] = merged_df['Brand'].replace({'Banna Republic':'Banana Republic'})
merged_df['Brand'] = merged_df['Brand'].replace({'& other stories':'& Other Stories', 'Atmos&Here':'Atmos & Here'})
merged_df['Brand'] = merged_df['Brand'].replace('&','and', regex = True)
merged_df['Brand'] = merged_df['Brand'].replace({'Maxand;Co':'Max and Co', 'Mand;S Collectio blouse':'M and S Collection', 'Honeyand;Beau':'Honey and Beau', 'Atmosand;here':'Atmos and Here',
                                               'Milkand;Honey':'Milk and Honey', 'Hand;M':'H and M'})
merged_df['Brand'] = merged_df['Brand'].replace('and;','and', regex = True)
merged_df['Brand'] = merged_df['Brand'].replace("Don'T Ask Amanda","Don't Ask Amanda")
merged_df['Brand'] = merged_df['Brand'].replace({"SASS and BIDE":"Sass and Bide","and Other Stories":"And Other Stories","and Standard":"And Standard", 'Jbrand': 'J Brand'})
merged_df['Occasion'] = merged_df['Occasion'].replace({'Cocktail &party':'Cocktail and Party'})
merged_df['Occasion'] = merged_df['Occasion'].replace({'Cocktail &amp;party':'Cocktail and Party'})
merged_df['Brand'] = merged_df['Brand'].replace({'grace and co':'Grace and Co', 'Grace and CO':'Grace and Co', 'ts':'TS', 'Addidas':'Adidas'})
merged_df['Brand'] = merged_df['Brand'].replace('andamp;',' and ', regex = True)
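# Normalise capitalisation last. Note: str.title() also re-capitalises values fixed above
# (for example 'TS' becomes 'Ts' and "Don't" becomes "Don'T").
merged_df['Brand'] = merged_df['Brand'].str.title()
merged_df['Occasion'] = merged_df['Occasion'].str.title()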
merged_df
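
Several of the replacements above appear to work around HTML entities such as `&amp;` in the raw brand names. As an alternative sketch, the standard-library `html.unescape` can decode the entity before `&` is spelled out. The helper name `clean_brand` and the sample inputs are assumptions for illustration only.

```python
import html

def clean_brand(raw):
    """Decode HTML entities, spell out '&' and title-case a brand name (illustrative helper)."""
    text = html.unescape(raw)              # 'Max&amp;Co' -> 'Max&Co'
    text = text.replace('&', ' and ')      # 'Max&Co' -> 'Max and Co'
    text = ' '.join(text.split())          # collapse repeated whitespace
    return text.title()

cleaned = [clean_brand(b) for b in ['Max&amp;Co', 'Atmos&Here', 'grace & co']]
print(cleaned)
```

With pandas this could be applied as `merged_df['Brand'].map(clean_brand)`, assuming every value in the column is a string.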


#Retain Brands with more than 10 items of clothing listed
df_main_merged = merged_df.groupby('Brand').filter(lambda x : len(x)>10)
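
The `groupby(...).filter(...)` call keeps every row belonging to a group for which the function returns `True`. A small sketch on made-up data, using a threshold of 2 instead of 10 so the effect is visible:

```python
import pandas as pd

# Illustrative data only
toy = pd.DataFrame({
    'Brand': ['Zara'] * 3 + ['Ts'] * 2 + ['Virtuelle'],
    'Price': [20.0, 25.0, 30.0, 15.0, 10.0, 25.0],
})

# Keep only brands with more than 2 rows (the notebook uses a threshold of 10)
kept = toy.groupby('Brand').filter(lambda g: len(g) > 2)
print(kept['Brand'].unique())
```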

### Re-training Model

In [151]:
# Load the previously saved model (filename is assumed to be defined in an earlier cell)
import pickle
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

In [152]:
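# Split the new merged dataset into training and test sets
# Note: merged_df was built with pd.concat without ignore_index, so index labels may repeat;
# drop() below removes every row that shares a sampled label. Calling
# df_main_merged.reset_index(drop=True) before sampling would avoid losing those rows.
df_train_new = df_main_merged.sample(frac=0.8, random_state=786) # Splits the data 80% train and 20% test
df_test_new = df_main_merged.drop(df_train_new.index)

df_train_new['Price']=np.sqrt((df_train_new['Price'])) # Note: the price in both the training and test sets is square-root transformed
df_test_new['Price']=np.sqrt((df_test_new['Price']))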

df_train_new.reset_index(drop=True, inplace=True)
df_test_new.reset_index(drop=True, inplace=True)

print('Data for Modeling: ' + str(df_train_new.shape))
print('Unseen Data For Predictions ' + str(df_test_new.shape))

Data for Modeling: (3126, 5)
Unseen Data For Predictions (675, 5)
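
Because `merged_df` was built with `pd.concat` without `ignore_index=True`, its index labels can repeat, and `drop(df_train_new.index)` then removes every row sharing a sampled label, not only the sampled rows. This may explain why the test set above has 675 rows rather than roughly a quarter of 3126. A minimal sketch of the effect on toy frames (all names and values here are illustrative):

```python
import pandas as pd

a = pd.DataFrame({'Price': [10.0, 20.0, 30.0]})
b = pd.DataFrame({'Price': [40.0, 50.0, 60.0]})

# Without ignore_index the labels are 0, 1, 2, 0, 1, 2
merged = pd.concat([a, b])
train = merged.sample(frac=0.5, random_state=786)
test = merged.drop(train.index)       # drops every row sharing a sampled label
lost = len(merged) - len(train) - len(test)
print('Rows lost with duplicate labels:', lost)

# With ignore_index=True the labels are 0..5 and the split is exhaustive
fixed = pd.concat([a, b], ignore_index=True)
train_f = fixed.sample(frac=0.5, random_state=786)
test_f = fixed.drop(train_f.index)
print('Rows lost with unique labels:', len(fixed) - len(train_f) - len(test_f))
```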


In [153]:
session_2 = setup(df_train_new, target = 'Price', session_id=2, log_experiment=False, experiment_name='Cases_1')

Unnamed: 0,Description,Value
0,session_id,2
1,Target,Price
2,Original Data,"(3126, 5)"
3,Missing Values,False
4,Numeric Features,0
5,Categorical Features,4
6,Ordinal Features,False
7,High Cardinality Features,False
8,High Cardinality Method,
9,Transformed Train Set,"(2188, 138)"


INFO:logs:create_model_container: 0
INFO:logs:master_model_container: 0
INFO:logs:display_container: 1
INFO:logs:Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=True, features_todrop=[],
                                      id_columns=[], ml_usecase='regression',
                                      numerical_features=[], target='Price',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_numerical=None,
                                numeric_strategy='...
                ('scaling', 'passthrough'), ('P_transform', 'passthrough'),
                ('binn', 'passthrough'), ('rem_outliers', 'passthrough'),
                ('cluster_all', 'passthrough'),
              

In [154]:
loaded_model


VotingRegressor(estimators=[('br',
                             BayesianRidge(alpha_1=0.001, alpha_2=0.05,
                                           alpha_init=None, compute_score=True,
                                           copy_X=True, fit_intercept=False,
                                           lambda_1=0.15, lambda_2=0.1,
                                           lambda_init=None, n_iter=300,
                                           normalize=False, tol=0.001,
                                           verbose=False)),
                            ('ridge',
                             Ridge(alpha=0.59, copy_X=True, fit_intercept=False,
                                   max_iter=None, normalize=False,
                                   random_state=1, solver='auto', tol=0.001)),
                            ('huber',
                             HuberRegressor(alpha=0.0001, epsilon=1.9,
                                            fit_intercept=False, max_iter=100,
       

In [155]:
new_trained = create_model(loaded_model)

Unnamed: 0_level_0,MAE,MSE,RMSE,R2,RMSLE,MAPE
Fold,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,0.5491,0.643,0.8019,0.7638,0.1152,0.107
1,0.5971,0.8416,0.9174,0.5671,0.1296,0.1232
2,0.5861,0.7468,0.8642,0.6419,0.1364,0.1196
3,0.5632,0.8205,0.9058,0.4285,0.1339,0.1086
4,0.5195,0.5117,0.7154,0.6989,0.1179,0.1095
5,0.4762,0.3995,0.632,0.7455,0.1059,0.1001
6,0.5075,0.4625,0.6801,0.6904,0.115,0.1063
7,0.4685,0.4152,0.6444,0.6992,0.1067,0.0976
8,0.6154,1.0693,1.0341,0.4254,0.1417,0.1238
9,0.5323,0.4912,0.7009,0.5499,0.1173,0.1158



In [156]:
# See predictions of re-trained model against new test dataset
predictions = predict_model(new_trained, data=df_test_new)

Unnamed: 0,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,Voting Regressor,0.5137,0.4813,0.6937,0.5928,0.1136,0.1075


In [157]:
predictions

Unnamed: 0,Category,Price,Brand,Condition,Occasion,Label
0,Jumpers,4.472136,Zara,Like New,Smart Casual,4.728873
1,Tops,3.162278,Just Jeans,Like New,Smart Casual,3.892606
2,Tops,3.535534,Sportsgirl,Gently Used,Casual,3.413125
3,Dresses,3.872983,Zara,Gently Used,Summer Dress,4.534515
4,Pants,5.196152,Zara,New with Tags,Work,4.952230
...,...,...,...,...,...,...
670,Jeans,5.477226,Country Road,Like New,Casual,5.107291
671,Pants,5.000000,Blue Illusion,Like New,Casual,5.414536
672,Pants,6.324555,Vigorella,Like New,Casual,5.515553
673,Dresses,6.324555,Vigorella,Like New,Casual,5.613494
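
The `Price` and `Label` columns above are on the square-root scale used for training, so both need squaring before they read as dollars. A minimal sketch using the first three rows of the table above (the variable names are illustrative):

```python
import numpy as np

# Square-root-scale values copied from the first three rows of the table above
sqrt_prices = np.array([4.472136, 3.162278, 3.535534])
sqrt_labels = np.array([4.728873, 3.892606, 3.413125])

dollar_prices = np.square(sqrt_prices)   # actual prices in dollars
dollar_labels = np.square(sqrt_labels)   # predicted prices in dollars

abs_error = np.abs(dollar_prices - dollar_labels)
print(np.round(dollar_prices, 2))
print(np.round(abs_error, 2))
```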


In [158]:
# Finalize the re-trained model (refits it on the full dataset, including the hold-out set)
final_retrain = finalize_model(new_trained)

In [159]:
# Save the model pipeline so that it is ready for deployment
save_model(final_retrain,'Retrain_Model_Pipeline')

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[], ml_usecase='regression',
                                       numerical_features=[], target='Price',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_strategy='...
                                                             lambda_init=None,
                                                             n_iter=300,
                                                             normalize=False,
                                                             tol=0.001,
           

In [160]:
# Save the model so that it can be re-trained later
import pickle
with open('Retrain_Model', 'wb') as f:
    pickle.dump(final_retrain, f)
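
Opening the file in a `with` block closes it even if pickling fails. The following sketch shows a save and reload round trip; `model_stub` is a placeholder dictionary standing in for a fitted model, and the temporary directory is only there to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Placeholder object standing in for a fitted model
model_stub = {'name': 'placeholder', 'weights': [0.1, 0.2]}

path = os.path.join(tempfile.mkdtemp(), 'Retrain_Model')

# Write the object to disk
with open(path, 'wb') as f:
    pickle.dump(model_stub, f)

# Read it back
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == model_stub)
```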

# Web-Development

In [161]:
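# Mean absolute error in dollars: square both columns to undo the square-root transform before comparing
preds = predictions.copy()
preds['Diff'] = abs(np.square(preds['Price']) - np.square(preds['Label']))
preds['Diff'].mean()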

5.231893431226705

In [162]:
# Export database
df_main_merged.to_csv('sample_data.csv')

In [163]:
df_main.to_csv('main_data.csv')

In [164]:
!pip install -q streamlit

In [165]:
%%writefile reluv.py
from pycaret.regression import load_model, predict_model
import streamlit as st
import pickle
import numpy as np
import pandas as pd
from PIL import Image

def payout_percentage(x):
    """Return the payout percentage for a listed price x (in dollars)."""
    if x <= 20:
        return x
    elif x <= 50:
        return round(float(x / (0.95 + 0.024 * (x - 20))), 1)
    elif x <= 80:
        return round(float(x / (1.61 + 0.013 * (x - 50))), 1)
    elif x <= 100:
        return round(float(x / (1.95 + 0.0025 * (x - 100))), 1)
    else:
        return 60


database = pd.read_csv('/content/sample_data.csv')
model = load_model('Retrain_Model_Pipeline')
d = {'Reluv Listed Price': ['$0.00-$20.00', '$20.01-$50.00', '$50.01-$80.00', '$80.01-$100', '$100+'], 'Payout': ['5-20%', '21-30%', '31-40%', '41-50%', '60%']}
table = pd.DataFrame(data=d)

def predict(model, input_df):
    # Run the saved pipeline on a one-row DataFrame and return the prediction (square-root scale)
    predictions_df = predict_model(estimator=model, data=input_df)
    predictions = predictions_df['Label'][0]
    return predictions

image = Image.open('/content/Reluv-logo1-resized.png')
url = "https://reluv.com.au/learn-more/brands-we-do-not-accept/"

st.image(image, use_column_width =False)
st.header("Reluv Payout Estimator: ")
st.markdown("Type in or select from dropdowns and click the estimate button")
st.markdown("Note: There are a number of brands we cannot resell. Please check out the list of brands we currently do not accept [here](%s)" % url)
#Now we will take user input one by one as per our dataframe
#Brand
Brand = st.selectbox('Brand', database['Brand'].sort_values().unique())
#Type of clothing
Category = st.selectbox("Category", database['Category'].sort_values().unique())
#Condition
condition = st.selectbox("Condition", database['Condition'].sort_values().unique())
#Occasion
occasion = st.selectbox("Occasion", database['Occasion'].sort_values().unique())
#Prediction
st.markdown("Disclaimer: The amount displayed is an estimate only. Please be aware that the item will be priced after physical inspection and at Reluv's discretion.")
user_inputs = {'Brand': Brand, 'Category': Category,
            'Condition': condition, 'Occasion': occasion
            }
user_inputs_df = pd.DataFrame([user_inputs])

hide_table_row_index = """
            <style>
            thead tr th:first-child {display:none}
            tbody th {display:none}
            </style>
            """

# Inject CSS with Markdown
st.markdown(hide_table_row_index, unsafe_allow_html=True)

# Display a static table
st.table(table)


if st.button('Estimate Price'):
    prediction = predict(model, user_inputs_df)
    true_val = np.square(prediction)  # undo the square-root transform to get dollars
    lower_val = round(true_val - 2.5)
    upper_val = round(true_val + 2.5)
    payout_per = payout_percentage(true_val)
    upper_payout_val = str(round(float((payout_per / 100) * upper_val), 2))
    lower_payout_val = str(round(float((payout_per / 100) * lower_val), 2))
    upper_money = str(upper_val)
    lower_money = str(lower_val)
    # "\\$" escapes the dollar sign so Streamlit does not treat it as a LaTeX delimiter
    st.subheader("Estimated Price Range: " + "\\$" + lower_money + " - " + "\\$" + upper_money)
    st.subheader("Approximate payout: " + "\\$" + lower_payout_val + ' - ' + "\\$" + upper_payout_val)

Overwriting reluv.py
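
To sanity-check the piecewise payout formula against the payout table in the app, it can be evaluated at the band boundaries. The sketch below repeats the formula from `reluv.py` so that it runs on its own; the chosen test prices are arbitrary.

```python
def payout_percentage(x):
    # Same piecewise formula as in reluv.py
    if x <= 20:
        return x
    elif x <= 50:
        return round(x / (0.95 + 0.024 * (x - 20)), 1)
    elif x <= 80:
        return round(x / (1.61 + 0.013 * (x - 50)), 1)
    elif x <= 100:
        return round(x / (1.95 + 0.0025 * (x - 100)), 1)
    else:
        return 60

# Print price, payout percentage and payout amount at each band boundary
for price in [10, 20, 50, 80, 100, 150]:
    pct = payout_percentage(price)
    print(price, pct, round(price * pct / 100, 2))
```

At a price of exactly 100 the formula gives about 51.3, marginally above the 50% upper bound listed for that band in the table.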


In [166]:
!streamlit run reluv.py & npx localtunnel --port 8501

2022-10-14 06:14:33.486 INFO    numexpr.utils: NumExpr defaulting to 2 threads.
npx: installed 22 in 2.966s
your url is: https://honest-dancers-lie-34-133-7-100.loca.lt

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.2:8501
  External URL: http://34.133.7.100:8501

  Stopping...
^C


# Deploying Web App

In [167]:
%%writefile setup.sh
mkdir -p ~/.streamlit/
echo "\
[server]\n\
headless = true\n\
port = $PORT\n\
enableCORS = false\n\
\n\
" > ~/.streamlit/config.toml



In [169]:
%%writefile Procfile
web: sh setup.sh && streamlit run reluv.py