# 12302 ; Harshvardhan Joshi

# Problem Statement: 
Your client is a meal delivery company which operates in multiple cities. They have various fulfillment centers in these cities for dispatching meal orders to their customers. The client wants you to help these centers with demand forecasting for upcoming weeks so that the centers can plan their stock of raw materials accordingly.

The replenishment of the majority of raw materials is done on a weekly basis, and since the raw materials are perishable, procurement planning is of utmost importance. Secondly, staffing of the centers is another area where accurate demand forecasts are really helpful. Given the following information, the task is to predict the demand for the next 10 weeks (Weeks: 146-155) for the center-meal combinations in the test set:

- Historical data of demand for a product-center combination (Weeks: 1 to 145)
- Product (meal) features such as category, sub-category, current price and discount
- Information for fulfillment centers, such as center area and city information

In [48]:
import pandas as pd 
import numpy as np 

import os 
import matplotlib.pyplot as plt 
# %matplotlib notebook
%matplotlib widget

import seaborn as sns 
from sklearn.model_selection import train_test_split 
from sklearn.feature_selection import VarianceThreshold
import warnings
warnings.filterwarnings("ignore")
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import tree
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
from math import sqrt


# About the Dataset :

**Weekly Demand data (train.csv): Contains the historical demand data for all centers, test.csv contains all the following features except the target variable.**

| Variable | Definition |
| --- | --- |
| id | Unique ID |
| week | Week No |
| center_id | Unique ID for fulfillment center |
| meal_id | Unique ID for Meal |
| checkout_price | Final price including discount, taxes & delivery charges |
| base_price | Base price of the meal |
| emailer_for_promotion | Emailer sent for promotion of meal |
| homepage_featured | Meal featured at homepage |
| num_orders | (Target) Orders Count |
   

**fulfilment_center_info.csv: Contains information for each fulfilment center**
 

| Variable | Definition |
| --- | --- |
| center_id | Unique ID for fulfillment center |
| city_code | Unique code for city |
| region_code | Unique code for region |
| center_type | Anonymized center type |
| op_area | Area of operation (in km^2) |
 

**meal_info.csv: Contains information for each meal being served**
 

| Variable | Definition |
| --- | --- |
| meal_id | Unique ID for the meal |
| category | Type of meal (beverages/snacks/soups….) |
| cuisine | Meal cuisine (Indian/Italian/…) |
 

In [49]:
train = pd.read_csv('../EDA_food_data/train.csv')
test = pd.read_csv('../EDA_food_data/test_QoiMO9B.csv')
meal_info = pd.read_csv('../EDA_food_data/meal_info.csv')
fulfilment_center_info = pd.read_csv('../EDA_food_data/fulfilment_center_info.csv')
print(train.shape)
print(test.shape)
print(meal_info.shape)
print(fulfilment_center_info.shape)


(456548, 9)
(32573, 8)
(51, 3)
(77, 5)


# Printing the datasets individually 


In [50]:
train.head(5)

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders
0,1379560,1,55,1885,136.83,152.29,0,0,177
1,1466964,1,55,1993,136.83,135.83,0,0,270
2,1346989,1,55,2539,134.86,135.86,0,0,189
3,1338232,1,55,2139,339.5,437.53,0,0,54
4,1448490,1,55,2631,243.5,242.5,0,0,40


In [51]:
(test.head(5))


Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured
0,1028232,146,55,1885,158.11,159.11,0,0
1,1127204,146,55,1993,160.11,159.11,0,0
2,1212707,146,55,2539,157.14,159.14,0,0
3,1082698,146,55,2631,162.02,162.02,0,0
4,1400926,146,55,1248,163.93,163.93,0,0


In [52]:
(meal_info.head(5))


Unnamed: 0,meal_id,category,cuisine
0,1885,Beverages,Thai
1,1993,Beverages,Thai
2,2539,Beverages,Thai
3,1248,Beverages,Indian
4,2631,Beverages,Indian


In [53]:
fulfilment_center_info.head(5)

Unnamed: 0,center_id,city_code,region_code,center_type,op_area
0,11,679,56,TYPE_A,3.7
1,13,590,56,TYPE_B,6.7
2,124,590,56,TYPE_C,4.0
3,66,648,34,TYPE_A,4.1
4,94,632,34,TYPE_C,3.6


# Merging train and test with the fulfilment_center_info and meal_info tables

In [54]:
train = pd.merge(train,fulfilment_center_info, on='center_id')
test =  pd.merge(test,fulfilment_center_info, on='center_id')

train = pd.merge(train,meal_info, on='meal_id')
test =  pd.merge(test,meal_info, on='meal_id')
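
The joins above rely on each `center_id` and `meal_id` appearing exactly once in its lookup table. A minimal sketch of the same joins with that assumption checked explicitly is shown here; the toy frames and their values are made up for illustration. `validate='many_to_one'` asks pandas to raise if the lookup side has duplicate keys, and `how='left'` keeps every demand row even when a key has no match.

```python
import pandas as pd

# Toy stand-ins for train.csv and the two lookup tables (values are illustrative).
demand = pd.DataFrame({
    "id": [1, 2, 3],
    "center_id": [55, 55, 11],
    "meal_id": [1885, 1993, 1885],
    "num_orders": [177, 270, 54],
})
centers = pd.DataFrame({"center_id": [55, 11], "center_type": ["TYPE_C", "TYPE_A"]})
meals = pd.DataFrame({"meal_id": [1885, 1993], "cuisine": ["Thai", "Thai"]})

# Left joins keep every demand row; validate raises MergeError on duplicate lookup keys.
merged = demand.merge(centers, on="center_id", how="left", validate="many_to_one")
merged = merged.merge(meals, on="meal_id", how="left", validate="many_to_one")

# A many-to-one left join should leave the row count unchanged.
print(merged.shape)
```

In this notebook the training set has 456548 rows both before and after merging, which suggests that every key found a match.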

In [55]:
train.columns

Index(['id', 'week', 'center_id', 'meal_id', 'checkout_price', 'base_price',
       'emailer_for_promotion', 'homepage_featured', 'num_orders', 'city_code',
       'region_code', 'center_type', 'op_area', 'category', 'cuisine'],
      dtype='object')

In [56]:
test.columns

Index(['id', 'week', 'center_id', 'meal_id', 'checkout_price', 'base_price',
       'emailer_for_promotion', 'homepage_featured', 'city_code',
       'region_code', 'center_type', 'op_area', 'category', 'cuisine'],
      dtype='object')

In [57]:
train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 456548 entries, 0 to 456547
Data columns (total 15 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   id                     456548 non-null  int64  
 1   week                   456548 non-null  int64  
 2   center_id              456548 non-null  int64  
 3   meal_id                456548 non-null  int64  
 4   checkout_price         456548 non-null  float64
 5   base_price             456548 non-null  float64
 6   emailer_for_promotion  456548 non-null  int64  
 7   homepage_featured      456548 non-null  int64  
 8   num_orders             456548 non-null  int64  
 9   city_code              456548 non-null  int64  
 10  region_code            456548 non-null  int64  
 11  center_type            456548 non-null  object 
 12  op_area                456548 non-null  float64
 13  category               456548 non-null  object 
 14  cuisine                456548 non-nu

In [58]:
train.describe()

Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders,city_code,region_code,op_area
count,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0,456548.0
mean,1250096.0,74.768771,82.105796,2024.337458,332.238933,354.156627,0.081152,0.1092,261.87276,601.553399,56.614566,4.08359
std,144354.8,41.524956,45.975046,547.42092,152.939723,160.715914,0.273069,0.31189,395.922798,66.195914,17.641306,1.091686
min,1000000.0,1.0,10.0,1062.0,2.97,55.35,0.0,0.0,13.0,456.0,23.0,0.9
25%,1124999.0,39.0,43.0,1558.0,228.95,243.5,0.0,0.0,54.0,553.0,34.0,3.6
50%,1250184.0,76.0,76.0,1993.0,296.82,310.46,0.0,0.0,136.0,596.0,56.0,4.0
75%,1375140.0,111.0,110.0,2539.0,445.23,458.87,0.0,0.0,324.0,651.0,77.0,4.5
max,1499999.0,145.0,186.0,2956.0,866.27,866.27,1.0,1.0,24299.0,713.0,93.0,7.0


In [59]:
print(train.shape)
train.head(5)


(456548, 15)


Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders,city_code,region_code,center_type,op_area,category,cuisine
0,1379560,1,55,1885,136.83,152.29,0,0,177,647,56,TYPE_C,2.0,Beverages,Thai
1,1018704,2,55,1885,135.83,152.29,0,0,323,647,56,TYPE_C,2.0,Beverages,Thai
2,1196273,3,55,1885,132.92,133.92,0,0,96,647,56,TYPE_C,2.0,Beverages,Thai
3,1116527,4,55,1885,135.86,134.86,0,0,163,647,56,TYPE_C,2.0,Beverages,Thai
4,1343872,5,55,1885,146.5,147.5,0,0,215,647,56,TYPE_C,2.0,Beverages,Thai


In [60]:
print(test.shape)
test.head(5)

(32573, 14)


Unnamed: 0,id,week,center_id,meal_id,checkout_price,base_price,emailer_for_promotion,homepage_featured,city_code,region_code,center_type,op_area,category,cuisine
0,1028232,146,55,1885,158.11,159.11,0,0,647,56,TYPE_C,2.0,Beverages,Thai
1,1262649,147,55,1885,159.11,159.11,0,0,647,56,TYPE_C,2.0,Beverages,Thai
2,1453211,149,55,1885,157.14,158.14,0,0,647,56,TYPE_C,2.0,Beverages,Thai
3,1262599,150,55,1885,159.14,157.14,0,0,647,56,TYPE_C,2.0,Beverages,Thai
4,1495848,151,55,1885,160.11,159.11,0,0,647,56,TYPE_C,2.0,Beverages,Thai


In [61]:
# Another way of checking for columns that contain null values
[col for col in train.columns if train[col].isnull().sum() > 0]

# The empty list shows that no column contains a null value

[]

In [62]:
train = train.drop(['center_id', 'meal_id'], axis=1)
train.head()

Unnamed: 0,id,week,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders,city_code,region_code,center_type,op_area,category,cuisine
0,1379560,1,136.83,152.29,0,0,177,647,56,TYPE_C,2.0,Beverages,Thai
1,1018704,2,135.83,152.29,0,0,323,647,56,TYPE_C,2.0,Beverages,Thai
2,1196273,3,132.92,133.92,0,0,96,647,56,TYPE_C,2.0,Beverages,Thai
3,1116527,4,135.86,134.86,0,0,163,647,56,TYPE_C,2.0,Beverages,Thai
4,1343872,5,146.5,147.5,0,0,215,647,56,TYPE_C,2.0,Beverages,Thai


In [63]:
# x = VarianceThreshold(threshold=0)
# x.fit(train)

In [64]:
# sum(x.get_support()) # no constant coloumns in the dataset 

In [65]:
train.isnull().sum()
# There are no null values in any column, so no null imputation is needed

id                       0
week                     0
checkout_price           0
base_price               0
emailer_for_promotion    0
homepage_featured        0
num_orders               0
city_code                0
region_code              0
center_type              0
op_area                  0
category                 0
cuisine                  0
dtype: int64

In [66]:
train.nunique()

# Number of unique values in each column
# (the dataset has 456,548 rows in total)

id                       456548
week                        145
checkout_price             1992
base_price                 1907
emailer_for_promotion         2
homepage_featured             2
num_orders                 1250
city_code                    51
region_code                   8
center_type                   3
op_area                      30
category                     14
cuisine                       4
dtype: int64

# Label Encoding on Categorical features

The dataset has 3 categorical features: 
    1. center type
    2. category
    3. cuisine
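
As a rough illustration of what label encoding does, the sketch below fits a `LabelEncoder` on a few made-up center types. The encoder sorts the distinct labels and maps them to 0, 1, 2, and calling `transform` on the already fitted encoder reuses that mapping for new rows.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up sample of the center_type column.
train_types = ["TYPE_C", "TYPE_A", "TYPE_B", "TYPE_C"]
new_types = ["TYPE_B", "TYPE_A"]

encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_types)

# classes_ holds the sorted labels; the position of a label is its integer code.
print(encoder.classes_.tolist())  # ['TYPE_A', 'TYPE_B', 'TYPE_C']
print(train_codes.tolist())       # [2, 0, 1, 2]

# Reusing the fitted encoder keeps the mapping identical for rows seen later.
new_codes = encoder.transform(new_types)
print(new_codes.tolist())         # [1, 0]
```

The integer codes carry no real ordering, which matters more for linear and distance-based models than for tree-based ones.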

In [67]:
from sklearn.preprocessing import LabelEncoder

In [68]:
l1 = LabelEncoder()
train['center_type'] = l1.fit_transform(train['center_type'])

l2 = LabelEncoder()
train['category'] = l2.fit_transform(train['category'])

l3 = LabelEncoder()
train['cuisine'] = l3.fit_transform(train['cuisine'])


In [69]:
train.head()

Unnamed: 0,id,week,checkout_price,base_price,emailer_for_promotion,homepage_featured,num_orders,city_code,region_code,center_type,op_area,category,cuisine
0,1379560,1,136.83,152.29,0,0,177,647,56,2,2.0,0,3
1,1018704,2,135.83,152.29,0,0,323,647,56,2,2.0,0,3
2,1196273,3,132.92,133.92,0,0,96,647,56,2,2.0,0,3
3,1116527,4,135.86,134.86,0,0,163,647,56,2,2.0,0,3
4,1343872,5,146.5,147.5,0,0,215,647,56,2,2.0,0,3


# Data Visualisation 

In [70]:
# plt.figure(figsize=None)

# sns.barplot(train['emailer_for_promotion'])
# plt.show()
plt.figure(figsize=None)

train.emailer_for_promotion.value_counts(normalize=True).plot(kind='bar',title='emailer_for_promotion')
plt.show()

# Most rows have emailer_for_promotion = 0, so this column could be considered for dropping
# (it is still kept as a feature in the models below)

In [71]:
plt.figure(figsize=None)

train.homepage_featured.value_counts(normalize=True).plot(kind='bar',title='homepage_featured')
plt.show()

# The same holds for homepage_featured: most rows have the value 0

In [72]:
plt.figure(figsize=None)

train.center_type.value_counts(normalize=True).plot(kind='bar',title='center_type')
plt.show()

# TYPE_B and TYPE_C each make up a significant share of the rows, so center_type is kept as a feature

In [73]:
plt.figure(figsize=None)

sns.lineplot(x=train['week'],y=train['num_orders'])
plt.show()

# Orders rose sharply in week 5 (just after week 4) and again in week 48,
# and dropped sharply in week 62.

In [74]:
plt.figure(figsize=(16,9))
sns.heatmap(train.corr(),annot = True);
plt.xticks();



In [75]:
plt.figure(figsize=(10,5))

sns.lineplot(x=train["checkout_price"],y=train["num_orders"])
plt.show()

# Impact of checkout price on the number of orders

In [76]:
plt.figure(figsize=(11,6))

sns.barplot(x=train["category"],y=train["num_orders"])
plt.show()


In [77]:
plt.figure(figsize=None)

sns.barplot(x=train["cuisine"],y=train['num_orders'])
plt.show()

# Italian is the cuisine with the highest average number of orders

In [78]:
plt.figure(figsize=None)
sns.barplot(x=train["op_area"],y=train["num_orders"])
plt.show()


# Checking Outliers 

Outliers are checked for the numerical columns

In [79]:
# select_dtypes is the public way to pick the numeric columns
num_cols = train.select_dtypes(include='number').columns
print(num_cols)


Index(['id', 'week', 'checkout_price', 'base_price', 'emailer_for_promotion',
       'homepage_featured', 'num_orders', 'city_code', 'region_code',
       'center_type', 'op_area', 'category', 'cuisine'],
      dtype='object')


In [80]:
train_copy = train.copy()
for feature in num_cols:
    plt.figure(figsize=None)

    # One box plot per numeric column, titled with the column name
    train_copy.boxplot(column=feature)
    plt.title(feature)
    plt.show()


# Handling Outliers

In [81]:
upper_lim = train['checkout_price'].quantile(.95)
lower_lim = train['checkout_price'].quantile(.05)
train = train[(train['checkout_price']<upper_lim)&(train['checkout_price']>lower_lim)]

In [82]:
plt.figure(figsize=None)
train.boxplot(column=['checkout_price'])
plt.show()


In [83]:
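upper_lim = train['base_price'].quantile(.95)
# Note: quantile(0.5) is the median, so this filter also removes the lower half of the rows.
# A 5% lower bound, as used for checkout_price, would be quantile(0.05); the results
# below were produced with 0.5, so the value is left as is.
lower_lim = train['base_price'].quantile(0.5)
train = train[(train['base_price'] < upper_lim) & (train['base_price'] > lower_lim)]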
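
To see how much a quantile filter removes, the sketch below applies the same strict-inequality rule to a made-up price series. `trim_by_quantile` is a hypothetical helper, not part of this notebook. With bounds of 0.05 and 0.95 about 90% of the rows survive, whereas a lower bound of 0.5 (the median) discards at least half of them.

```python
import pandas as pd

def trim_by_quantile(df, column, lower_q, upper_q):
    """Keep rows whose value lies strictly between the two quantiles (hypothetical helper)."""
    lower = df[column].quantile(lower_q)
    upper = df[column].quantile(upper_q)
    return df[(df[column] > lower) & (df[column] < upper)]

# Made-up prices 1..100 so the quantiles are easy to reason about.
prices = pd.DataFrame({"base_price": range(1, 101)})

kept_5_95 = trim_by_quantile(prices, "base_price", 0.05, 0.95)
kept_50_95 = trim_by_quantile(prices, "base_price", 0.50, 0.95)

# Fraction of rows that survive each filter.
print(len(kept_5_95) / len(prices))
print(len(kept_50_95) / len(prices))
```

Checking the surviving row count after each filter is a cheap way to notice when a trimming step removes far more data than intended.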

In [84]:
plt.figure(figsize=None)
train.boxplot(column=['base_price'])
plt.show()


In [85]:
upper_lim = train['op_area'].quantile(0.75)
lower_lim = train['op_area'].quantile(0.25)

train = train[(train['op_area']<upper_lim)&(train['op_area']>lower_lim)]


In [86]:
plt.figure(figsize=None)
train.boxplot(column=['op_area'])
plt.show()


In [87]:
train.size

1090778

# Checking correlation :

Selecting the columns that are most correlated with the target. Since every feature is only weakly correlated with `num_orders`, I chose to keep the top 8 columns returned by the ranking (which includes `num_orders` itself).

In [88]:
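# Rank columns by Pearson correlation with num_orders, ignoring the id column.
# Note: `train` itself is left unchanged, so `id` is still present in the cells below.
correlation = train.drop(columns=['id']).corr(method='pearson')
columns = correlation.nlargest(8, 'num_orders').index
columns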

Index(['num_orders', 'emailer_for_promotion', 'category', 'homepage_featured',
       'cuisine', 'region_code', 'op_area', 'city_code'],
      dtype='object')
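
A small sketch of ranking features by their correlation with the target, on made-up data. It differs from the cell above in two deliberate ways: the target is dropped from the ranking, and absolute values are used so that strongly negative correlations also rank highly.

```python
import pandas as pd

# Made-up frame: 'promo' rises with orders, 'price' falls with orders, 'noise' is unrelated.
df = pd.DataFrame({
    "num_orders": [10, 20, 30, 40, 50, 60],
    "promo":      [0, 0, 1, 1, 1, 1],
    "price":      [60, 50, 45, 30, 20, 10],
    "noise":      [3, 1, 2, 3, 1, 2],
})

# Pearson correlation of every column with the target, excluding the target itself.
target_corr = df.corr(method="pearson")["num_orders"].drop("num_orders")

# Rank by absolute correlation so negative relationships are not pushed to the bottom.
top_features = target_corr.abs().sort_values(ascending=False).head(2).index.tolist()
print(top_features)
```

Here `price` ranks first even though its correlation is negative, which a plain `nlargest` on the signed values would miss.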

# Normalisation and Standardisation 


Normalisation rescales the values into the range [0, 1], while standardisation rescales the data to have a mean of 0 and a standard deviation of 1.
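
The two rescalings can be written out directly. This is a minimal sketch on made-up order counts using plain NumPy; scikit-learn's `MinMaxScaler` and `StandardScaler` apply the same formulas column by column.

```python
import numpy as np

# Made-up order counts.
orders = np.array([13.0, 54.0, 136.0, 324.0])

# Normalisation (min-max scaling): x' = (x - min) / (max - min), which lands in [0, 1].
normalised = (orders - orders.min()) / (orders.max() - orders.min())

# Standardisation (z-score): z = (x - mean) / std, giving mean 0 and standard deviation 1.
standardised = (orders - orders.mean()) / orders.std()

print(normalised.min(), normalised.max())
print(round(float(standardised.mean()), 6), round(float(standardised.std()), 6))
```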

I tried performing normalisation in num_orders but 

# Splitting the data into train and validation sets 

30% validation, 70% train 

In [89]:
features = train.columns.drop(['num_orders'])
trainnew = train[features]
X = trainnew.values
y = train['num_orders'].values

from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size = 0.30)

In [90]:
print(trainnew.shape)
trainnew.head()

(83906, 12)


Unnamed: 0,id,week,checkout_price,base_price,emailer_for_promotion,homepage_featured,city_code,region_code,center_type,op_area,category,cuisine
32880,1472938,1,339.5,436.53,0,0,679,56,0,3.7,0,1
32881,1088659,2,323.01,437.53,0,0,679,56,0,3.7,0,1
32882,1171601,3,339.5,436.53,0,0,679,56,0,3.7,0,1
32883,1298679,4,339.5,435.53,0,0,679,56,0,3.7,0,1
32884,1072217,5,437.53,437.53,0,0,679,56,0,3.7,0,1


In [91]:
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Linear Regression:  

In [92]:
LR = LinearRegression()
LR.fit(X_train, y_train) 
y_pred = LR.predict(X_val) 
y_pred[y_pred<0] = 0 
from sklearn import metrics 
print('RMSLE:', 100*np.sqrt(metrics.mean_squared_log_error(y_val, y_pred)))
rms = sqrt(mean_squared_error(y_val, y_pred))
print('RMSE', rms)

RMSLE: 172.36715327145578
RMSE 220.76812525879538
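
For reference, RMSLE is the square root of the mean squared difference between `log(1 + prediction)` and `log(1 + actual)`, so it penalises relative rather than absolute errors. The sketch below computes it by hand on made-up numbers; the `rmsle` function is written for illustration only, and the cells in this notebook additionally multiply the value by 100.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error; inputs must be non-negative."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

# Made-up actual and predicted order counts.
actual = [100, 10]
predicted = [200, 20]

# Both predictions are off by a factor of about two, so they contribute almost equally,
# even though the absolute errors (100 vs 10) differ by a factor of ten.
score = rmsle(actual, predicted)
print(100 * score)  # same 100x scaling as the notebook cells
```

This is why negative predictions are clipped to 0 before scoring: the logarithm of `1 + prediction` is undefined for predictions at or below -1.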


# DECISION TREE

In [93]:
DT = DecisionTreeRegressor()
DT.fit(X_train, y_train)
y_pred = DT.predict(X_val)
y_pred[y_pred<0] = 0
from sklearn import metrics
print('RMSLE:', 100*np.sqrt(metrics.mean_squared_log_error(y_val, y_pred)))
rms = sqrt(mean_squared_error(y_val, y_pred))
print('RMSE', rms)

RMSLE: 73.71526260487182
RMSE 165.67567919565815


In [94]:
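# Random forest was not run. For this regression task it would need
# RandomForestRegressor (from sklearn.ensemble) rather than RandomForestClassifier:
# RF = RandomForestRegressor()
# RF.fit(X_train, y_train)
# y_pred = RF.predict(X_val)
# y_pred[y_pred<0] = 0
# print('RMSLE:', 100*np.sqrt(metrics.mean_squared_log_error(y_val, y_pred)))
# rms = sqrt(mean_squared_error(y_val, y_pred))
# print('RMSE', rms)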

# KNN

In [95]:
KNN = KNeighborsRegressor()
KNN.fit(X_train, y_train)
y_pred = KNN.predict(X_val)
y_pred[y_pred<0] = 0
from sklearn import metrics
print('RMSLE:', 100*np.sqrt(metrics.mean_squared_log_error(y_val, y_pred)))
rms = sqrt(mean_squared_error(y_val, y_pred))
print('RMSE', rms)

RMSLE: 115.32799777023186
RMSE 266.439519657779


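# Comparison of the RMSE and RMSLE values

The decision tree performs best on our data: it has the lowest RMSLE (73.7, versus 115.3 for KNN and 172.4 for linear regression) and the lowest RMSE (165.7, versus 220.8 for linear regression and 266.4 for KNN).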
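
The three evaluation cells repeat the same steps, so the comparison can also be sketched as a single loop. The `evaluate` helper and the synthetic data below are assumptions made for illustration; they are not the notebook's data, and the printed scores will not match the ones reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_squared_log_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def evaluate(model, X_train, X_val, y_train, y_val):
    """Fit a model and return (RMSLE x 100, RMSE) on the validation split (hypothetical helper)."""
    model.fit(X_train, y_train)
    y_pred = np.clip(model.predict(X_val), 0, None)  # RMSLE needs non-negative predictions
    rmsle = 100 * np.sqrt(mean_squared_log_error(y_val, y_pred))
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    return float(rmsle), float(rmse)

# Synthetic, non-negative data standing in for the real features and num_orders.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 20 + 5 * X[:, 0] + rng.uniform(0, 5, size=300)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "KNN": KNeighborsRegressor(),
}
results = {name: evaluate(m, X_train, X_val, y_train, y_val) for name, m in models.items()}
for name, (rmsle, rmse) in results.items():
    print(f"{name}: RMSLE={rmsle:.2f} RMSE={rmse:.2f}")
```

Fixing `random_state` in the split and in the tree makes the comparison repeatable from run to run.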

# Hyperparameter Tuning on Decision Tree

In [107]:
param_grid = { "min_samples_split": [2, 4, 8, 16], "min_samples_leaf": [1, 2, 3, 4], "max_leaf_nodes": [None, 10, 20, 100] }
grid_cv_dtm = GridSearchCV(DT, param_grid, cv=5)
grid_cv_dtm.fit(X_train, y_train)

GridSearchCV(cv=5, estimator=DecisionTreeRegressor(),
             param_grid={'max_leaf_nodes': [None, 10, 20, 100],
                         'min_samples_leaf': [1, 2, 3, 4],
                         'min_samples_split': [2, 4, 8, 16]})

In [108]:
print("R-Squared::{}".format(grid_cv_dtm.best_score_))
print("Best Hyperparameters::\n{}".format(grid_cv_dtm.best_params_))

R-Squared::0.7019099261597117
Best Hyperparameters::
{'max_leaf_nodes': None, 'min_samples_leaf': 4, 'min_samples_split': 16}


In [110]:
# df = pd.DataFrame(data=grid_cv_dtm.cv_results_)
# df.head()

In [111]:
grid_cv_dtm.best_estimator_.fit(X_train, y_train)
y_pred = grid_cv_dtm.best_estimator_.predict(X_val)
y_pred[y_pred<0] = 0
from sklearn import metrics
print('RMSLE:', 100*np.sqrt(metrics.mean_squared_log_error(y_val, y_pred)))
rms = sqrt(mean_squared_error(y_val, y_pred))
print('RMSE', rms)

RMSLE: 60.911636167716225
RMSE 142.13218491363548


# Result of hyperparameter tuning: 
Tuning improved both scores on the validation set: RMSLE dropped from 73.7 to 60.9 and RMSE from 165.7 to 142.1.

# Prediction on test data 

In [112]:
testfinal = test.drop(['meal_id', 'center_id'], axis=1)
testcols = test.columns.tolist()
print(testcols)

['id', 'week', 'center_id', 'meal_id', 'checkout_price', 'base_price', 'emailer_for_promotion', 'homepage_featured', 'city_code', 'region_code', 'center_type', 'op_area', 'category', 'cuisine']


In [113]:
X_test = testfinal[features].values
X_test

array([[1028232, 146, 158.11, ..., 2.0, 'Beverages', 'Thai'],
       [1262649, 147, 159.11, ..., 2.0, 'Beverages', 'Thai'],
       [1453211, 149, 157.14, ..., 2.0, 'Beverages', 'Thai'],
       ...,
       [1396176, 149, 629.53, ..., 4.5, 'Fish', 'Continental'],
       [1331977, 150, 629.53, ..., 4.5, 'Fish', 'Continental'],
       [1017414, 152, 630.53, ..., 4.5, 'Fish', 'Continental']],
      dtype=object)

In [114]:
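# Reuse the encoders fitted on the training data (l1, l2, l3) so that each category
# maps to the same integer code in train and test. transform() raises an error
# if the test set contains a label that was not seen during fitting.
testfinal['center_type'] = l1.transform(testfinal['center_type'])
testfinal['category'] = l2.transform(testfinal['category'])
testfinal['cuisine'] = l3.transform(testfinal['cuisine'])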
print(testfinal.shape)
testfinal.head()

(32573, 12)


Unnamed: 0,id,week,checkout_price,base_price,emailer_for_promotion,homepage_featured,city_code,region_code,center_type,op_area,category,cuisine
0,1028232,146,158.11,159.11,0,0,647,56,2,2.0,0,3
1,1262649,147,159.11,159.11,0,0,647,56,2,2.0,0,3
2,1453211,149,157.14,158.14,0,0,647,56,2,2.0,0,3
3,1262599,150,159.14,157.14,0,0,647,56,2,2.0,0,3
4,1495848,151,160.11,159.11,0,0,647,56,2,2.0,0,3


In [115]:
X_test = testfinal[features].values
X_test


array([[1.028232e+06, 1.460000e+02, 1.581100e+02, ..., 2.000000e+00,
        0.000000e+00, 3.000000e+00],
       [1.262649e+06, 1.470000e+02, 1.591100e+02, ..., 2.000000e+00,
        0.000000e+00, 3.000000e+00],
       [1.453211e+06, 1.490000e+02, 1.571400e+02, ..., 2.000000e+00,
        0.000000e+00, 3.000000e+00],
       ...,
       [1.396176e+06, 1.490000e+02, 6.295300e+02, ..., 4.500000e+00,
        4.000000e+00, 0.000000e+00],
       [1.331977e+06, 1.500000e+02, 6.295300e+02, ..., 4.500000e+00,
        4.000000e+00, 0.000000e+00],
       [1.017414e+06, 1.520000e+02, 6.305300e+02, ..., 4.500000e+00,
        4.000000e+00, 0.000000e+00]])

In [118]:
# Note: DT is the untuned tree fitted earlier; grid_cv_dtm.best_estimator_ is the tuned model
pred = DT.predict(X_test)
submission = pd.DataFrame({'id' : testfinal['id'],'num_orders' : pred})
submission.head()

Unnamed: 0,id,num_orders
0,1028232,40.0
1,1262649,40.0
2,1453211,40.0
3,1262599,40.0
4,1495848,40.0
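
The notebook stops at the preview of `submission`. If the predictions need to be saved, a sketch along the following lines would write them to a CSV file. The file name and the two-row frame are assumptions for illustration, and a temporary directory is used so the example does not touch the working folder.

```python
import os
import tempfile

import pandas as pd

# Made-up predictions standing in for the `submission` frame built above.
submission = pd.DataFrame({"id": [1028232, 1262649], "num_orders": [40.0, 40.0]})

# The file name is an assumption; index=False keeps the row index out of the CSV.
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission.to_csv(out_path, index=False)

# Read the file back to confirm it holds exactly the two expected columns.
check = pd.read_csv(out_path)
print(check.columns.tolist())
```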
