# ML Challenge - Improve model (file 3/4)

After selecting Gradient Boosting from the models tested (see file 2), the next step is to tune it further to improve its scores.

## Import libraries

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
import pickle

## Open cleaned file

In [13]:
sales_updated = pd.read_csv('sales_updated.csv')
sales_updated

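| Unnamed: 0 | Store_ID | Day_of_week | Date | Nb_customers_on_day | Open | Promotion | School_holiday | Sales | State_holiday_0 | State_holiday_a | State_holiday_b | State_holiday_c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 625 | 3 | 734874 | 641 | 1 | 1 | 0 | 7293 | 1 | 0 | 0 | 0 |
| 1 | 293 | 2 | 734884 | 877 | 1 | 1 | 1 | 7060 | 1 | 0 | 0 | 0 |
| 2 | 39 | 4 | 735256 | 561 | 1 | 1 | 0 | 4565 | 1 | 0 | 0 | 0 |
| 3 | 676 | 4 | 734894 | 1584 | 1 | 1 | 0 | 6380 | 1 | 0 | 0 | 0 |
| 4 | 709 | 3 | 735255 | 1477 | 1 | 1 | 0 | 11647 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 640835 | 674 | 6 | 735253 | 611 | 1 | 0 | 0 | 4702 | 1 | 0 | 0 | 0 |
| 640836 | 1014 | 4 | 735613 | 1267 | 1 | 1 | 0 | 12545 | 1 | 0 | 0 | 0 |
| 640837 | 135 | 6 | 735618 | 595 | 1 | 0 | 0 | 5823 | 1 | 0 | 0 | 0 |
| 640838 | 810 | 1 | 735251 | 599 | 1 | 1 | 1 | 7986 | 1 | 0 | 0 | 0 |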


## Create labels

In [14]:
features = sales_updated.drop(columns = ['Sales'])
labels = sales_updated['Sales']
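
The table above contains an `Unnamed: 0` column, which pandas produces when a CSV was saved together with its index. Because only `Sales` is dropped, that column stays in `features` as written, and the scores reported below were obtained that way. The sketch below shows one way to keep it out of the features, assuming it really is just a saved row index. The tiny in-memory CSV and the names `csv_text`, `with_index_col`, `demo` and `features_demo` are illustrative stand-ins, not part of the original notebook.

```python
import io

import pandas as pd

# Stand-in for sales_updated.csv: a CSV that was written with its index,
# so the first header field is empty.
csv_text = ",Store_ID,Sales\n0,625,7293\n1,293,7060\n"

# Default read: the saved index comes back as a column named 'Unnamed: 0'.
with_index_col = pd.read_csv(io.StringIO(csv_text))

# index_col=0 reads that first column as the index instead of a feature.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
features_demo = demo.drop(columns=['Sales'])

print(list(with_index_col.columns))
print(list(features_demo.columns))
```

Dropping the column changes the model inputs, so the scores would need to be recomputed after such a change.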

## Split train-test data

In [15]:
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.20, random_state=0) 

## Gradient boosting

### Option 1: n_estimators = 100

In [6]:
gb_reg = GradientBoostingRegressor(max_depth=9,
            n_estimators=100,
            random_state=0)
gb_reg.fit(X_train, y_train)
pred = gb_reg.predict(X_test)
print(gb_reg.score(X_test, y_test))    # R^2 on the test set
print(gb_reg.score(X_train, y_train))  # R^2 on the training set

0.9523592002547066
0.9552872014492811


In [7]:
# Save the fitted model; the with-block closes the file handle
with open('model_gradientboosting_9_100.p', 'wb') as f:
    pickle.dump(gb_reg, f)

### Option 2: n_estimators = 400

In [21]:
gb_reg = GradientBoostingRegressor(max_depth=9,
            n_estimators=400,
            random_state=0)
gb_reg.fit(X_train, y_train)
pred = gb_reg.predict(X_test)
print(gb_reg.score(X_test, y_test))    # R^2 on the test set
print(gb_reg.score(X_train, y_train))  # R^2 on the training set

0.9835755758533203
0.9879940750469733


In [22]:
# Save the fitted model; the with-block closes the file handle
with open('model_gradientboosting_9_400_2.p', 'wb') as f:
    pickle.dump(gb_reg, f)
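
A saved model can later be reloaded with `pickle.load` and used for prediction without refitting. The sketch below illustrates the round trip on a small synthetic dataset, since the real file takes a long time to produce. The data, the temporary path and the names `model`, `restored` and `same_predictions` are assumptions made for the example.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic stand-in for the sales data.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, random_state=0)
model = GradientBoostingRegressor(n_estimators=20, random_state=0)
model.fit(X_demo, y_demo)

# Write the fitted model to a temporary file, then read it back.
path = os.path.join(tempfile.mkdtemp(), 'demo_model.p')
with open(path, 'wb') as f:
    pickle.dump(model, f)
with open(path, 'rb') as f:
    restored = pickle.load(f)

# The restored model should give the same predictions as the original.
same_predictions = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.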

### Increase n_estimators

Note: going from 100 to 400 estimators raised the test score from 0.952 to 0.984, so increasing n_estimators further would probably improve it again. However, because of time limits (the challenge had to be completed in a few hours) and the limited performance of my computer, I kept the model above, with n_estimators=400.
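
When training time is the bottleneck, one way to see how the score changes with the number of trees is to fit once and score every intermediate stage with `staged_predict`, rather than refitting for each value of n_estimators. The sketch below does this on a small synthetic dataset. The data, the smaller `max_depth` and `n_estimators`, and the names `test_scores` and `best_n` are assumptions chosen so that it runs quickly; this is not what was done in the challenge.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Small synthetic stand-in for the sales data.
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=10.0,
                                 random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20,
                                          random_state=0)

# Fit once with the largest number of trees under consideration.
model = GradientBoostingRegressor(max_depth=3, n_estimators=200, random_state=0)
model.fit(X_tr, y_tr)

# staged_predict yields the predictions after 1, 2, ..., n_estimators trees,
# so a single fit gives the R^2 score for every ensemble size.
test_scores = [r2_score(y_te, pred) for pred in model.staged_predict(X_te)]

best_n = test_scores.index(max(test_scores)) + 1
for n in (50, 100, 200):
    print(n, round(test_scores[n - 1], 4))
print('best n_estimators on this split:', best_n)
```

Choosing n_estimators this way on the test split makes the final test score optimistic, so a separate validation split would be the safer choice for the selection itself.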