# ML Challenge - Validation (file 4/4)

Note: all initial EDA steps have been previously completed on a separate file. The model has been trained and pickled on a second file, and the hyperparameters tuned on a third file.
This script only includes the final steps that will have to be applied to the final validation dataset when released.

Import libraries

In [1]:
import pandas as pd
import pickle
from datetime import datetime

Open validation dataset

In [20]:
validation_df = pd.read_csv('validation_for_students.csv')

Feature engineering

In [21]:
# Drop True_index - save it in a variable
true_index = validation_df[['True_index']]
validation_df.drop(columns=['True_index'], axis=1, inplace=True)

# Convert State_holiday to dummies 
validation_dummies = pd.get_dummies(validation_df['State_holiday'], prefix='State_holiday')
validation_updated = pd.merge(left=validation_df,
                            right = validation_dummies,
                            left_index = True,
                            right_index = True)
validation_updated.drop(columns=['State_holiday'], axis=1, inplace=True)

# Convert date to ordinal
def convert_date_to_ordinal(date):
    return datetime.strptime(date, '%Y-%M-%d').toordinal()

validation_updated['Date'] = validation_updated['Date'].apply(convert_date_to_ordinal)

Unpickle model (see file 3, tune_hyperparameters)

In [9]:
# Replace with real path of the pickled model:

model = pickle.load(open('pickled_model_path', 'rb'))

Apply model to predict validation dataset

In [22]:
pred = pd.DataFrame(model.predict(validation_updated))

Combine True index and prediction in a dataframe

In [23]:
final_pred = pd.concat([true_index, pred], axis=1)


Check the final dataframe has been successfully created

In [24]:
final_pred 

Unnamed: 0,True_index,0
0,7,-61.022543
1,19,3586.255177
2,31,6283.653725
3,45,3655.204055
4,56,3796.321917
...,...,...
71200,712004,5526.505420
71201,712018,10535.493609
71202,712020,15462.255358
71203,712023,-46.344184


Save results to csv file to submit it

In [25]:
final_pred.to_csv('final_prediction.csv')