# Inference Notebook - Tractors
<hr />

This notebook loads new tractor data and generates predictions (inference) with a pre-trained LightGBM regression model.

## Usage

- Step 1: Upload the CSV file
- Step 2: Load the CSV file into this notebook
- Step 3: Run the model
- Step 4: Save the output to a CSV file and download it
<hr />


# Import Required Libraries and Load the Model

```python
import pandas as pd
import pickle

# Load the trained model; the with-block closes the file handle afterwards
with open('../../models/tractorLGBMR_ALL.model', 'rb') as f:
    model = pickle.load(f)
model
```

```text
LGBMRegressor(bagging_fraction=0.8, bagging_freq=15, boosting_type='gbdt',
              class_weight=None, colsample_bytree=1.0, feature_fraction=0.5,
              importance_type='split', is_unbalance=True, learning_rate=0.1,
              max_depth=8, min_child_samples=20, min_child_weight=0.001,
              min_split_gain=0.0, n_estimators=400, n_jobs=-1, num_leaves=60,
              objective=None, random_state=None, reg_alpha=0.0, reg_lambda=0.0,
              scoring='neg_mean_squared_error', silent=True, subsample=1.0,
              subsample_for_bin=200000, subsample_freq=0)
```
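Loading the model only needs the file path. The sketch below wraps the same `pickle.load` call in a `with` block and adds a quick check that the unpickled object exposes a `predict` method, since Step 3 relies on it. `load_pickled_model` is a hypothetical helper, not part of this project. Unpickling can execute arbitrary code, so only load model files from a trusted source.

```python
import pickle


def load_pickled_model(path, required_attrs=("predict",)):
    """Unpickle a model from `path` and check that it has the expected methods.

    Hypothetical helper: the attribute check is an assumption about what this
    notebook needs (model.predict in Step 3), not part of the original code.
    """
    # The with-block closes the file handle even if unpickling fails
    with open(path, "rb") as f:
        model = pickle.load(f)
    missing = [name for name in required_attrs if not hasattr(model, name)]
    if missing:
        raise TypeError(f"Unpickled object {type(model).__name__} lacks {missing}")
    return model


# Usage in this notebook:
# model = load_pickled_model('../../models/tractorLGBMR_ALL.model')
```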

# Step 1: Upload and Rename CSV file
<hr />

1. Download the tractor template [here](../data/tractor_sample.csv) and check that your file uses the same column names.
2. Upload your CSV file to the "data" directory and rename it `tractor_inference.csv`.
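Before uploading, you can compare your file's header with the template's. The sketch below uses only the standard-library `csv` module; `read_header` and `compare_columns` are hypothetical helper names. The check mirrors, but is not, the "These columns were not expected in the data" report that the transform notebook prints in Step 2.

```python
import csv


def read_header(path_or_file):
    """Return the first row of a CSV file (a path or an open text file) as a list."""
    if hasattr(path_or_file, "read"):
        return next(csv.reader(path_or_file))
    with open(path_or_file, newline="") as f:
        return next(csv.reader(f))


def compare_columns(template_cols, uploaded_cols):
    """Return (missing, unexpected) column names, keeping the input order."""
    uploaded = set(uploaded_cols)
    template = set(template_cols)
    missing = [c for c in template_cols if c not in uploaded]
    unexpected = [c for c in uploaded_cols if c not in template]
    return missing, unexpected


# Usage (template_path and upload_path are placeholders for your own paths):
# missing, unexpected = compare_columns(read_header(template_path),
#                                       read_header(upload_path))
```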


# Step 2: Load CSV file into this notebook
<hr /> 

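1. Run the next cell by clicking "Run" or pressing SHIFT+ENTER. It runs `tractor_data_transform.ipynb`, which loads the uploaded file into `df`, reports unexpected and missing columns, and cleans and imputes the data so that it matches the 41 columns the model expects.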

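```python
# %run executes the transform notebook; variables it defines, such as df,
# are then available in this notebook
%run tractor_data_transform.ipynb
```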

```text
There are 26 original columns and 6473 rows in this file.

***************************

These columns were not expected in the data:

['vehicle_nbr', 'sales_grp', 'sale_date']

***************************

Categorical Features:

epa_tech_year: ['2002,2004,2007,2010,2013,2014']
transmission_manufacturer_cd: ['FULL,ALLI']
engine_manufacturer_cd: ['DETR,CUMM,CAT,INTL']
model_mfg: ['FRLT,INTL']

***************************

There were missing Categorical Values in the datasets uploaded.

  The model was trained with the values in the list below, and they
  will be created for running this inference with 0 as the value.

  **Note: This could potentially affect the model performance**

['transmission_manufacturer_cd_ALLI', 'engine_manufacturer_cd_INTL', 'model_mfg_FRTL', 'engine_manufacturer_cd_CAT', 'model_mfg_INTL']

***************************

After Cleanup and Imputation, there are 41 columns and 6473 rows in this file.
The model is expecting 41 columns

*********************
```
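The transform output above describes two steps: one-hot encoding the categorical features, and creating any training-time dummy columns that are absent from the new data, filled with 0. A minimal pandas sketch of that idea follows. It is an assumption about the approach, not the contents of `tractor_data_transform.ipynb`, and `training_columns` is a placeholder for the model's 41 feature names. Reindexing to the training list keeps each column exactly once and in training order, which would guard against what appears to be a duplicate `model_mfg_FRTL` column (listed as `model_mfg_FRTL.1`) in the Step 3 output.

```python
import pandas as pd


def encode_and_align(df, categorical_cols, training_columns):
    """One-hot encode `categorical_cols`, then match the training column layout.

    Returns (aligned_df, created, dropped):
      created -- training columns absent from the new data (filled with 0)
      dropped -- columns in the new data that the model was not trained on
    """
    encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)
    created = [c for c in training_columns if c not in encoded.columns]
    dropped = [c for c in encoded.columns if c not in training_columns]
    # reindex keeps exactly the training columns, in training order,
    # and fills the missing ones with 0
    aligned = encoded.reindex(columns=training_columns, fill_value=0)
    return aligned, created, dropped
```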

# Step 3: Run Model

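```python
# Predict on the feature columns only; dropping pred_proceed (if present)
# lets this cell be re-run without passing the previous predictions to the model
features = df.drop(columns=["pred_proceed"], errors="ignore")
df["pred_proceed"] = model.predict(features)
```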
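As far as we know, LightGBM's scikit-learn `predict` uses feature columns by position rather than matching them by name, so a reordered or duplicated column can change predictions without raising an error. The hypothetical `predict_checked` helper below refuses to predict unless the feature columns match an expected list exactly. With a fitted `LGBMRegressor`, that list could be taken from its `feature_name_` attribute; that this project's model was trained on a DataFrame with these column names is an assumption.

```python
def predict_checked(model, features, expected_columns):
    """Call model.predict only if `features` has exactly the expected columns.

    Hypothetical helper: `expected_columns` could be model.feature_name_ for a
    fitted LightGBM scikit-learn model (an assumption about this project's model).
    """
    columns = list(features.columns)
    duplicates = sorted({c for c in columns if columns.count(c) > 1})
    if duplicates:
        raise ValueError(f"Duplicate feature columns: {duplicates}")
    if columns != list(expected_columns):
        missing = [c for c in expected_columns if c not in columns]
        extra = [c for c in columns if c not in expected_columns]
        raise ValueError(
            f"Column mismatch (missing={missing}, extra={extra}); "
            "check the column order as well")
    return model.predict(features)


# Usage in this notebook (assumes the model exposes feature_name_):
# features = df.drop(columns=["pred_proceed"], errors="ignore")
# df["pred_proceed"] = predict_checked(model, features, model.feature_name_)
```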

```python
# Show every column instead of truncating the wide table
pd.set_option('display.max_columns', None)
df.head(10)
```

```text
,accpt_date,accumulated_depreciation_amt,axle_total_count,days_to_outservice,drive_tire_size_cd,engine_horsepower,engine_model_id,model_year,net_vehicle_invest,odomoter_sale,rear_axle_capacity,sam_summary_class_cd,transmission_model_id,transmission_speed_qn,vehicle_age,vehicle_disp_cond_cd,vehicle_gvw_class_cd,vehicle_sam_class_cd,engine_manufacturer_cd_CUMM,engine_manufacturer_cd_DETR,engine_manufacturer_cd_OTHER,epa_tech_year_2002,epa_tech_year_2004,epa_tech_year_2007,epa_tech_year_2010,epa_tech_year_2013,epa_tech_year_2014,epa_tech_year_2017,epa_tech_year_UNKNOWN,model_mfg_FRTL,model_mfg_OTHER,suspension_type_cd_A,suspension_type_cd_L,suspension_type_cd_T,transmission_manufacturer_cd_FULL,transmission_manufacturer_cd_OTHER,transmission_manufacturer_cd_ALLI,engine_manufacturer_cd_INTL,model_mfg_FRTL.1,engine_manufacturer_cd_CAT,model_mfg_INTL,pred_proceed
0,201908,80163.0,3.0,0,1102,450,911375872,2012,86662.5,388336,40000,320,95511377,10,98,40.0,8,140,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
1,201908,79327.0,3.0,184,1373,455,952617676,2014,127965.94,505658,40000,330,596876053,10,68,40.0,8,170,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10190.860146
2,201811,73640.0,3.0,9,1219,500,629376267,2013,112713.8,771166,40000,330,4536076,13,85,10.0,8,170,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,11793.357505
3,201901,54934.0,3.0,76,1102,450,892241856,2013,90696.69,434498,40000,320,95511377,10,80,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
4,201901,54934.0,3.0,76,1102,450,892241856,2013,90696.69,434498,40000,320,95511377,10,80,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
5,201901,54934.0,3.0,76,1102,450,892241856,2013,90696.69,434498,40000,320,95511377,10,80,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
6,201901,54934.0,3.0,76,1102,450,892241856,2013,90696.69,434498,40000,320,95511377,10,80,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
7,201907,61575.0,3.0,95,1102,450,215612970,2012,89229.24,69497,40000,320,95511377,10,98,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
8,201907,61575.0,3.0,95,1102,450,215612970,2012,89229.24,69497,40000,320,95511377,10,98,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
9,201907,61575.0,3.0,95,1102,450,215612970,2012,89229.24,69497,40000,320,95511377,10,98,10.0,8,140,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0,0,0,0,0,10649.84785
```


```python
df.shape
```

```text
(6473, 42)
```

The 42 columns are the 41 model features plus the new `pred_proceed` column.

# Step 4: Save and Download Results

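```python
from datetime import datetime
import os

# Timestamped output name, e.g. tractor_inference_20190801-153000.csv
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
filename = f"tractor_inference_{timestamp}.csv"
fullpath = os.path.join('../../data', filename)
df.to_csv(fullpath, index=False)  # index=False avoids an "Unnamed: 0" column on re-read

# create_download_link is not defined in this notebook; it is presumably
# provided by tractor_data_transform.ipynb, loaded with %run in Step 2
create_download_link(df, filename)
```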
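The save cell builds a timestamped file name and then calls `create_download_link`, which this notebook does not define. The sketch below shows the file-name part as a pure function so it can be checked with a fixed time, plus one common way to build a notebook download link: a base64 `data:` URI inside an HTML anchor. Both `output_filename` and `download_link_html` are hypothetical names, and the project's own `create_download_link` may work differently.

```python
import base64
import html
from datetime import datetime


def output_filename(now=None, prefix="tractor_inference_"):
    """Return a timestamped CSV name such as tractor_inference_20190801-153000.csv."""
    if now is None:
        now = datetime.now()
    return f"{prefix}{now.strftime('%Y%m%d-%H%M%S')}.csv"


def download_link_html(csv_text, filename, label="Download CSV"):
    """Build an <a> tag that embeds `csv_text` as a base64 data URI."""
    payload = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
    return (f'<a download="{html.escape(filename)}" '
            f'href="data:text/csv;base64,{payload}" target="_blank">'
            f"{html.escape(label)}</a>")


# Usage in a notebook (IPython.display.HTML renders the returned markup):
# from IPython.display import HTML
# name = output_filename()
# HTML(download_link_html(df.to_csv(index=False), name))
```

Embedding the CSV in the link avoids serving a file, at the cost of a very long HTML string for large outputs.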