## Introduction
- Name: Izzan Dienurrahman
- Batch: HCK-003

## Objective
- Apply the trained model to inference data

## Problem Statements
- Load the inference data
- Load the previously saved model, encoder, scaler, and selected feature lists
- Transform the inference data into the format the model expects
- Run inference, in this case predicting the trip price/fare
- Review the inference results

# Import Library

In [1]:
# import all required libraries
import joblib
import pandas as pd
import numpy as np

# Load Model

In [2]:
# load the model, encoder, scaler, and selected feature lists
with open('model_lin_reg.pkl','rb') as file_1:
    model_lin_reg = joblib.load(file_1)

with open('model_scaler.pkl','rb') as file_2:
    model_scaler = joblib.load(file_2)

with open('model_encoder.pkl','rb') as file_3:
    model_encoder = joblib.load(file_3)

with open('list_num_cols.txt','rb') as file_4:
    num_cols = joblib.load(file_4)

with open('list_cat_cols.txt','rb') as file_5:
    cat_cols = joblib.load(file_5)
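
Note that the `.txt` files above are not plain text: because they are read with `joblib.load`, they were presumably written with `joblib.dump`. The following is a minimal sketch of that round trip, assuming the column lists were saved this way. The temporary path and the list contents are illustrative and not taken from the training notebook. It also shows that `joblib.load` accepts a file path directly, so the `with open(...)` wrapper is optional.

```python
import os
import tempfile

import joblib

# Illustrative column list; the real one comes from the training notebook.
num_cols_example = ['distance', 'surge_multiplier']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'list_num_cols.txt')
    joblib.dump(num_cols_example, path)   # writes a pickle, despite the .txt extension
    num_cols_loaded = joblib.load(path)   # a path works as well as an open file handle

print(num_cols_loaded)
```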

# Data Loading

In [3]:
data_infer = pd.read_csv('p1g1_infer.csv', index_col=[0])  # load into a DataFrame

In [4]:
data_infer.head()

Unnamed: 0,distance,surge_multiplier,cab_type,service_type,weather
0,1.11,1.0,Lyft,Lyft,Partly Cloudy
1,2.48,1.0,Uber,Black,Overcast
2,2.94,1.0,Uber,Black SUV,Partly Cloudy
3,1.16,1.0,Uber,UberX,Mostly Cloudy
4,2.67,1.0,Uber,UberX,Clear


# Model Inference

In [5]:
# separate numerical and categorical features
data_infer_num = data_infer[num_cols]
data_infer_cat = data_infer[cat_cols]

In [6]:
data_infer_num  # show numerical features

Unnamed: 0,distance,surge_multiplier
0,1.11,1.0
1,2.48,1.0
2,2.94,1.0
3,1.16,1.0
4,2.67,1.0
...,...,...
55090,2.50,1.0
55091,0.91,1.0
55092,1.79,1.0
55093,1.61,1.0


In [7]:
data_infer_cat  # show categorical features

Unnamed: 0,cab_type,service_type,weather
0,Lyft,Lyft,Partly Cloudy
1,Uber,Black,Overcast
2,Uber,Black SUV,Partly Cloudy
3,Uber,UberX,Mostly Cloudy
4,Uber,UberX,Clear
...,...,...,...
55090,Uber,UberPool,Mostly Cloudy
55091,Lyft,Lux,Mostly Cloudy
55092,Lyft,Lyft,Mostly Cloudy
55093,Uber,UberXL,Overcast


In [8]:
# transform the numerical and categorical data
data_infer_num_scaled = model_scaler.transform(data_infer_num)  # scale numerical data
data_infer_cat_encoded = model_encoder.transform(data_infer_cat)  # encode categorical data

In [9]:
data_infer_num_scaled  # scaled numerical features

array([[0.20185185, 0.        ],
       [0.45555556, 0.        ],
       [0.54074074, 0.        ],
       ...,
       [0.32777778, 0.        ],
       [0.29444444, 0.        ],
       [0.18148148, 0.        ]])
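
The scaled values above are consistent with min-max scaling, `x_scaled = (x - min) / (max - min)`. The sketch below reproduces the first three `distance` values by hand. It assumes the saved scaler is a min-max scaler fitted on training distances ranging from 0.02 to 5.42; these bounds are inferred from the printed output, not read from `model_scaler` itself.

```python
import numpy as np

# Assumed training-set bounds for `distance`, inferred from the printed output.
DIST_MIN, DIST_MAX = 0.02, 5.42

distance = np.array([1.11, 2.48, 2.94])  # first three inference rows
scaled = (distance - DIST_MIN) / (DIST_MAX - DIST_MIN)
print(scaled.round(8))
```

If this assumption holds, inference rows with a distance above 5.42 are scaled to values greater than 1, so the model is extrapolating for those trips.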

In [10]:
data_infer_cat_encoded  # categorical features after one-hot encoding

array([[1., 0., 0., ..., 1., 0., 0.],
       [0., 1., 1., ..., 0., 0., 0.],
       [0., 1., 0., ..., 1., 0., 0.],
       ...,
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 1., 0., 0.]])

In [11]:
# concatenate the scaled numerical data and the encoded categorical data
data_infer_final = np.concatenate([data_infer_num_scaled, data_infer_cat_encoded], axis=1)
data_infer_final

array([[0.20185185, 0.        , 1.        , ..., 1.        , 0.        ,
        0.        ],
       [0.45555556, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.54074074, 0.        , 0.        , ..., 1.        , 0.        ,
        0.        ],
       ...,
       [0.32777778, 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       [0.29444444, 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.18148148, 0.        , 0.        , ..., 1.        , 0.        ,
        0.        ]])

In [12]:
# extract the feature names from the encoder
ohe_feature_names = model_encoder.get_feature_names_out(input_features=cat_cols)
ohe_feature_names

array(['cab_type_Lyft', 'cab_type_Uber', 'service_type_Black',
       'service_type_Black SUV', 'service_type_Lux',
       'service_type_Lux Black', 'service_type_Lux Black XL',
       'service_type_Lyft', 'service_type_Lyft XL', 'service_type_Shared',
       'service_type_UberPool', 'service_type_UberX',
       'service_type_UberXL', 'service_type_WAV', 'weather_ Clear ',
       'weather_ Drizzle ', 'weather_ Foggy ', 'weather_ Light Rain ',
       'weather_ Mostly Cloudy ', 'weather_ Overcast ',
       'weather_ Partly Cloudy ', 'weather_ Possible Drizzle ',
       'weather_ Rain '], dtype=object)

In [13]:
# combine the numerical and categorical feature names into a single list
encoded_features_infer = data_infer_num.columns.to_list() + ohe_feature_names.tolist()

In [14]:
# construct final df infer
df_infer_final = pd.DataFrame(data=data_infer_final,columns=encoded_features_infer)
df_infer_final.head(3)

Unnamed: 0,distance,surge_multiplier,cab_type_Lyft,cab_type_Uber,service_type_Black,service_type_Black SUV,service_type_Lux,service_type_Lux Black,service_type_Lux Black XL,service_type_Lyft,...,service_type_WAV,weather_ Clear,weather_ Drizzle,weather_ Foggy,weather_ Light Rain,weather_ Mostly Cloudy,weather_ Overcast,weather_ Partly Cloudy,weather_ Possible Drizzle,weather_ Rain
0,0.201852,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
1,0.455556,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,0.540741,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
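
Because the frame is assembled by hand, the column order has to match the order the model was trained on. The following is a small sketch of such a check, using a hypothetical helper `align_columns` and a toy frame. In the notebook, the expected order could come from `model_lin_reg.feature_names_in_`, assuming the model was fitted on a DataFrame.

```python
import pandas as pd


def align_columns(df: pd.DataFrame, expected: list) -> pd.DataFrame:
    """Return df with columns in the expected order; raise if any are missing or extra."""
    missing = [c for c in expected if c not in df.columns]
    extra = [c for c in df.columns if c not in expected]
    if missing or extra:
        raise ValueError(f"missing={missing}, extra={extra}")
    return df[expected]


# Toy frame with the columns in the wrong order.
toy = pd.DataFrame({'surge_multiplier': [0.0], 'distance': [0.2], 'cab_type_Lyft': [1.0]})
expected_order = ['distance', 'surge_multiplier', 'cab_type_Lyft']

aligned = align_columns(toy, expected_order)
print(aligned.columns.tolist())
```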


In [15]:
# Predict data inference
y_pred_inf = model_lin_reg.predict(df_infer_final)
y_pred_inf

array([ 5.81241989, 21.38837051, 32.49814987, ...,  7.77968979,
       14.0516777 ,  5.37236404])

In [16]:
data_infer['predicted_price'] = y_pred_inf

In [17]:
data_infer

Unnamed: 0,distance,surge_multiplier,cab_type,service_type,weather,predicted_price
0,1.11,1.0,Lyft,Lyft,Partly Cloudy,5.812420
1,2.48,1.0,Uber,Black,Overcast,21.388371
2,2.94,1.0,Uber,Black SUV,Partly Cloudy,32.498150
3,1.16,1.0,Uber,UberX,Mostly Cloudy,6.860828
4,2.67,1.0,Uber,UberX,Clear,11.195187
...,...,...,...,...,...,...
55090,2.50,1.0,Uber,UberPool,Mostly Cloudy,9.701954
55091,0.91,1.0,Lyft,Lux,Mostly Cloudy,13.418262
55092,1.79,1.0,Lyft,Lyft,Mostly Cloudy,7.779690
55093,1.61,1.0,Uber,UberXL,Overcast,14.051678


# Review of Results

In [18]:
data_infer.describe()

Unnamed: 0,distance,surge_multiplier,predicted_price
count,55095.0,55095.0,55095.0
mean,2.191383,1.0,16.338212
std,1.178985,0.0,8.802529
min,0.02,1.0,-0.223522
25%,1.3,1.0,9.43079
50%,2.17,1.0,14.97176
75%,2.84,1.0,22.281445
max,7.86,1.0,46.853001


Overall, the predicted values look reasonable. The minimum predicted price is negative (-0.22), which is not a valid fare, but it is close to zero. The mean predicted price (16.34) is also close to the mean price observed during EDA (16.5). The author is fairly satisfied with the model's inference results.
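
A linear regression has no lower bound on its output, which is why a slightly negative price can appear. One possible follow-up, sketched here on toy values rather than on `y_pred_inf` itself, is to count the negative predictions and floor them at zero. Whether clipping is appropriate is a judgment call and not part of the original workflow.

```python
import numpy as np

# Toy predictions standing in for y_pred_inf; the values are illustrative.
y_pred_example = np.array([5.81, 21.39, -0.22, 14.05])

n_negative = int((y_pred_example < 0).sum())        # number of predictions below zero
y_pred_clipped = np.clip(y_pred_example, 0, None)   # floor predictions at zero

print(n_negative)
print(y_pred_clipped)
```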