## **Rideshare Price Prediction**

### **A. Introduction**

- **Name**  : Livia Amanda Annafiah
- **Dataset** : [Uber and Lyft Dataset Boston, MA](https://www.kaggle.com/datasets/brllrb/uber-and-lyft-dataset-boston-ma?resource=download)

---------------------

**Problem Statement**

A ride-hailing company faces a significant challenge: ride prices fluctuate unexpectedly because of a range of dynamic factors that influence pricing. This unpredictability makes it hard for users to know in advance how much they will need to pay, often leading to frustration and a loss of trust when the final price is higher than expected.

To address this, the company aims to improve the accuracy of its price prediction algorithms. By offering more reliable price forecasts, it seeks to enhance user trust and satisfaction, strengthening its position in a competitive market.

**Objective**  

This project aims to develop a `Linear Regression` model that predicts ride-hailing prices accurately from selected features. The model's performance and predictive accuracy are assessed with `MAE (Mean Absolute Error)`, `MSE (Mean Squared Error)`, `RMSE (Root Mean Squared Error)`, and the `R2 score`.
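
As a reminder of what the four metrics measure, the following is a minimal sketch that computes them by hand with the standard formulas. The function name `regression_metrics` and the sample prices are hypothetical and are not taken from the training notebook, which may well use library implementations instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # average absolute error
    mse = sum(e ** 2 for e in errors) / n   # average squared error
    rmse = math.sqrt(mse)                   # same unit as the target

    # R2 compares the model's squared error with that of always
    # predicting the mean of the true values.
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical true and predicted prices, for illustration only
metrics = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
for metric_name, metric_value in metrics.items():
    print(f"{metric_name}: {metric_value:.4f}")
```

MAE and RMSE are expressed in the same unit as the price, which makes them the easiest of the four to interpret for a single prediction.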

This notebook tests inference on new data using the previously developed model.

### **B. Libraries**

The libraries used to test the model are as follows:

In [1]:
# Import libraries
import pandas as pd
import numpy as np
import pickle
import json

**Library Functions**
- pandas: data manipulation
- numpy: numerical computations and operations on arrays
- pickle: loading the saved model, scalers, and encoder
- json: reading the JSON files that list the feature columns

### **C. Data Loading**

The first step is to load the model, its preprocessing files, and the inference data, all of which were saved separately from the model training file.

In [2]:
# Load model and related files
with open('model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

with open('standard_scaler.pkl', 'rb') as scaler_file:
    standard_scaler = pickle.load(scaler_file)

with open('minmax_scaler.pkl', 'rb') as minmax_file:
    minmax_scaler = pickle.load(minmax_file)

with open('ohe.pkl', 'rb') as ohe_file:
    ohe = pickle.load(ohe_file)
    
with open('num_std.json', 'r') as num_std_file:
    num_std = json.load(num_std_file)

with open('num_minmax.json', 'r') as num_minmax_file:
    num_minmax = json.load(num_minmax_file)

with open('cat_nominal.json', 'r') as cat_nominal_file:
    cat_nominal = json.load(cat_nominal_file)
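
The seven `with open(...)` blocks above follow one pattern, so they could be folded into a small helper. The sketch below is one possible way to do that; `load_artifact` and `load_artifacts` are hypothetical names, and the check for missing files is an addition that the original cell does not perform. Note that `pickle.load` can execute arbitrary code, so it should only be used on files from a trusted source.

```python
import json
import pickle
from pathlib import Path


def load_artifact(path):
    """Load a .pkl file with pickle or a .json file with json."""
    path = Path(path)
    if path.suffix == ".pkl":
        with path.open("rb") as f:
            return pickle.load(f)  # only unpickle files you trust
    if path.suffix == ".json":
        with path.open("r") as f:
            return json.load(f)
    raise ValueError(f"Unsupported artifact type: {path.suffix}")


def load_artifacts(directory, filenames):
    """Return {file stem: loaded object} for every file in `filenames`."""
    directory = Path(directory)
    # Fail early with one clear message instead of midway through loading
    missing = [name for name in filenames if not (directory / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing artifact files: {missing}")
    return {Path(name).stem: load_artifact(directory / name) for name in filenames}
```

In this notebook it could be called as `artifacts = load_artifacts('.', ['model.pkl', 'standard_scaler.pkl', 'minmax_scaler.pkl', 'ohe.pkl', 'num_std.json', 'num_minmax.json', 'cat_nominal.json'])`, after which `artifacts['model']` would hold the regression model.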


In [3]:
# Load inference data
df_inf = pd.read_csv('data_inf.csv')

df_inf

| | Unnamed: 0 | id | timestamp | hour | day | month | datetime | timezone | source | destination | ... | precipIntensityMax | uvIndexTime | temperatureMin | temperatureMinTime | temperatureMax | temperatureMaxTime | apparentTemperatureMin | apparentTemperatureMinTime | apparentTemperatureMax | apparentTemperatureMaxTime |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0fc86eb7-1c81-40ed-af4b-24f344a2d55c | 1543744000.0 | 9 | 2 | 12 | 2018-12-02 09:42:57 | America/New_York | North End | Theatre District | ... | 0.0894 | 1543770000 | 36.4 | 1543726800 | 50.94 | 1543788000 | 35.78 | 1543748400 | 50.27 | 1543788000 |


### **D. Data Splitting**

Once loaded, the data must be split into feature groups (numerical columns for standard scaling, numerical columns for min-max scaling, and nominal categorical columns), following the same procedure used during modeling.

In [4]:
# Split into numerical and categorical columns
df_inf_num_std = df_inf[num_std]
df_inf_num_minmax = df_inf[num_minmax]
df_inf_cat_nominal = df_inf[cat_nominal]

df_inf_cat_nominal

| | cab_type | name |
| --- | --- | --- |
| 0 | Lyft | Lux |
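
Selecting columns with `df_inf[num_std]` raises a `KeyError` if any listed column is absent from the inference file. A minimal sketch of a check that could run before the split is shown below; `missing_columns` is a hypothetical helper, and the column names other than `cab_type` and `name` are placeholders rather than the real feature lists.

```python
def missing_columns(available, *feature_lists):
    """Return the feature names that are not present in `available`."""
    available = set(available)
    return [
        column
        for features in feature_lists
        for column in features
        if column not in available
    ]


# Placeholder names standing in for df_inf.columns and the JSON lists
available_example = ["feature_a", "feature_b", "cab_type", "name"]
num_std_example = ["feature_a"]
num_minmax_example = ["feature_b"]
cat_nominal_example = ["cab_type", "name", "product_id"]

missing = missing_columns(
    available_example, num_std_example, num_minmax_example, cat_nominal_example
)
print(missing)  # ['product_id']
```

In this notebook the call would be `missing_columns(df_inf.columns, num_std, num_minmax, cat_nominal)`, and an empty list would mean the split can proceed safely.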


### **E. Feature Engineering**

The next step transforms the data by encoding the categorical variables and scaling the numerical values, so that the input matches the preprocessing applied during model training.

Any columns that have changed since training must first be modified so that they align with what the fitted scalers and encoder expect.

The variables can then be scaled and encoded, and the transformed arrays are concatenated into the final input used for prediction.

In [5]:
# Scaling and Encoding
df_inf_num_standard_scaled = standard_scaler.transform(df_inf_num_std)
df_inf_num_minmax_scaled = minmax_scaler.transform(df_inf_num_minmax)
df_inf_cat_nominal_encoded = ohe.transform(df_inf_cat_nominal)

# Concatenate
df_inf_final = np.concatenate([df_inf_num_standard_scaled, df_inf_num_minmax_scaled, df_inf_cat_nominal_encoded], axis=1)
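
To make clear what the `transform` calls compute, here is a minimal hand-written sketch of the three operations. It assumes the pickled objects are scikit-learn's `StandardScaler`, `MinMaxScaler` (with its default 0 to 1 range), and `OneHotEncoder`, which the file names suggest but the notebook does not confirm. All numbers are made up for illustration. The key point is that inference reuses the statistics learned from the training data rather than recomputing them from the inference row.

```python
def standard_scale(value, mean, std):
    """Standardize a value with the mean and std learned during training."""
    return (value - mean) / std


def minmax_scale(value, data_min, data_max):
    """Rescale a value to [0, 1] with the training minimum and maximum."""
    return (value - data_min) / (data_max - data_min)


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1.0 if value == category else 0.0 for category in categories]


# Hypothetical training statistics and one hypothetical inference row
row_std = standard_scale(2.5, mean=2.0, std=1.0)
row_minmax = minmax_scale(1.5, data_min=1.0, data_max=3.0)
row_encoded = one_hot("Lyft", categories=["Lyft", "Uber"])

# Concatenate in the same order as in the cell above:
# standard-scaled, then min-max-scaled, then one-hot encoded
features = [row_std, row_minmax] + row_encoded
print(features)  # [0.5, 0.25, 1.0, 0.0]
```

The order of concatenation matters: the model was fitted on columns in a fixed order, so the inference array has to follow the same layout.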

### **F. Model Prediction**

Finally, the loaded model is applied to the processed inference data to generate a prediction.

In [6]:
# Predict using Linear Regression
y_pred_inf = model.predict(df_inf_final)

# Show result
print('Price Prediction:', round(y_pred_inf[0], 2))

Price Prediction: 15.11


### **G. Conclusion**

The predicted price is **$15.11**, whereas the actual price is **$16.50**. The gap of $1.39 (about 8.4% of the actual price) shows that the model still carries some error, but the prediction is reasonably close to the actual price.
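
The gap between the two prices quoted above can be quantified with a short calculation. This sketch uses only the two values reported in this notebook; the variable names are illustrative.

```python
predicted_price = 15.11  # model output from the prediction cell
actual_price = 16.50     # actual price quoted in the conclusion

absolute_error = abs(actual_price - predicted_price)
percentage_error = absolute_error / actual_price * 100

print(f"Absolute error: ${absolute_error:.2f}")      # Absolute error: $1.39
print(f"Percentage error: {percentage_error:.2f}%")  # Percentage error: 8.42%
```

A single row is too little to judge the model overall; the metrics named in the objective would need to be computed over a larger held-out set.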