In [23]:
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler


## Linear Regression
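In this notebook we build a multiple linear regression model that predicts hourly demand, measured as the number of starting trips, from a small set of selected features.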

In [31]:
trips_df = pd.read_pickle('../00_data/trips_hourly_selected.pkl')

First, we create a DataFrame containing the target (demand, i.e. the number of starting trips) and our selected features.

In [32]:
# Select the target and the chosen features, renaming the target to "demand"
MulReg = trips_df[
    ["starting_trips", "min_temp", "available_bikes", "hour", "month", "is_holiday"]
].rename(columns={"starting_trips": "demand"})
MulReg.info()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7977 entries, 2019-01-01 01:00:00 to 2019-12-31 23:00:00
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   demand           7977 non-null   float64
 1   min_temp         7977 non-null   float64
 2   available_bikes  7977 non-null   float64
 3   hour             7977 non-null   int64  
 4   month            7977 non-null   int64  
 5   is_holiday       7977 non-null   bool   
dtypes: bool(1), float64(3), int64(2)
memory usage: 381.7 KB


We split the DataFrame into the features X and the target y, and standardize the features so that each has mean 0 and standard deviation 1. This puts all features on a common scale, which makes their coefficients comparable.

In [34]:
# Features (X) and target (y)
X = MulReg[["min_temp", "available_bikes", "hour", "month", "is_holiday"]]
# Standardize every feature to mean 0 and standard deviation 1
X_std = StandardScaler().fit_transform(X)
y = MulReg["demand"]
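Standardization replaces each value x of a feature with z = (x - μ) / σ, where μ and σ are that feature's mean and standard deviation. The minimal sketch below uses a made-up toy array (hypothetical, not the trip data) to check that `StandardScaler` matches this formula; note that it uses the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: two columns on very different scales
toy = np.array([[2.0, 100.0],
                [4.0, 300.0],
                [6.0, 200.0],
                [8.0, 400.0]])

scaler = StandardScaler()
toy_std = scaler.fit_transform(toy)

# Manual z-scores: StandardScaler divides by the population std (ddof=0)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0, ddof=0)

print(np.allclose(toy_std, manual))   # True
print(toy_std.mean(axis=0))           # approximately [0, 0]
print(toy_std.std(axis=0))            # approximately [1, 1]
```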

We make a train-test split with a test size of 30%; fixing random_state=0 makes the split reproducible.

In [35]:
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.3, random_state=0
)
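By default `train_test_split` shuffles the rows, so the 30% test set is a random sample of hours from across the whole year rather than a final block of time. The sketch below uses a dummy index array with the same length as MulReg (7977 rows, from the `info()` output) to show the resulting sizes and, as an alternative that this notebook does not use, a chronological split with `shuffle=False`. The chronological variant assumes MulReg is sorted by its DatetimeIndex, which the `info()` output suggests but does not guarantee.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 7977                  # number of hourly rows reported by MulReg.info()
idx = np.arange(n_rows)        # stand-in for the row positions

# Default behaviour (as in the cell above): rows are shuffled before splitting
train_idx, test_idx = train_test_split(idx, test_size=0.3, random_state=0)
print(len(train_idx), len(test_idx))   # 5583 2394 (the test size is rounded up)

# Alternative: keep time order and hold out the last 30% of hours
train_seq, test_seq = train_test_split(idx, test_size=0.3, shuffle=False)
print(train_seq.max() < test_seq.min())  # True: every test hour follows the training hours
```

A shuffled split lets the model see neighbouring hours of every test hour during training, so it usually gives a more optimistic estimate than a chronological hold-out.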


## Training the model

In [36]:
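# Fit ordinary least squares on the training set and predict the test set
lin_mod_mul = LinearRegression()
lin_mod_mul.fit(X_train, y_train)
y_pred = lin_mod_mul.predict(X_test)
y_true_2 = y_test  # true test targets, used in the metrics cell below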
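In this notebook the scaler is fitted on all rows before the split, so the test rows contribute to the means and standard deviations used for scaling. A common safeguard is to split first and fit the scaler on the training rows only, for example with a scikit-learn pipeline. For ordinary least squares with an intercept this does not change the predictions: a per-feature shift is absorbed by the intercept and a per-feature rescaling by the coefficient. Only the reported coefficients and intercept change slightly; scale-sensitive models such as regularized regressions are a different story. The sketch below checks this on synthetic data (`X_demo` and `y_demo` are hypothetical, not the trip data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: five features on different scales
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=[5, 50, 12, 6, 0.5], scale=[8, 20, 7, 3, 0.5], size=(500, 5))
y_demo = X_demo @ np.array([-0.5, 0.2, 1.3, 1.0, -2.0]) + rng.normal(scale=5, size=500)

# Approach used in this notebook: scale all rows, then split
X_all_std = StandardScaler().fit_transform(X_demo)
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(X_all_std, y_demo, test_size=0.3, random_state=0)
pred_all = LinearRegression().fit(Xa_tr, ya_tr).predict(Xa_te)

# Leakage-free variant: split first, fit the scaler on the training rows only
Xb_tr, Xb_te, yb_tr, yb_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(Xb_tr, yb_tr)
pred_pipe = pipe.predict(Xb_te)

# Same rows end up in the test set (same random_state and length),
# and for OLS the predictions agree up to floating-point error
print(np.allclose(pred_all, pred_pipe))  # True
```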


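Here we print the fitted coefficients and the intercept. Since the features are standardized, each coefficient is the change in predicted demand for a one-standard-deviation increase in that feature, with the other features held constant.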

In [37]:
# Pair each feature name with its fitted coefficient
print("The Coefficients for our multiple linear regression model are:\n")
for name, coef in zip(X.columns, lin_mod_mul.coef_):
    print(f"{name:<16}= {coef}")
print("The Intercept is:", lin_mod_mul.intercept_)


The Coefficients for our multiple linear regression model are:

min_temp        = -4.578909084473554
available_bikes = 4.21231363158025
hour            = 9.302884716591885
month           = 2.9930031911936323
is_holiday      = -1.1950362852277219
The Intercept is: 26.813398176070617
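The coefficients printed above are expressed per standard deviation of each feature. To read them per original unit, divide each coefficient by the feature's standard deviation (the fitted scaler's `scale_` attribute), and shift the intercept using the feature means (`mean_`). The notebook does not keep its fitted scaler, so the minimal sketch below keeps one and checks the conversion on synthetic data (all `_demo` names are hypothetical, not the trip data).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: two features with known per-unit effects 3.0 and -0.5
rng = np.random.default_rng(1)
X_raw_demo = rng.normal(loc=[10.0, 100.0], scale=[4.0, 30.0], size=(1000, 2))
y_demo = 20 + X_raw_demo @ np.array([3.0, -0.5]) + rng.normal(scale=1.0, size=1000)

scaler_demo = StandardScaler()          # keep the fitted scaler to access scale_ and mean_
X_std_demo = scaler_demo.fit_transform(X_raw_demo)
model_std = LinearRegression().fit(X_std_demo, y_demo)

# Standardized coefficient / feature std  ->  effect per original unit
coef_per_unit = model_std.coef_ / scaler_demo.scale_
# Fold the mean shift back into the intercept
intercept_raw = model_std.intercept_ - np.sum(coef_per_unit * scaler_demo.mean_)

# Compare with a model fitted directly on the unscaled features
raw_model = LinearRegression().fit(X_raw_demo, y_demo)
print(np.allclose(coef_per_unit, raw_model.coef_))        # True
print(np.isclose(intercept_raw, raw_model.intercept_))    # True
```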


## Evaluating the test metrics

In [38]:
print("MAE:", mean_absolute_error(y_true_2, y_pred))
print("MSE:", mean_squared_error(y_true_2, y_pred))
print("R^2:", r2_score(y_true_2, y_pred))

MAE: 14.779343682249632
MSE: 366.7337845239947
R^2: 0.20686247016334214
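An R² of about 0.21 means the model explains roughly a fifth of the variance of demand in the test set. Two quick ways to put the errors in context are the RMSE, the square root of the MSE (about 19.15 here, in the same units as demand), and a baseline that ignores the features and always predicts the training mean. The helper below is a hypothetical sketch (`evaluate_against_baseline` is our own name, not part of scikit-learn); in this notebook it could be called as `evaluate_against_baseline(y_train, y_true_2, y_pred)`.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate_against_baseline(y_train, y_true, y_pred):
    """Return MAE, RMSE and R^2 for a model and for a mean-only baseline."""
    # Baseline: always predict the training mean (the dummy features are ignored)
    baseline = DummyRegressor(strategy="mean")
    baseline.fit(np.zeros((len(y_train), 1)), y_train)
    y_base = baseline.predict(np.zeros((len(y_true), 1)))

    def scores(pred):
        return {
            "MAE": mean_absolute_error(y_true, pred),
            "RMSE": np.sqrt(mean_squared_error(y_true, pred)),
            "R^2": r2_score(y_true, pred),
        }

    return {"model": scores(y_pred), "baseline": scores(y_base)}


# Tiny synthetic check (hypothetical numbers, not the trip data)
y_tr = np.array([10.0, 20.0, 30.0, 40.0])
y_te = np.array([15.0, 35.0, 25.0])
y_hat = np.array([14.0, 33.0, 27.0])
results = evaluate_against_baseline(y_tr, y_te, y_hat)
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

On held-out data the mean baseline scores an R² at or just below 0, so any positive test R² indicates the features add some predictive value over a constant guess.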
