# PRCP-1018-BikeRental

### Problem Statement

By predicting rental patterns, companies can adapt their pricing strategies according to anticipated demand. This involves forecasting the daily/hourly bike rental count using machine learning algorithms, taking into account environmental factors and seasonal conditions.

### 1.Importing Libraries

In [None]:
# Importing numpy library for working with arrays
import numpy as np

# Importing pandas library for working with data sets
import pandas as pd

# Importing seaborn library for visualization
import seaborn as sns

# Importing matplotlib.pyplot for visualization
import matplotlib.pyplot as plt
%matplotlib inline

# Importing warnings to suppress warning messages in the output
import warnings

# To ignore the warnings
warnings.filterwarnings('ignore')

# Importing ProfileReport from ydata_profiling
from ydata_profiling import ProfileReport

# Importing train_test_split
from sklearn.model_selection import train_test_split

# Importing the metrics used to evaluate the regression models
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Importing LinearRegression model from sklearn
from sklearn.linear_model import LinearRegression

# Importing DecisionTreeRegressor model from sklearn
from sklearn.tree import DecisionTreeRegressor

# Importing RandomForestRegressor model from sklearn
from sklearn.ensemble import RandomForestRegressor

# Importing XGBRegressor model from xgboost
from xgboost import XGBRegressor

# Importing SVR model from sklearn
from sklearn.svm import SVR

# Importing KNeighborsRegressor model from sklearn
from sklearn.neighbors import KNeighborsRegressor

# Importing GradientBoostingRegressor model from sklearn
from sklearn.ensemble import GradientBoostingRegressor

# Importing RandomizedSearchCV for Hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV

### 2.Importing Data

In [None]:
day_data=pd.read_csv('day.csv',parse_dates=['dteday'])
day_data

In [None]:
hr_data=pd.read_csv('hour.csv',parse_dates=['dteday'])
hr_data

### 3.Domain Analysis

1. instant: A unique identifier for each record.
2. dteday: The date of the record, in YYYY-MM-DD format.
3. season: The season of the year, where 1: Winter, 2: Spring, 3: Summer, 4: Fall.
4. yr: The year of the observation, where 0: 2011, 1: 2012.
5. mnth: The month of the year (1–12).
6. hr: The hour of the day (0–23). This column exists only in the hourly dataset.
7. holiday: Whether the day is a holiday or not.
8. weekday: The day of the week (0 to 6; in this dataset 0 corresponds to Sunday and 6 to Saturday).
9. workingday: A binary flag indicating whether the day is a working day (1 for working days, 0 for weekends or holidays).
10. weathersit: The weather condition, where 1: Clear, Few clouds, Partly cloudy; 2: Mist, Cloudy, Broken clouds, Few clouds; 3: Light Snow, Light Rain with Thunderstorm, Scattered clouds; 4: Heavy Rain, Ice Pellets, Thunderstorm, Mist, Snow with Fog.
11. temp: The normalized temperature (original values in Celsius, ranging from -8 to 39).
12. atemp: The normalized feels-like temperature (original values in Celsius, ranging from -16 to 50).
13. hum: The normalized humidity (values between 0 and 1, where 1 corresponds to 100%).
14. windspeed: The normalized wind speed (values between 0 and 1, obtained by dividing by the maximum value of 67).
15. casual: The count of casual users (non-registered bike renters).
16. registered: The count of registered users (subscribers).
17. cnt: The total number of bike rentals (casual and registered users combined).
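
The normalized columns above can be mapped back to physical units when reading plots. The following is only a sketch: it assumes `temp` and `atemp` were min-max scaled over the ranges listed above, and that `hum` and `windspeed` were divided by their stated maxima. The function names are hypothetical helpers, not part of the dataset or of any library.

```python
def denormalize_minmax(value, lower, upper):
    """Invert min-max scaling: map a value in [0, 1] back to [lower, upper]."""
    return value * (upper - lower) + lower


def denormalize_by_max(value, maximum):
    """Invert division by a maximum value."""
    return value * maximum


# Example values chosen only for illustration
temp_c = denormalize_minmax(0.5, -8, 39)      # assumed range for temp
atemp_c = denormalize_minmax(0.5, -16, 50)    # assumed range for atemp
humidity_pct = denormalize_by_max(0.8, 100)   # humidity in percent
windspeed_raw = denormalize_by_max(0.25, 67)  # wind speed in original units

print(temp_c, atemp_c, humidity_pct, windspeed_raw)
```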

In [None]:
#getting the peak hour of the day
peak_hour = hr_data.loc[hr_data.groupby('dteday')['cnt'].idxmax().reset_index(drop=True), ['dteday', 'hr']]
peak_hour.rename(columns={'hr': 'peak_hour'},inplace=True)

In [None]:
#getting the count of rentals during the peak hour for each day
peak_hour_counts=hr_data.groupby('dteday')['cnt'].max().reset_index()
peak_hour_counts.rename(columns={'cnt':'peak_hour_rentals'},inplace=True)

In [None]:
#count of rentals in the morning window (hours 6 to 12)
day_rentals = hr_data[(hr_data['hr'] >= 6) & (hr_data['hr'] <=12)].groupby('dteday')['cnt'].sum().reset_index()
day_rentals.rename(columns={'cnt': 'day_rentals'},inplace=True)

In [None]:
#count of rentals at night (hours 21 to 23 and 0 to 4)
night_rentals = hr_data[(hr_data['hr'] >=21) | (hr_data['hr'] <=4)].groupby('dteday')['cnt'].sum().reset_index()
night_rentals.rename(columns={'cnt': 'night_rentals'},inplace=True)

In [None]:
data = day_data.copy()

# Merging all features on 'dteday'
data = data.merge(peak_hour, on='dteday', how='left')
data = data.merge(peak_hour_counts, on='dteday', how='left')
data = data.merge(day_rentals, on='dteday', how='left')
data = data.merge(night_rentals, on='dteday',how='left')

In [None]:
data

In [None]:
data['morning_rental_ratio'] = data['day_rentals'] / data['cnt']
data['night_rental_ratio'] = data['night_rentals'] / data['cnt']

In [None]:
data.drop(columns=['day_rentals','night_rentals'],inplace=True)

In [None]:
col_to_move = 'cnt'  
data[col_to_move] = data.pop(col_to_move) 

Insight: The hourly and daily datasets share the same features, except for the hour predictor. We therefore derived new columns from the hourly data (peak hour, peak-hour rentals, morning and night rental ratios) and merged them into the daily data.
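
The aggregation pattern used above can be checked on a small example. This sketch uses a toy hourly frame with made-up values (not taken from `hour.csv`) to show how `groupby(...).idxmax()` locates the busiest hour of each day and how a windowed sum is merged back onto one row per day.

```python
import pandas as pd

# Toy hourly data (hypothetical values) covering two days
hourly = pd.DataFrame({
    'dteday': pd.to_datetime(['2011-01-01'] * 3 + ['2011-01-02'] * 3),
    'hr': [8, 17, 22, 8, 17, 22],
    'cnt': [10, 40, 5, 30, 20, 15],
})

# idxmax returns the row label of the largest 'cnt' within each day
idx = hourly.groupby('dteday')['cnt'].idxmax()
peak = (hourly.loc[idx, ['dteday', 'hr', 'cnt']]
        .rename(columns={'hr': 'peak_hour', 'cnt': 'peak_hour_rentals'})
        .reset_index(drop=True))

# Total rentals in the morning window (hours 6 to 12, inclusive) per day
morning = (hourly[hourly['hr'].between(6, 12)]
           .groupby('dteday')['cnt'].sum()
           .rename('day_rentals')
           .reset_index())

# One row per day with the engineered features side by side
daily = peak.merge(morning, on='dteday', how='left')
print(daily)
```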

### 4.Basic Checks

In [None]:
#checking the first five rows of the data
data.head()

In [None]:
#checking the last five rows of the data
data.tail()

In [None]:
#checking the number of rows and column in the data
data.shape

Insight: We have 731 observations, 19 predictors and 1 target variable.

In [None]:
#checking the predictors of the data
data.columns

In [None]:
#checking the unique values
for i in data:
    print(i)
    print(data[i].unique())
    print(f'Number of unique values:{data[i].nunique()}')
    print('  ')

In [None]:
#checking the count of unique values
for i in data:
    print(data[i].value_counts())

In [None]:
#checking a concise summary of a data
data.info()

In [None]:
#To check the descriptive statistics of a data
data.describe()

### 5.Exploratory Data Analysis(EDA)

In [None]:
report=ProfileReport(data,title='Bike Rental Prediction',explorative=False)
report

Insight: The data has 10 Numerical, 1 Datetime and 5 Categorical columns.

#### 5.1 Univariate Analysis

#### 5.1.1 For Categorical Column

In [None]:
#examining a single variable

In [None]:
categorical_col=['season','yr','holiday','workingday','weathersit']
plt.figure(figsize=(20,25))
plotnumber=1
for i in categorical_col:
    if plotnumber<=5:
        sp=plt.subplot(3,3,plotnumber)
        sns.countplot(x=i,data=data)
        
        plt.xlabel(i.title(),fontsize=20)
        plt.ylabel('count',fontsize=20)
    plotnumber+=1
plt.tight_layout()      

#### 5.1.2 For Numerical Columns

In [None]:
numerical_col=['mnth','weekday','temp', 'atemp', 'hum', 'windspeed','casual', 'registered','peak_hour', 'peak_hour_rentals',
       'morning_rental_ratio', 'night_rental_ratio']
plt.figure(figsize=(20,30),facecolor='white')
plotnumber=1
for i in numerical_col:
    if plotnumber<=12:
        sp=plt.subplot(4,3,plotnumber)
        sns.histplot(data[i],kde=True)

        plt.xlabel(i.title(),fontsize=20)
        plt.ylabel('count',fontsize=20)
    plotnumber+=1
plt.tight_layout()

#### 5.2 Bivariate Analysis

In [None]:
#examining the relationship between the numerical and categorical variable

In [None]:
plt.figure(figsize=(20,20),facecolor='white')
plotnumber=1
for i in categorical_col:
    if plotnumber<=5:
        sp=plt.subplot(3,3,plotnumber)
        sns.boxplot(x=i,y='cnt',data=data)

        plt.xlabel(i.title(),fontsize=20)
        plt.ylabel('cnt',fontsize=20)
    plotnumber+=1
plt.tight_layout()

In [None]:
#examining the relationship between two numerical variables

In [None]:
plt.figure(figsize=(20,18),facecolor='white')
plotnumber=1
for i in numerical_col:
    if plotnumber<=12:
        sp=plt.subplot(4,3,plotnumber)
        sns.scatterplot(x=i,y='cnt',data=data)

        plt.xlabel(i.title(),fontsize=20)
        plt.ylabel('cnt',fontsize=20)
    plotnumber+=1
plt.tight_layout()

#### 5.3 Multivariate Analysis

In [None]:
#to understand relationships, patterns, and dependencies between variables
sns.pairplot(data)
plt.show()

#### 5.4 Correlation

In [None]:
#visualizing the correlation between variables
correlation_matrix=data.drop(columns=['cnt']).corr(numeric_only=True)
plt.figure(figsize=(15,10))
sns.heatmap(correlation_matrix,annot=True,cmap='coolwarm',fmt='.2f')
plt.title('Correlation Matrix',fontsize=15)
plt.show()

Insights:
1. Bike rentals are significantly higher during the summer season.
2. The number of bike rentals is relatively low during weather conditions such as Light Snow, Light Rain with Thunderstorms, and Scattered Clouds.
3. The count of bike rentals is particularly low in the spring season and on holidays.
4. The highest number of rentals occurs during favorable weather conditions, while the lowest rentals are observed during poor weather conditions.
5. The season variable has a strong correlation with the month variable.
6. Temperature (temp) and feels-like temperature (atemp) are positively correlated with cnt (bike rentals).
7. The highest number of rentals occurs between 15:00 and 20:00.

### 6.Data Preprocessing

#### 6.1 Dropping Unrequired Columns

In [None]:
columns_to_drop=['instant','dteday','season','atemp','casual','registered']
for i in columns_to_drop:
    if i in data.columns:
        data.drop(columns=[i],inplace=True)
data

Insights:
1. Dropped the 'instant' and 'dteday' features because every row holds a unique value, so they carry no useful signal for the model.
2. Dropped 'season' and 'atemp' due to their high correlation with 'mnth' and 'temp', reducing redundancy and avoiding multicollinearity.
3. The 'casual' and 'registered' features were removed since their sum forms 'cnt', the target variable. Keeping them would lead to data leakage in predictive modeling.
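
The redundancy argument above can be made concrete by listing feature pairs whose absolute correlation exceeds a cutoff. This is a sketch on toy values; the helper name `high_corr_pairs` and the 0.8 threshold are assumptions for illustration, not choices stated in this notebook.

```python
import numpy as np
import pandas as pd


def high_corr_pairs(df, threshold=0.8):
    """Return feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair appears once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    stacked = corr.where(mask).stack()
    return stacked[stacked > threshold].sort_values(ascending=False)


# Toy data (hypothetical): 'atemp' tracks 'temp' closely, 'hum' does not
toy = pd.DataFrame({
    'temp': [0.1, 0.3, 0.5, 0.7, 0.9],
    'atemp': [0.12, 0.29, 0.52, 0.68, 0.91],
    'hum': [0.9, 0.2, 0.6, 0.3, 0.7],
})

pairs = high_corr_pairs(toy)
print(pairs)
```

Running the same helper on the real `data` frame before dropping columns would show which pairs motivated the removals.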

#### 6.2 Handling Null Values

In [None]:
#checking the null values
print(data.isnull().sum())

In [None]:
#getting the row containing null value
data[data['morning_rental_ratio'].isna()]

In [None]:
data['morning_rental_ratio']=data['morning_rental_ratio'].fillna(0)

In [None]:
data.isnull().sum().sum()

Insight: The final dataset contained null values in 'morning_rental_ratio', which we filled with 0 using the fillna() method to ensure data consistency.
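
A null in a ratio column like this typically appears when a day has no hourly records inside the aggregation window, so the left merge has nothing to attach. The sketch below reproduces that situation on toy data with made-up values; it is an illustration of the mechanism, not a statement about which dates are affected in the real data.

```python
import pandas as pd

# Toy daily totals (hypothetical values)
daily = pd.DataFrame({
    'dteday': pd.to_datetime(['2011-01-01', '2011-01-02']),
    'cnt': [100, 20],
})

# Morning totals exist only for the first day
morning = pd.DataFrame({
    'dteday': pd.to_datetime(['2011-01-01']),
    'day_rentals': [35],
})

# The left merge keeps both days; the unmatched day gets NaN
merged = daily.merge(morning, on='dteday', how='left')
merged['morning_rental_ratio'] = merged['day_rentals'] / merged['cnt']
n_missing = int(merged['morning_rental_ratio'].isna().sum())

# No morning rentals were recorded, so a ratio of 0 is a natural fill value
merged['morning_rental_ratio'] = merged['morning_rental_ratio'].fillna(0)
print(n_missing, merged['morning_rental_ratio'].tolist())
```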

#### 6.3 Handling Duplicates

In [None]:
#checking for duplicates (number of fully duplicated rows)
print(data.duplicated().sum())

Insight: There are no duplicates.

#### 6.4 Outliers Handling

In [None]:
#checking the outliers
plt.figure(figsize=(25,10))
sns.boxplot(data=data)
plt.title("Boxplot for Outlier Detection",fontsize=15)
plt.show()

Insight: There are no significant outliers in this dataset, as observed from the boxplot analysis.
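
The visual check can be complemented with a count of points outside the whiskers. This sketch applies the usual 1.5 × IQR rule to toy values; the helper name `iqr_outlier_counts` is hypothetical, and the rule itself is an assumption about what "outlier" means here, since the notebook relies on the boxplot alone.

```python
import pandas as pd


def iqr_outlier_counts(df, k=1.5):
    """Count values per numeric column lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    num = df.select_dtypes(include='number')
    q1 = num.quantile(0.25)
    q3 = num.quantile(0.75)
    iqr = q3 - q1
    outside = (num < q1 - k * iqr) | (num > q3 + k * iqr)
    return outside.sum()


# Toy data (hypothetical): one unusually low humidity reading
toy = pd.DataFrame({
    'hum': [0.5, 0.55, 0.6, 0.62, 0.58, 0.05],
    'windspeed': [0.1, 0.12, 0.15, 0.11, 0.13, 0.14],
})

counts = iqr_outlier_counts(toy)
print(counts)
```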

### 7.Data Splitting

In [None]:
#extracting independent features from the data
x=data.iloc[:,:-1]

#extracting dependent feature
y=data['cnt']

In [None]:
#splitting the training and testing data
x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.70,random_state=42)

### 8.Model Training

### 8.1 Linear Regression

In [None]:
# Initializing LinearRegression model
linear_model=LinearRegression()

# Fitting the training data to the Linear Regression model
linear_model.fit(x_train,y_train)

# predicting the target for testing data
y_pred_linear=linear_model.predict(x_test)

In [None]:
# Evaluating the LinearRegression model's performance by R2_score 
Linear_R2=r2_score(y_test,y_pred_linear)
print(f'r2_score(linear)        : {Linear_R2}')
print(f'mean_squared_error      : {mean_squared_error(y_test,y_pred_linear)}')
print(f'mean_absolute_error     : {mean_absolute_error(y_test,y_pred_linear)}')
print(f'root_mean_square_error  : {np.sqrt(mean_squared_error(y_test,y_pred_linear))}')
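
The four metrics printed above are all functions of the same residuals, so they can be computed in one place from a single prediction array. The sketch below implements them with NumPy only to make the formulas explicit; `regression_report` is a hypothetical helper, and the sample values are made up.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Compute R2, MSE, MAE and RMSE from one pair of arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)
    mae = np.mean(np.abs(residuals))
    ss_res = np.sum(residuals ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return {
        'r2': 1 - ss_res / ss_tot,
        'mse': mse,
        'mae': mae,
        'rmse': np.sqrt(mse),
    }


report = regression_report([100, 200, 300, 400], [110, 190, 310, 390])
print(report)
```

Passing each model's predictions through one such function avoids mixing prediction variables between metrics.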

### 8.2 Decision Tree

In [None]:
#initializing the model
DT=DecisionTreeRegressor()

#fitting the training data to the model
DT.fit(x_train,y_train)

#predicting the target for testing data
y_pred_DT=DT.predict(x_test)

In [None]:
# Evaluating the DecisionTreeRegressor model's performance by R2_score 
DT_R2=r2_score(y_test,y_pred_DT)
print(f'r2_score(Dt) : {DT_R2}')

#### 8.2.1 Hyperparameter Tuning

In [None]:
# Initializing DecisionTreeRegressor model
DT_hp=DecisionTreeRegressor()

In [None]:
# Creating a dictionary with possible Hyperparameters
params={'splitter':["best", "random"],
        'criterion':["squared_error", "absolute_error"],
        'max_depth':list(range(1,20)),
        'min_samples_split':list(range(2,20)),  # integer values must be at least 2
        'min_samples_leaf':list(range(1,20)),
        }

In [None]:
# Initializing RandomizedSearchCV
tree_cv=RandomizedSearchCV(estimator=DT_hp,
                    param_distributions=params,
                    scoring='r2',
                    n_jobs=-1,
                    cv=5,
                    verbose=3)

In [None]:
# Fitting the training data to the RandomizedSearchCV
tree_cv.fit(x_train,y_train)

In [None]:
# Getting best hyperparameters
tree_cv.best_params_
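
Because `RandomizedSearchCV` samples candidates at random, the best parameters can differ between runs unless `random_state` is set. With the default `refit=True`, the fitted search also exposes `best_estimator_`, a model already refitted on the full training data with the winning parameters. The sketch below shows this on synthetic data from `make_regression`; the data and the search space are illustrative assumptions, not the notebook's own.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption: not the bike rental data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=42)

param_distributions = {
    'max_depth': list(range(1, 10)),
    'min_samples_split': list(range(2, 10)),
    'min_samples_leaf': list(range(1, 10)),
}

search = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,
    scoring='r2',
    cv=5,
    random_state=42,  # makes the sampled candidates reproducible
)
search.fit(X_tr, y_tr)

# Already refitted on all of X_tr with the best parameters found
best_tree = search.best_estimator_
test_r2 = best_tree.score(X_te, y_te)
print(search.best_params_, test_r2)
```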

In [None]:
# Initializing DecisionTreeRegressor model
decision_tree_model=DecisionTreeRegressor(criterion='squared_error',max_depth=16,min_samples_leaf=13,min_samples_split=12,splitter='best')

# Fitting the training data to the DecisionTreeRegressor model
decision_tree_model.fit(x_train,y_train)

# predicting the target for testing data
y_pred_hp=decision_tree_model.predict(x_test)

In [None]:
# Evaluating the DecisionTreeRegressor model's performance by R2_score 
Dt_hp_R2=r2_score(y_test,y_pred_hp)
print(f'r2_score(Dt)            : {Dt_hp_R2}')
print(f'mean_squared_error      : {mean_squared_error(y_test,y_pred_hp)}')
print(f'mean_absolute_error     : {mean_absolute_error(y_test,y_pred_hp)}')
print(f'root_mean_square_error  : {np.sqrt(mean_squared_error(y_test,y_pred_hp))}')

### 8.3 Random Forest

In [None]:
# Initializing RandomForestRegressor model
RF_model=RandomForestRegressor()

# Fitting the training data to the RandomForestRegressor model
RF_model.fit(x_train,y_train)

# predicting the target for testing data
y_pred_RF=RF_model.predict(x_test)

In [None]:
# Evaluating the RandomForestRegressor model's performance by R2_score
RF_R2=r2_score(y_test,y_pred_RF)
print(f'r2_score(Rf) : {RF_R2}')

#### 8.3.1 Hyperparameter Tuning

In [None]:
# Initializing RandomForestRegressor model
RF_ht=RandomForestRegressor()

In [None]:
# Creating a dictionary with possible Hyperparameters
params={'n_estimators':[100,200],
        'max_depth':list(range(1,20)),
        'min_samples_split':list(range(2,10)),  # integer values must be at least 2
        'min_samples_leaf':list(range(1,10)),
        }

In [None]:
# Initializing RandomizedSearchCV
RF_cv=RandomizedSearchCV(estimator=RF_ht,
                    param_distributions=params,
                    scoring='r2',
                    n_jobs=-1,
                    cv=3,
                    verbose=3)

In [None]:
# Fitting the training data to the RandomizedSearchCV
RF_cv.fit(x_train,y_train)

In [None]:
# Getting the best hyperparameters
RF_cv.best_params_

In [None]:
# Initializing RandomForestRegressor model
random_forest_model=RandomForestRegressor(n_estimators=200,max_depth=7,min_samples_leaf=1,min_samples_split=5)

# Fitting the training data to the RandomForestRegressor model
random_forest_model.fit(x_train,y_train)

# predicting the target for testing data
y_pred_ht=random_forest_model.predict(x_test)

In [None]:
# Evaluating the RandomForestRegressor model's performance by R2_score
RF_hp_R2=r2_score(y_test,y_pred_ht)
print(f'r2_score(Rf)            : {RF_hp_R2}')
print(f'mean_squared_error      : {mean_squared_error(y_test,y_pred_ht)}')
print(f'mean_absolute_error     : {mean_absolute_error(y_test,y_pred_ht)}')
print(f'root_mean_square_error  : {np.sqrt(mean_squared_error(y_test,y_pred_ht))}')

### 8.4 KNN

In [None]:
#Initializing KNeighborsRegressor model
knn=KNeighborsRegressor(n_neighbors=5)

#Fitting the training data to the KNeighborsRegressor model
knn.fit(x_train,y_train)

# predicting the target for testing data
y_pred_knn=knn.predict(x_test)

In [None]:
# Evaluating the KNN model's performance by R2_score
Knn_R2=r2_score(y_test,y_pred_knn)
print(f'r2_score(knn) : {Knn_R2}')

#### 8.4.1 Hyperparameter Tuning

In [None]:
# Initialize KNN model
knn = KNeighborsRegressor()

# creating a dictionary with possible parameters
param_grid = {'n_neighbors': range(1, 30, 2),  # Try odd values from 1 to 29
              'weights': ['uniform', 'distance'],  
              'metric': ['euclidean', 'manhattan']
              }


In [None]:
# Initializing RandomizedSearch CV
knn_cv= RandomizedSearchCV(knn, param_distributions=param_grid, cv=5, scoring='r2', n_jobs=-1)

# Fitting the training data to the RandomizedSearchCV
knn_cv.fit(x_train, y_train)


In [None]:
# Getting the best hyperparameters
knn_cv.best_params_

In [None]:
# Initializing KNN model
Knn_model=KNeighborsRegressor(weights='distance', n_neighbors= 9, metric='manhattan')

# Fitting the training data to the KNN model
Knn_model.fit(x_train,y_train)

# predicting the target for testing data
y_pred_ht_knn=Knn_model.predict(x_test)

In [None]:
# Evaluating the KNN model's performance by R2_score
Knn_hp_R2=r2_score(y_test,y_pred_ht_knn)
print(f'r2_score(knn)           : {Knn_hp_R2}')
print(f'mean_squared_error      : {mean_squared_error(y_test,y_pred_ht_knn)}')
print(f'mean_absolute_error     : {mean_absolute_error(y_test,y_pred_ht_knn)}')
print(f'root_mean_square_error  : {np.sqrt(mean_squared_error(y_test,y_pred_ht_knn))}')

### 8.5 XGBoost

In [None]:
#initializing XGBoostRegressor
XGB=XGBRegressor()

#fitting the training data to the model
XGB.fit(x_train,y_train)

#predicting the target for testing data
y_pred_XGB=XGB.predict(x_test)


In [None]:
# Evaluating the XGBoost's performance by R2_score
Xgb_R2=r2_score(y_test,y_pred_XGB)
print(f' r2_score(XGB) : {Xgb_R2}')

#### 8.5.1 Hyperparameter Tuning

In [None]:
# Creating a dictionary with possible Hyperparameters
xg_param_grid = {"gamma":[0,0.1,0.2,0.4],
                 "learning_rate":[0.01,0.02,0.03,0.04,0.05,0.06,0.1],
                 "max_depth":list(range(1,11)),
                 "n_estimators":[50,65,80,100,150],
                 "alpha":[0,0.1,0.5,1],
                 }

In [None]:
# Initializing XGBRegressor model
XGB_hp = XGBRegressor()

# Initializing RandomizedSearchCV
xgb_rcv = RandomizedSearchCV(estimator=XGB_hp, scoring="r2", param_distributions=xg_param_grid , cv=5, verbose=3,n_jobs=-1)

In [None]:
# Fitting the training data to the RandomizedSearchCV
xgb_rcv.fit(x_train, y_train)

In [None]:
#getting the best params
xgb_rcv.best_params_

In [None]:
#initializing XGB model
xgb_model=XGBRegressor(alpha=0, gamma=0,learning_rate=0.2, max_depth=2, n_estimators=65)

#fitting the training data to the model
xgb_model.fit(x_train,y_train)

#predicting the target for the testing data
xgb_y_pred=xgb_model.predict(x_test)

In [None]:
# Evaluating the XGBoost's performance by R2_score
Xgb_hp_R2=r2_score(y_test,xgb_y_pred)
print(f'r2_score               : {Xgb_hp_R2}')
print(f'mean_squared_error     : {mean_squared_error(y_test,xgb_y_pred)}')
print(f'mean_absolute_error    : {mean_absolute_error(y_test,xgb_y_pred)}')
print(f'root_mean_square_error : {np.sqrt(mean_squared_error(y_test,xgb_y_pred))}')

### 8.6 Gradient Boosting

In [None]:
#initializing the model
gbm=GradientBoostingRegressor()

#fitting the training data to the model
gbm.fit(x_train,y_train)

#predicting the target for the testing data
y_pred_gbm=gbm.predict(x_test)


In [None]:
# Evaluating the GradientBoosting's performance by R2_score
gbm_R2=r2_score(y_test,y_pred_gbm)
print(f' r2_score(GBR) : {gbm_R2}')

#### 8.6.1 Hyperparameter Tuning

In [None]:
# Creating a dictionary with possible Hyperparameters
params={'n_estimators':[100,200,300],
        'learning_rate':[0.001,0.01,0.02,0.03,0.1],
        'max_depth':list(range(1,20)),
        'min_samples_split':list(range(2,10)),  # integer values must be at least 2
        'min_samples_leaf':list(range(1,10)),
       }


In [None]:
#initializing the model
gbm_hp=GradientBoostingRegressor()

#initializing RandomizedSearchCv
gbm_cv=RandomizedSearchCV(estimator=gbm_hp,scoring='r2',param_distributions=params,cv=5,verbose=2,n_jobs=-1,n_iter=100)

#fitting a training data into RandomizedSearchCV
gbm_cv.fit(x_train,y_train)

In [None]:
#getting the best params
gbm_cv.best_params_

In [None]:
#initializing the model
gbr_best=GradientBoostingRegressor(n_estimators=200, min_samples_split=5, min_samples_leaf=5, max_depth=4, learning_rate=0.1)

#fitting the training data to the model
gbr_best.fit(x_train,y_train) 

#predicting the target for the testing data
y_pred_gbr_hp=gbr_best.predict(x_test)

In [None]:
# Evaluating the GradientBoosting's performance by R2_score
Gb_hp_R2=r2_score(y_test,y_pred_gbr_hp)
print(f'r2_score (gb)          : {Gb_hp_R2}')
print(f'mean_squared_error     : {mean_squared_error(y_test,y_pred_gbr_hp)}')
print(f'mean_absolute_error    : {mean_absolute_error(y_test,y_pred_gbr_hp)}')
print(f'root_mean_square_error : {np.sqrt(mean_squared_error(y_test,y_pred_gbr_hp))}')

In [None]:
#checking whether the best model has overfitted 
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=42)
cross_val_results = cross_val_score(gbr_best, x, y, scoring='r2', cv=cv)
print(f'Std Dev with K-Fold: {np.std(cross_val_results)}')
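
The standard deviation across folds measures how stable the score is; comparing training and validation scores adds a direct view of the overfitting gap. The sketch below uses `cross_validate` with `return_train_score=True` on synthetic data from `make_regression`. The data and the default model settings are illustrative assumptions, not the tuned model above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in data (assumption: not the bike rental data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_validate(
    GradientBoostingRegressor(random_state=42),
    X, y,
    scoring='r2',
    cv=cv,
    return_train_score=True,  # also record the score on each training fold
)

train_mean = scores['train_score'].mean()
test_mean = scores['test_score'].mean()
test_std = scores['test_score'].std()
gap = train_mean - test_mean  # a large gap suggests overfitting

print(f'train R2: {train_mean:.3f}  validation R2: {test_mean:.3f}')
print(f'validation std: {test_std:.3f}  gap: {gap:.3f}')
```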

### 8.7 SVR

In [None]:
# Initializing SVR model
svr_model=SVR()

# Fitting the training data to the SVR model
svr_model.fit(x_train,y_train)

# predicting the target for testing data
y_pred_svr=svr_model.predict(x_test)

In [None]:
# Evaluating the SVR's performance by R2_score
Svr_R2=r2_score(y_test,y_pred_svr)
print(f' r2_score (svr) : {Svr_R2}')

#### 8.7.1 Hyperparameter Tuning

In [None]:
#initializing the model
hp_model=SVR()

In [None]:
# Creating a dictionary with possible Hyperparameters
params={'kernel':['linear', 'rbf'],
        'degree':[1,2,3,4,5],
        'gamma':['scale', 'auto'],
        'C':[0.1, 1, 10, 100]
        }

In [None]:
# Initializing RandomizedSearchCV
svr_cv=RandomizedSearchCV(estimator=hp_model,
                          param_distributions=params,
                          scoring='r2',
                          n_jobs=-1,
                          cv=5,
                          verbose=3
                      
                     )

In [None]:
# Fitting the training data to the RandomizedSearchCV
svr_cv.fit(x_train,y_train)

In [None]:
# Getting best hyperparameters
svr_cv.best_params_

In [None]:
# Initializing SVR model
svr_model_hp=SVR(kernel='linear',gamma='scale',degree=5,C=100)

# Fitting the training data to the SVR model
svr_model_hp.fit(x_train,y_train)

# predicting the target for testing data
y_pred_hp_svr=svr_model_hp.predict(x_test)

In [None]:
# Evaluating the SVR model's performance by R2_score 
Svr_hp_R2=r2_score(y_test,y_pred_hp_svr)
print(f'r2_score (svr)          : {Svr_hp_R2}')
print(f'mean_squared_error      : {mean_squared_error(y_test,y_pred_hp_svr)}')
print(f'mean_absolute_error     : {mean_absolute_error(y_test,y_pred_hp_svr)}')
print(f'root_mean_square_error  : {np.sqrt(mean_squared_error(y_test,y_pred_hp_svr))}')

### 9. Model Comparison Report

In [None]:
comparison_dict = {'Model':['LinearRegression','DecisionTreeRegressor','RandomForestRegressor','KNN','XGBoostRegressor','GradientBoostingRegressor','SVR'],
                   'R2-Score':[Linear_R2,Dt_hp_R2,RF_hp_R2,Knn_hp_R2,Xgb_hp_R2,Gb_hp_R2,Svr_hp_R2]
                  }
    

# Creating DataFrame
comparison_df = pd.DataFrame(comparison_dict)
print(comparison_df)

In [None]:
comparison_df = comparison_df.sort_values(by='R2-Score', ascending=True)

max_index = comparison_df['R2-Score'].idxmax() 

plt.figure(figsize=(15, 5))
sns.lineplot(data=comparison_df, x='Model', y='R2-Score', marker='o', color='red', label="Model Performance")


plt.xlabel("Model Name")
plt.ylabel("R2-Score")

plt.title("Comparison of Model Performance (R2_Score)")
plt.show()
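
Alongside the plot, the comparison table can be ranked so the best model is read off directly. This sketch uses placeholder model names and made-up scores, not results from this notebook.

```python
import pandas as pd

# Placeholder names and scores (hypothetical, for illustration only)
comparison = pd.DataFrame({
    'Model': ['ModelA', 'ModelB', 'ModelC'],
    'R2-Score': [0.80, 0.91, 0.75],
})

# Highest score first, with a clean 0..n-1 index for display
ranked = (comparison.sort_values(by='R2-Score', ascending=False)
          .reset_index(drop=True))
best = ranked.iloc[0]

print(ranked)
print(f"Best model: {best['Model']} (R2 = {best['R2-Score']:.2f})")
```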

### 10. Conclusion

Among the different algorithms we used to train our models, the Gradient Boosting model performed the best. When checked for overfitting with K-Fold cross-validation, the standard deviation of its scores stayed within the allowed range (<=0.05). Additionally, we evaluated multiple metrics, and the error remained within the acceptable range (<10%), indicating good model performance. These results confirm that the Gradient Boosting model is well suited to this dataset.

### 11. Report on challenges faced

1. We were provided with two datasets, and handling them was initially challenging, especially when extracting features from the hourly dataset.
2. With many attributes in the dataset, understanding and analyzing the domain of each one was quite challenging.
3. Finding the relationships between the features and the target was also challenging.
 