# Bike Sharing Demand Prediction - Linear Regression

## Introduction
This notebook builds a multiple linear regression model to predict bike-sharing demand (`cnt`) from the `day.csv` dataset. The goal is to identify the factors that significantly influence demand and to evaluate the model's performance on held-out test data.

In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
import statsmodels.api as sm

In [None]:
# Load the dataset
file_path = 'day.csv'
data = pd.read_csv(file_path)

# Display the first few rows of the dataset
data.head()

## Data Preprocessing

In [None]:
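# Converting categorical variables into dummy variables.
# drop_first=True drops one level per variable to avoid perfect collinearity with the constant;
# dtype=int keeps the dummies numeric (recent pandas versions default to bool, which statsmodels OLS may reject).
categorical_vars = ['season', 'weathersit', 'mnth', 'weekday']
data = pd.get_dummies(data, columns=categorical_vars, drop_first=True, dtype=int)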

# Dropping identifier/date columns and the leakage columns (casual + registered sum to cnt)
data = data.drop(columns=['instant', 'dteday', 'casual', 'registered'])
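
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a hypothetical toy frame. The values are illustrative only and are not taken from `day.csv`.

```python
import pandas as pd

# Hypothetical toy data; values are illustrative, not from day.csv
toy = pd.DataFrame({"season": [1, 2, 3, 4, 1], "cnt": [10, 20, 30, 40, 15]})

# One dummy column per level, minus the first level (season_1)
encoded = pd.get_dummies(toy, columns=["season"], drop_first=True, dtype=int)

# season_1 has no column of its own: it is the baseline, encoded as all zeros
print(encoded.columns.tolist())
print(encoded)
```

Each remaining dummy coefficient is then read relative to the dropped baseline level.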

In [None]:
# Splitting the data into training and testing sets
X = data.drop('cnt', axis=1)
y = data['cnt']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Initial Model Building and Evaluation

In [None]:
# Adding a constant to the independent variables
X_train_sm = sm.add_constant(X_train)

# Building the regression model
model = sm.OLS(y_train, X_train_sm).fit()

# Model summary
model.summary()

## Addressing Multicollinearity

In [None]:
# Dropping the redundant atemp (it closely tracks temp) and insignificant variables
X_train_refined = X_train.drop(columns=['atemp'])
# All weekday dummies except weekday_6 are dropped
weekday_dummies = [col for col in X_train_refined.columns if col.startswith('weekday_') and col != 'weekday_6']
insignificant_vars = ['mnth_7', 'mnth_8', 'mnth_4', 'mnth_5', 'holiday']
X_train_refined = X_train_refined.drop(columns=weekday_dummies + insignificant_vars)

# Adding constant and rebuilding the refined model
X_train_refined_sm = sm.add_constant(X_train_refined)
refined_model = sm.OLS(y_train, X_train_refined_sm).fit()

# Refined model summary
refined_model.summary()

## Testing the Refined Model

In [None]:
# Refining the test dataset to match the training features (same columns, same order)
X_test_refined = X_test[X_train_refined.columns]
# has_constant='add' forces the intercept column even if a test feature happens to be constant
X_test_refined_sm = sm.add_constant(X_test_refined, has_constant='add')

# Making predictions on the test dataset
y_pred = refined_model.predict(X_test_refined_sm)

# Calculating the R-squared score
test_r2_score = r2_score(y_test, y_pred)
test_r2_score
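
For reference, `r2_score` computes 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below reproduces it by hand on hypothetical numbers and also computes RMSE, which reports the error in the same units as `cnt`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values, for illustration only
y_true = np.array([100.0, 150.0, 200.0, 250.0])
y_hat = np.array([110.0, 140.0, 210.0, 240.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - ss_res / ss_tot

# Root mean squared error, in the units of the target
rmse = np.sqrt(np.mean((y_true - y_hat) ** 2))

print(manual_r2, r2_score(y_true, y_hat), rmse)
```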

## Conclusion
The refined model achieved an R-squared score of **0.851** on the test dataset, meaning it explains about 85% of the variance in demand on data it was not trained on. Significant factors affecting bike demand include `temp`, `yr`, `workingday`, `season`, and weather conditions (`weathersit`).