<center><h1>Project title: Seasonal Bike Rentals Prediction in Seoul </h1></center> <br>

This project aims to optimize bike rental operations in Seoul by leveraging data-driven insights. <br>
By analyzing various factors affecting bike rentals, we can improve resource allocation, enhance user experience, and potentially increase revenue for the bike rental service.

<h1>Project Objective</h1>

Identify and quantify the top 3-5 factors influencing bike rental demand in Seoul, providing actionable insights for operational decision-making. <br>
Develop a regression model to predict hourly bike rental demand with at least 85% accuracy. <br>
Design and deploy an interactive web application using Streamlit, allowing users to test the regression model through an intuitive interface.

<h2>Phase One</h2>

<ul>
    <li>Data Acquisition: Gathering relevant datasets for analysis. </li>
    <li>Data Exploration: Analyzing data to understand its structure and key characteristics. </li>
    <li>Data Cleaning & Preprocessing: Preparing the data for modeling by handling missing values, outliers, etc. </li>
</ul>
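The cleaning step above mentions missing values and outliers. A minimal sketch of how both checks might look is given below. It assumes a small invented frame (`toy`) in place of the real data and uses the 1.5 * IQR rule, which is only one possible outlier definition and not necessarily the one this project adopts.

```python
import pandas as pd

# Invented toy data for illustration only; not taken from the Seoul dataset.
toy = pd.DataFrame({
    "Rented Bike Count": [120, 150, 130, 140, 2000, 135],
    "Humidity(%)": [40, 45, None, 50, 55, 60],
})

# Count missing values in each column
missing_per_column = toy.isna().sum()

# Flag outliers in the target with the 1.5 * IQR rule (an assumed choice)
q1 = toy["Rented Bike Count"].quantile(0.25)
q3 = toy["Rented Bike Count"].quantile(0.75)
iqr = q3 - q1
is_outlier = (
    (toy["Rented Bike Count"] < q1 - 1.5 * iqr)
    | (toy["Rented Bike Count"] > q3 + 1.5 * iqr)
)
outliers = toy[is_outlier]

print(missing_per_column)
print(outliers)
```

On the real data the same two checks could be run on `df` once it is loaded; whether flagged rows are removed, capped, or kept is a modeling decision.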



<h2>Metadata for Seoul Bike data</h2>

<h3> Attribute information</h3> 
Source: https://archive.ics.uci.edu/dataset/560/seoul+bike+sharing+demand
<ul>
    <li><b>Date</b>  - Date bike was rented</li>
    <li><b>Rented Bike count</b> - Count of bikes rented at each hour</li>
    <li><b>Hour</b> - Hour of the day</li>
    <li><b>Temperature</b> - Temperature in Celsius</li>
    <li><b>Humidity</b> - Humidity %</li>
    <li><b>Windspeed</b> - In m/s</li>
    <li><b>Visibility</b> - Visibility, in units of 10 m</li>
    <li><b>Dew point temperature</b> - Celsius</li>
    <li><b>Solar radiation</b> - MJ/m2</li>
    <li><b>Rainfall</b> - In mm</li>
    <li><b>Snowfall</b> - Snowfall (cm)</li>
    <li><b>Seasons</b> - Winter, Spring, Summer, Autumn</li>
    <li><b>Holiday</b> - Holiday/No holiday</li>
    <li><b>Functional Day</b> - NoFunc (non-functional hours), Fun (functional hours); whether the day is neither a weekend nor a holiday (Work_Day)</li>
</ul>

In [None]:
# Load packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
data = pd.read_csv('../dataset/SeoulBikeData.csv', encoding='Windows-1252')

In [None]:
data.head()

In [None]:
data.info()

<h2>Exploratory Data Analysis (EDA)</h2>

In [None]:
# Create a copy of data

df = data.copy()

In [None]:
df.shape

In [None]:
df.describe().T 

In [None]:
# Count duplicated rows
df.duplicated().sum()

In [None]:
# Checking distribution of bike rentals

sns.displot(df['Rented Bike Count'])

What does Functioning Day represent? No bikes are rented when it is 'No', only when it is 'Yes'. Can this column be dropped, since it adds no predictive value?
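One way to check this question is to aggregate rentals by the flag before deciding. The sketch below uses an invented toy frame; the column name `Functioning Day` and its `Yes`/`No` values are assumed from the text above. If every `No` row has zero rentals, those rows can be filtered out first and the column dropped afterwards.

```python
import pandas as pd

# Invented toy data for illustration only.
toy = pd.DataFrame({
    "Functioning Day": ["Yes", "Yes", "No", "Yes", "No"],
    "Rented Bike Count": [254, 204, 0, 173, 0],
})

# Row count and total rentals for each value of the flag
summary = toy.groupby("Functioning Day")["Rented Bike Count"].agg(["count", "sum"])
print(summary)

# Keep only functioning hours, then drop the now-constant column
toy_open = toy[toy["Functioning Day"] == "Yes"].drop(columns="Functioning Day")
print(toy_open)
```

Filtering the rows before dropping the column matters: otherwise the zero-rental hours stay in the data with nothing left to explain them.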

In [None]:
# Create additional features such as weekday and month, as bike rentals may also depend on these temporal features

df['Date'] = pd.to_datetime(df['Date'], format="%d/%m/%Y") 
df['month'] = df['Date'].dt.month_name()
df['Weekday'] = df['Date'].dt.day_name()
df.columns
df.dtypes

# Order months in the right order
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
#months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)
#df.sort_values(by='Date',inplace=True)

# Order week days in the right order
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['Weekday'] = pd.Categorical(df['Weekday'], categories=cats, ordered=True)
#df.sort_values(by='Date',inplace=True)

In [None]:
df.head(3)

In [None]:
# Heatmap of bike rentals by month and weekday
# ============================================================================================
df_pivot = df.groupby(['month','Weekday'])['Rented Bike Count'].sum().reset_index()
df_pivot

month_pivot=df_pivot.pivot_table(values='Rented Bike Count',index='Weekday',columns='month')

sns.set_theme(rc={'figure.figsize':(8,4)})

sns.heatmap(month_pivot, 
            cmap='Blues',
            linecolor='white',
            linewidth=0.5)

plt.ylabel("Week Day")
plt.xlabel("Month")

plt.show()

The heatmap shows that most bikes are rented from April to October, spread across all weekdays, with Friday and Saturday in June recording the highest totals.

In [None]:
# Select only numeric columns - drops Seasons, Holiday, Functioning Day, Weekday and month

df_numeric = df.select_dtypes(include=np.number)
sns.heatmap(df_numeric.corr(), annot=True)

The strong correlation between Temperature and Dew point temperature indicates a multicollinearity problem - suggest dropping the Dew point feature.
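The pairs behind such a finding can also be listed programmatically instead of being read off the heatmap. The sketch below is illustrative: it generates synthetic data in which dew point is built from temperature, the dew point column name is assumed, and the 0.9 threshold is a hypothetical cut-off rather than a value used by this project.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; dew point is derived from temperature.
rng = np.random.default_rng(0)
temperature = rng.normal(15, 10, 200)
toy = pd.DataFrame({
    "Temperature(°C)": temperature,
    "Dew point temperature(°C)": temperature - 8 + rng.normal(0, 1, 200),  # assumed column name
    "Wind speed (m/s)": rng.uniform(0, 7, 200),
})

# Absolute pairwise correlations
corr = toy.corr().abs()

# Keep only the upper triangle so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9  # hypothetical cut-off
pairs = upper.stack()
high_pairs = pairs[pairs > threshold]
print(high_pairs)
```

On the real data, `toy` would be replaced by `df_numeric`; one feature from each reported pair is then a candidate for removal.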

In [None]:
# Bike rentals by month
#===================================

df.groupby('month')['Rented Bike Count'].sum().plot(kind='bar') 
#sns.barplot(x="month", y="Rented Bike Count", data=df)
plt.title("Number of Bike rentals per month")
plt.ylabel("Rentals")
plt.xlabel("Month")

sns.despine(left=False, bottom=False)
plt.show()


In [None]:
# Bike rentals by time of day (hour)
#===================================
df.groupby('Hour')['Rented Bike Count'].sum().plot(kind='bar') 
plt.title("Number of Bike rentals per hour")
plt.ylabel("Rentals")
plt.xlabel("Time in hours")

sns.despine(left=False, bottom=False)
plt.show()

Hourly bike rentals have a bimodal distribution, with one peak in the morning and the other in the afternoon/evening.
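The peak hours can be located numerically as well. The sketch below marks an hour as a peak when its total exceeds both neighbouring hours. The `hourly_totals` values are invented for illustration and are not results from the dataset.

```python
import pandas as pd

# Invented hourly totals (index = hour of day), for illustration only.
hourly_totals = pd.Series(
    [50, 40, 30, 20, 15, 20, 40, 80, 120, 90, 70, 75,
     80, 85, 90, 100, 110, 130, 160, 140, 120, 100, 80, 60],
    index=range(24),
    name="Rented Bike Count",
)

# A local peak is an hour whose total is higher than the hour before and after
is_peak = (hourly_totals > hourly_totals.shift(1)) & (hourly_totals > hourly_totals.shift(-1))
peak_hours = hourly_totals[is_peak].index.tolist()
print(peak_hours)
```

With the real data, `hourly_totals` would be `df.groupby('Hour')['Rented Bike Count'].sum()`. Noisy series may need smoothing first, since this rule reports every small bump as a peak.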

In [None]:
# Bike rentals by temperature
#===================================

df.groupby('Temperature(°C)')['Rented Bike Count'].sum().plot() 
plt.title("Temperature")
plt.ylabel("Rentals")
plt.xlabel("Temperature(°C)")

sns.despine(left=False, bottom=False)
plt.show()

In [None]:
# sns.jointplot(data=df, x = 'Temperature(°C)', y = 'Rented Bike Count')

In [None]:
# Bike rentals by wind speed
#===================================

df.groupby('Wind speed (m/s)')['Rented Bike Count'].sum().plot() 
plt.title("Wind speed")
plt.ylabel("Rentals")
plt.xlabel("Wind speed (m/s)")

sns.despine(left=False, bottom=False)
plt.show()

In [None]:
# Bike rentals by humidity
#===================================

df.groupby('Humidity(%)')['Rented Bike Count'].sum().plot() 
plt.title("Humidity")
plt.ylabel("Rentals")
plt.xlabel("Humidity(%)")

sns.despine(left=False, bottom=False)
plt.show()

In [None]:
# What are bike rentals by the different categorical features?

df.groupby('Holiday')['Rented Bike Count'].sum().sort_values(ascending = False).reset_index()


In [None]:
Season_analysis = df.groupby('Seasons')['Rented Bike Count'].sum().sort_values(ascending = False).reset_index()

In [None]:
# Set a white theme, figure size and default palette in one call
sns.set_theme(style="white", palette="Blues_d", rc={'figure.figsize': (8, 4)})

sns.barplot(x='Seasons', y='Rented Bike Count', data=Season_analysis,
            palette='rainbow',
            hue="Seasons")

plt.title("Bike Rentals by Season")
plt.ylabel("Number of Rented Bike")
plt.xlabel("Seasons")

# Remove Top and Right borders
sns.despine(left=False, bottom=False)

plt.show()

Fewer bikes are rented in Winter than in any other season.

In [None]:
# df.groupby('month')['Rented Bike Count'].sum().sort_values(ascending = False).reset_index()

In [None]:
# df.groupby('Weekday')['Rented Bike Count'].sum().sort_values(ascending = False).reset_index()

In [None]:
# df.groupby('Hour')['Rented Bike Count'].sum().sort_values(ascending = False).reset_index()

In [None]:
# (df.groupby(["Seasons","Humidity(%)", "Wind speed (m/s)"])['Rented Bike Count'].sum()).T