In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv("bike_sharing.csv")

In [4]:
df.head()

,datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
0,2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
1,2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
2,2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32
3,2011-01-01 03:00:00,1,0,0,1,9.84,14.395,75,0.0,3,10,13
4,2011-01-01 04:00:00,1,0,0,1,9.84,14.395,75,0.0,0,1,1


## <font color='purple'>1. Define Problem Statement and perform Exploratory Data Analysis.</font>

### <font color='purple'>Definition of problem (as per given problem statement with additional views)</font>

### Problem Statement

As cities continue to expand, sustainable transportation options like **bike-sharing systems** have become increasingly popular. These systems allow users to rent bicycles for short trips, reducing congestion, pollution, and dependency on private vehicles. 

However, one major challenge faced by bike-sharing companies is **understanding and predicting the demand for bikes**. Demand varies based on multiple factors such as **season, weather conditions, temperature, humidity, working days, holidays, and time of the year**. 

The main goal is to analyze this dataset to:
- Understand **how different factors influence the number of bikes rented**.
- Identify **patterns and trends** in user behavior.
- Provide **data-driven recommendations** to optimize operations, bike availability, and customer satisfaction.

This analysis aims to help management make better business decisions such as:
- **Forecasting demand** for future planning.
- **Improving fleet management** by ensuring bikes are available when and where they’re needed most.
- **Designing marketing strategies** based on customer usage patterns.
- **Enhancing profitability** by aligning resources with customer demand patterns.

By combining exploratory data analysis and statistical insights, this project will deliver a **clear understanding of demand behavior** and guide **strategic business decisions** for improved efficiency and customer experience.


**Objectives**
1. Analyze the key factors that affect bike rental counts.
2. Identify seasonal and temporal trends influencing usage.
3. Understand how weather variables (temperature, humidity, windspeed) impact rentals.
4. Recommend strategies to optimize inventory and operations.



### <font color='purple'>Observations on the shape of the data, data types of all attributes, conversion of categorical attributes to 'category' (if required), missing value detection, and statistical summary.</font>

In [6]:
# Unpack the (rows, columns) tuple so the message reads cleanly
n_rows, n_cols = df.shape
print(f"The dataset has {n_rows} rows and {n_cols} columns.")

The dataset has 10886 rows and 12 columns.


In [5]:
df.dtypes

datetime       object
season          int64
holiday         int64
workingday      int64
weather         int64
temp          float64
atemp         float64
humidity        int64
windspeed     float64
casual          int64
registered      int64
count           int64
dtype: object

In [8]:
df.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   datetime    10886 non-null  object 
 1   season      10886 non-null  int64  
 2   holiday     10886 non-null  int64  
 3   workingday  10886 non-null  int64  
 4   weather     10886 non-null  int64  
 5   temp        10886 non-null  float64
 6   atemp       10886 non-null  float64
 7   humidity    10886 non-null  int64  
 8   windspeed   10886 non-null  float64
 9   casual      10886 non-null  int64  
 10  registered  10886 non-null  int64  
 11  count       10886 non-null  int64  
dtypes: float64(3), int64(8), object(1)
memory usage: 1.7 MB


In [21]:
# Convert the integer-coded categorical columns to the 'category' dtype in one step
cat_cols = ['season', 'holiday', 'workingday', 'weather']
df[cat_cols] = df[cat_cols].astype('category')


In [27]:
df.describe()

,temp,atemp,humidity,windspeed,casual,registered,count
count,10886.0,10886.0,10886.0,10886.0,10886.0,10886.0,10886.0
mean,20.23086,23.655084,61.88646,12.799395,36.021955,155.552177,191.574132
std,7.79159,8.474601,19.245033,8.164537,49.960477,151.039033,181.144454
min,0.82,0.76,0.0,0.0,0.0,0.0,1.0
25%,13.94,16.665,47.0,7.0015,4.0,36.0,42.0
50%,20.5,24.24,62.0,12.998,17.0,118.0,145.0
75%,26.24,31.06,77.0,16.9979,49.0,222.0,284.0
max,41.0,45.455,100.0,56.9969,367.0,886.0,977.0
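
In the summary above, the means of `casual` and `registered` add up to the mean of `count` (36.02 + 155.55 = 191.57), which suggests that `count` is the row-wise sum of the two. The following is a minimal sketch of how that could be verified, using the five rows shown by `df.head()` as a stand-in for the full DataFrame. Whether the relationship holds for every row of the real dataset is an assumption until the check is run on `df` itself.

```python
import pandas as pd

# Stand-in for df: values copied from the df.head() output
sample = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# Use bracket access for "count": sample.count is a DataFrame method
row_sum = sample["casual"] + sample["registered"]
mismatches = sample[row_sum != sample["count"]]

print(f"Rows where casual + registered != count: {len(mismatches)}")
```

If the number of mismatches is zero on the full dataset, `casual` and `registered` can be treated as a decomposition of `count` rather than as independent predictors.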


In [30]:
df.describe(include="category")

,season,holiday,workingday,weather
count,10886,10886,10886,10886
unique,4,2,2,4
top,4,0,1,1
freq,2734,10575,7412,7192
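
The summary above reports the categorical columns by integer code (for example, the most frequent `season` is `4`), which is hard to read in tables and plots. Below is a small sketch of how the codes could be replaced with labels using `Series.cat.rename_categories`. The mapping `season_labels` is an assumption based on the commonly published data dictionary for this bike-sharing dataset; it does not appear in this notebook and should be confirmed against the dataset's documentation before use.

```python
import pandas as pd

# ASSUMED mapping (not stated in this notebook): verify against the data dictionary
season_labels = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}

# Stand-in for df["season"] after conversion to the 'category' dtype
season = pd.Series([1, 1, 2, 3, 4, 4], dtype="category")

# rename_categories relabels the categories without touching the underlying codes
season_named = season.cat.rename_categories(season_labels)

print(season_named.value_counts())
```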


In [31]:
df["datetime"] = pd.to_datetime(df["datetime"])
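
Once `datetime` has a proper datetime dtype, the `.dt` accessor can expose the temporal components needed to study hourly and seasonal trends. The sketch below uses three timestamps from `df.head()` as a stand-in. The derived column names `hour`, `month`, and `weekday` are illustrative choices and are not used elsewhere in this notebook.

```python
import pandas as pd

# Stand-in for df: timestamps and counts copied from the df.head() output
sample = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00", "2011-01-01 02:00:00"],
    "count": [16, 40, 32],
})
sample["datetime"] = pd.to_datetime(sample["datetime"])

# Hypothetical feature columns derived through the .dt accessor
sample["hour"] = sample["datetime"].dt.hour
sample["month"] = sample["datetime"].dt.month
sample["weekday"] = sample["datetime"].dt.day_name()

print(sample)
```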

In [32]:
df.info(memory_usage="deep")

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   datetime    10886 non-null  datetime64[ns]
 1   season      10886 non-null  category      
 2   holiday     10886 non-null  category      
 3   workingday  10886 non-null  category      
 4   weather     10886 non-null  category      
 5   temp        10886 non-null  float64       
 6   atemp       10886 non-null  float64       
 7   humidity    10886 non-null  int64         
 8   windspeed   10886 non-null  float64       
 9   casual      10886 non-null  int64         
 10  registered  10886 non-null  int64         
 11  count       10886 non-null  int64         
dtypes: category(4), datetime64[ns](1), float64(3), int64(4)
memory usage: 723.7 KB
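
The deep memory footprint drops from 1.7 MB to 723.7 KB after the conversions, because `datetime` is no longer stored as Python strings and the four categorical columns store compact codes instead of 64-bit integers. The following sketch reproduces the effect on synthetic data; the frame `toy` and its size are made up for illustration and are not the bike-sharing data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Synthetic frame: timestamps stored as strings, season stored as int64
stamps = pd.Timestamp("2011-01-01") + pd.to_timedelta(np.arange(n), unit="h")
toy = pd.DataFrame({
    "datetime": stamps.strftime("%Y-%m-%d %H:%M:%S"),
    "season": rng.integers(1, 5, size=n),
})
before = toy.memory_usage(deep=True).sum()

# Same conversions as in the notebook: to_datetime and astype('category')
converted = toy.assign(
    datetime=pd.to_datetime(toy["datetime"]),
    season=toy["season"].astype("category"),
)
after = converted.memory_usage(deep=True).sum()

print(f"before: {before / 1024:.1f} KB, after: {after / 1024:.1f} KB")
```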


In [22]:
df.columns

Index(['datetime', 'season', 'holiday', 'workingday', 'weather', 'temp',
       'atemp', 'humidity', 'windspeed', 'casual', 'registered', 'count'],
      dtype='object')

In [23]:
df["season"].value_counts()

4    2734
2    2733
3    2733
1    2686
Name: season, dtype: int64

In [24]:
df["holiday"].value_counts()

0    10575
1      311
Name: holiday, dtype: int64

In [25]:
df["workingday"].value_counts()

1    7412
0    3474
Name: workingday, dtype: int64

In [26]:
df["weather"].value_counts()

1    7192
2    2834
3     859
4       1
Name: weather, dtype: int64
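
Weather category `4` occurs only once in 10886 rows, so any group-wise statistic or hypothesis test involving it would rest on a single observation. The sketch below flags such sparse categories from the counts shown above. The 1% cutoff `min_share` is an arbitrary illustrative threshold, not something prescribed by the dataset.

```python
import pandas as pd

# Counts copied from the df["weather"].value_counts() output
weather_counts = pd.Series({1: 7192, 2: 2834, 3: 859, 4: 1}, name="weather")

# Share of all rows that falls into each category
share = weather_counts / weather_counts.sum()

min_share = 0.01  # hypothetical cutoff for "too rare to analyze on its own"
rare = share[share < min_share].index.tolist()

print(share.round(4))
print("Rare categories:", rare)
```

A category flagged this way could be merged with its nearest neighbor or excluded from group comparisons, depending on the analysis.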

### Missing value detection
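
The `df.info()` output earlier reports 10886 non-null entries for every column, so no missing values are expected. An explicit check is still worth running; the following is a minimal sketch in which `missing_report` is a hypothetical helper and `sample` is a small stand-in for `df` built from values in `df.head()`.

```python
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = frame.isna().sum()
    percent = (counts / len(frame) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})

# Stand-in for df: values copied from the df.head() output
sample = pd.DataFrame({
    "temp": [9.84, 9.02, 9.02],
    "humidity": [81, 80, 80],
    "count": [16, 40, 32],
})

report = missing_report(sample)
print(report)
```

On the real data, `missing_report(df)` should show zeros in both columns if the `df.info()` counts are accurate.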