# Submission - Bike Sharing

Name : Chalil Al Vareel <br>
From : Universitas Sumatera Utara<br>
Dicoding ID : cavareel<br>


## Business Question

1. How does bicycle usage differ across weekdays, working days, and holidays?
2. Does the weather play a role in the number of bicycle users?
3. Does the season affect the number of bike users? If so, which season has the most bicycle rentals?

Based on these questions, we use only one dataset _(day.csv)_, because daily totals are the most relevant to our case.

## Data Wrangling

### Import Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
df = pd.read_csv("data/day.csv")
df.head()

## Assessing Data

### Checking table data

In [None]:
df.info()

In [None]:
df.isna().sum()

In [None]:
print("Duplicate: ", df.duplicated().sum())


In [None]:
df.describe()
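The checks above can be collected into one small helper so they are easy to rerun after each cleaning step. This is a minimal sketch: `summarize_quality` and the toy frame are hypothetical stand-ins, not part of the original notebook, and the values are illustrative only.

```python
import pandas as pd


def summarize_quality(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),
    })


# Toy stand-in for day.csv (illustrative values only).
toy = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-02", "2011-01-03"],
    "cnt": [985, 801, 801, 1349],
    "hum": [0.81, 0.70, 0.70, None],
})

report = summarize_quality(toy)
n_duplicates = int(toy.duplicated().sum())

print(report)
print("Duplicate rows:", n_duplicates)
```

On the real data, the same call would be `summarize_quality(df)`.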

### Deleting unused column

In [None]:
drop_col = ['instant', 'windspeed']

# Drop the listed columns in one call; errors='ignore' skips any that are already gone
df.drop(columns=drop_col, inplace=True, errors='ignore')

In [None]:
df.head()

### Changing value detail and data type

In [None]:
# Changing column name
df.rename(columns={
    'dteday': 'date',
    'yr': 'year',
    'mnth': 'month',
    'weathersit': 'weather_condition',
    'cnt': 'count',
    'hum': 'humidity'
}, inplace=True)

df.head()

In [None]:
# Changing value detail for column weekday, month, season, and weather condition
df['weekday'] = df['weekday'].map({
    0: 'Sun', 1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat'
})

df['month'] = df['month'].map({
    1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
    7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'
})

df['season'] = df['season'].map({
    1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'
})

df['weather_condition'] = df['weather_condition'].map({
    1: 'Clear/Partly Cloudy',
    2: 'Misty/Cloudy',
    3: 'Light Snow/Light Rain',
    4: 'Severe Weather'
})

In [None]:
# Change data type into datetime
df['date'] = pd.to_datetime(df.date)

# Change data type into categorical
df['season'] = df.season.astype('category')
df['year'] = df.year.astype('category')
df['month'] = df.month.astype('category')
df['holiday'] = df.holiday.astype('category')
df['weekday'] = df.weekday.astype('category')
df['workingday'] = df.workingday.astype('category')
df['weather_condition'] = df.weather_condition.astype('category')
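One side effect of a plain `astype('category')` on string labels is that the categories are sorted alphabetically, so weekdays and months appear out of calendar order in `groupby` tables and bar plots. A possible alternative, sketched here on a toy frame, is an ordered `CategoricalDtype`; the name `WEEKDAY_ORDER` is an assumption introduced for illustration.

```python
import pandas as pd

# Assumed display order; the labels match the weekday mapping used above.
WEEKDAY_ORDER = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Toy data (illustrative values only).
toy = pd.DataFrame({
    "weekday": ["Wed", "Sun", "Fri", "Mon", "Sun"],
    "count": [30, 10, 50, 20, 14],
})

# A plain category conversion sorts the labels alphabetically.
alphabetical = toy["weekday"].astype("category")

# An ordered dtype keeps calendar order in groupby output and plots.
weekday_dtype = pd.CategoricalDtype(categories=WEEKDAY_ORDER, ordered=True)
toy["weekday"] = toy["weekday"].astype(weekday_dtype)

mean_by_day = toy.groupby("weekday", observed=True)["count"].mean()

print(list(alphabetical.cat.categories))
print(mean_by_day)
```

The same idea would apply to `month` with a January-to-December list.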

In [None]:
df.info()

In [None]:
df.head()

## EDA

### Data Exploration

1. Grouping bike users (both casual and registered) based on weekday use

In [None]:
df.groupby(by='weekday').agg({
    'count': ['max', 'min', 'mean']
})

Based on the result above, the days ranked by average number of renters, from largest to smallest, are: Friday (Fri), Thursday (Thu), Saturday (Sat), Wednesday (Wed), Tuesday (Tue), Monday (Mon), and Sunday (Sun).

2. Grouping bike users (both casual and registered) based on working day

In [None]:
df.groupby(by='workingday').agg({
    'count': ['max', 'min', 'mean', 'sum']
})

Based on the results above, people use bikes more on working days than on non-working days, although the difference is small.
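To make the size of that difference concrete, the gap between the two means can be expressed as a percentage. The sketch below uses made-up numbers in a toy frame; `relative_gap_pct` is a hypothetical name, and the real figure would come from running the same lines on `df`.

```python
import pandas as pd

# Toy data (illustrative values only): 0 = non-working day, 1 = working day.
toy = pd.DataFrame({
    "workingday": [0, 0, 1, 1, 1],
    "count": [4000, 4400, 4500, 4700, 4600],
})

means = toy.groupby("workingday")["count"].mean()

# Relative gap of working days over non-working days, in percent.
relative_gap_pct = (means.loc[1] - means.loc[0]) / means.loc[0] * 100

print(means)
print(f"Working days are {relative_gap_pct:.1f}% higher on average")
```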

3. Grouping bike users (both casual and registered) based on holiday use

In [None]:
df.groupby(by='holiday').agg({
    'count': ['max', 'min', 'mean', 'sum']
})

Based on the result above, people tend to rent more on non-holidays than on holidays.

4. Grouping bike users (both casual and registered) based on month

In [None]:
df.groupby(by='month').agg({
    'count': ['max', 'min', 'mean', 'sum']
})

# .sort_values(by='month', ascending=True)

Based on the results above, the month with the largest average and the largest sum is June, while the month with the smallest average and the smallest sum is January.
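Because months differ in length, the month with the highest mean is not guaranteed to have the highest sum, so it can be worth reading off each ranking separately. A minimal sketch with `idxmax`, on a toy frame with made-up numbers chosen so that the two rankings disagree:

```python
import pandas as pd

# Toy data (illustrative values only); "Aug" has one more day than "Jun" here.
toy = pd.DataFrame({
    "month": ["Jan", "Jan", "Jun", "Jun", "Aug", "Aug", "Aug"],
    "count": [100, 120, 500, 520, 340, 350, 360],
})

stats = toy.groupby("month")["count"].agg(["mean", "sum"])

# idxmax returns the index label (the month) of the largest value.
best_by_mean = stats["mean"].idxmax()
best_by_sum = stats["sum"].idxmax()

print(stats)
print("Highest mean:", best_by_mean, "| highest sum:", best_by_sum)
```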

5. Grouping bike users (both casual and registered) based on weather

In [None]:
df.groupby(by='weather_condition').agg({
    'count': ['max', 'min', 'mean', 'sum']
})

Based on the results above, people tend to rent when the weather is clear or cloudy. Very few rent when it snows or rains lightly, and no rentals are recorded in severe weather.

6. Analyzing temp, atemp, and humidity by season
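A category that has no rows at all is simply missing from the `groupby` output, which is not the same as days with zero rentals. One way to tell the two apart is to count the days per weather code before mapping. The sketch below uses a toy frame; `WEATHER_LABELS` mirrors the mapping used earlier, and whether `day.csv` contains any severe-weather days is something to verify on the real data.

```python
import pandas as pd

WEATHER_LABELS = {
    1: "Clear/Partly Cloudy",
    2: "Misty/Cloudy",
    3: "Light Snow/Light Rain",
    4: "Severe Weather",
}

# Toy data (illustrative values only); code 4 never occurs here.
toy = pd.DataFrame({"weathersit": [1, 1, 2, 3, 2, 1]})

days_per_condition = (
    toy["weathersit"]
    .value_counts()
    .reindex(list(WEATHER_LABELS), fill_value=0)  # keep codes with no rows
    .rename(index=WEATHER_LABELS)                 # swap codes for readable labels
)

print(days_per_condition)
```

A count of 0 days for a condition means there is no data for it, so nothing can be concluded about rentals under that condition.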

In [None]:
df.groupby(by='season').agg({
    'temp': ['max', 'min', 'mean'],
    'atemp': ['max', 'min', 'mean'],
    'humidity': ['max', 'min', 'mean']
    })

From the results above, we can conclude the following:

- The highest average humidity occurs in winter, followed by fall, summer, and spring.
- The highest average temp and atemp occur in fall, followed by summer, winter, and spring.

7. Grouping bicycle users (both casual and registered) based on season

In [None]:
df.groupby(by='season').agg({
    'count': ['max', 'min', 'mean', 'sum']
})

Based on the result above, the seasons ranked by average number of renters, from largest to smallest, are: Fall, Summer, Winter, and Spring.

8. Looking for correlations between the numeric variables, including casual and registered users

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
correlation_matrix = df.corr(numeric_only=True)
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))

sns.heatmap(
    correlation_matrix,
    annot=True,
    mask=mask,
    cmap="viridis",
    center=0,
    fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

From the results above, we can conclude the following:

- casual is moderately correlated with temp and atemp (0.54), and has a slight negative correlation with humidity (-0.08).
- registered follows the same pattern as casual, and is moderately correlated with casual (0.40).
- atemp and temp are highly correlated (0.99).
- humidity has a weak correlation with temp and atemp (0.13 and 0.14).
- count is strongly correlated with temp, atemp, casual, and registered (0.63, 0.63, 0.67, and 0.95), and has a slight negative correlation with humidity (-0.10).
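Reading values off a heatmap is error-prone, so the correlations with `count` can also be pulled out and ranked directly. This is a sketch on synthetic data generated with an assumed linear relationship; the column names follow the renamed notebook columns, and the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumption for illustration): count rises with temp
# and falls slightly with humidity, plus random noise.
rng = np.random.default_rng(0)
temp = rng.uniform(0, 1, 200)
humidity = rng.uniform(0, 1, 200)
toy = pd.DataFrame({
    "temp": temp,
    "humidity": humidity,
    "count": 4000 * temp - 500 * humidity + rng.normal(0, 300, 200),
})

corr_with_count = (
    toy.corr(numeric_only=True)["count"]
    .drop("count")                              # drop the trivial self-correlation
    .sort_values(key=np.abs, ascending=False)   # strongest first, regardless of sign
)

print(corr_with_count.round(2))
```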

## Explanatory Analysis & Conclusion

### 1. How does bicycle usage differ across weekdays, working days, and holidays?

In [None]:
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(30, 20))

# Weekday
sns.barplot(
    data=df,
    x='weekday',
    y='count',
    ax=axes[0])
axes[0].set_title('Bike Users in Weekday')
axes[0].set_xlabel('Days')
axes[0].set_ylabel('Bike Users')

# Workingday
sns.barplot(
    data=df,
    x='workingday',
    y='count',
    ax=axes[1])
axes[1].set_title('Bike Users in Workingday')
axes[1].set_xlabel('Workingday')
axes[1].set_ylabel('Bike Users')

# Holiday
sns.barplot(
    data=df,
    x='holiday',
    y='count',
    ax=axes[2])
axes[2].set_title('Bike Users in Holiday')
axes[2].set_xlabel('Holiday')
axes[2].set_ylabel('Bike Users')

plt.tight_layout()
plt.show()

Based on the visualization above, we can conclude that:
- Friday is the day with the most bike renters, and Sunday is the day with the fewest.
- There are more bicycle renters on working days than on non-working days. In the `workingday` column, 0 indicates a weekend or holiday and 1 indicates a working day.
- The number of bicycle renters is much higher on non-holidays than on holidays.

### 2. Does the weather play a role in the number of bicycle users?

In [None]:
plt.figure(figsize=(12, 6))
sns.barplot(
    data=df,
    x='weather_condition',
    y='count'
    )

plt.title('Bike Users based on Weather Condition')
plt.xlabel('Weather Condition')
plt.ylabel('Bike Users')
plt.show()

Based on the visualization above, weather plays a huge role in bike sharing. People tend to rent bikes in `Clear/Partly Cloudy` and `Misty/Cloudy` conditions rather than in `Light Snow/Light Rain`, and no rentals are recorded in `Severe Weather`.

### 3. Does the season affect the number of bike users? If so, which season has the most bicycle rentals?

In [None]:
plt.figure(figsize=(10, 6))

seasonal_usage = df.groupby(
    'season')[['registered', 'casual']].sum().reset_index()

plt.bar(
    seasonal_usage['season'],
    seasonal_usage['registered'],
    label='Registered',
    color='tab:green'
)

plt.bar(
    seasonal_usage['season'],
    seasonal_usage['casual'],
    label='Casual',
    color='tab:orange'
)

plt.title('Bike Users based on Season')
plt.legend()
plt.show()

Based on the visualization above, both registered and casual users rent bikes most in Fall, followed by Summer, Winter, and Spring. In conclusion, season plays a big role in bike rentals.
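Two `plt.bar` calls on the same x positions draw both series from zero, one in front of the other, so the tallest visible bar shows the registered total rather than the combined total. If a stacked view is preferred, the `bottom=` argument can place the casual bars on top of the registered ones. A minimal sketch on a toy frame with made-up numbers:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy seasonal totals (illustrative values only).
toy = pd.DataFrame({
    "season": ["Spring", "Summer", "Fall", "Winter"],
    "registered": [400, 700, 800, 700],
    "casual": [60, 200, 220, 130],
})

fig, ax = plt.subplots(figsize=(10, 6))

# Registered users form the base of each bar.
ax.bar(toy["season"], toy["registered"],
       label="Registered", color="tab:green")

# bottom= starts the casual bars where the registered bars end.
ax.bar(toy["season"], toy["casual"], bottom=toy["registered"],
       label="Casual", color="tab:orange")

ax.set_title("Bike Users based on Season (stacked)")
ax.set_ylabel("Bike Users")
ax.legend()

# Combined height of each stacked bar.
totals = toy["registered"] + toy["casual"]
print(totals.tolist())
```

In a notebook the figure renders at the end of the cell; in a script, `plt.show()` or `fig.savefig(...)` would be needed.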

### Conclusion

- Q1<br>
The answer has three parts. By working day: working days see more users than non-working days. The gap is not large, but it is still visible, and both groups average above 4,000 users. By holiday: non-holidays clearly outperform holidays, so it is possible that users prefer to rent bicycles on weekdays. By weekday: Friday (Fri) ranks first with a mean of 4690.29 users, and Sunday (Sun) ranks last with 4228.83.

- Q2<br>
There is a clear relationship between weather and the number of users. Bicycle renters prefer `Clear/Partly Cloudy` weather, followed by `Misty/Cloudy`, and they rent least in `Light Snow/Light Rain`. No rentals are recorded for `Severe Weather`, presumably because cycling is impractical in such conditions.

- Q3<br>
The result for the last question is somewhat surprising: Spring, not Winter, is the season with the fewest bike users. This may be due to many factors, such as weather conditions, air temperature, working days, and holidays. Fall takes first place, with more than 800,000 bike rentals.