# Data Analytics Project: Predicting Bike-Sharing Rentals from Seasonality and Weather (2011-2012)

- Name: Stefanus Bernard Melkisedek
- Email: stefanussipahutar@gmail.com
- Id Dicoding: stefansphtr

## Define Business Questions

| **NO** | **BUSINESS QUESTION**                                                                                          |
| :----: | -------------------------------------------------------------------------------------------------------------- |
|   1.   | What is the impact of temperature on bike rentals?                                                             |
|   2.   | In which seasons do we see the most significant shifts in rental volume?                                       |
|   3.   | Are there weekday-specific events that significantly affect rental volume in our area?                        |
|   4.   | Beyond temperature, which weather conditions (e.g., rain, wind) have the most significant impact on rentals?  |
|   5.   | Based on historical data, can we develop a reliable model to predict daily or weekly bike rental demand?      |
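As a minimal sketch of how question 1 might be quantified once the data is loaded, the function below computes Pearson and Spearman correlations between the `temp` and `cnt` columns using `scipy.stats`, which this notebook already imports. The function name `temperature_rental_correlation` is hypothetical, and a correlation coefficient is only a first look at the relationship, not a causal estimate of temperature's effect.

```python
import pandas as pd
from scipy import stats


def temperature_rental_correlation(df: pd.DataFrame,
                                   temp_col: str = "temp",
                                   count_col: str = "cnt") -> dict:
    """Pearson (linear) and Spearman (rank) correlation between temperature and rentals.

    Hypothetical helper; assumes `df` has the `temp` and `cnt` columns
    found in day.csv and hour.csv.
    """
    # Drop rows where either value is missing so both tests see the same data.
    clean = df[[temp_col, count_col]].dropna()
    pearson_r, pearson_p = stats.pearsonr(clean[temp_col], clean[count_col])
    spearman_rho, spearman_p = stats.spearmanr(clean[temp_col], clean[count_col])
    return {
        "pearson_r": float(pearson_r),
        "pearson_p": float(pearson_p),
        "spearman_rho": float(spearman_rho),
        "spearman_p": float(spearman_p),
    }
```

Once `day_df` is loaded, `temperature_rental_correlation(day_df)` gives both coefficients; Spearman is included because it does not assume the relationship is linear.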

## Prepare all the libraries needed

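```python
# Data handling
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical tests and modeling
from scipy import stats
import statsmodels.api as sm

# Project-local helper from module/plot_missing_value.py (not a PyPI package)
from module.plot_missing_value import percentage as pmv
```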
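The `percentage` helper imported from `module.plot_missing_value` lives in the project repository and is not shown in this notebook. Assuming it reports the share of missing values per column (as its module name suggests), a hypothetical stand-in could look like the sketch below; the real helper's signature and output may differ, and it may also plot the result.

```python
import pandas as pd


def missing_value_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, highest first.

    Hypothetical stand-in for module.plot_missing_value.percentage;
    the real helper's signature and output may differ.
    """
    # isna() gives a boolean frame; its column means are the missing fractions.
    percent = df.isna().mean() * 100
    return percent.sort_values(ascending=False)


# Usage once the data is loaded, e.g. missing_value_percentage(day_df)
```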

## Data Wrangling

The data comes from the "Bike Sharing Dataset" on Kaggle, uploaded by LAKSHMIPATHI N and available [here](https://www.kaggle.com/datasets/lakshmi25npathi/bike-sharing-dataset).

### Gathering data

1. Read the `day.csv` data and save it to the variable `day_df`

```python
day_df = pd.read_csv('./data/day.csv')
day_df.head()
```

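|   | instant | dteday | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
| -: | -: | :- | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: |
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 6 | 0 | 2 | 0.344167 | 0.363625 | 0.805833 | 0.160446 | 331 | 654 | 985 |
| 1 | 2 | 2011-01-02 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0.363478 | 0.353739 | 0.696087 | 0.248539 | 131 | 670 | 801 |
| 2 | 3 | 2011-01-03 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.196364 | 0.189405 | 0.437273 | 0.248309 | 120 | 1229 | 1349 |
| 3 | 4 | 2011-01-04 | 1 | 0 | 1 | 0 | 2 | 1 | 1 | 0.2 | 0.212122 | 0.590435 | 0.160296 | 108 | 1454 | 1562 |
| 4 | 5 | 2011-01-05 | 1 | 0 | 1 | 0 | 3 | 1 | 1 | 0.226957 | 0.22927 | 0.436957 | 0.1869 | 82 | 1518 | 1600 |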
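A minimal sketch, assuming the column layout shown in the preview above: reading `day.csv` with `parse_dates` turns `dteday` into a datetime column, and two quick checks confirm that the expected columns are present and that `cnt` equals `casual + registered` on every row (true for the preview rows, e.g. 331 + 654 = 985). The function name `load_day_data` is hypothetical; the demo reuses the five preview rows through `io.StringIO` instead of the real file.

```python
import io

import pandas as pd

# Columns shown in the day.csv preview.
EXPECTED_DAY_COLUMNS = [
    "instant", "dteday", "season", "yr", "mnth", "holiday", "weekday",
    "workingday", "weathersit", "temp", "atemp", "hum", "windspeed",
    "casual", "registered", "cnt",
]


def load_day_data(source) -> pd.DataFrame:
    """Read day.csv (path or file-like), parse `dteday`, and run sanity checks."""
    df = pd.read_csv(source, parse_dates=["dteday"])

    missing = sorted(set(EXPECTED_DAY_COLUMNS) - set(df.columns))
    if missing:
        raise ValueError(f"day data is missing columns: {missing}")

    # Total rentals should equal casual + registered riders on every row.
    mismatched = (df["casual"] + df["registered"]) != df["cnt"]
    if mismatched.any():
        raise ValueError(f"{int(mismatched.sum())} rows where casual + registered != cnt")
    return df


# Demo on the five preview rows; on the real file use load_day_data('./data/day.csv').
SAMPLE_DAY_CSV = """instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
"""

sample_day = load_day_data(io.StringIO(SAMPLE_DAY_CSV))
print(sample_day.dtypes)
```

Raising an error instead of silently dropping rows keeps any data problem visible before the analysis starts.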


2. Read the `hour.csv` data and save it to the variable `hour_df`

```python
hour_df = pd.read_csv('./data/hour.csv')
hour_df.head()
```

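|   | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
| -: | -: | :- | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: |
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |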
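The hourly file can be cross-checked against the daily one. The sketch below, with hypothetical helper names `hourly_to_daily_counts` and `compare_with_day_data`, sums the hourly counts per date, records how many distinct hours each date has, and joins the result to `cnt` from `day.csv`. It assumes `day.csv` is the daily aggregate of `hour.csv`, in which case the totals should match on the full files. The demo uses only the five preview hours of 2011-01-01, so the totals deliberately do not match (102 vs. 985).

```python
import pandas as pd


def hourly_to_daily_counts(hour_df: pd.DataFrame) -> pd.DataFrame:
    """Sum hourly rentals per date and count how many distinct hours each date has."""
    # Parse dates here so the join below works whether or not dteday was parsed on read.
    grouped = hour_df.groupby(pd.to_datetime(hour_df["dteday"]))
    daily = grouped[["casual", "registered", "cnt"]].sum()
    daily["hours_present"] = grouped["hr"].nunique()
    return daily


def compare_with_day_data(hour_df: pd.DataFrame, day_df: pd.DataFrame) -> pd.DataFrame:
    """Join aggregated hourly totals with day.csv `cnt` and flag disagreements."""
    daily = hourly_to_daily_counts(hour_df)
    day_cnt = (
        day_df.assign(dteday=pd.to_datetime(day_df["dteday"]))
        .set_index("dteday")["cnt"]
        .rename("cnt_day")
    )
    merged = daily.join(day_cnt, how="left")
    merged["cnt_matches"] = merged["cnt"] == merged["cnt_day"]
    return merged


# Demo: the five preview hours of 2011-01-01 and that date's daily total (985).
# On the real data: compare_with_day_data(hour_df, day_df)
sample_hour = pd.DataFrame({
    "dteday": ["2011-01-01"] * 5,
    "hr": [0, 1, 2, 3, 4],
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "cnt": [16, 40, 32, 13, 1],
})
sample_day_cnt = pd.DataFrame({"dteday": ["2011-01-01"], "cnt": [985]})

comparison = compare_with_day_data(sample_hour, sample_day_cnt)
print(comparison)
```

The `hours_present` column also shows at a glance which dates, if any, have fewer than 24 hourly records.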
