# Data Analysis Project: Bike Sharing Dataset
- **Name:** Ridwan Halim
- **Email:** ridwaanhall.dev@gmail.com (old version: erbyl14@gmail.com)
- **Dicoding ID:** ridwaanhall


## Determining Business Questions

- In which season are bikes rented the most and the least?
- What is the relationship between weather conditions and the total bike rentals (`cnt`)?
- How do bike rental trends differ between working days (`workingday`) and weekends/holidays (`holiday`)?

## Import All Packages/Libraries Used

In [9]:
import pandas as pd

## Data Wrangling

### Gathering Data

In [19]:
data_day = pd.read_csv('data/day.csv')
data_day.head()

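| | instant | dteday | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 6 | 0 | 2 | 0.344167 | 0.363625 | 0.805833 | 0.160446 | 331 | 654 | 985 |
| 1 | 2 | 2011-01-02 | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0.363478 | 0.353739 | 0.696087 | 0.248539 | 131 | 670 | 801 |
| 2 | 3 | 2011-01-03 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0.196364 | 0.189405 | 0.437273 | 0.248309 | 120 | 1229 | 1349 |
| 3 | 4 | 2011-01-04 | 1 | 0 | 1 | 0 | 2 | 1 | 1 | 0.2 | 0.212122 | 0.590435 | 0.160296 | 108 | 1454 | 1562 |
| 4 | 5 | 2011-01-05 | 1 | 0 | 1 | 0 | 3 | 1 | 1 | 0.226957 | 0.22927 | 0.436957 | 0.1869 | 82 | 1518 | 1600 |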


In [20]:
data_hour = pd.read_csv('data/hour.csv')
data_hour.head()

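| | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |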


### Assessing Data

#### data_day

##### Get info from `data_day`

In [33]:
data_day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


**Insight:**

- The dataset is a `pandas` DataFrame.
- It contains 731 entries.
- There are 16 columns in total.
- No column has missing values.
- Data types:
  - 4 columns of type `float64`: `temp`, `atemp`, `hum`, `windspeed`
  - 11 columns of type `int64`: `instant`, `season`, `yr`, `mnth`, `holiday`, `weekday`, `workingday`, `weathersit`, `casual`, `registered`, `cnt`
  - 1 column of type `object`: `dteday`
- Memory usage: 91.5+ KB

---

**Issues to address:**

- Column `dteday` has type `object` even though it holds dates (for example `2011-01-01`), so it should be converted to the `datetime` data type.
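The `dteday` conversion noted above could be sketched as follows. This is a minimal example on three rows copied from the `data_day.head()` output, and `sample_day` is a hypothetical name used only for illustration. Passing an explicit `format` is an assumption that every value follows the `YYYY-MM-DD` pattern seen in the preview.

```python
import pandas as pd

# Rows copied from the data_day.head() preview; only two columns are kept.
sample_day = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02", "2011-01-03"],
    "cnt": [985, 801, 1349],
})

# Parse the date strings. With an explicit format, a value that does not
# match YYYY-MM-DD raises an error instead of being silently guessed.
sample_day["dteday"] = pd.to_datetime(sample_day["dteday"], format="%Y-%m-%d")

print(sample_day.dtypes)
```

The same call should apply to the full `data_day` frame, provided the whole column uses that date pattern.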

##### Get describe from `data_day`

In [38]:
data_day.describe()

| | instant | season | yr | mnth | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 | 731.0 |
| mean | 366.0 | 2.49658 | 0.500684 | 6.519836 | 0.028728 | 2.997264 | 0.683995 | 1.395349 | 0.495385 | 0.474354 | 0.627894 | 0.190486 | 848.176471 | 3656.172367 | 4504.348837 |
| std | 211.165812 | 1.110807 | 0.500342 | 3.451913 | 0.167155 | 2.004787 | 0.465233 | 0.544894 | 0.183051 | 0.162961 | 0.142429 | 0.077498 | 686.622488 | 1560.256377 | 1937.211452 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.05913 | 0.07907 | 0.0 | 0.022392 | 2.0 | 20.0 | 22.0 |
| 25% | 183.5 | 2.0 | 0.0 | 4.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.337083 | 0.337842 | 0.52 | 0.13495 | 315.5 | 2497.0 | 3152.0 |
| 50% | 366.0 | 3.0 | 1.0 | 7.0 | 0.0 | 3.0 | 1.0 | 1.0 | 0.498333 | 0.486733 | 0.626667 | 0.180975 | 713.0 | 3662.0 | 4548.0 |
| 75% | 548.5 | 3.0 | 1.0 | 10.0 | 0.0 | 5.0 | 1.0 | 2.0 | 0.655417 | 0.608602 | 0.730209 | 0.233214 | 1096.0 | 4776.5 | 5956.0 |
| max | 731.0 | 4.0 | 1.0 | 12.0 | 1.0 | 6.0 | 1.0 | 3.0 | 0.861667 | 0.840896 | 0.9725 | 0.507463 | 3410.0 | 6946.0 | 8714.0 |


**Insight:**
- The dataset has 731 entries.
- **Mean Values**:
  - Average temperature (`temp`): 0.495
  - Average feeling temperature (`atemp`): 0.474
  - Average humidity (`hum`): 0.628
  - Average windspeed: 0.190
  - Average casual rentals: 848
  - Average registered rentals: 3656
  - Average total rentals (`cnt`): 4504

- **Standard Deviation**:
  - Rentals vary widely from day to day: the standard deviation is 686.62 for casual and 1560.26 for registered rentals, against means of 848.18 and 3656.17.

- **Min and Max Values**:
  - Temperature ranges from 0.059 to 0.862
  - Feeling temperature ranges from 0.079 to 0.841
  - Humidity ranges from 0.000 to 0.973
  - Windspeed ranges from 0.022 to 0.507
  - Casual rentals range from 2 to 3410
  - Registered rentals range from 20 to 6946
  - Total rentals range from 22 to 8714

- **Quartiles**:
  - 25% of days have total rentals at or below 3152
  - 50% of days have total rentals at or below 4548 (the median)
  - 75% of days have total rentals at or below 5956

- **Potential Anomalies**:
   - Humidity (`hum`) has a minimum value of 0. This is likely unrealistic, since humidity typically does not reach 0%, so the affected rows should be inspected.
   - Windspeed (`windspeed`) has a minimum value of 0.022, so the daily data contains no zero-wind days and nothing needs checking here.
   - Registered rentals average 3656 per day against 848 for casual rentals. This gap might be normal, but it should be checked for any patterns or anomalies.
   - The maximum value for casual rentals is 3410, about four times the average of 848. This might indicate certain days with very high rental activity.
   - The maximum value for registered rentals is 6946, nearly twice the average of 3656. This might indicate certain days with very high rental activity.
   - The maximum value for total rentals (`cnt`) is 8714, nearly twice the average of 4504. This might indicate certain days with very high rental activity.
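One way to judge whether the maxima above are unusually high is the conventional 1.5 × IQR upper fence. The sketch below applies it to the quartiles and maxima reported in the `data_day.describe()` output. The fence rule and the helper `upper_fence` are assumptions introduced here for illustration; the original analysis does not use them.

```python
# Quartiles and maxima copied from the data_day.describe() output.
stats = {
    "casual": {"q1": 315.5, "q3": 1096.0, "max": 3410.0},
    "registered": {"q1": 2497.0, "q3": 4776.5, "max": 6946.0},
    "cnt": {"q1": 3152.0, "q3": 5956.0, "max": 8714.0},
}


def upper_fence(q1: float, q3: float, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, the conventional upper fence for outliers."""
    return q3 + k * (q3 - q1)


# Record, per column, whether the reported maximum lies above the fence.
exceeds = {}
for name, s in stats.items():
    fence = upper_fence(s["q1"], s["q3"])
    exceeds[name] = s["max"] > fence
    print(f"{name}: fence = {fence:.2f}, max = {s['max']:.0f}, above fence: {exceeds[name]}")
```

Under this rule only the `casual` maximum (3410) lies above its fence (2266.75). The `registered` and `cnt` maxima stay below theirs (8195.75 and 10162.00), so they are high but not outliers by this criterion.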

##### Checking for missing and duplicate data

In [34]:
data_day.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

**Insight:**
- No column has missing values.

In [35]:
print(data_day.duplicated().sum())

0


**Insight:**
- There are no duplicate rows.

#### data_hour

##### Get info from `data_hour`

In [36]:
data_hour.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


**Insight:**
- The dataset is a `pandas` DataFrame.
- It contains 17,379 entries.
- There are 17 columns in total.
- No column has missing values.
- Data types:
  - 4 columns of type `float64`: `temp`, `atemp`, `hum`, `windspeed`
  - 12 columns of type `int64`: `instant`, `season`, `yr`, `mnth`, `hr`, `holiday`, `weekday`, `workingday`, `weathersit`, `casual`, `registered`, `cnt`
  - 1 column of type `object`: `dteday`
- Memory usage: 2.3+ MB

---

**Issues to address:**

- Column `dteday` has type `object` even though it holds dates (for example `2011-01-01`), so it should be converted to the `datetime` data type.
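For the hourly data, the date in `dteday` and the hour in `hr` can be combined into a single timestamp once the date is parsed. The sketch below uses three rows copied from the `data_hour.head()` output. The `timestamp` column and the name `sample_hour` are hypothetical and not part of the original dataset.

```python
import pandas as pd

# Rows copied from the data_hour.head() preview; only three columns are kept.
sample_hour = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-01"],
    "hr": [0, 1, 2],
    "cnt": [16, 40, 32],
})

# Parse the date, then add the hour of day as a time offset.
# `timestamp` is a new, illustrative column.
sample_hour["timestamp"] = (
    pd.to_datetime(sample_hour["dteday"], format="%Y-%m-%d")
    + pd.to_timedelta(sample_hour["hr"], unit="h")
)

print(sample_hour[["timestamp", "cnt"]])
```

A combined timestamp like this is convenient for time-based indexing or resampling, if the later analysis needs it.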

##### Get describe from `data_hour`

In [39]:
data_hour.describe()

| | instant | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 | 17379.0 |
| mean | 8690.0 | 2.50164 | 0.502561 | 6.537775 | 11.546752 | 0.02877 | 3.003683 | 0.682721 | 1.425283 | 0.496987 | 0.475775 | 0.627229 | 0.190098 | 35.676218 | 153.786869 | 189.463088 |
| std | 5017.0295 | 1.106918 | 0.500008 | 3.438776 | 6.914405 | 0.167165 | 2.005771 | 0.465431 | 0.639357 | 0.192556 | 0.17185 | 0.19293 | 0.12234 | 49.30503 | 151.357286 | 181.387599 |
| min | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 25% | 4345.5 | 2.0 | 0.0 | 4.0 | 6.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.34 | 0.3333 | 0.48 | 0.1045 | 4.0 | 34.0 | 40.0 |
| 50% | 8690.0 | 3.0 | 1.0 | 7.0 | 12.0 | 0.0 | 3.0 | 1.0 | 1.0 | 0.5 | 0.4848 | 0.63 | 0.194 | 17.0 | 115.0 | 142.0 |
| 75% | 13034.5 | 3.0 | 1.0 | 10.0 | 18.0 | 0.0 | 5.0 | 1.0 | 2.0 | 0.66 | 0.6212 | 0.78 | 0.2537 | 48.0 | 220.0 | 281.0 |
| max | 17379.0 | 4.0 | 1.0 | 12.0 | 23.0 | 1.0 | 6.0 | 1.0 | 4.0 | 1.0 | 1.0 | 1.0 | 0.8507 | 367.0 | 886.0 | 977.0 |


**Insight:**
- The dataset has 17,379 entries.
- **Mean Values**:
  - Average hour (`hr`): 11.55
  - Average temperature (`temp`): 0.497
  - Average feeling temperature (`atemp`): 0.476
  - Average humidity (`hum`): 0.627
  - Average windspeed: 0.190
  - Average casual rentals: 35.68
  - Average registered rentals: 153.79
  - Average total rentals (`cnt`): 189.46

- **Standard Deviation**:
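  - Rentals vary widely from hour to hour: the standard deviation of casual rentals (49.31) exceeds their mean (35.68), and that of registered rentals (151.36) is close to their mean (153.79).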

- **Min and Max Values**:
  - Temperature ranges from 0.020 to 1.000
  - Feeling temperature ranges from 0.000 to 1.000
  - Humidity ranges from 0.000 to 1.000
  - Windspeed ranges from 0.000 to 0.851
  - Casual rentals range from 0 to 367
  - Registered rentals range from 0 to 886
  - Total rentals range from 1 to 977

- **Quartiles**:
  - 25% of hourly records have total rentals at or below 40
  - 50% of hourly records have total rentals at or below 142 (the median)
  - 75% of hourly records have total rentals at or below 281

- **Potential Anomalies**:
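  - Humidity (`hum`) and windspeed (`windspeed`) have minimum values of 0. Zero humidity is likely unrealistic, while zero windspeed may simply be a calm hour; both should be checked.
  - Casual and registered rentals each have a minimum value of 0, so some hours have no rentals from one of the two user groups. Total rentals (`cnt`) never drop below 1.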
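The minimum values above suggest that `cnt` is the sum of `casual` and `registered`. A quick consistency check could look like the sketch below, run here on the five rows from the `data_hour.head()` output only. Whether the relation holds for all 17,379 rows is an assumption that would need to be confirmed on the full file.

```python
import pandas as pd

# Rental columns copied from the data_hour.head() preview.
sample_hour = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "cnt": [16, 40, 32, 13, 1],
})

# Keep only the rows where the two user groups do not add up to the total.
mismatch = sample_hour[
    sample_hour["casual"] + sample_hour["registered"] != sample_hour["cnt"]
]

print(f"Rows where casual + registered != cnt: {len(mismatch)}")
```

An empty `mismatch` frame means the three columns are consistent for the rows checked.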

##### Checking for missing and duplicate data

In [37]:
data_hour.isnull().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

**Insight:**
- No column has missing values.

In [40]:
print(data_hour.duplicated().sum())

0


**Insight:**
- There are no duplicate rows.
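`duplicated()` with no arguments compares entire rows. In the previews `instant` increases by one per row, so if it is a unique record index (an assumption), a repeated date and hour would still go unnoticed. A stricter check restricts the comparison to the key columns. The rows below are hypothetical and built only to show the difference.

```python
import pandas as pd

# Hypothetical rows: the last two share a date and hour but differ in `instant`.
rows = pd.DataFrame({
    "instant": [1, 2, 3],
    "dteday": ["2011-01-01", "2011-01-01", "2011-01-01"],
    "hr": [0, 1, 1],
    "cnt": [16, 40, 40],
})

# Full-row comparison: `instant` differs everywhere, so nothing is flagged.
full_row_dupes = int(rows.duplicated().sum())

# Key-based comparison: the repeated (dteday, hr) pair is flagged once.
key_dupes = int(rows.duplicated(subset=["dteday", "hr"]).sum())

print(full_row_dupes, key_dupes)
```

For `data_day`, the analogous key would be `dteday` alone.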

### Cleaning Data

#### data_day

##### Drop unused columns

In [41]:
data_day = data_day.drop(columns=['windspeed', 'weekday'])
data_day.head()

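| | instant | dteday | season | yr | mnth | holiday | workingday | weathersit | temp | atemp | hum | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 2 | 0.344167 | 0.363625 | 0.805833 | 331 | 654 | 985 |
| 1 | 2 | 2011-01-02 | 1 | 0 | 1 | 0 | 0 | 2 | 0.363478 | 0.353739 | 0.696087 | 131 | 670 | 801 |
| 2 | 3 | 2011-01-03 | 1 | 0 | 1 | 0 | 1 | 1 | 0.196364 | 0.189405 | 0.437273 | 120 | 1229 | 1349 |
| 3 | 4 | 2011-01-04 | 1 | 0 | 1 | 0 | 1 | 1 | 0.2 | 0.212122 | 0.590435 | 108 | 1454 | 1562 |
| 4 | 5 | 2011-01-05 | 1 | 0 | 1 | 0 | 1 | 1 | 0.226957 | 0.22927 | 0.436957 | 82 | 1518 | 1600 |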


Several columns were removed as they are not required for the analysis of the defined business questions.

- `windspeed`: Contains information about wind speed, which is not needed for the analysis (Irrelevant Data).
- `weekday`: Contains information about the day of the week, which is redundant since we already have `workingday` and `holiday` columns (Redundancy).
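Because the cleaning cell reassigns `data_day`, running it a second time raises a `KeyError`: the columns are already gone. A re-runnable variant could pass `errors="ignore"` to `drop`, as in the sketch below. It uses two rows and a subset of columns from the `data_day.head()` output; `sample_day` and `unused_columns` are illustrative names.

```python
import pandas as pd

# Two rows and four columns taken from the data_day.head() preview.
sample_day = pd.DataFrame({
    "dteday": ["2011-01-01", "2011-01-02"],
    "weekday": [6, 0],
    "windspeed": [0.160446, 0.248539],
    "cnt": [985, 801],
})

unused_columns = ["windspeed", "weekday"]

# errors="ignore" skips labels that are absent, so the cell can be re-run.
cleaned = sample_day.drop(columns=unused_columns, errors="ignore")
cleaned_again = cleaned.drop(columns=unused_columns, errors="ignore")

print(cleaned_again.columns.tolist())
```

Assigning the result to a new name also keeps the original frame intact, in case the dropped columns are needed again later.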

## Exploratory Data Analysis (EDA)

### Explore ...

**Insight:**
- xxx
- xxx

## Visualization & Explanatory Analysis

### Question 1:

### Question 2:

**Insight:**
- xxx
- xxx

## Advanced Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2