# Data Analysis Project: [Bike Sharing Dataset]
- **Name:** Dicky Pratama Kusuma
- **Email:** 19.4.pratama@gmail.com
- **Dicoding ID:** prtmaars

## Defining the Business Questions

The dataset contains bike rental records from Washington D.C., USA. It comes in two files: daily data and hourly data (1-hour intervals). The data covers 1 January 2011 through 31 December 2012. The hourly file has the 17 columns listed below; the daily file has 16, because it has no `hr` column:
1. instant: Record index.
2. dteday: Date the record was taken.
3. season: Season of the record (1: spring, 2: summer, 3: fall, 4: winter).
4. yr: Year of the record (0: 2011, 1: 2012).
5. mnth: Month of the record.
6. hr: Hour of the record.
7. holiday: Whether the day is a holiday (0: False, 1: True).
8. weekday: Day of the week (0-6, where 0 is Sunday and 6 is Saturday).
9. workingday: Whether the day is a working day (0: weekend or holiday, 1: working day).
10. weathersit: Weather category, as follows
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light snow, Light rain + Thunderstorm + Scattered clouds, Light rain + Scattered clouds
    - 4: Heavy rain + Ice pellets + Thunderstorm + Mist, Snow + Fog
11. temp: Temperature in degrees Celsius (°C). Values are normalized by dividing by 41 (max).
12. atemp: Feels-like temperature in degrees Celsius (°C). Values are normalized by dividing by 50 (max).
13. hum: Humidity. Values are normalized by dividing by 100 (max).
14. windspeed: Wind speed. Values are normalized by dividing by 67 (max).
15. casual: Number of unregistered users.
16. registered: Number of registered users.
17. cnt: Total number of rented bikes, including both registered and unregistered users.
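
Because the four weather columns are stored as normalized fractions, it can help to convert them back to physical units before plotting. The following is a minimal sketch that assumes the divisors listed above (41, 50, 100, 67) and plain division as the normalization. The helper `denormalize` and the `SCALE` dictionary are hypothetical names, and the sample values come from the first row of the hourly data.

```python
import pandas as pd

# Divisors as stated in the column descriptions (assumed to be plain max-scaling).
SCALE = {"temp": 41, "atemp": 50, "hum": 100, "windspeed": 67}

def denormalize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the normalized weather columns scaled back to original units."""
    out = df.copy()
    for col, factor in SCALE.items():
        if col in out.columns:
            out[col] = out[col] * factor
    return out

# First row of the hourly data (2011-01-01, hr 0).
sample = pd.DataFrame({"temp": [0.24], "atemp": [0.2879], "hum": [0.81], "windspeed": [0.0]})
restored = denormalize(sample)
print(restored)
```

The function returns a copy, so the normalized columns stay available for any later modeling step.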

**Questions:**
1. Weather Analysis: How do weather conditions and temperature affect the total number of bikes rented?
2. Service Analysis: At what hour and on which day is bike demand highest?
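
Both questions reduce to group-wise aggregations of `cnt`. A minimal sketch of that idea follows, using a tiny hand-made frame rather than the real dataset. The values are illustrative only, and `toy`, `by_weather`, and `by_slot` are hypothetical names.

```python
import pandas as pd

# Illustrative values only; not taken from the real dataset.
toy = pd.DataFrame({
    "weekday":    [1, 1, 1, 6, 6, 6],
    "hr":         [8, 17, 3, 8, 14, 3],
    "weathersit": [1, 2, 1, 1, 1, 3],
    "cnt":        [300, 400, 10, 80, 250, 5],
})

# Question 1: average rentals per weather category.
by_weather = toy.groupby("weathersit")["cnt"].mean()

# Question 2: average rentals per (weekday, hour), then the busiest slot.
by_slot = toy.groupby(["weekday", "hr"])["cnt"].mean()
busiest_weekday, busiest_hour = by_slot.idxmax()

print(by_weather)
print("Busiest slot:", busiest_weekday, busiest_hour)
```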

## Import All Packages/Libraries Used

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**Explanation:** The libraries used are:
1. numpy, for numerical computation.
2. pandas, for data manipulation and analysis.
3. matplotlib, for data visualization.
4. seaborn, for data visualization.

## Data Wrangling

### Gathering Data

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [None]:
bike_hour = '/content/drive/My Drive/Project Dicoding/bike-sharing-dataset/hour.csv'
bike_day = '/content/drive/My Drive/Project Dicoding/bike-sharing-dataset/day.csv'

In [None]:
data_hour = pd.read_csv(bike_hour)
data_day = pd.read_csv(bike_day)
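
Since `dteday` holds dates, it can be parsed while loading rather than converted afterwards. A minimal sketch follows, using an in-memory CSV built from the first two rows of `day.csv` (with a subset of columns) as a stand-in for the real file. The names `sample_csv` and `sample_day` are hypothetical.

```python
import io
import pandas as pd

# Stand-in for day.csv: header plus the first two rows (subset of columns).
sample_csv = io.StringIO(
    "instant,dteday,season,yr,mnth,cnt\n"
    "1,2011-01-01,1,0,1,985\n"
    "2,2011-01-02,1,0,1,801\n"
)

# parse_dates converts the named column to a datetime dtype during loading.
sample_day = pd.read_csv(sample_csv, parse_dates=["dteday"])
print(sample_day.dtypes)
```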

### Assessing Data

In [None]:
data_day

Unnamed: 0,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.200000,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.229270,0.436957,0.186900,82,1518,1600
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,727,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
727,728,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.590000,0.155471,644,2451,3095
728,729,2012-12-29,1,1,12,0,6,0,2,0.253333,0.242400,0.752917,0.124383,159,1182,1341
729,730,2012-12-30,1,1,12,0,0,0,1,0.255833,0.231700,0.483333,0.350754,364,1432,1796


In [None]:
data_day.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


In [None]:
data_hour

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0000,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.80,0.0000,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.80,0.0000,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0000,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0000,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17374,17375,2012-12-31,1,1,12,19,0,1,1,2,0.26,0.2576,0.60,0.1642,11,108,119
17375,17376,2012-12-31,1,1,12,20,0,1,1,2,0.26,0.2576,0.60,0.1642,8,81,89
17376,17377,2012-12-31,1,1,12,21,0,1,1,1,0.26,0.2576,0.60,0.1642,7,83,90
17377,17378,2012-12-31,1,1,12,22,0,1,1,1,0.26,0.2727,0.56,0.1343,13,48,61


In [None]:
data_hour.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


### Cleaning Data

In [None]:
data_day.drop(columns=['instant'], inplace=True)
data_hour.drop(columns=['instant'], inplace=True)

In [None]:
print(data_day.duplicated().sum())
data_day.isnull().sum()

0


Unnamed: 0,0
dteday,0
season,0
yr,0
mnth,0
holiday,0
weekday,0
workingday,0
weathersit,0
temp,0
atemp,0


In [None]:
print(data_hour.duplicated().sum())
data_hour.isnull().sum()

0


Unnamed: 0,0
dteday,0
season,0
yr,0
mnth,0
hr,0
holiday,0
weekday,0
workingday,0
weathersit,0
temp,0


In [None]:
data_day

Unnamed: 0,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,2011-01-04,1,0,1,0,2,1,1,0.200000,0.212122,0.590435,0.160296,108,1454,1562
4,2011-01-05,1,0,1,0,3,1,1,0.226957,0.229270,0.436957,0.186900,82,1518,1600
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726,2012-12-27,1,1,12,0,4,1,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
727,2012-12-28,1,1,12,0,5,1,2,0.253333,0.255046,0.590000,0.155471,644,2451,3095
728,2012-12-29,1,1,12,0,6,0,2,0.253333,0.242400,0.752917,0.124383,159,1182,1341
729,2012-12-30,1,1,12,0,0,0,1,0.255833,0.231700,0.483333,0.350754,364,1432,1796


In [None]:
data_hour

Unnamed: 0,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0000,3,13,16
1,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.80,0.0000,8,32,40
2,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.80,0.0000,5,27,32
3,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0000,3,10,13
4,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0000,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17374,2012-12-31,1,1,12,19,0,1,1,2,0.26,0.2576,0.60,0.1642,11,108,119
17375,2012-12-31,1,1,12,20,0,1,1,2,0.26,0.2576,0.60,0.1642,8,81,89
17376,2012-12-31,1,1,12,21,0,1,1,1,0.26,0.2576,0.60,0.1642,7,83,90
17377,2012-12-31,1,1,12,22,0,1,1,1,0.26,0.2727,0.56,0.1343,13,48,61


In [None]:
print("Number of hours in 2 years (1 Jan 2011 - 31 Dec 2012) =", (365*24)+(366*24))
print("Number of rows in data_hour =", data_hour.shape[0])
print("Difference =", (365*24)+(366*24) - data_hour.shape[0])

Number of hours in 2 years (1 Jan 2011 - 31 Dec 2012) = 17544
Number of rows in data_hour = 17379
Difference = 165
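
The expected number of hourly slots can also be derived from the calendar itself, which avoids hard-coding which of the two years is a leap year. A minimal sketch (the variable names are hypothetical):

```python
import pandas as pd

# Every hourly timestamp from the first to the last hour of the period.
hourly_index = pd.date_range(
    start="2011-01-01 00:00",
    end="2012-12-31 23:00",
    freq=pd.Timedelta(hours=1),
)

expected_rows = len(hourly_index)
observed_rows = 17379  # number of rows reported for data_hour
print(expected_rows, expected_rows - observed_rows)
```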


In [None]:
all_dates = pd.date_range(start='2011-01-01', end='2012-12-31', freq='D')
all_hours = list(range(24))
full_combinations = pd.MultiIndex.from_product(
    [all_dates, all_hours],
    names=['dteday', 'hr']
).to_frame(index=False)

In [None]:
data_hour['dteday'] = pd.to_datetime(data_hour['dteday'])

In [None]:
data_hrfix = full_combinations.merge(data_hour, on=['dteday', 'hr'], how='left', indicator=True)
missing_data = data_hrfix[data_hrfix['_merge'] == 'left_only']
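
The merge above relies on `indicator=True`, which adds a `_merge` column recording where each row came from. A minimal sketch on a toy frame with one date and four hours shows the mechanism; all names and rental values here are illustrative.

```python
import pandas as pd

# Full grid: one date, hours 0-3.
grid = pd.MultiIndex.from_product(
    [pd.to_datetime(["2011-01-02"]), range(4)],
    names=["dteday", "hr"],
).to_frame(index=False)

# Observed data with hour 2 missing (cnt values are made up).
observed = pd.DataFrame({
    "dteday": pd.to_datetime(["2011-01-02"] * 3),
    "hr": [0, 1, 3],
    "cnt": [17, 17, 2],
})

merged = grid.merge(observed, on=["dteday", "hr"], how="left", indicator=True)
# Rows present only in the grid are the gaps in the observed data.
gaps = merged[merged["_merge"] == "left_only"]
print(gaps[["dteday", "hr"]])
```

A left join keeps every row of the grid, so the gaps show up as rows whose data columns are all `NaN`.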

In [None]:
missing_data

Unnamed: 0,dteday,hr,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt,_merge
29,2011-01-02,5,,,,,,,,,,,,,,,left_only
50,2011-01-03,2,,,,,,,,,,,,,,,left_only
51,2011-01-03,3,,,,,,,,,,,,,,,left_only
75,2011-01-04,3,,,,,,,,,,,,,,,left_only
99,2011-01-05,3,,,,,,,,,,,,,,,left_only
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16044,2012-10-30,12,,,,,,,,,,,,,,,left_only
16251,2012-11-08,3,,,,,,,,,,,,,,,left_only
16755,2012-11-29,3,,,,,,,,,,,,,,,left_only
17356,2012-12-24,4,,,,,,,,,,,,,,,left_only


In [None]:
data_hrfix.drop(columns=['_merge'], inplace=True)

In [None]:
data_hrfix

Unnamed: 0,dteday,hr,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,2011-01-01,0,1.0,0.0,1.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.81,0.0000,3.0,13.0,16.0
1,2011-01-01,1,1.0,0.0,1.0,0.0,6.0,0.0,1.0,0.22,0.2727,0.80,0.0000,8.0,32.0,40.0
2,2011-01-01,2,1.0,0.0,1.0,0.0,6.0,0.0,1.0,0.22,0.2727,0.80,0.0000,5.0,27.0,32.0
3,2011-01-01,3,1.0,0.0,1.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.75,0.0000,3.0,10.0,13.0
4,2011-01-01,4,1.0,0.0,1.0,0.0,6.0,0.0,1.0,0.24,0.2879,0.75,0.0000,0.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17539,2012-12-31,19,1.0,1.0,12.0,0.0,1.0,1.0,2.0,0.26,0.2576,0.60,0.1642,11.0,108.0,119.0
17540,2012-12-31,20,1.0,1.0,12.0,0.0,1.0,1.0,2.0,0.26,0.2576,0.60,0.1642,8.0,81.0,89.0
17541,2012-12-31,21,1.0,1.0,12.0,0.0,1.0,1.0,1.0,0.26,0.2576,0.60,0.1642,7.0,83.0,90.0
17542,2012-12-31,22,1.0,1.0,12.0,0.0,1.0,1.0,1.0,0.26,0.2727,0.56,0.1343,13.0,48.0,61.0


In [None]:
data_hrfix.isnull().sum()

Unnamed: 0,0
dteday,0
hr,0
season,165
yr,165
mnth,165
holiday,165
weekday,165
workingday,165
weathersit,165
temp,165


In [None]:
# season is constant within a date and does not follow calendar months exactly
# (January rows carry season 1), so fill missing values from data_day by date.
day_lookup = data_day.assign(dteday=pd.to_datetime(data_day['dteday'])).set_index('dteday')
data_hrfix['season'] = data_hrfix['season'].fillna(data_hrfix['dteday'].map(day_lookup['season']))
data_hrfix['yr'] = data_hrfix['dteday'].dt.year - 2011  # 0: 2011, 1: 2012
data_hrfix['mnth'] = data_hrfix['dteday'].dt.month
# Not filled yet:
#data_hrfix['holiday'] =
#data_hrfix['weekday'] =
#data_hrfix['workingday'] =
# Weather category: carry the last observed value forward.
data_hrfix["weathersit"] = data_hrfix["weathersit"].ffill()
# Continuous weather readings: fill gaps with the column mean.
data_hrfix["temp"] = data_hrfix["temp"].fillna(data_hrfix["temp"].mean())
data_hrfix["atemp"] = data_hrfix["atemp"].fillna(data_hrfix["atemp"].mean())
data_hrfix["hum"] = data_hrfix["hum"].fillna(data_hrfix["hum"].mean())
data_hrfix["windspeed"] = data_hrfix["windspeed"].fillna(data_hrfix["windspeed"].mean())
# Rental counts: interpolate linearly between neighboring hours; cnt is their sum.
data_hrfix["casual"] = data_hrfix["casual"].interpolate(method="linear")
data_hrfix["registered"] = data_hrfix["registered"].interpolate(method="linear")
data_hrfix["cnt"] = data_hrfix["casual"] + data_hrfix["registered"]

In [None]:
data_hrfix.isnull().sum()

Unnamed: 0,0
dteday,0
hr,0
season,0
yr,0
mnth,0
holiday,165
weekday,165
workingday,165
weathersit,0
temp,0


## Exploratory Data Analysis (EDA)

### Explore ...

## Visualization & Explanatory Analysis

### Question 1:

### Question 2:

## Further Analysis (Optional)

## Conclusion

- Conclusion for question 1
- Conclusion for question 2