# Bike Sharing Dataset

# About Dataset

Bike sharing systems are a new generation of traditional bike rentals in which the whole process of membership, rental, and return has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems because of their important role in traffic, environmental, and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data these systems generate make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded. This feature turns a bike sharing system into a virtual sensor network that can be used to sense mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings('ignore')
import os


## Gathering Data

In [70]:
# Load the daily table
day_df = pd.read_csv("data/day.csv")
day_df.head()

,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
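
The loading step can also be written so that the path works on any operating system and the date column is parsed while reading. This is a minimal sketch, not the notebook's own code: `load_bike_csv` and `DATA_DIR` are hypothetical names, and the in-memory sample only stands in for the real files.

```python
from io import StringIO
from pathlib import Path

import pandas as pd

# Assumed folder name, taken from the paths used in the loading cells.
DATA_DIR = Path("data")


def load_bike_csv(source) -> pd.DataFrame:
    """Read a bike-sharing CSV and parse the dteday column as datetime."""
    return pd.read_csv(source, parse_dates=["dteday"])


# Path objects are joined with "/" on every operating system,
# which avoids backslash escapes inside string literals.
day_path = DATA_DIR / "day.csv"

# Tiny in-memory stand-in so the sketch runs without the real files.
sample = StringIO(
    "instant,dteday,season,cnt\n"
    "1,2011-01-01,1,985\n"
    "2,2011-01-02,1,801\n"
)
sample_df = load_bike_csv(sample)
print(sample_df.dtypes)
```

With the real data, `load_bike_csv(day_path)` would take the place of the in-memory sample.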


In [15]:
# Load the hourly table
hour_df = pd.read_csv("data/hour.csv")
hour_df.head()

,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


# Assessing Data
## day_df

In [16]:
day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     731 non-null    int64  
 1   dteday      731 non-null    object 
 2   season      731 non-null    int64  
 3   yr          731 non-null    int64  
 4   mnth        731 non-null    int64  
 5   holiday     731 non-null    int64  
 6   weekday     731 non-null    int64  
 7   workingday  731 non-null    int64  
 8   weathersit  731 non-null    int64  
 9   temp        731 non-null    float64
 10  atemp       731 non-null    float64
 11  hum         731 non-null    float64
 12  windspeed   731 non-null    float64
 13  casual      731 non-null    int64  
 14  registered  731 non-null    int64  
 15  cnt         731 non-null    int64  
dtypes: float64(4), int64(11), object(1)
memory usage: 91.5+ KB


The dataset has 16 columns and 731 rows.

In [8]:
# Check day_df for duplicate rows
print("Number of duplicates: ", day_df.duplicated().sum())
day_df.describe()

Number of duplicates:  0


,instant,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0,731.0
mean,366.0,2.49658,0.500684,6.519836,0.028728,2.997264,0.683995,1.395349,0.495385,0.474354,0.627894,0.190486,848.176471,3656.172367,4504.348837
std,211.165812,1.110807,0.500342,3.451913,0.167155,2.004787,0.465233,0.544894,0.183051,0.162961,0.142429,0.077498,686.622488,1560.256377,1937.211452
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.05913,0.07907,0.0,0.022392,2.0,20.0,22.0
25%,183.5,2.0,0.0,4.0,0.0,1.0,0.0,1.0,0.337083,0.337842,0.52,0.13495,315.5,2497.0,3152.0
50%,366.0,3.0,1.0,7.0,0.0,3.0,1.0,1.0,0.498333,0.486733,0.626667,0.180975,713.0,3662.0,4548.0
75%,548.5,3.0,1.0,10.0,0.0,5.0,1.0,2.0,0.655417,0.608602,0.730209,0.233214,1096.0,4776.5,5956.0
max,731.0,4.0,1.0,12.0,1.0,6.0,1.0,3.0,0.861667,0.840896,0.9725,0.507463,3410.0,6946.0,8714.0


In [11]:
# Check for missing values
day_df.isna().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

There are no missing values.
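
The individual checks above (shape, duplicates, missing values) can be bundled into one helper so that both tables are assessed the same way. This is a sketch under assumptions: `assess` is a hypothetical name, and the toy frame is made up for illustration.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Collect the basic data-quality numbers for one table."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
    }


# Toy frame with one repeated row and one missing value.
toy = pd.DataFrame(
    {
        "instant": [1, 2, 2, 3],
        "cnt": [985, 801, 801, 1349],
        "hum": [0.81, 0.70, 0.70, None],
    }
)
report = assess(toy)
print(report)
```

Calling `assess(day_df)` and `assess(hour_df)` would give the same numbers that the separate cells report.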


In [27]:
# Count unique values per column
day_df.nunique()

instant       731
dteday        731
season          4
yr              2
mnth           12
holiday         2
weekday         7
workingday      2
weathersit      3
temp          499
atemp         690
hum           595
windspeed     650
casual        606
registered    679
cnt           696
dtype: int64

## hour_df

In [18]:
hour_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17379 entries, 0 to 17378
Data columns (total 17 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   instant     17379 non-null  int64  
 1   dteday      17379 non-null  object 
 2   season      17379 non-null  int64  
 3   yr          17379 non-null  int64  
 4   mnth        17379 non-null  int64  
 5   hr          17379 non-null  int64  
 6   holiday     17379 non-null  int64  
 7   weekday     17379 non-null  int64  
 8   workingday  17379 non-null  int64  
 9   weathersit  17379 non-null  int64  
 10  temp        17379 non-null  float64
 11  atemp       17379 non-null  float64
 12  hum         17379 non-null  float64
 13  windspeed   17379 non-null  float64
 14  casual      17379 non-null  int64  
 15  registered  17379 non-null  int64  
 16  cnt         17379 non-null  int64  
dtypes: float64(4), int64(12), object(1)
memory usage: 2.3+ MB


The dataset has 17 columns and 17,379 rows.

In [19]:
# Check hour_df for duplicate rows
print("Number of duplicates: ", hour_df.duplicated().sum())
hour_df.describe()

Number of duplicates:  0


,instant,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
count,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
mean,8690.0,2.50164,0.502561,6.537775,11.546752,0.02877,3.003683,0.682721,1.425283,0.496987,0.475775,0.627229,0.190098,35.676218,153.786869,189.463088
std,5017.0295,1.106918,0.500008,3.438776,6.914405,0.167165,2.005771,0.465431,0.639357,0.192556,0.17185,0.19293,0.12234,49.30503,151.357286,181.387599
min,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.02,0.0,0.0,0.0,0.0,0.0,1.0
25%,4345.5,2.0,0.0,4.0,6.0,0.0,1.0,0.0,1.0,0.34,0.3333,0.48,0.1045,4.0,34.0,40.0
50%,8690.0,3.0,1.0,7.0,12.0,0.0,3.0,1.0,1.0,0.5,0.4848,0.63,0.194,17.0,115.0,142.0
75%,13034.5,3.0,1.0,10.0,18.0,0.0,5.0,1.0,2.0,0.66,0.6212,0.78,0.2537,48.0,220.0,281.0
max,17379.0,4.0,1.0,12.0,23.0,1.0,6.0,1.0,4.0,1.0,1.0,1.0,0.8507,367.0,886.0,977.0


In [20]:
# Check for missing values
hour_df.isna().sum()

instant       0
dteday        0
season        0
yr            0
mnth          0
hr            0
holiday       0
weekday       0
workingday    0
weathersit    0
temp          0
atemp         0
hum           0
windspeed     0
casual        0
registered    0
cnt           0
dtype: int64

There are no missing values.



In [23]:
# Count unique values per column
hour_df.nunique()

instant       17379
dteday          731
season            4
yr                2
mnth             12
hr               24
holiday           2
weekday           7
workingday        2
weathersit        4
temp             50
atemp            65
hum              89
windspeed        30
casual          322
registered      776
cnt             869
dtype: int64

## Notes:

1. Neither table has missing values or duplicate rows.
2. All maximum, minimum, and mean values look plausible.
3. The weather column (weathersit) has 3 unique values in day_df.
4. The weather column (weathersit) has 4 unique values in hour_df.

Some points to consider:
1. The dteday column needs to be converted from object to datetime.
2. Several categorical columns are still stored as integers rather than as the category dtype.
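
One concrete way to back up the plausibility check is to confirm that the total count equals the sum of casual and registered users. This is a minimal sketch: `count_total_mismatches` is a hypothetical helper, and the rows are the first five rows of day_df shown earlier.

```python
import pandas as pd


def count_total_mismatches(df: pd.DataFrame) -> int:
    """Count rows where casual + registered differs from cnt."""
    return int((df["casual"] + df["registered"] != df["cnt"]).sum())


# First five rows of day_df, copied from the head() output.
head_rows = pd.DataFrame(
    {
        "casual": [331, 131, 120, 108, 82],
        "registered": [654, 670, 1229, 1454, 1518],
        "cnt": [985, 801, 1349, 1562, 1600],
    }
)
mismatches = count_total_mismatches(head_rows)
print("rows where casual + registered != cnt:", mismatches)
```

Running the same helper on the full tables would show whether the relationship holds for every row.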

# Cleaning Data

## day_df

In [33]:
# Convert the date column to datetime
day_df['dteday'] = pd.to_datetime(day_df.dteday)

# Convert categorical columns to the category dtype
day_df['season'] = day_df.season.astype('category')
day_df['yr'] = day_df.yr.astype('category')
day_df['mnth'] = day_df.mnth.astype('category')
day_df['holiday'] = day_df.holiday.astype('category')
day_df['weekday'] = day_df.weekday.astype('category')
day_df['workingday'] = day_df.workingday.astype('category')
day_df['weathersit'] = day_df.weathersit.astype('category')

day_df.head()

,instant,dteday,season,yr,mnth,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,6,0,2,0.344167,0.363625,0.805833,0.160446,331,654,985
1,2,2011-01-02,1,0,1,0,0,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
2,3,2011-01-03,1,0,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
3,4,2011-01-04,1,0,1,0,2,1,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
4,5,2011-01-05,1,0,1,0,3,1,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600


In [34]:
day_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 731 entries, 0 to 730
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   instant     731 non-null    int64         
 1   dteday      731 non-null    datetime64[ns]
 2   season      731 non-null    category      
 3   yr          731 non-null    category      
 4   mnth        731 non-null    category      
 5   holiday     731 non-null    category      
 6   weekday     731 non-null    category      
 7   workingday  731 non-null    category      
 8   weathersit  731 non-null    category      
 9   temp        731 non-null    float64       
 10  atemp       731 non-null    float64       
 11  hum         731 non-null    float64       
 12  windspeed   731 non-null    float64       
 13  casual      731 non-null    int64         
 14  registered  731 non-null    int64         
 15  cnt         731 non-null    int64         
dtypes: category(7), datetime64
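
The seven per-column conversions can also be expressed as one call, which makes it easy to apply the same list to both tables. This is a sketch under assumptions: `to_categorical` and `CATEGORICAL_COLUMNS` are hypothetical names, and the toy frame is made up.

```python
import pandas as pd

# Columns that the cleaning step converts to the category dtype.
CATEGORICAL_COLUMNS = [
    "season", "yr", "mnth", "holiday", "weekday", "workingday", "weathersit",
]


def to_categorical(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df with the listed columns cast to category."""
    present = [col for col in columns if col in df.columns]
    return df.astype({col: "category" for col in present})


toy = pd.DataFrame(
    {"season": [1, 1, 2], "yr": [0, 0, 1], "hr": [0, 1, 2], "cnt": [16, 40, 32]}
)
converted = to_categorical(toy, CATEGORICAL_COLUMNS)
print(converted.dtypes)
```

Because `astype` returns a new frame, the original table is left untouched until the result is assigned back.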

## hour_df

In [30]:
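# Convert the date column to datetime, as was done for day_df
hour_df['dteday'] = pd.to_datetime(hour_df.dteday)

# Convert categorical columns to the category dtype
hour_df['season'] = hour_df.season.astype('category')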
hour_df['yr'] = hour_df.yr.astype('category')
hour_df['mnth'] = hour_df.mnth.astype('category')
hour_df['holiday'] = hour_df.holiday.astype('category')
hour_df['weekday'] = hour_df.weekday.astype('category')
hour_df['workingday'] = hour_df.workingday.astype('category')
hour_df['weathersit'] = hour_df.weathersit.astype('category')

hour_df.head()

dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered
2011-01-01,1,0,1,0,0,6,0,Cerah,0.24,0.2879,0.81,0.0,3,13
2011-01-01,1,0,1,1,0,6,0,Cerah,0.22,0.2727,0.8,0.0,8,32
2011-01-01,1,0,1,2,0,6,0,Cerah,0.22,0.2727,0.8,0.0,5,27
2011-01-01,1,0,1,3,0,6,0,Cerah,0.24,0.2879,0.75,0.0,3,10
2011-01-01,1,0,1,4,0,6,0,Cerah,0.24,0.2879,0.75,0.0,0,1


In [31]:
hour_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 17379 entries, 2011-01-01 to 2012-12-31
Data columns (total 14 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   season      17379 non-null  category
 1   yr          17379 non-null  category
 2   mnth        17379 non-null  category
 3   hr          17379 non-null  int64   
 4   holiday     17379 non-null  category
 5   weekday     17379 non-null  category
 6   workingday  17379 non-null  category
 7   weathersit  17379 non-null  category
 8   temp        17379 non-null  float64 
 9   atemp       17379 non-null  float64 
 10  hum         17379 non-null  float64 
 11  windspeed   17379 non-null  float64 
 12  casual      17379 non-null  int64   
 13  registered  17379 non-null  int64   
dtypes: category(7), float64(4), int64(3)
memory usage: 1.2 MB


Replacing the numeric codes in the tables with descriptive labels

In [78]:
# Replace numeric codes with labels in day_df
month_map = {
    1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
    7: 'Jul', 8: 'Aug', 9: 'Sep', 10: 'Oct', 11: 'Nov', 12: 'Dec'
}

season_map = {
    1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'
}

weekday_map = {
    0: 'Sun', 1: 'Mon', 2: 'Tue', 3: 'Wed', 4: 'Thu', 5: 'Fri', 6: 'Sat'
}

weather_map = {
    1: 'Clear/Partly Cloudy',
    2: 'Misty/Cloudy',
    3: 'Light Snow/Rain',
    4: 'Severe Weather'
}

day_df['mnth'] = day_df['mnth'].map(month_map)
day_df['season'] = day_df['season'].map(season_map)
day_df['weekday'] = day_df['weekday'].map(weekday_map)
day_df['weathersit'] = day_df['weathersit'].map(weather_map)


In [79]:
# Replace numeric codes with labels in hour_df
# (reuses month_map, season_map, weekday_map, and weather_map from the previous cell)
hour_df['mnth'] = hour_df['mnth'].map(month_map)
hour_df['season'] = hour_df['season'].map(season_map)
hour_df['weekday'] = hour_df['weekday'].map(weekday_map)
hour_df['weathersit'] = hour_df['weathersit'].map(weather_map)
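
A caveat with `.map` is that any value missing from the dictionary silently becomes NaN, which also happens if the mapping cell is run a second time on columns that already hold labels. The sketch below guards against that; `label_codes` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

weather_map = {
    1: "Clear/Partly Cloudy",
    2: "Misty/Cloudy",
    3: "Light Snow/Rain",
    4: "Severe Weather",
}


def label_codes(codes: pd.Series, mapping: dict) -> pd.Series:
    """Map numeric codes to labels, failing loudly on unmapped values."""
    labels = codes.map(mapping)
    # A value that was present before mapping but is NaN afterwards had no label.
    unmapped = labels.isna() & codes.notna()
    if unmapped.any():
        bad = sorted(set(codes[unmapped].astype(str)))
        raise ValueError(f"values without a label: {bad}")
    return labels.astype("category")


codes = pd.Series([1, 2, 1, 3])
labelled = label_codes(codes, weather_map)
print(labelled.tolist())

# A second pass finds labels instead of numeric codes and is rejected.
try:
    label_codes(labelled.astype(object), weather_map)
except ValueError as err:
    print("second pass rejected:", err)
```

Raising an error is one possible design choice; returning the column unchanged when it already holds labels would be another.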

# Exploring Data
## day_df

In [72]:
# Group the data by month
day_df.groupby(by='mnth').agg({
    'registered': ['max', 'min', 'mean', 'sum']
})

mnth,registered (max),registered (min),registered (mean),registered (sum)
1,4185,416,1982.112903,122891
2,4546,905,2392.789474,136389
3,5893,491,2975.419355,184476
4,5950,674,3471.533333,208292
5,6433,2213,4135.5,256401
6,6456,2993,4540.6,272436
7,6790,2298,4303.080645,266791
8,6541,889,4502.5,279155
9,6946,1689,4594.466667,275668
10,6911,20,4235.354839,262592
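
Selecting the column before aggregating gives the same statistics with a flat, single-level header, which is easier to read and export. This is a minimal sketch on made-up numbers; `summarize_registered` is a hypothetical name.

```python
import pandas as pd


def summarize_registered(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Max, min, mean, and sum of registered users per group, with flat columns."""
    return df.groupby(key)["registered"].agg(["max", "min", "mean", "sum"])


toy = pd.DataFrame(
    {"mnth": [1, 1, 2, 2], "registered": [654, 670, 1229, 1454]}
)
summary = summarize_registered(toy, "mnth")
print(summary)
```

The same helper would work with `key="hr"` or `key="weathersit"` for the other groupings in this section.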


In [44]:
# Group the data by the hour at which users rent
hour_df.groupby(by='hr').agg({
    'registered': ['max', 'min', 'mean', 'sum']
})

hr,registered (max),registered (min),registered (mean),registered (sum)
0,234,0,43.739669,31755
1,139,0,26.871547,19455
2,104,0,18.097902,12940
3,61,0,9.011478,6281
4,22,0,5.098996,3554
5,64,0,18.478382,13249
6,203,0,71.882759,52115
7,572,1,201.009629,146134
8,808,4,337.331499,245240
9,399,6,188.418157,136980


In [82]:
# Group the data by weather condition
day_df.groupby(by='weathersit').agg({
    'registered': ['max', 'min', 'mean', 'sum']
})

weathersit,registered (max),registered (min),registered (mean),registered (sum)


In [83]:
day_df.groupby(by='holiday').agg({
    'cnt': ['max', 'min', 'mean', 'sum']
})

holiday,cnt (max),cnt (min),cnt (mean),cnt (sum)
0,8714,22,4527.104225,3214244
1,7403,1000,3735.0,78435


In [None]:
day_df.groupby(by='season').agg({
    'casual': 'mean',
    'registered': 'mean',
    'cnt': ['max', 'min', 'mean']
})

In [None]:
day_df.groupby(by='season').agg({
    'temp': ['max', 'min', 'mean'],
    'atemp': ['max', 'min', 'mean'],
    'hum': ['max', 'min', 'mean'],
    'cnt': ['max', 'min', 'mean']
})
   

In [None]:
fig, ax = plt.subplots(figsize=(10,6))
correlation_matrix = day_df.corr(numeric_only=True)
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))

sns.heatmap(
    correlation_matrix,
    annot=True,
    mask=mask,
    cmap="coolwarm",
    center=0,
    fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

A few strong correlations are worth highlighting: cnt is strongly correlated with instant, temp, and atemp. Humidity (hum) and windspeed, by contrast, turn out to have little relationship with the number of renters.
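
Reading individual cells off a heatmap is error-prone, so the correlations with cnt can also be listed directly, ordered by strength. This is a sketch on made-up numbers; `correlation_with` is a hypothetical helper.

```python
import pandas as pd


def correlation_with(df: pd.DataFrame, target: str = "cnt") -> pd.Series:
    """Pearson correlation of every numeric column with target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute value so strong negative correlations rank high too.
    return corr.sort_values(key=abs, ascending=False)


toy = pd.DataFrame(
    {
        "cnt": [1, 2, 3, 4],
        "temp": [2, 4, 6, 8],
        "hum": [4, 3, 2, 1],
        "windspeed": [1, 3, 2, 4],
    }
)
ranked = correlation_with(toy)
print(ranked)
```

Note that a correlation only describes a linear association; it does not show that one variable causes the other.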

In [None]:
fig, ax = plt.subplots(figsize=(10,6))
correlation_matrix = hour_df.corr(numeric_only=True)
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))

sns.heatmap(
    correlation_matrix,
    annot=True,
    mask=mask,
    cmap="coolwarm",
    center=0,
    fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

Here cnt is moderately correlated with hr, temp, and atemp. As before, humidity and windspeed show little correlation with cnt.

## Question 1
Number of bike users by hour

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(
    x='hr',
    y='cnt',
    data=hour_df)

plt.title('Number of Bike Users by Hour')
plt.xlabel('Hour of Rental')
plt.ylabel('Number of Bike Users')
plt.show()

The plot shows that, on average, rentals are highest in the late afternoon, peaking at 17:00. They are lowest late at night toward dawn. There is also a sharp spike at 8 in the morning. This suggests the rental service could open after dawn, at around 6 a.m., ahead of the 8 a.m. rush.
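
By default `sns.barplot` draws the mean of `y` for each `x` value, so the peak and the quietest hour can be read from a plain group mean as well. This is a minimal sketch on made-up numbers; `hourly_profile` is a hypothetical name.

```python
import pandas as pd


def hourly_profile(df: pd.DataFrame) -> pd.Series:
    """Mean rental count for each hour of the day."""
    return df.groupby("hr")["cnt"].mean()


toy = pd.DataFrame(
    {"hr": [4, 4, 8, 8, 17, 17], "cnt": [1, 3, 300, 400, 500, 600]}
)
profile = hourly_profile(toy)
peak_hour = int(profile.idxmax())
quiet_hour = int(profile.idxmin())
print("peak hour:", peak_hour, "quietest hour:", quiet_hour)
```

Applied to hour_df, `hourly_profile(hour_df)` would give the exact values behind the bars.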

## Question 2
Number of bike users by weather condition

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(
    x='weathersit',
    y='cnt',
    data=day_df)

plt.title('Number of Bike Users by Weather Condition')
plt.xlabel('Weather Condition')
plt.ylabel('Number of Bike Users')
plt.show()
     

Renters are clearly more numerous on clear days and fewer when it is cloudy or misty. The number drops sharply when it rains.
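
To put a number on how much rentals fall, the mean for each weather condition can be expressed as a percentage change from the clear-weather mean. This is a sketch on made-up numbers; `relative_to_baseline` and the short labels are hypothetical.

```python
import pandas as pd


def relative_to_baseline(df: pd.DataFrame, key: str, baseline: str) -> pd.Series:
    """Percentage change of the mean cnt in each group versus the baseline group."""
    means = df.groupby(key)["cnt"].mean()
    return (means / means[baseline] - 1.0) * 100.0


toy = pd.DataFrame(
    {
        "weathersit": ["Clear", "Clear", "Misty", "Rain"],
        "cnt": [4000, 6000, 4000, 2000],
    }
)
change = relative_to_baseline(toy, "weathersit", "Clear")
print(change.round(1))
```

With the labels used in this notebook, the baseline would be `'Clear/Partly Cloudy'`.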

## Question 3
Number of bike users by season

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(
    x='season',
    y='cnt',
    data=day_df)

plt.title('Number of Bike Users by Season')
plt.xlabel('Season')
plt.ylabel('Number of Bike Users')
plt.show()

Rentals are highest in fall and remain high in winter, when there may still be snow. They are lowest in spring. This is interesting given the earlier finding that fewer people rent when it snows; even so, total rentals over the winter season remain high.
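
One way to examine this apparent contradiction is to count how many days of each season fall under each weather condition: if snowy or rainy days are rare even in winter, the seasonal average can stay high. This is a sketch on made-up rows; `season_weather_days` is a hypothetical name.

```python
import pandas as pd


def season_weather_days(df: pd.DataFrame) -> pd.DataFrame:
    """Number of rows (days) for every season and weather combination."""
    return pd.crosstab(df["season"], df["weathersit"])


toy = pd.DataFrame(
    {
        "season": ["Fall", "Fall", "Winter", "Winter", "Winter"],
        "weathersit": ["Clear", "Clear", "Clear", "Light Snow/Rain", "Clear"],
    }
)
table = season_weather_days(toy)
print(table)
```

Running it on day_df would show whether the real data supports this explanation.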

## Question 4
Number of bike users by working day

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(
    x='workingday',
    y='cnt',
    data=day_df)

plt.title('Number of Bike Users by Working Day')
plt.xlabel('Non-working day (0) / working day (1)')
plt.ylabel('Number of Bike Users')
plt.show()

It turns out that working days have more renters than non-working days.
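
The total count combines two user groups, so splitting the comparison into casual and registered users shows which group drives the difference. This is a minimal sketch using the first five rows of day_df shown earlier; `mean_by_user_type` is a hypothetical name, and five rows are far too few to draw conclusions from.

```python
import pandas as pd


def mean_by_user_type(df: pd.DataFrame, key: str = "workingday") -> pd.DataFrame:
    """Mean casual, registered, and total rentals per group."""
    return df.groupby(key)[["casual", "registered", "cnt"]].mean()


# First five rows of day_df, copied from the head() output.
head_rows = pd.DataFrame(
    {
        "workingday": [0, 0, 1, 1, 1],
        "casual": [331, 131, 120, 108, 82],
        "registered": [654, 670, 1229, 1454, 1518],
        "cnt": [985, 801, 1349, 1562, 1600],
    }
)
breakdown = mean_by_user_type(head_rows)
print(breakdown.round(1))
```

The same helper with `key="holiday"` or `key="weekday"` would cover the remaining plots in this section.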

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(
    x='holiday',
    y='cnt',
    data=day_df)

plt.title('Number of Bike Users by Holiday')
plt.xlabel('Not a holiday (0) / holiday (1)')
plt.ylabel('Number of Bike Users')
plt.show()

This is consistent with the previous result, in which working days had more renters than non-working days: holidays see fewer renters than other days.

In [None]:
plt.figure(figsize=(10,6))
sns.barplot(
    x='weekday',
    y='cnt',
    data=day_df)

plt.title('Number of Bike Users by Day of the Week')
plt.xlabel('Day')
plt.ylabel('Number of Bike Users')
plt.show()

Sunday has the fewest renters and Friday the most, but overall the differences between days are small.

# Conclusion
1. On average, rentals are highest in the late afternoon, peaking at 17:00. They are lowest late at night toward dawn. There is also a sharp spike at 8 in the morning.
2. More people rent bikes on clear days and fewer when it is cloudy or misty. The number drops sharply when it rains.
3. Rentals are highest in fall and remain high in winter, when there may still be snow. They are lowest in spring. This is interesting given the earlier finding that fewer people rent when it snows; even so, total rentals over the winter season remain high.
4. Working days have more renters than non-working days, and holidays have fewer renters than other days, which is consistent. Sunday has the fewest renters and Friday the most, but overall the differences between days are small.