# Data Analysis Project: Bike Sharing Dataset
- **Name**: Meisy Nathania Yogianty
- **Email**: meisynathania.y@gmail.com
- **Dicoding ID**: meisynathania

## Business Questions:
1. In which season are the most bikes rented?
2. What opening hours are ideal for a bike rental business?

## Data Analysis

### 1. Preparation
In this step, the libraries are imported and the working directory is set.

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import streamlit as st
import numpy as np
import os
import datetime
sns.set(style='dark')

### 2. Data Wrangling

#### a. Data Gathering

In [4]:
daily = pd.read_csv("day.csv")
daily.head()

FileNotFoundError: [Errno 2] No such file or directory: 'day.csv'

In [None]:
hourly = pd.read_csv("hour.csv")
hourly.head()

#### b. Data Assessing

In [None]:
daily.info()

In [None]:
hourly.info()

In both tables, the dteday column that stores the date still has the object dtype, so it must be converted to a proper datetime type.

In [None]:
print(hourly.isna().sum())
print(daily.isna().sum())

In [None]:
print(daily.duplicated().sum())
print(hourly.duplicated().sum())

Neither table has missing values or duplicate rows.
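The two checks above can be folded into one small helper so the same test runs on both tables. This is a minimal sketch: the function name `quality_report` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Return the total number of missing cells and fully duplicated rows."""
    return {
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
    }

# Toy frame (made-up values) with one missing cell and one duplicated row
toy = pd.DataFrame({"instant": [1, 2, 2, 3], "cnt": [10, 20, 20, None]})
print(quality_report(toy))
```

On the real data, calling `quality_report(daily)` and `quality_report(hourly)` should return zeros for both keys if the statement above holds.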

Next, check the summary statistics of both tables:

In [None]:
daily.describe()

In [None]:
hourly.describe()

#### c. Data Cleaning

In [None]:
daily['dteday'] = pd.to_datetime(daily['dteday'])
# Assign the mapped column back instead of calling replace(..., inplace=True) on a column selection
daily['season'] = daily['season'].map({1: 'spring', 2: 'summer', 3: 'fall', 4: 'winter'})
daily['yr'] = daily['yr'].map({0: '2011', 1: '2012'})
daily['mnth'] = daily['mnth'].astype(str)
daily['holiday'] = daily['holiday'].map({0: 'not holiday', 1: 'holiday'})
daily['weekday'] = daily['weekday'].astype(str)
daily['workingday'] = daily['workingday'].map({0: 'weekend/holiday', 1: 'workingday'})
daily.info()

In [None]:
hourly['dteday'] = pd.to_datetime(hourly['dteday'])
# Assign the mapped column back instead of calling replace(..., inplace=True) on a column selection
hourly['season'] = hourly['season'].map({1: 'spring', 2: 'summer', 3: 'fall', 4: 'winter'})
hourly['yr'] = hourly['yr'].map({0: '2011', 1: '2012'})
hourly['mnth'] = hourly['mnth'].astype(str)
hourly['holiday'] = hourly['holiday'].map({0: 'not holiday', 1: 'holiday'})
hourly['weekday'] = hourly['weekday'].astype(str)
hourly['workingday'] = hourly['workingday'].map({0: 'weekend/holiday', 1: 'workingday'})
hourly.info()
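Because the daily and hourly tables go through the same cleaning steps, the logic could live in one function. The sketch below is one possible arrangement, not the notebook's own method: `clean_bike_frame`, the label dictionaries, and the two-row sample are assumptions made for illustration.

```python
import pandas as pd

# Illustrative label maps for the coded columns (assumed spellings)
SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
YEAR_LABELS = {0: "2011", 1: "2012"}

def clean_bike_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left untouched."""
    out = df.copy()
    out["dteday"] = pd.to_datetime(out["dteday"])
    out["season"] = out["season"].map(SEASON_LABELS)
    out["yr"] = out["yr"].map(YEAR_LABELS)
    out["holiday"] = out["holiday"].map({0: "not holiday", 1: "holiday"})
    out["workingday"] = out["workingday"].map({0: "weekend/holiday", 1: "workingday"})
    for col in ("mnth", "weekday"):
        out[col] = out[col].astype(str)
    return out

# Made-up two-row sample that reuses the dataset's column names
sample = pd.DataFrame({
    "dteday": ["2011-01-01", "2012-07-04"],
    "season": [1, 3], "yr": [0, 1], "mnth": [1, 7],
    "holiday": [0, 1], "weekday": [6, 3], "workingday": [0, 0],
})
cleaned = clean_bike_frame(sample)
print(cleaned.dtypes)
```

With such a helper, both tables would be cleaned by `clean_bike_frame(daily)` and `clean_bike_frame(hourly)`, which keeps the two label sets from drifting apart.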

Once all columns are confirmed clean, save the cleaned data to CSV files.

In [None]:
hourly.to_csv("hourly_clean.csv", index=False)

In [None]:
daily.to_csv("daily_clean.csv", index=False)
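One point to keep in mind when the saved files are read back: CSV stores only text, so pandas infers the column types again on load. The sketch below uses a made-up frame and an in-memory buffer (both assumptions for illustration) to show how a date column and a string year label change after a round trip, and how `parse_dates` and `dtype` can restore them.

```python
import io
import pandas as pd

# Made-up cleaned frame with a datetime column and a string year label
clean = pd.DataFrame({
    "dteday": pd.to_datetime(["2011-01-01", "2012-01-01"]),
    "yr": ["2011", "2012"],
    "cnt": [100, 200],
})

buffer = io.StringIO()
clean.to_csv(buffer, index=False)

# Plain read: types are inferred again from the text
buffer.seek(0)
naive = pd.read_csv(buffer)

# Explicit read: restore the intended types
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=["dteday"], dtype={"yr": str})

print(naive.dtypes)
print(restored.dtypes)
```

This matters for any later filter that compares `yr` with a string such as `'2011'`: after a plain read the column holds integers, so that comparison would match no rows.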

### 3. Exploratory Data Analysis

In [None]:
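daily_clean = pd.read_csv("daily_clean.csv", parse_dates=["dteday"], dtype={"yr": str})  # reload the cleaned file; keep yr as a string label
daily_clean.describe(include = "all")

In [None]:
hourly_clean = pd.read_csv("hourly_clean.csv", parse_dates=["dteday"], dtype={"yr": str})
hourly_clean.describe(include = "all")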

a. Comparison of bikes rented on holidays and non-holidays

In [None]:
daily_clean.groupby(by="holiday").agg({
    "instant": "nunique",
    "cnt": ["max", "min", "mean", "std"]
})

In [None]:
hourly_clean.groupby(by="hr").agg({
    "cnt": ["max", "min", "mean", "std"],
})

#1 Comparison of bike rentals in each season

In [None]:
seasonal = pd.DataFrame(daily_clean.groupby(by=["season","yr"]).agg({
    "cnt": ["sum"],
}).unstack())
seasonal

In [None]:
grouped_data = seasonal["cnt"]["sum"]  # season x year table of total rentals
grouped_data

In [None]:
grouped_data.plot(kind='bar', colormap='tab20')

Based on the bar chart above, the number of bikes rented in each season follows a pattern that repeats every year.
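The "repeating pattern" claim can also be checked numerically instead of by eye. The sketch below builds a season-by-year table and compares the seasonal ranking across years. The numbers are made up and only the column names mirror the cleaned dataset, so the printed result says nothing about the real data.

```python
import pandas as pd

# Made-up daily rows; only the column names mirror the cleaned dataset
toy = pd.DataFrame({
    "season": ["spring", "summer", "fall", "winter"] * 2,
    "yr": ["2011"] * 4 + ["2012"] * 4,
    "cnt": [10, 30, 40, 20, 15, 45, 60, 30],
})

# Rows are seasons, columns are years, cells are total rentals
table = toy.pivot_table(index="season", columns="yr", values="cnt", aggfunc="sum")

# Rank seasons within each year (1 = most rentals)
ranks = table.rank(ascending=False).astype(int)
same_order = bool((ranks["2011"] == ranks["2012"]).all())

print(table)
print("Same seasonal ranking in both years:", same_order)
```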

#2 To choose opening hours that yield the best profit, the business owner needs to know how many bikes are rented per hour each day.

In [None]:
hourly_day = hourly_clean[hourly_clean["dteday"] == '2011-01-01']
# the date can be changed to see how the pattern varies from day to day
plt.figure(figsize=(10, 5)) 
plt.plot(hourly_day["hr"], hourly_day["cnt"], marker='o', linewidth=2, color="#72BCD4") 
plt.title("Bikes Rented per Hour", loc="center", fontsize=20) 
plt.xticks(fontsize=10) 
plt.yticks(fontsize=10) 
plt.show()
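A single day can be unrepresentative, so a complementary view is the average number of rentals at each hour across all days. The following is a minimal sketch on made-up rows, offered as an alternative to the per-day plot and not as the notebook's own method. The variable `hourly_profile` is a hypothetical name.

```python
import pandas as pd

# Made-up hourly rows for two days; column names mirror the hourly table
toy = pd.DataFrame({
    "dteday": ["2011-01-01"] * 3 + ["2011-01-02"] * 3,
    "hr": [7, 8, 9] * 2,
    "cnt": [10, 50, 30, 20, 70, 40],
})

# Average rentals at each hour across all days
hourly_profile = toy.groupby("hr")["cnt"].mean()

print(hourly_profile)
print("Busiest hour on average:", hourly_profile.idxmax())
```

Applied to the real hourly table, the same `groupby("hr")` call would give one value per hour from 0 to 23, which can be plotted the same way as the single-day line chart.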

## Visualization and Explanatory Analysis

Prototype Dashboard

In [None]:
selected_year = '2011'

# Subset the data for the specified year
subset_data = daily_clean[(daily_clean['yr'] == selected_year)]
subset_data

In [None]:
seasonal = subset_data.groupby('season')['cnt'].sum()
seasonal

In [None]:
plt.bar(seasonal.index, seasonal)
plt.xlabel("Season")
plt.ylabel("Total cnt")
plt.title("Total cnt per Season")
plt.show()

In [None]:
# for the metric
totalcnt = daily_clean.cnt.sum()
totalcnt

## Conclusion
1. The most bikes were rented in fall. Summer ranks second, winter third, and spring last. Revenue optimization can focus on fall, summer, and winter.
2. Based on the rental trend, operating hours of 05:00-20:00 appear to suit this business best.
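One way to make the operating-hours recommendation reproducible is to derive the window from a rule. The sketch below keeps every hour whose mean rentals reach a chosen share of the peak and reports the first and last such hour. The function `operating_window`, the 20% threshold, and the hourly means are all assumptions for illustration, so the window it prints is not a finding about the real data.

```python
import pandas as pd

def operating_window(hourly_mean: pd.Series, share_of_peak: float = 0.2):
    """Return (first_hour, last_hour) among hours reaching the threshold."""
    threshold = share_of_peak * hourly_mean.max()
    busy_hours = hourly_mean[hourly_mean >= threshold].index
    return int(busy_hours.min()), int(busy_hours.max())

# Made-up hourly means indexed by hour of day (0-23)
toy_means = pd.Series(
    [2, 1, 1, 1, 2, 10, 30, 60, 90, 50, 40, 45,
     55, 50, 45, 50, 70, 100, 90, 60, 40, 15, 8, 4],
    index=range(24),
)
window = operating_window(toy_means)
print(window)
```

The result spans from the earliest to the latest busy hour, so a quiet stretch in the middle of the day would still fall inside the window. Choosing a different threshold shifts the boundaries, which is why the value should be stated alongside any recommendation.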

## DASHBOARD BUILD

In [None]:
#dashboard
sidebarOpt = st.sidebar.selectbox(
    'Bike Share Dashboard Menu',
    ('Daily Report','Seasonal Report')
)
if sidebarOpt == 'Daily Report':
    st.header('DAILY REPORT')
    # bikes rented per hour on the selected day
    dateInput = st.date_input(
        label='Select Date',
        value = datetime.date(2011, 1, 1)
    )
    hourly_day = hourly_clean[hourly_clean["dteday"] == str(dateInput)] # subset the data for one day
    daily_day = daily_clean[daily_clean["dteday"] == str(dateInput)]
    totalcnt = int(daily_day.cnt.sum())
    st.metric("Total Bikes Rented", value = totalcnt)
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(hourly_day["hr"], hourly_day["cnt"], marker='o', linewidth=2, color="#72BCD4")
    ax.set_title("Bikes Rented per Hour", loc="center", fontsize=20)
    ax.tick_params(axis='both', labelsize=10)
    st.pyplot(fig)  # pass the figure object instead of the pyplot module
elif sidebarOpt == 'Seasonal Report':
    st.header("SEASONAL REPORT")
    # comparison of bike rentals in each season
    year = ['2011', '2012']
    selected_year = st.radio("Year", 
                             year, key="2011")
    subset_data = daily_clean[(daily_clean['yr'] == selected_year)]
    seasonal = subset_data.groupby('season')['cnt'].sum()
    fig, ax = plt.subplots(figsize = (10, 5))
    ax.bar(seasonal.index, seasonal.values,
           color='skyblue', label='Total')
    ax.set_xlabel("Season")
    ax.set_ylabel("Total")
    ax.set_title("Bikes Rented in Each Season")
    st.pyplot(fig)
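A detail worth noting for the seasonal chart: `groupby('season')` sorts the labels alphabetically, so the bars do not appear in calendar order. The sketch below shows one way to fix the order with `reindex`. The helper `season_totals`, the `SEASON_ORDER` list, and the toy rows are assumptions for illustration, and the labels in the list must match whatever spelling the cleaning step produced.

```python
import pandas as pd

SEASON_ORDER = ["spring", "summer", "fall", "winter"]  # assumed label spelling

def season_totals(df: pd.DataFrame, year: str) -> pd.Series:
    """Total cnt per season for one year, ordered spring to winter."""
    subset = df[df["yr"] == year]
    totals = subset.groupby("season")["cnt"].sum()
    # reindex fixes the bar order; seasons absent from the data become 0
    return totals.reindex(SEASON_ORDER, fill_value=0)

# Made-up rows that reuse the cleaned dataset's column names
toy = pd.DataFrame({
    "season": ["winter", "fall", "fall", "spring", "summer"],
    "yr": ["2011", "2011", "2011", "2011", "2012"],
    "cnt": [5, 7, 3, 2, 9],
})
totals_2011 = season_totals(toy, "2011")
print(totals_2011)
```

Keeping the aggregation in a plain function like this also makes it testable without starting the Streamlit app. The dashboard would only need to pass the result to `ax.bar`.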