# 🚴 Bike Sharing Data Analysis

## Project Information
- **Dataset**: Bike Sharing Dataset (Hourly)
- **Objective**: Analyze bike-sharing usage patterns to derive strategic insights
- **Analysis Techniques**: Time Series Analysis, Clustering, Correlation Analysis, Pattern Mining

## Dataset Description
The dataset contains hourly bike rental records with the following variables:
- **Temporal**: instant, dteday, season, yr, mnth, hr, holiday, weekday, workingday
- **Weather**: weathersit, temp, atemp, hum, windspeed
- **Target**: casual, registered, cnt (total count)
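Per the dataset's own documentation, `cnt` counts all rentals, casual and registered combined. The minimal sketch below checks that identity row by row before the analysis relies on it. It runs on a tiny hand-made frame rather than the real file, and the helper name `check_count_identity` is hypothetical:

```python
import pandas as pd

def check_count_identity(frame: pd.DataFrame) -> bool:
    """Return True if every row satisfies cnt == casual + registered."""
    return bool((frame["casual"] + frame["registered"] == frame["cnt"]).all())

# Toy rows standing in for hour.csv (illustrative values only).
sample = pd.DataFrame({
    "casual": [3, 8, 5],
    "registered": [13, 32, 27],
    "cnt": [16, 40, 32],
})
print(check_count_identity(sample))  # True
```

If the check holds on `hour.csv`, `casual` and `registered` should not be used as predictors of `cnt`, because together they reproduce it exactly.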

---

## 1. Import Libraries

In this step, we import the libraries needed for the analysis.

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Machine Learning & Advanced Analytics
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy import stats
from scipy.stats import pearsonr

# Time Series
from statsmodels.tsa.seasonal import seasonal_decompose

# Warnings
import warnings
warnings.filterwarnings('ignore')

# Settings
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)

## 2. Load and Initial Exploration of the Data

Load the dataset and run an initial check of its structure.

In [None]:
# Load data
df = pd.read_csv('Dataset/hour.csv')

# Informasi dasar
print("=" * 80)
print("DATASET INFORMATION")
print("=" * 80)
print(f"Number of rows: {len(df):,}")
print(f"Number of columns: {len(df.columns)}")
print(f"Data period: {df['dteday'].min()} to {df['dteday'].max()}")
print(f"\nTotal rentals: {df['cnt'].sum():,} bikes")
print(f"Average rentals per hour: {df['cnt'].mean():.2f} bikes")
print("\n" + "=" * 80)

In [None]:
# Show the first 5 rows
df.head()

In [None]:
# Info dataset
df.info()

In [None]:
# Descriptive statistics
df.describe()

In [None]:
# Check for missing values
print("Missing Values:")
print(df.isnull().sum())
print(f"\nPercentage of missing values: {(df.isnull().sum().sum() / (len(df) * len(df.columns)) * 100):.2f}%")
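`isnull()` only flags empty cells. An hour with no row at all never shows up there, and the UCI `hour.csv` has fewer rows than 24 times its number of days, so some hours are absent rather than null. The sketch below rebuilds the full hourly range and diffs it against the observed timestamps. It assumes the `dteday` and `hr` columns as above, and the helper `find_missing_hours` is hypothetical:

```python
import pandas as pd

def find_missing_hours(frame: pd.DataFrame) -> pd.DatetimeIndex:
    """Return hourly timestamps absent between the first and last observation."""
    stamps = pd.to_datetime(frame["dteday"]) + pd.to_timedelta(frame["hr"], unit="h")
    # Complete hourly grid from the first to the last observed hour.
    full = pd.date_range(stamps.min(), stamps.max(), freq=pd.Timedelta(hours=1))
    return full.difference(pd.DatetimeIndex(stamps))

# Toy data: hour 2 of 2011-01-01 has no row.
toy = pd.DataFrame({"dteday": ["2011-01-01"] * 4, "hr": [0, 1, 3, 4]})
missing = find_missing_hours(toy)
print(missing)
```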

## 3. Data Preprocessing and Feature Engineering

In this step, we:
1. Convert columns to the appropriate data types
2. Create new features for deeper analysis
3. Map categorical codes to readable labels

In [None]:
# Convert date column
df['dteday'] = pd.to_datetime(df['dteday'])

# Create datetime column
df['datetime'] = df['dteday'] + pd.to_timedelta(df['hr'], unit='h')

# Extract temporal features
df['year'] = df['dteday'].dt.year
df['month'] = df['dteday'].dt.month
df['day'] = df['dteday'].dt.day
df['day_of_week'] = df['dteday'].dt.dayofweek
df['week_of_year'] = df['dteday'].dt.isocalendar().week

# Create categorical labels
season_labels = {1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'}
weather_labels = {1: 'Clear', 2: 'Mist', 3: 'Light Snow/Rain', 4: 'Heavy Rain/Snow'}
weekday_labels = {0: 'Sunday', 1: 'Monday', 2: 'Tuesday', 3: 'Wednesday', 
                  4: 'Thursday', 5: 'Friday', 6: 'Saturday'}

df['season_label'] = df['season'].map(season_labels)
df['weather_label'] = df['weathersit'].map(weather_labels)
df['weekday_label'] = df['weekday'].map(weekday_labels)

# Create time of day categories
def categorize_hour(hour):
    if 6 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 18:
        return 'Afternoon'
    elif 18 <= hour < 24:
        return 'Evening'
    else:
        return 'Night'

df['time_of_day'] = df['hr'].apply(categorize_hour)

# Create rush hour indicator
df['is_rush_hour'] = df['hr'].apply(lambda x: 1 if x in [7, 8, 17, 18] else 0)

# Temperature categories
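# include_lowest=True keeps temp == 0 in the 'Cold' bin instead of NaN
df['temp_category'] = pd.cut(df['temp'], bins=[0, 0.3, 0.6, 1.0], 
                              labels=['Cold', 'Moderate', 'Hot'], include_lowest=True)

# Denormalize temperature. Per the dataset README, temp = (t - t_min) / (t_max - t_min)
# with t_min = -8 and t_max = 39 °C; atemp uses t_min = -16 and t_max = 50 °C (hourly scale).
# Note: some descriptions of the dataset instead state that temp = t / 41 and atemp = t / 50.
df['temp_celsius'] = df['temp'] * (39 - (-8)) + (-8)
df['atemp_celsius'] = df['atemp'] * (50 - (-16)) + (-16)

print("Feature engineering complete!")
print(f"Total number of columns: {len(df.columns)}")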
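Denormalization inverts min-max scaling. If a value was normalized as t_norm = (t - t_min) / (t_max - t_min), then t = t_norm * (t_max - t_min) + t_min. The dataset README gives t_min = -8 and t_max = 39 °C for `temp`, and t_min = -16 and t_max = 50 °C for `atemp` (hourly scale). Here is a minimal helper, named `denormalize` for illustration:

```python
def denormalize(value, t_min: float, t_max: float):
    """Invert min-max scaling, where value = (t - t_min) / (t_max - t_min)."""
    return value * (t_max - t_min) + t_min

# temp bounds per the dataset README (hourly scale)
print(denormalize(0.0, -8, 39))   # -8.0
print(denormalize(1.0, -8, 39))   # 39.0
# atemp bounds per the dataset README
print(denormalize(0.5, -16, 50))  # 17.0
```

Because it only uses arithmetic, the same function also works on a whole column, e.g. `denormalize(df['temp'], -8, 39)`.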

In [None]:
# Tampilkan dataset dengan fitur baru
df.head(10)
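`categorize_hour` and the `is_rush_hour` lambda run row by row through `.apply`, which is slow on large frames. The vectorized sketch below uses `pd.cut` and `Series.isin` and should produce the same labels. It runs on the 24 possible hour values rather than on `df`:

```python
import pandas as pd

hours = pd.Series(range(24), name="hr")

# Same boundaries as categorize_hour: [0, 6) Night, [6, 12) Morning,
# [12, 18) Afternoon, [18, 24) Evening. right=False makes each bin left-closed.
time_of_day = pd.cut(hours, bins=[0, 6, 12, 18, 24], right=False,
                     labels=["Night", "Morning", "Afternoon", "Evening"])

# Rush-hour flag without a Python-level lambda.
is_rush_hour = hours.isin([7, 8, 17, 18]).astype(int)
```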

## 4. Exploratory Data Analysis (EDA)

### 4.1 Target Variable Distribution

In [None]:
# Distribution of total rentals
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Total count
axes[0].hist(df['cnt'], bins=50, color='skyblue', edgecolor='black')
axes[0].set_title('Distribution of Total Rentals (cnt)', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Number of Rentals')
axes[0].set_ylabel('Frequency')
axes[0].axvline(df['cnt'].mean(), color='red', linestyle='--', label=f'Mean: {df["cnt"].mean():.2f}')
axes[0].legend()

# Casual users
axes[1].hist(df['casual'], bins=50, color='lightcoral', edgecolor='black')
axes[1].set_title('Distribution of Casual Users', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Number of Casual Users')
axes[1].set_ylabel('Frequency')
axes[1].axvline(df['casual'].mean(), color='red', linestyle='--', label=f'Mean: {df["casual"].mean():.2f}')
axes[1].legend()

# Registered users
axes[2].hist(df['registered'], bins=50, color='lightgreen', edgecolor='black')
axes[2].set_title('Distribution of Registered Users', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Number of Registered Users')
axes[2].set_ylabel('Frequency')
axes[2].axvline(df['registered'].mean(), color='red', linestyle='--', label=f'Mean: {df["registered"].mean():.2f}')
axes[2].legend()

plt.tight_layout()
plt.show()

print(f"Casual-to-registered ratio (casual / registered): {df['casual'].sum() / df['registered'].sum():.2%}")
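The printed figure is casual rentals divided by registered rentals. That is not the same as the casual share of all rentals, which is the share used in Section 9. A worked example with made-up totals shows the difference:

```python
def casual_ratio_and_share(casual_total: float, registered_total: float):
    """Return (casual / registered, casual / (casual + registered))."""
    ratio = casual_total / registered_total
    share = casual_total / (casual_total + registered_total)
    return ratio, share

ratio, share = casual_ratio_and_share(20, 80)
print(f"ratio={ratio:.0%}, share={share:.0%}")  # ratio=25%, share=20%
```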

### 4.2 Temporal Pattern Analysis

In [None]:
# Rentals by hour of day
hourly_avg = df.groupby('hr')[['casual', 'registered', 'cnt']].mean().reset_index()

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(hourly_avg['hr'], hourly_avg['casual'], marker='o', label='Casual', linewidth=2)
ax.plot(hourly_avg['hr'], hourly_avg['registered'], marker='s', label='Registered', linewidth=2)
ax.plot(hourly_avg['hr'], hourly_avg['cnt'], marker='^', label='Total', linewidth=2, linestyle='--')

ax.set_title('Bike Rental Pattern by Hour', fontsize=16, fontweight='bold')
ax.set_xlabel('Hour (0-23)', fontsize=12)
ax.set_ylabel('Average Number of Rentals', fontsize=12)
ax.grid(True, alpha=0.3)
ax.set_xticks(range(0, 24))

# Highlight rush hours (the shaded spans cover hours 7-8 and 17-18)
ax.axvspan(7, 9, alpha=0.2, color='yellow', label='Morning Rush')
ax.axvspan(17, 19, alpha=0.2, color='orange', label='Evening Rush')

# Build the legend last so the rush-hour spans are included in it
ax.legend(fontsize=11)

plt.tight_layout()
plt.show()
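The hourly averages above pool working and non-working days, which have different daily shapes. The sketch below splits the profile by day type and reports the busiest hour of each. It assumes the `workingday`, `hr`, and `cnt` columns, and the numbers are toy values rather than the dataset:

```python
import pandas as pd

toy = pd.DataFrame({
    "workingday": [1, 1, 1, 0, 0, 0],
    "hr":         [8, 13, 17, 8, 13, 17],
    "cnt":        [400, 150, 450, 60, 300, 200],
})

# Rows: hour of day; columns: workingday (0 or 1); values: mean rentals.
profile = toy.groupby(["workingday", "hr"])["cnt"].mean().unstack("workingday")
# Busiest hour for each day type (index of the column maximum).
busiest_hour = profile.idxmax()
print(busiest_hour)
```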

In [None]:
# Rentals by day of week
daily_avg = df.groupby('weekday_label')[['casual', 'registered', 'cnt']].mean().reset_index()
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
daily_avg['weekday_label'] = pd.Categorical(daily_avg['weekday_label'], categories=day_order, ordered=True)
daily_avg = daily_avg.sort_values('weekday_label')

fig, ax = plt.subplots(figsize=(12, 6))
x = np.arange(len(daily_avg))
width = 0.25

ax.bar(x - width, daily_avg['casual'], width, label='Casual', color='coral')
ax.bar(x, daily_avg['registered'], width, label='Registered', color='skyblue')
ax.bar(x + width, daily_avg['cnt'], width, label='Total', color='lightgreen')

ax.set_title('Average Rentals by Day of Week', fontsize=16, fontweight='bold')
ax.set_xlabel('Day', fontsize=12)
ax.set_ylabel('Average Number of Rentals', fontsize=12)
ax.set_xticks(x)
ax.set_xticklabels(daily_avg['weekday_label'], rotation=45)
ax.legend()
ax.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Heatmap of rentals by hour and day
pivot_data = df.pivot_table(values='cnt', index='hr', columns='weekday_label', aggfunc='mean')
pivot_data = pivot_data[day_order]

plt.figure(figsize=(12, 8))
sns.heatmap(pivot_data, cmap='YlOrRd', annot=True, fmt='.0f', linewidths=0.5, cbar_kws={'label': 'Avg Rentals'})
plt.title('Heatmap: Average Rentals by Hour and Day', fontsize=16, fontweight='bold')
plt.xlabel('Day', fontsize=12)
plt.ylabel('Hour', fontsize=12)
plt.tight_layout()
plt.show()

### 4.3 Seasonal Analysis

In [None]:
# Rentals by season
seasonal_stats = df.groupby('season_label').agg({
    'cnt': ['mean', 'sum', 'std'],
    'casual': 'mean',
    'registered': 'mean'
}).round(2)

print("Rental Statistics by Season:")
print(seasonal_stats)

# Visualization
season_data = df.groupby('season_label')[['casual', 'registered', 'cnt']].mean().reset_index()
season_order = ['Spring', 'Summer', 'Fall', 'Winter']
season_data['season_label'] = pd.Categorical(season_data['season_label'], categories=season_order, ordered=True)
season_data = season_data.sort_values('season_label')

fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Bar chart
x = np.arange(len(season_data))
width = 0.25
axes[0].bar(x - width, season_data['casual'], width, label='Casual', color='coral')
axes[0].bar(x, season_data['registered'], width, label='Registered', color='skyblue')
axes[0].bar(x + width, season_data['cnt'], width, label='Total', color='lightgreen')
axes[0].set_title('Average Rentals per Season', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Season')
axes[0].set_ylabel('Average Rentals')
axes[0].set_xticks(x)
axes[0].set_xticklabels(season_data['season_label'])
axes[0].legend()
axes[0].grid(axis='y', alpha=0.3)

# Pie chart of total rentals, in the same season order as the bar chart
season_total = df.groupby('season_label')['cnt'].sum().reindex(season_order)
colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99']
axes[1].pie(season_total, labels=season_total.index, autopct='%1.1f%%', 
           colors=colors, startangle=90)
axes[1].set_title('Share of Total Rentals by Season', fontsize=14, fontweight='bold')

plt.tight_layout()
plt.show()

### 4.4 Weather Impact Analysis

In [None]:
# Weather situation analysis
weather_stats = df.groupby('weather_label').agg({
    'cnt': ['mean', 'sum', 'count'],
    'casual': 'mean',
    'registered': 'mean'
}).round(2)

print("Rental Statistics by Weather Condition:")
print(weather_stats)

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Box plot rental by weather
weather_order = ['Clear', 'Mist', 'Light Snow/Rain', 'Heavy Rain/Snow']
sns.boxplot(data=df, x='weather_label', y='cnt', order=weather_order, ax=axes[0, 0], palette='Set2')
axes[0, 0].set_title('Rental Distribution by Weather', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Weather Condition')
axes[0, 0].set_ylabel('Number of Rentals')
axes[0, 0].tick_params(axis='x', rotation=15)

# 2. Temperature vs rental
axes[0, 1].scatter(df['temp_celsius'], df['cnt'], alpha=0.3, s=10)
axes[0, 1].set_title('Temperature vs Rentals', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Temperature (°C)')
axes[0, 1].set_ylabel('Number of Rentals')

# Add trend line
z = np.polyfit(df['temp_celsius'], df['cnt'], 2)
p = np.poly1d(z)
temp_range = np.linspace(df['temp_celsius'].min(), df['temp_celsius'].max(), 100)
axes[0, 1].plot(temp_range, p(temp_range), "r--", linewidth=2, label='Trend')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# 3. Humidity vs rental
axes[1, 0].scatter(df['hum'], df['cnt'], alpha=0.3, s=10, c='green')
axes[1, 0].set_title('Humidity vs Rentals', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Humidity (normalized)')
axes[1, 0].set_ylabel('Number of Rentals')
axes[1, 0].grid(True, alpha=0.3)

# 4. Windspeed vs rental
axes[1, 1].scatter(df['windspeed'], df['cnt'], alpha=0.3, s=10, c='orange')
axes[1, 1].set_title('Wind Speed vs Rentals', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Wind Speed (normalized)')
axes[1, 1].set_ylabel('Number of Rentals')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
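The red trend line is a degree-2 polynomial y = a·x² + b·x + c fitted with `np.polyfit`. When a < 0, the parabola peaks at x* = -b / (2a), which gives one rough estimate of the temperature at which rentals are highest. Here is a sketch on synthetic data with a known peak:

```python
import numpy as np

# Synthetic data with a known peak at 25 degrees (illustrative only).
temps = np.linspace(0, 40, 81)
rentals = -2.0 * (temps - 25.0) ** 2 + 500.0

# np.polyfit returns coefficients with the highest power first.
a, b, c = np.polyfit(temps, rentals, 2)
# The vertex is a maximum only when the parabola opens downward (a < 0).
vertex = -b / (2 * a) if a < 0 else None
print(round(vertex, 2))  # 25.0
```

On the real data, this is only a smooth summary of a noisy cloud of points. The binned averages in Section 9 are a model-free alternative.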

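## 5. Correlation Analysis

Analyze the relationships between variables to identify the factors most strongly associated with rentals. Two caveats apply when reading the matrix. First, `casual` and `registered` sum to `cnt`, so their high correlations with `cnt` hold by construction. Second, `season` and `weathersit` are category codes, so a linear correlation treats their arbitrary numeric order as meaningful.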

In [None]:
# Correlation matrix
corr_features = ['temp', 'atemp', 'hum', 'windspeed', 'casual', 'registered', 'cnt', 
                 'season', 'weathersit', 'hr', 'holiday', 'workingday']
corr_matrix = df[corr_features].corr()

plt.figure(figsize=(14, 10))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0,
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Correlation Matrix - Bike Sharing Variables', fontsize=16, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

# Top correlations with cnt
cnt_corr = corr_matrix['cnt'].sort_values(ascending=False)
print("\nCorrelations with Total Rentals (cnt), sorted:")
print(cnt_corr)
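Pearson's r measures only linear association. Hour of day drives rentals through two commuting peaks, so a linear r can understate its effect. One alternative is the correlation ratio η², the share of the variance of y explained by the group means: η² = Σ n_g (ȳ_g - ȳ)² / Σ (y - ȳ)². The sketch below computes it with pandas on a toy hump-shaped pattern, and the helper name `correlation_ratio` is hypothetical:

```python
import pandas as pd

def correlation_ratio(groups: pd.Series, values: pd.Series) -> float:
    """Eta squared: between-group sum of squares / total sum of squares."""
    grand_mean = values.mean()
    stats = values.groupby(groups).agg(["mean", "count"])
    between = (stats["count"] * (stats["mean"] - grand_mean) ** 2).sum()
    total = ((values - grand_mean) ** 2).sum()
    return float(between / total)

# Toy hump-shaped pattern: y rises and then falls as x increases.
x = pd.Series([0, 1, 2, 3] * 5)
y = pd.Series([0.0, 1.0, 1.0, 0.0] * 5)
print(x.corr(y), correlation_ratio(x, y))
```

For this toy pattern, Pearson's r is zero while η² equals 1, because y is fully determined by x without being linear in it.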

In [None]:
# Statistical significance test
print("Significance Test of Correlation with Total Rentals:")
print("=" * 60)
for feature in ['temp', 'atemp', 'hum', 'windspeed', 'hr']:
    corr, p_value = pearsonr(df[feature], df['cnt'])
    significance = "Significant" if p_value < 0.05 else "Not significant"
    print(f"{feature:12s}: r = {corr:6.3f}, p-value = {p_value:.4e} ({significance})")
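With this many rows, almost any nonzero correlation passes the p < 0.05 test. The test statistic for Pearson's r grows with the square root of the sample size: t = r·√((n - 2) / (1 - r²)). The sketch below evaluates it for a few small r values at n = 17,000, roughly the size of this hourly dataset:

```python
import math

def pearson_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

for r in (0.02, 0.05, 0.1):
    print(r, round(pearson_t_stat(r, 17_000), 2))
```

Even r = 0.02 gives |t| above the two-sided 5% critical value of about 1.96. Conclusions should therefore rest on the size of r, not just its p-value.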

## 6. Time Series Analysis (Advanced)

Analyze the trend, seasonality, and temporal patterns using seasonal decomposition.

In [None]:
# Aggregate to daily level for better time series analysis
daily_data = df.groupby('dteday').agg({
    'cnt': 'sum',
    'casual': 'sum',
    'registered': 'sum',
    'temp': 'mean',
    'hum': 'mean',
    'windspeed': 'mean'
}).reset_index()

daily_data.set_index('dteday', inplace=True)

# Time series plot
fig, axes = plt.subplots(3, 1, figsize=(16, 12))

# Total rental over time
axes[0].plot(daily_data.index, daily_data['cnt'], linewidth=1.5, color='steelblue')
axes[0].set_title('Time Series: Total Daily Rentals', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Total Rentals')
axes[0].grid(True, alpha=0.3)

# Casual vs Registered
axes[1].plot(daily_data.index, daily_data['casual'], label='Casual', linewidth=1.5, color='coral')
axes[1].plot(daily_data.index, daily_data['registered'], label='Registered', linewidth=1.5, color='green')
axes[1].set_title('Time Series: Casual vs Registered Users', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Daily Rentals')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# 7-day and 30-day moving averages
daily_data['ma_7'] = daily_data['cnt'].rolling(window=7).mean()
daily_data['ma_30'] = daily_data['cnt'].rolling(window=30).mean()

axes[2].plot(daily_data.index, daily_data['cnt'], alpha=0.3, label='Daily', color='gray')
axes[2].plot(daily_data.index, daily_data['ma_7'], label='7-Day MA', linewidth=2, color='orange')
axes[2].plot(daily_data.index, daily_data['ma_30'], label='30-Day MA', linewidth=2, color='red')
axes[2].set_title('Time Series with Moving Averages', fontsize=14, fontweight='bold')
axes[2].set_ylabel('Daily Rentals')
axes[2].set_xlabel('Date')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
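By default, `rolling(window=7).mean()` is a trailing average that needs a full window. As a result, the first 6 values of `ma_7` and the first 29 of `ma_30` are NaN, and each average lags the daily series by roughly half its window. Here is a short sketch of the alternatives:

```python
import pandas as pd

s = pd.Series(range(1, 11), dtype=float)

trailing = s.rolling(window=7).mean()                # value at t averages t-6..t
centered = s.rolling(window=7, center=True).mean()   # value at t averages t-3..t+3
partial = s.rolling(window=7, min_periods=1).mean()  # no leading NaNs

print(trailing.isna().sum(), centered.isna().sum(), partial.isna().sum())  # 6 6 0
```

A centered window removes the lag, at the cost of NaNs at both ends of the series.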

In [None]:
# Seasonal decomposition
# Multiplicative model with a weekly period: observed = trend * seasonal * residual
decomposition = seasonal_decompose(daily_data['cnt'], model='multiplicative', period=7)

fig, axes = plt.subplots(4, 1, figsize=(16, 12))

decomposition.observed.plot(ax=axes[0], color='steelblue')
axes[0].set_ylabel('Observed')
axes[0].set_title('Time Series Decomposition', fontsize=16, fontweight='bold')
axes[0].grid(True, alpha=0.3)

decomposition.trend.plot(ax=axes[1], color='orange')
axes[1].set_ylabel('Trend')
axes[1].grid(True, alpha=0.3)

decomposition.seasonal.plot(ax=axes[2], color='green')
axes[2].set_ylabel('Seasonal')
axes[2].grid(True, alpha=0.3)

decomposition.resid.plot(ax=axes[3], color='red')
axes[3].set_ylabel('Residual')
axes[3].set_xlabel('Date')
axes[3].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Insights from the Time Series Decomposition:")
print("- Trend: shows the long-term growth in bike rentals")
print("- Seasonal: a consistent weekly pattern (period = 7 days)")
print("- Residual: random fluctuation left after removing trend and seasonality")
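With `model='multiplicative'`, the decomposition assumes observed = trend × seasonal × residual. The from-scratch sketch below makes the three components concrete. It follows the same classical idea as `seasonal_decompose`: a centered moving average for the trend, then per-position averages of the detrended ratios for the seasonal factors. The library's implementation details may differ, the sketch handles only odd periods such as 7, and the helper name is hypothetical:

```python
import numpy as np
import pandas as pd

def classical_multiplicative_decompose(y: pd.Series, period: int = 7):
    """Split y into trend * seasonal * residual (classical method, odd period only)."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    # 1. Trend: centered moving average spanning exactly one period.
    trend = y.rolling(window=period, center=True).mean()
    # 2. Detrended ratios (NaN at the edges, where the trend is undefined).
    ratio = y / trend
    # 3. Seasonal factors: mean ratio at each position in the cycle, rescaled to average 1.
    position = np.arange(len(y)) % period
    seasonal_index = ratio.groupby(position).mean()
    seasonal_index = seasonal_index / seasonal_index.mean()
    seasonal = pd.Series(seasonal_index.to_numpy()[position], index=y.index)
    # 4. Residual: the part that trend * seasonal does not explain.
    resid = y / (trend * seasonal)
    return trend, seasonal, resid

# Synthetic series: linear trend times a known weekly pattern (not real data).
t = np.arange(70)
pattern = np.array([1.2, 1.1, 1.0, 0.9, 0.8, 1.0, 1.0])
y = pd.Series((100.0 + t) * pattern[t % 7])
trend, seasonal, resid = classical_multiplicative_decompose(y, period=7)
print(seasonal.iloc[:7].round(3).tolist())
```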

## 7. Clustering Analysis (K-Means)

Group the 24 hours of the day by their average usage and weather profile to find sets of hours with similar characteristics.

In [None]:
# Prepare data for clustering
# Aggregate hourly patterns
cluster_features = df.groupby('hr').agg({
    'cnt': 'mean',
    'casual': 'mean',
    'registered': 'mean',
    'temp': 'mean',
    'hum': 'mean',
    'windspeed': 'mean'
}).reset_index()

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(cluster_features.drop('hr', axis=1))

# Elbow method to find optimal k
inertias = []
K_range = range(2, 11)

for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
    kmeans.fit(X_scaled)
    inertias.append(kmeans.inertia_)

# Plot elbow curve
plt.figure(figsize=(10, 6))
plt.plot(K_range, inertias, 'bo-', linewidth=2, markersize=8)
plt.xlabel('Number of Clusters (k)', fontsize=12)
plt.ylabel('Inertia', fontsize=12)
plt.title('Elbow Method for Choosing k', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
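Inertia, the quantity on the y-axis, is the sum of squared distances from each point to the center of its assigned cluster. It generally decreases as k grows, which is why the elbow method looks for a bend in the curve rather than a minimum. Here is a NumPy sketch of the definition:

```python
import numpy as np

def inertia(X: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """Sum of squared Euclidean distances from each point to its assigned center."""
    diffs = X - centers[labels]
    return float((diffs ** 2).sum())

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [10.0, 0.0]])
print(inertia(X, labels, centers))  # 2.0
```

With only 24 points (one per hour), the elbow is a weak signal. The chosen k is worth sanity-checking against the hour groupings it produces.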

In [None]:
# Apply K-Means with k=4, chosen by inspecting the elbow plot above
optimal_k = 4
kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
cluster_features['cluster'] = kmeans.fit_predict(X_scaled)

# Visualize clusters using PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.figure(figsize=(12, 8))
scatter = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=cluster_features['cluster'], 
                     cmap='viridis', s=200, alpha=0.6, edgecolors='black')
plt.colorbar(scatter, label='Cluster')
plt.xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%} variance)', fontsize=12)
plt.ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%} variance)', fontsize=12)
plt.title('K-Means Clustering Visualization (PCA)', fontsize=14, fontweight='bold')

# Add hour labels
for i, txt in enumerate(cluster_features['hr']):
    plt.annotate(txt, (X_pca[i, 0], X_pca[i, 1]), fontsize=9, ha='center')

plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nPCA Explained Variance: {sum(pca.explained_variance_ratio_):.2%}")
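`explained_variance_ratio_` is each component's share of the total variance. Conceptually, PCA centers the features and takes a singular value decomposition, and the squared singular values are proportional to the variance along each principal axis. The NumPy sketch below illustrates that idea; it is not scikit-learn's exact implementation. On rank-1 data, the first component should explain everything:

```python
import numpy as np

def pca_svd(X: np.ndarray, n_components: int = 2):
    """Project centered data onto its top principal axes via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    # Squared singular values are proportional to the variance per component.
    explained = S ** 2 / (S ** 2).sum()
    return scores, explained[:n_components]

t = np.arange(10.0)
X = np.column_stack([t, 2 * t, -t])  # rank-1 data: one direction carries all variance
scores, ratio = pca_svd(X)
print(np.round(ratio, 6))
```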

In [None]:
# Analyze cluster characteristics
print("\nCharacteristics of Each Cluster:")
print("=" * 100)
cluster_summary = cluster_features.groupby('cluster').agg({
    'hr': lambda x: list(x),
    'cnt': 'mean',
    'casual': 'mean',
    'registered': 'mean',
    'temp': 'mean',
    'hum': 'mean',
    'windspeed': 'mean'
}).round(2)

for cluster_id in range(optimal_k):
    print(f"\nCluster {cluster_id}:")
    print(f"  Hours: {cluster_summary.loc[cluster_id, 'hr']}")
    print(f"  Avg Total Rental: {cluster_summary.loc[cluster_id, 'cnt']:.2f}")
    print(f"  Avg Casual: {cluster_summary.loc[cluster_id, 'casual']:.2f}")
    print(f"  Avg Registered: {cluster_summary.loc[cluster_id, 'registered']:.2f}")
    print(f"  Avg Temperature: {cluster_summary.loc[cluster_id, 'temp']:.2f}")
    print(f"  Avg Humidity: {cluster_summary.loc[cluster_id, 'hum']:.2f}")
    print(f"  Avg Windspeed: {cluster_summary.loc[cluster_id, 'windspeed']:.2f}")
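The hour lists printed for each cluster are easier to read as contiguous blocks. The small hypothetical helper `hours_to_ranges` collapses a list of hours into ranges:

```python
def hours_to_ranges(hours):
    """Collapse hours like [0, 1, 2, 5, 7, 8] into '0-2, 5, 7-8'."""
    hours = sorted(hours)
    if not hours:
        return ""
    parts, start, prev = [], hours[0], hours[0]
    # A trailing None sentinel flushes the last open range.
    for h in hours[1:] + [None]:
        if h is not None and h == prev + 1:
            prev = h
            continue
        parts.append(f"{start}-{prev}" if start != prev else f"{start}")
        if h is not None:
            start = prev = h
    return ", ".join(parts)

print(hours_to_ranges([7, 8, 9, 17, 18]))  # 7-9, 17-18
```

The helper does not wrap around midnight, so hours 23 and 0 appear as separate blocks.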

## 8. Pattern Mining & Business Insights

Extract business insights from the patterns found above.

In [None]:
# Peak vs Off-peak analysis
df['usage_period'] = df['hr'].apply(lambda x: 'Peak' if x in [7, 8, 17, 18] else 'Off-Peak')

peak_analysis = df.groupby(['usage_period', 'workingday']).agg({
    'cnt': ['mean', 'sum'],
    'casual': 'mean',
    'registered': 'mean'
}).round(2)

print("Peak vs Off-Peak Hours Analysis:")
print(peak_analysis)

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Working day comparison
working_peak = df[df['workingday'] == 1].groupby('usage_period')['cnt'].mean()
axes[0].bar(working_peak.index, working_peak.values, color=['#ff9999', '#66b3ff'])
axes[0].set_title('Working Days: Peak vs Off-Peak', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Average Rentals')
axes[0].grid(axis='y', alpha=0.3)

# Non-working day comparison
nonworking_peak = df[df['workingday'] == 0].groupby('usage_period')['cnt'].mean()
axes[1].bar(nonworking_peak.index, nonworking_peak.values, color=['#ff9999', '#66b3ff'])
axes[1].set_title('Non-Working Days: Peak vs Off-Peak', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Average Rentals')
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
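The two panels can be summarized as one number per day type: mean peak-hour rentals divided by mean off-peak rentals. The sketch below uses toy values (not the dataset) and the same `usage_period` and `workingday` columns:

```python
import pandas as pd

toy = pd.DataFrame({
    "workingday":   [1, 1, 1, 1, 0, 0, 0, 0],
    "usage_period": ["Peak", "Peak", "Off-Peak", "Off-Peak"] * 2,
    "cnt":          [500, 460, 120, 160, 150, 170, 180, 200],
})

# Rows: workingday; columns: Off-Peak / Peak; values: mean rentals.
means = toy.pivot_table(index="workingday", columns="usage_period",
                        values="cnt", aggfunc="mean")
peak_lift = means["Peak"] / means["Off-Peak"]
print(peak_lift.round(2))
```

A lift above 1 means the commute hours are busier than the rest of the day. A lift below 1 means they are quieter.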

In [None]:
# Year-over-year growth analysis
yoy_growth = df.groupby(['year', 'month'])['cnt'].sum().reset_index()
yoy_pivot = yoy_growth.pivot(index='month', columns='year', values='cnt')

# Calculate growth rate
if len(yoy_pivot.columns) == 2:
    yoy_pivot['growth_rate'] = ((yoy_pivot.iloc[:, 1] - yoy_pivot.iloc[:, 0]) / yoy_pivot.iloc[:, 0] * 100).round(2)

print("\nYear-over-Year Growth Analysis:")
print(yoy_pivot)

# Visualize
if len(yoy_pivot.columns) >= 2:
    fig, axes = plt.subplots(1, 2, figsize=(16, 6))
    
    # Monthly comparison
    yoy_pivot.iloc[:, :2].plot(kind='bar', ax=axes[0], color=['skyblue', 'coral'])
    axes[0].set_title('Monthly Rentals: Year-over-Year Comparison', fontsize=14, fontweight='bold')
    axes[0].set_xlabel('Month')
    axes[0].set_ylabel('Total Rentals')
    axes[0].legend(title='Year')
    axes[0].grid(axis='y', alpha=0.3)
    axes[0].tick_params(axis='x', rotation=0)
    
    # Growth rate
    if 'growth_rate' in yoy_pivot.columns:
        axes[1].bar(yoy_pivot.index, yoy_pivot['growth_rate'], 
                   color=['green' if x > 0 else 'red' for x in yoy_pivot['growth_rate']])
        axes[1].axhline(y=0, color='black', linestyle='-', linewidth=0.5)
        axes[1].set_title('Monthly Growth Rate (%)', fontsize=14, fontweight='bold')
        axes[1].set_xlabel('Month')
        axes[1].set_ylabel('Growth Rate (%)')
        axes[1].grid(axis='y', alpha=0.3)
    
    plt.tight_layout()
    plt.show()
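The growth rate computed above is (later year - earlier year) / earlier year × 100 for each month. Here is a worked example with made-up monthly totals:

```python
import pandas as pd

# Made-up totals: rows are months, columns are years (same shape as yoy_pivot).
monthly = pd.DataFrame({2011: [1000, 1500], 2012: [1500, 1200]}, index=[1, 2])
monthly.index.name = "month"

growth_rate = ((monthly[2012] - monthly[2011]) / monthly[2011] * 100).round(2)
print(growth_rate.tolist())  # [50.0, -20.0]
```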

## 9. Key Business Insights & Recommendations

A summary of strategic insights for the bike-sharing business.

In [None]:
# Calculate key metrics
total_rentals = df['cnt'].sum()
avg_daily_rentals = daily_data['cnt'].mean()
peak_hour = df.groupby('hr')['cnt'].mean().idxmax()
peak_hour_avg = df.groupby('hr')['cnt'].mean().max()
best_season = df.groupby('season_label')['cnt'].mean().idxmax()
best_weather = df.groupby('weather_label')['cnt'].mean().idxmax()
casual_pct = (df['casual'].sum() / total_rentals * 100)
registered_pct = (df['registered'].sum() / total_rentals * 100)

# Temperature sweet spot
temp_bins = pd.cut(df['temp_celsius'], bins=10)
temp_rentals = df.groupby(temp_bins)['cnt'].mean().sort_values(ascending=False)
optimal_temp = temp_rentals.index[0]

print("="*100)
print(" " * 35 + "KEY BUSINESS INSIGHTS")
print("="*100)
print(f"\n1. VOLUME METRICS:")
print(f"   • Total Rentals: {total_rentals:,}")
print(f"   • Average Daily Rentals: {avg_daily_rentals:,.0f}")
print(f"   • Peak Hour: {peak_hour}:00 with avg {peak_hour_avg:.0f} rentals")

print(f"\n2. USER SEGMENTATION:")
print(f"   • Registered Users: {registered_pct:.1f}% of total rentals")
print(f"   • Casual Users: {casual_pct:.1f}% of total rentals")
print(f"   • Strategy: Focus on converting casual to registered users")

print(f"\n3. TEMPORAL PATTERNS:")
print(f"   • Best Season: {best_season}")
print(f"   • Peak Hours: 7-9 AM and 5-7 PM (commuting hours)")
print(f"   • Weekend vs Weekday: Different usage patterns detected")

print(f"\n4. WEATHER IMPACT:")
print(f"   • Best Weather: {best_weather}")
print(f"   • Optimal Temperature: {optimal_temp}")
print(f"   • Correlation with temperature: r = {df['temp'].corr(df['cnt']):.2f}; clear weather preferred")

print(f"\n5. STRATEGIC RECOMMENDATIONS:")
print(f"   ✓ Increase bike availability during peak hours (7-9 AM, 5-7 PM)")
print(f"   ✓ Implement dynamic pricing based on demand patterns")
print(f"   ✓ Focus marketing on {best_season} (highest average usage)")
print(f"   ✓ Develop weather-based promotions")
print(f"   ✓ Create loyalty programs to convert casual to registered users")
print(f"   ✓ Optimize bike redistribution using clustering insights")
print("\n" + "="*100)

## 10. Export Data for the Dashboard

Save the processed data for use in the Streamlit dashboard.

In [None]:
# Save processed data
df.to_csv('processed_hour.csv', index=False)
daily_data.to_csv('processed_daily.csv')
cluster_features.to_csv('cluster_analysis.csv', index=False)

print("Data saved for the dashboard:")
print("✓ processed_hour.csv")
print("✓ processed_daily.csv")
print("✓ cluster_analysis.csv")
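CSV stores dates as plain text, so a dashboard that reloads these files has to parse them again. The sketch below shows the round trip with toy values, using `io.StringIO` as a stand-in for `processed_daily.csv` (whose index column is `dteday`):

```python
import io
import pandas as pd

daily = pd.DataFrame({"cnt": [100, 200]},
                     index=pd.to_datetime(["2011-01-01", "2011-01-02"]))
daily.index.name = "dteday"

buffer = io.StringIO()
daily.to_csv(buffer)

buffer.seek(0)
naive = pd.read_csv(buffer)  # dteday comes back as plain strings
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="dteday", parse_dates=True)
print(naive["dteday"].dtype, restored.index.dtype)
```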

## 📝 Conclusion

This analysis surfaced several key insights:

### Key Findings:
1. **Temporal Patterns**: There is a clear pattern, with peak hours in the morning (7-9) and evening (17-19)
2. **User Segmentation**: Registered users dominate, with a consistent commuting pattern
3. **Weather Impact**: Temperature and weather conditions have a significant effect on rentals
4. **Clustering**: Identified 4 clusters of hours (k = 4) with distinct characteristics
5. **Growth**: A positive year-over-year trend indicates growing adoption

### Advanced Analysis Techniques Used:
- ✅ Time Series Decomposition
- ✅ K-Means Clustering
- ✅ Principal Component Analysis (PCA)
- ✅ Correlation Analysis with Statistical Testing
- ✅ Pattern Mining

### Next Steps:
- Interactive dashboard for real-time visualization
- Predictive modeling for demand forecasting
- A/B testing for pricing strategies

---
**Analyst**: Data Science Team  
**Date**: 2024  
**Tools**: Python, Pandas, Scikit-learn, Statsmodels, Matplotlib, Seaborn