 ============================================================
 Bike Sharing Demand – Data Overview
 ============================================================

 Objective:
 - Load the Bike Sharing Demand dataset
 - Understand the dataset structure
 - Identify the target variable and feature types
 - Frame the machine learning problem

 Important Note:
 - No preprocessing in this notebook
 - No feature engineering in this notebook
 - No model training or evaluation in this notebook
 - This notebook is strictly for data understanding

 Machine Learning Framing:
 - Learning Type: Supervised Learning
 - Task Type: Regression
 - Target Variable: count (total number of bike rentals)
 - Input Features: weather, season, datetime-related variables, etc.
 - Problem Nature: Real-world, non-linear demand prediction
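To make the framing above concrete, here is a minimal sketch of how a later notebook might separate inputs from the target. The helper `split_features_target` and the constant names are hypothetical. Dropping `casual` and `registered` from the inputs is an assumption about later modelling choices: both columns add up to `count` in the rows shown below, and `test.csv` has only 9 of the 12 training columns. The toy frame copies the first two rows of `train.csv` so the sketch runs without the file.

```python
import pandas as pd

TARGET = "count"
# Assumption: casual and registered are excluded from the inputs because they add up
# to count and are not available in test.csv (9 columns versus 12 in train.csv).
TARGET_COMPONENTS = ["casual", "registered"]


def split_features_target(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: return (X, y) for the supervised regression framing."""
    y = df[TARGET]
    X = df.drop(columns=[TARGET, *TARGET_COMPONENTS])
    return X, y


# First two rows of train.csv, inlined so the sketch runs without the dataset.
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00"],
    "season": [1, 1], "holiday": [0, 0], "workingday": [0, 0], "weather": [1, 1],
    "temp": [9.84, 9.02], "atemp": [14.395, 13.635], "humidity": [81, 80],
    "windspeed": [0.0, 0.0], "casual": [3, 8], "registered": [13, 32], "count": [16, 40],
})

X, y = split_features_target(toy)
print(X.columns.tolist())
print(y.tolist())  # [16, 40]
```

Nothing here is applied to `train_df` in this notebook, which stays limited to data understanding.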

In [5]:
import numpy as np
import pandas as pd

# Display every column and widen text output so wide frames are not truncated
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 120)

In [15]:
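# Load the training and test splits (paths are relative to this notebook)
train_path = "../../datasets/bike_sharing/train.csv"
test_path = "../../datasets/bike_sharing/test.csv"
train_df = pd.read_csv(train_path)
test_df = pd.read_csv(test_path)

# Preview the first five training rows
train_df.head()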

              datetime  season  holiday  workingday  weather  temp   atemp  humidity  windspeed  casual  registered  count
0  2011-01-01 00:00:00       1        0           0        1  9.84  14.395        81        0.0       3          13     16
1  2011-01-01 01:00:00       1        0           0        1  9.02  13.635        80        0.0       8          32     40
2  2011-01-01 02:00:00       1        0           0        1  9.02  13.635        80        0.0       5          27     32
3  2011-01-01 03:00:00       1        0           0        1  9.84  14.395        75        0.0       3          10     13
4  2011-01-01 04:00:00       1        0           0        1  9.84  14.395        75        0.0       0           1      1
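`read_csv` leaves `datetime` as plain strings (`object` dtype, as `train_df.info()` reports). If a true timestamp column is wanted for inspection, `read_csv` can parse it at load time via its `parse_dates` parameter. A minimal sketch, using the first three rows above inlined as CSV text (the displayed index is not a column of the file):

```python
import io

import pandas as pd

# First three rows of train.csv as shown above, inlined so the sketch runs
# without the dataset on disk.
csv_text = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.395,81,0.0,3,13,16
2011-01-01 01:00:00,1,0,0,1,9.02,13.635,80,0.0,8,32,40
2011-01-01 02:00:00,1,0,0,1,9.02,13.635,80,0.0,5,27,32
"""

# parse_dates converts the named column to a datetime64 dtype while reading
sample_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])
print(sample_df.dtypes["datetime"])
```

On the real file the equivalent call would be `pd.read_csv(train_path, parse_dates=["datetime"])`; whether to do this here or defer it to a preprocessing notebook is a judgment call.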


In [12]:
train_df.shape, test_df.shape

((10886, 12), (6493, 9))
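The shapes show that the test set has three fewer columns than the training set. A small hypothetical helper can name the missing columns. In the sketch below the training layout matches `train_df.info()`; the test layout (the first nine training columns) is an assumption consistent with the shape `(6493, 9)`, so on the real data call `columns_only_in_train(train_df, test_df)` instead.

```python
import pandas as pd


def columns_only_in_train(train: pd.DataFrame, test: pd.DataFrame) -> list[str]:
    """Hypothetical helper: columns present in train but absent from test, in train order."""
    test_cols = set(test.columns)
    return [c for c in train.columns if c not in test_cols]


train_cols = ["datetime", "season", "holiday", "workingday", "weather", "temp",
              "atemp", "humidity", "windspeed", "casual", "registered", "count"]

# Empty frames that mirror only the column layouts (no rows needed for this check).
toy_train = pd.DataFrame(columns=train_cols)
toy_test = pd.DataFrame(columns=train_cols[:9])  # assumed test layout

print(columns_only_in_train(toy_train, toy_test))  # ['casual', 'registered', 'count']
```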

In [13]:
train_df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   datetime    10886 non-null  object 
 1   season      10886 non-null  int64  
 2   holiday     10886 non-null  int64  
 3   workingday  10886 non-null  int64  
 4   weather     10886 non-null  int64  
 5   temp        10886 non-null  float64
 6   atemp       10886 non-null  float64
 7   humidity    10886 non-null  int64  
 8   windspeed   10886 non-null  float64
 9   casual      10886 non-null  int64  
 10  registered  10886 non-null  int64  
 11  count       10886 non-null  int64  
dtypes: float64(3), int64(8), object(1)
memory usage: 1020.7+ KB
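`info()` confirms there are no nulls (every column has 10886 non-null values), but it says nothing about duplicated rows or repeated timestamps. A minimal sketch of a compact integrity check; `basic_integrity_report` is a hypothetical helper, and the toy frame reuses a few columns from the five rows of `train_df.head()`:

```python
import pandas as pd


def basic_integrity_report(df: pd.DataFrame, time_col: str = "datetime") -> dict:
    """Hypothetical helper: count missing cells, duplicated rows and repeated timestamps."""
    return {
        "rows": len(df),
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "duplicate_timestamps": int(df[time_col].duplicated().sum()),
    }


# A few columns from the five rows of train_df.head() shown above.
toy = pd.DataFrame({
    "datetime": ["2011-01-01 00:00:00", "2011-01-01 01:00:00", "2011-01-01 02:00:00",
                 "2011-01-01 03:00:00", "2011-01-01 04:00:00"],
    "temp": [9.84, 9.02, 9.02, 9.84, 9.84],
    "humidity": [81, 80, 80, 75, 75],
    "count": [16, 40, 32, 13, 1],
})

print(basic_integrity_report(toy))
# {'rows': 5, 'missing_cells': 0, 'duplicate_rows': 0, 'duplicate_timestamps': 0}
```

Running `basic_integrity_report(train_df)` would apply the same checks to the full training set.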


# Target Variable

- **count** → total number of bike rentals per record
- Non-negative integer count (`int64`), modelled as a continuous quantity

This is a **supervised regression problem**.
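Two properties of the target are cheap to verify: in every row of `train_df.head()`, `count` equals `casual + registered`, and `count` is a non-negative integer. A minimal sketch on those five rows; on the full data the same expressions would be run against `train_df` rather than assumed to hold:

```python
import pandas as pd

# Target-related columns from the five rows of train_df.head() shown above.
toy = pd.DataFrame({
    "casual": [3, 8, 5, 3, 0],
    "registered": [13, 32, 27, 10, 1],
    "count": [16, 40, 32, 13, 1],
})

# Row-wise check that count equals casual + registered.
sums_match = bool((toy["casual"] + toy["registered"] == toy["count"]).all())

# The target should be a non-negative integer column.
is_int = pd.api.types.is_integer_dtype(toy["count"])
non_negative = bool((toy["count"] >= 0).all())

print(sums_match, is_int, non_negative)  # True True True
print(toy["count"].describe())
```

If the identity holds on the full data, `casual` and `registered` are components of the target rather than independent inputs.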


In [28]:
# Split columns by dtype only: numeric dtypes vs. everything else
numerical_features = train_df.select_dtypes(include=[np.number]).columns.tolist()
categorical_features = train_df.select_dtypes(exclude=[np.number]).columns.tolist()
numerical_features

['season',
 'holiday',
 'workingday',
 'weather',
 'temp',
 'atemp',
 'humidity',
 'windspeed',
 'casual',
 'registered',
 'count']
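The dtype-based split lists `season`, `holiday`, `workingday` and `weather` as numerical because they are stored as `int64`, even though they encode categories (season and weather are small integer codes; holiday and workingday are 0/1 flags). The grouping below is a hypothetical sketch by meaning rather than dtype, with a check that each of the 12 columns from `train_df.info()` is assigned exactly once. On the real data, `train_df[["season", "weather", "holiday", "workingday"]].nunique()` is a quick way to confirm these columns take only a handful of values.

```python
# Hypothetical semantic grouping of the 12 train columns listed by train_df.info().
FEATURE_GROUPS = {
    "timestamp": ["datetime"],
    "categorical_codes": ["season", "weather"],  # small integer codes, int64 dtype
    "binary_flags": ["holiday", "workingday"],   # 0/1 indicators, int64 dtype
    "continuous": ["temp", "atemp", "humidity", "windspeed"],
    "target_and_components": ["count", "casual", "registered"],
}

TRAIN_COLUMNS = ["datetime", "season", "holiday", "workingday", "weather", "temp",
                 "atemp", "humidity", "windspeed", "casual", "registered", "count"]


def grouped_exactly_once(columns: list[str], groups: dict[str, list[str]]) -> bool:
    """True if every column appears in exactly one group and no group adds unknown names."""
    grouped = [name for names in groups.values() for name in names]
    return len(grouped) == len(set(grouped)) and set(grouped) == set(columns)


# Columns that a numeric-dtype selection reports as numerical although they encode categories.
numeric_dtype_but_categorical = FEATURE_GROUPS["categorical_codes"] + FEATURE_GROUPS["binary_flags"]

print(grouped_exactly_once(TRAIN_COLUMNS, FEATURE_GROUPS))  # True
print(numeric_dtype_but_categorical)  # ['season', 'weather', 'holiday', 'workingday']
```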

In [29]:
categorical_features

['datetime']
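`datetime` lands in the non-numeric list only because it was read as strings; it is really a timestamp. Parsing a copy with `pd.to_datetime` (inspection only, the DataFrame is left unchanged) shows the covered range and the step between records. A minimal sketch on the five timestamps from `train_df.head()`; applied to `train_df["datetime"]`, steps larger than one hour would point to missing hours:

```python
import pandas as pd

# Timestamps from the five rows of train_df.head() shown above.
stamps = pd.Series([
    "2011-01-01 00:00:00", "2011-01-01 01:00:00", "2011-01-01 02:00:00",
    "2011-01-01 03:00:00", "2011-01-01 04:00:00",
])

# Parse a copy for inspection only; the original string column is untouched.
parsed = pd.to_datetime(stamps)
print(parsed.min(), parsed.max())

# Step between consecutive records; a constant one-hour step indicates hourly data.
steps = parsed.diff().dropna()
print((steps == pd.Timedelta(hours=1)).all())
```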