### NAME : RAKSHITHA RAMACHANDRA K00302101

## Project Title: Bike Usage Patterns in Dublin Using JCDecaux Data

### Overview
This project analyzes bike data for Dublin City from Dec 24-31, 2024, downloaded every 30 minutes from the JCDecaux API. The dataset is extended with weather data for the same period. The objective is to gain a deeper understanding of how bike usage patterns vary with time, location, and weather.

## Goals
- Examine the availability of bikes depending on the time of day.
- Analyze how bike usage varies across stations.
- Analyze the effect of meteorological factors on bike availability.

## Approach
- **Data Collection**: Current bike data is supplied by the JCDecaux API. Associated weather data was gathered as well.
- **Analysis**: Time-based, geographical, and weather-related analysis.

### Timeline 
- Week 1: Data collection and storage.
- Week 2: Clean data and analyze the usage patterns.
- Week 3: Analyze weather impact, finalize results.

### Questions to Be Answered
1. What is the total number of bike stations, and how many are active during peak hours?
2. Which station has the highest bike availability on average, and what factors contribute to its performance?
3. Which station has the lowest bike availability on average, and when are these shortages most common?
4. What is the average number of bikes available during different times of the day, and how does this vary across station types?
5. Which stations frequently have zero availability during peak hours, and how does their performance compare to neighboring stations?
6. Do stations with payment terminals have higher bike usage, and does this vary by time or day?
7. Do bonus stations have consistently higher usage compared to regular stations, especially during peak hours?
8. What factors contribute to high variability in bike availability, and how does variability affect bike usage?
9. How do weather conditions (temperature, rainfall, humidity) during peak hours (7-9 AM, 5-7 PM) influence the number of bike rentals?
10. How do adverse weather conditions impact peak hours for bike rentals?
11. Are mechanical bikes or electrical bikes more resilient to adverse weather?

### Step 1: Import Libraries
We start by importing the necessary libraries for data processing:
- `pandas`: For data manipulation.
- `json_normalize`: To flatten nested columns.
- `ast`: To safely evaluate stringified dictionary-like data.


In [302]:
import pandas as pd
from pandas import json_normalize
import ast
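
As a small illustration of why `ast` is imported, the sketch below parses one stringified dictionary of the kind stored in the `position` column. The sample string is copied from the first row displayed in Step 2; the variable names are illustrative only.

```python
import ast

# A stringified dictionary, as stored in the 'position' column of the CSV
raw_position = "{'latitude': 53.349562, 'longitude': -6.278198}"

# literal_eval only accepts Python literals, so it cannot run arbitrary code
position = ast.literal_eval(raw_position)

print(type(position).__name__)
print(position['latitude'], position['longitude'])
```

Unlike `eval`, `ast.literal_eval` raises an error on anything that is not a plain literal, which is why it is the safer choice for data read from a file.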

### Step 2: Load the Dataset
This step involves loading the dataset from the specified file path. If the file cannot be loaded, an error message will be printed.


In [305]:
def load_dataset(file_path):
    """Load the dataset from a file."""
    try:
        return pd.read_csv(file_path)
    except Exception as error:
        print(f"Error loading dataset: {error}")
        return None

# Specify the file path
file_path = "C:/Users/raksh/merged_bike_data1.csv"
data = load_dataset(file_path)
data.head()


Unnamed: 0,number,contractName,name,address,position,banking,bonus,status,lastUpdate,connected,overflow,shape,totalStands,mainStands,overflowStands
0,42,dublin,SMITHFIELD NORTH,Smithfield North,"{'latitude': 53.349562, 'longitude': -6.278198}",False,False,OPEN,2024-12-25 13:50:33,True,False,,"{'availabilities': {'bikes': 4, 'stands': 26, ...","{'availabilities': {'bikes': 4, 'stands': 26, ...",
1,30,dublin,PARNELL SQUARE NORTH,Parnell Square North,"{'latitude': 53.3537415547453, 'longitude': -6...",False,False,OPEN,2024-12-25 13:48:29,True,False,,"{'availabilities': {'bikes': 1, 'stands': 19, ...","{'availabilities': {'bikes': 1, 'stands': 19, ...",
2,54,dublin,CLONMEL STREET,Clonmel Street,"{'latitude': 53.336021, 'longitude': -6.26298}",False,False,OPEN,2024-12-25 13:46:48,True,False,,"{'availabilities': {'bikes': 14, 'stands': 19,...","{'availabilities': {'bikes': 14, 'stands': 19,...",
3,108,dublin,AVONDALE ROAD,Avondale Road,"{'latitude': 53.359405, 'longitude': -6.276142}",False,False,OPEN,2024-12-25 13:44:46,True,False,,"{'availabilities': {'bikes': 0, 'stands': 35, ...","{'availabilities': {'bikes': 0, 'stands': 35, ...",
4,20,dublin,JAMES STREET EAST,James Street East,"{'latitude': 53.336597, 'longitude': -6.248109}",False,False,OPEN,2024-12-25 13:44:34,True,False,,"{'availabilities': {'bikes': 1, 'stands': 29, ...","{'availabilities': {'bikes': 1, 'stands': 29, ...",


### Step 3: Parse Nested Columns
Some columns in the dataset may contain nested data stored as stringified dictionaries.
This step converts these strings into Python dictionary objects for further processing.


In [308]:
def parse_nested_columns(dataframe, nested_cols):
    """Parse string representations of nested data in specified columns."""
    for column in nested_cols:
        try:
            dataframe[column] = dataframe[column].apply(lambda x: ast.literal_eval(x) if isinstance(x, str) else x)
        except Exception as error:
            print(f"Error parsing column {column}: {error}")
    return dataframe

nested_columns = ['position', 'totalStands', 'mainStands']
data = parse_nested_columns(data, nested_columns)
data.head()


Unnamed: 0,number,contractName,name,address,position,banking,bonus,status,lastUpdate,connected,overflow,shape,totalStands,mainStands,overflowStands
0,42,dublin,SMITHFIELD NORTH,Smithfield North,"{'latitude': 53.349562, 'longitude': -6.278198}",False,False,OPEN,2024-12-25 13:50:33,True,False,,"{'availabilities': {'bikes': 4, 'stands': 26, ...","{'availabilities': {'bikes': 4, 'stands': 26, ...",
1,30,dublin,PARNELL SQUARE NORTH,Parnell Square North,"{'latitude': 53.3537415547453, 'longitude': -6...",False,False,OPEN,2024-12-25 13:48:29,True,False,,"{'availabilities': {'bikes': 1, 'stands': 19, ...","{'availabilities': {'bikes': 1, 'stands': 19, ...",
2,54,dublin,CLONMEL STREET,Clonmel Street,"{'latitude': 53.336021, 'longitude': -6.26298}",False,False,OPEN,2024-12-25 13:46:48,True,False,,"{'availabilities': {'bikes': 14, 'stands': 19,...","{'availabilities': {'bikes': 14, 'stands': 19,...",
3,108,dublin,AVONDALE ROAD,Avondale Road,"{'latitude': 53.359405, 'longitude': -6.276142}",False,False,OPEN,2024-12-25 13:44:46,True,False,,"{'availabilities': {'bikes': 0, 'stands': 35, ...","{'availabilities': {'bikes': 0, 'stands': 35, ...",
4,20,dublin,JAMES STREET EAST,James Street East,"{'latitude': 53.336597, 'longitude': -6.248109}",False,False,OPEN,2024-12-25 13:44:34,True,False,,"{'availabilities': {'bikes': 1, 'stands': 29, ...","{'availabilities': {'bikes': 1, 'stands': 29, ...",


### Step 4: Flatten Nested Columns
Columns containing nested dictionaries are expanded into separate columns, each representing a key in the dictionary.


In [310]:
def flatten_nested_columns(dataframe, nested_cols):
    """Flatten nested dictionary columns into separate columns."""
    for column in nested_cols:
        try:
            flattened = json_normalize(dataframe[column])
            flattened.columns = [f"{column}_{key}" for key in flattened.columns]
            dataframe = pd.concat([dataframe.drop(column, axis=1), flattened], axis=1)
        except Exception as error:
            print(f"Error flattening column {column}: {error}")
    return dataframe

data = flatten_nested_columns(data, nested_columns)
data.head()


Unnamed: 0,number,contractName,name,address,banking,bonus,status,lastUpdate,connected,overflow,...,totalStands_availabilities.electricalBikes,totalStands_availabilities.electricalInternalBatteryBikes,totalStands_availabilities.electricalRemovableBatteryBikes,mainStands_capacity,mainStands_availabilities.bikes,mainStands_availabilities.stands,mainStands_availabilities.mechanicalBikes,mainStands_availabilities.electricalBikes,mainStands_availabilities.electricalInternalBatteryBikes,mainStands_availabilities.electricalRemovableBatteryBikes
0,42,dublin,SMITHFIELD NORTH,Smithfield North,False,False,OPEN,2024-12-25 13:50:33,True,False,...,1,0,1,30,4,26,3,1,0,1
1,30,dublin,PARNELL SQUARE NORTH,Parnell Square North,False,False,OPEN,2024-12-25 13:48:29,True,False,...,0,0,0,20,1,19,1,0,0,0
2,54,dublin,CLONMEL STREET,Clonmel Street,False,False,OPEN,2024-12-25 13:46:48,True,False,...,12,0,12,33,14,19,2,12,0,12
3,108,dublin,AVONDALE ROAD,Avondale Road,False,False,OPEN,2024-12-25 13:44:46,True,False,...,0,0,0,35,0,35,0,0,0,0
4,20,dublin,JAMES STREET EAST,James Street East,False,False,OPEN,2024-12-25 13:44:34,True,False,...,0,0,0,30,1,29,1,0,0,0
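
To make the origin of column names such as `mainStands_availabilities.bikes` easier to see, here is a minimal sketch of the same flattening on two hand-made rows. The nested values are illustrative and contain fewer keys than the real data.

```python
import pandas as pd

# Two hand-made rows mimicking the parsed 'mainStands' column (illustrative values)
sample = pd.DataFrame({
    'number': [42, 30],
    'mainStands': [
        {'availabilities': {'bikes': 4, 'stands': 26}, 'capacity': 30},
        {'availabilities': {'bikes': 1, 'stands': 19}, 'capacity': 20},
    ],
})

# json_normalize expands nested keys and joins the levels with '.'
flat = pd.json_normalize(sample['mainStands'].tolist())
flat.columns = [f"mainStands_{key}" for key in flat.columns]

# Replace the nested column with its flattened counterpart
result = pd.concat([sample.drop(columns='mainStands'), flat], axis=1)
print(result.columns.tolist())
```

The dotted part of each name comes from `json_normalize`, and the `mainStands_` prefix comes from the list comprehension, which is why both separators appear in the flattened column names.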


### Step 5: Rename Columns
Column names are updated to be more descriptive and consistent. For example:
- `number` becomes `station_id`.
- `mainStands_capacity` becomes `main_stand_capacity`.


In [314]:
def rename_columns(dataframe):
    """Rename columns to improve clarity and consistency."""
    return dataframe.rename(columns={
        'number': 'station_id',
        'contractName': 'contract_name',
        'name': 'station_name',
        'address': 'station_address',
        'banking': 'has_payment_terminal',
        'bonus': 'is_bonus_station',
        'status': 'station_status',
        'connected': 'is_connected',
        'overflow': 'allows_overflow',
        'lastUpdate': 'last_update',
        'totalStands_availabilities.electricalBikes': 'total_electrical_bikes',
        'totalStands_availabilities.electricalInternalBatteryBikes': 'total_internal_battery_bikes',
        'totalStands_availabilities.electricalRemovableBatteryBikes': 'total_removable_battery_bikes',
        'mainStands_capacity': 'main_stand_capacity',
        'mainStands_availabilities.bikes': 'main_stand_available_bikes',
        'mainStands_availabilities.stands': 'main_stand_available_stands',
        'mainStands_availabilities.mechanicalBikes': 'main_stand_mechanical_bikes',
        'mainStands_availabilities.electricalBikes': 'main_stand_electrical_bikes',
        'mainStands_availabilities.electricalInternalBatteryBikes': 'main_stand_internal_battery_bikes',
        'mainStands_availabilities.electricalRemovableBatteryBikes': 'main_stand_removable_battery_bikes'
    })

data = rename_columns(data)
data.head()


Unnamed: 0,station_id,contract_name,station_name,station_address,has_payment_terminal,is_bonus_station,station_status,last_update,is_connected,allows_overflow,...,total_electrical_bikes,total_internal_battery_bikes,total_removable_battery_bikes,main_stand_capacity,main_stand_available_bikes,main_stand_available_stands,main_stand_mechanical_bikes,main_stand_electrical_bikes,main_stand_internal_battery_bikes,main_stand_removable_battery_bikes
0,42,dublin,SMITHFIELD NORTH,Smithfield North,False,False,OPEN,2024-12-25 13:50:33,True,False,...,1,0,1,30,4,26,3,1,0,1
1,30,dublin,PARNELL SQUARE NORTH,Parnell Square North,False,False,OPEN,2024-12-25 13:48:29,True,False,...,0,0,0,20,1,19,1,0,0,0
2,54,dublin,CLONMEL STREET,Clonmel Street,False,False,OPEN,2024-12-25 13:46:48,True,False,...,12,0,12,33,14,19,2,12,0,12
3,108,dublin,AVONDALE ROAD,Avondale Road,False,False,OPEN,2024-12-25 13:44:46,True,False,...,0,0,0,35,0,35,0,0,0,0
4,20,dublin,JAMES STREET EAST,James Street East,False,False,OPEN,2024-12-25 13:44:34,True,False,...,0,0,0,30,1,29,1,0,0,0


### Step 6: Convert Data Types
Convert columns like `last_update` to a proper datetime format for easier time-based analysis.


In [108]:
def convert_column_types(dataframe):
    """Convert column data types for accurate analysis."""
    try:
        dataframe['last_update'] = pd.to_datetime(dataframe['last_update'])
    except Exception as error:
        print(f"Error converting column types: {error}")
    return dataframe

data = convert_column_types(data)
data.head()


Unnamed: 0,station_id,contract_name,station_name,station_address,has_payment_terminal,is_bonus_station,station_status,last_update,is_connected,allows_overflow,...,total_electrical_bikes,total_internal_battery_bikes,total_removable_battery_bikes,main_stand_capacity,main_stand_available_bikes,main_stand_available_stands,main_stand_mechanical_bikes,main_stand_electrical_bikes,main_stand_internal_battery_bikes,main_stand_removable_battery_bikes
0,42,dublin,SMITHFIELD NORTH,Smithfield North,False,False,OPEN,2024-12-25 13:50:33,True,False,...,1,0,1,30,4,26,3,1,0,1
1,30,dublin,PARNELL SQUARE NORTH,Parnell Square North,False,False,OPEN,2024-12-25 13:48:29,True,False,...,0,0,0,20,1,19,1,0,0,0
2,54,dublin,CLONMEL STREET,Clonmel Street,False,False,OPEN,2024-12-25 13:46:48,True,False,...,12,0,12,33,14,19,2,12,0,12
3,108,dublin,AVONDALE ROAD,Avondale Road,False,False,OPEN,2024-12-25 13:44:46,True,False,...,0,0,0,35,0,35,0,0,0,0
4,20,dublin,JAMES STREET EAST,James Street East,False,False,OPEN,2024-12-25 13:44:34,True,False,...,0,0,0,30,1,29,1,0,0,0
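
The benefit of the datetime conversion shows up once the `.dt` accessor is used. A minimal sketch, with timestamps copied from the sample rows above:

```python
import pandas as pd

# Timestamps copied from the sample rows above
stamps = pd.Series(['2024-12-25 13:50:33', '2024-12-25 13:48:29'])

parsed = pd.to_datetime(stamps)

# Once parsed, the .dt accessor exposes the components used for hourly grouping
hours = parsed.dt.hour
weekdays = parsed.dt.day_name()
print(hours.tolist(), weekdays.tolist())
```

On plain strings these attributes are not available, so the hour-based filters used in the later questions depend on this step.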


In [111]:
def check_null_values(dataframe):
    """Check for null values in the dataset."""
    null_counts = dataframe.isnull().sum()
    print("\nNull Values in the Dataset:")
    print(null_counts)
    return null_counts

def check_duplicate_values(dataframe):
    """Check for duplicate rows in the dataset."""
    duplicate_count = dataframe.duplicated().sum()
    print(f"\nNumber of Duplicate Rows: {duplicate_count}")
    return duplicate_count

# Check for null and duplicate values
null_values = check_null_values(data)
duplicate_values = check_duplicate_values(data)



Null Values in the Dataset:
station_id                                        0
contract_name                                     0
station_name                                      0
station_address                                   0
has_payment_terminal                              0
is_bonus_station                                  0
station_status                                    0
last_update                                       0
is_connected                                      0
allows_overflow                                   0
shape                                         15748
overflowStands                                15748
position_latitude                                 0
position_longitude                                0
totalStands_capacity                              0
totalStands_availabilities.bikes                  0
totalStands_availabilities.stands                 0
totalStands_availabilities.mechanicalBikes        0
total_electrical_bikes             
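
The `shape` and `overflowStands` columns each report 15748 nulls. Assuming that is every row of the dataset, these columns carry no information and could be dropped. A minimal sketch on a toy frame (the values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy frame: 'shape' is entirely null, mirroring the pattern reported above
toy = pd.DataFrame({
    'station_id': [42, 30, 54],
    'shape': [np.nan, np.nan, np.nan],
    'main_stand_capacity': [30, 20, 33],
})

# how='all' drops a column only when every value in it is missing
trimmed = toy.dropna(axis=1, how='all')
print(trimmed.columns.tolist())
```

Using `how='all'` keeps any column that has at least one real value, so partially filled columns are not lost by accident.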

In [130]:
data

Unnamed: 0,station_id,contract_name,station_name,station_address,has_payment_terminal,is_bonus_station,station_status,last_update,is_connected,allows_overflow,...,total_electrical_bikes,total_internal_battery_bikes,total_removable_battery_bikes,main_stand_capacity,main_stand_available_bikes,main_stand_available_stands,main_stand_mechanical_bikes,main_stand_electrical_bikes,main_stand_internal_battery_bikes,main_stand_removable_battery_bikes
0,42,dublin,SMITHFIELD NORTH,Smithfield North,False,False,OPEN,2024-12-25 13:50:33,True,False,...,1,0,1,30,4,26,3,1,0,1
1,30,dublin,PARNELL SQUARE NORTH,Parnell Square North,False,False,OPEN,2024-12-25 13:48:29,True,False,...,0,0,0,20,1,19,1,0,0,0
2,54,dublin,CLONMEL STREET,Clonmel Street,False,False,OPEN,2024-12-25 13:46:48,True,False,...,12,0,12,33,14,19,2,12,0,12
3,108,dublin,AVONDALE ROAD,Avondale Road,False,False,OPEN,2024-12-25 13:44:46,True,False,...,0,0,0,35,0,35,0,0,0,0
4,20,dublin,JAMES STREET EAST,James Street East,False,False,OPEN,2024-12-25 13:44:34,True,False,...,0,0,0,30,1,29,1,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15743,39,dublin,WILTON TERRACE,Wilton Terrace,False,False,OPEN,2024-12-29 13:12:17,True,False,...,0,0,0,20,0,20,0,0,0,0
15744,83,dublin,EMMET ROAD,Emmet Road,False,False,OPEN,2024-12-29 13:12:39,True,False,...,7,0,7,40,20,20,13,7,0,7
15745,92,dublin,HEUSTON BRIDGE (NORTH),Heuston Bridge (North),False,False,OPEN,2024-12-29 13:12:40,True,False,...,8,0,8,40,24,16,16,8,0,8
15746,21,dublin,LEINSTER STREET SOUTH,Leinster Street South,False,False,OPEN,2024-12-29 13:21:08,True,False,...,1,0,1,30,8,22,7,1,0,1


In [132]:
data.to_csv('cleaned.csv')

The cleaned dataset is saved to a new CSV file, which we can use for analysis and visualization in the following steps.

### Data Preprocessing Completed
We have successfully completed all the necessary data preprocessing steps.
#### Next Step: Insights Extraction

With the data now ready and well-structured, we can move forward to uncover valuable insights from the dataset.


In [137]:
import pandas as pd

# Load the preprocessed dataset
bike_data = pd.read_csv("C:/Users/raksh/cleaned.csv")

### Question 1: What is the total number of bike stations, and how many are active during peak hours?


In [247]:
# Total number of bike stations
total_stations = bike_data['station_id'].nunique()
print(f"Total number of bike stations: {total_stations}")

# Convert 'last_update' to datetime and filter for peak hours (7-9 AM, 5-7 PM)
bike_data['last_update'] = pd.to_datetime(bike_data['last_update'])
peak_hours = bike_data[(bike_data['last_update'].dt.hour.isin([7, 8, 17, 18]))]

# Count unique active stations during peak hours
active_stations_peak = peak_hours['station_id'].nunique()
print(f"Number of active stations during peak hours: {active_stations_peak}")

Total number of bike stations: 114
Number of active stations during peak hours: 114


##### All 114 bike stations are active during peak hours. This suggests that the bike rental network is fully operational during critical commuting periods.
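
The count above treats a station as active if it has any snapshot during peak hours. A stricter reading could also require the station to report `OPEN`. The sketch below shows that variant on toy data; the `CLOSED` value is an assumption, since only `OPEN` appears in the rows shown here.

```python
import pandas as pd

# Toy snapshots; 'CLOSED' is an assumed alternative to 'OPEN'
toy = pd.DataFrame({
    'station_id': [1, 1, 2, 2, 3],
    'station_status': ['OPEN', 'OPEN', 'CLOSED', 'CLOSED', 'OPEN'],
    'last_update': pd.to_datetime([
        '2024-12-25 07:10:00', '2024-12-25 12:00:00',
        '2024-12-25 08:05:00', '2024-12-25 17:30:00',
        '2024-12-25 18:45:00',
    ]),
})

PEAK_HOURS = [7, 8, 17, 18]
in_peak = toy['last_update'].dt.hour.isin(PEAK_HOURS)
is_open = toy['station_status'] == 'OPEN'

# Stations with at least one OPEN snapshot during peak hours
open_in_peak = toy.loc[in_peak & is_open, 'station_id'].nunique()
print(open_in_peak)
```

In this toy example station 2 reports during peak hours but is never open, so it is excluded from the count.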

### Question 2: Which station has the highest bike availability on average, and what factors contribute to its performance?


In [249]:
# Find the station with the highest average bike availability
highest_avg_bike = bike_data.groupby('station_name')['totalStands_availabilities.bikes'].mean()
highest_station, highest_availability = highest_avg_bike.idxmax(), highest_avg_bike.max()

print(f"Station with highest average bike availability: {highest_station}")
print(f"Highest average availability: {highest_availability}")

# Analyze contributing factors
factors = ['totalStands_capacity', 'main_stand_capacity', 
           'main_stand_available_bikes', 'main_stand_electrical_bikes', 
           'main_stand_mechanical_bikes']
station_factors = bike_data.loc[bike_data['station_name'] == highest_station, factors].mean()

print("\nContributing factors:")
print(station_factors)

Station with highest average bike availability: HEUSTON BRIDGE (NORTH)
Highest average availability: 33.05035971223022

Contributing factors:
totalStands_capacity           40.000000
main_stand_capacity            40.000000
main_stand_available_bikes     33.050360
main_stand_electrical_bikes     9.338129
main_stand_mechanical_bikes    23.712230
dtype: float64


##### High capacity and diversity in bike types likely contribute to consistently higher availability. This station may cater to a higher influx of commuters.

### Question 3: Which station has the lowest bike availability on average, and when are these shortages most common?


In [253]:
# Station with the lowest average bike availability
lowest_station = bike_data.groupby('station_name')['totalStands_availabilities.bikes'].mean().idxmin()
lowest_availability = bike_data.groupby('station_name')['totalStands_availabilities.bikes'].mean().min()
print(f"Lowest avg availability: {lowest_station} ({lowest_availability})")

# Analyze shortage trends for the identified station
shortage_trend = (bike_data[bike_data['station_name'] == lowest_station]
                  .assign(hour=lambda df: df['last_update'].dt.hour)
                  .groupby('hour').size().sort_values(ascending=False))
print("Shortage trend by hour:", shortage_trend)

Lowest avg availability: YORK STREET WEST (0.0)
Shortage trend by hour: hour
12    11
14    11
13    11
10    11
11    10
16    10
15    10
9      9
20     5
17     5
23     5
8      5
6      4
18     4
19     4
22     4
2      3
5      3
4      3
21     3
1      3
7      2
3      2
dtype: int64


##### Chronic unavailability at this station indicates a potential mismatch between supply and demand. This station may require additional resources or redistribution of bikes.
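
The hourly table above counts snapshots per hour. For a station whose average is 0.0 every snapshot is a shortage, so the two coincide, but for other stations the share of empty snapshots per hour is more informative. A minimal sketch with illustrative column names and values:

```python
import pandas as pd

toy = pd.DataFrame({
    'last_update': pd.to_datetime([
        '2024-12-25 08:10:00', '2024-12-25 08:40:00',
        '2024-12-25 09:10:00', '2024-12-25 09:40:00',
    ]),
    'bikes': [0, 3, 0, 0],
})

# The mean of a boolean column is the fraction of True values
zero_share = (toy
              .assign(hour=toy['last_update'].dt.hour,
                      is_empty=toy['bikes'].eq(0))
              .groupby('hour')['is_empty'].mean())
print(zero_share)
```

A share close to 1.0 for an hour means the station was empty in almost every snapshot taken during that hour.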

### Question 4: What is the average number of bikes available during different times of the day, and how does this vary across station types?

In [257]:
# Calculate average bikes by time of day and station type
average_bikes_data = (bike_data
                      .assign(hour=bike_data['last_update'].dt.hour)
                      .groupby(['hour', 'is_bonus_station'])['totalStands_availabilities.bikes']
                      .mean().reset_index()
                      .rename(columns={'hour': 'Time', 'is_bonus_station': 'Bonus Station', 
                                       'totalStands_availabilities.bikes': 'Avg Bikes'}))

print("Average bikes by time and station type:", average_bikes_data)

Average bikes by time and station type:     Time  Bonus Station  Avg Bikes
0      0          False  13.377358
1      1          False  12.535503
2      2          False  12.566372
3      3          False  12.539419
4      4          False  12.371237
5      5          False  12.231771
6      6          False  12.568513
7      7          False  12.240625
8      8          False  12.107456
9      9          False  12.057292
10    10          False  12.004819
11    11          False  11.959547
12    12          False  11.919200
13    13          False  11.828843
14    14          False  11.762210
15    15          False  11.873717
16    16          False  11.986251
17    17          False  11.863717
18    18          False  12.052632
19    19          False  12.096491
20    20          False  12.229825
21    21          False  12.274854
22    22          False  12.280702
23    23          False  12.317544


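- ##### Average bike availability stays close to 12 bikes per station at every hour of the day, ranging from about 11.8 (2 PM) to about 13.4 (midnight).
- ##### Only `False` appears in the `Bonus Station` column, so the dataset contains no bonus stations and availability cannot be compared across station types.
- ##### The small dip around midday points to slightly higher daytime demand; however, the system appears stable overall.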
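
The figures above are averages of bikes available, which is a proxy for usage rather than a direct count of rentals. One rough way to approximate usage is to treat each drop in available bikes between consecutive snapshots of a station as departures. This is only a sketch under that assumption: it undercounts whenever a pickup and a return happen between two snapshots. Column names and values are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    'station_id': [1, 1, 1, 2, 2, 2],
    'last_update': pd.to_datetime([
        '2024-12-25 08:00', '2024-12-25 08:30', '2024-12-25 09:00',
        '2024-12-25 08:00', '2024-12-25 08:30', '2024-12-25 09:00',
    ]),
    'bikes': [10, 7, 9, 5, 5, 2],
})

toy = toy.sort_values(['station_id', 'last_update'])

# Change in available bikes between consecutive snapshots of the same station
change = toy.groupby('station_id')['bikes'].diff()

# A drop of n bikes is read as at least n departures; rises are returns
toy['est_departures'] = (-change).clip(lower=0).fillna(0)

departures_per_station = toy.groupby('station_id')['est_departures'].sum()
print(departures_per_station)
```

The first snapshot of each station has no predecessor, so its change is missing and is counted as zero departures.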

### Question 5: Which stations frequently have zero availability during peak hours, and how does their performance compare to neighboring stations?

In [167]:
def zero_availability_stations(data):
    """Count zero-availability snapshots per station during peak hours."""
    # Derive the hour here so the function does not rely on a precomputed 'hour' column
    hours = data['last_update'].dt.hour
    peak_hours = data[hours.isin([7, 8, 17, 18])]
    zero_avail_stations = peak_hours[peak_hours['totalStands_availabilities.bikes'] == 0]['station_name'].value_counts()
    return zero_avail_stations

zero_stations = zero_availability_stations(data)
print("Stations with frequent zero availability during peak hours:\n", zero_stations)

Stations with frequent zero availability during peak hours:
 station_name
YORK STREET WEST              16
DENMARK STREET GREAT          12
HIGH STREET                    7
JOHN STREET WEST               6
MOUNTJOY SQUARE EAST           6
WILTON TERRACE (PARK)          6
RATHDOWN ROAD                  5
HERBERT STREET                 5
GRANGEGORMAN LOWER (SOUTH)     5
HARDWICKE PLACE                5
HERBERT PLACE                  4
GRANGEGORMAN LOWER (NORTH)     4
HANOVER QUAY EAST              4
KING STREET NORTH              3
GEORGES LANE                   3
HARDWICKE STREET               3
WILTON TERRACE                 3
MERRION SQUARE EAST            2
BLESSINGTON STREET             2
PARNELL SQUARE NORTH           2
SMITHFIELD NORTH               1
ECCLES STREET                  1
GRANTHAM STREET                1
CONVENTION CENTRE              1
NORTH CIRCULAR ROAD            1
JAMES STREET EAST              1
MOUNTJOY SQUARE WEST           1
EXCISE WALK                    1
MA

##### Stations with frequent zero availability may be unable to meet demand during peak hours. Neighboring stations should be analyzed for redistributive strategies to alleviate shortages.
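
The neighbor comparison is not computed above. One possible starting point, sketched here under the assumption that "neighboring" means nearest by straight-line distance, is to rank stations by haversine distance using the flattened latitude and longitude. The helper names are hypothetical; the coordinates are copied from the sample rows in Step 2.

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Coordinates copied from the sample rows shown in Step 2
stations = pd.DataFrame({
    'station_name': ['SMITHFIELD NORTH', 'AVONDALE ROAD', 'JAMES STREET EAST'],
    'lat': [53.349562, 53.359405, 53.336597],
    'lon': [-6.278198, -6.276142, -6.248109],
}).set_index('station_name')

def nearest_stations(name, k=2):
    """Return the k stations closest to `name`, nearest first."""
    origin = stations.loc[name]
    others = stations.drop(index=name)
    dist = haversine_km(origin['lat'], origin['lon'], others['lat'], others['lon'])
    return dist.sort_values().head(k)

neighbors = nearest_stations('SMITHFIELD NORTH')
print(neighbors)
```

With the neighbors identified, their zero-availability counts during peak hours can be looked up in the table above and compared with the station of interest.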

### Question 6: Do stations with payment terminals have higher bike usage, and does this vary by time or day?

In [172]:
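def payment_terminal_analysis(data):
    """Average bike availability by payment-terminal flag and hour of day."""
    # Derive the hour here so the function does not rely on a precomputed 'hour' column
    hourly = data.assign(hour=data['last_update'].dt.hour)
    terminal_usage = hourly.groupby(['has_payment_terminal', 'hour'])['totalStands_availabilities.bikes'].mean()
    return terminal_usage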

terminal_usage_data = payment_terminal_analysis(data)
print("Bike usage at stations with payment terminals:\n", terminal_usage_data)

Bike usage at stations with payment terminals:
 has_payment_terminal  hour
False                 0       13.377358
                      1       12.535503
                      2       12.566372
                      3       12.539419
                      4       12.371237
                      5       12.231771
                      6       12.568513
                      7       12.240625
                      8       12.107456
                      9       12.057292
                      10      12.004819
                      11      11.959547
                      12      11.919200
                      13      11.828843
                      14      11.762210
                      15      11.873717
                      16      11.986251
                      17      11.863717
                      18      12.052632
                      19      12.096491
                      20      12.229825
                      21      12.274854
                      22      12.280702
     

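##### Only `has_payment_terminal = False` appears in the visible output, and its hourly averages are identical to the all-station averages from Question 4. This indicates that the dataset contains no stations with payment terminals, so the comparison cannot be made with this data.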

### Question 7: Do bonus stations have consistently higher usage compared to regular stations, especially during peak hours?

In [177]:
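def bonus_station_analysis(data):
    """Compare average bike availability of bonus and regular stations during peak hours."""
    # Derive the hour here so the function does not rely on a precomputed 'hour' column
    hours = data['last_update'].dt.hour
    peak_hours = data[hours.isin([7, 8, 17, 18])]
    usage = peak_hours.groupby('is_bonus_station')['totalStands_availabilities.bikes'].mean()
    return usage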

bonus_usage = bonus_station_analysis(data)
print("Usage comparison (bonus vs regular stations):\n", bonus_usage)

Usage comparison (bonus vs regular stations):
 is_bonus_station
False    12.040623
Name: totalStands_availabilities.bikes, dtype: float64


##### Only regular stations (`is_bonus_station = False`) appear in the result, so the dataset contains no bonus stations and no comparison is possible. Regular stations average about 12.04 available bikes during peak hours.
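
Before comparing two station groups, it can help to confirm that both groups are actually present in the data. A minimal sketch, using a hypothetical helper `compare_groups` and toy values:

```python
import pandas as pd

def compare_groups(frame, flag_col, value_col):
    """Return per-group means, or None when the flag has fewer than two distinct values."""
    counts = frame[flag_col].value_counts()
    if len(counts) < 2:
        print(f"'{flag_col}' has a single value ({counts.index[0]}); nothing to compare.")
        return None
    return frame.groupby(flag_col)[value_col].mean()

# Toy data in which every station is a regular (non-bonus) station
toy = pd.DataFrame({'is_bonus_station': [False, False, False], 'bikes': [4, 1, 14]})
outcome = compare_groups(toy, 'is_bonus_station', 'bikes')
```

Returning `None` instead of a one-row result makes it harder to mistake a single group's average for a comparison.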

### Question 8: What factors contribute to high variability in bike availability, and how does variability affect bike usage?

In [183]:
def variability_factors(data):
    """Identify factors contributing to high variability in bike availability."""
    variability = data.groupby('station_name')['totalStands_availabilities.bikes'].std().sort_values(ascending=False)
    return variability

variability_data = variability_factors(data)
print("Stations with high variability in bike availability:\n", variability_data.head())

Stations with high variability in bike availability:
 station_name
SIR PATRICK DUN'S      12.390573
HANOVER QUAY            9.045602
FOWNES STREET UPPER     9.021634
ROYAL HOSPITAL          8.975996
JAMES STREET            8.546724
Name: totalStands_availabilities.bikes, dtype: float64


- ##### Stations like SIR PATRICK DUN'S and HANOVER QUAY exhibit the highest variability.
- ##### High variability can affect user trust in the system, leading to uneven distribution or underutilization.
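
Standard deviation in raw bike counts tends to be larger at bigger stations. To compare stations of different sizes, one option is to measure variability of the fill ratio (bikes divided by capacity) instead. This is a sketch with made-up values and illustrative column names, not a result from the dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    'station_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'bikes': [0, 20, 40, 0, 5, 10],
    'capacity': [40, 40, 40, 10, 10, 10],
})

# Express availability as a share of capacity before measuring spread
toy['fill_ratio'] = toy['bikes'] / toy['capacity']

spread = toy.groupby('station_name').agg(
    std_bikes=('bikes', 'std'),
    std_fill_ratio=('fill_ratio', 'std'),
)
print(spread)
```

Here station A looks four times as variable as station B in raw counts, yet both swing between empty and full, which the fill ratio makes visible.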

### Let's now consider weather data for Dublin city, obtained from a reputable website, for further analysis

First, let's preprocess that data.

In [201]:
import pandas as pd

# Load the dataset
weather_data = pd.read_csv("C:/Users/raksh/Downloads/export (4).csv")  # Update the file path

# Step 1: Convert 'time' column to datetime
weather_data['time'] = pd.to_datetime(weather_data['time'], errors='coerce')

# Step 2: Handle missing values
# Fill missing 'prcp' and 'snow' values with 0 (assuming no precipitation or snow)
weather_data['prcp'] = weather_data['prcp'].fillna(0)
weather_data['snow'] = weather_data['snow'].fillna(0)

# Step 3: Rename columns to descriptive names
# Note: the result of rename() is not assigned back, so weather_data keeps its
# original short names ('temp', 'prcp', 'coco', ...), which the later cells use.
weather_data.rename(columns={'time':'time_stamp','temp':'temperature','dwpt':'dew_point','rhum':'relative_humidity','prcp':'precipitation',
                             'wdir':'wind_direction','wspd':'wind_speed','wpgt':'wind_gust','pres':'pressure','tsun':'sun_duration',
                             'coco':'weather_condition_code'})

# Step 4: Identify categorical columns
# If 'coco' is a categorical column, convert it to a category
if 'coco' in weather_data.columns:
    weather_data['coco'] = weather_data['coco'].astype('category')

# Step 5: Check for duplicate rows and remove if present
weather_data.drop_duplicates(inplace=True)

# Step 6: Verify data types and column consistency
print(weather_data.info())

# Step 7: Drop the 'tsun' column, which has 0 non-null values
def drop_null_data():
    weather_data.drop(columns=["tsun"], inplace=True)
    print(weather_data.isnull().sum())

drop_null_data()

# Save the cleaned dataset
weather_data.to_csv("cleaned_weather_dataset.csv", index=False)  # Update path as needed

print("Preprocessing completed. Cleaned dataset saved.")


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192 entries, 0 to 191
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   time    192 non-null    datetime64[ns]
 1   temp    192 non-null    float64       
 2   dwpt    192 non-null    float64       
 3   rhum    192 non-null    int64         
 4   prcp    192 non-null    float64       
 5   snow    192 non-null    float64       
 6   wdir    192 non-null    int64         
 7   wspd    192 non-null    float64       
 8   wpgt    192 non-null    float64       
 9   pres    192 non-null    float64       
 10  tsun    0 non-null      float64       
 11  coco    192 non-null    category      
dtypes: category(1), datetime64[ns](1), float64(8), int64(2)
memory usage: 17.0 KB
None
time    0
temp    0
dwpt    0
rhum    0
prcp    0
snow    0
wdir    0
wspd    0
wpgt    0
pres    0
coco    0
dtype: int64
Preprocessing completed. Cleaned dataset saved.


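### Question 9: How do weather conditions (temperature, rainfall, humidity) during peak hours (7-9 AM, 5-7 PM) influence the number of bike rentals?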

In [283]:
def analyze_weather_peak_hours(bike_data, weather_data):
    """
    Analyze the influence of weather conditions on bike rentals during peak hours.
    """
    # Merge datasets on datetime columns
    combined_data = bike_data.merge(weather_data, left_on='last_update', right_on='time', how='inner')
    
    # Define peak hours
    combined_data['is_peak'] = combined_data['last_update'].dt.hour.isin([7, 8, 17, 18])
    
    # Group by peak and non-peak hours, calculate average rentals and weather conditions
    result = combined_data.groupby('is_peak')[['totalStands_availabilities.bikes', 'temp', 'prcp', 'rhum']].mean()
    result.columns = ['Avg Bikes', 'Avg Temp', 'Avg Rainfall', 'Avg Humidity']
    
    print("\nImpact of Weather Conditions on Bike Rentals During Peak Hours:")
    print(result)

# Execute the function
analyze_weather_peak_hours(bike_data, weather_data)


Impact of Weather Conditions on Bike Rentals During Peak Hours:
         Avg Bikes  Avg Temp  Avg Rainfall  Avg Humidity
is_peak                                                 
False    13.444444  8.522222           0.0     90.444444
True     36.000000  8.500000           0.0     90.000000


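- ##### The metric here is the average number of available bikes, not rentals: about 36 during peak hours versus about 13.4 off-peak.
- ##### The merge joins on exact timestamps, so only bike snapshots taken precisely on the hour match a weather record. The round peak-hour averages (36.0, 8.5, 90.0) suggest very few matched rows, so this comparison should be treated as indicative at best.
- ##### Conditions in the matched rows were stable (average temperature: ~8.5°C, average rainfall: 0 mm), which leaves little weather variation to explain any difference.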
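
One caveat of merging on `last_update` and `time` directly is that the join only matches rows whose timestamps are identical. Assuming the weather file holds one reading per hour (192 rows over 8 days suggests so), a possible alternative is to floor each bike timestamp to the start of its hour before merging. The toy frames below are illustrative; `hour_start` is a hypothetical helper column.

```python
import pandas as pd

bikes = pd.DataFrame({
    'last_update': pd.to_datetime(['2024-12-25 13:50:33', '2024-12-25 14:00:00']),
    'bikes': [4, 14],
})
weather = pd.DataFrame({
    'time': pd.to_datetime(['2024-12-25 13:00:00', '2024-12-25 14:00:00']),
    'temp': [8.5, 9.0],
})

# Exact-timestamp join: only the snapshot taken precisely on the hour survives
exact = bikes.merge(weather, left_on='last_update', right_on='time', how='inner')

# Hour-floored join: every snapshot is paired with the reading for its hour
floored = (bikes
           .assign(hour_start=bikes['last_update'].dt.floor('h'))
           .merge(weather, left_on='hour_start', right_on='time', how='inner'))

print(len(exact), len(floored))
```

With the floored key every bike snapshot keeps its weather context, so grouped averages rest on the full set of snapshots rather than on a few on-the-hour rows.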


### Question 10: How do adverse weather conditions impact peak hours for bike rentals?

In [269]:
def impact_peak_hours(weather_data, bike_data):
    """
    Analyze the impact of adverse weather during peak hours (7-9 AM, 5-7 PM).
    Includes multiple weather conditions: coco, prcp, wspd, and rhum.
    """
    # Merge datasets
    combined = bike_data.merge(weather_data, left_on='last_update', right_on='time', how='inner')

    # Convert 'coco' to numeric if not already
    combined['coco'] = pd.to_numeric(combined['coco'], errors='coerce')

    # Define peak hours
    combined['is_peak'] = combined['last_update'].dt.hour.isin([7, 8, 17, 18])

    # Define bad weather based on multiple factors
    combined['bad_weather'] = (
        (combined['coco'] > 5) |  # Severe weather code
        (combined['prcp'] > 0) |  # Rain
        (combined['wspd'] > 20) |  # High wind speed
        (combined['rhum'] > 90)   # High humidity
    )

    # Analyze impact during peak hours under bad weather
    impact = combined.groupby(['is_peak', 'bad_weather'])['totalStands_availabilities.bikes'].mean()

    print("Impact of adverse weather during peak hours:\n", impact)

# Call the function
impact_peak_hours(weather_data, bike_data)


Impact of adverse weather during peak hours:
 is_peak  bad_weather
False    False          13.500000
         True           13.333333
True     False          36.000000
Name: totalStands_availabilities.bikes, dtype: float64


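- ##### The output contains no peak-hour observations under bad weather, so the effect of adverse weather on peak hours cannot be measured from this sample.
- ##### The gap between 36 and 13.33 compares peak hours in normal weather with off-peak hours in bad weather, so it reflects time of day rather than weather. Off-peak, average availability is nearly identical in normal (13.50) and bad (13.33) weather.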
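
The printed result has only three rows: there is no entry for peak hours combined with bad weather. A minimal sketch of how to make such gaps visible is to report the observation count next to each mean and reindex over all four combinations. The helper name `group_coverage` and the demo values are hypothetical; the column names follow the merged frame built in `impact_peak_hours`.

```python
import pandas as pd

def group_coverage(combined):
    """Count observations and average bikes for every (is_peak, bad_weather) pair."""
    summary = (
        combined
        .groupby(['is_peak', 'bad_weather'])['totalStands_availabilities.bikes']
        .agg(['count', 'mean'])
    )
    # Reindex over all four combinations so missing ones appear explicitly
    all_pairs = pd.MultiIndex.from_product(
        [[False, True], [False, True]], names=['is_peak', 'bad_weather']
    )
    summary = summary.reindex(all_pairs)
    summary['count'] = summary['count'].fillna(0).astype(int)
    return summary

# Made-up rows, purely illustrative (not values from the dataset)
demo = pd.DataFrame({
    'is_peak': [False, False, False, True],
    'bad_weather': [False, True, True, False],
    'totalStands_availabilities.bikes': [14, 12, 15, 36],
})
coverage = group_coverage(demo)
print(coverage)
```

A combination with a count of 0 and a missing mean signals that no conclusion can be drawn for that group.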


### Question 11: Are mechanical bikes or electrical bikes more resilient to adverse weather?

In [259]:
def bike_resilience(weather_data, bike_data):
    """
    Compare resilience of mechanical and electrical bikes to adverse weather.
    Define bad weather based on multiple conditions: coco, prcp, wspd, and rhum.
    """
    # Merge datasets
    combined = bike_data.merge(weather_data, left_on='last_update', right_on='time', how='inner')

    # Ensure relevant columns are numeric for comparison
    combined['coco'] = pd.to_numeric(combined['coco'], errors='coerce')
    combined['prcp'] = pd.to_numeric(combined['prcp'], errors='coerce')
    combined['wspd'] = pd.to_numeric(combined['wspd'], errors='coerce')
    combined['rhum'] = pd.to_numeric(combined['rhum'], errors='coerce')

    # Define bad weather conditions
    combined['bad_weather'] = (
        (combined['coco'] > 5) |  # Severe weather code
        (combined['prcp'] > 0) |  # Rain
        (combined['wspd'] > 20) |  # High wind speed
        (combined['rhum'] > 90)   # High humidity
    )

    # Calculate resilience of mechanical and electrical bikes under bad weather
    resilience = combined.groupby('bad_weather')[
        ['main_stand_mechanical_bikes', 'main_stand_electrical_bikes']
    ].mean()

    print("Bike resilience to adverse weather:\n", resilience)

# Call the function
bike_resilience(weather_data, bike_data)

Bike resilience to adverse weather:
              main_stand_mechanical_bikes  main_stand_electrical_bikes
bad_weather                                                          
False                          10.714286                     6.000000
True                            9.666667                     3.666667


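- ##### Average availability is lower in bad weather for both bike types: mechanical falls from 10.71 to 9.67 (about 10%), and electrical falls from 6.00 to 3.67 (about 39%).
- ##### Mechanical bike availability is therefore steadier under bad weather. The data does not show why electrical bikes change more; possible explanations such as maintenance or weatherproofing needs would require further evidence.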
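
Comparing the two bike types is easier with relative change than with raw averages, because electrical bikes start from a lower baseline. The sketch below recomputes the percentage change from the averages printed in the output above; the variable names are illustrative only.

```python
import pandas as pd

# Averages copied from the printed output of bike_resilience
resilience = pd.DataFrame(
    {
        'main_stand_mechanical_bikes': [10.714286, 9.666667],
        'main_stand_electrical_bikes': [6.000000, 3.666667],
    },
    index=pd.Index([False, True], name='bad_weather'),
)

# Percentage change from normal weather (first row) to bad weather (second row)
normal = resilience.iloc[0]
bad = resilience.iloc[1]
pct_change = ((bad - normal) / normal * 100).round(1)
print(pct_change)
```

This gives roughly -9.8% for mechanical bikes and -38.9% for electrical bikes.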

## Conclusion

This analysis surfaced several insights into bike usage and the effect of weather on bike-sharing patterns. Stations such as *Heuston Bridge (North)* had a consistent surplus of available bikes, while *York Street West* experienced notable shortages, even during peak hours. Addressing these imbalances would improve the user experience.

Weather conditions, including rainfall and high humidity, were associated with differences in bike availability. Average availability was lower in bad weather, and mechanical bikes were more resilient than electrical bikes. Because the sample covers only one week (Dec 24-31, 2024), these results are indicative rather than conclusive. They suggest that weather patterns should be considered in resource allocation and planning. Overall, the findings point to a strategy of increasing capacity at high-demand stations, rebalancing bikes to prevent shortages and surpluses at individual stations, and integrating weather forecasts to improve service reliability for end users.

# I, Rakshitha Ramachandra (K00302101), am the creator of this project, and 100% of the work has been completed solely by me.