# Real-Time Data Analysis of Dublin's Bike Stations


## Submitted By,

### Vinothini Murugesan (K00302090)

### Helmi Kaittikkattil Abraham (K00302088)

### Date: 31/12/2024

# Introduction

## Overview
This assignment analyzes the usage patterns and availability of bike stations in Dublin. The primary data source is real-time bike data retrieved through the API hosted by JCDecaux. The dataset provides up-to-date information on bike availability and station occupancy across the city.

## Dataset Description
The real-time bike dataset includes crucial attributes such as:
- **Station_ID**: A unique identifier for each bike station.
- **Station_Name**: The name of each bike station.
- **Total_No_of_bikes_stands**: The total number of bike stands available at each station.
- **Available_bikes**: The number of bikes available at each station at a given time.
- **Last_update**: The timestamp of the latest update for bike availability.
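The attribute names above are descriptive labels, while the collection code later in this report works with the API's own field names (`number`, `name`, `bike_stands`, `available_bikes`, `last_update`). The sketch below shows one way the two could be linked. The mapping itself is an assumption made for illustration; it is not defined by the API or by the original code.

```python
import pandas as pd

# Assumed mapping from JCDecaux API field names to the descriptive
# labels used in this report (illustrative, not part of the API).
COLUMN_LABELS = {
    "number": "Station_ID",
    "name": "Station_Name",
    "bike_stands": "Total_No_of_bikes_stands",
    "available_bikes": "Available_bikes",
    "last_update": "Last_update",
}

# One sample record in the shape returned by the stations endpoint.
raw = pd.DataFrame([
    {"number": 42, "name": "SMITHFIELD NORTH", "bike_stands": 30,
     "available_bikes": 16, "last_update": 1735647234000},
])

# rename() returns a new DataFrame; the raw frame keeps the API names.
labelled = raw.rename(columns=COLUMN_LABELS)
print(labelled.columns.tolist())
```

Keeping the raw API names in the stored file and renaming only for presentation avoids having to translate names back when new data is fetched.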

## Goals of the Analysis
The primary goals of this analysis are to:
1. **Identify Usage Patterns**: Determine peak usage times and patterns across different stations.
2. **Evaluate Availability**: Analyze the availability of bikes at various stations to understand supply and demand dynamics.
3. **Optimize Resource Allocation**: Provide recommendations for improving bike distribution and availability, enhancing user satisfaction.



# Task 1: Development Plan
## Objectives and Questions
This development plan outlines the objectives of our analysis of Dublin's bike station data. By setting clear goals and posing relevant questions, we aim to uncover insights that can help optimize the city's bike-sharing program. Our analysis focuses on the following key questions:

1. **Which Stations are Most Frequently Used?** To identify the bike stations with the highest usage rates.

2. **Are Stations with Banking Functionality More Frequently Used Than Others?** To compare usage rates between stations with and without banking functionality.

3. **What are the Busiest Times of the Day and Days of the Week?** To determine peak usage times throughout the day and identify the busiest days of the week.

4. **How Does the Availability of Bikes Compare to the Total Capacity of Stands?** To analyze the ratio of available bikes to the total capacity of bike stands at each station.

5. **Does Bike Usage Increase During Weekends Compared to Weekdays?** To investigate whether there is a significant increase in bike usage on weekends.

6. **Are There Specific Times of Day When Bikes are More Likely to be Unavailable?** To identify times of day with the highest likelihood of bike unavailability.

7. **Are There Patterns Where Stations Run Out of Bikes Completely?** To examine the frequency and patterns of stations running out of bikes.

8. **What are the Peak Usage Times for Bike Stations (e.g., Morning Rush, Evening Rush)?** To identify the exact times during the day when bike usage peaks.

9. **Are There Specific Stations Consistently Underutilized or Overutilized?** To identify stations with consistently low or high usage rates.

10. **Which Station is the Busiest Each Day?** To determine the busiest station for each day of the week.

By addressing these questions, we aim to produce actionable insights that improve the efficiency of Dublin's bike-sharing program. This plan serves as the roadmap for our analysis and for the information we provide to city planners and program managers.
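Two of the questions above (the ratio of available bikes to stand capacity, and stations running out of bikes) reduce to simple column arithmetic. The sketch below illustrates this on two stations taken from the sample output in this report. The column names `availability_ratio` and `is_empty` are hypothetical names introduced here, and this is not necessarily how the later analysis computes them.

```python
import pandas as pd

# Two stations taken from the sample output in this report.
snapshot = pd.DataFrame({
    "name": ["SMITHFIELD NORTH", "PARNELL SQUARE NORTH"],
    "bike_stands": [30, 20],
    "available_bikes": [16, 0],
})

# Share of stands that currently hold a bike.
snapshot["availability_ratio"] = snapshot["available_bikes"] / snapshot["bike_stands"]

# A station counts as empty when no bike can be taken from it.
snapshot["is_empty"] = snapshot["available_bikes"] == 0

print(snapshot)
```

A station reporting zero stands would produce a division by zero in the ratio, so such rows would need to be filtered out or handled before this step.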

# Task 2: Data Collection

The code below fetches real-time bike station and weather data for Dublin city and saves them to two separate files for further analysis.

In [1]:
#-----------------------------------------------
#Loading Bike Station Dataset
#-----------------------------------------------

import requests
import pandas as pd
import schedule
import time
import datetime

# Configuration for Bike station Dataset
# ------------------------------
API_KEY = "12981e305e23e649e9527e520de25c7c215c32ed"
CONTRACT_NAME = "Dublin"
BASE_URL = "https://api.jcdecaux.com/vls/v1/stations"
fetched_data = []

# Columns to keep in the saved dataset, in output order
ALL_COLUMNS = [
    "number", "contract_name", "name", "address", "position",
    "banking", "bonus", "bike_stands", "available_bike_stands",
    "available_bikes", "status", "last_update", "timestamp"
]
# Function to fetch bike station data
# -----------------------------------
def fetch_bike_station_data():
    """
    Fetch bike station data from the JCDecaux API and append each station
    record, tagged with the fetch timestamp, to fetched_data.
    """
    url = f"{BASE_URL}?contract={CONTRACT_NAME}&apiKey={API_KEY}"
    try:
        # A timeout keeps a stalled request from blocking the scheduling loop
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Error occurred while fetching data: {exc}")
        return

    if response.status_code == 200:
        data = response.json()
        timestamp = datetime.datetime.now().isoformat()

        for station in data:
            station["timestamp"] = timestamp  # Adding timestamp to each record
            fetched_data.append(station)

        print(f"Data fetched at {timestamp}")
    else:
        print(f"Error occurred while fetching data: {response.status_code}, {response.text}")
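The request URL above is assembled by string formatting with the API key written into the notebook. As a hedged alternative, the sketch below builds the same URL with encoded query parameters and reads the key from an environment variable. The helper `build_stations_url` and the variable name `JCDECAUX_API_KEY` are hypothetical names introduced for this example, and `"demo-key"` is a placeholder, not a real key.

```python
import os
from urllib.parse import urlencode

BASE_URL = "https://api.jcdecaux.com/vls/v1/stations"


def build_stations_url(contract, api_key, base_url=BASE_URL):
    """Return the stations URL with URL-encoded query parameters."""
    query = urlencode({"contract": contract, "apiKey": api_key})
    return f"{base_url}?{query}"


# JCDECAUX_API_KEY is a hypothetical variable name chosen for this sketch;
# "demo-key" is only a placeholder used when the variable is not set.
api_key = os.environ.get("JCDECAUX_API_KEY", "demo-key")
url = build_stations_url("Dublin", api_key)
```

With `requests`, the same effect can be had by passing the parameters as a dictionary, `requests.get(BASE_URL, params={"contract": ..., "apiKey": ...})`, which encodes them automatically.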

In [2]:
# Function to save data to CSV with consistent columns
# ----------------------------------------------------
def save_to_csv():
    """
    Save the fetched data to a CSV file, ensuring that every column in
    ALL_COLUMNS is present and that columns appear in a consistent order.
    """
    if fetched_data:
        # Convert to a Pandas DataFrame
        df = pd.DataFrame(fetched_data)
        # Ensure all required columns are available
        for col in ALL_COLUMNS:
            if col not in df.columns:
                df[col] = None  # Add missing columns with empty values
        # Reorder columns to match ALL_COLUMNS
        df = df[ALL_COLUMNS]
        # Save DataFrame to CSV file
        df.to_csv("Bike_station_data.csv", index=False, encoding="utf-8")
        print("Data saved to 'Bike_station_data.csv'.")
    else:
        print("No data has been fetched to save.")
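As the sample output in this report shows, the saved file stores `position` as a dictionary string and `last_update` as a large integer (milliseconds since the Unix epoch). The sketch below shows one way these could be made analysis-ready before saving. It is an optional preprocessing step assumed here for illustration and is not part of the original `save_to_csv` function.

```python
import pandas as pd

# One record in the shape shown in this report's sample output.
records = [{
    "number": 42,
    "name": "SMITHFIELD NORTH",
    "position": {"lat": 53.349562, "lng": -6.278198},
    "last_update": 1735647234000,  # milliseconds since the Unix epoch
}]

# json_normalize expands the nested dict into position.lat / position.lng.
df = pd.json_normalize(records)

# Interpret last_update as epoch milliseconds (result is timezone-naive UTC).
df["last_update"] = pd.to_datetime(df["last_update"], unit="ms")

print(df[["number", "position.lat", "position.lng", "last_update"]])
```

Flattening the coordinates before writing the CSV avoids having to parse the dictionary string back when the file is read again.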


In [11]:
# Main logic for scheduling and running the tasks
# -----------------------------------------------
# Schedule the data fetch function to run every 1 minute
schedule.every(1).minutes.do(fetch_bike_station_data)

# Fetch data for 2 minutes
print("Started loading the bike station data for 2 mins...")
start_time = time.time()
while time.time() - start_time < 2 * 60:  # Run for 2 mins
    schedule.run_pending()
    time.sleep(1)

# Save the data to a CSV file after collection
save_to_csv()
print("Bike data collection completed.")

# Display the output of the dataset 
# ------------------------------------------------
df = pd.read_csv("Bike_station_data.csv")
df.head()  # Display the first few rows of the dataset


Started loading the bike station data for 2 mins...
Data fetched at 2024-12-31T12:20:41.048319
Data fetched at 2024-12-31T12:20:41.254488
Data fetched at 2024-12-31T12:20:41.473478
Data fetched at 2024-12-31T12:21:42.030032
Data fetched at 2024-12-31T12:21:42.304545
Data fetched at 2024-12-31T12:21:42.515523
Data fetched at 2024-12-31T12:21:42.736790
Data saved to 'Bike_station_data.csv'.
Bike data collection completed.
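The log above shows several fetches within the same second. One possible explanation, inferred from the log rather than stated in the source, is that the fetch job was registered more than once, for example by re-running the scheduling cell; jobs added with `schedule.every(...)` remain registered until they are cleared, and a job's first run happens one interval after registration. The sketch below is a standard-library alternative that fetches immediately and keeps no shared job registry. The function `collect` and its parameters are hypothetical names introduced for this example.

```python
import time


def collect(fetch, duration_s, interval_s, clock=time.monotonic, sleep=time.sleep):
    """Call fetch() immediately, then every interval_s seconds, until
    duration_s seconds have elapsed. Returns the number of fetches."""
    start = clock()
    next_run = start
    runs = 0
    while clock() - start < duration_s:
        if clock() >= next_run:
            fetch()
            runs += 1
            next_run += interval_s
        # Wake up at most once per second to check whether a fetch is due.
        sleep(min(1.0, interval_s))
    return runs


# Deterministic demo: a fake clock that advances only when "sleep" is called,
# so the example finishes instantly instead of waiting two real minutes.
now = [0.0]
calls = []
runs = collect(
    fetch=lambda: calls.append(now[0]),
    duration_s=120,
    interval_s=60,
    clock=lambda: now[0],
    sleep=lambda seconds: now.__setitem__(0, now[0] + seconds),
)
print(runs, calls)  # 2 [0.0, 60.0]
```

In real use, `fetch` would be the station-fetching function and the default `clock` and `sleep` arguments would be left unchanged.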


|   | number | contract_name | name | address | position | banking | bonus | bike_stands | available_bike_stands | available_bikes | status | last_update | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | dublin | SMITHFIELD NORTH | Smithfield North | {'lat': 53.349562, 'lng': -6.278198} | False | False | 30 | 14 | 16 | OPEN | 1735647234000 | 2024-12-31T12:17:17.753695 |
| 1 | 30 | dublin | PARNELL SQUARE NORTH | Parnell Square North | {'lat': 53.3537415547453, 'lng': -6.2653014478... | False | False | 20 | 20 | 0 | OPEN | 1735646825000 | 2024-12-31T12:17:17.753695 |
| 2 | 54 | dublin | CLONMEL STREET | Clonmel Street | {'lat': 53.336021, 'lng': -6.26298} | False | False | 33 | 17 | 16 | OPEN | 1735647263000 | 2024-12-31T12:17:17.753695 |
| 3 | 108 | dublin | AVONDALE ROAD | Avondale Road | {'lat': 53.359405, 'lng': -6.276142} | False | False | 35 | 21 | 14 | OPEN | 1735647089000 | 2024-12-31T12:17:17.753695 |
| 4 | 20 | dublin | JAMES STREET EAST | James Street East | {'lat': 53.336597, 'lng': -6.248109} | False | False | 30 | 27 | 3 | OPEN | 1735647248000 | 2024-12-31T12:17:17.753695 |
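Each fetch returns every station whether or not its state has changed, so consecutive snapshots can repeat the same station state. The sketch below shows one way to keep a single row per state, assuming that a state is identified by the pair (`number`, `last_update`). That assumption, and the illustrative values in the example frame, are introduced here and are not taken from the original analysis.

```python
import pandas as pd

# Illustrative rows: the first two repeat the same station state
# captured by two separate fetches.
snapshots = pd.DataFrame({
    "number": [42, 42, 30],
    "available_bikes": [16, 16, 0],
    "last_update": [1735647234000, 1735647234000, 1735646825000],
    "timestamp": [
        "2024-12-31T12:20:41.048319",
        "2024-12-31T12:20:41.254488",
        "2024-12-31T12:20:41.048319",
    ],
})

# Keep the first row seen for each (station, last_update) pair.
unique_states = snapshots.drop_duplicates(subset=["number", "last_update"], keep="first")

print(len(snapshots), "->", len(unique_states))  # 3 -> 2
```

Whether to deduplicate depends on the question being asked: counts of distinct state changes benefit from it, while time-weighted availability may need every snapshot.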
