# Public Transport Data Dashboard

![mrt jakarta](assets/mrt_jakarta.jpg)

### Objective:
The aim of this project is to fetch real-time or historical data about public transport systems (e.g., buses, trains, or subways) from open APIs or datasets. The data will then be cleaned, processed, and presented in an interactive dashboard that highlights patterns such as transport availability, punctuality, routes, and passenger demand over time.

## Key Features of the Project:

### Data Collection:
Identify reliable open sources for public transport data (e.g., city transport APIs, GTFS feeds, or public transport websites).
Use Python to fetch and load the data with libraries such as requests (HTTP APIs), pandas (CSV files), or openpyxl (Excel workbooks).
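GTFS static feeds are published as a ZIP archive of CSV text files such as `stops.txt` (with columns like `stop_id`, `stop_name`, `stop_lat`, `stop_lon`). Below is a minimal sketch of fetching a feed and reading one of its tables. It uses only the standard library (`urllib.request` in place of requests, `csv` in place of pandas) so it runs without extra installs. The helper names and the feed URL are placeholders, not part of this project's code.

```python
from __future__ import annotations

import csv
import io
import urllib.request
import zipfile


def download_gtfs(url: str, timeout: float = 30.0) -> bytes:
    """Download a GTFS static feed (a ZIP archive) and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_gtfs_table(feed_zip: bytes, table: str = "stops.txt") -> list[dict[str, str]]:
    """Parse one CSV table (e.g. stops.txt) from a GTFS ZIP into a list of row dicts."""
    with zipfile.ZipFile(io.BytesIO(feed_zip)) as archive:
        with archive.open(table) as raw:
            # utf-8-sig decodes UTF-8 and also strips a leading byte-order mark if present
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


# Usage (placeholder URL; substitute a real agency feed):
# stops = read_gtfs_table(download_gtfs("https://example.com/gtfs.zip"))
# print(stops[0]["stop_name"])
```

All values come back as strings. Converting `stop_lat`/`stop_lon` to floats belongs in the processing step.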


### Data Processing:
Clean and preprocess the data into a usable format.
Handle missing values, duplicates, and irrelevant columns.
Apply any necessary transformations (e.g., converting timestamps, mapping locations to geospatial coordinates).
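Below is a minimal pandas sketch of these cleaning steps. It assumes a trips table with the columns that appear in the Austin bike-share sample later in this notebook, and it assumes `trip_id` uniquely identifies a trip. The helper name `clean_trips` and the chosen column subset are hypothetical.

```python
import pandas as pd


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step for a raw trips table."""
    # Keep only the columns the dashboard needs (illustrative selection)
    columns = ["trip_id", "bike_type", "start_time", "start_station_name", "duration_minutes"]
    out = df[columns].copy()
    # Remove duplicate records of the same trip (keeps the first occurrence)
    out = out.drop_duplicates(subset="trip_id")
    # Drop rows missing fields that every chart depends on
    out = out.dropna(subset=["start_time", "start_station_name"])
    # Convert timestamp strings such as "2022-05-06 14:19:39+00:00" to UTC datetimes
    out["start_time"] = pd.to_datetime(out["start_time"], utc=True)
    return out.reset_index(drop=True)
```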

### Data Storage:
Store the data in a local database (e.g., SQLite) or a cloud-based data warehouse (e.g., Google BigQuery, AWS Redshift) for later use.
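Below is a minimal sketch of the SQLite option using Python's built-in `sqlite3` module together with pandas' `DataFrame.to_sql` and `read_sql_query`. The table name `trips`, the helper names, and filtering by `bike_type` are illustrative assumptions.

```python
import sqlite3

import pandas as pd


def save_trips(conn: sqlite3.Connection, df: pd.DataFrame, table: str = "trips") -> None:
    """Write (or overwrite) a trips table in a SQLite database."""
    df.to_sql(table, conn, if_exists="replace", index=False)


def load_trips(conn: sqlite3.Connection, bike_type=None, table: str = "trips") -> pd.DataFrame:
    """Read trips back, optionally filtered to one bike_type via a bound parameter."""
    query = f"SELECT * FROM {table}"  # table name is trusted code, not user input
    params = None
    if bike_type is not None:
        query += " WHERE bike_type = ?"
        params = (bike_type,)
    return pd.read_sql_query(query, conn, params=params)
```

Passing the connection in, rather than a file path, keeps the helpers easy to test against an in-memory database. In the notebook you might open a file-backed connection once (e.g. `sqlite3.connect("transport.db")`, a hypothetical filename) and reuse it.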



### Dashboard Development:
Use Python visualization tools (e.g., Plotly and Dash for interactive charts, Matplotlib for static ones) to build an interactive dashboard.
The dashboard will let users interact with the data, filter by transport type, and visualize transport routes, schedules, or other metrics.
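A transport-type filter usually needs only a small aggregation function behind it. The sketch below assumes a trips DataFrame shaped like the Austin bike-share sample further down, with `bike_type` standing in for transport type and `start_time` already parsed to datetimes. `daily_trip_counts` is a hypothetical helper, and the returned Series could be handed to a Plotly figure inside a Dash callback.

```python
from typing import Optional

import pandas as pd


def daily_trip_counts(trips: pd.DataFrame, bike_type: Optional[str] = None) -> pd.Series:
    """Count trips per calendar day, optionally restricted to one bike_type.

    Hypothetical helper: a dashboard dropdown callback could call this with the
    selected value and plot the returned Series.
    """
    if bike_type is not None:
        # Keep only rows matching the selected type
        trips = trips[trips["bike_type"] == bike_type]
    # Group on the calendar date of each trip start (sorted by date)
    return trips.groupby(trips["start_time"].dt.date).size()
```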


### Experimentation & Analysis:
Experiment with data fetching, transformation, and API integration.
Explore analyses such as peak-hour usage, on-time performance (punctuality of arrivals), and comparisons across routes.
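Below is a minimal sketch of two of these analyses. It assumes `start_time` holds timezone-aware datetimes, as in the BigQuery sample below, which is in UTC. For Austin or Chicago data you would pass `tz="America/Chicago"` so peak hours reflect local time. The on-time function assumes hypothetical scheduled/actual arrival columns and an illustrative 5-minute tolerance. Early arrivals count as on time here, which is a design choice, not a standard.

```python
import pandas as pd


def trips_by_hour(trips: pd.DataFrame, tz: str = "UTC") -> pd.Series:
    """Count trip starts in each hour of the day (0-23) in the given time zone."""
    hours = trips["start_time"].dt.tz_convert(tz).dt.hour
    # Include hours with no trips so the result always has 24 entries
    return hours.value_counts().reindex(range(24), fill_value=0)


def peak_hour(trips: pd.DataFrame, tz: str = "UTC") -> int:
    """Hour of day with the most trip starts (earliest hour wins a tie)."""
    return int(trips_by_hour(trips, tz).idxmax())


def on_time_rate(scheduled: pd.Series, actual: pd.Series, tolerance_minutes: float = 5.0) -> float:
    """Share of arrivals no more than `tolerance_minutes` late."""
    delay_minutes = (actual - scheduled) / pd.Timedelta(minutes=1)
    return float((delay_minutes <= tolerance_minutes).mean())
```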

## Data Fetching:

```python
# Import all of the libraries
import os

import numpy as np
import pandas as pd
from google.cloud import bigquery
# import requests

# Load the autoreload extension *before* importing local modules,
# so edits under utils/ are picked up without restarting the kernel
%load_ext autoreload
%autoreload 2

from utils.kaggle_util import download_kaggle_dataset
from utils.data_loader import import_csvs_and_merge
```



```python
# Example usage:
# Don't forget to set up your Kaggle API credentials in .../Users/youruser/.kaggle/kaggle.json
dataset_name = "pablodiegoo/analysis-of-chicago-divvy-bicycle-sharing-updated"
download_folder = "./assets/data/divvy_tripdata"
download_kaggle_dataset(dataset_name, download_folder)
```

```python
# Load data into a DataFrame
df = import_csvs_and_merge("./assets/data/divvy_tripdata")
```
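`import_csvs_and_merge` comes from the project's `utils/data_loader.py`, which is not shown here. Its name suggests it reads every CSV file in a folder and stacks them into one DataFrame. Assuming that behavior, a hypothetical stand-in (named differently so it does not shadow the real helper) could look like this:

```python
from pathlib import Path

import pandas as pd


def import_csvs_and_merge_sketch(folder: str) -> pd.DataFrame:
    """Hypothetical stand-in for utils.data_loader.import_csvs_and_merge.

    Reads every *.csv file in `folder` (sorted by filename) and stacks them row-wise.
    """
    paths = sorted(Path(folder).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"No CSV files found in {folder!r}")
    # ignore_index=True gives the merged frame a fresh 0..n-1 index
    return pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
```

Sorting the paths makes the row order deterministic. The sketch does not reconcile files whose columns differ, since `pd.concat` simply fills missing columns with NaN.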

```python
ev_pop_df = import_csvs_and_merge("./assets/data/ev_population/")
```

```python
ev_pop_df.info()
```

```python
df.info()
```

### Loading Data from Google Cloud:

```python
# To run this client, first set up Google Cloud authentication on your local machine.
# Ref: https://cloud.google.com/sdk/docs/install, https://cloud.google.com/bigquery/docs/authentication/getting-started
# Test the BigQuery connection with a small sample of Austin bike-share trips
client = bigquery.Client()
query = """
    SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` LIMIT 100
"""
query_job = client.query(query)
# Note: this reassigns `df`, replacing the Divvy data loaded above
df = query_job.to_dataframe()
```



```python
df
```

| | trip_id | subscriber_type | bike_id | bike_type | start_time | start_station_id | start_station_name | end_station_id | end_station_name | duration_minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26599763 | Pay-as-you-ride | 21707 | electric | 2022-05-06 14:19:39+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 195 |
| 1 | 26742903 | 3-Day Weekender | 17460 | electric | 2022-05-23 16:24:46+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 2 |
| 2 | 26599923 | Pay-as-you-ride | 19453 | electric | 2022-05-06 14:37:41+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 178 |
| 3 | 26701683 | Local31 | 21772 | electric | 2022-05-17 22:50:29+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 4 |
| 4 | 26788653 | Pay-as-you-ride | 21740 | electric | 2022-05-29 19:41:40+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 31 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 25044015 | Single Trip (Pay-as-you-ride) | 514 | classic | 2021-09-05 19:06:56+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 26 |
| 96 | 25145053 | Local31 | 1687 | classic | 2021-09-15 23:07:36+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 4 |
| 97 | 25230096 | Single Trip (Pay-as-you-ride) | 2077 | classic | 2021-09-24 22:29:49+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 63 |
| 98 | 25230099 | Single Trip (Pay-as-you-ride) | 342 | classic | 2021-09-24 22:31:01+00:00 | 4051 | 10th/Red River | 4051 | 10th/Red River | 725 |
