# Public Transport Data Dashboard

![mrt jakarta](plugins/assets/mrt_jakarta.jpg)

## Objective:
This project fetches real-time or historical data about public transport systems (e.g., buses, trains, or subways) from open APIs or datasets. The data is then cleaned, processed, and presented in an interactive dashboard that surfaces patterns such as transport availability, punctuality, routes, and passenger demand over time.

## Key Features of the Project:

### Data Collection:
Identify reliable open sources for public transport data (e.g., city transport APIs, GTFS feeds, or public transport websites).
Fetch the data in Python with libraries such as requests, pandas, or openpyxl.
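
As a rough illustration of the collection step, the sketch below reads one table out of a zipped GTFS feed into pandas. The helper names `read_gtfs_table` and `build_sample_feed` are hypothetical, and the stop names and coordinates are made-up placeholder values. The tiny in-memory archive stands in for a real download, which could come from something like `requests.get(feed_url).content` for whichever agency feed you pick.

```python
import io
import zipfile

import pandas as pd


def read_gtfs_table(feed_bytes: bytes, table: str = "stops.txt") -> pd.DataFrame:
    """Read one CSV table (e.g. stops.txt) out of a zipped GTFS feed."""
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as archive:
        with archive.open(table) as handle:
            return pd.read_csv(handle)


def build_sample_feed() -> bytes:
    """Build a tiny in-memory GTFS-like zip so the example runs offline."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Stop A,-6.20,106.80\n"  # placeholder values, not real stops
            "S2,Stop B,-6.25,106.82\n",
        )
    return buffer.getvalue()


stops = read_gtfs_table(build_sample_feed())
print(stops[["stop_id", "stop_name"]])
```

Reading from bytes rather than a file path keeps the parsing step independent of how the feed was obtained, so the same helper would work for a downloaded feed or a local archive.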


### Data Processing:
Clean and preprocess the data to ensure it's in a usable format.
Handle missing data, duplicates, and irrelevant columns.
Perform any necessary transformations (e.g., timestamp conversions, geospatial coordinates for locations).
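
The cleaning steps above could look roughly like the following pandas sketch. The function name `clean_trips` and the column names (`ride_id`, `started_at`, `ended_at`, `note`) are assumptions chosen for illustration; the real datasets may use different names.

```python
import pandas as pd


def clean_trips(raw: pd.DataFrame, drop_columns=()) -> pd.DataFrame:
    """Drop irrelevant columns, duplicates, and unusable rows, then parse timestamps."""
    # Remove columns that are not needed downstream (ignored if absent).
    trips = raw.drop(columns=list(drop_columns), errors="ignore")
    trips = trips.drop_duplicates().copy()
    # Parse timestamps; unparseable values become NaT instead of raising.
    for column in ("started_at", "ended_at"):
        trips[column] = pd.to_datetime(trips[column], errors="coerce")
    # Rows without both timestamps cannot be analysed, so drop them.
    trips = trips.dropna(subset=["started_at", "ended_at"])
    # Derive a trip duration in minutes for later analysis.
    trips["duration_min"] = (
        trips["ended_at"] - trips["started_at"]
    ).dt.total_seconds() / 60
    return trips.reset_index(drop=True)


# Made-up sample rows: one duplicate, one bad timestamp, one missing timestamp.
raw = pd.DataFrame(
    {
        "ride_id": ["a", "a", "b", "c"],
        "started_at": ["2024-01-01 08:00:00", "2024-01-01 08:00:00",
                       "2024-01-01 09:00:00", None],
        "ended_at": ["2024-01-01 08:15:00", "2024-01-01 08:15:00",
                     "not a date", "2024-01-01 10:00:00"],
        "note": ["x", "x", "y", "z"],
    }
)
clean = clean_trips(raw, drop_columns=["note"])
print(clean)
```

Coercing bad timestamps to `NaT` and dropping them afterwards is one possible policy; depending on the analysis, logging or imputing those rows may be preferable.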

### Data Storage:
Store the data in a local database (e.g., SQLite) or a cloud-based data warehouse (e.g., Google BigQuery, AWS Redshift) for later use.
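
For the local option, a minimal sketch using the standard-library `sqlite3` module with pandas might look like this. The table name `trips` and the sample rows are illustrative assumptions, and `":memory:"` is used only to keep the example self-contained; a file path would persist the data.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Made-up sample data standing in for a cleaned dataset.
trips = pd.DataFrame({"ride_id": ["a", "b"], "duration_min": [15.0, 22.5]})

# closing() guarantees the connection is closed when the block exits.
with closing(sqlite3.connect(":memory:")) as conn:
    # Write the frame to a table, replacing any previous version.
    trips.to_sql("trips", conn, if_exists="replace", index=False)
    # Read it back to confirm the round trip.
    stored = pd.read_sql_query(
        "SELECT ride_id, duration_min FROM trips ORDER BY ride_id", conn
    )

print(stored)
```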



### Dashboard Development:
Use a Python visualization library (e.g., Plotly, Dash, Matplotlib) to build an interactive dashboard.
The dashboard will allow users to interact with data, filter by transport type, and visualize transport routes, schedules, or other metrics.
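
One way to structure the filtering behind such a dashboard is sketched below. The helper names and the columns `trip_id`, `transport_type`, and `hour` are hypothetical. Plotly is imported lazily inside `build_figure` so the data helpers can be exercised even where Plotly is not installed.

```python
import pandas as pd


def filter_by_transport_type(trips: pd.DataFrame, transport_type: str) -> pd.DataFrame:
    """Keep only the rows for one transport type (e.g. "bus")."""
    return trips[trips["transport_type"] == transport_type]


def trips_per_hour(trips: pd.DataFrame) -> pd.DataFrame:
    """Count trips for each hour of the day."""
    return trips.groupby("hour", as_index=False).agg(trip_count=("trip_id", "count"))


def build_figure(trips: pd.DataFrame, transport_type: str):
    """Build a bar chart of hourly trips for the selected transport type."""
    import plotly.express as px  # lazy import: only needed when plotting

    hourly = trips_per_hour(filter_by_transport_type(trips, transport_type))
    return px.bar(
        hourly, x="hour", y="trip_count", title=f"Trips per hour: {transport_type}"
    )


# Made-up sample data.
trips = pd.DataFrame(
    {
        "trip_id": [1, 2, 3, 4],
        "transport_type": ["bus", "bus", "train", "bus"],
        "hour": [8, 8, 9, 17],
    }
)
hourly_bus = trips_per_hour(filter_by_transport_type(trips, "bus"))
print(hourly_bus)
```

In a Dash app, a callback wired to a dropdown could call `build_figure` with the selected value and return the figure to a graph component. Keeping the filtering in plain functions makes that logic testable without starting a server.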


### Experimentation & Analysis:
Experiment with data fetching, transformation, and the integration of APIs.
Explore possible analyses such as peak-hour transport usage, performance (on-time arrivals), and comparison across routes.
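
The analyses mentioned above can be prototyped with a few lines of pandas. In this sketch the column names (`route`, `scheduled`, `actual`), the sample rows, and the five-minute on-time tolerance are all assumptions for illustration; the appropriate threshold depends on the operator's own definition of punctuality.

```python
import pandas as pd

# Made-up arrival records for two routes.
arrivals = pd.DataFrame(
    {
        "route": ["A", "A", "B", "B", "B"],
        "scheduled": pd.to_datetime(
            [
                "2024-01-01 08:00", "2024-01-01 08:30", "2024-01-01 08:15",
                "2024-01-01 17:00", "2024-01-01 17:30",
            ]
        ),
        "actual": pd.to_datetime(
            [
                "2024-01-01 08:02", "2024-01-01 08:40", "2024-01-01 08:15",
                "2024-01-01 17:03", "2024-01-01 17:31",
            ]
        ),
    }
)

# Assumed threshold: an arrival counts as on time if it is at most 5 minutes late.
ON_TIME_TOLERANCE = pd.Timedelta(minutes=5)

delay = arrivals["actual"] - arrivals["scheduled"]
arrivals["on_time"] = delay <= ON_TIME_TOLERANCE

# Share of on-time arrivals per route, for comparison across routes.
on_time_rate = arrivals.groupby("route")["on_time"].mean()

# Hour of day with the most scheduled arrivals, as a simple peak-hour proxy.
peak_hour = arrivals["scheduled"].dt.hour.value_counts().idxmax()

print(on_time_rate)
print("Peak hour:", peak_hour)
```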

## Data Fetching:

In [1]:
# Reload edited local modules (such as plugins.utils) automatically
%load_ext autoreload
%autoreload 2

# Import all necessary packages
import os

import numpy as np
import pandas as pd
import snowflake.connector
from google.cloud import bigquery

import plugins.utils as utils
from plugins.config import snow_creds

In [None]:
# Example usage: download the Divvy bicycle-sharing dataset from Kaggle.
# Don't forget to set up your Kaggle credentials in .../Users/youruser/.kaggle/kaggle.json
dataset_name = "pablodiegoo/analysis-of-chicago-divvy-bicycle-sharing-updated"
download_folder = "./plugins/assets/data/divvy_tripdata"
utils.download_kaggle_dataset(dataset_name, download_folder)

In [None]:
# Load data into dataframe
df = utils.import_csvs_and_merge("./plugins/assets/data/divvy_tripdata")
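
The implementation of `utils.import_csvs_and_merge` is not shown here. As a guess at what a helper of this kind might do, the sketch below reads every CSV in a folder and concatenates them. The function name `merge_csvs_in_folder` is hypothetical and is not the project's actual code.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_csvs_in_folder(folder: str) -> pd.DataFrame:
    """Read every *.csv file in a folder and stack them into one DataFrame."""
    csv_paths = sorted(Path(folder).glob("*.csv"))
    if not csv_paths:
        raise FileNotFoundError(f"No CSV files found in {folder}")
    frames = [pd.read_csv(path) for path in csv_paths]
    # ignore_index gives the merged frame a clean 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Demonstrate with two small made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "part1.csv").write_text("ride_id,duration_min\na,15\nb,20\n")
    Path(tmp, "part2.csv").write_text("ride_id,duration_min\nc,5\n")
    merged = merge_csvs_in_folder(tmp)

print(merged)
```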

In [None]:
# Load the EV population CSVs into a separate dataframe
ev_pop_df = utils.import_csvs_and_merge("./plugins/assets/data/ev_population/")

In [None]:
ev_pop_df.info()

In [None]:
df.info()

In [None]:
# Loading data from Google Cloud

In [None]:
# To run this client, first set up the Google Cloud SDK and authentication on your local machine.
# Ref: https://cloud.google.com/sdk/docs/install, https://cloud.google.com/bigquery/docs/authentication/getting-started
# Test the BigQuery connection with a small query against a public dataset
client = bigquery.Client()
query = """
    SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` LIMIT 5
"""
query_job = client.query(query)
# Note: this reuses the name `df`, replacing the Divvy dataframe loaded earlier
df = query_job.to_dataframe()

In [None]:
df.head()

In [None]:
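# Query Snowflake and load the result into a pandas DataFrame
conn = snowflake.connector.connect(**snow_creds)
try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM public_transport.transport_base.example")
    result = cur.fetch_pandas_all()
    print(result)
    cur.close()
finally:
    # Always release the connection, even if the query fails
    conn.close()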