# Public Transport Data Dashboard

![mrt jakarta](plugins/assets/mrt_jakarta.jpg)

### Objective:
This project fetches real-time or historical data about public transport systems (e.g., buses, trains, or subways) from open APIs or datasets, then processes, cleans, and visualizes it in an interactive dashboard. The dashboard surfaces patterns such as transport availability, punctuality, routes, and passenger demand over time.

## Key Features of the Project:

### Data Collection:
Identify reliable open sources for public transport data (e.g., city transport APIs, GTFS feeds, or public transport websites).
Fetch the data with Python libraries such as requests, pandas, or openpyxl.
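
As a rough illustration of the collection step, the sketch below parses a GTFS `stops.txt` table with pandas. The three sample rows are invented, and `load_stops` is a hypothetical helper name. In practice the text would come from a feed archive downloaded with requests from whichever source is chosen.

```python
import io

import pandas as pd

# Invented sample rows in the GTFS stops.txt layout
# (stop_id, stop_name, stop_lat, stop_lon are standard GTFS field names).
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Central Station,-6.2000,106.8200
S2,North Terminal,-6.1500,106.8300
S3,South Park,-6.2600,106.8100
"""


def load_stops(text: str) -> pd.DataFrame:
    """Parse GTFS stops text into a DataFrame, keeping stop_id as text."""
    return pd.read_csv(io.StringIO(text), dtype={"stop_id": str})


stops = load_stops(SAMPLE_STOPS_TXT)
print(stops)
```

Reading `stop_id` as text avoids losing leading zeros when a feed uses numeric-looking identifiers.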


### Data Processing:
Clean and preprocess the data to ensure it's in a usable format.
Handle missing data, duplicates, and irrelevant columns.
Perform any necessary transformations (e.g., timestamp conversions, geospatial coordinates for locations).
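
The cleaning steps listed above can be sketched with pandas as follows. The column names (`trip_id`, `started_at`, `ended_at`, `notes`) and the sample rows are assumptions made for illustration, not the schema of any dataset used in this project.

```python
import pandas as pd

# Invented trip records: one exact duplicate row and one missing timestamp.
raw = pd.DataFrame(
    {
        "trip_id": ["t1", "t2", "t2", "t3"],
        "started_at": ["2024-01-01 08:00:00", "2024-01-01 09:30:00",
                       "2024-01-01 09:30:00", None],
        "ended_at": ["2024-01-01 08:20:00", "2024-01-01 09:45:00",
                     "2024-01-01 09:45:00", "2024-01-01 10:10:00"],
        "notes": ["", "", "", ""],  # stands in for an irrelevant column
    }
)


def clean_trips(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the irrelevant column, duplicates, and rows without timestamps."""
    out = df.drop(columns=["notes"]).drop_duplicates()
    # Convert text timestamps; unparseable or missing values become NaT.
    for col in ("started_at", "ended_at"):
        out[col] = pd.to_datetime(out[col], errors="coerce")
    out = out.dropna(subset=["started_at", "ended_at"]).copy()
    # Derive the trip duration in minutes from the two timestamps.
    out["duration_min"] = (out["ended_at"] - out["started_at"]).dt.total_seconds() / 60
    return out.reset_index(drop=True)


trips = clean_trips(raw)
print(trips)
```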

### Data Storage:
Store the data in a local database (e.g., SQLite) or a cloud-based data warehouse (e.g., Google BigQuery, AWS Redshift) for later use.
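
For the local option, a minimal sketch using the standard-library `sqlite3` module together with pandas is shown below. The table name `trips` and its columns are invented, and an in-memory database stands in for a file on disk.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Invented sample data.
trips = pd.DataFrame({"route": ["A", "A", "B"], "passengers": [30, 45, 12]})

# ":memory:" keeps the example self-contained; a file path would persist the data.
with closing(sqlite3.connect(":memory:")) as conn:
    # Write the DataFrame to a table, replacing it if it already exists.
    trips.to_sql("trips", conn, if_exists="replace", index=False)
    # Read an aggregate back to confirm the round trip.
    totals = pd.read_sql_query(
        "SELECT route, SUM(passengers) AS total FROM trips GROUP BY route ORDER BY route",
        conn,
    )

print(totals)
```

`closing` is used because a `sqlite3` connection used directly as a context manager only manages transactions and does not close the connection.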



### Dashboard Development:
Use a Python visualization library or framework (e.g., Plotly, Dash, Matplotlib) to build an interactive dashboard.
The dashboard will allow users to interact with data, filter by transport type, and visualize transport routes, schedules, or other metrics.
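
The filtering behaviour described above could be backed by a small function like the one sketched here. The function name, column names, and sample rows are hypothetical. In a Dash app, a function like this would typically be called from a callback driven by a dropdown; that wiring is left out so the example stays self-contained.

```python
import pandas as pd

# Invented schedule data.
schedule = pd.DataFrame(
    {
        "transport_type": ["bus", "train", "bus", "subway"],
        "route": ["B1", "T1", "B2", "S1"],
        "departures_per_hour": [6, 4, 3, 12],
    }
)


def filter_by_type(df: pd.DataFrame, transport_types: list) -> pd.DataFrame:
    """Return rows matching the selected types; an empty selection returns everything."""
    if not transport_types:
        return df
    return df[df["transport_type"].isin(transport_types)].reset_index(drop=True)


bus_only = filter_by_type(schedule, ["bus"])
print(bus_only)
```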


### Experimentation & Analysis:
Experiment with data fetching, transformation, and the integration of APIs.
Explore possible analyses such as peak-hour transport usage, performance (on-time arrivals), and comparison across routes.
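
A minimal sketch of two of these analyses, on-time rate per route and the busiest hour, is given below. The data is invented, and the five-minute threshold for "on time" is an assumption rather than a definition taken from any transport authority.

```python
import pandas as pd

# Invented arrival records.
arrivals = pd.DataFrame(
    {
        "route": ["A", "A", "A", "B", "B"],
        "scheduled": pd.to_datetime(
            ["2024-01-01 08:00", "2024-01-01 08:30", "2024-01-01 17:00",
             "2024-01-01 08:15", "2024-01-01 17:30"]
        ),
        "actual": pd.to_datetime(
            ["2024-01-01 08:02", "2024-01-01 08:41", "2024-01-01 17:01",
             "2024-01-01 08:15", "2024-01-01 17:40"]
        ),
        "passengers": [40, 55, 30, 20, 25],
    }
)

ON_TIME_THRESHOLD = pd.Timedelta(minutes=5)  # assumed definition of "on time"

# An arrival counts as on time when its delay does not exceed the threshold.
arrivals["on_time"] = (arrivals["actual"] - arrivals["scheduled"]) <= ON_TIME_THRESHOLD
on_time_rate = arrivals.groupby("route")["on_time"].mean()

# Total passengers per scheduled hour of day, and the hour with the most riders.
passengers_by_hour = arrivals.groupby(arrivals["scheduled"].dt.hour)["passengers"].sum()
peak_hour = int(passengers_by_hour.idxmax())

print(on_time_rate)
print(peak_hour)
```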

## Data Fetching:

In [None]:
# Import all necessary packages
%load_ext autoreload
%autoreload 2

import os

import numpy as np
import pandas as pd
import snowflake.connector
from google.cloud import bigquery

import plugins.utils as utils
from plugins.config import snow_creds, aws_creds

In [None]:
# Example usage:
# Don't forget to set up your Kaggle credentials in .../Users/youruser/.kaggle/kaggle.json


dataset_name = "pablodiegoo/analysis-of-chicago-divvy-bicycle-sharing-updated" 
download_folder = "./plugins/assets/data/divvy_tripdata"
utils.download_kaggle_dataset(dataset_name, download_folder)

In [None]:
# Load data into dataframe
df = utils.import_csvs_and_merge("./plugins/assets/data/divvy_tripdata")
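
The body of `utils.import_csvs_and_merge` is not shown in this notebook. The sketch below is one plausible way such a helper could work, assuming it reads every CSV file in a folder and stacks the rows. `merge_csvs_in_folder` is a hypothetical name, and the demo files are created in a temporary directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_csvs_in_folder(folder: str) -> pd.DataFrame:
    """Read every *.csv in `folder` (sorted by name) and stack them into one DataFrame."""
    paths = sorted(Path(folder).glob("*.csv"))
    if not paths:
        raise FileNotFoundError(f"No CSV files found in {folder!r}")
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Demonstration on two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"ride_id": [1, 2]}).to_csv(Path(tmp) / "part1.csv", index=False)
    pd.DataFrame({"ride_id": [3]}).to_csv(Path(tmp) / "part2.csv", index=False)
    merged = merge_csvs_in_folder(tmp)

print(merged)
```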

In [None]:
ev_pop_df = utils.import_csvs_and_merge("./plugins/assets/data/ev_population/")
ev_pop_df.rename(str.lower, axis='columns', inplace=True)

In [None]:
ev_pop_df = ev_pop_df[["county", "city", "state"]]
ev_pop_df.info()

In [None]:
df.info()

In [None]:
# Loading data from Google Cloud

In [None]:
# To run this client, first set up your Google Cloud account and credentials on your local machine.
# Ref: https://cloud.google.com/sdk/docs/install, https://cloud.google.com/bigquery/docs/authentication/getting-started
# Test the BigQuery connection
client = bigquery.Client()
query = """
    SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` LIMIT 5
"""
query_job = client.query(query)
df = query_job.to_dataframe()

In [None]:
df.head()

In [None]:
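# Connect to Snowflake and pull the example table into a DataFrame
conn = snowflake.connector.connect(**snow_creds)
try:
    cur = conn.cursor()
    try:
        cur.execute("SELECT * FROM public_transport.transport_base.example")
        result = cur.fetch_pandas_all()
        print(result)
    finally:
        cur.close()
finally:
    conn.close()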

In [None]:
# Create the handler instance
s3_handler = utils.S3ParquetHandler(aws_creds)


In [None]:
# Define the destination S3 bucket and object key
destination_bucket = "project-etl-iqbal"
destination_key = "etl/iqbal_test.parquet"
# Write the DataFrame to S3 as a Parquet file
s3_handler.write_parquet_to_s3(ev_pop_df[["county", "city", "state"]], destination_bucket, destination_key)
print(f"Data successfully written to s3://{destination_bucket}/{destination_key}")

In [None]:
# Define the source S3 bucket and object key
source_bucket = "project-etl-iqbal"
source_key = "etl/iqbal_test.parquet"
# Read the Parquet file from S3 into a DataFrame
df = s3_handler.read_parquet_from_s3(source_bucket, source_key)
print("Data read from S3:")
print(df.head())

## Data Fetching Process:

In [100]:
# Import all necessary packages
import pandas as pd
import numpy as np
import plugins.utils as utils
import snowflake.connector
from plugins.config import snow_creds, aws_creds
import json
import ast

In [3]:
dataset_name = "rounakbanik/the-movies-dataset" 
download_folder = "./plugins/assets/data/the-movies-dataset"
utils.download_kaggle_dataset(dataset_name, download_folder)

Dataset URL: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset
Dataset 'rounakbanik/the-movies-dataset' downloaded successfully to './plugins/assets/data/the-movies-dataset'.


In [108]:
credits_df = pd.read_csv("./plugins/assets/data/the-movies-dataset/credits.csv")
keywords_df = pd.read_csv("./plugins/assets/data/the-movies-dataset/keywords.csv")
links_df = pd.read_csv("./plugins/assets/data/the-movies-dataset/links.csv")
movies_metadata_df = pd.read_csv("./plugins/assets/data/the-movies-dataset/movies_metadata.csv")
ratings_df = pd.read_csv("./plugins/assets/data/the-movies-dataset/ratings.csv")



In [8]:
credits_df.head()

Unnamed: 0,cast,crew,id
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602
3,"[{'cast_id': 1, 'character': ""Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862


In [109]:
credits_df['cast'] = credits_df['cast'].apply(ast.literal_eval)
credits_df['crew'] = credits_df['crew'].apply(ast.literal_eval)
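
The `cast` and `crew` columns hold Python-literal strings with single-quoted keys, which is why `ast.literal_eval` is used here rather than `json.loads`. The sketch below shows the difference on a shortened value based on the first row, and adds a hypothetical `parse_literal` wrapper for rows that may be missing or malformed. Whether such rows exist in `credits.csv` is not established here.

```python
import ast
import json

# Shortened cast value in the same single-quoted style as credits.csv.
raw_cast = "[{'cast_id': 14, 'character': 'Woody (voice)', 'name': 'Tom Hanks'}]"

# json.loads rejects single-quoted keys, so this string is not valid JSON.
try:
    json.loads(raw_cast)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# ast.literal_eval safely evaluates Python literals (lists, dicts, strings, numbers).
cast = ast.literal_eval(raw_cast)


def parse_literal(value, default=None):
    """Parse a Python-literal string, returning `default` for missing or malformed values."""
    if not isinstance(value, str):
        return default
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return default


print(parsed_as_json, cast[0]["name"])
```

The wrapper could be passed to `apply` in place of `ast.literal_eval` if a single bad row should not abort the whole conversion.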

In [191]:
credits_df.iloc[3]

cast    [{'cast_id': 1, 'character': 'Savannah 'Vannah...
crew    [{'credit_id': '52fe44779251416c91011acb', 'de...
id                                                  31357
Name: 3, dtype: object

In [110]:
sample = credits_df[0:10]

In [126]:
sample.head()

Unnamed: 0,cast,crew,id
0,"[{'cast_id': 14, 'character': 'Woody (voice)',...","[{'credit_id': '52fe4284c3a36847f8024f49', 'de...",862
1,"[{'cast_id': 1, 'character': 'Alan Parrish', '...","[{'credit_id': '52fe44bfc3a36847f80a7cd1', 'de...",8844
2,"[{'cast_id': 2, 'character': 'Max Goldman', 'c...","[{'credit_id': '52fe466a9251416c75077a89', 'de...",15602
3,"[{'cast_id': 1, 'character': 'Savannah 'Vannah...","[{'credit_id': '52fe44779251416c91011acb', 'de...",31357
4,"[{'cast_id': 1, 'character': 'George Banks', '...","[{'credit_id': '52fe44959251416c75039ed7', 'de...",11862


In [127]:
cast_flat = pd.json_normalize(sample['cast'][0])

In [128]:
cast_flat

Unnamed: 0,cast_id,character,credit_id,gender,id,name,order,profile_path
0,14,Woody (voice),52fe4284c3a36847f8024f95,2,31,Tom Hanks,0,/pQFoyx7rp09CJTAb932F2g8Nlho.jpg
1,15,Buzz Lightyear (voice),52fe4284c3a36847f8024f99,2,12898,Tim Allen,1,/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg
2,16,Mr. Potato Head (voice),52fe4284c3a36847f8024f9d,2,7167,Don Rickles,2,/h5BcaDMPRVLHLDzbQavec4xfSdt.jpg
3,17,Slinky Dog (voice),52fe4284c3a36847f8024fa1,2,12899,Jim Varney,3,/eIo2jVVXYgjDtaHoF19Ll9vtW7h.jpg
4,18,Rex (voice),52fe4284c3a36847f8024fa5,2,12900,Wallace Shawn,4,/oGE6JqPP2xH4tNORKNqxbNPYi7u.jpg
5,19,Hamm (voice),52fe4284c3a36847f8024fa9,2,7907,John Ratzenberger,5,/yGechiKWL6TJDfVE2KPSJYqdMsY.jpg
6,20,Bo Peep (voice),52fe4284c3a36847f8024fad,1,8873,Annie Potts,6,/eryXT84RL41jHSJcMy4kS3u9y6w.jpg
7,26,Andy (voice),52fe4284c3a36847f8024fc1,0,1116442,John Morris,7,/vYGyvK4LzeaUCoNSHtsuqJUY15M.jpg
8,22,Sid (voice),52fe4284c3a36847f8024fb1,2,12901,Erik von Detten,8,/twnF1ZaJ1FUNUuo6xLXwcxjayBE.jpg
9,23,Mrs. Davis (voice),52fe4284c3a36847f8024fb5,1,12133,Laurie Metcalf,9,/unMMIT60eoBM2sN2nyR7EZ2BvvD.jpg


In [137]:
cast_df = pd.json_normalize(sample['cast'][0])

In [138]:
cast_df

Unnamed: 0,cast_id,character,credit_id,gender,id,name,order,profile_path
0,14,Woody (voice),52fe4284c3a36847f8024f95,2,31,Tom Hanks,0,/pQFoyx7rp09CJTAb932F2g8Nlho.jpg
1,15,Buzz Lightyear (voice),52fe4284c3a36847f8024f99,2,12898,Tim Allen,1,/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg
2,16,Mr. Potato Head (voice),52fe4284c3a36847f8024f9d,2,7167,Don Rickles,2,/h5BcaDMPRVLHLDzbQavec4xfSdt.jpg
3,17,Slinky Dog (voice),52fe4284c3a36847f8024fa1,2,12899,Jim Varney,3,/eIo2jVVXYgjDtaHoF19Ll9vtW7h.jpg
4,18,Rex (voice),52fe4284c3a36847f8024fa5,2,12900,Wallace Shawn,4,/oGE6JqPP2xH4tNORKNqxbNPYi7u.jpg
5,19,Hamm (voice),52fe4284c3a36847f8024fa9,2,7907,John Ratzenberger,5,/yGechiKWL6TJDfVE2KPSJYqdMsY.jpg
6,20,Bo Peep (voice),52fe4284c3a36847f8024fad,1,8873,Annie Potts,6,/eryXT84RL41jHSJcMy4kS3u9y6w.jpg
7,26,Andy (voice),52fe4284c3a36847f8024fc1,0,1116442,John Morris,7,/vYGyvK4LzeaUCoNSHtsuqJUY15M.jpg
8,22,Sid (voice),52fe4284c3a36847f8024fb1,2,12901,Erik von Detten,8,/twnF1ZaJ1FUNUuo6xLXwcxjayBE.jpg
9,23,Mrs. Davis (voice),52fe4284c3a36847f8024fb5,1,12133,Laurie Metcalf,9,/unMMIT60eoBM2sN2nyR7EZ2BvvD.jpg


In [195]:
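def normalize_cast(row):
    """Flatten one movie's cast list and tag each cast member with the movie's id."""
    cast_df = pd.json_normalize(row['cast'])
    cast_df['credits_id'] = row['id']  # id of the parent row in credits_df
    return cast_df

# One DataFrame per movie, stacked into a single long table
normalized_casts = pd.concat(sample.apply(normalize_cast, axis=1).tolist(), axis=0, ignore_index=True)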
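
As an alternative sketch, `pd.json_normalize` can do the flattening and the id tagging in one call through its `record_path`, `meta`, and `meta_prefix` parameters. The records below are a shortened, hand-built stand-in for `sample.to_dict("records")`, reusing a few values from the outputs in this notebook. The prefix is needed because each cast entry already has its own `id` key.

```python
import pandas as pd

# Shortened stand-in for sample.to_dict("records").
records = [
    {
        "id": 862,
        "cast": [
            {"cast_id": 14, "id": 31, "name": "Tom Hanks"},
            {"cast_id": 15, "id": 12898, "name": "Tim Allen"},
        ],
    },
    {
        "id": 710,
        "cast": [{"cast_id": 39, "id": 6613, "name": "Minnie Driver"}],
    },
]

# record_path selects the nested list to expand into rows; meta copies the
# parent id onto each row, renamed to credits_id by meta_prefix.
flat = pd.json_normalize(records, record_path="cast", meta=["id"], meta_prefix="credits_")

print(flat)
```

This avoids building one intermediate DataFrame per movie, which may matter when the same step is run on the full `credits_df` rather than a ten-row sample.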

In [196]:
normalized_casts

Unnamed: 0,cast_id,character,credit_id,gender,id,name,order,profile_path,credits_id
0,14,Woody (voice),52fe4284c3a36847f8024f95,2,31,Tom Hanks,0,/pQFoyx7rp09CJTAb932F2g8Nlho.jpg,862
1,15,Buzz Lightyear (voice),52fe4284c3a36847f8024f99,2,12898,Tim Allen,1,/uX2xVf6pMmPepxnvFWyBtjexzgY.jpg,862
2,16,Mr. Potato Head (voice),52fe4284c3a36847f8024f9d,2,7167,Don Rickles,2,/h5BcaDMPRVLHLDzbQavec4xfSdt.jpg,862
3,17,Slinky Dog (voice),52fe4284c3a36847f8024fa1,2,12899,Jim Varney,3,/eIo2jVVXYgjDtaHoF19Ll9vtW7h.jpg,862
4,18,Rex (voice),52fe4284c3a36847f8024fa5,2,12900,Wallace Shawn,4,/oGE6JqPP2xH4tNORKNqxbNPYi7u.jpg,862
...,...,...,...,...,...,...,...,...,...
218,37,Admiral Chuck Farrell,5401b8650e0a2658ee004a76,0,55911,Billy J. Mitchell,15,,710
219,38,Computer Store Manager,5401b87f0e0a2658db004acf,2,27425,Constantine Gregory,16,/2zJA2LCS6utcD9hvXpDkRtY55ZO.jpg,710
220,39,Irina,5401b88d0e0a2658e2004c04,1,6613,Minnie Driver,17,/iWqTeFmdoY8V8RLcH89K75AKeQN.jpg,710
221,40,Anna,5401b8a70e0a2658e2004c07,1,29054,Michelle Arthur,18,/zKQFuK3W4LMzfIz8W86RYf6jGp7.jpg,710
