# Visualizing data in the transit_database SQLite database

From the other notebooks, we know that this database contains three tables with various station details.
This notebook demonstrates ways to manipulate those tables, along with a simple visualization of station locations on a Folium map.

Much of the work here was created with ChatGPT Plus.

In [2]:
import pandas as pd
import sqlite3
import geopandas as gpd
from shapely.geometry import Point
import folium




### Query for data

We will join the stations table with the other two tables, station_entrances and entrances. Note that a station with several entrances appears in several rows of the result, so its station coordinates are duplicated.

In [3]:

# Create a connection to the SQLite database
conn = sqlite3.connect('transit_database.db')

# SQL query
query = """
SELECT stations.*, 
       station_entrances."entrance_id", 
       entrances."longitude" as entrance_longitude, 
       entrances."latitude" as entrance_latitude
FROM stations
LEFT JOIN station_entrances
ON stations."station_id" = station_entrances."station_id"
LEFT JOIN entrances
ON station_entrances."entrance_id" = entrances."entrance_id"
WHERE stations.region='Klang Valley'
"""

# Load the data into a pandas DataFrame
klang_valley_df = pd.read_sql_query(query, conn)

# Don't forget to close the connection
conn.close()
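To make the row duplication from the two LEFT JOINs concrete, here is a minimal sketch on an in-memory database. The three tables are hypothetical miniatures that borrow only the key and coordinate column names from the query above; the real schema has more columns, and the sample values are made up.

```python
import sqlite3
from contextlib import closing

# Hypothetical miniature of the three tables (made-up sample values).
with closing(sqlite3.connect(":memory:")) as demo_conn:
    demo_conn.executescript("""
        CREATE TABLE stations (station_id TEXT, name TEXT, latitude REAL, longitude REAL);
        CREATE TABLE station_entrances (station_id TEXT, entrance_id TEXT);
        CREATE TABLE entrances (entrance_id TEXT, latitude REAL, longitude REAL);
        INSERT INTO stations VALUES ('S1', 'Alpha', 3.10, 101.60), ('S2', 'Beta', 3.20, 101.70);
        INSERT INTO station_entrances VALUES ('S1', 'E1'), ('S1', 'E2');
        INSERT INTO entrances VALUES ('E1', 3.1001, 101.6001), ('E2', 3.1002, 101.6002);
    """)
    demo_rows = demo_conn.execute("""
        SELECT stations.station_id, stations.latitude, entrances.entrance_id
        FROM stations
        LEFT JOIN station_entrances
        ON stations.station_id = station_entrances.station_id
        LEFT JOIN entrances
        ON station_entrances.entrance_id = entrances.entrance_id
        ORDER BY stations.station_id, entrances.entrance_id
    """).fetchall()

# S1 has two entrances, so it appears twice with the same station latitude.
# S2 has no entrances, so it appears once with entrance_id set to None.
for row in demo_rows:
    print(row)
```

The sketch also wraps the connection in `contextlib.closing`, so it is closed even if the query raises an error. This is one possible alternative to calling `conn.close()` by hand.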


### Create Geopandas and geometry objects

This step isn't strictly necessary, but I believe it is good practice to use geopandas when working with geographic data.
It converts the plain pandas DataFrame into a GeoDataFrame, using each station's coordinates as the geometry.

In [4]:
# Create a new column in your DataFrame for the geographic data
klang_valley_df['geometry'] = [Point(xy) for xy in zip(klang_valley_df.longitude, klang_valley_df.latitude)]

# Convert the DataFrame to a GeoDataFrame
klang_valley_gdf = gpd.GeoDataFrame(klang_valley_df, geometry='geometry')

# Set the coordinate reference system (CRS) to EPSG:4326 (WGS84)
klang_valley_gdf.crs = "EPSG:4326"
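As an alternative sketch of the same conversion, geopandas can build the point geometries and set the CRS in a single constructor call with `gpd.points_from_xy`. The `demo_df` frame below is a hypothetical stand-in with made-up coordinates. It assumes only the `latitude` and `longitude` column names used above.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical stand-in for the query result (made-up coordinates).
demo_df = pd.DataFrame(
    {
        "name": ["Alpha", "Beta"],
        "latitude": [3.10, 3.20],
        "longitude": [101.60, 101.70],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude).
demo_gdf = gpd.GeoDataFrame(
    demo_df,
    geometry=gpd.points_from_xy(demo_df["longitude"], demo_df["latitude"]),
    crs="EPSG:4326",
)

print(demo_gdf.crs)
```

Both approaches should produce an equivalent GeoDataFrame. This version simply avoids the explicit list comprehension and the separate CRS assignment.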

#### Visualizing stations as markers using the latitude and longitude columns (geopandas not required)

In [5]:
# Create a new DataFrame where each latitude and longitude pair is unique
unique_stations = klang_valley_gdf.drop_duplicates(subset=['latitude', 'longitude'])

# Create a map centered around the average latitude and longitude of the stations
# (named station_map rather than map, to avoid shadowing Python's built-in map function)
station_map = folium.Map(location=[unique_stations['latitude'].mean(), unique_stations['longitude'].mean()], zoom_start=13)

# Add a marker for each station
for _, station in unique_stations.iterrows():
    folium.Marker(location=[station['latitude'], station['longitude']], 
                  popup=f"{station['name']} ({station['station_id']})").add_to(station_map)

# Display the map
station_map


#### Visualizing stations as markers using the geometry column (requires geopandas)

In [6]:
# Create a new GeoDataFrame where each geometry is unique
unique_stations_gdf = klang_valley_gdf.drop_duplicates(subset=['geometry'])

# Create a map centered around the average latitude and longitude of the stations
klang_valley_map = folium.Map(location=[unique_stations_gdf['geometry'].y.mean(), unique_stations_gdf['geometry'].x.mean()], zoom_start=10)

# Add a marker for each station
for _, row in unique_stations_gdf.iterrows():
    folium.Marker(
        location=[row['geometry'].y, row['geometry'].x],  # Extract latitude and longitude from Point object
        popup=f"Station ID: {row['station_id']}<br>Name: {row['name']}<br>Provider: {row['service_provider_name']}"
    ).add_to(klang_valley_map)

# Display the map
klang_valley_map

#### Visualizing all available station entrances

In [23]:
# Keep only the rows that have entrance coordinates
valid_entrances = klang_valley_gdf.dropna(subset=['entrance_latitude', 'entrance_longitude'])

# Create a map centered around the average latitude and longitude of the entrances
# (named entrance_map rather than map, to avoid shadowing Python's built-in map function)
entrance_map = folium.Map(location=[valid_entrances['entrance_latitude'].mean(), valid_entrances['entrance_longitude'].mean()], zoom_start=13)

# Add a red marker for each entrance; the popup shows the parent station's name and ID
for _, entrance in valid_entrances.iterrows():
    folium.Marker(location=[entrance['entrance_latitude'], entrance['entrance_longitude']], 
                  popup=f"{entrance['name']} ({entrance['station_id']})",
                  icon=folium.Icon(color="red", icon="")
                  ).add_to(entrance_map)

# Display the map
entrance_map