# Exploring Transit Station Entrances in Klang Valley with Overpass API and Python

In this notebook, we explore the locations of subway and train station entrances in the Klang Valley using data from OpenStreetMap (OSM).

We use the Overpass API, a read-only API that provides access to OSM data. Specifically, we use Overpass queries to find nodes tagged `railway=subway_entrance` or `railway=train_station_entrance` within the administrative boundary of Malaysia, which contains the Klang Valley.

Once we have collected the entrances, we identify which public transport stations they belong to. To do this, we query OSM relations tagged `type=public_transport` and check which entrances appear among their members.

The data will be processed and stored in a pandas DataFrame, enabling us to manipulate and analyze the data easily.

The ultimate goal is to create a table of subway entrances, complete with their coordinates and associated station names, providing a clear understanding of the distribution of subway entrances across Malaysia.


## Initialize the Overpass API and variables

We initialize two DataFrames, which will later become two tables:

- *entrances*: holds the details of each individual entrance, including its coordinates and destination.
- *station_entrances*: tracks the relationship between each entrance and its station.

Two tables are needed to handle the many-to-many relationship between entrances and train stations: one station can have many entrances, and one entrance can lead to many stations.

For example, Entrance A of MRT Kajang is also an entrance to KTM Kajang.
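
The two-table layout can be illustrated with a small sketch in plain Python. The entrance IDs and the second entrance are placeholders invented for illustration, not real OSM data, and `stations_for_entrance` and `entrances_for_station` are hypothetical helpers that are not part of this notebook.

```python
# Placeholder data: IDs are invented for illustration, not real OSM IDs.
entrances_table = [
    {"Entrance ID": 1, "Entrance Name": "A"},
    {"Entrance ID": 2, "Entrance Name": "B"},
]

# Link table: one row per (entrance, station) pair.
station_entrances_table = [
    {"Relationship ID": 0, "Entrance ID": 1, "Station Name": "MRT Kajang"},
    {"Relationship ID": 1, "Entrance ID": 1, "Station Name": "KTM Kajang"},
    {"Relationship ID": 2, "Entrance ID": 2, "Station Name": "MRT Kajang"},
]


def stations_for_entrance(entrance_id, links):
    """Return the names of all stations linked to one entrance."""
    return [row["Station Name"] for row in links if row["Entrance ID"] == entrance_id]


def entrances_for_station(station_name, links):
    """Return the IDs of all entrances linked to one station."""
    return [row["Entrance ID"] for row in links if row["Station Name"] == station_name]


print(stations_for_entrance(1, station_entrances_table))
print(entrances_for_station("MRT Kajang", station_entrances_table))
```

Because each link is its own row, neither table needs a list-valued column to express that an entrance serves several stations or that a station has several entrances.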

In [1]:
import geopandas as gpd
import overpy
import pandas as pd
import os

# initialize Overpass API
api = overpy.Overpass()

# DataFrames for entrances and station-entrance relationships
entrances = pd.DataFrame(columns=['Entrance ID', 'Longitude','Latitude','Entrance Destination'])
station_entrances = pd.DataFrame(columns=['Relationship ID', 'Entrance ID', 'Station ID','Station Name'])




## Query Subway Entrances in Malaysia

We use the `api.query()` method to send a query to the Overpass API. The query searches within the administrative boundary of Malaysia for nodes whose `railway` tag matches `subway_entrance` or `train_station_entrance`.
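
As a rough sketch of how such a query is put together, the hypothetical helper below (`build_entrance_query` is not part of overpy or this notebook) assembles the same Overpass QL text from an area name and a list of accepted `railway` values. It only builds a string and makes no network request.

```python
def build_entrance_query(area_name, railway_values, timeout=25):
    """Build an Overpass QL query for entrance nodes inside a named area."""
    # Join the accepted tag values into a regex alternation, e.g. "a|b".
    value_pattern = "|".join(railway_values)
    return (
        f"[out:json][timeout:{timeout}];\n"
        f'area["name"="{area_name}"]["boundary"="administrative"]->.searchArea;\n'
        f'node["railway"~"{value_pattern}"](area.searchArea);\n'
        "out body;\n"
    )


query = build_entrance_query("Malaysia", ["subway_entrance", "train_station_entrance"])
print(query)
```

The resulting string could be passed to `api.query()` in place of the literal query text.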

In [2]:

# Query all subway entrances in Malaysia
result_entrances = api.query("""
[out:json][timeout:25];
area["name"="Malaysia"]["boundary"="administrative"]->.searchArea;
node["railway"~"subway_entrance|train_station_entrance"](area.searchArea);
out body;
""")


## Store Subway Entrance IDs and Coordinates

We loop through the nodes returned by our query, which represent entrances, and add each node's ID, name (`ref` tag), destination, and coordinates to the `entrances` DataFrame.
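
The per-node conversion can be sketched without a network call. `FakeNode` is a stand-in for an overpy node and `node_to_record` is a hypothetical helper, and the sample values are placeholders. The sketch assumes that coordinates arrive as `decimal.Decimal` values, which appears to be how overpy parses them, and converts them to `float` so the columns are numeric.

```python
from collections import namedtuple
from decimal import Decimal

# Stand-in for an overpy node; field names mirror the attributes used in this notebook.
FakeNode = namedtuple("FakeNode", ["id", "lon", "lat", "tags"])


def node_to_record(node):
    """Convert one entrance node into a flat dictionary (one future DataFrame row)."""
    return {
        "Entrance ID": node.id,
        "Entrance Name": node.tags.get("ref"),            # None when the tag is missing
        "Entrance Destination": node.tags.get("destination"),
        "Longitude": float(node.lon),                     # assumed Decimal -> float
        "Latitude": float(node.lat),
    }


# Placeholder values, not a real OSM node.
sample = FakeNode(id=1, lon=Decimal("101.5"), lat=Decimal("3.25"), tags={"ref": "A"})
record = node_to_record(sample)
print(record)
```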



In [3]:

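# Store each entrance node's ID, name, destination, and coordinates in entrances.
# Building all records first avoids calling pd.concat once per node.
entrance_records = [
    {'Entrance ID': node.id,
     'Entrance Name': node.tags.get('ref'),
     'Entrance Destination': node.tags.get('destination'),
     'Longitude': node.lon,
     'Latitude': node.lat}
    for node in result_entrances.nodes
]
entrances = pd.DataFrame(
    entrance_records,
    columns=['Entrance ID', 'Longitude', 'Latitude', 'Entrance Destination', 'Entrance Name']
)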



## Query Public Transport Relations in Malaysia

In [4]:
# Query all public_transport relations in Malaysia, plus their member elements
result_relations = api.query("""
[out:json][timeout:25];
area["name"="Malaysia"]["boundary"="administrative"]->.searchArea;
relation["type"="public_transport"](area.searchArea);
out body;
> ;
out skel qt;
""")


## Link Subway Entrances to Public Transport Stations

In [5]:
# For each relation, check if it contains any of our entrances
relationship_id = 0
for relation in result_relations.relations:
    station_id = relation.tags.get('ref', 'Unnamed')
    station_name = relation.tags.get('name', 'Unnamed')
    for member in relation.members:
        if member.ref in entrances['Entrance ID'].values:
            station_entrances = pd.concat(
                [station_entrances,
                 pd.DataFrame([{'Relationship ID': relationship_id, 
                                'Entrance ID': member.ref,
                                'Station Name': station_name, 
                                'Station ID': station_id}])], 
                ignore_index=True
            )
            relationship_id += 1

print(entrances)
print(station_entrances)

     Entrance ID    Longitude   Latitude    Entrance Destination Entrance Name
0     1544031348  101.7113737  3.1459286                    None             B
1     1631412559  101.6049519  3.1132222                    None          None
2     2278515570  101.6440770  3.0506498                    None          None
3     2686635178  101.6991821  3.1385646                    None             C
4     3308608988  101.7127175  3.1587619                    None          None
..           ...          ...        ...                     ...           ...
219  10949038884  101.6814387  3.2377790                    None          None
220  11033916949  101.5939259  3.1494167  Tropicana Gardens Mall          None
221  11039579993  101.6933961  3.1667580       Sunway Putra Mall          None
222  11039579994  101.6937307  3.1670090                Chow Kit          None
223  11039725697  101.6962758  3.1493952                    None          None

[224 rows x 5 columns]
    Relationship ID  Entranc
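
The linking step can also be sketched on plain dictionaries shaped like raw Overpass JSON relations rather than overpy objects. `link_entrances` is a hypothetical helper and the sample relations are placeholders. Two choices here are assumptions that differ from the loop above: entrance IDs are held in a `set` for fast lookup, and only members of type `node` are considered, since OSM nodes, ways, and relations use separate ID spaces and a way could share a numeric ID with an entrance node.

```python
def link_entrances(relations, entrance_ids):
    """Return one link row per entrance node found among the relation members."""
    known = set(entrance_ids)  # set membership is a constant-time lookup
    links = []
    for relation in relations:
        tags = relation["tags"]
        for member in relation["members"]:
            # Only node members can be entrances; skip ways and relations.
            if member["type"] == "node" and member["ref"] in known:
                links.append({
                    "Relationship ID": len(links),
                    "Entrance ID": member["ref"],
                    "Station ID": tags.get("ref", "Unnamed"),
                    "Station Name": tags.get("name", "Unnamed"),
                })
    return links


# Placeholder relations, not real OSM data.
sample_relations = [
    {"tags": {"name": "Station X", "ref": "X1"},
     "members": [{"type": "node", "ref": 1},
                 {"type": "way", "ref": 1},      # same number, different element type
                 {"type": "node", "ref": 99}]},  # not an entrance
    {"tags": {}, "members": [{"type": "node", "ref": 2}]},
]
links = link_entrances(sample_relations, [1, 2])
for row in links:
    print(row)
```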

## Save Found Dataset
Finally, we save both DataFrames, the entrances and the station-entrance relationships, to CSV files for use in creating our database.

In [6]:
# Define the directory where you want to save the cleaned data
data_directory = 'data'
kl_entrances_file = 'klang_valley__entrances.csv'
kl_entrances__station_relations = 'klang_valley_stations_entrances_relation.csv'


# Create the output directory if it does not exist, then save both dataframes
os.makedirs(data_directory, exist_ok=True)
entrances.to_csv(os.path.join(data_directory, kl_entrances_file), index=False)
station_entrances.to_csv(os.path.join(data_directory, kl_entrances__station_relations), index=False)



## Note

We're aware that there are gaps in the data: not all entrances are mapped in OSM. Additional entrances must be added manually for a more complete dataset.
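
One possible way to fold manually collected entrances into the dataset is sketched below on plain dictionaries. `merge_manual_entrances` is a hypothetical helper, all rows are placeholders, and giving manual rows negative IDs to keep them apart from OSM IDs is an assumption of this sketch, not something the notebook does.

```python
def merge_manual_entrances(osm_rows, manual_rows):
    """Append manual rows, keeping the OSM row when an Entrance ID appears in both."""
    seen = {row["Entrance ID"] for row in osm_rows}
    merged = list(osm_rows)  # copy so the input list is left unchanged
    for row in manual_rows:
        if row["Entrance ID"] not in seen:
            merged.append(row)
            seen.add(row["Entrance ID"])
    return merged


# Placeholder rows, not real entrances.
osm_rows = [{"Entrance ID": 1, "Entrance Name": "A"}]
manual_rows = [
    {"Entrance ID": -1, "Entrance Name": "C"},         # new manual entry
    {"Entrance ID": 1, "Entrance Name": "duplicate"},  # already present, skipped
]
merged = merge_manual_entrances(osm_rows, manual_rows)
print(merged)
```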