# Exploring Subway Entrances in Malaysia with Overpass API and Python

In this notebook, we will explore the locations of subway entrances in the Klang Valley using data from OpenStreetMap (OSM).

We will use the Overpass API, a read-only API for querying OSM data. Specifically, we will write Overpass QL queries that find nodes tagged `railway=subway_entrance` within the administrative boundary of Malaysia.

Once we have collected the subway entrances, we will identify the public transport stations they belong to. To do this, we will query OSM relations tagged `type=public_transport`. These are stop-area relations that group a station's platforms, stop positions and entrances as members.

The data will be processed and stored in a pandas DataFrame, enabling us to manipulate and analyze the data easily.

The ultimate goal is to create a table of subway entrances, complete with their coordinates and associated station names, providing a clear understanding of the distribution of subway entrances across Malaysia.


## Initialize the Overpass API and variables

In [15]:
import os

import overpy
import pandas as pd

# Initialize the Overpass API client
api = overpy.Overpass()

# List to hold one dict per subway entrance
entrances = []



## Query Subway Entrances in Malaysia

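We use `api.query()` to send an Overpass QL query to the Overpass API. The query looks up Malaysia's administrative boundary as an area, stores it in the named set `.searchArea`, and selects every node inside it tagged `railway=subway_entrance`. `out body;` returns each node with its coordinates and tags.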

In [16]:

result_entrances = api.query("""
[out:json][timeout:25];
area["name"="Malaysia"]["boundary"="administrative"]->.searchArea;
node["railway"="subway_entrance"](area.searchArea);
out body;
""")


## Store Subway Entrance IDs and Coordinates

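We loop over the nodes returned by the query (one per subway entrance) and record each node's ID, its `ref` tag (the entrance label, such as `B`), its `destination` tag, and its coordinates. The `Station Name` column is left empty for now and filled in later.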



In [17]:

# Store each subway entrance's ID, tags and coordinates in the list
for node in result_entrances.nodes:
    entrances.append({
        'Entrance ID': node.id,
        'Entrance Name': node.tags.get('ref'),
        'Entrance Destination': node.tags.get('destination'),
        # overpy returns coordinates as Decimal; convert to float for numeric columns
        'Longitude': float(node.lon),
        'Latitude': float(node.lat),
        'Station Name': None,
    })

# Create a DataFrame
entrances_data = pd.DataFrame(entrances)
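
overpy exposes node coordinates as `decimal.Decimal` values, which pandas stores in an `object` column rather than `float64`. The sketch below isolates the conversion step in a hypothetical function, `nodes_to_records`, and tests it offline with `SimpleNamespace` stand-ins that mimic only the attributes the loop reads (`id`, `lat`, `lon`, `tags`). The sample IDs and coordinates are taken from the printed table in this notebook; they are not real overpy objects.

```python
from decimal import Decimal
from types import SimpleNamespace

import pandas as pd


def nodes_to_records(nodes):
    """Turn node-like objects (with .id, .lat, .lon and a .tags dict) into row dicts."""
    return [
        {
            "Entrance ID": node.id,
            "Entrance Name": node.tags.get("ref"),
            "Entrance Destination": node.tags.get("destination"),
            # Decimal -> float so pandas stores the columns as float64, not object
            "Longitude": float(node.lon),
            "Latitude": float(node.lat),
            "Station Name": None,
        }
        for node in nodes
    ]


# Stand-in nodes for an offline check (not real overpy.Node objects)
sample_nodes = [
    SimpleNamespace(id=1544031348, lat=Decimal("3.1459286"), lon=Decimal("101.7113737"),
                    tags={"railway": "subway_entrance", "ref": "B"}),
    SimpleNamespace(id=1631412559, lat=Decimal("3.1132222"), lon=Decimal("101.6049519"),
                    tags={"railway": "subway_entrance"}),
]

sample_df = pd.DataFrame(nodes_to_records(sample_nodes))
print(sample_df.dtypes)
```

In the notebook itself you would call `nodes_to_records(result_entrances.nodes)` instead of using the stand-ins.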



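## Query Public Transport Relations in Malaysia

Next, we query every relation tagged `type=public_transport` in the same area. `out body;` returns the relations with their tags and member lists; `> ;` then recurses down to the relations' member ways and nodes, and `out skel qt;` outputs those with only their IDs, node references and coordinates (no tags).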

In [18]:
# Query all public_transport (stop area) relations in Malaysia
result_relations = api.query("""
[out:json][timeout:25];
area["name"="Malaysia"]["boundary"="administrative"]->.searchArea;
relation["type"="public_transport"](area.searchArea);
out body;
> ;
out skel qt;
""")


## Link Subway Entrances to Public Transport Stations

In [19]:

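# Collect the IDs of our entrance nodes once, for fast membership checks
entrance_ids = set(entrances_data['Entrance ID'])

# For each relation, look for node members that are one of our entrances.
# OSM node, way and relation IDs are separate ID spaces, so only node members are checked.
for relation in result_relations.relations:
    for member in relation.members:
        if isinstance(member, overpy.RelationNode) and member.ref in entrance_ids:
            entrances_data.loc[entrances_data['Entrance ID'] == member.ref, 'Station Name'] = relation.tags.get('name', 'Unnamed')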

print(entrances_data)

     Entrance ID Entrance Name Entrance Destination    Longitude   Latitude  \
0     1544031348             B                 None  101.7113737  3.1459286   
1     1631412559          None                 None  101.6049519  3.1132222   
2     2278515570          None                 None  101.6440770  3.0506498   
3     2686635178             C                 None  101.6991821  3.1385646   
4     3308608988          None                 None  101.7127175  3.1587619   
..           ...           ...                  ...          ...        ...   
211  10839997852             A   Off Persiaran APEC  101.6572070  2.9497119   
212  10860751179             D                 None  101.6952256  3.1732229   
213  10864116957          None                 None  101.7314751  3.1651901   
214  10949038882          None                 None  101.6808258  3.2376063   
215  10949038884          None                 None  101.6814387  3.2377790   

                 Station Name  
0    Bukit Bintang 
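
The linking step can also be written as a single dictionary lookup: build a mapping from entrance node ID to station name, then apply it with `Series.map`, which leaves entrances with no matching station as `NaN`. The sketch below is an alternative formulation, not the notebook's original code. `entrance_station_lookup` is a hypothetical helper, and `FakeNodeMember` / `FakeWayMember` are stand-ins for overpy's `RelationNode` / `RelationWay`, so the example runs offline. The way member deliberately reuses an entrance node's ID to show why the member type must be checked; that pairing is invented for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace

import pandas as pd


def entrance_station_lookup(relations, node_member_type):
    """Map node ID -> relation name for every node member of the given relations."""
    lookup = {}
    for relation in relations:
        name = relation.tags.get("name", "Unnamed")
        for member in relation.members:
            # Skip way/relation members: their IDs come from separate ID spaces
            if isinstance(member, node_member_type):
                lookup[member.ref] = name
    return lookup


# Hypothetical stand-ins for overpy.RelationNode / overpy.RelationWay (offline demo only)
@dataclass
class FakeNodeMember:
    ref: int


@dataclass
class FakeWayMember:
    ref: int


sample_relations = [
    SimpleNamespace(
        tags={"type": "public_transport", "name": "Bukit Bintang"},
        members=[FakeNodeMember(1544031348), FakeWayMember(1631412559)],
    )
]
sample_entrances = pd.DataFrame({"Entrance ID": [1544031348, 1631412559]})

lookup = entrance_station_lookup(sample_relations, FakeNodeMember)
# Series.map leaves IDs without a matching station as NaN
sample_entrances["Station Name"] = sample_entrances["Entrance ID"].map(lookup)
print(sample_entrances)
print("Entrances without a station:", sample_entrances["Station Name"].isna().sum())
```

Against the real data, the call would be `entrance_station_lookup(result_relations.relations, overpy.RelationNode)`. As with the loop above, an entrance listed in several relations ends up with the name of the last relation processed.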

## Save the Cleaned Dataset
Finally, we save the entrances dataset to a CSV file, which we will use to build our database.

In [20]:
# Directory and file name for the cleaned data
data_directory = 'data'
kl_entrances_file = 'klang_valley_stations_entrances.csv'

# Create the directory if it does not exist, then save the DataFrame
os.makedirs(data_directory, exist_ok=True)
entrances_data.to_csv(os.path.join(data_directory, kl_entrances_file), index=False)
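
A quick way to confirm the CSV is usable downstream is to write it and read it straight back. The sketch below wraps the save step in a hypothetical `save_entrances` function and runs it against a temporary directory, so nothing is written to the real `data` folder. The three-column sample frame reuses two rows from the printed table and is only a stand-in for `entrances_data`.

```python
import os
import tempfile

import pandas as pd


def save_entrances(df, directory, filename):
    """Write df to directory/filename as CSV, creating the directory first."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    df.to_csv(path, index=False)
    return path


sample = pd.DataFrame({
    "Entrance ID": [1544031348, 10949038884],
    "Longitude": [101.7113737, 101.6814387],
    "Latitude": [3.1459286, 3.2377790],
})

with tempfile.TemporaryDirectory() as tmp:
    path = save_entrances(sample, os.path.join(tmp, "data"),
                          "klang_valley_stations_entrances.csv")
    # Read the file back while the temporary directory still exists
    round_trip = pd.read_csv(path)

print(round_trip.dtypes)
```

Reading the file back checks that the large OSM IDs survive as integers and that no index column was written.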