### Getting the number of MRT stations within a 1km radius of each HDB flat

We first load the `hdb_cleaned_data.csv` file.

In [1]:
import sys
sys.path.append('../')

import pandas as pd
from api.openrouteservice_api import Coordinate, get_distance_between_two_coordinates
from datetime import datetime

hdb_cleaned_df = pd.read_csv('../data/modified/hdb_cleaned_data.csv')
hdb_cleaned_df.head()

Unnamed: 0.1,Unnamed: 0,month,town,flat_type,block,street_name,storey_range,floor_area_sqm,flat_model,lease_commence_date,remaining_lease,resale_price,address
0,0,2015-01,ANG MO KIO,3 ROOM,174,ANG MO KIO AVE 4,07 TO 09,60.0,Improved,1986,70,255000.0,174 ANG MO KIO AVE 4
1,1,2015-01,ANG MO KIO,3 ROOM,541,ANG MO KIO AVE 10,01 TO 03,68.0,New Generation,1981,65,275000.0,541 ANG MO KIO AVE 10
2,2,2015-01,ANG MO KIO,3 ROOM,163,ANG MO KIO AVE 4,01 TO 03,69.0,New Generation,1980,64,285000.0,163 ANG MO KIO AVE 4
3,3,2015-01,ANG MO KIO,3 ROOM,446,ANG MO KIO AVE 10,01 TO 03,68.0,New Generation,1979,63,290000.0,446 ANG MO KIO AVE 10
4,4,2015-01,ANG MO KIO,3 ROOM,557,ANG MO KIO AVE 10,07 TO 09,68.0,New Generation,1980,64,290000.0,557 ANG MO KIO AVE 10


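Then we load `hdb_coordinates.csv` and convert it into a dictionary keyed by `address`. As the output below shows, each value is a list whose first element is the leftover index column from the CSV, followed by the flat's latitude and longitude.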

In [2]:
hdb_coordinates_df = pd.read_csv('../data/modified/hdb_coordinates.csv')
# convert df into a dictionary with key as address and values as a list of coordinates
hdb_coordinates = hdb_coordinates_df.set_index('address').T.to_dict('list')
# check if dictionary is correct - display first five
print({k: hdb_coordinates[k] for k in list(hdb_coordinates)[:5]})

{'174 ANG MO KIO AVE 4': [0.0, 1.37509746867904, 103.83761896123], '541 ANG MO KIO AVE 10': [1.0, 1.37392238703482, 103.855621370524], '163 ANG MO KIO AVE 4': [2.0, 1.37354853919927, 103.838176471398], '446 ANG MO KIO AVE 10': [3.0, 1.36776094720351, 103.85535715026], '557 ANG MO KIO AVE 10': [4.0, 1.3716257020332, 103.857736107527]}
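The stray first element can be avoided by building the lookup from only the columns we need. Below is a minimal sketch using the standard-library `csv` module. The column names `address`, `latitude` and `longitude` in `hdb_coordinates.csv` are assumptions, since the file's header is not shown here. `build_coordinate_lookup` is a hypothetical helper.

```python
import csv
import io


def build_coordinate_lookup(csv_file, lat_col="latitude", lon_col="longitude"):
    """Map each address to a (latitude, longitude) tuple of floats.

    The column names are assumptions about hdb_coordinates.csv; any other
    columns (such as a leftover unnamed index column) are simply ignored.
    """
    reader = csv.DictReader(csv_file)
    return {
        row["address"]: (float(row[lat_col]), float(row[lon_col]))
        for row in reader
    }


# Usage with an in-memory sample shaped like the assumed file
sample = io.StringIO(
    ",address,latitude,longitude\n"
    "0,174 ANG MO KIO AVE 4,1.37509746867904,103.83761896123\n"
    "1,541 ANG MO KIO AVE 10,1.37392238703482,103.855621370524\n"
)
lookup = build_coordinate_lookup(sample)
print(lookup["174 ANG MO KIO AVE 4"])  # (1.37509746867904, 103.83761896123)
```

Returning a tuple makes the latitude/longitude order explicit, so the code that builds `Coordinate` objects no longer has to skip over an index value.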


Then we load the `mrt_stations_opened.csv` dataset.

In [3]:
mrt_stations = pd.read_csv('../data/modified/mrt_stations_opened.csv')
mrt_stations.head()

Unnamed: 0,mrt_station_name,latitude,longtitude,opening_date
0,Jurong East MRT Station,1.333333,103.742222,5 November 1988
1,Bukit Batok MRT Station,1.349167,103.749722,10 March 1990
2,Bukit Gombak MRT Station,1.358611,103.751667,10 March 1990
3,Yew Tee MRT Station,1.396986,103.747239,10 February 1996
4,Kranji MRT Station,1.425047,103.761853,10 February 1996


Next, we create a function that iterates over every station in `mrt_stations_opened.csv`. For each station whose opening date is before the transaction date, it queries the distance between the flat and that station. For simplicity's sake, we assume that every transaction took place on the first day of its month. The function returns the names of the stations within 1km of the flat. If a distance query fails (for example, because of API limits), `None` is appended in place of a station name so that it can be handled later.
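The date rule can be isolated into a small helper so that it can be checked without making any API calls. This is a sketch, and `station_open_before` is a hypothetical name. It mirrors the notebook's formats: opening dates such as `5 November 1988`, and transaction months such as `2015-01`, which are treated as the first day of the month.

```python
from datetime import datetime


def station_open_before(opening_date: str, month: str) -> bool:
    """True if a station opened strictly before the 1st of the transaction month."""
    opened = datetime.strptime(opening_date, "%d %B %Y")        # e.g. '5 November 1988'
    transaction = datetime.strptime(month + "-01", "%Y-%m-%d")  # e.g. '2015-01' -> 2015-01-01
    return opened < transaction


print(station_open_before("5 November 1988", "2015-01"))  # True
print(station_open_before("10 March 1990", "1990-03"))    # False: opened after 1 March 1990
```

`%B` matches full month names in the current locale, so this assumes an English (or default C) locale.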

In [5]:
# For one flat, return the MRT stations within 1km that had opened before the transaction date
def get_mrt_stations_within_1km(flat_coordinates, mrt_stations_df, transaction_record_year):
    # transaction_record_year is a full date string in 'YYYY-MM-DD' format; parse it once
    transaction_date = datetime.strptime(transaction_record_year, '%Y-%m-%d')
    mrt = []
    for _, row in mrt_stations_df.iterrows():
        # skip stations that had not yet opened on the transaction date
        opening_date = datetime.strptime(row['opening_date'], '%d %B %Y')
        if opening_date >= transaction_date:
            continue
        mrt_coordinates = Coordinate(lon=float(row['longtitude']), lat=float(row['latitude']))
        distance = get_distance_between_two_coordinates(flat_coordinates, mrt_coordinates)
        # distance may be a number or None (due to API call restrictions), so check the type first
        if isinstance(distance, (int, float)):
            # the returned distance is compared against 1, i.e. treated as kilometres
            if distance <= 1:
                mrt.append(row['mrt_station_name'])
        else:
            mrt.append(distance)  # this should be None; we handle the Nones later on
    return mrt

# print(get_mrt_stations_within_1km(Coordinate(lon=103.83761896123, lat=1.37509746867904), mrt_stations, '2015-01-01'))  # stations within 1km of 174 Ang Mo Kio Ave 4; result is ['Mayflower MRT Station']

Next, we add a new column, `mrt_stations_within_1km`, to `hdb_cleaned_df`.

In [None]:
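# Cache of results already computed. It is declared outside the function so that it survives
# between runs, e.g. when we have to resume after exceeding the API limit.
# The key is (address, transaction date) because which stations count depends on the date.
processed_flats = {}

# create a new column in hdb_cleaned_df: mrt_stations_within_1km
def create_mrt_stations_within_1km_column(hdb_cleaned_df):
    mrt_stations_within_1km = []
    for _, row in hdb_cleaned_df.iterrows():
        address = row['address']
        transaction_record_year = row['month'] + '-01'  # assume the transaction happened on the 1st
        key = (address, transaction_record_year)
        if key in processed_flats:
            # reuse the cached result, but still append it so the list stays aligned with the rows
            mrt_stations_within_1km.append(processed_flats[key])
            continue
        # hdb_coordinates values are [leftover index, latitude, longitude]
        _, latitude, longitude = hdb_coordinates[address]
        flat_coordinates = Coordinate(lon=float(longitude), lat=float(latitude))
        stations = get_mrt_stations_within_1km(flat_coordinates, mrt_stations, transaction_record_year)
        mrt_stations_within_1km.append(stations)
        # only cache complete results, so that failed API calls (None entries) are retried next run
        if None not in stations:
            processed_flats[key] = stations

    hdb_cleaned_df['mrt_stations_within_1km'] = mrt_stations_within_1km
    return hdb_cleaned_df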

hdb_cleaned_df = create_mrt_stations_within_1km_column(hdb_cleaned_df)
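The distance between a flat and a station does not depend on the transaction date. An alternative (not what this notebook does) is therefore to cache distances per `(address, station)` pair and apply the opening-date filter afterwards. With this approach, repeat transactions at the same address cost no extra API calls, even across different months. The sketch below is hypothetical: `distance_fn` stands in for `get_distance_between_two_coordinates`, stations are plain `(name, opening_date)` pairs, and a failed lookup is assumed to return `None`.

```python
from datetime import datetime


def stations_within_radius(address, month, stations, distance_fn, cache, radius_km=1.0):
    """Return station names within radius_km of `address`, among stations open before `month`.

    stations: list of (station_name, opening_date) pairs, e.g. ('Jurong East MRT Station', '5 November 1988').
    distance_fn(address, station_name): distance in km, or None if the lookup failed (assumed contract).
    cache: dict mapping (address, station_name) -> distance, reused across rows and runs.
    """
    transaction = datetime.strptime(month + "-01", "%Y-%m-%d")
    result = []
    for name, opening_date in stations:
        if datetime.strptime(opening_date, "%d %B %Y") >= transaction:
            continue  # station had not opened yet
        key = (address, name)
        if key not in cache:
            distance = distance_fn(address, name)
            if distance is None:
                result.append(None)  # mark the failure, as the notebook does
                continue             # not cached, so it is retried on the next run
            cache[key] = distance
        if cache[key] <= radius_km:
            result.append(name)
    return result


# Usage with a fake distance function that records its calls
calls = []

def fake_distance(address, station_name):
    calls.append((address, station_name))
    return {"Station A": 0.6, "Station B": 2.5}[station_name]

stations = [("Station A", "1 January 2000"), ("Station B", "1 June 2016")]
cache = {}
print(stations_within_radius("BLK 1", "2015-01", stations, fake_distance, cache))  # ['Station A']
print(stations_within_radius("BLK 1", "2017-01", stations, fake_distance, cache))  # ['Station A']
print(len(calls))  # 2: one call for Station A, one for Station B
```

Under this scheme, the number of API calls is bounded by (unique flats) × (stations), regardless of how many transactions each flat has.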


Note that for one HDB flat we need to query the API up to **129** times: once for each station that had opened by the transaction date, out of the 129 MRT stations in the dataset. The OpenRouteService API allows **2000** distance queries per 24 hours, so each account can process 2000/129, or about **15**, flats per day. Even if each of the roughly **7921** unique flats in the dataset is queried only once, we would need 7921/15, or approximately **528**, accounts.
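The estimate above can be reproduced in a few lines. The figures (2000 queries per 24 hours, 129 stations, 7921 unique flats) come from the text, and rounding up to whole accounts is our own choice.

```python
import math

QUERIES_PER_ACCOUNT_PER_DAY = 2000  # OpenRouteService limit quoted above
STATIONS = 129                      # worst case: every station has opened
UNIQUE_FLATS = 7921

# whole flats that fit into one account's daily quota
flats_per_account = QUERIES_PER_ACCOUNT_PER_DAY // STATIONS
# accounts needed if each flat's queries must stay within one account
accounts_needed = math.ceil(UNIQUE_FLATS / flats_per_account)
# account-days needed if a flat's queries could be split across accounts
total_queries = UNIQUE_FLATS * STATIONS
account_days_if_split = math.ceil(total_queries / QUERIES_PER_ACCOUNT_PER_DAY)

print(flats_per_account)      # 15
print(accounts_needed)        # 529
print(total_queries)          # 1021809
print(account_days_if_split)  # 511
```

Rounding 7921/15 up gives 529 rather than 528. If queries could be spread freely across accounts, the roughly one million queries would still need 511 account-days, so the conclusion does not change.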

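Instead of the API, we may need to compute straight-line (geometric) distances with built-in Python libraries. The trade-off is that we lose the geospatial context: a straight-line distance ignores the road and path network that a route-based distance from OpenRouteService accounts for.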
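As a sketch of the straight-line alternative, the haversine formula gives the great-circle distance between two latitude/longitude points using only the `math` module. The mean Earth radius of 6371 km is a standard approximation, and `stations_within_km` is a hypothetical helper. The result is generally a lower bound on the real walking or driving distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def stations_within_km(flat_lat, flat_lon, stations, radius_km=1.0):
    """stations: iterable of (name, lat, lon); returns the names within radius_km."""
    return [
        name
        for name, lat, lon in stations
        if haversine_km(flat_lat, flat_lon, lat, lon) <= radius_km
    ]


# Usage with coordinates shown earlier in this notebook
stations = [
    ("Jurong East MRT Station", 1.333333, 103.742222),
    ("Bukit Batok MRT Station", 1.349167, 103.749722),
]
print(stations_within_km(1.37509746867904, 103.83761896123, stations))  # [] (both are about 10 km away)
```

Because no API is involved, every flat/station pair can be computed locally, with no query limit.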

### Getting the supply of BTO flats within 4km of the flat address in the respective time periods