# MRT Distance Script

This script computes the distance from each resale flat to its closest MRT station. It takes the latitude and longitude of each resale flat (from 'train.csv' or 'test.csv') and of each MRT station (from ./auxiliary-data/sg-train-stations.csv), calculates the distance between the flat and every MRT station, and writes the index of the closest station and the shortest distance (in km) to a CSV file.

## Import Packages

In [2]:
# import sys
# !{sys.executable} -m pip install geopy
import geopy.distance
import csv
import pandas as pd

## Import Dataset

In [7]:
train = pd.read_csv('./train.csv')
test = pd.read_csv('./test.csv')
mrt = pd.read_csv('./auxiliary-data/sg-train-stations.csv')

## Drop NaN Data

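From this step on, the cells operate on the test dataset. To process the train dataset instead, replace `test` with `train` in the cells below and change the output filename accordingly.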

In [8]:
test = test.dropna()  # dropna() returns a new DataFrame, so the result must be assigned
test = test.reset_index(drop=True)
test

Unnamed: 0,month,town,flat_type,block,street_name,storey_range,floor_area_sqm,flat_model,eco_category,lease_commence_date,latitude,longitude,elevation,subzone,planning_area,region
0,2004-01,bukit batok,4 room,186,bukit batok west avenue 6,04 to 06,94.0,new generation,uncategorized,1989,1.346581,103.744085,0.0,bukit batok west,bukit batok,west region
1,2001-11,tampines,5 room,366,tampines street 34,04 to 06,122.0,improved,uncategorized,1997,1.357618,103.961379,0.0,tampines east,tampines,east region
2,2002-07,jurong east,3 room,206,jurong east street 21,01 to 03,67.0,new generation,uncategorized,1982,1.337804,103.741998,0.0,toh guan,jurong east,west region
3,2015-04,ang mo kio,3 room,180,Ang Mo Kio Avenue 5,04 to 06,82.0,new generation,uncategorized,1981,1.380084,103.849574,0.0,yio chu kang east,ang mo kio,north-east region
4,2004-04,clementi,5 room,356,clementi avenue 2,01 to 03,117.0,standard,uncategorized,1978,1.313960,103.769831,0.0,clementi north,clementi,west region
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
107929,2008-04,hougang,5 room,981D,buangkok crescent,10 to 12,110.0,improved,uncategorized,2003,1.380452,103.879333,0.0,trafalgar,hougang,north-east region
107930,2006-01,kallang/whampoa,4 room,13,upper boon keng road,13 to 15,102.0,model a,uncategorized,1999,1.314481,103.870458,0.0,boon keng,kallang,central region
107931,2000-01,kallang/whampoa,3 room,1,beach road,07 to 09,68.0,improved,uncategorized,1979,1.294924,103.854315,0.0,city hall,downtown core,central region
107932,2009-07,jurong west,4 room,919,jurong west street 91,10 to 12,104.0,model a,uncategorized,1988,1.339927,103.687354,0.0,yunnan,jurong west,west region


In [9]:
# [latitude, longitude] pairs for every flat and every MRT station
coords_flats = test[['latitude', 'longitude']].values.tolist()
coords_mrts = mrt[['lat', 'lng']].values.tolist()

In [10]:
closest_mrt = []  # row index in mrt of the nearest station, one per flat
dist_mrt = []     # distance to that station in km, one per flat

for coords_flat in coords_flats:
    min_dist = None
    index = -1
    for idx, coords_mrt in enumerate(coords_mrts):
        # geopy's distance() is the geodesic distance by default
        dist = geopy.distance.distance(coords_flat, coords_mrt)
        if min_dist is None or dist < min_dist:
            min_dist = dist
            index = idx
    closest_mrt.append(index)
    dist_mrt.append(min_dist.km)
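
The nested loop above makes one geopy call per flat and station pair, which can be slow for a dataset of this size. As a possible alternative, here is a minimal sketch that computes all pairwise distances at once with NumPy. It swaps geopy's geodesic distance for the haversine formula, which treats the Earth as a sphere, so results may differ slightly from the loop above. The function name `nearest_station` and the two station coordinates are hypothetical, chosen only for illustration; the two flat coordinates are taken from the table shown earlier.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; haversine assumes a sphere


def nearest_station(flats, stations):
    """Return (index, distance_km) of the nearest station for each flat.

    flats and stations are sequences of [latitude, longitude] in degrees.
    Hypothetical helper, not part of the original script.
    """
    flats = np.radians(np.asarray(flats, dtype=float))
    stations = np.radians(np.asarray(stations, dtype=float))

    # Shape the inputs so broadcasting yields an (n_flats, n_stations) grid.
    lat1 = flats[:, 0][:, None]
    lon1 = flats[:, 1][:, None]
    lat2 = stations[:, 0][None, :]
    lon2 = stations[:, 1][None, :]

    # Haversine formula for great-circle distance.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

    idx = dist_km.argmin(axis=1)
    return idx, dist_km[np.arange(len(idx)), idx]


# Two flats from the test table and two made-up station locations.
flats = [[1.346581, 103.744085], [1.357618, 103.961379]]
stations = [[1.3490, 103.7496], [1.3533, 103.9452]]  # hypothetical coordinates

idx, dist = nearest_station(flats, stations)
print(idx.tolist(), dist.round(2).tolist())
```

With the full dataset, the intermediate grid holds one value per flat and station pair, so it may be necessary to process the flats in chunks to keep memory use reasonable.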

In [11]:
rows = zip(closest_mrt, dist_mrt)

with open('./auxiliary-data/distance-to-mrt-test.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    for row in rows:
        wr.writerow(row)
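
The cell above writes the rows without a header, so whoever reads the file later has to know the column order. One option is to write a header row first, as in the sketch below. The column names `closest_mrt_index` and `distance_km` and the sample values are assumptions made for illustration, and an in-memory buffer stands in for the output file.

```python
import csv
import io

closest_mrt = [3, 0]      # hypothetical station indices
dist_mrt = [0.67, 1.86]   # hypothetical distances in km

buffer = io.StringIO()    # stands in for the real output file
writer = csv.writer(buffer)
writer.writerow(['closest_mrt_index', 'distance_km'])  # assumed column names
writer.writerows(zip(closest_mrt, dist_mrt))

# Reading the data back: each row is now addressable by column name.
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]['closest_mrt_index'], records[0]['distance_km'])
```

A header also lets `pd.read_csv` pick up the column names automatically when the file is merged back into the train or test dataframe.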