<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 10px; height: 55px">


# Capstone Project: Forecasting HDB Resale Prices


## Notebook 2/4: Feature Engineering
---

## Getting Started

### Imports

In [1]:
# Importing Libraries
import pandas as pd
import numpy as np
import requests
import json
from geopy.distance import geodesic

In [48]:
# Importing relevant csv files
combined = pd.read_csv('../data/combined.csv')
mrt = pd.read_csv('../data/mrt.csv')
malls = pd.read_csv('../data/malls.csv')
supermarkets = pd.read_csv('../data/supermarkets.csv')
hawkers = pd.read_csv('../data/hawker_centres.csv')
parks = pd.read_csv('../data/parks.csv')
schools = pd.read_csv('../data/schools.csv', dtype=str)

In [3]:
combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 581229 entries, 0 to 581228
Data columns (total 14 columns):
 #   Column               Non-Null Count   Dtype  
---  ------               --------------   -----  
 0   date                 581229 non-null  object 
 1   year                 581229 non-null  int64  
 2   month                581229 non-null  int64  
 3   town                 581229 non-null  object 
 4   flat_type            581229 non-null  object 
 5   block                581229 non-null  object 
 6   street_name          581229 non-null  object 
 7   address              581229 non-null  object 
 8   storey_range         581229 non-null  object 
 9   floor_area_sqm       581229 non-null  float64
 10  flat_model           581229 non-null  object 
 11  lease_commence_date  581229 non-null  int64  
 12  remaining_lease      581229 non-null  float64
 13  resale_price         581229 non-null  float64
dtypes: float64(3), int64(3), object(8)
memory usage: 62.1+ MB


### Defining Functions

First, we define three helper functions for our feature engineering:
1. `get_coordinates`: scrapes the coordinates of each address from the OneMap API
2. `search_nearby`: finds the closest geodesic distance from each address to a list of amenities, and the number of those amenities within a 1km radius
3. `dist_from_location`: calculates the geodesic distance from each address to a specific location

In [4]:
# Creating a function to scrape coordinates data from the OneMap API

def get_coordinates(lst):

    # Endpoint as used at the time of writing; check OneMap's current API documentation if it no longer responds
    url = "https://developers.onemap.sg/commonapi/search"
    rows = []  # one record per successfully geocoded address

    for index, add in enumerate(lst):

        params = {'searchVal': str(add), 'returnGeom': 'Y', 'getAddrDetails': 'Y', 'pageNum': 1}
        res = requests.get(url, params=params)  # sending GET request; requests URL-encodes the search value

        try:
            data = res.json()  # convert to json
        except ValueError:
            print(index, add, 'JSONDecodeError')  # if error, print the index and address
            continue  # skip this address instead of reusing the previous response

        try:
            top = data['results'][0]  # first (best) search result
        except (KeyError, IndexError):
            print(index, add, 'IndexError')  # if the address is not found, print the index and address
            continue  # skip it so it is not assigned the previous address's coordinates

        rows.append({'address': add,
                     'LATITUDE': float(top['LATITUDE']),
                     'LONGITUDE': float(top['LONGITUDE'])})

    # Building the dataframe once avoids repeated pd.concat calls inside the loop
    df = pd.DataFrame(rows, columns=['address', 'LATITUDE', 'LONGITUDE'])

    return df
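
The core of `get_coordinates` is pulling the top hit out of OneMap's JSON. Below is a minimal offline sketch of that step, assuming the response has the shape `{'results': [{'LATITUDE': '...', 'LONGITUDE': '...'}]}` used above; `parse_top_result` and the sample payloads are hypothetical, not real API output. Returning `None` for malformed or empty responses makes a missing address explicit instead of silently reusing an earlier result.

```python
import json


def parse_top_result(payload):
    """Return (lat, lon) floats for the first search hit, or None if there is none.

    Assumes a OneMap-style shape: {'results': [{'LATITUDE': str, 'LONGITUDE': str}, ...]}.
    """
    try:
        data = json.loads(payload)  # raises json.JSONDecodeError (a ValueError) on bad text
    except ValueError:
        return None
    results = data.get('results') or []  # treat a missing key like an empty list
    if not results:
        return None
    top = results[0]
    return float(top['LATITUDE']), float(top['LONGITUDE'])


# Hypothetical payloads for illustration only
found = '{"found": 1, "results": [{"LATITUDE": "1.3115", "LONGITUDE": "103.7642"}]}'
missing = '{"found": 0, "results": []}'
print(parse_top_result(found))     # (1.3115, 103.7642)
print(parse_top_result(missing))   # None
print(parse_top_result('<html>'))  # None
```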

In [31]:
# Creating a function to find the closest distance of each location from a list of amenities, and the number of said amenities in 1km radius

def search_nearby(address, amenity, radius=1):

    results = {}
    # First column must be address
    for index, block in enumerate(address.iloc[:, 0]):

        # 2nd column must be latitude, 3rd column must be longitude
        block_loc = (address.iloc[index, 1], address.iloc[index, 2])
        if amenity.equals(pri_school_coord):
            block_amenity = ['', '', 100, 0, 0] # creating list with dummy values
        else:
            block_amenity = ['', '', 100, 0]

        for ind, eachloc in enumerate(amenity.iloc[:, 0]):
            amenity_loc = (amenity.iloc[ind, 1], amenity.iloc[ind, 2])
            distance = geodesic(block_loc, amenity_loc) # calculate the distance between the block and amenity
            distance = float(str(distance)[:-3])  # convert to float

            if amenity.equals(pri_school_coord):
                if distance <= radius:   # compute number of schools in 1km radius
                    block_amenity[3] += 1
                if radius < distance <= 2:  # compute number of schools between 1km to 2km radius
                    block_amenity[4] += 1
            else:
                if distance <= radius:   # compute number of amenities in 1km radius
                    block_amenity[3] += 1


            if distance < block_amenity[2]:  # find nearest amenity
                block_amenity[0] = block # store the address
                block_amenity[1] = eachloc # store the amenity
                block_amenity[2] = distance # store the distance

        results[block] = block_amenity # store the results in a dictionary
        
    return results
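
`search_nearby` calls `geodesic` once per block-amenity pair in pure Python, which is about 5 million calls for supermarkets alone (9,462 addresses × 547 locations). One possible speed-up, sketched here as a swapped-in technique rather than the method used in this notebook, computes all pairwise haversine distances at once with NumPy broadcasting. Haversine assumes a spherical Earth (radius 6371 km, an assumption), so its results differ slightly from geopy's ellipsoidal `geodesic`. `haversine_matrix` and `nearest_and_count` are hypothetical helper names.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius: a spherical approximation, unlike geopy's ellipsoid


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Pairwise great-circle distances (km): rows are points in set 1, columns are points in set 2."""
    lat1 = np.radians(lat1)[:, None]
    lon1 = np.radians(lon1)[:, None]
    lat2 = np.radians(lat2)[None, :]
    lon2 = np.radians(lon2)[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_and_count(block_lat, block_lon, amen_lat, amen_lon, radius=1.0):
    """For each block: index of the nearest amenity, its distance, and the count within `radius` km."""
    d = haversine_matrix(np.asarray(block_lat, dtype=float), np.asarray(block_lon, dtype=float),
                         np.asarray(amen_lat, dtype=float), np.asarray(amen_lon, dtype=float))
    nearest_idx = d.argmin(axis=1)                        # column of the closest amenity per row
    nearest_dist = d[np.arange(d.shape[0]), nearest_idx]  # pick that distance for each row
    count_within = (d <= radius).sum(axis=1)              # amenities inside the radius per row
    return nearest_idx, nearest_dist, count_within


# Coordinates copied from outputs in this notebook: 426 CLEMENTI AVE 3 vs. 888 Plaza and 321 Clementi
idx, dist, cnt = nearest_and_count(
    [1.31158040643011], [103.764211279777],
    [1.43712301500434, 1.31200212030821], [103.795314383823, 103.764986676365],
)
print(idx, dist, cnt)  # expected: index 1 (321 Clementi), roughly 0.098 km, 1 mall within 1km
```

The full distance matrix for 9,462 blocks and a few hundred amenities fits comfortably in memory; for much larger inputs it could be processed in chunks of rows.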


In [55]:
# Creating a function to calculate the distance of each address from a particular location
# First column of 'address' dataset must be address
# 2nd column must be latitude, 3rd column must be longitude
# 'location' variable must be a (latitude, longitude) tuple

def dist_from_location(address, location):

    results = {}

    for index, block in enumerate(address.iloc[:, 0]):

        block_location = (address.iloc[index, 1], address.iloc[index, 2])  # block_location is a tuple
        distance = geodesic(block_location, location).km  # distance from block to location, in km
        results[block] = [block, distance]  # [address, distance] for each block

    return results


---

## Unique Addresses

Now, let us apply our first function to all the unique addresses in our dataset. This function will create a dataframe consisting of the coordinates of these addresses.

In [16]:
# Creating a list of unique addresses
all_address = list(combined['address'])
unique_address = list(set(all_address))
print('No. of Unique Addresses:', len(unique_address))
unique_address[:10]

No. of Unique Addresses: 9462


['148 WOODLANDS ST 13',
 '426 CLEMENTI AVE 3',
 '85 CIRCUIT RD',
 '467 TAMPINES ST 44',
 '367 CORPORATION DR',
 '159 YUNG PING RD',
 '518 CHOA CHU KANG ST 51',
 '918 JURONG WEST ST 91',
 '682A EDGEDALE PLAINS',
 '416 ANG MO KIO AVE 10']

In [72]:
# Applying the function to find the coordinates of each address
block_coord = get_coordinates(unique_address)

76 12 REDHILL CL IndexError
140 5 SELETAR WEST FARMWAY 6 IndexError
151 36 DOVER RD IndexError
318 1A WOODLANDS CTR RD IndexError
420 10 YUNG KUANG RD IndexError
494 33 TAMAN HO SWEE IndexError
517 6 UPP BOON KENG RD IndexError
615 407 CLEMENTI AVE 1 IndexError
668 19 KG BAHRU HILL IndexError
701 10 TEBAN GDNS RD IndexError
767 169 BOON LAY DR IndexError
976 1 JLN PASAR BARU IndexError
993 171 BOON LAY DR IndexError
1019 20 UPP BOON KENG RD IndexError
1047 220 BOON LAY AVE IndexError
1312 172 BOON LAY DR IndexError
1730 6 TEBAN GDNS RD IndexError
1742 74 COMMONWEALTH DR IndexError
1979 2A WOODLANDS CTR RD IndexError
2055 170 BOON LAY DR IndexError
2086 96 MARGARET DR IndexError
2333 59 SIMS DR IndexError
2652 29 HAVELOCK RD IndexError
2764 30 LOR 5 TOA PAYOH IndexError
2902 91 ZION RD IndexError
2946 24 KG BAHRU HILL IndexError
2975 10 UPP BOON KENG RD IndexError
3073 54 SIMS DR IndexError
3124 78 COMMONWEALTH DR IndexError
3170 7 SELETAR WEST FARMWAY 6 IndexError
3358 90 ZION RD Index

In [71]:
print(block_coord.shape)
block_coord.head()

(10, 3)


,address,LATITUDE,LONGITUDE
0,148 WOODLANDS ST 13,1.43576274995203,103.77402996229
1,426 CLEMENTI AVE 3,1.31158040643011,103.764211279777
2,85 CIRCUIT RD,1.32265241052434,103.885755259463
3,467 TAMPINES ST 44,1.35996852651128,103.954665911889
4,367 CORPORATION DR,1.33769782307079,103.719145674511


In [96]:
# Exporting the coordinates to a csv file
block_coord.to_csv('../data/coordinates/block_coordinates.csv', index=False)

There are 83 addresses that could not be found on OneMap.sg, largely because they are relatively new. I filled in the coordinates of these addresses manually so that we do not lose any data on newly released flats.
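
Before filling in coordinates by hand, it helps to list exactly which addresses are missing. A minimal sketch, assuming the scraped dataframe contains a row only for addresses that OneMap actually found; `find_missing` and the toy lists are hypothetical. In this notebook it could be called as `find_missing(unique_address, block_coord['address'])`, and an empty result after loading the amended file would confirm that every address has coordinates.

```python
def find_missing(addresses, geocoded):
    """Return the addresses (in input order) that have no entry in `geocoded`."""
    found = set(geocoded)  # set gives fast membership checks; works for lists or pandas Series
    return [a for a in addresses if a not in found]


# Toy data for illustration only
sent = ['85 CIRCUIT RD', '12 REDHILL CL', '426 CLEMENTI AVE 3']
returned = ['85 CIRCUIT RD', '426 CLEMENTI AVE 3']
print(find_missing(sent, returned))  # ['12 REDHILL CL']
```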

In [7]:
# Importing the amended block_coordinates file
block_coord = pd.read_csv('../data/coordinates/block_coordinates_amended.csv')

In [8]:
block_coord.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9462 entries, 0 to 9461
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   address    9462 non-null   object 
 1   LATITUDE   9462 non-null   float64
 2   LONGITUDE  9462 non-null   float64
dtypes: float64(2), object(1)
memory usage: 221.9+ KB


# Housing Amenities
---

Next, we apply our first two functions: `get_coordinates` to obtain the coordinates of each amenity (where the data does not already include them), and `search_nearby` to find the closest distance of each address from a list of amenities and the number of those amenities within a 1km radius. The relevant data for each amenity were extracted from [data.gov.sg](https://data.gov.sg/) in preparation for this stage.
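
`search_nearby` reads its inputs by position (name, then latitude, then longitude), so an amenity table with extra or reordered columns would give wrong distances without any error. A small defensive sketch, assuming pandas; `to_amenity_frame` is a hypothetical helper, and the actual column names of the hawker and park files are not shown in this notebook, so they would need to be checked first. The toy rows reuse MRT values shown below.

```python
import pandas as pd


def to_amenity_frame(df, name_col, lat_col, lon_col):
    """Return a copy of df with exactly [name, latitude, longitude] as its columns, in that order."""
    out = df[[name_col, lat_col, lon_col]].copy()
    out[lat_col] = pd.to_numeric(out[lat_col])  # coordinates must be numeric for distance maths
    out[lon_col] = pd.to_numeric(out[lon_col])
    return out.reset_index(drop=True)


# Toy frame with an extra column, shuffled order and a text longitude
raw = pd.DataFrame({
    'STN_NO': ['BP1', 'BP10'],
    'Longitude': ['103.74458', '103.770827'],
    'STN_NAME': ['CHOA CHU KANG LRT STATION', 'FAJAR LRT STATION'],
    'Latitude': [1.384836, 1.384521],
})
clean = to_amenity_frame(raw, 'STN_NAME', 'Latitude', 'Longitude')
print(clean.columns.tolist())  # ['STN_NAME', 'Latitude', 'Longitude']
```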

## MRT Stations

In [36]:
# First look at the data
print(mrt.shape)
mrt.head()

(193, 7)


,STN_NAME,STN_NO,X,Y,Latitude,Longitude,COLOR
0,CHOA CHU KANG LRT STATION,BP1,18121.6052,40753.8693,1.384836,103.74458,OTHERS
1,FAJAR LRT STATION,BP10,21043.4356,40718.8826,1.384521,103.770827,OTHERS
2,SEGAR LRT STATION,BP11,20908.767,41078.4025,1.387772,103.769617,OTHERS
3,JELAPANG LRT STATION,BP12,20341.7491,40960.202,1.386703,103.764523,OTHERS
4,SENJA LRT STATION,BP13,20104.0139,40516.7226,1.382692,103.762388,OTHERS


In [37]:
# Extracting relevant columns
mrt = mrt[['STN_NAME','Latitude','Longitude']]
mrt.head()

,STN_NAME,Latitude,Longitude
0,CHOA CHU KANG LRT STATION,1.384836,103.74458
1,FAJAR LRT STATION,1.384521,103.770827
2,SEGAR LRT STATION,1.387772,103.769617
3,JELAPANG LRT STATION,1.386703,103.764523
4,SENJA LRT STATION,1.382692,103.762388


### Finding Nearest MRT Station

In [39]:
# Applying the function
nearby_mrt = search_nearby(block_coord, mrt)

In [40]:
# Creating a dataframe from the dictionary and renaming the columns
nearby_mrt_df = pd.DataFrame.from_dict(nearby_mrt).T
nearby_mrt_df = nearby_mrt_df.rename(columns={0: 'address', 1: 'mrt', 2: 'mrt_dist', 3: 'num_mrt_1km'}).reset_index(drop=True)
nearby_mrt_df.head()

,address,mrt,mrt_dist,num_mrt_1km
0,148 WOODLANDS ST 13,MARSILING MRT STATION,0.359246,1
1,426 CLEMENTI AVE 3,CLEMENTI MRT STATION,0.392851,1
2,85 CIRCUIT RD,MATTAR MRT STATION,0.544896,6
3,467 TAMPINES ST 44,TAMPINES EAST MRT STATION,0.41771,1
4,367 CORPORATION DR,LAKESIDE MRT STATION,0.753707,1

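Building the frame with `from_dict(...).T` leaves every column with `object` dtype, because before the transpose each column mixes strings and numbers. This mainly matters if the frame is used before being written to CSV. A small sketch of an alternative, assuming pandas: `orient='index'` turns each list into a row, so pandas infers a numeric dtype for the distance and count columns. The two sample rows are copied from the output above.

```python
import pandas as pd

# Sample in the same shape search_nearby returns: {address: [address, amenity, distance, count]}
results = {
    '148 WOODLANDS ST 13': ['148 WOODLANDS ST 13', 'MARSILING MRT STATION', 0.359246, 1],
    '426 CLEMENTI AVE 3': ['426 CLEMENTI AVE 3', 'CLEMENTI MRT STATION', 0.392851, 1],
}

# Transpose approach: every column ends up as object dtype
transposed = pd.DataFrame.from_dict(results).T

# orient='index': each dict value becomes a row, and dtypes are inferred per column
df = pd.DataFrame.from_dict(results, orient='index').reset_index(drop=True)
df = df.rename(columns={0: 'address', 1: 'mrt', 2: 'mrt_dist', 3: 'num_mrt_1km'})
print(transposed.dtypes)
print(df.dtypes)
```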

---
## Malls

In [90]:
print(malls.shape)
malls.head()

(169, 1)


,malls
0,100 AM
1,313@Somerset
2,321 Clementi
3,888 Plaza
4,Admiralty Place


### Getting Coordinates

In [91]:
# Creating a list of unique malls
all_malls = list(malls['malls'])
print('No. of Malls:', len(all_malls))
all_malls[:10]


No. of Malls: 169


['100 AM',
 '313@Somerset',
 '321 Clementi',
 '888 Plaza',
 'Admiralty Place',
 'Alexandra Central',
 'Alexandra Retail Centre',
 'AMK Hub',
 'Anchorpoint',
 'Aperia']

In [92]:
# Getting coordinates of all the malls
mall_coord = get_coordinates(all_malls)

In [93]:
print(mall_coord.shape)
mall_coord.head()

(169, 3)


,address,LATITUDE,LONGITUDE
0,100 AM,1.27468281482263,103.843488359469
1,313@Somerset,1.30101436404056,103.838360664485
2,321 Clementi,1.31200212030821,103.764986676365
3,888 Plaza,1.43712301500434,103.795314383823
4,Admiralty Place,1.43963261801972,103.802121646793


In [94]:
# Exporting the coordinates to a csv file
mall_coord.to_csv('../data/coordinates/mall_coordinates.csv',index=False)

### Finding Nearest Malls

In [41]:
# Importing the mall_coordinates file
mall_coord = pd.read_csv('../data/coordinates/mall_coordinates.csv')

In [42]:
# Applying the function
nearby_malls = search_nearby(block_coord, mall_coord)

In [43]:
# Creating a dataframe from the dictionary and renaming the columns
nearby_malls_df = pd.DataFrame.from_dict(nearby_malls).T
nearby_malls_df = nearby_malls_df.rename(columns={0: 'address', 1: 'mall', 2: 'mall_dist', 3: 'num_mall_1km'}).reset_index(drop=True)
nearby_malls_df.head()

,address,mall,mall_dist,num_mall_1km
0,148 WOODLANDS ST 13,Marsiling Mall,0.695939,1
1,426 CLEMENTI AVE 3,321 Clementi,0.098085,4
2,85 CIRCUIT RD,Paya Lebar Square,0.870095,1
3,467 TAMPINES ST 44,Loyang Point,1.317606,0
4,367 CORPORATION DR,Taman Jurong Shopping Centre,0.347819,1


---
## Supermarkets

In [146]:
print(supermarkets.shape)
supermarkets.head()

(607, 4)


,licence_numbers,business_name,name_of_license,premise_address
0,B02008E000,SHENG SIONG SUPERMARKET,SHENG SIONG SUPERMARKET PTE LTD,"845 YISHUN STREET 81 #01-184, S(760845)"
1,B02011P000,GIANT,COLD STORAGE SINGAPORE (1983) PTE LTD,"524A JELAPANG ROAD #03-13/18, GREENRIDGE SHOPP..."
2,B02012N000,COLD STORAGE,COLD STORAGE SINGAPORE (1983) PTE LTD,"768 WOODLANDS AVENUE 6 #01-34, WOODLANDS MART ..."
3,B02015J000,SHENG SIONG SUPERMARKET,SHENG SIONG SUPERMARKET PTE LTD,"122 ANG MO KIO AVENUE 3 #01-1753,#01-1757,#01-..."
4,B02017C000,SHENG SIONG SUPERMARKET,SHENG SIONG SUPERMARKET PTE LTD,"301 WOODLANDS STREET 31 #01-217, S(730301)"


In [147]:
# Extracting postal code from the last 8 characters of premise_address
supermarkets['postal_code'] = supermarkets['premise_address'].str[-8:]
supermarkets['postal_code'] = supermarkets['postal_code'].str.strip('()')

In [148]:
supermarkets.head()

,licence_numbers,business_name,name_of_license,premise_address,postal_code
0,B02008E000,SHENG SIONG SUPERMARKET,SHENG SIONG SUPERMARKET PTE LTD,"845 YISHUN STREET 81 #01-184, S(760845)",760845
1,B02011P000,GIANT,COLD STORAGE SINGAPORE (1983) PTE LTD,"524A JELAPANG ROAD #03-13/18, GREENRIDGE SHOPP...",671524
2,B02012N000,COLD STORAGE,COLD STORAGE SINGAPORE (1983) PTE LTD,"768 WOODLANDS AVENUE 6 #01-34, WOODLANDS MART ...",730768
3,B02015J000,SHENG SIONG SUPERMARKET,SHENG SIONG SUPERMARKET PTE LTD,"122 ANG MO KIO AVENUE 3 #01-1753,#01-1757,#01-...",560122
4,B02017C000,SHENG SIONG SUPERMARKET,SHENG SIONG SUPERMARKET PTE LTD,"301 WOODLANDS STREET 31 #01-217, S(730301)",730301

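Slicing the last eight characters works only if every `premise_address` ends in exactly `S(xxxxxx)`; a malformed row would silently yield a wrong code. A more defensive sketch, assuming that same trailing format, uses a regular expression and returns `None` when the pattern is absent so that bad rows can be inspected. `extract_postal_code` is a hypothetical helper; with pandas, the same pattern could be applied via `supermarkets['premise_address'].str.extract(r'S\((\d{6})\)\s*$', expand=False)`, which yields `NaN` for non-matching rows.

```python
import re

# Assumed format: address text ending in "S(" + six digits + ")", as in the rows shown above
POSTAL_RE = re.compile(r'S\((\d{6})\)\s*$')


def extract_postal_code(address):
    """Return the six-digit postal code at the end of `address`, or None if the pattern is absent."""
    match = POSTAL_RE.search(address)
    return match.group(1) if match else None


print(extract_postal_code('845 YISHUN STREET 81 #01-184, S(760845)'))  # 760845
print(extract_postal_code('NO POSTAL CODE HERE'))                      # None
```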

In [141]:
supermarkets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 607 entries, 0 to 606
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   licence_numbers  607 non-null    object
 1   business_name    607 non-null    object
 2   name_of_license  607 non-null    object
 3   premise_address  607 non-null    object
 4   postal_code      607 non-null    object
dtypes: object(5)
memory usage: 23.8+ KB


### Getting Coordinates

In [149]:
# Creating a list of supermarket postal codes, then removing duplicates
supermarket_address = list(supermarkets['postal_code'])
print('No. of Supermarkets:', len(supermarket_address))

unique_supermarket = list(set(supermarket_address))
print('Unique addresses:', len(unique_supermarket))

No. of Supermarkets: 607
Unique addresses: 547
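`list(set(...))` removes duplicates, but the resulting order can change between Python sessions because string hashing is randomised, so the indices printed for failed lookups are not reproducible across runs. A minimal order-preserving alternative relies on dicts keeping insertion order (guaranteed since Python 3.7); `unique_in_order` is a hypothetical helper. In pandas, `Series.unique()` likewise returns values in order of appearance.

```python
def unique_in_order(items):
    """Drop duplicates while keeping first-seen order, so the result is the same on every run."""
    return list(dict.fromkeys(items))


codes = ['760845', '671524', '760845', '730768']
print(unique_in_order(codes))  # ['760845', '671524', '730768']
```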


In [151]:
# Applying the function to find the coordinates of each address
supermarket_coord = get_coordinates(unique_supermarket)

In [45]:
print(supermarket_coord.shape)
supermarket_coord.head()

(547, 3)


Unnamed: 0,address,LATITUDE,LONGITUDE
0,636937,1.270196,103.63448
1,389551,1.31415,103.888545
2,560260,1.368916,103.8345
3,538692,1.375715,103.879472
4,757177,1.45075,103.796847


In [152]:
# Exporting the coordinates to a csv file
supermarket_coord.to_csv('../data/coordinates/supermarket_coordinates.csv',index=False)

### Finding Nearest Supermarkets

In [44]:
# Importing the supermarket_coordinates file
supermarket_coord = pd.read_csv('../data/coordinates/supermarket_coordinates.csv')

In [46]:
# Applying the function
nearby_supermarkets = search_nearby(block_coord, supermarket_coord)

In [47]:
# Creating a dataframe from the dictionary and renaming the columns
nearby_supermarkets_df = pd.DataFrame.from_dict(nearby_supermarkets).T
nearby_supermarkets_df = nearby_supermarkets_df.rename(columns={0: 'address', 1: 'supermarket', 2: 'supermarket_dist', 3: 'num_supermarket_1km'}).reset_index(drop=True)
nearby_supermarkets_df.head()

,address,supermarket,supermarket_dist,num_supermarket_1km
0,148 WOODLANDS ST 13,730182,0.231658,5
1,426 CLEMENTI AVE 3,120451,0.172466,8
2,85 CIRCUIT RD,380114,0.156163,8
3,467 TAMPINES ST 44,520475,0.175411,5
4,367 CORPORATION DR,610399,0.319123,2


---
## Hawker Centres and Markets

### Finding Nearest Hawker Centres and Markets

In [49]:
# Applying the function
nearby_hawkers = search_nearby(block_coord, hawkers)

In [61]:
# Creating a dataframe from the dictionary, dropping and renaming the columns
nearby_hawkers_df = pd.DataFrame.from_dict(nearby_hawkers).T
nearby_hawkers_df = nearby_hawkers_df.rename(columns={0: 'address', 1: 'hawker', 2: 'hawker_dist', 3: 'num_hawker_1km'}).reset_index(drop=True)
nearby_hawkers_df = nearby_hawkers_df.drop(['hawker'], axis=1)
print(nearby_hawkers_df.shape)
nearby_hawkers_df.head()

(9462, 3)


,address,hawker_dist,num_hawker_1km
0,148 WOODLANDS ST 13,0.685467,2
1,426 CLEMENTI AVE 3,0.188672,5
2,85 CIRCUIT RD,0.094512,5
3,467 TAMPINES ST 44,1.528611,0
4,367 CORPORATION DR,0.43349,1


---
## Parks

### Finding Nearest Parks

In [51]:
# Applying the function
nearby_parks = search_nearby(block_coord, parks)

In [52]:
# Creating a dataframe from the dictionary, dropping and renaming the columns
nearby_parks_df = pd.DataFrame.from_dict(nearby_parks).T
nearby_parks_df = nearby_parks_df.rename(columns={0: 'address', 1: 'park', 2: 'park_dist', 3: 'num_park_1km'}).reset_index(drop=True)
nearby_parks_df = nearby_parks_df.drop(['park'], axis=1)
nearby_parks_df.head()

,address,park_dist,num_park_1km
0,148 WOODLANDS ST 13,0.50081,2
1,426 CLEMENTI AVE 3,0.318873,5
2,85 CIRCUIT RD,0.233353,2
3,467 TAMPINES ST 44,0.795132,1
4,367 CORPORATION DR,0.269495,2


---
## Schools

For schools, we focus only on primary schools, as MOE's Primary 1 registration gives priority based on [distance factors](https://www.moe.gov.sg/primary/p1-registration/distance) (living within 1km, or between 1km and 2km, of a school) that can substantially improve a child's chances of admission. Proximity to a primary school is one of the common concerns for home buyers, even though there are measures in place to prevent abuse of this priority.
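
MOE's Primary 1 registration distinguishes homes within 1km of a school, between 1km and 2km, and beyond 2km, which is why `search_nearby` counts primary schools in both the 0 to 1km and 1 to 2km rings. A minimal sketch of that banding for a single distance; `distance_band` is a hypothetical helper, and treating a distance of exactly 1km or 2km as the nearer band mirrors the `<=` comparisons in `search_nearby` rather than any official MOE boundary rule.

```python
def distance_band(distance_km):
    """Classify a home-to-school distance (km) into the within-1km / 1-2km / beyond-2km bands."""
    if distance_km < 0:
        raise ValueError('distance must be non-negative')
    if distance_km <= 1:
        return 'within 1km'
    if distance_km <= 2:
        return 'between 1km and 2km'
    return 'beyond 2km'


for d in (0.21703, 1.4, 2.5):
    print(d, distance_band(d))
```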

In [21]:
print(schools.shape)
schools.head()

(346, 7)


,school_name,address,postal_code,mrt_desc,dgp_code,zone_code,mainlevel_code
0,CANTONMENT PRIMARY SCHOOL,1 Cantonment Close,88256,Tanjong Pagar Outram Park,BUKIT MERAH,SOUTH,PRIMARY
1,CHIJ ST. THERESA'S CONVENT,160 LOWER DELTA ROAD,99138,"HARBOURFRONT MRT, TIONG BAHRU MRT",BUKIT MERAH,SOUTH,SECONDARY
2,CHIJ (KELLOCK),1 Bukit Teresa Road,99757,Outram Park Station,BUKIT MERAH,SOUTH,PRIMARY
3,RADIN MAS PRIMARY SCHOOL,1 BUKIT PURMEI AVENUE,99840,Tiong Bahru MRT HarbourFront MRT,BUKIT MERAH,SOUTH,PRIMARY
4,BLANGAH RISE PRIMARY SCHOOL,91 TELOK BLANGAH HEIGHTS,109100,"Telok Blangah, Tiong Bahru & Redhill",BUKIT MERAH,SOUTH,PRIMARY


In [22]:
schools.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 346 entries, 0 to 345
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   school_name     346 non-null    object
 1   address         346 non-null    object
 2   postal_code     346 non-null    object
 3   mrt_desc        346 non-null    object
 4   dgp_code        346 non-null    object
 5   zone_code       346 non-null    object
 6   mainlevel_code  346 non-null    object
dtypes: object(7)
memory usage: 19.0+ KB


In [23]:
# Getting unique values of school levels
schools['mainlevel_code'].unique()

array(['PRIMARY', 'SECONDARY', 'MIXED LEVELS', 'JUNIOR COLLEGE',
       'CENTRALISED INSTITUTE'], dtype=object)

In [24]:
# Extracting Primary Schools
pri_sch = schools[schools['mainlevel_code'] == 'PRIMARY']
print(pri_sch.shape)
pri_sch.head()

(187, 7)


,school_name,address,postal_code,mrt_desc,dgp_code,zone_code,mainlevel_code
0,CANTONMENT PRIMARY SCHOOL,1 Cantonment Close,88256,Tanjong Pagar Outram Park,BUKIT MERAH,SOUTH,PRIMARY
2,CHIJ (KELLOCK),1 Bukit Teresa Road,99757,Outram Park Station,BUKIT MERAH,SOUTH,PRIMARY
3,RADIN MAS PRIMARY SCHOOL,1 BUKIT PURMEI AVENUE,99840,Tiong Bahru MRT HarbourFront MRT,BUKIT MERAH,SOUTH,PRIMARY
4,BLANGAH RISE PRIMARY SCHOOL,91 TELOK BLANGAH HEIGHTS,109100,"Telok Blangah, Tiong Bahru & Redhill",BUKIT MERAH,SOUTH,PRIMARY
7,QIFA PRIMARY SCHOOL,50 WEST COAST AVENUE,128104,Clementi MRT Station,CLEMENTI,WEST,PRIMARY


### Getting Coordinates

In [25]:
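# Creating a list of primary school postal codes, then removing duplicates
# Singapore postal codes have six digits; zfill(6) restores leading zeros lost in the source file (e.g. '88256' -> '088256')
all_schools = list(pri_sch['postal_code'].str.zfill(6))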
print('All Postal Codes:', len(all_schools))

unique_schools = list(set(all_schools))
print('Unique Postal Codes:', len(unique_schools))

All Postal Codes: 187
Unique Postal Codes: 187


In [29]:
# Applying the function
pri_sch_coord = get_coordinates(unique_schools)

In [30]:
# Exporting the coordinates to a csv file
pri_sch_coord.to_csv('../data/coordinates/pri_school_coordinates.csv',index=False)

### Finding Nearby Schools

In [32]:
# Import school_coordinates file
pri_school_coord = pd.read_csv('../data/coordinates/pri_school_coordinates.csv')

In [33]:
# Applying the function
nearby_schools = search_nearby(block_coord, pri_school_coord)

In [34]:
# Creating a dataframe from the dictionary and renaming the columns
nearby_schools_df = pd.DataFrame.from_dict(nearby_schools).T
nearby_schools_df = nearby_schools_df.rename(columns={0: 'address', 1: 'school', 2: 'school_dist', 3: 'num_school_1km', 4: 'number_school_btw_1km_2km'}).reset_index(drop=True)
nearby_schools_df.head()

,address,school,school_dist,num_school_1km,number_school_btw_1km_2km
0,148 WOODLANDS ST 13,738927,0.21703,2,5
1,426 CLEMENTI AVE 3,129903,0.402997,4,0
2,85 CIRCUIT RD,387724,0.498281,3,3
3,467 TAMPINES ST 44,529565,0.658775,3,8
4,367 CORPORATION DR,618310,0.130183,2,5


---
## Distance From City Hall MRT

Lastly, we engineer a feature that reflects the distance of each unique address from the central area of Singapore. I chose City Hall MRT station as the reference point, as it lies within the Central Area and serves as a reasonable proxy for the city centre.

In [56]:
# Applying the function
cityhall_dist = dist_from_location(block_coord, location=(1.2931480783098117, 103.85202188773242))

# Creating a dataframe from the dictionary and renaming the columns
cityhall_dist = pd.DataFrame.from_dict(cityhall_dist).T
cityhall_dist = cityhall_dist.rename(columns={0: 'address', 1: 'cityhall_dist'}).reset_index(drop=True)
cityhall_dist.head()

,address,cityhall_dist
0,148 WOODLANDS ST 13,18.000426
1,426 CLEMENTI AVE 3,9.982797
2,85 CIRCUIT RD,4.973693
3,467 TAMPINES ST 44,13.604509
4,367 CORPORATION DR,15.586738


---
## Merge and Export

Finally, we merge all the newly created dataframes on `address` so that we can use them in the next notebook.

In [59]:
# Merge all dataframes

nearby_amenities = nearby_mrt_df.merge(nearby_malls_df, on='address', how='outer')
nearby_amenities = nearby_amenities.merge(nearby_supermarkets_df, on='address', how='outer')
nearby_amenities = nearby_amenities.merge(nearby_hawkers_df, on='address', how='outer')
nearby_amenities = nearby_amenities.merge(nearby_parks_df, on='address', how='outer')
nearby_amenities = nearby_amenities.merge(nearby_schools_df, on='address', how='outer')
nearby_amenities = nearby_amenities.merge(cityhall_dist, on='address', how='outer')

print(nearby_amenities.shape)
nearby_amenities.head()

(9462, 19)


,address,mrt,mrt_dist,num_mrt_1km,mall,mall_dist,num_mall_1km,supermarket,supermarket_dist,num_supermarket_1km,hawker_dist,num_hawker_1km,park_dist,num_park_1km,school,school_dist,num_school_1km,number_school_btw_1km_2km,cityhall_dist
0,148 WOODLANDS ST 13,MARSILING MRT STATION,0.359246,1,Marsiling Mall,0.695939,1,730182,0.231658,5,0.685467,2,0.50081,2,738927,0.21703,2,5,18.000426
1,426 CLEMENTI AVE 3,CLEMENTI MRT STATION,0.392851,1,321 Clementi,0.098085,4,120451,0.172466,8,0.188672,5,0.318873,5,129903,0.402997,4,0,9.982797
2,85 CIRCUIT RD,MATTAR MRT STATION,0.544896,6,Paya Lebar Square,0.870095,1,380114,0.156163,8,0.094512,5,0.233353,2,387724,0.498281,3,3,4.973693
3,467 TAMPINES ST 44,TAMPINES EAST MRT STATION,0.41771,1,Loyang Point,1.317606,0,520475,0.175411,5,1.528611,0,0.795132,1,529565,0.658775,3,8,13.604509
4,367 CORPORATION DR,LAKESIDE MRT STATION,0.753707,1,Taman Jurong Shopping Centre,0.347819,1,610399,0.319123,2,0.43349,1,0.269495,2,618310,0.130183,2,5,15.586738

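The chain of `.merge` calls above can also be written as a fold over a list of frames. A sketch using `functools.reduce`, with `validate='one_to_one'` so that pandas raises `MergeError` if any frame unexpectedly contains a duplicate address, which would otherwise silently multiply rows. `merge_on_address` and the toy frames are hypothetical; in this notebook the call would look like `merge_on_address([nearby_mrt_df, nearby_malls_df, ..., cityhall_dist])`.

```python
from functools import reduce

import pandas as pd


def merge_on_address(frames, key='address'):
    """Outer-merge a list of dataframes on `key`, failing loudly if any key is duplicated."""
    return reduce(
        lambda left, right: left.merge(right, on=key, how='outer', validate='one_to_one'),
        frames,
    )


# Toy frames for illustration only (values copied from the outputs above)
mrt_part = pd.DataFrame({'address': ['85 CIRCUIT RD', '426 CLEMENTI AVE 3'],
                         'mrt_dist': [0.544896, 0.392851]})
park_part = pd.DataFrame({'address': ['426 CLEMENTI AVE 3', '85 CIRCUIT RD'],
                          'park_dist': [0.318873, 0.233353]})
merged = merge_on_address([mrt_part, park_part])
print(merged.shape)  # (2, 3)
```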

In [60]:
# Exporting the dataframe to a csv file
nearby_amenities.to_csv('../data/nearby_amenities.csv', index=False)

---