<h1>Finding the perfect HDB Block in Singapore</h1>

<h2>1. Introduction</h2>

<p>Singapore is a small but highly developed country in Southeast Asia. It is a city-state: the country consists of a single city, also named Singapore, which drew worldwide attention as the host of the first Trump-Kim summit. Long before that, Singapore was already a very international city, home to the regional headquarters of many Fortune 500 companies. According to the Department of Statistics Singapore, citizens made up only 61.56% of the total population in 2018, while permanent residents and foreigners accounted for 9.26% and 29.16% respectively. For people moving to Singapore, settling down and finding the right place to live is the first thing to consider.</p>

<p>However, the current property recommendation system in Singapore is far from perfect. Generally speaking, someone who decides to live in a new place needs to know which location they prefer before contacting a property agent, because different districts usually have different agents. For residents of Singapore, and especially for newly arrived foreigners, this question is very difficult to answer. Singapore has no rural areas, and its mature communities all have convenient transportation and everyday facilities. If you work in the CBD, for example at Raffles Place, the commute may take only about an hour even if you live in the westernmost or easternmost part of Singapore. Even if you limit yourself to a 30-minute ride on public transport, you still have a great many choices about where to live. As a result, region cannot be the primary factor for classifying customers in the local property market.</p>

<p>In this report, I try to build a new property recommendation system for local agents. I believe that, in most cases in Singapore, customers' lifestyle needs matter more than their location preferences. For example, parents with young children would make the distance to elite primary schools their priority, while foreign students and young employees may enjoy a fast-paced, modern lifestyle and might choose somewhere close to an MRT station, coffee shops and gyms. I therefore suggest first clustering property types across the whole city (which is the same as the whole country) according to differences in customers' needs, and then adding each customer's detailed preferences to decide which regions and which blocks are the best choice for them. Over eighty percent of Singapore residents live in HDB (Housing and Development Board) blocks, and data about those blocks is easy to obtain from government-supported websites. As a result, I narrowed the scope of my research and focused on HDB blocks only.</p>

<h2>2. Data</h2>

<p>The data includes three aspects.</p>

<h4>2.1 HDB related data</h4>

**Basic building features**

<p>I obtained a data set covering all the HDB blocks in Singapore (12132 blocks in total) from the website "https://data.gov.sg/". It includes the block numbers, street names, flat areas, year of completion and so on. I assumed that blocks standing on the same street and built in the same year have similar features, and merged them into a group. Eventually, I got 1570 block groups, as shown below (the first column shows the representative block of each group, and the last column lists the numbers of the other similar blocks on the same street):</p>

In [15]:
import pandas as pd
filename = "simi_blocks.csv"
df = pd.read_csv(filename)
df.head()

Unnamed: 0,blk_no,street,max_floor_lvl,year_completed,residential,commercial,market_hawker,miscellaneous,multistorey_carpark,precinct_pavilion,...,4room_sold,5room_sold,exec_sold,multigen_sold,studio_apartment_sold,1room_rental,2room_rental,3room_rental,other_room_rental,similar blocks
0,469B,ADMIRALTY DR,16,1999,Y,N,N,N,N,N,...,72,48,54,0,0,0,0,0,0,"356A,356B,356C,357,357A,357B,357C,359,359A,359..."
1,467A,ADMIRALTY DR,16,2000,Y,Y,N,N,N,N,...,72,48,54,0,0,0,0,0,0,"353A,353B,353C,354A,354B,354C,354D,467,467A"
2,405,ADMIRALTY LINK,15,1999,Y,N,N,Y,N,N,...,112,0,0,0,0,0,0,0,0,"401,402,403,404,405"
3,485,ADMIRALTY LINK,16,2001,Y,N,N,N,N,N,...,154,0,0,0,0,0,0,0,0,"484,485"
4,493,ADMIRALTY LINK,21,2002,Y,N,N,N,N,N,...,82,59,0,0,0,0,0,0,0,"486,491,492,493"
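
<p>The grouping step described above can be sketched roughly as follows. This is a minimal illustration rather than the code actually used for this report: the toy rows, the helper name <code>group_similar_blocks</code>, and the rule for picking the representative block (the last block number after sorting) are all assumptions.</p>

```python
import pandas as pd

# Toy stand-in for the data.gov.sg block table (illustrative rows only).
blocks = pd.DataFrame({
    "blk_no": ["403", "401", "405", "402", "404", "485", "484"],
    "street": ["ADMIRALTY LINK"] * 7,
    "year_completed": [1999, 1999, 1999, 1999, 1999, 2001, 2001],
})


def group_similar_blocks(df):
    """Collapse blocks that share a street and completion year into one row.

    Hypothetical helper: the representative block is assumed to be the
    last block number in sorted order within each group.
    """
    return (
        df.sort_values("blk_no")
          .groupby(["street", "year_completed"], as_index=False)
          .agg(
              blk_no=("blk_no", "last"),               # representative block
              similar_blocks=("blk_no", ",".join),     # all members, comma-joined
          )
    )


groups = group_similar_blocks(blocks)
print(groups)
```

<p>Each output row stands for one block group, so downstream steps such as geocoding or price lookup only need to run once per group instead of once per block.</p>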


**Price feature**

<p>Price is a very important factor when deciding whether to rent or buy a home. However, the data set from the website above did not include any price feature, so I decided to collect that data from a local property website. I chose "https://www.srx.com.sg/", which is run by the company SRX. By entering the street name and block number of a given block, we get a web page that shows a list of price details. I used the mean price from that list as the price index for the block.</p>

**Location data**

<p>Location data is necessary in this report for two reasons. Firstly, with the location of each block we can get venue features from Foursquare, and the lifestyles reflected by those venue features are very important for the analysis. Secondly, we can compute the distance between each block and other facilities such as primary schools.</p>
<p>Using the Python package Geocoder, I obtained the latitudes and longitudes of the 1570 representative blocks. I then created a map that labels each block with its price index.</p>

In [18]:
import folium
print(folium.__version__)

filename2 = "0_blocks_include_price.csv"
df00 = pd.read_csv(filename2)
df01 = df00.drop([1529])

# Extract latitude, longitude, and the sale price index (psf_sale) of each block
lats = df01['latitude'].astype(float).tolist()
lons = df01['longitude'].astype(float).tolist()
mag = df01['psf_sale'].astype(float).tolist()
mm = folium.Map([1.3521, 103.8198], tiles='Cartodb Positron', zoom_start=12)

def price_color(psf):
    # Map a price to a color band: cheaper blocks are greener, pricier ones redder
    for limit, color in [(300, 'yellowgreen'), (400, 'yellow'), (500, 'gold'), (600, 'orange'), (700, 'red')]:
        if psf < limit:
            return color
    return 'darkred'

# Add one marker per block; a DivIcon shows the price as text, colored by its price band
for lat, lon, psf in zip(lats, lons, mag):
    folium.Marker(location=[lat, lon],
                  icon=folium.DivIcon(html=f"""<div style="font-family: courier new; color: {price_color(psf)}">{int(psf)}</div>""")
                 ).add_to(mm)


0.5.0


In [19]:
mm

**Venue features**

In [21]:
filename3 = "block_venues.csv"
df3 = pd.read_csv(filename3)
df3.head()

Unnamed: 0.1,Unnamed: 0,block_street,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,"1A,CANTONMENT RD",1.27783,103.840953,Nylon Coffee Roasters,1.276657,103.840073,Coffee Shop
1,1,"1A,CANTONMENT RD",1.27783,103.840953,Binomio Spanish Restaurante,1.277713,103.842248,Spanish Restaurant
2,2,"1A,CANTONMENT RD",1.27783,103.840953,Man Man 鰻満 Japanese Unagi Restaurant (Man Man ...,1.278876,103.841514,Japanese Restaurant
3,3,"1A,CANTONMENT RD",1.27783,103.840953,D.Bespoke,1.27868,103.840897,Speakeasy
4,4,"1A,CANTONMENT RD",1.27783,103.840953,APIARY,1.279499,103.842294,Ice Cream Shop


In [22]:
print('There are {} unique categories.'.format(len(df3['Venue Category'].unique())))

There are 308 unique categories.
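
<p>One common way to turn venue rows like these into per-block features for clustering is to compute, for each block, the share of nearby venues that fall into each category. The sketch below illustrates that idea on toy data. It is an assumption about a possible preprocessing step, not necessarily the one used in this report, and the rows and the helper name <code>venue_profile</code> are hypothetical.</p>

```python
import pandas as pd

# Toy venue rows using the same column names as block_venues.csv (illustrative values).
venues = pd.DataFrame({
    "block_street": [
        "1A,CANTONMENT RD", "1A,CANTONMENT RD", "1A,CANTONMENT RD",
        "405,ADMIRALTY LINK", "405,ADMIRALTY LINK",
    ],
    "Venue Category": [
        "Coffee Shop", "Japanese Restaurant", "Coffee Shop",
        "Food Court", "Coffee Shop",
    ],
})


def venue_profile(df):
    """Return one row per block with the fraction of venues in each category."""
    # Count venues per (block, category) pair.
    counts = pd.crosstab(df["block_street"], df["Venue Category"])
    # Normalize each row so the shares of a block sum to 1.
    return counts.div(counts.sum(axis=1), axis=0)


profile = venue_profile(venues)
print(profile.round(2))
```

<p>Normalizing by row makes blocks with many nearby venues comparable to blocks with only a few, which matters when the resulting table is fed to a distance-based clustering method.</p>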


<h4>2.2 Primary school related data</h4>

<p>According to the Singapore government, primary school enrollment strictly follows the "within 1 km" and "within 2 km" distance standards. As a result, moving to a new home to get a better chance of enrollment is very common in Singapore, so I added primary school data to my research.</p>
<p>I got the list of primary schools from the web and obtained their locations with the Geocoder package as well.</p>

In [23]:
filename4 = "primary_school.csv"
df4 = pd.read_csv(filename4)
df4.head()

Unnamed: 0,name,latitude,longitude
0,Anderson Primary School,1.376318,103.835562
1,Ang Mo Kio Primary School,1.3691,103.83936
2,CHIJ St Nicholas Girls’ (Primary),1.340853,103.878447
3,Jing Shan Primary School,1.372258,103.852015
4,Mayflower Primary School,1.376664,103.843242
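
<p>With coordinates for both blocks and schools, the 1 km and 2 km bands can be approximated with the haversine (great-circle) distance. The following is a minimal sketch under stated assumptions: the block coordinates are made up for illustration, the helper names are hypothetical, and a point-to-point distance may differ from how distance is measured officially for enrollment.</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def distance_band(km):
    """Classify a distance into the bands mentioned in the text."""
    if km <= 1.0:
        return "within 1 km"
    if km <= 2.0:
        return "within 2 km"
    return "beyond 2 km"


# School coordinates taken from the primary_school.csv sample shown above.
schools = [
    ("Ang Mo Kio Primary School", 1.3691, 103.83936),
    ("Jing Shan Primary School", 1.372258, 103.852015),
    ("CHIJ St Nicholas Girls' (Primary)", 1.340853, 103.878447),
]

# Hypothetical block location, for illustration only.
block_lat, block_lon = 1.3700, 103.8400

bands = {
    name: distance_band(haversine_km(block_lat, block_lon, lat, lon))
    for name, lat, lon in schools
}
for name, band in bands.items():
    print(f"{name}: {band}")
```

<p>Counting how many schools fall into each band for every representative block would give a simple numeric feature that can sit alongside the price and venue features.</p>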


<h4>2.3 MindChamps preschools related data</h4>

<p>MindChamps is a very well-known preschool brand in the local education system, and it is also my personal favourite. I therefore plan to use it to explore the choice of the perfect block further, after clustering the HDB blocks and giving them different labels.</p>
<p>This data set is not included in the cluster analysis.</p>

In [24]:
filename5 = "mindchamps_info.csv"
df5 = pd.read_csv(filename5)
df5.head()

Unnamed: 0.1,Unnamed: 0,name,address,centre_no,postal_code,address_0,latitude,longitude
0,0,Boon Keng (Kallang),"30A Kallang Place, #01-01, Singapore 339213 (n...",6291 3068 /\n8820 3118,339213,30A Kallang Place,1.314695,103.865767
1,1,City Square Mall,"180 Kitchener Road, City Square Mall #07-01/05...",6834 4388,208539,180 Kitchener Road,1.311243,103.856577
2,2,Concorde Hotel (Orchard),"100 Orchard Road, #01-03C, Concorde Hotel & Sh...",6235 2358 /\n9665 3840,238840,100 Orchard Road,1.300618,103.842155
3,3,Liang Court,"177 River Valley Road, Liang Court #05-01, Sin...",6338 3002 /\n9114 2280,179030,177 River Valley Road,1.291575,103.845284
4,4,Paragon,"290 Orchard Road, Paragon #06-19/20, Singapore...",6732 0087,238859,290 Orchard Road,1.303661,103.835366
