# Analysis of HDB Resale Prices vs Venues Data in Singapore
### Applied Data Science Capstone by Sazili Muhammad

## Table of contents
* [1. Introduction: Business Problem](#introduction)
* [2. Data](#data)
* [3. Methodology](#methodology)
* [4. Analysis](#analysis)
* [5. Results and Discussion](#results)
* [6. Conclusion](#conclusion)
* [7. References](#references)

## 1. Introduction <a name="introduction"></a>

#### 1.1 Background

The majority of Singapore's population lives in public housing, which is governed and developed by the state **Housing and Development Board (HDB)** and sold under a 99-year lease. These flats are located in housing estates, which are self-contained satellite towns with well-maintained schools, supermarkets, malls, community hospitals, clinics, hawker centres (food courts) and sports and recreational facilities. Every housing estate includes **MRT stations** and bus stops that link residents to other parts of the city-state.

Unlike in most parts of the world, public housing in Singapore is not ostracised by the wider population or its government. It acts as a necessary and vital measure to provide immaculate and safe housing surrounded by public amenities at affordable prices, as it did especially during the country's rapid development and industrialisation in the early years of independence. It is also meant to foster social cohesion between the social classes and races of Singapore, and to prevent neglected districts and ethnic enclaves from developing. As such, **it is considered a unique part of Singaporean culture and identity**, being commonly associated with the country.

There is a large variety of flat types and layouts, which cater to various housing budgets. HDB flats were built primarily to provide affordable housing for **Singaporeans/Permanent Residents**, and their purchase can be financially aided by the **Central Provident Fund (CPF)** in addition to various grants. Due to changing demands, HDB introduced the Design, Build and Sell Scheme to produce up-market public housing developments.

New public housing flats can be purchased only by Singaporean citizens. The housing schemes and grants available to finance the purchase of a flat are also extended only to households owned by Singaporeans, while Permanent Residents do not receive any housing grants or subsidies from the Singaporean government and can only purchase **resale flats from the secondary market at market price**. Such policies have helped **Singapore reach a home-ownership rate of 91%, one of the highest in the world**. In 2008, Singapore was lauded by the United Nations Habitat's State of the World's Cities report as the only slum-free city in the world. [\[1\]](#references)

#### 1.2 Problem

HDB flats are surrounded by various amenities such as hawker centres, coffee shops, schools, car parks, community centres, shopping centres/malls, playgrounds and sports centres. These amenities influence housing resale prices, so prices fluctuate and are not balanced across towns in Singapore. Some towns are highly popular and mature, so their resale prices rise faster than those of other towns.

This situation makes it difficult for home buyers to purchase flats. Business owners face a similar challenge in deciding which locations are mature and profitable enough to invest in. The government and regulators, in turn, may find it difficult to make fair decisions about where to build facilities.

#### 1.3 Purpose

This project analyzes housing resale prices against various amenities (venues) from different angles. It aims to give home buyers, business owners and the government/regulators better insight to support decisions on where to invest and what to improve across the island.

## 2. Data acquisition and data cleaning <a name="data"></a>

#### 2.1 Data Sources

The datasets for this project are taken from:
1. Data.gov.sg [\[2\]](#references), the Singapore government portal that releases data for public use.
2. Google Maps, used to capture the latitude and longitude of each town.

#### 2.2 Data cleaning and data selection

The original dataset is very detailed: it contains resale prices for all towns in Singapore from Jan-2017 to Oct-2020, down to block number, flat type and flat floor. Since this project is interested in towns rather than individual flat addresses, I cleaned and summarized the dataset into the average housing resale price for each Singapore town in 2020 (Jan to Oct).

Note that all prices are in Singapore dollars (SGD).
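
The town-level summary described above can be sketched as a filter followed by a group-by. This is only a minimal illustration, not the exact cleaning code used for this project: the column names `month`, `town` and `resale_price` are assumed to match the raw Data.gov.sg file, and the five rows are made-up placeholders rather than real transactions.

```python
import pandas as pd

# Placeholder rows standing in for the raw resale file (values are made up).
# Assumed columns: month ("YYYY-MM" string), town, resale_price (SGD).
raw = pd.DataFrame({
    "month": ["2019-12", "2020-01", "2020-03", "2020-02", "2020-10"],
    "town": ["BEDOK", "BEDOK", "BEDOK", "YISHUN", "YISHUN"],
    "resale_price": [390000.0, 400000.0, 420000.0, 350000.0, 370000.0],
})

# Keep only transactions registered in 2020.
raw_2020 = raw[raw["month"].str.startswith("2020")]

# Average resale price per town, rounded to cents and renamed
# to the column names used in the rest of this notebook.
town_avg = (
    raw_2020.groupby("town", as_index=False)["resale_price"]
    .mean()
    .round({"resale_price": 2})
    .rename(columns={"town": "Town", "resale_price": "Resale Price"})
)

print(town_avg)
```

Grouping by town alone averages over all flat types and floors, which is why the resulting figures describe a town as a whole rather than any particular kind of flat.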

There are 26 Singapore towns in total, as follows:

In [17]:
# Import python libraries
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes

# initialize list of lists 
data_ori = [
['ANG MO KIO', 400148.74],
['BEDOK', 405873.45],
['BISHAN', 627302.21],
['BUKIT BATOK', 397548.97],
['BUKIT MERAH', 562416.43],
['BUKIT PANJANG', 432441.11],
['BUKIT TIMAH', 696836.38],
['CENTRAL AREA', 585125.84],
['CHOA CHU KANG', 404028.3],
['CLEMENTI', 493022.24],
['GEYLANG', 429776.44],
['HOUGANG', 449869.59],
['JURONG EAST', 407133.8],
['JURONG WEST', 407372.67],
['KALLANG/WHAMPOA', 475455.08],
['MARINE PARADE', 480472.48],
['PASIR RIS', 499348.52],
['PUNGGOL', 462420.83],
['QUEENSTOWN', 584474.49],
['SEMBAWANG', 392777.98],
['SENGKANG', 450835.38],
['SERANGOON', 487274.65],
['TAMPINES', 478879.13],
['TOA PAYOH', 442313.77],
['WOODLANDS', 392932.61],
['YISHUN', 379274.21]
] 

# Create the pandas DataFrame 
data_ori = pd.DataFrame(data_ori, columns = ['Town','Resale Price']) 

data_ori

| | Town | Resale Price |
|---|---|---|
| 0 | ANG MO KIO | 400148.74 |
| 1 | BEDOK | 405873.45 |
| 2 | BISHAN | 627302.21 |
| 3 | BUKIT BATOK | 397548.97 |
| 4 | BUKIT MERAH | 562416.43 |
| 5 | BUKIT PANJANG | 432441.11 |
| 6 | BUKIT TIMAH | 696836.38 |
| 7 | CENTRAL AREA | 585125.84 |
| 8 | CHOA CHU KANG | 404028.3 |
| 9 | CLEMENTI | 493022.24 |


To use Foursquare to find venue data, I captured the latitude and longitude of each Singapore town from Google Maps, as shown below:

In [18]:
data_latlng = [
['ANG MO KIO', 1.3691, 103.8454],
['BEDOK', 1.3236, 103.9273],
['BISHAN', 1.3526, 103.8352],
['BUKIT BATOK', 1.359, 103.7637],
['BUKIT MERAH', 1.2819, 103.8239],
['BUKIT PANJANG', 1.3774, 103.7719],
['BUKIT TIMAH', 1.3294, 103.8021],
['CENTRAL AREA', 1.2789, 103.8536],
['CHOA CHU KANG', 1.384, 103.747],
['CLEMENTI', 1.3162, 103.7649],
['GEYLANG', 1.3201, 103.8918],
['HOUGANG', 1.3612, 103.8863],
['JURONG EAST', 1.3329, 103.7436],
['JURONG WEST', 1.3404, 103.709],
['KALLANG/WHAMPOA', 1.3245, 103.8572],
['MARINE PARADE', 1.302, 103.8971],
['PASIR RIS', 1.3721, 103.9474],
['PUNGGOL', 1.3984, 103.9072],
['QUEENSTOWN', 1.2942, 103.7861],
['SEMBAWANG', 1.4491, 103.8185],
['SENGKANG', 1.3868, 103.8914],
['SERANGOON', 1.3554, 103.8679],
['TAMPINES', 1.3496, 103.9568],
['TOA PAYOH', 1.3343, 103.8563],
['WOODLANDS', 1.4382, 103.789],
['YISHUN', 1.4304, 103.8354]
] 

# Create the pandas DataFrame 
data_latlng = pd.DataFrame(data_latlng, columns = ['Town','Latitude','Longitude']) 

data_latlng

| | Town | Latitude | Longitude |
|---|---|---|---|
| 0 | ANG MO KIO | 1.3691 | 103.8454 |
| 1 | BEDOK | 1.3236 | 103.9273 |
| 2 | BISHAN | 1.3526 | 103.8352 |
| 3 | BUKIT BATOK | 1.359 | 103.7637 |
| 4 | BUKIT MERAH | 1.2819 | 103.8239 |
| 5 | BUKIT PANJANG | 1.3774 | 103.7719 |
| 6 | BUKIT TIMAH | 1.3294 | 103.8021 |
| 7 | CENTRAL AREA | 1.2789 | 103.8536 |
| 8 | CHOA CHU KANG | 1.384 | 103.747 |
| 9 | CLEMENTI | 1.3162 | 103.7649 |
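
Since the coordinates are captured by hand, a quick sanity check may help catch typos before they are used in venue searches. The sketch below reuses a few rows from the table above; the bounding box limits are rough assumptions for Singapore chosen for illustration, not official boundaries.

```python
import pandas as pd

# A few rows copied from the coordinate table above.
coords = pd.DataFrame(
    [
        ["ANG MO KIO", 1.3691, 103.8454],
        ["BEDOK", 1.3236, 103.9273],
        ["JURONG WEST", 1.3404, 103.709],
        ["SEMBAWANG", 1.4491, 103.8185],
    ],
    columns=["Town", "Latitude", "Longitude"],
)

# Rough bounding box around Singapore (assumed values, illustration only).
LAT_MIN, LAT_MAX = 1.1, 1.5
LNG_MIN, LNG_MAX = 103.5, 104.1

# Flag rows whose coordinates fall outside the box,
# e.g. because latitude and longitude were swapped.
in_box = (
    coords["Latitude"].between(LAT_MIN, LAT_MAX)
    & coords["Longitude"].between(LNG_MIN, LNG_MAX)
)
outliers = coords[~in_box]

# Flag town names that were entered more than once.
duplicates = coords[coords["Town"].duplicated(keep=False)]

print("Rows outside the bounding box:", len(outliers))
print("Duplicated town names:", len(duplicates))
```

A check like this only catches gross errors; a point that is inside the box but in the wrong town would still pass.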


Combining the original data from Data.gov.sg with the Google Maps coordinates (latitude/longitude) gives the final dataset:

In [61]:
# Join prices and coordinates on the town name, then sort by price (highest first)
data = pd.merge(data_ori, data_latlng, on='Town').sort_values(by='Resale Price', ascending=False)
data.reset_index(drop=True, inplace=True)

data

| | Town | Resale Price | Latitude | Longitude |
|---|---|---|---|---|
| 0 | BUKIT TIMAH | 696836.38 | 1.3294 | 103.8021 |
| 1 | BISHAN | 627302.21 | 1.3526 | 103.8352 |
| 2 | CENTRAL AREA | 585125.84 | 1.2789 | 103.8536 |
| 3 | QUEENSTOWN | 584474.49 | 1.2942 | 103.7861 |
| 4 | BUKIT MERAH | 562416.43 | 1.2819 | 103.8239 |
| 5 | PASIR RIS | 499348.52 | 1.3721 | 103.9474 |
| 6 | CLEMENTI | 493022.24 | 1.3162 | 103.7649 |
| 7 | SERANGOON | 487274.65 | 1.3554 | 103.8679 |
| 8 | MARINE PARADE | 480472.48 | 1.302 | 103.8971 |
| 9 | TAMPINES | 478879.13 | 1.3496 | 103.9568 |
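
An inner merge on town names silently drops a row if a name is spelled differently in the two tables. As a hedged sketch (not part of the original notebook), the `validate` and `indicator` options of `pd.merge` can confirm that every town matched exactly once; the three rows below are copied from the tables above, and the variable names are illustrative.

```python
import pandas as pd

# Small samples copied from the price and coordinate tables above.
prices = pd.DataFrame(
    [["BISHAN", 627302.21], ["YISHUN", 379274.21], ["BUKIT TIMAH", 696836.38]],
    columns=["Town", "Resale Price"],
)
coords = pd.DataFrame(
    [
        ["BISHAN", 1.3526, 103.8352],
        ["BUKIT TIMAH", 1.3294, 103.8021],
        ["YISHUN", 1.4304, 103.8354],
    ],
    columns=["Town", "Latitude", "Longitude"],
)

# Outer merge keeps unmatched rows from both sides; validate raises
# pandas.errors.MergeError if a town appears more than once on either side.
checked = pd.merge(
    prices, coords, on="Town", how="outer",
    validate="one_to_one", indicator=True,
)

# Rows present in only one table show up as "left_only" or "right_only".
unmatched = checked[checked["_merge"] != "both"]

# Once nothing is unmatched, drop the helper column and sort by price.
merged = (
    checked.drop(columns="_merge")
    .sort_values("Resale Price", ascending=False)
    .reset_index(drop=True)
)

print("Unmatched towns:", len(unmatched))
print(merged)
```

If `unmatched` is empty, the result contains the same rows as the plain inner merge used above.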


#### This final dataset is used for the further analysis in this project.

## 3. Methodology <a name="methodology"></a>

## 4. Analysis <a name="analysis"></a>

## 5. Results and Discussion <a name="results"></a>

## 6. Conclusion <a name="conclusion"></a>

## 7. References <a name="references"></a>

* [\[1\] Public Housing in Singapore](https://en.wikipedia.org/wiki/Public_housing_in_Singapore)
* [\[2\] Resale Flat Prices](https://data.gov.sg/dataset/resale-flat-prices)