## HDB Resale Flat Prices Across Towns

### Aim: How do average HDB resale flat prices compare across towns over the last 3 years (2016 to 2018)?

### Dataset

#### This dataset records HDB resale transactions by registration date. Each record comprises the month, town, flat type, block, street name, storey range, floor area, flat model, lease commencement date, remaining lease, and resale price.

#### Chart Type: Grouped Bar Chart

#### Source: https://data.gov.sg/dataset/resale-flat-prices

### Methodology

#### Step 1: Import the required libraries 

In [None]:
import numpy as np
import matplotlib.pyplot as plt

#### Step 2: Import the required dataset

In [None]:
# raw string so that the backslashes in the Windows path are not treated as escape sequences
filename = r'C:\Users\Jeffrey Wong\SP_Assignment_Python\HDB_resale_flat_prices.csv'

data = np.genfromtxt(filename, skip_header = 1, dtype = [('month', 'U10'), ('town', 'U50'),
                                                        ('flat_type', 'U10'), ('block', 'U10'),
                                                        ('street_name', 'U50'), ('storey_range', 'U50'),
                                                        ('floor_area_sqm', 'i8'), ('flat_model', 'U50'),
                                                        ('lease_commence_date', 'U10'), ('remaining_lease', 'i8'),
                                                        ('resale_price', 'f8')], 
                     delimiter = ',', missing_values = ['na', '-', ''])
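
To see how `np.genfromtxt` turns a list of `(name, type)` pairs into a structured array, a minimal sketch on an in-memory CSV may help. The two rows, the reduced column set and the variable names `sample_csv` and `sample` are made up for illustration and are not taken from the dataset.

```python
import io
import numpy as np

# Two illustrative rows (not real transactions) with a reduced set of columns.
sample_csv = io.StringIO(
    "month,town,flat_type,resale_price\n"
    "2016-01,ANG MO KIO,3 ROOM,300000\n"
    "2016-02,BEDOK,4 ROOM,410000\n"
)

sample = np.genfromtxt(
    sample_csv,
    skip_header=1,   # skip the header row, as in the cell above
    delimiter=",",
    dtype=[("month", "U10"), ("town", "U50"),
           ("flat_type", "U10"), ("resale_price", "f8")],
)

print(sample["town"])                 # one named field across all rows
print(sample[0])                      # one complete record
print(sample["resale_price"].mean())  # numeric fields support the usual NumPy functions
```

Each field is addressed by name, which is what makes expressions such as `data['month']` and `data['town']` possible in the later steps.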

#### Step 3: Data Cleaning, Manipulation & Extraction

##### Use boolean indexing with np.where to find the position of the first record in January of each year, and store each position in its own variable

In [None]:
### get the index of an element for the first month of year 2016
index_2016 = np.where(data['month'] == '2016-01')
if len(index_2016) > 0 and len(index_2016[0]) > 0:
    position_2016 = index_2016[0][0]
print("The index of an element for the first month of year 2016 is " + str(position_2016))

### get the index of an element for the first month of year 2017
index_2017 = np.where(data['month'] == '2017-01')
if len(index_2017) > 0 and len(index_2017[0]) > 0:
    position_2017 = index_2017[0][0]
print("The index of an element for the first month of year 2017 is " + str(position_2017))

### get the index of an element for the first month of year 2018
index_2018 = np.where(data['month'] == '2018-01')
if len(index_2018) > 0 and len(index_2018[0]) > 0:
    position_2018 = index_2018[0][0]
print("The index of an element for the first month of year 2018 is " + str(position_2018))

### get the index of an element for the first month of year 2019
index_2019 = np.where(data['month'] == '2019-01')
if len(index_2019) > 0 and len(index_2019[0]) > 0:
    position_2019 = index_2019[0][0]
print("The index of an element for the first month of year 2019 is " + str(position_2019))
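
The four lookups above follow the same pattern, so they could be wrapped in one function. The sketch below is one possible way to do that; `first_index` is a hypothetical helper, not part of the notebook, and it raises an error when a month is missing instead of leaving the position variable undefined. The `months` array is made-up data.

```python
import numpy as np

def first_index(months, label):
    """Return the position of the first element equal to `label` (hypothetical helper)."""
    matches = np.flatnonzero(months == label)  # positions where the comparison is True
    if matches.size == 0:
        raise ValueError(f"no rows found for month {label!r}")
    return int(matches[0])

# Illustrative month column, already in chronological order.
months = np.array(["2015-12", "2016-01", "2016-01", "2016-02", "2017-01"])

print(first_index(months, "2016-01"))
print(first_index(months, "2017-01"))
```

With the real data the call would look like `first_index(data['month'], '2016-01')`.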

##### Slice the data between consecutive index positions to extract each year's records (this assumes the rows are in chronological order) and store them in separate variables

In [None]:
year_2016 = data[position_2016:position_2017]
year_2017 = data[position_2017:position_2018] 
year_2018 = data[position_2018:position_2019]

print(year_2016)
print()
print(year_2017)
print()
print(year_2018)
print()
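
Slicing between index positions relies on the rows being sorted by month. As an alternative sketch that does not depend on row order, a boolean mask built with `np.char.startswith` can select a year directly. The arrays below are made-up values used only to show the idea.

```python
import numpy as np

# Illustrative, deliberately unsorted months with matching prices.
months = np.array(["2017-03", "2016-01", "2016-11", "2018-05", "2017-01"])
prices = np.array([400.0, 300.0, 320.0, 450.0, 410.0])

# True for every row whose month string begins with the given year.
mask_2016 = np.char.startswith(months, "2016")
mask_2017 = np.char.startswith(months, "2017")

print(prices[mask_2016])
print(prices[mask_2017])
```

Applied to the notebook's data, the equivalent would be `data[np.char.startswith(data['month'], '2016')]`.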

##### Extract the unique town name from the dataset

In [None]:
unique_town_name = np.unique(data['town'])
print(unique_town_name)

##### For each unique town, use boolean subsetting to extract that town's resale prices for the year, compute their average, and append it to an initially empty NumPy array

In [None]:
# for year 2016
avg_resale_price_2016 = np.array([])
for i in unique_town_name:
    resale_price_2016 = year_2016[year_2016['town'] == i]['resale_price'] # by subsetting
    avg_values = np.mean(resale_price_2016) # compute the average flat resale prices
    avg_resale_price_2016 = np.append(avg_resale_price_2016, avg_values)

# for year 2017
avg_resale_price_2017 = np.array([])
for i in unique_town_name:
    resale_price_2017 = year_2017[year_2017['town'] == i]['resale_price'] # by subsetting 
    avg_values = np.mean(resale_price_2017) # compute the average flat resale prices
    avg_resale_price_2017 = np.append(avg_resale_price_2017, avg_values)
    
# for year 2018 
avg_resale_price_2018 = np.array([])
for j in unique_town_name:
    resale_price_2018 = year_2018[year_2018['town'] == j]['resale_price'] # by subsetting
    avg_values = np.mean(resale_price_2018) # compute the average flat resale prices 
    avg_resale_price_2018 = np.append(avg_resale_price_2018, avg_values)
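
The per-town loop can also be expressed without repeated `np.append` calls. The sketch below is an alternative, not the notebook's method: it groups prices by town with `np.unique(..., return_inverse=True)` and `np.bincount`. The towns and prices are made-up values.

```python
import numpy as np

# Illustrative records: one town label and one price per transaction.
towns = np.array(["BEDOK", "ANG MO KIO", "BEDOK", "YISHUN", "ANG MO KIO"])
prices = np.array([400000.0, 300000.0, 420000.0, 350000.0, 320000.0])

# `inverse` maps every record to the index of its town in `unique_towns`.
unique_towns, inverse = np.unique(towns, return_inverse=True)

totals = np.bincount(inverse, weights=prices)  # sum of prices per town
counts = np.bincount(inverse)                  # number of records per town
averages = totals / counts

for town, avg in zip(unique_towns, averages):
    print(f"{town}: {avg:.2f}")
```

One difference to keep in mind: this version only reports towns that occur in the array it is given, whereas the loop above produces `NaN` for a town with no transactions in that year.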

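##### Use the logical NOT operator, ~, with np.isnan to build a boolean array that is True wherever the average is a valid number
##### Use that boolean array to index the original array and keep only the non-NaN values for 2016, 2017 and 2018
##### Note: if any value is dropped here, the filtered array no longer lines up one-to-one with unique_town_name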

In [None]:
new_avg_resale_price_2016 = avg_resale_price_2016[~np.isnan(avg_resale_price_2016)]
new_avg_resale_price_2017 = avg_resale_price_2017[~np.isnan(avg_resale_price_2017)]
new_avg_resale_price_2018 = avg_resale_price_2018[~np.isnan(avg_resale_price_2018)]
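
A small sketch of the masking step, using made-up averages, shows what `~np.isnan(...)` returns and how the same mask can be applied to the town names so that labels and values stay aligned. The names `towns`, `averages` and `valid` are illustrative.

```python
import numpy as np

towns = np.array(["ANG MO KIO", "BEDOK", "YISHUN"])
averages = np.array([310000.0, np.nan, 350000.0])  # BEDOK has no data in this example

valid = ~np.isnan(averages)  # True where the average is a real number

print(valid)
print(averages[valid])  # NaN removed
print(towns[valid])     # the matching town names, filtered with the same mask
```

Filtering both arrays with one mask keeps position `i` in the values pointing at position `i` in the names.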

#### Step 4: Data Visualization on Matplotlib

#### Define the function to create and display the multiple grouped bar chart

In [None]:
def bar_chart(new_avg_resale_price_2016, new_avg_resale_price_2017, new_avg_resale_price_2018, unique_town_name):    
    
    ind = np.arange(len(unique_town_name)) # the y locations for the groups
    width = 0.3 # the width of the bars

    fig, ax = plt.subplots(1, figsize = (12,15))
    # plot the arrays passed in as arguments (each must hold one value per town)
    bar_2016 = ax.barh(ind, new_avg_resale_price_2016, width, color = '#FF4C33', edgecolor = 'white', label = '2016')
    bar_2017 = ax.barh(ind + width, new_avg_resale_price_2017, width, color = '#337CFF', edgecolor = 'white', label = '2017')
    bar_2018 = ax.barh(ind + 2*width, new_avg_resale_price_2018, width, color = '#57F281', edgecolor = 'white', label = '2018')
    
    ### add title and axes labels 
    ax.set_title("Average Resale Flat Price Across Towns, 2016, 2017 & 2018", fontsize = 15, fontweight = 'bold', color = 'black')
    ax.set_xlabel('Average Resale Price (S$)', fontsize = 15, fontweight = 'bold')
    ax.set_ylabel('Towns', fontsize = 15, fontweight = 'bold')
    
    ### adjust both axes ticks values
    ax.tick_params(axis = "y", labelsize = 12, length = 10, width = 2.0, labelcolor = 'black', colors = 'red')
    ax.tick_params(axis = "x", labelsize = 12, length = 10, width = 2.0, labelcolor = 'black', colors = 'red', rotation = 45)
    ax.set_yticks(ind+width)
    ax.set_yticklabels(unique_town_name)
    
    ### removing top and right borders
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    
    ### add minorticks and gridlines
    ax.minorticks_on()
    ax.grid(axis = 'both', color = 'blue', which = 'major', linestyle = '--', linewidth = 0.8, alpha = 0.5)
    
    ### add legend
    ax.legend(ncol = 3,  facecolor = 'white', edgecolor = 'red', shadow = True, fontsize = 10)
    
    fig.savefig('barchart.png')
    
    plt.show()

#### Call the function to create and display the multiple grouped bar chart

In [None]:
### call the function to display the bar chart
bar_chart(new_avg_resale_price_2016, new_avg_resale_price_2017, new_avg_resale_price_2018, unique_town_name)
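
The bar positions used inside `bar_chart` can be checked with plain NumPy. The sketch below, with an assumed four groups for brevity, computes the position of every bar and confirms that `ind + width` is the centre of each group of three, which is why the tick labels are placed there.

```python
import numpy as np

n_groups = 4   # illustrative number of towns
n_series = 3   # one bar per year: 2016, 2017, 2018
width = 0.3    # same bar width as in bar_chart

ind = np.arange(n_groups)

# Row g holds the positions of the three bars in group g: ind, ind + width, ind + 2*width.
positions = ind[:, None] + width * np.arange(n_series)

# The centre of each group is the mean of its three bar positions.
centres = positions.mean(axis=1)

print(positions)
print(centres)
print(np.allclose(centres, ind + width))
```

Because `3 * width` is 0.9, a gap of 0.1 remains between neighbouring groups.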

#### Simple Text-Based Analysis using Numpy

In [None]:
combined_values = np.array([new_avg_resale_price_2016, new_avg_resale_price_2017, new_avg_resale_price_2018])
combined_data = [year_2016, year_2017, year_2018] # a plain list, because the yearly arrays have different lengths
years = np.array(['2016', '2017', '2018'])

In [None]:
print("****** HDB Flat Resale Price Across Towns *******")
print()

### display the total number of rows of data from csv file 
num_of_rows = len(data)
print("There is a total of " + str(num_of_rows) + " rows in this dataset from " + filename)
print()

### display the total number of rows of data for year 2016, 2017 and 2018
print("Year                                 |     2016     |     2017     |     2018     |")
print("------------------------------------------------------------------------------------------")
print("Total Number of Data Extracted:         " + str(len(combined_data[0])) + "            " + str(len(combined_data[1])) + 
        "         " + str(len(combined_data[2])))
print()

### display the total number of unique town names
print("There is a total of " + str(len(unique_town_name)) + " unique town names in this dataset.")
print()

### display the total average resale price for all unique towns over the last three years
print("Total Average Flat Resale Price For All " + str(len(unique_town_name)) + " unique towns over the last three years: ")
print("--------------------------------------------------------------------------------------------------------")
print("Town Name                            |     2016(S$)   |     2017(S$)   |     2018(S$)    |")
for i in range(len(unique_town_name)):
    print(unique_town_name[i] + " :" + "\t\t\t\t {:.2f}\t {:.2f}\t  {:.2f}".format(new_avg_resale_price_2016[i], 
                                                                        new_avg_resale_price_2017[i],
                                                                       new_avg_resale_price_2018[i]))
print()


### display the town with the highest average resale price for year 2016, 2017 and 2018
print("Towns with the Highest Average Flat Resale Price Over The Last Three Years:")
print("--------------------------------------------------------------------------------------")
for i in (np.arange(3)):
    max_values = np.max(combined_values[i])
    max_values_index = np.argmax(combined_values[i])
    print(unique_town_name[max_values_index] + " has the highest average flat resale price with S${:.2f}".format(max_values) 
          + " in year " + years[i])
print()

### display the lowest average resale price for year 2016, 2017 and 2018
print("Towns with the Lowest Average Flat Resale Price Over The Last Three Years: ")
print("--------------------------------------------------------------------------------------")
for i in (np.arange(3)):
    min_values = np.min(combined_values[i])
    min_values_index = np.argmin(combined_values[i])
    print(unique_town_name[min_values_index] + " has the lowest average flat resale price with S${:.2f}".format(min_values)
         + " in year " + years[i])
print()

### display the standard deviation, variance, median (50th percentile), 25th and 75th percentiles for year 2016, 2017 and 2018
print("Basic Descriptive Statistics: ")
print("--------------------------------------------")
for i in (np.arange(3)):
    standard_deviation = np.std(combined_values[i]) # standard deviation of the town averages
    variance = np.var(combined_values[i]) # variance of the town averages (square of the standard deviation)
    median = np.median(combined_values[i]) # median (50th percentile) of the town averages
    percentile_25 = np.percentile(combined_values[i], 25) # 25th percentile of the town averages
    percentile_75 = np.percentile(combined_values[i], 75) # 75th percentile of the town averages
    
    ### display all the values
    print("For year " + str(years[i]) + " : ")
    print("-------------------")
    print("Standard Deviation\t: S$ {:.3f}".format(standard_deviation))
    print("Variance         \t: {:.3f} (S$ squared)".format(variance))
    print("Median           \t: S$ {:.3f}".format(median))
    print("25th Percentile  \t: S$ {:.3f}".format(percentile_25))
    print("75th Percentile  \t: S$ {:.3f}".format(percentile_75))
    print()
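
As a quick check on how these statistics relate, the sketch below computes them on five made-up values. It also derives the range, which is not part of the cell above and is included here only for illustration.

```python
import numpy as np

# Illustrative average prices in S$ (not taken from the dataset).
values = np.array([300000.0, 350000.0, 400000.0, 450000.0, 500000.0])

std = np.std(values)   # population standard deviation (ddof=0 by default)
var = np.var(values)   # variance, in squared units
q25, q50, q75 = np.percentile(values, [25, 50, 75])
value_range = values.max() - values.min()

print(f"Standard deviation: S$ {std:.3f}")
print(f"Variance          : {var:.3f} (S$ squared)")
print(f"25th / 50th / 75th: {q25:.0f} / {q50:.0f} / {q75:.0f}")
print(f"Range             : S$ {value_range:.0f}")
print(np.isclose(var, std ** 2))       # variance is the square of the standard deviation
print(q50 == np.median(values))        # the 50th percentile is the median
```

Passing a list of percentiles to `np.percentile` returns all of them in one call.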
    