## Processing rental rates data from [HDB's website](https://www.hdb.gov.sg/cs/infoweb/residential/renting-a-flat/renting-from-hdb/parenthood-provisional-housing-schemepphs/rents-and-deposits)

This notebook processes the rental rates data so that I can merge it with the main dataset.

```python
import pandas as pd

rental_df = pd.read_csv('raw data/rental_rates.csv')
rental_df
```

|   | town | 2room | 3room | 4room |
|---|---|---|---|---|
| 0 | Ang Mo Kio | $500 | $800 | - |
| 1 | Balestier | - | $700 | - |
| 2 | Bishan | - | $800 | - |
| 3 | Bedok | $500 | $700 | - |
| 4 | Bukit Batok | $500 | $700 | - |
| 5 | Bukit Merah/ Lengkok Bahru | - | $700 | - |
| 6 | Bukit Ho Swee/ Bukit Purmei/ Telok Blangah/ Ti... | - | $900 | $1,500 |
| 7 | Petir Road | - | $600 | - |
| 8 | Choa Chu Kang/ Teck Whye | $400 | $600 | - |
| 9 | Jalan Kukoh/ Jalan Berseh | $550 | $800 | - |


```python
def clean_text(x):
    # Swap non-breaking spaces (\xa0) from the HTML table for normal spaces, then trim.
    # Using ' ' rather than '' avoids gluing words together if a \xa0 sits mid-name.
    return x.replace('\xa0', ' ').strip() if isinstance(x, str) else x

# Clean every text cell (Series.map per column avoids the deprecated DataFrame.applymap)
rental_df = rental_df.apply(lambda col: col.map(clean_text))

# Treat '-' (flat type not offered) and NaN-like strings as 0
rental_df = rental_df.replace(['-', 'NaN', 'nan'], 0)

# Convert currency strings such as '$1,500' to integers
def convert_currency(value) -> int:
    if isinstance(value, str):
        # A cell listing several prices separated by '/' keeps the highest one
        prices = [int(p.replace('"', '').replace('$', '').replace(',', '').strip())
                  for p in value.split('/')]
        return max(prices)
    # Anything that is not a string (the 0 placeholders, real NaNs) becomes 0
    return 0

room_cols = ['2room', '3room', '4room']
for col in room_cols:
    rental_df[col] = rental_df[col].map(convert_currency)

# Split rows covering several towns (e.g. 'Bukit Merah/ Lengkok Bahru') into one row per town
rental_df = rental_df.assign(town=rental_df['town'].str.split('/')).explode('town').reset_index(drop=True)

# Splitting leaves leading spaces (' Lengkok Bahru'), so trim the town names again
rental_df['town'] = rental_df['town'].str.strip()

rental_df
```

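|   | town | 2room | 3room | 4room |
|---|---|---|---|---|
| 0 | Ang Mo Kio | 500 | 800 | 0 |
| 1 | Balestier | 0 | 700 | 0 |
| 2 | Bishan | 0 | 800 | 0 |
| 3 | Bedok | 500 | 700 | 0 |
| 4 | Bukit Batok | 500 | 700 | 0 |
| 5 | Bukit Merah | 0 | 700 | 0 |
| 6 | Lengkok Bahru | 0 | 700 | 0 |
| 7 | Bukit Ho Swee | 0 | 900 | 1500 |
| 8 | Bukit Purmei | 0 | 900 | 1500 |
| 9 | Telok Blangah | 0 | 900 | 1500 |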
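The two least obvious steps are the currency conversion and the one-row-per-town split. Here they are in isolation on a tiny frame. The town names and prices are made up, and only their formats mimic the raw table. The `'$500/ $550'` multi-price cell is an assumed format for the case `convert_currency` is written to handle. The function body is a copy of the notebook's `convert_currency`.

```python
import pandas as pd

def convert_currency(value) -> int:
    # Same logic as the notebook: strip '$', ',' and quotes; keep the max of '/'-separated prices
    if isinstance(value, str):
        prices = [int(p.replace('"', '').replace('$', '').replace(',', '').strip())
                  for p in value.split('/')]
        return max(prices)
    return 0  # non-strings (the 0 placeholders for '-') stay 0

# Made-up rows; 0 stands for a '-' that has already been replaced
toy = pd.DataFrame({
    'town': ['Town A/ Town B', 'Town C'],
    '3room': ['$700', '$500/ $550'],
    '4room': [0, '$1,500'],
})

for col in ['3room', '4room']:
    toy[col] = toy[col].map(convert_currency)

# One row per town: split on '/', explode the lists, renumber the index, trim spaces
toy = toy.assign(town=toy['town'].str.split('/')).explode('town').reset_index(drop=True)
toy['town'] = toy['town'].str.strip()
print(toy)
```

`explode` repeats the rent columns for every town in the list, so 'Town A' and 'Town B' both get the 3-room rent of 700. That matches how the HDB table lists one rate for a group of towns.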


## Exporting processed data

```python
rental_df.to_csv('processed data/rental_rates.csv', index=False)
```
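The exported file is meant to be merged with the main dataset, which this notebook does not show. The sketch below is one possible approach. It assumes (hypothetically) that the main dataset has a `town` column and a `flat_type` column using the same labels as this table (`'2room'`, `'3room'`, `'4room'`). The rental table is reshaped to long format with `melt` and then left-merged on both keys. The rental rows are copied from the output above, and `main` is a stand-in.

```python
import pandas as pd

# A few processed rows copied from the output above
rental = pd.DataFrame({
    'town': ['Ang Mo Kio', 'Bedok', 'Bukit Ho Swee'],
    '2room': [500, 500, 0],
    '3room': [800, 700, 900],
    '4room': [0, 0, 1500],
})

# Hypothetical main dataset: column names and flat_type labels are assumptions
main = pd.DataFrame({
    'town': ['Bedok', 'Bukit Ho Swee', 'Bishan'],
    'flat_type': ['3room', '4room', '3room'],
})

# Wide -> long: one row per (town, flat_type) with its monthly rent
rental_long = rental.melt(id_vars='town', var_name='flat_type', value_name='rent')

# Left merge keeps every row of main; validate checks each (town, flat_type) appears once in rental_long
merged = main.merge(rental_long, on=['town', 'flat_type'], how='left', validate='many_to_one')
print(merged)
```

Because `'-'` was encoded as 0, a merged rent of 0 means that flat type is not offered in the town. `NaN` means the town is missing from the rental table altogether (Bishan in this toy example).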

## Note
Some rows still need manual processing. For example, the 4-room rate listed under Woodlands Street 13 should be assigned to Woodlands.
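One way to script such fixes is a hand-maintained alias table that renames sub-area rows to their parent town and then collapses the duplicates. This is a minimal sketch. `TOWN_ALIASES` and `fold_sub_areas` are hypothetical names, the exact label of the Woodlands Street 13 row in the processed file is assumed, and the rents are invented. Collapsing with `max` mirrors the "take the highest value" rule used for multi-price cells, and it works here because 0 means "not offered".

```python
import pandas as pd

# Hypothetical alias table: sub-area label -> town used in the main dataset
TOWN_ALIASES = {'Woodlands Street 13': 'Woodlands'}

def fold_sub_areas(df: pd.DataFrame, aliases: dict) -> pd.DataFrame:
    """Rename sub-area rows to their parent town, then keep the highest rent per room type."""
    out = df.copy()
    out['town'] = out['town'].replace(aliases)
    return out.groupby('town', as_index=False).max()

# Invented rents, for illustration only
toy = pd.DataFrame({
    'town': ['Woodlands', 'Woodlands Street 13'],
    '2room': [400, 0],
    '3room': [600, 0],
    '4room': [0, 1000],
})
print(fold_sub_areas(toy, TOWN_ALIASES))
```

If both rows carried a non-zero rate for the same room type, `max` would silently pick the higher one. Those cases may be worth checking by hand.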