# Feature Engineering for Dublin Rent Tracker

In this notebook, we focus on transforming and engineering new features from the cleaned Daft.ie rental listings dataset. The goal is to enhance the predictive power of our dataset for downstream modeling tasks such as rent price prediction and neighborhood clustering.

We will:
- Load the post-EDA cleaned dataset
- Identify useful transformations
- Create new features based on domain knowledge
- Handle categorical variables for modeling
- Save the final feature-engineered dataset for modeling


### Load Cleaned Dataset and Set Up

We begin by importing libraries and loading the cleaned dataset (`daft_listings_post_eda.xlsx`). Previewing the first rows lets us confirm the columns and their values before any transformation begins.


In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load cleaned post-EDA dataset
file_path = "../01_data/cleaned/daft_listings_post_eda.xlsx"
df = pd.read_excel(file_path)

# Preview
df.head()


Unnamed: 0,id,address,bedrooms,bathrooms,monthly_price,ber,category,daft_link,agent_name,has_video,has_virtual_tour,total_images,dublin_subcode,price_bin,district,ber_encoded
0,3916914,"Quayside Quarter, Dublin Landings, North Wall ...",1.0,1.0,2615,A3,Rent,http://www.daft.ie/for-rent/apartment-1-bedroo...,Cian Hynes,False,True,12,1,2600-2900,North,3.0
1,3916952,"Quayside Quarter, Dublin Landings, North Wall ...",2.0,2.0,3355,A3,Rent,http://www.daft.ie/for-rent/apartment-2-bedroo...,Cian Hynes,False,True,17,1,3200-3500,North,3.0
2,3917034,"Quayside Quarter, Dublin Landings, North Wall ...",3.0,3.0,4805,A3,Rent,http://www.daft.ie/for-rent/apartment-e2-602-3...,Cian Hynes,False,False,18,1,4700-5000,North,3.0
3,5705486,"Niche Living, Serviced Studio Apartments, Arde...",1.0,,1990,A3,Rent,http://www.daft.ie/for-rent/studio-apartment-s...,Niche Living enquiries Rathmines,False,False,22,6,1700-2000,South,3.0
4,5987932,"O'Neill Court , Main Street, Belmayne, Dublin 13",1.0,1.0,1950,A1,Rent,http://www.daft.ie/for-rent/apartment-1-bed-on...,O'Neill Court,False,True,8,13,1700-2000,North,1.0


### Feature Engineering — Price per Bedroom and Studio Flag

We create:
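- `price_per_bedroom`: Monthly rent divided by the number of bedrooms, giving a size-adjusted price. Listings with zero bedrooms get NaN instead of a division by zero.
- `is_studio`: 0/1 flag for listings with exactly 1 bedroom and 1 bathroom. Listings with a missing bathroom count are flagged 0.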



In [62]:
# Monthly price per bedroom
df['price_per_bedroom'] = df['monthly_price'] / df['bedrooms'].replace(0, np.nan)

# Studio flag = 1 bed and 1 bath
df['is_studio'] = ((df['bedrooms'] == 1) & (df['bathrooms'] == 1)).astype(int)

# Check result
df[['monthly_price', 'bedrooms', 'bathrooms', 'price_per_bedroom', 'is_studio']].head()


Unnamed: 0,monthly_price,bedrooms,bathrooms,price_per_bedroom,is_studio
0,2615,1.0,1.0,2615.0,1
1,3355,2.0,2.0,1677.5,0
2,4805,3.0,3.0,1601.666667,0
3,1990,1.0,,1990.0,0
4,1950,1.0,1.0,1950.0,1
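
The two edge cases in this step, zero bedrooms and a missing bathroom count, can be checked on a small made-up frame. This is only a sketch: the rows below are invented for illustration and do not come from the Daft.ie data.

```python
import numpy as np
import pandas as pd

# Toy rows (hypothetical values, not from the dataset)
toy = pd.DataFrame({
    "monthly_price": [2000, 3000, 1500, 1800],
    "bedrooms": [1.0, 2.0, 0.0, 1.0],
    "bathrooms": [1.0, 2.0, 1.0, np.nan],
})

# Zero bedrooms are replaced by NaN, so the division yields NaN rather than inf
toy["price_per_bedroom"] = toy["monthly_price"] / toy["bedrooms"].replace(0, np.nan)

# A comparison against NaN evaluates to False, so a missing bathroom count gives 0
toy["is_studio"] = ((toy["bedrooms"] == 1) & (toy["bathrooms"] == 1)).astype(int)

print(toy)
```

The last toy row mirrors the listing above whose bathroom count is missing: it has one bedroom but still ends up with `is_studio` equal to 0.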


In [63]:
df.head()

Unnamed: 0,id,address,bedrooms,bathrooms,monthly_price,ber,category,daft_link,agent_name,has_video,has_virtual_tour,total_images,dublin_subcode,price_bin,district,ber_encoded,price_per_bedroom,is_studio
0,3916914,"Quayside Quarter, Dublin Landings, North Wall ...",1.0,1.0,2615,A3,Rent,http://www.daft.ie/for-rent/apartment-1-bedroo...,Cian Hynes,False,True,12,1,2600-2900,North,3.0,2615.0,1
1,3916952,"Quayside Quarter, Dublin Landings, North Wall ...",2.0,2.0,3355,A3,Rent,http://www.daft.ie/for-rent/apartment-2-bedroo...,Cian Hynes,False,True,17,1,3200-3500,North,3.0,1677.5,0
2,3917034,"Quayside Quarter, Dublin Landings, North Wall ...",3.0,3.0,4805,A3,Rent,http://www.daft.ie/for-rent/apartment-e2-602-3...,Cian Hynes,False,False,18,1,4700-5000,North,3.0,1601.666667,0
3,5705486,"Niche Living, Serviced Studio Apartments, Arde...",1.0,,1990,A3,Rent,http://www.daft.ie/for-rent/studio-apartment-s...,Niche Living enquiries Rathmines,False,False,22,6,1700-2000,South,3.0,1990.0,0
4,5987932,"O'Neill Court , Main Street, Belmayne, Dublin 13",1.0,1.0,1950,A1,Rent,http://www.daft.ie/for-rent/apartment-1-bed-on...,O'Neill Court,False,True,8,13,1700-2000,North,1.0,1950.0,1


### Encode BER Ratings

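We convert BER energy-efficiency labels into ordinal values from 1 (A1) to 15 (G), preserving their order for predictive modeling. Missing values, and any label not listed in the mapping, become NaN. This overwrites the `ber_encoded` column already present in the loaded dataset.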


In [64]:
ber_order = {
    'A1': 1, 'A2': 2, 'A3': 3,
    'B1': 4, 'B2': 5, 'B3': 6,
    'C1': 7, 'C2': 8, 'C3': 9,
    'D1': 10, 'D2': 11,
    'E1': 12, 'E2': 13,
    'F': 14, 'G': 15
}
df['ber_encoded'] = df['ber'].map(ber_order)
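
Because `Series.map` returns NaN for any key missing from the dictionary, it can be worth checking which labels, if any, fall outside the mapping. The sketch below uses a made-up series; the label `EXEMPT` is a hypothetical stand-in for an unexpected value and is not taken from the dataset.

```python
import pandas as pd

ber_order = {
    'A1': 1, 'A2': 2, 'A3': 3,
    'B1': 4, 'B2': 5, 'B3': 6,
    'C1': 7, 'C2': 8, 'C3': 9,
    'D1': 10, 'D2': 11,
    'E1': 12, 'E2': 13,
    'F': 14, 'G': 15
}

# Hypothetical labels: 'EXEMPT' stands in for any value outside the mapping
ber = pd.Series(["A3", "G", "EXEMPT", None])
encoded = ber.map(ber_order)

# Labels that are present in the data but absent from the mapping
unmapped = ber[encoded.isna() & ber.notna()].unique()
print(list(unmapped))
```

An empty result would mean every NaN in the encoded column comes from a genuinely missing BER value.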


### Encode Rental Category

Although most listings are for rent, we one-hot encode the `category` column to keep the modeling logic general. In this dataset the encoding yields a single `cat_Rent` column.


In [65]:
df = pd.get_dummies(df, columns=['category'], prefix='cat')
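
When a categorical column holds only one value, its one-hot encoding is a constant column, which carries no information for a model. A minimal sketch of how such columns could be detected, using invented rows rather than the real listings:

```python
import pandas as pd

# Toy rows (hypothetical values) where every listing has the same category
toy = pd.DataFrame({
    "category": ["Rent", "Rent", "Rent"],
    "monthly_price": [2000, 3000, 1500],
})
encoded = pd.get_dummies(toy, columns=["category"], prefix="cat")

# Columns with a single distinct value are constant
constant_cols = [c for c in encoded.columns if encoded[c].nunique(dropna=False) == 1]
print(constant_cols)
```

Whether to drop such columns before modeling is a design choice; keeping them is harmless for most models but adds nothing.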


### Encode District as One-Hot

We convert the `district` column into one binary column per district using one-hot encoding. This avoids imposing an arbitrary numeric order on geographic areas.


In [66]:
df = pd.get_dummies(df, columns=['district'], prefix='dist')
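
As an illustration of what the encoding produces, the sketch below applies the same call to a few made-up district values. Passing `dtype=int` is an optional variation, not what the cell above does: it yields 0/1 columns instead of True/False, which matches the format of `is_studio`.

```python
import pandas as pd

# Toy rows (hypothetical values, not from the dataset)
toy = pd.DataFrame({"district": ["North", "South", "Unknown", "North"]})

# One binary column per district; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(toy, columns=["district"], prefix="dist", dtype=int)
print(dummies)
```

Each row has exactly one of the `dist_` columns set to 1.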



In [67]:
df.head()

Unnamed: 0,id,address,bedrooms,bathrooms,monthly_price,ber,daft_link,agent_name,has_video,has_virtual_tour,total_images,dublin_subcode,price_bin,ber_encoded,price_per_bedroom,is_studio,cat_Rent,dist_North,dist_South,dist_Unknown
0,3916914,"Quayside Quarter, Dublin Landings, North Wall ...",1.0,1.0,2615,A3,http://www.daft.ie/for-rent/apartment-1-bedroo...,Cian Hynes,False,True,12,1,2600-2900,3.0,2615.0,1,True,True,False,False
1,3916952,"Quayside Quarter, Dublin Landings, North Wall ...",2.0,2.0,3355,A3,http://www.daft.ie/for-rent/apartment-2-bedroo...,Cian Hynes,False,True,17,1,3200-3500,3.0,1677.5,0,True,True,False,False
2,3917034,"Quayside Quarter, Dublin Landings, North Wall ...",3.0,3.0,4805,A3,http://www.daft.ie/for-rent/apartment-e2-602-3...,Cian Hynes,False,False,18,1,4700-5000,3.0,1601.666667,0,True,True,False,False
3,5705486,"Niche Living, Serviced Studio Apartments, Arde...",1.0,,1990,A3,http://www.daft.ie/for-rent/studio-apartment-s...,Niche Living enquiries Rathmines,False,False,22,6,1700-2000,3.0,1990.0,0,True,False,True,False
4,5987932,"O'Neill Court , Main Street, Belmayne, Dublin 13",1.0,1.0,1950,A1,http://www.daft.ie/for-rent/apartment-1-bed-on...,O'Neill Court,False,True,8,13,1700-2000,1.0,1950.0,1,True,True,False,False


### Save Feature-Engineered Dataset

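Before saving, we drop identifier and free-text columns that are not used as model inputs, along with `ber` (now represented by `ber_encoded`) and `price_bin`, and cast `dublin_subcode` to a categorical type. We then save the final DataFrame with all engineered features for use in modeling and visualization.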


In [68]:
df.drop(columns=[
    'id', 'address', 'ber', 'daft_link', 'agent_name', 'price_bin'
], inplace=True)


In [69]:
# Treat the subcode as a label, not a quantity (the Excel file does not keep this dtype)
df['dublin_subcode'] = df['dublin_subcode'].astype('category')


In [70]:
# Save to the cleaned data folder
output_path = "../01_data/cleaned/daft_listings_featured.xlsx"
df.to_excel(output_path, index=False)

print(f"Feature-engineered dataset saved to: {output_path}")


Feature-engineered dataset saved to: ../01_data/cleaned/daft_listings_featured.xlsx
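
One caveat when reloading the saved file: spreadsheet and CSV files store values, not pandas dtypes, so the `category` cast on `dublin_subcode` has to be re-applied after loading. The sketch below shows the effect with an in-memory CSV as a stand-in for the Excel file, so that it runs without extra dependencies; the rows are made up.

```python
import io
import pandas as pd

# Toy frame (hypothetical values) with a categorical subcode column
toy = pd.DataFrame({"dublin_subcode": [1, 6, 13]})
toy["dublin_subcode"] = toy["dublin_subcode"].astype("category")

# Round-trip through an in-memory CSV (stand-in for a file on disk)
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# The column comes back as a plain numeric dtype, not 'category'
dtype_after_load = reloaded["dublin_subcode"].dtype
print(dtype_after_load)

# Re-apply the cast after loading
reloaded["dublin_subcode"] = reloaded["dublin_subcode"].astype("category")
```

The modeling notebook would need the same re-cast right after its own load step.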
