# Feature Engineering and One-Hot Encoding


## 1. Import Required Libraries
Import pandas for data manipulation.

In [6]:
import pandas as pd

## 2. Load Data
Load the dataset from 'Data.csv' into a pandas DataFrame and preview the first five rows.

In [None]:
df = pd.read_csv('Data.csv')
df.head()

Unnamed: 0,Price,Square Area,Bedrooms,Bathrooms,Furnishing,school,hospital,shopping_mall,supermarket,church,...,Neighborhood_Libis,Neighborhood_Loyola Heights,Neighborhood_New Manila,Neighborhood_North Avenue Area,Neighborhood_Novaliches,Neighborhood_Santa Mesa,Neighborhood_Santa Mesa Heights,Neighborhood_Santolan,Neighborhood_Timog and South Triangle,occupancy
0,12000,25.0,0,1,Semi Furnished,1,1,1,1,1,...,0,1,0,0,0,0,0,0,0,0
1,58000,61.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,0,0,0,0,0
2,14000,25.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,1,0,0,0,0,1
3,22000,28.0,0,1,Fully Furnished,1,1,1,1,1,...,0,1,0,0,0,0,0,0,0,0
4,18000,30.0,2,1,Unfurnished,1,1,1,1,1,...,0,0,0,0,1,0,0,0,0,0
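
The 'Unnamed: 0' column in the preview above typically appears when a CSV was written with its row index and later read back without naming an index column. Whether that is what happened to this dataset is an assumption. The sketch below uses a hypothetical two-row stand-in for 'Data.csv' (not the real file) to show how the column arises and two ways to handle it.

```python
import io
import pandas as pd

# Hypothetical miniature stand-in for Data.csv. The empty first header
# field is what pandas names "Unnamed: 0" when it reads the file.
csv_text = """,Price,Square Area
0,12000,25.0
1,58000,61.0
"""

raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# Option 1: treat the first column as the row index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: drop the leftover column after loading.
dropped = raw.drop(columns=['Unnamed: 0'])

print(indexed.columns.tolist())
print(dropped.columns.tolist())
```

Either option keeps the leftover index out of later feature columns. The notebook itself keeps 'Unnamed: 0' as loaded.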


## 3. One-Hot Encode Furnishing Column
One-hot encode the 'Furnishing' column into three indicator columns, 'Fully Furnished', 'Semi Furnished', and 'Unfurnished'. Each holds 1 when the row has that furnishing status and 0 otherwise. The original 'Furnishing' column is kept.

In [8]:
df['Fully Furnished'] = (df['Furnishing'] == 'Fully Furnished').astype(int)
df['Semi Furnished'] = (df['Furnishing'] == 'Semi Furnished').astype(int)
df['Unfurnished'] = (df['Furnishing'] == 'Unfurnished').astype(int)
df.head()

Unnamed: 0,Price,Square Area,Bedrooms,Bathrooms,Furnishing,school,hospital,shopping_mall,supermarket,church,...,Neighborhood_North Avenue Area,Neighborhood_Novaliches,Neighborhood_Santa Mesa,Neighborhood_Santa Mesa Heights,Neighborhood_Santolan,Neighborhood_Timog and South Triangle,occupancy,Fully Furnished,Semi Furnished,Unfurnished
0,12000,25.0,0,1,Semi Furnished,1,1,1,1,1,...,0,0,0,0,0,0,0,0,1,0
1,58000,61.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,0,0,1,0,0
2,14000,25.0,1,1,Fully Furnished,1,1,1,1,1,...,0,1,0,0,0,0,1,1,0,0
3,22000,28.0,0,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,0,0,1,0,0
4,18000,30.0,2,1,Unfurnished,1,1,1,1,1,...,0,1,0,0,0,0,0,0,0,1
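
As a point of comparison, the same indicator columns can be produced with `pd.get_dummies`. This is a minimal sketch on a small sample frame that mirrors the five previewed rows. The names `sample`, `dummies`, and `encoded` are illustrative and not part of the notebook.

```python
import pandas as pd

# Furnishing values copied from the five previewed rows.
sample = pd.DataFrame({
    'Furnishing': [
        'Semi Furnished',
        'Fully Furnished',
        'Fully Furnished',
        'Fully Furnished',
        'Unfurnished',
    ]
})

# One indicator column per distinct value, stored as integers (0/1).
dummies = pd.get_dummies(sample['Furnishing'], dtype=int)

# Append the indicators while keeping the original column, as the notebook does.
encoded = pd.concat([sample, dummies], axis=1)
print(encoded)
```

One difference worth noting: `pd.get_dummies` only creates columns for values that actually occur in the data, whereas the explicit comparisons in the cell above always create all three columns, even if one category is absent.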


## 4. Calculate Price per Square Meter
Create a new column 'price_per_sqm' by dividing the 'Price' column by the 'Square Area' column. The column name assumes that 'Square Area' is measured in square meters.

In [11]:
df['price_per_sqm'] = df['Price'] / df['Square Area']
df.head()

Unnamed: 0,Price,Square Area,Bedrooms,Bathrooms,Furnishing,school,hospital,shopping_mall,supermarket,church,...,Neighborhood_Novaliches,Neighborhood_Santa Mesa,Neighborhood_Santa Mesa Heights,Neighborhood_Santolan,Neighborhood_Timog and South Triangle,occupancy,Fully Furnished,Semi Furnished,Unfurnished,price_per_sqm
0,12000,25.0,0,1,Semi Furnished,1,1,1,1,1,...,0,0,0,0,0,0,0,1,0,480.0
1,58000,61.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,0,1,0,0,950.819672
2,14000,25.0,1,1,Fully Furnished,1,1,1,1,1,...,1,0,0,0,0,1,1,0,0,560.0
3,22000,28.0,0,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,0,1,0,0,785.714286
4,18000,30.0,2,1,Unfurnished,1,1,1,1,1,...,1,0,0,0,0,0,0,0,1,600.0
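
The division above works cleanly when every row has a positive area. If the data could contain zero or missing areas (an assumption, as the previewed rows show none), pandas would produce `inf` or `NaN`. A minimal sketch of one way to guard against that follows. The last two rows of `sample` are made up for illustration.

```python
import numpy as np
import pandas as pd

# First two rows come from the preview; the last two are hypothetical
# edge cases (zero area and missing area).
sample = pd.DataFrame({
    'Price': [12000, 58000, 18000, 20000],
    'Square Area': [25.0, 61.0, 0.0, np.nan],
})

# Replace zero areas with NaN so the division yields NaN instead of inf.
area = sample['Square Area'].replace(0, np.nan)
sample['price_per_sqm'] = sample['Price'] / area

print(sample)
```

Rows with an unusable area end up as `NaN`, which can then be dropped or imputed explicitly rather than carrying infinite values into later features.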


## 5. Create Price per Square Meter x Bedrooms Column
Create a new column 'price_per_sqm_x_bedrooms' by multiplying 'price_per_sqm' by the number of bedrooms. Rows with zero bedrooms (rows 0 and 3 in the output below) get a value of 0.

In [12]:
df['price_per_sqm_x_bedrooms'] = df['price_per_sqm'] * df['Bedrooms']
df.head()

Unnamed: 0,Price,Square Area,Bedrooms,Bathrooms,Furnishing,school,hospital,shopping_mall,supermarket,church,...,Neighborhood_Santa Mesa,Neighborhood_Santa Mesa Heights,Neighborhood_Santolan,Neighborhood_Timog and South Triangle,occupancy,Fully Furnished,Semi Furnished,Unfurnished,price_per_sqm,price_per_sqm_x_bedrooms
0,12000,25.0,0,1,Semi Furnished,1,1,1,1,1,...,0,0,0,0,0,0,1,0,480.0,0.0
1,58000,61.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,1,0,0,950.819672,950.819672
2,14000,25.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,1,1,0,0,560.0,560.0
3,22000,28.0,0,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,0,1,0,0,785.714286,0.0
4,18000,30.0,2,1,Unfurnished,1,1,1,1,1,...,0,0,0,0,0,0,0,1,600.0,1200.0


## 6. Create Price per Square Meter x Bathrooms Column
Create a new column 'price_per_sqm_x_bathrooms' by multiplying 'price_per_sqm' by the number of bathrooms.

In [13]:
df['price_per_sqm_x_bathrooms'] = df['price_per_sqm'] * df['Bathrooms']
df.head()

Unnamed: 0,Price,Square Area,Bedrooms,Bathrooms,Furnishing,school,hospital,shopping_mall,supermarket,church,...,Neighborhood_Santa Mesa Heights,Neighborhood_Santolan,Neighborhood_Timog and South Triangle,occupancy,Fully Furnished,Semi Furnished,Unfurnished,price_per_sqm,price_per_sqm_x_bedrooms,price_per_sqm_x_bathrooms
0,12000,25.0,0,1,Semi Furnished,1,1,1,1,1,...,0,0,0,0,0,1,0,480.0,0.0,480.0
1,58000,61.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,1,0,0,950.819672,950.819672,950.819672
2,14000,25.0,1,1,Fully Furnished,1,1,1,1,1,...,0,0,0,1,1,0,0,560.0,560.0,560.0
3,22000,28.0,0,1,Fully Furnished,1,1,1,1,1,...,0,0,0,0,1,0,0,785.714286,0.0,785.714286
4,18000,30.0,2,1,Unfurnished,1,1,1,1,1,...,0,0,0,0,0,0,1,600.0,1200.0,600.0
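
The two interaction columns from sections 5 and 6 follow the same pattern, so they can also be built in one loop. The sketch below rebuilds them on a sample frame holding the five previewed rows. The names `sample`, `room_col`, and `new_col` are illustrative.

```python
import pandas as pd

# Values copied from the five previewed rows.
sample = pd.DataFrame({
    'Price': [12000, 58000, 14000, 22000, 18000],
    'Square Area': [25.0, 61.0, 25.0, 28.0, 30.0],
    'Bedrooms': [0, 1, 1, 0, 2],
    'Bathrooms': [1, 1, 1, 1, 1],
})
sample['price_per_sqm'] = sample['Price'] / sample['Square Area']

# Build one interaction column per room-count column.
for room_col in ['Bedrooms', 'Bathrooms']:
    new_col = f'price_per_sqm_x_{room_col.lower()}'
    sample[new_col] = sample['price_per_sqm'] * sample[room_col]

print(sample[['price_per_sqm_x_bedrooms', 'price_per_sqm_x_bathrooms']])
```

Because every previewed row has one bathroom, 'price_per_sqm_x_bathrooms' equals 'price_per_sqm' for those rows, while zero-bedroom rows get 0 in 'price_per_sqm_x_bedrooms'.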


## 7. Save Transformed Data to CSV
Export the transformed DataFrame to 'Data_Cleaned.csv'. Passing index=False leaves the row index out of the file.

In [None]:
df.to_csv('Data_Cleaned.csv', index=False)
print('Transformed data saved to Data_Cleaned.csv')

Transformed data saved to Data_Cleaned.csv
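
To confirm that a file written with index=False reads back without an extra index column, a quick round trip can help. This is a sketch only: it writes a hypothetical two-row frame to a temporary directory rather than touching the real output file.

```python
import os
import tempfile
import pandas as pd

# Hypothetical two-row frame standing in for the transformed data.
sample = pd.DataFrame({
    'Price': [12000, 58000],
    'price_per_sqm': [480.0, 950.819672],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'Data_Cleaned.csv')
    # index=False keeps the row index out of the file.
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# The reloaded frame has the same columns and no "Unnamed: 0" column.
print(reloaded.columns.tolist())
print(reloaded.shape)
```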
