# Feature Engineering for Boston Housing Dataset

This notebook applies several feature engineering techniques to the Boston Housing dataset to prepare it for machine learning models. It covers the following steps:

1. Loading the dataset
2. Handling missing values
3. Creating new features
4. Encoding categorical variables
5. Scaling features

Let's get started!

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Load the dataset
data = pd.read_csv('../data/boston_housing.csv')
data.head()

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
missing_values[missing_values > 0]
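A raw count of missing entries can be hard to judge without the share of rows affected. The sketch below reports both on a small stand-in DataFrame; the column names `rooms`, `age`, and `price` are illustrative assumptions, not columns confirmed to exist in `boston_housing.csv`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; column names are illustrative assumptions.
toy = pd.DataFrame({
    "rooms": [6.5, np.nan, 7.1, 5.9],
    "age": [65.2, 78.9, np.nan, np.nan],
    "price": [24.0, 21.6, 34.7, 33.4],
})

# Count and percentage of missing entries per column.
summary = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean() * 100,
})

# Keep only columns that have gaps, most affected first.
summary = summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)
print(summary)
```

The same two expressions can be applied to `data` once it is loaded; a column with a large share of missing values may be a candidate for dropping rather than imputing.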

In [None]:
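# Handle missing values: replace each NaN with the mean of its column.
# 'column_name' is a placeholder; substitute a real numeric column that has missing values.
imputer = SimpleImputer(strategy='mean')
data['column_name'] = imputer.fit_transform(data[['column_name']]).ravel()  # flatten (n, 1) to (n,)

# Create a new feature as the ratio of two existing columns.
# 'feature1' and 'feature2' are placeholders; a zero in 'feature2' would yield inf or NaN.
data['new_feature'] = data['feature1'] / data['feature2']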
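To make the effect of mean imputation and a ratio feature concrete, here is a minimal sketch on toy data. It reuses the placeholder names `feature1`, `feature2`, and `new_feature` from the cell above; the values are made up for illustration, and mapping a zero denominator to NaN is one possible choice rather than the only reasonable one.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with gaps and a zero denominator (values are made up).
toy = pd.DataFrame({
    "feature1": [10.0, np.nan, 30.0, 8.0],
    "feature2": [2.0, 4.0, 0.0, np.nan],
})

# Mean imputation: each NaN becomes the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
toy[["feature1", "feature2"]] = imputer.fit_transform(toy[["feature1", "feature2"]])

# Ratio feature; a zero denominator is mapped to NaN instead of producing inf.
denominator = toy["feature2"].replace(0, np.nan)
toy["new_feature"] = toy["feature1"] / denominator
print(toy)
```

Here the missing `feature1` entry is filled with 16.0 (the mean of 10, 30, and 8) and the missing `feature2` entry with 2.0 (the mean of 2, 4, and 0). The row with a zero denominator ends up with a NaN ratio, which would need its own handling before modeling.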


In [None]:
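# Encode categorical variables as one-hot (0/1) indicator columns.
categorical_features = ['categorical_column']  # Placeholder: replace with actual categorical columns
one_hot_encoder = OneHotEncoder(handle_unknown='ignore')  # categories unseen during fit encode as all zeros

# fit_transform returns a sparse matrix; toarray() converts it to a dense NumPy array.
data_encoded = one_hot_encoder.fit_transform(data[categorical_features]).toarray()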


In [None]:
# Scale numerical features to zero mean and unit variance.
scaler = StandardScaler()
numerical_features = ['numerical_column1', 'numerical_column2']  # Placeholder: replace with actual numerical columns
data[numerical_features] = scaler.fit_transform(data[numerical_features])
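The imports at the top include `ColumnTransformer` and `Pipeline`, which the cells above do not use. As a rough sketch of how imputation, scaling, and encoding could be combined into one object, the example below builds a preprocessor on toy data. The column names are the same placeholders used above, the step names (`impute`, `scale`, `num`, `cat`) are arbitrary labels, and `sparse_threshold=0` is set only to force a dense array for inspection.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data using the notebook's placeholder column names (values are made up).
toy = pd.DataFrame({
    "numerical_column1": [1.0, 2.0, np.nan, 4.0],
    "numerical_column2": [10.0, 20.0, 30.0, 40.0],
    "categorical_column": ["a", "b", "a", "b"],
})

# Numeric columns: impute with the column mean, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Route each group of columns to its own transformer.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, ["numerical_column1", "numerical_column2"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["categorical_column"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Learn means, scales, and categories, then transform the same rows.
features = preprocessor.fit_transform(toy)

# New rows reuse the fitted statistics; nothing is re-estimated here.
new_rows = pd.DataFrame({
    "numerical_column1": [np.nan],
    "numerical_column2": [25.0],
    "categorical_column": ["z"],
})
new_features = preprocessor.transform(new_rows)
print(features.shape, new_features)
```

Fitting the preprocessor on training rows only and calling `transform` on held-out rows keeps statistics such as means and standard deviations from leaking out of the test set, which is harder to guarantee when each step overwrites `data` in place.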


## Conclusion

In this notebook, we walked through feature engineering for the Boston Housing dataset: handling missing values, creating new features, encoding categorical variables, and scaling numerical features. The column names in the code cells are placeholders and must be replaced with actual columns before the cells will run. These steps put the features in numeric form and on comparable scales, which many machine learning models require or benefit from. Next, we can proceed to model training and evaluation.