# Exploratory Data Analysis on Boston Housing Dataset

In this notebook, we perform exploratory data analysis (EDA) on the Boston Housing dataset. We check for missing values, review summary statistics, examine the distribution of the target variable (MEDV), and visualize the relationships between the features and the target.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set the seaborn theme (set_theme is the current name; sns.set is an older alias)
sns.set_theme(style='whitegrid')

In [2]:
# Load the dataset
data = pd.read_csv('../data/boston_housing.csv')

# Display the first few rows of the dataset
data.head()

In [3]:
# Check for missing values
data.isnull().sum()
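
The raw counts above can be hard to judge without the column lengths next to them. The following is a minimal sketch of a helper that reports counts and percentages only for columns that actually contain gaps. The function name `missing_summary` is hypothetical and not part of pandas.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first.

    Hypothetical helper for illustration; not part of pandas.
    """
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    # Keep only the columns that have at least one missing value
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)
```

In this notebook it could be called as `missing_summary(data)`. An empty result would mean that no column has missing values.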

In [4]:
# Summary statistics
data.describe()

In [5]:
# Visualize the distribution of the target variable (MEDV)
plt.figure(figsize=(10, 6))
sns.histplot(data['MEDV'], bins=30, kde=True)
plt.title('Distribution of Median Value of Homes (MEDV)')
plt.xlabel('MEDV')
plt.ylabel('Frequency')
plt.show()
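
A histogram shows the shape of the target, but two numbers can back up the visual impression: the skewness, and the share of observations that sit exactly at the maximum. In the commonly distributed version of the Boston Housing data, MEDV is capped at 50.0 (in thousands of dollars), which produces a spike in the last bin. Whether that applies to the CSV loaded here is an assumption worth checking. The helper name `target_shape` is hypothetical.

```python
import pandas as pd


def target_shape(series: pd.Series) -> dict:
    """Summarise skewness and how many observations sit at the maximum.

    Hypothetical helper for illustration; not part of pandas.
    """
    top = series.max()
    return {
        "skew": float(series.skew()),                   # > 0 suggests a right tail
        "max": float(top),
        "share_at_max": float((series == top).mean()),  # fraction equal to the max
    }
```

In this notebook it could be called as `target_shape(data['MEDV'])`. A noticeable `share_at_max` would suggest that the target is censored at its upper bound.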

In [6]:
# Correlation heatmap
plt.figure(figsize=(12, 8))
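# numeric_only=True (pandas >= 1.5) skips any non-numeric columns instead of raising an error
correlation_matrix = data.corr(numeric_only=True)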
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', square=True)
plt.title('Correlation Heatmap')
plt.show()
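
With many features, the annotated heatmap is dense, and the column of interest is usually the one for the target. The following is a minimal sketch that extracts that column and orders it by absolute strength while keeping the sign. The function name `correlations_with_target` is hypothetical, and the default target name `MEDV` is taken from the cells above.

```python
import pandas as pd


def correlations_with_target(df: pd.DataFrame, target: str = "MEDV") -> pd.Series:
    """Pearson correlation of each numeric feature with the target,
    ordered from strongest to weakest absolute correlation.

    Hypothetical helper for illustration; not part of pandas.
    """
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Sort by absolute value, but return the signed coefficients
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

In this notebook it could be called as `correlations_with_target(data)`. Pearson correlation only measures linear association, so a low value does not rule out a nonlinear relationship.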

In [7]:
# Pairplot to visualize relationships between features
sns.pairplot(data, diag_kind='kde')
plt.show()
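
A pairplot draws one panel per pair of columns, so the grid grows quadratically with the number of features and can be slow and hard to read on the full dataset. One option, sketched below under the assumption that the features most correlated with the target are the most interesting to inspect, is to pick a small subset of columns first. The function name `top_correlated_columns` and the default `k=4` are hypothetical choices.

```python
import pandas as pd


def top_correlated_columns(df: pd.DataFrame, target: str = "MEDV", k: int = 4) -> list:
    """Names of the k numeric features with the largest absolute
    correlation with the target, followed by the target itself.

    Hypothetical helper for illustration; not part of pandas or seaborn.
    """
    numeric = df.select_dtypes(include="number")
    strength = numeric.corr()[target].drop(target).abs()
    return strength.nlargest(k).index.tolist() + [target]


# Possible usage in this notebook (needs seaborn and the loaded `data` frame):
# sns.pairplot(data[top_correlated_columns(data)], diag_kind='kde')
```

Selecting by linear correlation is only one heuristic. A domain-driven column list would work equally well with the same `sns.pairplot` call.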

## Conclusion

In this exploratory data analysis, we checked the dataset for missing values, reviewed summary statistics, visualized the distribution of the target variable (MEDV), and examined pairwise relationships between the features. The correlation heatmap shows which features are most strongly correlated, positively or negatively, with MEDV; because it reflects linear association only, the pairplot is a useful check for nonlinear patterns. These observations can guide further analysis and feature engineering.