
# 🏡 EDA: Boston Housing Dataset

## Description
This notebook walks through **Exploratory Data Analysis (EDA)** on the Boston Housing dataset. The dataset is a common case study for **regression** in Machine Learning.

### Goal:
Predict house prices (MEDV, the median value of owner-occupied homes in $1000s) from neighborhood and property features.


In [None]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
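from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')


In [None]:

# load_boston was removed in scikit-learn 1.2, so the data is read from the
# original CMU StatLib source instead (the workaround suggested in scikit-learn's
# deprecation notice). Each record spans two physical lines in the raw file.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
df = pd.DataFrame(data, columns=feature_names)
df['MEDV'] = target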

df.head()


In [None]:

df.info()


In [None]:

df.describe()


In [None]:

df.isnull().sum()
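`isnull().sum()` reports raw counts for every column. On a wider dataset it can help to list only the affected columns together with their share of rows. A minimal sketch; `missing_summary` is a hypothetical helper name and the toy frame is illustrative only:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": 100 * counts / len(frame),
    })
    # Keep only columns with at least one missing value
    return summary[summary["n_missing"] > 0]


# Toy frame for illustration only (not the Boston data)
toy = pd.DataFrame({"A": [1.0, np.nan, 3.0, 4.0], "B": [1, 2, 3, 4]})
print(missing_summary(toy))  # only column A: 1 missing value, 25.0%
```

On the notebook's own data the call would be `missing_summary(df)`; an empty result means no column has missing values.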


In [None]:

plt.figure(figsize=(12,10))
sns.heatmap(df.corr(), annot=True, fmt='.2f', cmap='coolwarm')
plt.title("Correlation between features")
plt.show()
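The heatmap shows every pair at once. To decide which features to inspect more closely, it can be easier to rank correlations with the target alone. A minimal sketch; `rank_correlations` is a hypothetical helper and the toy frame is illustrative only. `df.corr()` computes Pearson correlation by default, which captures only linear association, so a low value does not rule out a non-linear relationship.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str = "MEDV") -> pd.Series:
    """Pearson correlation of every other column with `target`,
    ordered by absolute strength (strongest first)."""
    corr = df.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy frame for illustration only (not the Boston data)
toy = pd.DataFrame({
    "x_pos": [1, 2, 3, 4, 5],
    "x_neg": [5, 4, 3, 2, 1],
    "x_noise": [2, 1, 2, 1, 2],
    "MEDV": [10, 20, 30, 40, 50],
})
print(rank_correlations(toy))  # x_noise (correlation ~0) comes last
```

Sorting by absolute value keeps strong negative correlations (such as a feature that lowers price) next to strong positive ones instead of pushing them to the bottom.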


In [None]:

sns.histplot(df['MEDV'], bins=30, kde=True)
plt.title("Distribution of House Prices (MEDV)")
plt.xlabel("MEDV ($1000s)")
plt.ylabel("Frequency")
plt.show()
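The histogram becomes easier to act on with a few numbers behind it. The sketch below uses pandas' sample skewness (`Series.skew`) as the measure. It compares skewness before and after a `log1p` transform and counts how many values sit exactly at the maximum. MEDV is widely reported to be capped at 50.0, and that cap would appear as a spike at the right edge of the histogram. `skew_report` is a hypothetical helper, and the toy series is illustrative only.

```python
import numpy as np
import pandas as pd


def skew_report(s: pd.Series) -> dict:
    """Skewness before/after log1p, plus how many values equal the maximum."""
    return {
        "raw_skew": s.skew(),
        "log1p_skew": np.log1p(s).skew(),
        # A pile-up at the maximum can indicate a capped (censored) variable
        "n_at_max": int((s == s.max()).sum()),
    }


# Toy right-skewed series for illustration only (not the Boston data)
toy = pd.Series([1.0, 1.0, 2.0, 2.0, 3.0, 10.0, 50.0])
print(skew_report(toy))
```

Calling `skew_report(df['MEDV'])` on the notebook's data would indicate whether a log transform of the target is worth considering before fitting a regression model.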


In [None]:

sns.scatterplot(x=df['RM'], y=df['MEDV'])
plt.title("Average Rooms per Dwelling (RM) vs House Price (MEDV)")
plt.xlabel("RM")
plt.ylabel("MEDV ($1000s)")
plt.show()

sns.scatterplot(x=df['LSTAT'], y=df['MEDV'])
plt.title("Lower-Status Population (LSTAT) vs House Price (MEDV)")
plt.xlabel("LSTAT (%)")
plt.ylabel("MEDV ($1000s)")
plt.show()
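The LSTAT scatter plot is one place where a relationship can be monotonic without being linear. One way to quantify this, sketched here as an addition rather than part of the original analysis, is to compare Pearson correlation with Spearman rank correlation. Spearman's rho is Pearson's r computed on ranks, so it reaches ±1 for any strictly monotonic relationship. Pearson's r, by contrast, stays below 1 in magnitude when the curve bends. `pearson_vs_spearman` is a hypothetical helper, and the data below are synthetic.

```python
import numpy as np
import pandas as pd


def pearson_vs_spearman(x: pd.Series, y: pd.Series) -> tuple:
    """Return (Pearson r, Spearman rho); rho is Pearson r on average ranks."""
    pearson = x.corr(y)
    # Ranking first avoids needing scipy for method="spearman"
    spearman = x.rank().corr(y.rank())
    return pearson, spearman


# Synthetic curve: strictly decreasing but not linear (illustration only)
x = pd.Series(np.linspace(1, 30, 50))
y = 50 * np.exp(-x / 10)
p, s = pearson_vs_spearman(x, y)
print(f"Pearson: {p:.3f}, Spearman: {s:.3f}")
```

On the notebook's data the call would be `pearson_vs_spearman(df['LSTAT'], df['MEDV'])`. A Spearman value clearly stronger than Pearson hints that a transformed feature (for example a log) may fit better in a linear model.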


In [None]:

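# Standardize CRIM and AGE to zero mean and unit variance
scaler = StandardScaler()
df[['CRIM_scaled', 'AGE_scaled']] = scaler.fit_transform(df[['CRIM', 'AGE']])

# Derived feature: RM (average rooms per dwelling) divided by AGE, where AGE is
# the percentage of owner-occupied units built before 1940 (not building age in years)
df['Room_per_age'] = df['RM'] / df['AGE']

df[['CRIM_scaled', 'AGE_scaled', 'Room_per_age']].head()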
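`StandardScaler` standardizes each column as z = (x - mean) / std, using the population standard deviation (`ddof=0`). The sketch below checks this on a toy frame whose values are illustrative only. The notebook fits the scaler on the full dataset, which is reasonable for exploration. When training a model, the scaler would normally be fit on the training split only, so that test-set statistics do not leak into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy values for illustration only (not the Boston data)
toy = pd.DataFrame({"CRIM": [0.1, 0.5, 3.0, 10.0],
                    "AGE": [20.0, 45.0, 80.0, 100.0]})

scaled = StandardScaler().fit_transform(toy)

# Manual z-score with the population std (ddof=0), which StandardScaler uses
manual = (toy - toy.mean()) / toy.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))  # True
```

Note that pandas' `DataFrame.std` defaults to `ddof=1`. Without the explicit `ddof=0`, the manual version would differ slightly from `StandardScaler`.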
