# 🏠 House Data Analysis
This notebook provides an initial analysis and visualization of housing data.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Apply seaborn's whitegrid theme to all plots
sns.set_theme(style="whitegrid")

# Load data (the path is relative to the notebook's working directory)
df = pd.read_csv('../data/raw/house_data.csv')
df.head()

FileNotFoundError: [Errno 2] No such file or directory: 'house_data.csv'
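
A `FileNotFoundError` at this step usually means the relative path was resolved against a different working directory than expected, since relative paths are resolved against the kernel's current working directory. The sketch below is one possible way to fail early with a message that shows the absolute path that was tried. The helper name `resolve_data_path` and its `base_dir` parameter are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def resolve_data_path(relative_path, base_dir=None):
    """Return the absolute path to a data file, or raise a descriptive error.

    Hypothetical helper: `base_dir` defaults to the current working directory,
    which is where a relative path passed to pd.read_csv would be resolved.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / relative_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"No data file at {candidate} (resolved relative to {base})"
        )
    return candidate
```

With this in place, the load step could read `df = pd.read_csv(resolve_data_path('../data/raw/house_data.csv'))`, so a wrong working directory shows up in the error message instead of a bare file name.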

In [None]:
# Dataset overview
print("Dataset Information:")
print(f"Number of rows: {len(df)}")
print(f"Number of columns: {df.shape[1]}")
print("\nColumn names:", df.columns.tolist())

print("\nMissing values per column:")
print(df.isnull().sum())

print("\nBasic statistics:")
print(df.describe())

In [None]:
# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# 1. Price distribution
sns.histplot(df['price'], bins=20, kde=True, ax=axes[0, 0])
axes[0, 0].set_title('Distribution of House Prices')

# 2. Price vs. Square Footage
sns.scatterplot(data=df, x='sqft', y='price', hue='bedrooms', ax=axes[0, 1])
axes[0, 1].set_title('Price vs. Square Footage')

# 3. Average price by location
avg_price_location = df.groupby('location')['price'].mean().sort_values()
sns.barplot(x=avg_price_location.values, y=avg_price_location.index, ax=axes[1, 0])
axes[1, 0].set_title('Average Price by Location')

# 4. Boxplot: Price by Condition
sns.boxplot(data=df, x='condition', y='price', ax=axes[1, 1])
axes[1, 1].set_title('Price by House Condition')

plt.tight_layout()
plt.show()

### ✅ Summary:
- Waterfront and Downtown are the locations with the highest average prices.
- Price tends to rise with square footage and with better condition.
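
The observations above can also be checked numerically rather than read off the plots. The sketch below uses a small made-up DataFrame with the same column names as the notebook (`price`, `sqft`, `location`, `condition`). The values, the `Suburb` location, and the condition labels are illustrative assumptions, not the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the notebook.
sample = pd.DataFrame({
    "price": [850_000, 620_000, 310_000, 280_000, 900_000, 590_000],
    "sqft": [2400, 1500, 1400, 1200, 2600, 1450],
    "location": ["Waterfront", "Downtown", "Suburb", "Suburb", "Waterfront", "Downtown"],
    "condition": ["Good", "Good", "Fair", "Fair", "Excellent", "Fair"],
})

# Mean price per location, highest first.
avg_price_by_location = (
    sample.groupby("location")["price"].mean().sort_values(ascending=False)
)
print(avg_price_by_location)

# Pearson correlation between size and price; values near 1 indicate
# a strong positive linear relationship.
sqft_price_corr = sample["sqft"].corr(sample["price"])
print(f"Correlation between sqft and price: {sqft_price_corr:.2f}")

# Median price per condition level.
median_price_by_condition = sample.groupby("condition")["price"].median()
print(median_price_by_condition)
```

On the real data, replacing `sample` with `df` would run the same checks against the loaded file.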

### 👥 Who creates notebooks like this?
- **Data Analysts**: Focus on insights, reports, and visualization.
- **Data Scientists**: Extend this work with modeling, automation, and experimentation.

> In many organizations, both roles do this kind of exploratory data analysis (EDA), and it is a key skill for each.