# 🏠 Real Estate Data Analysis - Practice Problems

This notebook contains **12 practice problems** (plus a bonus challenge) to help you learn data analysis with pandas using a real estate dataset.

**Dataset:** `real_estate_data.csv`

**Columns:**
- `property_id` - Unique identifier
- `address` - Street address
- `city`, `state` - Location
- `price` - Listing/Sale price
- `bedrooms`, `bathrooms` - Number of beds/baths
- `sqft` - Square footage
- `lot_size` - Lot size in sqft
- `year_built` - Year property was built
- `property_type` - Single Family, Condo, Townhouse
- `listing_status` - Active, Pending, Sold
- `days_on_market` - Days listed
- `agent_name` - Real estate agent
- `sale_date` - Date sold (empty if the property has not sold)

```python
# Setup - Run this cell first!
import pandas as pd
import numpy as np

# Load the dataset
df = pd.read_csv('real_estate_data.csv')

# Preview the data
df.head()
```

```python
# Check the dataset info (column dtypes and non-null counts)
df.info()
```

---
## Problem 1: Basic Statistics

**Question:** What is the average, minimum, and maximum price of all properties in the dataset?

```python
# Your code here
```
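
One possible approach, sketched on a few hypothetical rows so it runs without the CSV (in the notebook, skip the toy `DataFrame` and use the `df` from the setup cell). It computes each statistic separately, then shows the same result from a single `Series.agg` call.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from real_estate_data.csv
df = pd.DataFrame({"price": [250_000, 480_000, 910_000, 365_000]})

# Average, minimum, and maximum price, one at a time
avg_price = df["price"].mean()
min_price = df["price"].min()
max_price = df["price"].max()
print(f"Average: ${avg_price:,.0f}  Min: ${min_price:,.0f}  Max: ${max_price:,.0f}")

# The same three statistics in one call, as a Series indexed by statistic name
summary = df["price"].agg(["mean", "min", "max"])
print(summary)
```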


---
## Problem 2: Filtering Data

**Question:** Find all properties in California (CA) that are priced above $800,000. Display the address, city, price, and bedrooms.

```python
# Your code here
```
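
A sketch of one way to filter, again on hypothetical rows: build a boolean mask from both conditions, combine them with `&`, and select the four requested columns with `.loc`. The parentheses around each comparison are required because `&` binds more tightly than `==` and `>` in Python.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "address": ["1 Oak St", "22 Pine Ave", "5 Elm Rd", "9 Bay Dr"],
    "city": ["San Jose", "Austin", "Fresno", "San Diego"],
    "state": ["CA", "TX", "CA", "CA"],
    "price": [1_200_000, 950_000, 420_000, 805_000],
    "bedrooms": [3, 4, 3, 2],
})

# Both conditions must hold: located in CA AND priced above $800,000
mask = (df["state"] == "CA") & (df["price"] > 800_000)

# Keep only the matching rows and the requested columns
ca_expensive = df.loc[mask, ["address", "city", "price", "bedrooms"]]
print(ca_expensive)
```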


---
## Problem 3: Grouping and Aggregation

**Question:** Calculate the average property price for each state. Sort the results from highest to lowest average price.

```python
# Your code here
```
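
A minimal sketch with hypothetical prices: group by `state`, take the mean of `price`, then sort in descending order.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "state": ["CA", "CA", "TX", "TX", "FL"],
    "price": [900_000, 700_000, 300_000, 400_000, 500_000],
})

# Mean price per state, highest first
avg_by_state = df.groupby("state")["price"].mean().sort_values(ascending=False)
print(avg_by_state)
```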


---
## Problem 4: Value Counts

**Question:** How many properties are there for each listing status (Active, Pending, Sold)? Show both the count and percentage.

```python
# Your code here
```
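
One way to show counts and percentages side by side, on hypothetical rows: `value_counts()` gives counts, `value_counts(normalize=True)` gives fractions, and the fractions are scaled to percentages and combined with the counts into one table.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "listing_status": ["Active", "Sold", "Sold", "Pending",
                       "Sold", "Active", "Sold", "Active"],
})

counts = df["listing_status"].value_counts()                            # absolute counts
percentages = df["listing_status"].value_counts(normalize=True) * 100  # share of all rows, in %

# Both Series share the same index (the status values), so they align row by row
status_summary = pd.DataFrame({"count": counts, "percent": percentages.round(1)})
print(status_summary)
```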


---
## Problem 5: Creating New Columns

**Question:** Create a new column called `price_per_sqft` that calculates the price per square foot for each property. Then find the top 5 properties with the highest price per sqft.

```python
# Your code here
```
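
A sketch on hypothetical rows: the new column is a vectorized division of two columns, and `nlargest` is a shortcut for sorting by that column in descending order and taking the first 5 rows.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "property_id": [1, 2, 3, 4, 5, 6],
    "price": [500_000, 300_000, 900_000, 450_000, 1_000_000, 260_000],
    "sqft": [2_000, 1_500, 1_800, 1_500, 2_500, 1_000],
})

# Element-wise division; a sqft of 0 would give inf and a missing sqft would give NaN,
# so you may want to filter such rows out first in the real data
df["price_per_sqft"] = df["price"] / df["sqft"]

# Top 5 rows by price per square foot
top5 = df.nlargest(5, "price_per_sqft")
print(top5)
```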


---
## Problem 6: Multiple Conditions Filter

**Question:** Find all Single Family homes with 4+ bedrooms that are priced under $500,000. Which states have these affordable family homes?

```python
# Your code here
```
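
A possible approach on hypothetical rows, reading "4+ bedrooms" as `bedrooms >= 4`: combine the three conditions into one mask, then count the matching homes per state.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "property_type": ["Single Family", "Condo", "Single Family",
                      "Single Family", "Townhouse", "Single Family"],
    "bedrooms": [4, 4, 5, 3, 4, 4],
    "price": [450_000, 300_000, 480_000, 200_000, 350_000, 650_000],
    "state": ["TX", "FL", "OH", "TX", "GA", "CA"],
})

# Single Family AND at least 4 bedrooms AND priced under $500,000
mask = (
    (df["property_type"] == "Single Family")
    & (df["bedrooms"] >= 4)
    & (df["price"] < 500_000)
)
affordable_family = df[mask]

# Which states have these homes, and how many in each
states = affordable_family["state"].value_counts()
print(states)
```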


---
## Problem 7: Handling Missing Values

**Question:** How many properties have NOT been sold (i.e., their `sale_date` is missing)? What percentage of all properties does this represent?

```python
# Your code here
```
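
A sketch assuming unsold properties have an empty `sale_date`, which `read_csv` loads as `NaN`. `isna().sum()` counts the missing values and `isna().mean()` gives their share. The last line is an optional cross-check against `listing_status`; it agrees only if the two columns are consistent in the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; None and np.nan both count as missing
df = pd.DataFrame({
    "listing_status": ["Sold", "Active", "Pending", "Sold", "Active"],
    "sale_date": ["2023-05-01", None, None, "2023-08-15", np.nan],
})

missing = df["sale_date"].isna()   # True where there is no sale date
not_sold = missing.sum()           # number of unsold properties
pct_not_sold = missing.mean() * 100  # mean of booleans = fraction True
print(f"{not_sold} unsold properties ({pct_not_sold:.1f}% of total)")

# Optional consistency check: rows whose status is not 'Sold'
not_sold_by_status = (df["listing_status"] != "Sold").sum()
```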


---
## Problem 8: Groupby with Multiple Aggregations

**Question:** For each property type, calculate:
- Count of properties
- Average price
- Average square footage
- Average days on market

```python
# Your code here
```
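
One way to compute all four figures in a single pass uses pandas named aggregation, where each keyword becomes an output column built from a `(column, function)` pair. The rows are hypothetical and the output column names are illustrative choices.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "property_type": ["Condo", "Condo", "Single Family", "Townhouse", "Single Family"],
    "price": [300_000, 340_000, 500_000, 410_000, 700_000],
    "sqft": [900, 1_100, 2_000, 1_500, 2_600],
    "days_on_market": [20, 40, 35, 15, 55],
})

summary = df.groupby("property_type").agg(
    count=("price", "size"),                      # rows per type, including any missing prices
    avg_price=("price", "mean"),
    avg_sqft=("sqft", "mean"),
    avg_days_on_market=("days_on_market", "mean"),
)
print(summary)
```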


---
## Problem 9: Conditional Column Creation

**Question:** Create a new column called `price_tier` that categorizes properties as:
- 'Budget' if price < \$350,000
- 'Mid-Range' if price is between \$350,000 and \$600,000 (inclusive)
- 'Luxury' if price > \$600,000

Then count how many properties are in each tier.

```python
# Your code here
```
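
A sketch on hypothetical prices, reading "between" as inclusive of both endpoints, since the Budget and Luxury rules use strict inequalities. It uses `np.select`, which assigns each row the label of the first condition that is true. `pd.cut` is a common alternative, but its bins are closed on only one side, which makes it awkward to put both \$350,000 and \$600,000 into Mid-Range.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only, including the boundary prices
df = pd.DataFrame({"price": [200_000, 350_000, 600_000, 600_001, 900_000, 349_999]})

# Conditions are checked in order; the first True one wins for each row
conditions = [
    df["price"] < 350_000,   # Budget
    df["price"] <= 600_000,  # Mid-Range (only reached when price >= 350,000)
    df["price"] > 600_000,   # Luxury
]
choices = ["Budget", "Mid-Range", "Luxury"]

# Rows matching no condition (e.g. a missing price) get "Unknown"
df["price_tier"] = np.select(conditions, choices, default="Unknown")

tier_counts = df["price_tier"].value_counts()
print(tier_counts)
```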


---
## Problem 10: Top Performers Analysis

**Question:** Which real estate agent has sold the most properties (those with a `listing_status` of Sold)? Also find the total sales value for the top 5 agents by number of sales.

```python
# Your code here
```
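
A sketch on hypothetical rows, assuming "sold" means `listing_status == "Sold"` and "total sales value" means the sum of `price` over those rows. Agents tied on number of sales are ordered by total value here; that tie-break is an arbitrary choice, not part of the question.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "agent_name": ["Ana", "Ben", "Ana", "Cara", "Ben", "Ana", "Dev", "Eli", "Fay", "Ben"],
    "listing_status": ["Sold", "Sold", "Sold", "Sold", "Active",
                       "Sold", "Sold", "Sold", "Sold", "Sold"],
    "price": [400_000, 500_000, 300_000, 250_000, 600_000,
              350_000, 700_000, 200_000, 450_000, 550_000],
})

# Only completed sales count toward an agent's record
sold = df[df["listing_status"] == "Sold"]

# Number of sales and total sales value per agent (named aggregation on a SeriesGroupBy)
sales_by_agent = (
    sold.groupby("agent_name")["price"]
    .agg(num_sales="count", total_sales_value="sum")
    .sort_values(["num_sales", "total_sales_value"], ascending=False)
)

top_agent = sales_by_agent.index[0]
top5 = sales_by_agent.head(5)
print(f"Top agent: {top_agent}")
print(top5)
```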


---
## Problem 11: Correlation Analysis

**Question:** What is the correlation between:
1. Price and Square Footage
2. Price and Number of Bedrooms
3. Price and Days on Market

Which factor has the strongest correlation with price?

```python
# Your code here
```
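
A sketch on hypothetical rows: `DataFrame.corrwith` computes the Pearson correlation of each selected column with `price`. "Strongest" is judged by absolute value here, so a strong negative correlation would also count.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "price": [300_000, 450_000, 500_000, 650_000, 800_000],
    "sqft": [1_200, 1_700, 2_000, 2_400, 3_100],
    "bedrooms": [2, 3, 3, 4, 4],
    "days_on_market": [60, 25, 45, 30, 20],
})

# Pearson correlation of each column with price (same as df["price"].corr(df[col]))
corr_with_price = df[["sqft", "bedrooms", "days_on_market"]].corrwith(df["price"])
print(corr_with_price)

# Column whose correlation with price has the largest magnitude
strongest = corr_with_price.abs().idxmax()
print(f"Strongest relationship with price: {strongest}")
```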


---
## Problem 12: Property Age Analysis

**Question:** Create a column called `property_age`, computed as 2024 (treat this as the current year) minus `year_built`. Then:
1. What is the average age of properties by property type?
2. Find properties built in the last 10 years (2014 or later) that are still Active. How many are there, and what is their average price?

```python
# Your code here
```
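
A sketch on hypothetical rows, with 2024 hard-coded as the problem specifies. Because `property_age = 2024 - year_built`, "built in 2014 or later" is the same filter as `property_age <= 10`.

```python
import pandas as pd

CURRENT_YEAR = 2024  # fixed by the problem statement

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "property_type": ["Condo", "Condo", "Single Family", "Townhouse", "Single Family"],
    "year_built": [2015, 1995, 1980, 2020, 2014],
    "listing_status": ["Active", "Sold", "Active", "Active", "Pending"],
    "price": [320_000, 280_000, 450_000, 410_000, 520_000],
})

df["property_age"] = CURRENT_YEAR - df["year_built"]

# 1. Average age per property type
avg_age_by_type = df.groupby("property_type")["property_age"].mean()
print(avg_age_by_type)

# 2. Built in 2014 or later AND still Active
recent_active = df[(df["year_built"] >= 2014) & (df["listing_status"] == "Active")]
n_recent_active = len(recent_active)
avg_price_recent_active = recent_active["price"].mean()
print(f"{n_recent_active} properties, average price ${avg_price_recent_active:,.0f}")
```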


---
## 🎉 Bonus Challenge

**Question:** Create a comprehensive market analysis by state that includes:
- Total number of properties
- Number of sold properties
- Average sale price (sold properties only)
- Median days on market
- Most common property type

Sort by total number of properties descending.

```python
# Your code here
```
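
One possible layout on hypothetical rows: aggregate over all properties per state, aggregate the sold subset separately, and join the two tables on state. Average sale price is computed over sold properties only (an interpretation), so a state with no sales gets 0 sold properties and a missing average. `mode().iloc[0]` breaks ties alphabetically, because `Series.mode()` returns its values in sorted order.

```python
import pandas as pd

# Hypothetical rows for illustration only
df = pd.DataFrame({
    "state": ["CA", "CA", "CA", "TX", "TX", "FL"],
    "listing_status": ["Sold", "Active", "Sold", "Sold", "Pending", "Active"],
    "price": [900_000, 750_000, 850_000, 400_000, 380_000, 500_000],
    "days_on_market": [30, 12, 45, 20, 60, 8],
    "property_type": ["Condo", "Single Family", "Condo",
                      "Single Family", "Townhouse", "Condo"],
})

# Statistics over all properties in each state
overall = df.groupby("state").agg(
    total_properties=("price", "size"),
    median_days_on_market=("days_on_market", "median"),
    most_common_type=("property_type", lambda s: s.mode().iloc[0]),
)

# Statistics over sold properties only
sold = df[df["listing_status"] == "Sold"]
sold_stats = sold.groupby("state").agg(
    sold_properties=("price", "size"),
    avg_sale_price=("price", "mean"),
)

# Left join keeps states with no sales; their sold count becomes 0
market = overall.join(sold_stats)
market["sold_properties"] = market["sold_properties"].fillna(0).astype(int)
market = market.sort_values("total_properties", ascending=False)
print(market)
```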
