# Exploratory Data Analysis & Intro

### Goal
1. Investigate Illinois housing by county
2. Use Python to explore a live dataset from FRED
3. Discover insights to better understand the housing market in Illinois counties

### Final Deliverables
* Create Jupyter Notebooks (showcasing core skills in Python)
* Create a summary page (via README.md) capturing the findings
* Share project via GitHub & LinkedIn

### Questions to Answer
1. How does affordability vary across Illinois counties?
2. How are Illinois housing inventory listings trending over time?
3. What are the population trends in Illinois counties?
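
One common way to frame the first question is a price-to-income ratio: median listing price divided by median household income, where a lower ratio suggests a more affordable county. The sketch below is a minimal illustration under that assumption. The county names and figures are hypothetical placeholders, not FRED data, and the project's actual affordability measure may differ.

```python
import pandas as pd

# Hypothetical figures for illustration only; not taken from FRED.
prices = pd.DataFrame({
    "county_name": ["County A", "County B", "County C"],
    "median_price": [300_000, 150_000, 240_000],
})
incomes = pd.DataFrame({
    "county_name": ["County A", "County B", "County C"],
    "household_income": [75_000, 50_000, 96_000],
})

# Join the two tables on county, then divide listing price by income.
affordability = prices.merge(incomes, on="county_name", how="inner")
affordability["price_to_income"] = (
    affordability["median_price"] / affordability["household_income"]
)

# A lower ratio means fewer years of income are needed to cover the listing price.
affordability = affordability.sort_values("price_to_income").reset_index(drop=True)
print(affordability)
```

The inner join keeps only counties present in both tables, which avoids computing a ratio from a missing price or income.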

## Exploratory Data Analysis for 33 Illinois Counties
### Home Listing Prices to Explore

In [None]:
# Importing Libraries
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
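import os
import fredapi as fa

# api_key is not defined anywhere in this cell; reading it from an environment
# variable named FRED_API_KEY is an assumption, so adapt this to your own key handling
api_key = os.environ['FRED_API_KEY']
fred = fa.Fred(api_key=api_key)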

# Loading Data
counties_data = pd.read_csv(r'/csv_files/Counties.csv')

# Extracting Data and Cleanup

dataframes = []
# Loop through each county and fetch data from FRED
for index, row in counties_data.iterrows():
    county_name = row['county_name']
    series_id = row['series_id']

    try:
        data = fred.get_series(series_id)
        data = data.to_frame(name='median_price')
        data['median_price'] = data['median_price'].astype(int) #converts median listing price to int
        data['county_name'] = county_name
        data['year'] = data.index.year
        filtered_data = data[data['year'] < 2023] #filter data by year to match other datasets
        dataframes.append(filtered_data)
    except Exception as e:
        print(f"Error retrieving data for {county_name}: {e}")

# Combine DataFrames
combined_df = pd.concat(dataframes)
combined_df = combined_df.reset_index()
combined_df.rename(columns={'index': 'Date'}, inplace=True) #readjusts column after resetting


print(combined_df)
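
A note on the cleanup step: `astype(int)` raises an error when a column contains `NaN`, so if a FRED series has missing observations, the `except` branch in the loop would skip that county entirely. The sketch below shows one possible way to handle this, dropping missing rows before the cast. It uses a synthetic Series as a stand-in for the result of `fred.get_series`; the dates and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Series returned by fred.get_series(series_id);
# the values and dates are hypothetical.
raw = pd.Series(
    [250000.0, np.nan, 265000.0, 270000.0],
    index=pd.to_datetime(["2021-12-01", "2022-06-01", "2022-12-01", "2023-01-01"]),
)

data = raw.to_frame(name="median_price")

# Drop missing observations first; casting NaN to int would raise an error.
data = data.dropna(subset=["median_price"])
data["median_price"] = data["median_price"].astype(int)

# Derive the year from the DatetimeIndex and keep rows before 2023.
data["year"] = data.index.year
filtered = data[data["year"] < 2023]
print(filtered)
```

Whether dropping rows is appropriate depends on the analysis; interpolating or keeping a nullable integer dtype are alternatives.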

In [None]:
price_mean = combined_df.groupby('county_name')['median_price'].mean().sort_values(ascending=False).reset_index().head(10)

sns.barplot(x='median_price', y='county_name', data=price_mean, orient='h', palette='viridis')

plt.title('Average Median House Listing Price by County')
plt.ylabel('')
plt.xlabel('Median House Listing Price')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{int(x / 1000)}K'))
plt.show()

![Average Median House Listing Price by County](images/Avg-Median-Listing-Price-Counties.png)


### Household Income to Explore


In [None]:
#group by county and take mean 
household_income_mean = combined_income_df.groupby('county_name')['household_income'].mean().sort_values(ascending=False).reset_index().head(10)

sns.barplot(x='household_income', y='county_name', data=household_income_mean, orient='h', palette='viridis')

plt.title('Average Median Household Income by County')
plt.ylabel('')
plt.xlabel('Median Household Income')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{int(x / 1000)}K'))
plt.show()


![Average Median Household Income by County](images/Avg-Median-Household-Income-Counties.png)

### Explore Percentage of New Listings

In [None]:
#group by county and take mean 
new_listings_mean = combined_listings_df.groupby('county_name')['new_listings'].mean().sort_values(ascending=False).reset_index().head(10)

sns.barplot(x='new_listings', y='county_name', data=new_listings_mean, orient='h', palette='dark:b_r')

plt.title('Average Percentage of New Listings by County')
plt.ylabel('')
plt.xlabel('Percentage of New Listings')
plt.grid(axis='x', linestyle='--', alpha=0.7)
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{x}%'))
plt.show()

![Average Percentage of New Listings](images/Avg-Pct-New-Listings-Counties.png)

### Populations to Explore

In [None]:
#group by county and take mean 
population_mean = combined_pop_df.groupby('county_name')['population'].mean().sort_values(ascending=False).reset_index().head(10)

sns.barplot(x='population', y='county_name', data=population_mean, orient='h', palette='dark:b_r')

plt.title('Average Population by County')
plt.ylabel('')
plt.xlabel('Population')
plt.grid(axis='x', linestyle='--', alpha=0.7)
# assumes population is reported in thousands of persons, so x / 1000 is millions
plt.gca().xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{x / 1000:g}M'))
plt.show()

![Average Population by County](images/Avg-Population-Counties.png)
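
The four plotting cells above repeat the same group, average, sort, and top-10 chain with only the DataFrame and column changing. As a possible refactor, that chain could live in one small helper. The function name `top_counties_by_mean` and the sample data below are hypothetical and only illustrate the idea.

```python
import pandas as pd


def top_counties_by_mean(df, value_col, n=10):
    """Return the n counties with the highest mean of value_col."""
    return (
        df.groupby("county_name")[value_col]
        .mean()
        .sort_values(ascending=False)
        .head(n)
        .reset_index()
    )


# Hypothetical sample data standing in for combined_df.
sample = pd.DataFrame({
    "county_name": ["County A", "County A", "County B", "County B", "County C"],
    "median_price": [200_000, 220_000, 300_000, 320_000, 150_000],
})

top2 = top_counties_by_mean(sample, "median_price", n=2)
print(top2)
```

The returned frame has the same two columns the `sns.barplot` calls expect, so it could be passed as `data=` in place of each hand-written chain.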