# **Philanthropy Gala Planning**

## Initial Data Exploration, Filtering, Analysis, and Chart Creation

This notebook processes IRS ZIP-code-level tax data to identify the most generous neighborhoods in the US during 2022, then visualizes results.

## Step 1: Data Processing & Filtering

### 1.1 Imports & Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

# Ignore warnings
warnings.filterwarnings('ignore')

# Load dataset
# We use 'noagi' since it contains pre-calculated totals for
# every ZIP code, making analysis faster on a national level.
df = pd.read_csv('data/original/22zpallnoagi.csv')
df.head()

### 1.2 Data Sanitization

ZIP codes may be parsed as numbers, which drops leading zeros (e.g., 08701 → 8701). We restore the zeros and filter the dataset to neighborhood-level summary rows, excluding state totals and the "other" category.

In [None]:
# Ensure ZIP codes are valid 5-digit strings
df['ZIPCODE'] = df['ZIPCODE'].astype(str).str.zfill(5)

# agi_stub 0 = summary row for entire ZIP code.
# Remove '00000' (state totals) and '99999' (other) to focus on neighborhoods.
neighborhoods = df[(df['agi_stub'] == 0) & (~df['ZIPCODE'].isin(['00000', '99999']))].copy()
print(f"Neighborhood-level rows: {len(neighborhoods):,}")
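The leading-zero problem can be reproduced on a small in-memory CSV. The sketch below uses made-up rows (not IRS data) with the same column names as above. It also shows an alternative that is an assumption on our part rather than something this notebook does: passing `dtype={'ZIPCODE': str}` to `read_csv` so the zeros are never lost.

```python
import io
import pandas as pd

# Toy CSV with made-up values; column names mirror the ones used above.
csv_text = (
    "STATE,ZIPCODE,agi_stub,N1\n"
    "NJ,08701,0,1000\n"
    "NJ,00000,0,5000\n"
    "NY,10001,0,2000\n"
)

# Default parsing infers an integer column, so '08701' becomes 8701.
as_int = pd.read_csv(io.StringIO(csv_text))

# Repair after the fact: cast to string and left-pad to 5 characters.
repaired = as_int['ZIPCODE'].astype(str).str.zfill(5)

# Alternative: read the column as text so no repair is needed.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'ZIPCODE': str})

print(as_int['ZIPCODE'].tolist())
print(repaired.tolist())
print(as_text['ZIPCODE'].tolist())
```

Both routes end with the same 5-character strings; reading as text simply avoids the intermediate integer form.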

### 1.3 Calculate Proprietary Metrics

- **Generosity Index (GI):** Charitable donations (A19700) as a share of adjusted gross income (A00100)
    - We consider this the neighborhood's "sacrifice ratio."
- **Participation Rate (PR):** Number of returns reporting donations (N19700) divided by total returns filed (N1), which we use as a proxy for households
    - This tells us how ingrained giving is in the community.

In [None]:
neighborhoods['generosity_index'] = neighborhoods['A19700'] / neighborhoods['A00100']
neighborhoods['participation_rate'] = neighborhoods['N19700'] / neighborhoods['N1']
neighborhoods[['ZIPCODE', 'STATE', 'generosity_index', 'participation_rate']].head()
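A worked example may make the two ratios concrete. The sketch below uses made-up numbers and a hypothetical helper, `safe_ratio`, which is not part of the notebook; it returns NaN instead of infinity when a denominator is zero or negative, so such rows cannot float to the top of a descending sort.

```python
import pandas as pd

# Made-up values for three fictional ZIP codes.
toy = pd.DataFrame({
    'ZIPCODE': ['11111', '22222', '33333'],
    'A00100': [50_000, 200_000, 0],   # income
    'A19700': [2_500, 4_000, 100],    # donations
    'N1': [1_000, 2_000, 500],        # all returns
    'N19700': [100, 500, 50],         # returns with donations
})

def safe_ratio(numerator, denominator):
    """Divide two Series; yield NaN where the denominator is not positive."""
    return numerator / denominator.where(denominator > 0)

toy['generosity_index'] = safe_ratio(toy['A19700'], toy['A00100'])
toy['participation_rate'] = safe_ratio(toy['N19700'], toy['N1'])

print(toy[['ZIPCODE', 'generosity_index', 'participation_rate']])
```

For the first row, 2,500 / 50,000 gives a Generosity Index of 0.05, and 100 / 1,000 gives a Participation Rate of 0.10. The third row has zero income, so its Generosity Index is NaN rather than infinity.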

### 1.4 Reliability Filter & Export

We drop ZIP codes with fewer than 500 households (N1 < 500) so the ranking reflects community-wide behavior rather than a handful of outlier donors. The final list is sorted by Generosity Index in descending order and exported.

In [None]:
filtered_targets = neighborhoods[neighborhoods['N1'] >= 500].copy()
final_list = filtered_targets.sort_values(by='generosity_index', ascending=False)

final_list.to_csv('data/updated_gala_list.csv', index=False)
print(f"Dataset filtered and created: {len(final_list):,} ZIP codes exported to 'data/updated_gala_list.csv'.")
final_list.head(10)
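To illustrate why the size threshold matters, the toy frame below (made-up values) includes a tiny ZIP code whose ratio is inflated by very few filers. The name `MIN_RETURNS` is introduced here for illustration only; it holds the same 500 cutoff used above.

```python
import pandas as pd

# Made-up values: the first ZIP code has only 20 filers and an extreme ratio.
toy = pd.DataFrame({
    'ZIPCODE': ['11111', '22222', '33333'],
    'N1': [20, 800, 5_000],
    'generosity_index': [0.40, 0.06, 0.03],
})

MIN_RETURNS = 500  # hypothetical constant name; same cutoff as the cell above

ranked = (
    toy[toy['N1'] >= MIN_RETURNS]
    .sort_values(by='generosity_index', ascending=False)
    .reset_index(drop=True)
)

print(ranked)
```

Without the filter, the 20-filer ZIP code would rank first on a ratio that one or two large gifts could produce; with it, the ranking starts at the 800-filer ZIP code.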

## Step 2: Visualization & Insight

### 2.1 Chart Setup

In [None]:
sns.set_theme(style="whitegrid")
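The export cell writes to `data/` and the chart cells save to `images/`; both `to_csv` and `savefig` raise an error if the target folder does not exist. A minimal sketch of a guard follows, assuming the folders live under the working directory. The helper name `ensure_output_dirs` is hypothetical and not part of the notebook.

```python
from pathlib import Path

def ensure_output_dirs(base='.', names=('data', 'images')):
    """Create each output folder under `base` if missing; return the paths."""
    created = []
    for name in names:
        path = Path(base) / name
        # parents=True builds intermediate folders; exist_ok=True makes
        # the call safe to repeat when the folder is already there.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `ensure_output_dirs()` once, before the first export or `savefig`, would be enough; running it again is harmless.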

### 2.2 The "Whale" Chart | Top 10 Bar Chart

Shows the generosity leaders: the best ZIP codes to target for our gala.

In [None]:
top_10 = final_list.head(10).copy()
top_10['Label'] = top_10['STATE'] + " " + top_10['ZIPCODE'].astype(str)

plt.figure(figsize=(10, 6))
sns.barplot(data=top_10, x='generosity_index', y='Label', palette='viridis')
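plt.title('Top 10 Most Generous ZIP Codes in the US', fontsize=14)
# generosity_index is a fraction (e.g., 0.05), not a percentage
plt.xlabel('Generosity Index (Donations as Share of Income)', fontsize=12)
plt.ylabel('State and ZIP Code', fontsize=12)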
plt.tight_layout()
plt.savefig('images/generosity_rankings.png')
print("Visual saved: 'images/generosity_rankings.png'.")
plt.show()

### 2.3 The "Hidden Gems" | Market Map Scatter Plot

Finds "Hidden Gem" neighborhoods with high generosity but moderate income. We restrict the plot to ZIP codes with an average household income under $500k so the main cluster is easy to see.

In [None]:
# A00100 is reported in $1,000s, so this is average income per household in $1,000s
final_list['avg_income_k'] = final_list['A00100'] / final_list['N1']

plt.figure(figsize=(10, 6))
under_500k = final_list[final_list['avg_income_k'] < 500]
sns.scatterplot(data=under_500k, x='avg_income_k', y='generosity_index',
                alpha=0.4, color='teal')
plt.title('"Hidden Gem" Donors', fontsize=14)
plt.xlabel('Average Household Income ($1,000s)', fontsize=12)
plt.ylabel('Generosity Index', fontsize=12)
plt.tight_layout()
plt.savefig('images/hidden_gems_map.png')
print("Visual saved: 'images/hidden_gems_map.png'.")
plt.show()
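The scatter plot shows where the "Hidden Gems" sit, but it does not list them. One possible way to pull them out as a table is sketched below. The function `flag_hidden_gems` and both of its thresholds (an income cap in $1,000s and a generosity quantile) are assumptions made for illustration, not values the notebook defines, and the data is made up.

```python
import pandas as pd

def flag_hidden_gems(df, income_cap_k=100, generosity_quantile=0.9):
    """Return rows below the income cap whose generosity is at or above
    the chosen quantile of the whole frame, most generous first."""
    cutoff = df['generosity_index'].quantile(generosity_quantile)
    mask = (df['avg_income_k'] < income_cap_k) & (df['generosity_index'] >= cutoff)
    return df[mask].sort_values('generosity_index', ascending=False)

# Made-up values for five fictional ZIP codes.
toy = pd.DataFrame({
    'ZIPCODE': ['11111', '22222', '33333', '44444', '55555'],
    'avg_income_k': [60, 450, 80, 95, 300],
    'generosity_index': [0.08, 0.09, 0.02, 0.03, 0.01],
})

gems = flag_hidden_gems(toy, income_cap_k=100, generosity_quantile=0.6)
print(gems[['ZIPCODE', 'avg_income_k', 'generosity_index']])
```

In the toy data only `11111` qualifies: `22222` is more generous but sits well above the income cap, and the other moderate-income ZIP codes fall below the generosity cutoff. The thresholds would need tuning against the real distribution.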