# Building Choropleth Maps with GeoPandas
*Mapping Quantities with Color Using Spatial Joins and Data Aggregation*


So far, we've worked a lot with spatial joins—but we haven't yet created one of the most sought-after visualizations in geospatial analysis: the **choropleth**.

A **choropleth** is a type of map where numerical values are represented by varying color shades. Lighter shades represent smaller values; darker shades represent larger ones. They're useful for visualizing things like:
- Population
- Crime rates
- Number of power plants
- Yelp scores
- Goat ownership (seriously)

Let's learn how to build one.


## Step 1: Get the Joined Data
We've already performed a spatial join where each power plant has been associated with the state it's located in.

In [None]:
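import geopandas as gpd

# Assume 'plants' and 'states' are preloaded GeoDataFrames.
# GeoPandas 0.10+ uses 'predicate'; the older 'op' keyword is deprecated.
joined = gpd.sjoin(plants, states, how='inner', predicate='within')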


## Step 2: Count Power Plants Per State
We now want to count how many power plants each state contains. This is a classic `.value_counts()` task.


In [None]:
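# 'name' is assumed to be the state name column carried over from 'states'.
# If 'plants' also had a 'name' column, sjoin would suffix them as
# 'name_left' and 'name_right', and you would count 'name_right' instead.
plant_counts = joined['name'].value_counts()
plant_counts.head()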
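
To make the counting step concrete, here is a minimal sketch on made-up data. The `toy_joined` table and its `plant` column are hypothetical stand-ins for the real spatial join result, which is assumed to have one row per plant with the state name in `name`.

```python
import pandas as pd

# Hypothetical stand-in for the joined table: one row per power plant,
# with the state name carried over from the spatial join.
toy_joined = pd.DataFrame({
    "plant": ["A", "B", "C", "D"],
    "name": ["Texas", "Ohio", "Texas", "Texas"],
})

# value_counts() returns a Series indexed by state name,
# sorted from the most frequent state to the least frequent.
toy_counts = toy_joined["name"].value_counts()
print(toy_counts)
```

The result is a Series whose index holds the state names, which is what makes the later index-based assignment possible.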


At this point, we have what we need to build a choropleth—almost. We have state names and their associated power plant counts, but we’re missing the **geometry** to draw them.

That means we can’t map yet. So what’s the fix?



You might think, "Let me switch the spatial join and make states the first argument." That way, each row will include **state geometry**, which is what we want to plot.


In [None]:
# Get state geometry instead of plant points
states_with_plants = gpd.sjoin(states, plants, how='inner', predicate='contains')


Looks good—each row now contains a **state polygon** and plant data.

BUT there's a catch: if we inspect the shape of the resulting GeoDataFrame...


In [None]:
print(states_with_plants.shape)


Uh-oh. We now have thousands of rows—one for **each power plant**, each including the same repeated state geometry.

We don’t want thousands of polygons. We want **one polygon per state**.
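
The row duplication is easy to reproduce on a tiny example. This is a sketch with invented geometry: `toy_states` and `toy_plants` are hypothetical, two unit squares and three points, and it assumes GeoPandas 0.10 or later for the `predicate` keyword.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical "states": unit squares placed side by side.
toy_states = gpd.GeoDataFrame(
    {"name": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three hypothetical plants: two inside West, one inside East.
toy_plants = gpd.GeoDataFrame(
    {"plant": ["A", "B", "C"]},
    geometry=[Point(0.2, 0.5), Point(0.7, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# With states on the left, each state row is repeated once per plant it contains.
toy_result = gpd.sjoin(toy_states, toy_plants, how="inner", predicate="contains")
print(len(toy_states), "states in,", len(toy_result), "rows out")
print(sorted(toy_result["name"]))
```

Two polygons go in and three rows come out, because the West square is copied once for each of its two plants.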

Time for a better approach.



## Step 3: Return to the Clean States GeoDataFrame
Remember that beautiful, clean `states` GeoDataFrame we started with? It has one shape per state. That’s what we want.

Let’s go back to that and find a way to merge in the plant counts.


In [None]:
states = states.set_index('name')


Now that state names are our index, and `plant_counts` is also indexed by state name, we can **assign the values directly**.


In [None]:
states['power_plant_count'] = plant_counts
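
One thing to watch for: a state that has no plants will not appear in `plant_counts`, so it receives `NaN` after the assignment. The sketch below uses hypothetical toy data (the `area` column and the three state rows are made up) to show this and one possible fix. Whether to fill with zero is a judgment call; it assumes a missing state truly means zero plants rather than missing data.

```python
import pandas as pd

# Hypothetical states table indexed by state name.
toy_states = pd.DataFrame(
    {"area": [10, 20, 30]},
    index=["Texas", "Ohio", "Vermont"],
)

# Hypothetical counts: Vermont has no plants, so it has no entry.
toy_counts = pd.Series({"Texas": 3, "Ohio": 1})

toy_states["power_plant_count"] = toy_counts
print(toy_states["power_plant_count"].isna().sum())  # number of states left as NaN

# Optional: treat "no entry" as zero plants and restore an integer dtype.
toy_states["power_plant_count"] = (
    toy_states["power_plant_count"].fillna(0).astype(int)
)
print(toy_states)
```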

## Step 4: Plot the Choropleth

In [None]:
states.plot(column='power_plant_count', legend=True, figsize=(20, 20))


## Bonus: Add Another Variable — Total Megawatts
Now that we know how to summarize, let's aggregate a different quantity: the total megawatts per state. Instead of counting rows, we sum a column within each group.


In [None]:
megawatts_by_state = joined.groupby('name')['megawatts'].sum()
states['megawatts'] = megawatts_by_state
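
As a small illustration of the aggregation, here is the same `groupby(...).sum()` pattern on invented numbers. The `toy_joined` table and its megawatt values are hypothetical.

```python
import pandas as pd

# Hypothetical joined table: one row per plant with its state and capacity.
toy_joined = pd.DataFrame({
    "name": ["Texas", "Ohio", "Texas"],
    "megawatts": [100.0, 50.0, 25.5],
})

# Group rows by state name, then add up the megawatts within each group.
# The result is a Series indexed by state name, sorted by that name.
toy_megawatts = toy_joined.groupby("name")["megawatts"].sum()
print(toy_megawatts)
```

Like `value_counts()`, this produces a Series indexed by state name, so it can be assigned to `states` in the same way.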

In [None]:
states.plot(column='megawatts', cmap='OrRd', legend=True, figsize=(20, 20))


## 🧠 Deep Dive: How Did That Assignment Work?
You might be wondering how GeoPandas knew where to put the counts and megawatts.

**Answer: the index.**

When you assign a Series to a DataFrame column, pandas aligns the values by index label, NOT by row order. Labels that are missing from the Series end up as `NaN`.

We set the index of `states` to state names. And our summaries are also indexed by state names. So assignment works smoothly:
```python
states['some_column'] = some_series
```
...works **as long as the indexes match**.
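
A quick way to convince yourself of this is to assign a Series whose rows are in a different order from the DataFrame. The sketch below uses hypothetical toy data and compares label-based assignment with positional assignment of a plain array.

```python
import pandas as pd

# Hypothetical states table, indexed by state name.
toy_states = pd.DataFrame({"area": [10, 20]}, index=["Ohio", "Texas"])

# Same labels as the table, but listed in the opposite order.
toy_series = pd.Series([3, 1], index=["Texas", "Ohio"])

# Assigning a Series aligns on labels, so the row order does not matter.
toy_states["count"] = toy_series

# A plain array has no labels, so it is assigned by position instead.
toy_states["count_by_position"] = toy_series.to_numpy()

print(toy_states)
```

In the `count` column Texas gets 3 and Ohio gets 1, as intended. In `count_by_position` the values are swapped, which is the kind of silent error that index alignment protects against.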


## Step 5: Reset Index (Optional)
If you're exporting to CSV or want to access the state name as a normal column, you may want to reset the index.

In [None]:
states.reset_index(inplace=True)


## ✅ Recap

- Spatial joins can link points (e.g., power plants) to areas (e.g., states).
- Aggregating values per region (e.g., `value_counts`, `groupby().sum()`) gives you something to color.
- Use the clean geometry dataset (one shape per area) for mapping.
- Always match indexes before assigning columns.

🗺️ You now know how to build choropleths from raw point data in GeoPandas!
