Lab 5: Metro Area Places
========================
We are going to apply the skills
and techniques we have been developing
to a new data set:


- sorting data
- selecting columns
- filtering data
- creating new columns with vector operations
- creating new columns using the `apply()` method
- creating interactive maps with `explore()`
- plotting points on a map
- formatting and customizing tooltip and popup options
- changing base tile maps
- using colormaps to display data

The data for this week includes all Census "places" in the New York
metro area. A "place" is a city, town, village, or other census
designated area.

In [1]:
# load the libraries
# these should be all that you need for the lab

!pip install mapclassify -q
import pandas as pd
import geopandas as gpd
from IPython.display import display, HTML, Markdown as md
import xyzservices.providers as xyz
import folium
import math
import matplotlib.pyplot as plt
from matplotlib import colors

Problem 1: Load the libraries and data
======================================
Load the data from `url` into a `GeoDataFrame`.
Show a random sample of 10 rows.

In [2]:
# load the URL into a geopandas GeoDataFrame
url = "https://raw.githubusercontent.com/mcuringa/cartopy/refs/heads/main/notebooks/data/metro_places.geojson"
df = gpd.read_file(url)
df.sample(10)

Unnamed: 0,name,total_pop,population_under_18,median_inc,poverty,asian,black,indian,latino,mixed,other,pacific,white,state,county,geometry
46,Poplar Plains,1431,424,250001,36,52,0,0,145,34,20,0,1180,CT,Western Connecticut,POINT (-73.37711 41.17141)
187,Plandome,1365,439,250001,54,84,3,0,23,81,0,0,1174,NY,Nassau,POINT (-73.69973 40.80694)
175,Upper Montclair,12441,3540,234620,270,584,723,0,940,817,41,19,9317,NJ,Essex,POINT (-74.20064 40.84327)
12,Byram,5285,1318,92500,252,106,239,0,2139,60,62,0,2679,CT,Western Connecticut,POINT (-73.6528 41.0012)
108,Englewood Cliffs,5347,1154,213261,120,2290,97,0,511,254,0,5,2190,NJ,Bergen,POINT (-73.94672 40.88227)
9,Branchville,79,0,-666666666,0,0,0,0,0,0,0,0,79,CT,Western Connecticut,POINT (-73.44337 41.26784)
184,Valley Stream,40288,7819,122048,1664,6746,10249,0,9366,1623,1676,0,10628,NY,Nassau,POINT (-73.70435 40.66464)
473,Setauket,3739,952,218077,130,226,42,0,206,99,63,0,3103,NY,Suffolk,POINT (-73.11595 40.94841)
171,Singac,4253,619,96467,148,139,83,0,1350,0,0,0,2681,NJ,Passaic,POINT (-74.24297 40.8849)
13,Candlewood Isle,467,98,250001,0,0,0,0,33,0,0,0,434,CT,Western Connecticut,POINT (-73.45229 41.47771)


Problem 2: Sort, filter, select
===============================
Get a subset of the data that
only shows places in Nassau County.

Sort the data by median income (high to low),
then display a table with only the 
name of the place, the total population, and the
median income.

Also:

- give the columns nicer names
- format population with commas
- format median income as currency
- `-666666666` indicates a missing value for `median_inc`, filter out those rows

Display the first 10 rows and last 10 rows of the data.
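
The steps above can be tried on a three-row table before running them on the full data. This is only a sketch, not the lab solution: the rows are copied from the sample output in Problem 1, and the names `places`, `MISSING`, and `table` are made up for the example.

```python
import pandas as pd

# Sentinel the data uses for a missing median income.
MISSING = -666666666

# Three rows copied from the sample output in Problem 1.
places = pd.DataFrame({
    "name": ["Valley Stream", "Branchville", "Plandome"],
    "total_pop": [40288, 79, 1365],
    "median_inc": [122048, MISSING, 250001],
})

# Filter first, then sort while the values are still numbers.
valid = places[places["median_inc"] != MISSING]
ranked = valid.sort_values(by="median_inc", ascending=False)

# Rename the columns, then format the numbers as display strings.
table = ranked.rename(columns={
    "name": "Place",
    "total_pop": "Population",
    "median_inc": "Median Income",
})
table["Population"] = table["Population"].apply("{:,}".format)
table["Median Income"] = table["Median Income"].apply("${:,}".format)

print(table.to_string(index=False))
```

Formatting turns the numeric columns into strings, so it comes last. Sorting the formatted strings would compare them as text rather than as numbers.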

In [3]:
nassau = df[(df.county == "Nassau") & (df.median_inc > 0)].copy()
nassau = nassau[["name", "total_pop", "median_inc"]].sort_values(by="median_inc", ascending=False)

def pop(x):
    return f"{x:,}"

def inc(x):
    return f"${x:,}"
nassau.total_pop = nassau.total_pop.apply(pop)
nassau.median_inc = nassau.median_inc.apply(inc)
nassau.columns = ["Place", "Population", "Median Income"]
display(md("## 10 wealthiest places in Nassau County"))
display(nassau.head(10))
display(md("## 10 poorest places in Nassau County"))
display(nassau.tail(10))

## 10 wealthiest places in Nassau County

Unnamed: 0,Place,Population,Median Income
187,Plandome,1365,"$250,001"
211,Roslyn Estates,1350,"$250,001"
205,Plandome Heights,938,"$250,001"
204,Old Westbury,4410,"$250,001"
223,Oyster Bay Cove,1942,"$250,001"
228,Munsey Park,2792,"$250,001"
218,Woodsburgh,887,"$250,001"
215,Sands Point,2702,"$250,001"
295,Laurel Hollow,2050,"$250,001"
301,Flower Hill,4787,"$250,001"


## 10 poorest places in Nassau County

Unnamed: 0,Place,Population,Median Income
546,University Gardens,4171,"$104,844"
500,Point Lookout,1110,"$104,386"
292,Cedarhurst,7307,"$102,561"
334,Bay Park,1389,"$100,972"
210,Roslyn,2928,"$97,073"
280,Great Neck Plaza,7443,"$97,022"
285,Island Park,4947,"$96,875"
306,Manorhaven,6929,"$95,493"
281,Hempstead,58557,"$80,350"
422,Inwood,11156,"$67,058"


Problem 3: Plot the places on a map
===================================
Plot all of the places (not just Nassau County) on an
interactive map. Use the `name` value for the tooltip.
Use `name`, `total_pop`, and `median_inc` for the popup.

In [4]:
df.explore(popup=["name", "total_pop", "median_inc"], tooltip="name", tooltip_kwds={"labels": False})


Problem 4: Colored points
=========================
In this problem you will separate the places
into 3 DataFrames, one for each state.

Plot them on the same map, using the following colors:

- New York: blue
- Connecticut: green
- New Jersey: orange

Show the map, using `name` and `state` in the tooltip.
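
One way to keep the state-to-color assignments in a single place is a dictionary. The sketch below is an illustration under assumptions, not the required approach: the rows are copied from the sample output in Problem 1, and `state_colors`, `places`, and `layers` are hypothetical names.

```python
import pandas as pd

# The color assignments from the problem statement.
state_colors = {"NY": "blue", "CT": "green", "NJ": "orange"}

# Three rows copied from the sample output in Problem 1.
places = pd.DataFrame({
    "name": ["Plandome", "Upper Montclair", "Byram"],
    "state": ["NY", "NJ", "CT"],
})

# Split the table into one DataFrame per state, keyed by state code.
layers = {state: places[places["state"] == state] for state in state_colors}

# Alternatively, look up each row's color with a vector operation.
places["color"] = places["state"].map(state_colors)

print(places)
```

Each entry in `layers` can then be plotted in the color stored under the same key in `state_colors`.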

In [5]:
ny = df[df.state == "NY"]
nj = df[df.state == "NJ"]
ct = df[df.state == "CT"]

m = ny.explore(tooltip=["name", "state"], style_kwds={"fillColor": "blue", "color":"blue", "fillOpacity": 1})   
nj.explore(m=m, tooltip=["name", "state"], style_kwds={"fillColor": "orange", "color":"orange", "fillOpacity": 1})   
ct.explore(m=m, tooltip=["name", "state"], style_kwds={"fillColor": "green", "color":"green", "fillOpacity": 1})   
m

Problem 5: Median income colormap
=================================
Use a divergent colormap to plot the places on the map
by median income. Use `name` and `median_inc` in the tooltip.

- Filter out missing values (where `median_inc` is at or below zero)
- Format the median income as currency.
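
With a plain linear normalization, the neutral midpoint of a divergent colormap falls halfway between the lowest and highest income, which may not be a typical income. One option, which this lab does not require, is matplotlib's `TwoSlopeNorm`, which pins the middle of the colormap to a value you choose. The sketch below centers it on the median of five incomes copied from the tables above; `incomes`, `center`, and `income_color` are names made up for the example.

```python
import statistics

import matplotlib.pyplot as plt
from matplotlib import colors

# Median incomes copied from the sample output and the Nassau tables.
incomes = [67058, 96467, 122048, 213261, 250001]

# Pin the neutral middle of the colormap to the median income.
center = statistics.median(incomes)
norm = colors.TwoSlopeNorm(vmin=min(incomes), vcenter=center, vmax=max(incomes))

# Reversed "seismic": low values are red, high values are blue.
cmap = plt.get_cmap("seismic_r")


def income_color(income):
    """Return the hex color for one income value."""
    return colors.rgb2hex(cmap(norm(income)))


palette = {income: income_color(income) for income in incomes}
print(palette)
```

Incomes below the median fall on the red half of the scale and incomes above it on the blue half, whatever the spread on either side.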


In [6]:

data = df.copy()
# filter out the missing values
data = data[data.median_inc > 0]

lower = data.median_inc.min()
upper = data.median_inc.max()
cmap = plt.get_cmap("seismic_r")
norm = colors.Normalize(vmin=lower, vmax=upper)


def get_color(income):
    return colors.rgb2hex(cmap(norm(income)))

data["color"] = data.median_inc.apply(get_color)

title = f"""
Median Income in NYC Metro Area
===============================
Poorer areas in red, wealthier areas in blue.
"""
display(md(title))

def mk_tooltip(row):
    # single quotes inside the f-string keep this valid before Python 3.12
    return f"<b>{row['name']}:</b> ${row.median_inc:,}"

data["tooltip"] = data.apply(mk_tooltip, axis=1)

data.explore(color=data['color'], tooltip="tooltip", tooltip_kwds={ "labels": False}, style_kwds={"radius": 5, "fillOpacity": .8})


Median Income in NYC Metro Area
===============================
Poorer areas in red, wealthier areas in blue.


Problem 6: Racial/Ethnic Plurality by place
============================================
Working from last week's demonstration
of creating a categorical color map from racial
pluralities, make a similar map with our data set.

The map can be an exact copy of the demo one from lab 4,
with the following additions/changes:

- filter out all data with zero population
- use `name` as the tooltip
- create a new column for the popup:
  - use HTML to format the popup
  - show the **percentages** of each ethnicity in the popup

Example of popup:

<b>New York City</b><br>
<br>
Asian: 14.36%<br>
Black/African American: 21.00%<br>
Hispanic/Latinx: 29.03%<br>
White: 31.16%<br>
American Indian: 0.18%<br>
Pacific Island: 0.04%<br>
Mixed: 3.09%<br>
Other: 1.14%<br>

_Note: put <br> at the end of the line to force a line break in the popup._
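
The percentage lines can also be built in a loop rather than written out one by one. A minimal sketch, assuming a dictionary that maps column names to the labels in the example popup: the counts are the Valley Stream row from the sample output in Problem 1, and `labels` and `make_popup` are hypothetical names. The function reads its input only by key, so the same approach should carry over to a DataFrame row passed in by `apply()`.

```python
# Counts for Valley Stream, copied from the sample output in Problem 1.
row = {
    "name": "Valley Stream", "total_pop": 40288,
    "asian": 6746, "black": 10249, "indian": 0, "latino": 9366,
    "mixed": 1623, "other": 1676, "pacific": 0, "white": 10628,
}

# Column name -> popup label, in the order of the example popup.
labels = {
    "asian": "Asian",
    "black": "Black/African American",
    "latino": "Hispanic/Latinx",
    "white": "White",
    "indian": "American Indian",
    "pacific": "Pacific Island",
    "mixed": "Mixed",
    "other": "Other",
}


def make_popup(row):
    """Build the popup HTML: a bold name, then one percentage per line."""
    lines = [f"<b>{row['name']}</b><br>", "<br>"]
    for column, label in labels.items():
        share = row[column] / row["total_pop"] * 100
        lines.append(f"{label}: {share:.2f}%<br>")
    return "\n".join(lines)


popup = make_popup(row)

# The plurality is the largest of the four mapped groups.
plurality = max(["asian", "black", "latino", "white"], key=lambda c: row[c])

print(popup)
print(plurality)
```

The division by `total_pop` is why places with zero population have to be filtered out first.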


In [7]:
# let's make a categorical map showing which racial/ethnic group
# has a plurality in each place

data = df.copy()
data = data[data.total_pop > 0]

# we have these categories in our columns
# this dict maps the column name to a numerical category
ethnic_cats = {
    'asian': 0,
    'black': 1,
    'latino': 2,
    'white': 3
}

def get_plurality(row):
    max_cat = row[["asian", "black", "latino", "white"]].idxmax()
    return max_cat

data["plurality"] = data.apply(get_plurality, axis=1)

# tab10 has 10 distinct colors
cmap = plt.get_cmap('tab10')

def get_color(plurality):
    category_number = ethnic_cats[plurality]
    return colors.rgb2hex(cmap(category_number))

data["color"] = data.plurality.apply(get_color)


def mk_popup(row):
    pop = f"""<b>{row["name"]}</b><br>
<br>
Asian: {(row.asian / row.total_pop) * 100:.2f}%<br>
Black/African American: {(row.black / row.total_pop) * 100:.2f}%<br>
Hispanic/Latinx: {(row.latino / row.total_pop) * 100:.2f}%<br>
White: {(row.white / row.total_pop) * 100:.2f}%<br>
American Indian: {(row.indian / row.total_pop) * 100:.2f}%<br>
Pacific Island: {(row.pacific / row.total_pop) * 100:.2f}%<br>
Mixed: {(row.mixed / row.total_pop) * 100:.2f}%<br>
Other: {(row.other / row.total_pop) * 100:.2f}%<br>
"""
    return pop

data["popup"] = data.apply(mk_popup, axis=1)

legend_data = {
    'Asian/Pacific Islander': colors.rgb2hex(cmap(0)),
    'Black/African American': colors.rgb2hex(cmap(1)),
    'Latinx/Chicanx/Hispanic': colors.rgb2hex(cmap(2)),
    f'White{"&nbsp;"*20}': colors.rgb2hex(cmap(3))
}
legend_df = pd.DataFrame([legend_data])
legend = legend_df.style.apply(lambda row: [f'background-color: {color}' for color in row], axis=0)

title = f"""
Racial/ethnic pluralities
=================================================
"""
display(md(title))
display(legend)

data.explore(color=data['color'], tooltip="name", popup="popup", style_kwds={"radius": 5, "fillOpacity": .8})




Racial/ethnic pluralities
=================================================


Unnamed: 0,Asian/Pacific Islander,Black/African American,Latinx/Chicanx/Hispanic,White
0,#1f77b4,#ff7f0e,#2ca02c,#d62728


Bonus Problem: Plot population on a log scale
=============================================
Population size in our data set
varies widely from a handful of people in
small towns on Fire Island, to NYC
with more than 8 million people. Most of the places
are clustered at the lower end of the scale. If we
use a sequential color map, almost all of the places 
are white or very faint.

This means we can't meaningfully use a sequential
color scale to get a sense of how place
size varies in the region. One way to
get a better sense is to use a logarithmic
scale rather than a linear scale.

Try this:

- filter out NYC (it's still too big)
- filter out all of the zero-population towns
- create a new column called `log_pop` and set it to the `math.log()` of `total_pop`
- plot `log_pop` on the map, using a sequential colormap of your choice
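
To see why the log helps, compare where a small village lands on a 0-to-1 color scale with and without it. This is a sketch in plain Python, not the lab solution: the four populations are copied from the tables above, and `rescale`, `linear`, and `logged` are names made up for the example.

```python
import math

# Populations copied from the tables above (all under the NYC cutoff).
populations = {
    "Branchville": 79,
    "Plandome": 1365,
    "Valley Stream": 40288,
    "Hempstead": 58557,
}


def rescale(value, lower, upper):
    """Map value linearly onto 0..1, as a linear color normalization does."""
    return (value - lower) / (upper - lower)


low = min(populations.values())
high = max(populations.values())

# Position of each place on the color scale, without and with the log.
linear = {name: rescale(pop, low, high) for name, pop in populations.items()}
logged = {
    name: rescale(math.log(pop), math.log(low), math.log(high))
    for name, pop in populations.items()
}

for name in populations:
    print(f"{name:15} linear={linear[name]:.3f}  log={logged[name]:.3f}")
```

On the linear scale Plandome sits almost at the bottom, next to Branchville; on the log scale it moves toward the middle. The base of the logarithm does not change these positions, because switching bases multiplies every value by the same constant. `math.log(0)` raises an error, which is one reason to filter out the zero-population places first.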

In [8]:
# copy the data, then filter out NYC and the zero-population places
data = df.copy()
data = data[(data.total_pop < 500_000) & (data.total_pop > 0)]

# natural log of the population, as described in the problem statement
data["log_pop"] = data.total_pop.apply(math.log)
lower = data.log_pop.min()
upper = data.log_pop.max()

# use the "Purples" sequential color map from matplotlib
cmap = plt.get_cmap("Purples")
norm = colors.Normalize(vmin=lower, vmax=upper)


def get_color(log_pop):
    return colors.rgb2hex(cmap(norm(log_pop)))


data["color"] = data.log_pop.apply(get_color)


title = f"""
Place size New York Metro Area
=================================================
"""
display(md(title))

# make a tooltip with the place name and the number of people


def mk_tooltip(row):
    return f"<b>{row['name']}:</b> {row.total_pop:,} people."


data["tooltip"] = data.apply(mk_tooltip, axis=1)


data.explore(color=data['color'], tooltip="tooltip", tooltip_kwds={"labels": False}, style_kwds={"radius": 5, "fillOpacity": 1})


Place size New York Metro Area
=================================================
