# ZIP correspondence to PUMA

Here we use a PUMA-to-ZIP crosswalk generated by the [GeoCorr 2014](https://mcdc.missouri.edu/applications/geocorr2014.html) tool from the University of Missouri.
No pre-processing is needed here; this notebook only does a little exploratory analysis of the crosswalk.

Steps to re-generate:

* Shift-click to select all states.
* Source geography: PUMA (under 2012 geographies)
* Target geography: ZIP/ZCTA (under 2010 geographies)
* Weighting variable: housing units
* Uncheck "generate a report"
* Click "run request"

In [1]:
import pandas as pd

! whoami
! date

zmbc
Wed Dec  7 15:46:30 PST 2022


In [2]:
puma_to_zip = pd.read_csv('../../data/raw/geocorr2014_2230702381.csv', skiprows=[1])
puma_to_zip

| | state | puma12 | zcta5 | stab | zipname | PUMAname | hus10 | afact |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 100 | 35543 | AL | Bear Creek, AL | Lauderdale, Colbert, Franklin & Marion (Northe... | 521 | 0.0060 |
| 1 | 1 | 100 | 35564 | AL | Hackleburg, AL | Lauderdale, Colbert, Franklin & Marion (Northe... | 1151 | 0.0132 |
| 2 | 1 | 100 | 35565 | AL | Haleyville, AL | Lauderdale, Colbert, Franklin & Marion (Northe... | 1725 | 0.0197 |
| 3 | 1 | 100 | 35570 | AL | Hamilton, AL | Lauderdale, Colbert, Franklin & Marion (Northe... | 180 | 0.0021 |
| 4 | 1 | 100 | 35571 | AL | Hodges, AL | Lauderdale, Colbert, Franklin & Marion (Northe... | 447 | 0.0051 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 44408 | 56 | 500 | 82945 | WY | Superior, WY | Sweetwater, Fremont, Uinta, Sublette & Hot Spr... | 193 | 0.0036 |
| 44409 | 56 | 500 | 83001 | WY | Jackson, WY | Sweetwater, Fremont, Uinta, Sublette & Hot Spr... | 5 | 0.0001 |
| 44410 | 56 | 500 | 83113 | WY | Big Piney, WY | Sweetwater, Fremont, Uinta, Sublette & Hot Spr... | 1316 | 0.0246 |
| 44411 | 56 | 500 | 83115 | WY | Daniel, WY | Sweetwater, Fremont, Uinta, Sublette & Hot Spr... | 651 | 0.0121 |
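
The `skiprows=[1]` argument in the cell above presumably drops a second header row of human-readable labels. The following is a minimal sketch of that pattern on a made-up miniature file; the label text, the Massachusetts row, and the `dtype` override are assumptions for illustration, not taken from the actual download.

```python
import io

import pandas as pd

# Hypothetical miniature of a GeoCorr-style export: line 0 holds the column
# names, line 1 holds descriptive labels that should not be parsed as data.
sample_csv = io.StringIO(
    "state,puma12,zcta5,stab,hus10,afact\n"
    "State code,PUMA (2012),ZIP census tab area,State abbr.,Housing units,Allocation factor\n"
    "25,100,01001,MA,7500,0.1234\n"
    "1,100,35543,AL,521,0.0060\n"
)

# skiprows=[1] drops only the label line; dtype=str for zcta5 keeps leading zeros.
sample = pd.read_csv(sample_csv, skiprows=[1], dtype={"zcta5": str})
print(sample.dtypes)
print(sample)
```

Reading `zcta5` as a string is optional, but it keeps codes such as `01001` intact for any later prefix-based grouping.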


## If someone moved within the same PUMA, what would be their probability of retaining the same ZIP3?

In [3]:
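puma_to_zip3 = (
    puma_to_zip
        # ZIP3 = first three characters of the ZCTA code.
        # NOTE: if zcta5 was parsed as an integer, codes with a leading zero lose it here;
        # .str.zfill(5) before slicing would guard against that.
        .assign(zip3=lambda x: x.zcta5.astype(str).str[:3])
        .drop(columns=['zipname', 'PUMAname', 'state', 'zcta5'])
        # Summing afact gives the share of each PUMA's housing units in each ZIP3.
        .groupby(['stab', 'puma12', 'zip3'])
        .sum()
)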
puma_to_zip3

| stab | puma12 | zip3 | hus10 | afact |
|---|---|---|---|---|
| AK | 101 | 995 | 50364 | 0.9997 |
| AK | 101 | 999 | 15 | 0.0003 |
| AK | 102 | 995 | 62653 | 0.9999 |
| AK | 200 | 995 | 2832 | 0.0394 |
| AK | 200 | 996 | 68733 | 0.9560 |
| ... | ... | ... | ... | ... |
| WY | 500 | 826 | 646 | 0.0121 |
| WY | 500 | 829 | 30851 | 0.5756 |
| WY | 500 | 830 | 5 | 0.0001 |
| WY | 500 | 831 | 1967 | 0.0367 |
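
One thing worth checking in the ZIP3 derivation is how ZCTA codes with a leading zero are handled. The sketch below assumes the codes were parsed as integers (the notebook output does not show the dtype, so this is an assumption) and compares a plain three-character slice with a zero-padded one on made-up example values.

```python
import pandas as pd

# Hypothetical ZCTA values as they would look after being parsed as integers.
zcta_as_int = pd.Series([1001, 35543, 99501])

# Slicing the raw string takes the wrong prefix when the leading zero is gone.
naive_zip3 = zcta_as_int.astype(str).str[:3]

# Padding back to five digits first restores the intended prefix.
padded_zip3 = zcta_as_int.astype(str).str.zfill(5).str[:3]

print(naive_zip3.tolist())   # ['100', '355', '995']
print(padded_zip3.tolist())  # ['010', '355', '995']
```

If the column was in fact read as strings, both approaches agree and the padding is a no-op.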


In [4]:
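# Probability that two independent housing-unit-weighted draws from the same PUMA
# fall in the same ZIP3: the sum over ZIP3s of the squared share (afact).
puma_to_prob = (
    puma_to_zip3.assign(proportion_staying_in=lambda x: x.afact * x.afact)
        .drop(columns=['hus10', 'afact'])
        .groupby(['stab', 'puma12'])
        .sum()
)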
puma_to_prob

| stab | puma12 | proportion_staying_in |
|---|---|---|
| AK | 101 | 0.999400 |
| AK | 102 | 0.999800 |
| AK | 200 | 0.915506 |
| AK | 300 | 0.453276 |
| AK | 400 | 0.219195 |
| ... | ... | ... |
| WY | 100 | 0.259934 |
| WY | 200 | 0.395604 |
| WY | 300 | 0.989646 |
| WY | 400 | 0.709283 |
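
As a worked check of the sum-of-squares step, here is the calculation for PUMA AK 101 alone, using the two ZIP3 shares shown in the ZIP3-level output earlier. It is a sketch of the same arithmetic, assuming origin and destination are independent draws weighted by housing units.

```python
import pandas as pd

# Shares of PUMA AK 101's housing units by ZIP3, from the ZIP3-level output.
shares = pd.Series({"995": 0.9997, "999": 0.0003})

# Probability that two independent draws with these weights land in the same ZIP3.
p_same_zip3 = float((shares ** 2).sum())
print(round(p_same_zip3, 6))  # 0.9994
```

This agrees with the 0.999400 reported for AK 101.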


In [5]:
# Unweighted mean across PUMAs. This assumes PUMAs have roughly equal populations
# (which they should, approximately) and equal rates of within-PUMA migration
# (which they probably don't, but close enough).
puma_to_prob.mean()

proportion_staying_in    0.762855
dtype: float64
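
If the equal-population assumption turned out to matter, one alternative would be to weight each PUMA by its housing units. The sketch below uses entirely made-up numbers to show the difference between the two means; in the notebook, per-PUMA `hus10` totals could presumably be obtained by summing `hus10` by PUMA, but that step is not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical PUMA-level table: both columns are invented for illustration.
example = pd.DataFrame({
    "proportion_staying_in": [0.99, 0.45, 0.22],
    "hus10": [60000, 40000, 50000],
})

# Plain mean treats every PUMA equally.
unweighted = example["proportion_staying_in"].mean()

# Weighted mean gives larger PUMAs proportionally more influence.
weighted = np.average(example["proportion_staying_in"], weights=example["hus10"])

print(unweighted, weighted)
```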

## Same question, but for MIGPUMA instead of PUMA

In [6]:
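# MIGPUMA estimate based on the mean number of PUMAs per MIGPUMA (2.4):
# take 1 / 2.4 as the probability that a move within a MIGPUMA stays in the same PUMA.
# Moves to a different PUMA are treated as always changing ZIP3, so this is a lower bound.
probability_of_same_PUMA = (1 / 2.4)
probability_of_same_PUMA * puma_to_prob.mean()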

proportion_staying_in    0.317856
dtype: float64
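
The MIGPUMA figure can be written out as a two-branch calculation, which makes the implicit assumption visible. This is a sketch using the two numbers from the notebook; the variable names are illustrative, and the zero in the second branch is the assumption that a move to a different PUMA never keeps the same ZIP3.

```python
# Values from the notebook: mean same-ZIP3 probability within a PUMA, and the
# mean number of PUMAs per MIGPUMA.
p_same_zip3_given_same_puma = 0.762855
mean_pumas_per_migpuma = 2.4

# Assumption: each PUMA in the MIGPUMA is an equally likely destination.
p_same_puma = 1 / mean_pumas_per_migpuma

# Assumption: moving to a different PUMA never retains the ZIP3 (hence the 0.0),
# which makes the result a lower bound.
p_same_zip3 = p_same_puma * p_same_zip3_given_same_puma + (1 - p_same_puma) * 0.0
print(round(p_same_zip3, 6))  # 0.317856
```

Replacing the `0.0` with an estimate of the cross-PUMA same-ZIP3 probability would raise the result accordingly.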