<a href="https://colab.research.google.com/github/PUBPOL-2130/notebooks/blob/main/Week3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [34]:
!pip install -q census us folium geopandas colorcet

In [35]:
%config InlineBackend.figure_formats = ["retina"]

import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd
import folium
import colorcet as cc

from census import Census

In [36]:
census = Census("", year=2020)

# RM NOTES PLZ READ:

This code should work if you are able to run Week 3 from Moon's PubPol class. This code downloads tract level MENA ancestry and race/ethnicity data for Michigan, creates a few different MENA options based off the table in this article (https://en.wikipedia.org/wiki/Middle_East_and_North_Africa) and saves as a cvs.  Skip the geographical parts (left in, but commented out, in case we want to make a map).

## Loading geographical data

Census geographies are published as _shapefiles_. The [ESRI shapefile format](https://en.wikipedia.org/wiki/Shapefile) is extremely popular among GIS (geographic information systems) practictioners, and any serious mapping tool (such as [ArcGIS](https://www.arcgis.com/index.html)) can read shapefiles. [GeoPandas](https://geopandas.org/en/stable/getting_started/introduction.html) is a Python package that extends Pandas to support reading shapefiles and plotting geographies; we'll use this package extensively in the coming weeks to build geospatial visualizations. Let's start by downloading a shapefile containing the outlines of all U.S. counties from the Census website.

In [37]:
#!curl -O https://web.archive.org/web/20241002004532if_/https://www2.census.gov/geo/tiger/TIGER2024/COUNTY/tl_2024_us_county.zip
!curl -O https://www2.census.gov/geo/tiger/TIGER2024/COUNTY/tl_2024_us_county.zip



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  4 80.0M    4 3662k    0     0  15.7M      0  0:00:05 --:--:--  0:00:05 15.8M
 67 80.0M   67 54.2M    0     0  44.2M      0  0:00:01  0:00:01 --:--:-- 44.2M
100 80.0M  100 80.0M    0     0  46.8M      0  0:00:01  0:00:01 --:--:-- 46.8M


In [38]:
#county_gdf = gpd.read_file("tl_2024_us_county.zip").set_index("GEOID")

In [39]:
#county_gdf.head(5)

GeoPandas includes a `.plot()` function (similar to the `.plot.pie()` and `.plot.bar()` Pandas functions we saw last week). With this function, we can quickly get a sense of what `county_gdf` contains.

In [40]:
#county_gdf.plot(figsize=(40, 80))
#plt.show()

There's a problem with this initial visualization: it's mostly empty space! This is because the county shapefile covers _all_ territories of the U.S., including some Pacific territories (Guam, American Samoa, and the Northern Mariana Islands) that appear in the upper right of this plot. Let's narrow down the visualization to the continental U.S by filtering on FIPS code (conveniently, the outlying territories all have FIPS codes above 56).

In [41]:
# Filter out American Samoa, U.S. Virgin Islands, etc.
#continental_gdf = county_gdf[
#      (county_gdf.STATEFP <= "56")
#    & (county_gdf.STATEFP != "02")  # exclude Alaska
#    & (county_gdf.STATEFP != "15")  # exclude Hawaii
#]

In [42]:
#continental_gdf.plot(figsize=(20, 40))
#plt.show()

That's better! Now, let's color the map by state (using the `STATEFP` column, which contains the state FIPS code for each county). Note that some of the state borders look a little odd (this effect is most pronounced in Michigan); this is because the Census geographies include some water areas.

In [43]:
#continental_gdf.plot(figsize=(20, 40), column="STATEFP", cmap=cc.cm.glasbey_hv)
#plt.show()

## Choosing a map projection

You may have noticed something about the shape of the country not looking "right" -- that's because this is what the geographers call "unprojected," meaning that it uses latitude and longitude for its x and y coordinates directly, rather than transforming them to better represent either angles or areas.

ref:
* [Choosing the right map projection](https://source.opennews.org/articles/choosing-right-map-projection/)
* [geopandas.GeoDataFrame.plot()](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html)

In [44]:
#continental_gdf.crs

In [45]:
#continental_gdf = continental_gdf.to_crs("EPSG:2163")

In [46]:
#continental_gdf.crs

In [47]:
#continental_gdf.plot(figsize=(20, 40), column="STATEFP", cmap=cc.cm.glasbey_hv, edgecolor="0.2", linewidth=0.5)
#plt.axis("off")
#plt.show()

## The Census unit hierarchy

The core Census geographic units are organized into hierarchy (sometimes referred to as _the central spine_): states contain counties, which contain _tracts_, which contain _block groups_, which contain _blocks_. States and counties are (mostly) static political units with boundaries generally not determined by the Census; tracts, block groups, and blocks are statistical units defined by the Census. These statistical units are subject to change for each decennial Census. Blocks are the most granular unit in this hierarchy, and [many Census blocks are unpopulated](https://mapsbynik.com/maps/census0pop/).

![U.S. Census central spine (source: University of Missouri)](https://mcdc.missouri.edu/geography/sumlevs/censusgeochart.png)

_(image credit: University of Missouri)_

The Census also publishes data for political and statistical areas that do not fall neatly in this spine. For instance, school districts nest in states, but they do not necessarily nest in counties; voting districts nest in counties, but not necessarily in tracts.

Let's examine how the state of New York is broken up into various Census units, starting with counties—the least granular unit (below the state level) on the central spine.

### Counties
The U.S. Census releases a county shapefile at the national level; to plot just the counties in New York, we need to filter by state FIPS code.

### County subdivisions

For reporting purposes, the Census divides counties into _county subdivisions_. These subdivisions generally avoid dividing up preexisting political divisions: towns, cities, incorporated places, and the like. (For more on how county subdivisions are defined, see [chapter 8 of the _Geographic Areas Reference Manual_](https://www2.census.gov/geo/pdfs/reference/GARM/Ch8GARM.pdf).)

### Tracts
Census tracts are immediately below counties on the central spine. While counties in New York (and most states) vary wildly in population, the populations of Census tracts are much more constrained: [according to the U.S. Census](https://www.census.gov/programs-surveys/geography/about/glossary.html#par_textimage_13), "Census tracts generally have a population size between 1,200 and 8,000 people, with an optimum size of 4,000 people". Because of this uniformity, the density of Census tracts roughly corresponds to population density. In the map below, observe the high density of small tracts in major population centers like New York City and the low density of large tracts in the sparsely populated upstate regions.

In [48]:
# alt source: https://web.archive.org/web/20241003052404if_/https://www2.census.gov/geo/tiger/TIGER2024/TRACT/tl_2024_36_tract.zip
#!curl -O https://www2.census.gov/geo/tiger/TIGER2024/TRACT/tl_2024_26_tract.zip

In [49]:
#Mich_tract_gdf = gpd.read_file("tl_2024_26_tract.zip").set_index("GEOID").to_crs("EPSG:2163")

In [50]:
#Mich_tract_gdf.head(5)

In [51]:
#Mich_tract_gdf.plot(figsize=(10, 20), column="COUNTYFP", cmap=cc.cm.glasbey_hv, edgecolor="0.2", linewidth=0.5)
#plt.axis("off")
#plt.show()

We can filter by FIPS code to zoom in on a particular county. Here is Kings County, which is coterminous with the borough of Brooklyn.

# Introducing ACS data

In addition to publishing decennial data, the U.S. Census Bureau publishes [American Community Survey](https://www.census.gov/programs-surveys/acs) (ACS) data every year. The ACS is sent to a random sample of U.S. addresses; ACS data includes population estimates by race and ethnicity (which is useful for understanding demographic shifts between decennial Censuses), but it is also an invaluable source for understanding economic and social trends. As a simple example, let's take a look at a subset of the means of transportation data available in the ACS.

*Note:* You may eventually want to explore different kinds of information available from the ACS. One way to do this is to look at the associated documentation. Specifically, you can find relevant variables and table IDs by looking at the ["ACS Detailed Table Shells"](https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.2023.html#list-tab-79594641) located on the Census website. Additionally, we put together a [Google sheets workbook](https://docs.google.com/spreadsheets/d/1DtGNarbQLaJdtMiINQ7brQ-Y6zawBkkBkp0VGSENsZw/edit?usp=sharing) that organizes this information for the 2023-2015 ACS and the 2020 and 2010 decennial census.

In [52]:
Mich_ancestry_raw = census.acs5.get(
    (
        # --- your ancestry fields ---
        "B04006_001E", "B04006_002E", "B04006_007E", "B04006_008E",
        "B04006_009E", "B04006_010E", "B04006_011E", "B04006_012E",
        "B04006_013E", "B04006_015E", "B04006_016E", "B04006_017E",
        "B04006_048E", "B04006_050E", "B04006_082E", "B04006_084E",
        "B04006_091E",
        # --- NEW: race & ethnicity ---
        "B02001_002E",  # White alone
        "B02001_003E",  # Black or African American alone
        "B02001_007E",  # Some Other Race alone
        "B02001_008E",  # Two or More Races
        "B03003_003E",  # Hispanic or Latino
    ),
    geo={
        "for": "tract:*",
        "in": "state:36 county:*",
    },
    year=2022,
)


In [53]:
Mich_df = pd.DataFrame(Mich_ancestry_raw).rename(
    columns={
        #"B04006_006E": "Arab total",
        "B04006_001E": "total_pop",
        "B04006_002E": "Afgan",
        "B04006_007E": "Egyption",
        "B04006_008E": "Iraqui",
        "B04006_009E": "Jordanian",
        "B04006_010E": "Lebanese",
        "B04006_011E": "Moroccan",
        "B04006_012E": "Palistinian",
        "B04006_013E": "Syrian",
        #"B04006_014E": "Arab: Arab",
        "B04006_015E": "Other Arab",
        "B04006_016E": "Armenian",
        "B04006_017E": "Assyrian/Chaldean/Syriac",
        "B04006_048E": "Iranian",
        "B04006_050E": "Israel",
        "B04006_082E": "Somali",
        "B04006_084E": "Sudan",
        "B04006_091E": "Turkey",
        "B02001_002E": "White",
        "B02001_003E": "Black",  # Black or African American alone
        "B02001_007E": "SOR",  # Some Other Race alone
        "B02001_008E": "two_or_more",  # Two or More Races
        "B03003_003E": "hispanic",  # Hispanic or Latino
    }
)

In [54]:
Mich_df["pct_white"] = Mich_df["White"] / Mich_df["total_pop"]
Mich_df["pct_black"] = Mich_df["Black"] / Mich_df["total_pop"]
Mich_df["pct_SOR"] = Mich_df["SOR"] / Mich_df["total_pop"]
Mich_df["pct_two_or_more"] = Mich_df["two_or_more"] / Mich_df["total_pop"]
Mich_df["pct_hispanic"] = Mich_df["hispanic"]/ Mich_df["total_pop"]

This MENA category is from the World Bank

In [55]:
# 1 ─ list the ancestries that are “YES” in the World‑Bank MENA column
wb_cols = [
    "Egyption",     # Egypt
    "Iraqui",       # Iraq
    "Jordanian",    # Jordan
    "Lebanese",     # Lebanon
    "Moroccan",     # Morocco
    "Palistinian",  # Palestine
    "Syrian",       # Syria
    "Iranian",      # Iran
    "Israel"        # Israel
]

# 2 ─ (optional) sanity‑check that all required columns exist
missing = [c for c in wb_cols if c not in Mich_df.columns]
if missing:
    raise KeyError(f"Expected ancestry columns not found: {missing}")

# 3 ─ create the World Bank MENA 2003 indicator
#     NaNs are treated as 0 so the sum works even with missing values
Mich_df["world_bank"] = Mich_df[wb_cols].fillna(0).sum(axis=1)

# 4 ─ (optional) inspect the result
print(Mich_df[wb_cols + ["world_bank"]].head())

# (optional) share of total population
Mich_df["pct_world_bank"] = Mich_df["world_bank"] / Mich_df["total_pop"]

   Egyption  Iraqui  Jordanian  Lebanese  Moroccan  Palistinian  Syrian  \
0       0.0    14.0        0.0       0.0       0.0          0.0     0.0   
1       0.0     0.0        0.0       0.0       0.0          0.0     0.0   
2       0.0     0.0        0.0       0.0       0.0          0.0     0.0   
3       0.0     0.0        0.0       0.0       0.0          0.0     0.0   
4       0.0     0.0        0.0       0.0       0.0          0.0     0.0   

   Iranian  Israel  world_bank  
0      0.0     0.0        14.0  
1      0.0     0.0         0.0  
2      0.0     0.0         0.0  
3      0.0     0.0         0.0  
4      0.0     0.0         0.0  


In [56]:
Mich_df.sort_values(by=["world_bank"])

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,state,county,tract,pct_white,pct_black,pct_SOR,pct_two_or_more,pct_hispanic,world_bank,pct_world_bank
1617,1947.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,36,047,044600,0.448896,0.013867,0.082178,0.068824,0.190550,0.0,0.000000
2006,1951.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,36,047,120803,0.234239,0.642235,0.091235,0.019477,0.234239,0.0,0.000000
2007,4188.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,36,047,121000,0.214661,0.575215,0.167144,0.020774,0.457975,0.0,0.000000
2008,4649.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,36,047,121400,0.111207,0.680146,0.184556,0.022370,0.375134,0.0,0.000000
2009,7200.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,36,047,122000,0.011667,0.932778,0.037222,0.012083,0.058750,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2289,6583.0,8.0,0.0,42.0,0.0,0.0,0.0,0.0,0.0,0.0,...,36,059,300400,0.767127,0.000000,0.096764,0.036458,0.092055,816.0,0.123956
4334,4592.0,0.0,0.0,0.0,0.0,0.0,262.0,0.0,0.0,0.0,...,36,087,011508,0.992160,0.001089,0.001742,0.001307,0.007186,863.0,0.187936
2288,4477.0,0.0,0.0,29.0,0.0,38.0,0.0,0.0,0.0,0.0,...,36,059,300300,0.762564,0.024793,0.055841,0.020549,0.159482,1099.0,0.245477
2290,6180.0,0.0,16.0,33.0,0.0,0.0,104.0,0.0,0.0,190.0,...,36,059,300500,0.820712,0.003236,0.013592,0.018932,0.022816,1375.0,0.222492


This MENA category is from the UN High Commisioner for Refugees. 

In [57]:
# df  = your existing DataFrame
unhcr_cols = [
    "Egyption",    # Egypt
    "Iraqui",      # Iraq
    "Jordanian",   # Jordan
    "Lebanese",    # Lebanon
    "Moroccan",    # Morocco
    "Palistinian", # Palestine
    "Syrian"       # Syria
]

# make sure all expected columns are present
missing = [c for c in unhcr_cols if c not in Mich_df.columns]
if missing:
    raise KeyError(f"Missing ancestry columns: {missing}")

# add the UNHCR MENA total
Mich_df["unhcr"] = Mich_df[unhcr_cols].fillna(0).sum(axis=1)


# (optional) share of total population
Mich_df["pct_unhcr"] = Mich_df["unhcr"] / Mich_df["total_pop"]

# 4 ─ (optional) inspect the result
print(Mich_df[unhcr_cols + ["unhcr"]+["pct_unhcr"]].head())

   Egyption  Iraqui  Jordanian  Lebanese  Moroccan  Palistinian  Syrian  \
0       0.0    14.0        0.0       0.0       0.0          0.0     0.0   
1       0.0     0.0        0.0       0.0       0.0          0.0     0.0   
2       0.0     0.0        0.0       0.0       0.0          0.0     0.0   
3       0.0     0.0        0.0       0.0       0.0          0.0     0.0   
4       0.0     0.0        0.0       0.0       0.0          0.0     0.0   

   unhcr  pct_unhcr  
0   14.0   0.006197  
1    0.0   0.000000  
2    0.0   0.000000  
3    0.0   0.000000  
4    0.0   0.000000  


In [58]:
Mich_df.sort_values(by=["unhcr"])

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,tract,pct_white,pct_black,pct_SOR,pct_two_or_more,pct_hispanic,world_bank,pct_world_bank,unhcr,pct_unhcr
1832,4087.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,078200,0.011745,0.737950,0.100318,0.124541,0.226572,0.0,0.000000,0.0,0.000000
4456,1756.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,062100,0.928246,0.030752,0.020501,0.018793,0.082574,11.0,0.006264,0.0,0.000000
4455,2625.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,062002,0.961905,0.000381,0.022857,0.010667,0.022857,0.0,0.000000,0.0,0.000000
2296,5432.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,301000,0.844072,0.008100,0.018409,0.047865,0.060015,26.0,0.004786,0.0,0.000000
2298,4950.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,31.0,...,301102,0.779192,0.025657,0.014343,0.026465,0.038182,72.0,0.014545,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2707,7417.0,0.0,0.0,0.0,0.0,176.0,0.0,0.0,504.0,0.0,...,012601,0.885803,0.000000,0.052177,0.021033,0.057705,680.0,0.091681,680.0,0.091681
887,5387.0,0.0,25.0,0.0,53.0,24.0,0.0,0.0,586.0,0.0,...,007800,0.938927,0.009096,0.000186,0.049193,0.021533,688.0,0.127715,688.0,0.127715
4293,6875.0,0.0,696.0,0.0,0.0,0.0,0.0,0.0,0.0,45.0,...,029104,0.675927,0.022400,0.048145,0.073600,0.202036,696.0,0.101236,696.0,0.101236
1490,5223.0,0.0,0.0,51.0,86.0,447.0,118.0,0.0,0.0,0.0,...,031401,0.550833,0.007276,0.147999,0.038101,0.193758,702.0,0.134406,702.0,0.134406


This MENA category is the definition from the UN Statistics Division

In [59]:
# df = your existing DataFrame of ACS ancestry counts
unsd_cols = [
    "Egyption",    # Egypt
    "Iraqui",      # Iraq
    "Jordanian",   # Jordan
    "Lebanese",    # Lebanon
    "Moroccan",    # Morocco
    "Palistinian", # Palestine
    "Syrian",      # Syria
    "Armenian",    # Armenia
    "Israel",      # Israel
    "Sudan",       # Sudan
    "Turkey"       # Turkey
]

# sanity‑check that every column exists
missing = [c for c in unsd_cols if c not in Mich_df.columns]
if missing:
    raise KeyError(f"Missing ancestry columns: {missing}")

# add the World‑Bank MENA total
Mich_df["unsd"] = Mich_df[unsd_cols].fillna(0).sum(axis=1)

# (optional) percent of total population
Mich_df["pct_unsd"] = Mich_df["unsd"] / Mich_df["total_pop"]


In [60]:
Mich_df.sort_values(by=["unsd"])

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,pct_black,pct_SOR,pct_two_or_more,pct_hispanic,world_bank,pct_world_bank,unhcr,pct_unhcr,unsd,pct_unsd
4360,3351.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.623694,0.224112,0.017010,0.262608,0.0,0.000000,0.0,0.000000,0.0,0.000000
4298,3674.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.460261,0.042188,0.203048,0.399837,0.0,0.000000,0.0,0.000000,0.0,0.000000
1469,3599.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.600722,0.102528,0.057516,0.149764,0.0,0.000000,0.0,0.000000,0.0,0.000000
4300,1133.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.379523,0.105031,0.266549,0.347749,0.0,0.000000,0.0,0.000000,0.0,0.000000
4301,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,,,0.0,,0.0,,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1747,3910.0,0.0,0.0,0.0,0.0,0.0,293.0,0.0,0.0,198.0,...,0.007161,0.000000,0.112020,0.068031,293.0,0.074936,293.0,0.074936,766.0,0.195908
4289,6265.0,0.0,267.0,0.0,0.0,0.0,8.0,0.0,103.0,260.0,...,0.026656,0.000000,0.085235,0.115882,716.0,0.114286,378.0,0.060335,791.0,0.126257
4086,3982.0,0.0,145.0,0.0,0.0,0.0,0.0,0.0,198.0,18.0,...,0.013059,0.046208,0.017830,0.080864,541.0,0.135861,343.0,0.086138,811.0,0.203666
4334,4592.0,0.0,0.0,0.0,0.0,0.0,262.0,0.0,0.0,0.0,...,0.001089,0.001742,0.001307,0.007186,863.0,0.187936,262.0,0.057056,863.0,0.187936


This MENA category is the union of all 6 definitions listed on wikipedia, plus "Other Arab".

In [61]:
# df = your existing DataFrame of ACS ancestry counts
mena_cols = [
    "Afgan",
    "Egyption",
    "Iraqui",
    "Jordanian",
    "Lebanese",
    "Moroccan",
    "Palistinian",
    "Syrian",
    "Armenian",
    "Iranian",
    "Israel",
    "Somali",
    "Sudan",
    "Turkey",
    "Other Arab"
]

# check that every column is present
missing = [c for c in mena_cols if c not in Mich_df.columns]
if missing:
    raise KeyError(f"Missing ancestry columns: {missing}")

# add a total count for the listed MENA‑overlap ancestries
Mich_df["mena_overlap"] = Mich_df[mena_cols].fillna(0).sum(axis=1)

# (optional) share of total population
Mich_df["pct_mena_overlap"] = Mich_df["mena_overlap"] / Mich_df["total_pop"]


In [62]:
Mich_df.sort_values(by=["mena_overlap"])

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,pct_two_or_more,pct_hispanic,world_bank,pct_world_bank,unhcr,pct_unhcr,unsd,pct_unsd,mena_overlap,pct_mena_overlap
4301,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,0.0,,0.0,,0.0,,0.0,
3320,1737.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.054116,0.012090,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
4376,3184.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.129397,0.270415,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
2013,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,,,0.0,,0.0,,0.0,,0.0,
2014,2501.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.014394,0.012395,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2288,4477.0,0.0,0.0,29.0,0.0,38.0,0.0,0.0,0.0,0.0,...,0.020549,0.159482,1099.0,0.245477,67.0,0.014965,148.0,0.033058,1099.0,0.245477
887,5387.0,0.0,25.0,0.0,53.0,24.0,0.0,0.0,586.0,0.0,...,0.049193,0.021533,688.0,0.127715,688.0,0.127715,1274.0,0.236495,1274.0,0.236495
1315,4949.0,0.0,317.0,0.0,8.0,0.0,35.0,290.0,0.0,620.0,...,0.069711,0.264700,657.0,0.132754,650.0,0.131340,650.0,0.131340,1277.0,0.258032
2290,6180.0,0.0,16.0,33.0,0.0,0.0,104.0,0.0,0.0,190.0,...,0.018932,0.022816,1375.0,0.222492,153.0,0.024757,207.0,0.033495,1567.0,0.253560


In [63]:
Mich_df

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,pct_two_or_more,pct_hispanic,world_bank,pct_world_bank,unhcr,pct_unhcr,unsd,pct_unsd,mena_overlap,pct_mena_overlap
0,2259.0,0.0,0.0,14.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.084108,0.153165,14.0,0.006197,14.0,0.006197,14.0,0.006197,14.0,0.006197
1,2465.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.070994,0.070588,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
2,2374.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.031171,0.018955,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
3,2837.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.113148,0.237222,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
4,3200.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.126562,0.057188,0.0,0.000000,0.0,0.000000,26.0,0.008125,26.0,0.008125
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5406,2724.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,...,0.015051,0.043319,8.0,0.002937,8.0,0.002937,8.0,0.002937,8.0,0.002937
5407,1957.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.017885,0.002555,7.0,0.003577,7.0,0.003577,7.0,0.003577,7.0,0.003577
5408,3656.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.013950,0.010394,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000
5409,2333.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.000000,0.051865,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000


In [64]:
Mich_df["GEOID"] = (
    Mich_df["state"]
    + Mich_df["county"]
    + Mich_df["tract"]
)
#Mich_df = Mich_df[["GEOID"]]
#brooklyn_subway_usage_df["subway_pct"] = 100 * brooklyn_subway_usage_df["subway_commuter_count"] / brooklyn_subway_usage_df["commuter_count"]

In [65]:
Mich_df.sort_values(by=["Afgan"])

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,pct_hispanic,world_bank,pct_world_bank,unhcr,pct_unhcr,unsd,pct_unsd,mena_overlap,pct_mena_overlap,GEOID
0,2259.0,0.0,0.0,14.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.153165,14.0,0.006197,14.0,0.006197,14.0,0.006197,14.0,0.006197,36001000100
3558,1914.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.084639,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36081018600
3557,3078.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.223847,7.0,0.002274,0.0,0.000000,31.0,0.010071,38.0,0.012346,36081018502
3556,3636.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,9.0,118.0,...,0.301430,9.0,0.002475,9.0,0.002475,251.0,0.069032,369.0,0.101485,36081018501
3555,3017.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.174345,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36081018402
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2523,6007.0,259.0,26.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0,...,0.160646,26.0,0.004328,26.0,0.004328,26.0,0.004328,299.0,0.049775,36059519000
2414,8007.0,262.0,0.0,0.0,0.0,19.0,0.0,0.0,0.0,0.0,...,0.232172,29.0,0.003622,19.0,0.002373,108.0,0.013488,380.0,0.047458,36059410600
3988,3622.0,306.0,0.0,0.0,0.0,19.0,0.0,0.0,0.0,0.0,...,0.458310,19.0,0.005246,19.0,0.005246,19.0,0.005246,325.0,0.089729,36081092500
4828,7966.0,353.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.115867,0.0,0.000000,0.0,0.000000,0.0,0.000000,353.0,0.044313,36103158325


In [66]:
Mich_df = Mich_df[Mich_df.total_pop > 0]

In [67]:
Mich_df

Unnamed: 0,total_pop,Afgan,Egyption,Iraqui,Jordanian,Lebanese,Moroccan,Palistinian,Syrian,Other Arab,...,pct_hispanic,world_bank,pct_world_bank,unhcr,pct_unhcr,unsd,pct_unsd,mena_overlap,pct_mena_overlap,GEOID
0,2259.0,0.0,0.0,14.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.153165,14.0,0.006197,14.0,0.006197,14.0,0.006197,14.0,0.006197,36001000100
1,2465.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.070588,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36001000201
2,2374.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.018955,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36001000202
3,2837.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.237222,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36001000301
4,3200.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.057188,0.0,0.000000,0.0,0.000000,26.0,0.008125,26.0,0.008125,36001000302
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5406,2724.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8.0,0.0,...,0.043319,8.0,0.002937,8.0,0.002937,8.0,0.002937,8.0,0.002937,36123150301
5407,1957.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.002555,7.0,0.003577,7.0,0.003577,7.0,0.003577,7.0,0.003577,36123150302
5408,3656.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.010394,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36123150400
5409,2333.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.051865,0.0,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.000000,36123150501


In [68]:
# assume your DataFrame is called df
Mich_df.to_csv("Michigan_MENA.csv",              # output file name
          index=False,                # don’t write row numbers
          encoding="utf‑8",           # character set (default is fine for most cases)
          float_format="%.3f")        # optional: format floats
