Lab 9 Exercises
===============
Please see the example code
and companion video for an
introduction to these topics.


In [1]:
# install this very alpha library directly from github
!pip install https://github.com/mcuringa/cartopy/raw/refs/heads/main/dist/maptools-latest.tar.gz -q

!pip install census us plotly nbformat -q

from maptools import census_vars
import pandas as pd
import geopandas as gpd
import plotly.express as px
import us
from census import Census

# from google.colab import userdata
# api_key = userdata.get('CENSUS_API_KEY')
import os
api_key = os.getenv('CENSUS_API_KEY')

# create one Census object that we will reuse
c = Census(api_key)


In [2]:
census_vars.search("foreign born")

Unnamed: 0,group,concept,match
10044,B05014,Sex By Age For The Foreign-Born Population,81.68%
4388,B05013,Sex By Age For The Foreign-Born Population,81.68%
2732,B05006,Place Of Birth For The Foreign-Born Population In The United States,62.87%
20632,B05015,Place Of Birth By Year Of Entry For The Foreign-Born Population,60.82%
19909,B05008,Sex By Place Of Birth By Year Of Entry For The Foreign-Born Population,58.37%
6624,B05007,Place Of Birth By Year Of Entry By Citizenship Status For The Foreign-Born Population,55.30%
21560,B25091,Mortgage Status By Selected Monthly Owner Costs As A Percentage Of Household Income In The Past 12 Months,0.00%
1944,B25092,Median Selected Monthly Owner Costs As A Percentage Of Household Income In The Past 12 Months,0.00%
8564,B25093,Age Of Householder By Selected Monthly Owner Costs As A Percentage Of Household Income In The Past 12 Months,0.00%
16210,B25094,Selected Monthly Owner Costs,0.00%


Foreign-Born Population
=======================
Table **B05006**, _Place of Birth for the Foreign-Born Population in the United States_,
provides details on the birthplace of foreign-born individuals living in the U.S.

You will see that I trimmed the fields to only include Latin American & Caribbean countries,
and only include "named" countries (not "other" or "unknown" categories).
The "universe" for this table is the total foreign-born population (**B05006_001E**),
so I also pulled in **B01003_001E** to get the total population (i.e. every resident).

Note from the [Census Bureau FAQ](https://www.census.gov/topics/population/foreign-born/about/faq.html):

> The foreign-born population is composed of anyone who is 
> not a U.S. citizen at birth. This includes naturalized U.S. 
> citizens, non-citizen U.S. nationals, lawful permanent 
> residents (immigrants), temporary migrants (such as foreign 
> students), humanitarian migrants (such as refugees and 
> asylees), and unauthorized migrants.

In the code below, I create a subset of variables
from table B05006, and then load them from the census.
I create some lists of column names to make it easier
to get the subsets you might need for the exercises.

After loading the census data for every county in NYS,
I join it with the TIGER shapes for counties in NYS in
order to add county names to the data.

The resulting DataFrame is called `df`.
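
As a rough illustration of the join described above, here is a minimal sketch on toy data. The two small DataFrames and their numbers are made up for the example; only the column names (`county`, `county_name`) follow the lab code.

```python
import pandas as pd

# Toy stand-in for the census result: one row per county, keyed by county FIPS code.
# The counts are invented for illustration.
census = pd.DataFrame({
    "county": ["001", "005", "047"],
    "total_foreign_born": [100, 250, 400],
})

# Toy stand-in for the TIGER attribute table after renaming COUNTYFP and NAME.
shapes = pd.DataFrame({
    "county": ["001", "005", "047", "061"],
    "county_name": ["Albany", "Bronx", "Kings", "New York"],
})

# Inner join on the shared FIPS column; "061" has no census row here, so it drops out.
named = census.merge(shapes[["county", "county_name"]], on="county")

# Swap the FIPS code for the readable name.
named = named.drop(columns=["county"]).rename(columns={"county_name": "county"})
named = named[["county", "total_foreign_born"]]
print(named)
```

Both `county` columns hold strings here. If one side stored the code as an integer, the keys would not line up, so it is worth checking the dtypes before merging.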


In [3]:
field_names = census_vars.get_table("B05006")

# there are lots of columns in this table
# run display() to see them all
# display(field_names)

field_names = {
    'B01003_001E': 'total_pop',
    'B05006_001E': 'total_foreign_born',
    # caribbean
    'B05006_132E': 'bahamas', 
    'B05006_133E': 'barbados',
    'B05006_134E': 'cuba',
    'B05006_135E': 'dominica',
    'B05006_136E': 'dominican_republic',
    'B05006_137E': 'grenada',
    'B05006_138E': 'haiti',
    'B05006_139E': 'jamaica',
    'B05006_140E': 'st_vincent_and_the_grenadines',
    'B05006_141E': 'trinidad_and_tobago',
    'B05006_142E': 'west_indies',
    # central america & mexico
    'B05006_150E': 'mexico',
    'B05006_145E': 'belize',
    'B05006_146E': 'costa_rica',
    'B05006_147E': 'el_salvador',
    'B05006_148E': 'guatemala',
    'B05006_149E': 'honduras',
    'B05006_151E': 'nicaragua',
    'B05006_152E': 'panama',
    # south america
    'B05006_155E': 'argentina', 
    'B05006_156E': 'bolivia',
    'B05006_157E': 'brazil',
    'B05006_158E': 'chile',
    'B05006_159E': 'colombia',
    'B05006_160E': 'ecuador',
    'B05006_161E': 'guyana',
    'B05006_162E': 'peru',
    'B05006_163E': 'uruguay',
    'B05006_164E': 'venezuela',
}

# some useful variables for getting subsets of cols
caribe = ['bahamas', 'barbados', 'cuba', 
          'dominica', 'dominican_republic', 
          'grenada', 'haiti', 'jamaica', 'st_vincent_and_the_grenadines', 
          'trinidad_and_tobago', 'west_indies']

central = ['mexico', 'belize', 'costa_rica', 'el_salvador',
           'guatemala', 'honduras', 'nicaragua', 'panama']

south = ['argentina', 'bolivia', 'brazil', 'chile', 'colombia',
         'ecuador', 'guyana', 'peru', 'uruguay', 'venezuela']

# get the census data for every county in NYS
fields = list(field_names.keys())
data = c.acs5.get(fields=fields, geo={ 'for': 'county:*', 'in': f'state:{us.states.NY.fips}'}, year=2022)

df = pd.DataFrame(data)
df.rename(columns=field_names, inplace=True)
df["statefp"] = df["state"]
df["state"] = df.statefp.apply(census_vars.lookup_state)

# merge with the county names
county_url = "https://www2.census.gov/geo/tiger/TIGER2022/COUNTY/tl_2022_us_county.zip"
counties = gpd.read_file(county_url)
ny_counties = counties[counties.STATEFP == us.states.NY.fips].copy()
ny_counties.rename(columns={"COUNTYFP": "county", "NAME": "county_name"}, inplace=True)

df = df.merge(ny_counties[["county", "county_name"]], on="county")
df.drop(columns=["statefp", "county", "state"], inplace=True)
df.rename(columns={"county_name": "county"}, inplace=True)
# reorder the columns
df = df[["county"] + list(field_names.values())]
df.head()

Unnamed: 0,county,total_pop,total_foreign_born,bahamas,barbados,cuba,dominica,dominican_republic,grenada,haiti,...,argentina,bolivia,brazil,chile,colombia,ecuador,guyana,peru,uruguay,venezuela
0,Albany,315041.0,35961.0,29.0,25.0,8.0,0.0,19.0,17.0,8341.0,...,0.0,79.0,301.0,110.0,26.0,388.0,0.0,32.0,0.0,2646.0
1,Allegany,47222.0,821.0,0.0,9.0,0.0,0.0,0.0,0.0,117.0,...,0.0,0.0,5.0,3.0,0.0,2.0,0.0,0.0,0.0,5.0
2,Bronx,1443229.0,489783.0,168.0,0.0,0.0,0.0,0.0,0.0,371366.0,...,1491.0,437.0,3705.0,4660.0,13514.0,38746.0,2045.0,1158.0,0.0,42207.0
3,Broome,198365.0,15084.0,0.0,0.0,0.0,0.0,0.0,0.0,3801.0,...,0.0,14.0,35.0,73.0,104.0,364.0,0.0,10.0,0.0,1114.0
4,Cattaraugus,77000.0,1206.0,0.0,0.0,0.0,0.0,0.0,0.0,574.0,...,0.0,2.0,6.0,16.0,3.0,60.0,0.0,0.0,0.0,128.0


Problem 1: Styler
-----------------
The code below makes a copy of the data with the columns
for the Caribbean countries.
It calculates a total Caribbean population and percent of population
for each county.

Use the `style` property of DataFrame to create
a formatted table that summarizes the Caribbean-born population of each county.

You should:
- sort the data by `caribe_pct` in descending order
- show just the county, total_pop, caribe_pop, and caribe_pct columns
- use `style` to format the table with commas and percents
- add a nice title to the table


In [4]:
# use this code to get started
table = df[["county", "total_pop"] + caribe].copy()
table["caribe_pop"] = table[caribe].sum(axis=1)
table["caribe_pct"] = table.caribe_pop / table.total_pop


Problem 2: Bar Chart
--------------------
Make a basic, horizontal bar chart that shows the total foreign-born population
in the 10 NYS counties with the largest foreign-born populations.
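
One way to pick out the largest rows before charting is `DataFrame.nlargest`. The sketch below uses a small made-up table and takes the top 3 rather than the top 10; the values and placeholder county names are assumptions.

```python
import pandas as pd

# Made-up values, for illustration only.
demo = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "total_foreign_born": [500, 12000, 300, 7000],
})

# nlargest keeps the n rows with the biggest values, sorted descending.
top = demo.nlargest(3, "total_foreign_born")

# Horizontal bar charts typically draw the first row at the bottom,
# so sort ascending to put the largest bar on top.
top = top.sort_values("total_foreign_born")
print(top)
```

A frame shaped like `top` could then be passed to `px.bar(top, x="total_foreign_born", y="county", orientation="h")`.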

In [5]:
# write your code here

Problem 3: Grouped Bar Chart
----------------------------
Follow the example of calculating the Caribbean population and percentage
in **Problem 1**:

```python
table["caribe_pop"] = table[caribe].sum(axis=1)
table["caribe_pct"] = table.caribe_pop / table.total_pop
```
Make a copy of `df` that has caribe_pct, central_pct,
and south_pct.

Make a grouped bar chart showing these percentages in the five NYC counties.
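
Grouped bar charts are often easiest to build from "long" data, with one row per county and region. Here is a minimal sketch of that reshaping step with `melt`, using made-up percentages and placeholder county names.

```python
import pandas as pd

# Made-up percentages, for illustration only.
wide = pd.DataFrame({
    "county": ["A", "B"],
    "caribe_pct": [0.10, 0.02],
    "central_pct": [0.03, 0.01],
    "south_pct": [0.05, 0.04],
})

# melt stacks the three percentage columns into (region, pct) pairs,
# giving one row per county and region.
long = wide.melt(id_vars="county", var_name="region", value_name="pct")
print(long)
```

With the data in this shape, something like `px.bar(long, x="county", y="pct", color="region", barmode="group")` should place the three bars side by side for each county.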

In [6]:
# write your code here

Bonus Problem
-------------
- Get only the data for the five NYC counties, the LI counties (Nassau, Suffolk),
  and the northern suburbs (Westchester, Rockland, Orange)
- Add a column called `region` that assigns "nyc" to the NYC counties, "li" to the LI counties,
  and "north" to the northern suburbs
- `groupby()` region using aggregate functions
- Show a formatted table with the results (you decide)
- Make a chart with the results (you decide what to display)
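
The region step might be sketched as below, using a dictionary lookup and `groupby()`. The counts are made up, only some of the counties are included, and the `region_of` dictionary is a hypothetical helper, not part of the lab code.

```python
import pandas as pd

# Made-up counts; the county names come from the bonus problem.
demo = pd.DataFrame({
    "county": ["Kings", "Queens", "Nassau", "Suffolk", "Westchester", "Albany"],
    "total_pop": [1000, 900, 500, 600, 400, 300],
    "total_foreign_born": [350, 400, 100, 90, 100, 30],
})

# Map each county to a region; counties missing from the dict become NaN.
region_of = {
    "Kings": "nyc", "Queens": "nyc",
    "Nassau": "li", "Suffolk": "li",
    "Westchester": "north",
}
demo["region"] = demo.county.map(region_of)

# Drop counties outside the three regions, then sum within each region.
by_region = (
    demo.dropna(subset=["region"])
    .groupby("region")[["total_pop", "total_foreign_born"]]
    .sum()
)

# Recompute the share from the summed counts rather than averaging county percentages.
by_region["foreign_born_pct"] = by_region.total_foreign_born / by_region.total_pop
print(by_region)
```

Summing the counts and then dividing weights each county by its population. Taking the mean of county-level percentages would treat a small county the same as a large one.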



In [7]:
# bonus code