## Scraping migration flows data for The Bronx, New York

This notebook scrapes data from the [US Census American Community Survey (ACS) Migration Flows API](https://www.census.gov/data/developers/data-sets/acs-migration-flows.html), which estimates the number of people who moved between counties over a five-year period. This scraper uses the US Census Bureau's 2018-2022 ACS data set.

Although the data set records some migration from countries abroad, it does not capture all of it. It mainly documents state-to-county migrations, county-to-county migrations, and smaller-scale migration flows (e.g., minor civil division (MCD) to minor civil division).

The scraper can be reused for other counties or states by changing the FIPS codes defined later in the notebook. This notebook, however, focuses on Bronx County, New York.


In the cell below, we import the necessary libraries:

In [1]:
import requests 
import pandas as pd

In this cell, we create a dictionary called "target_dict" whose keys are the target variables [listed by the US Census](https://api.census.gov/data/2022/acs/flows/variables.html). The dictionary values are the names we'll give those variables once they are scraped and added to a dataframe.

In [2]:
# Map each Census variable code to a descriptive column name

target_dict = {
    "COUNTY1" : "reference_fips_county_code",
    "COUNTY1_NAME" : "reference_county_name",
    "STATE1_NAME" : "reference_state_name",
    "MCD1" : "reference_mcd_code",
    "MCD1_NAME" : "reference_mcd_name",
    "NONMOVERS" : "total_same_residence",
    "NONMOVERS_M" : "margin_error_same_residence", 
    "COUNTY2" : "migrated_from_county_code",
    "COUNTY2_NAME" : "migrated_from_county_name",
    "STATE2_NAME" : "migrated_from_location_name", 
    "FROMABROAD" : "county_total_movers_from_abroad",
    "FROMABROAD_M" : "margin_error_movers_from_abroad",
    "FROMDIFFCTY" : "county_total_movers_diff_county",
    "FROMDIFFCTY_M" : "margin_error_movers_diff_county",
    "FROMDIFFMCD" : "county_total_movers_diff_mcd",
    "FROMDIFFMCD_M" : "margin_error_movers_diff_mcd",
    "FROMDIFFMETRO" : "county_total_movers_diff_metro",
    "FROMDIFFMETRO_M" : "margin_error_diff_metro_area",
    "FROMDIFFSTATE" : "county_total_movers_diff_state",
    "FROMDIFFSTATE_M" : "margin_error_diff_state",
    "FROMELSEWHEREUSPR" : "county_total_movers_diff_country",
    "FROMELSEWHEREUSPR_M" : "margin_error_diff_country", 
    
}

In the cells below, we join the "target_dict" keys into a single comma-separated string called "target_variables", which will be used in our API request later in the notebook.

In [3]:
type(target_dict.keys())

dict_keys

In [4]:
target_variables = ",".join(target_dict.keys())
target_variables

'COUNTY1,COUNTY1_NAME,STATE1_NAME,MCD1,MCD1_NAME,NONMOVERS,NONMOVERS_M,COUNTY2,COUNTY2_NAME,STATE2_NAME,FROMABROAD,FROMABROAD_M,FROMDIFFCTY,FROMDIFFCTY_M,FROMDIFFMCD,FROMDIFFMCD_M,FROMDIFFMETRO,FROMDIFFMETRO_M,FROMDIFFSTATE,FROMDIFFSTATE_M,FROMELSEWHEREUSPR,FROMELSEWHEREUSPR_M'
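The Census Data API user guide describes a limit of 50 variables per query. The 22 variables above fit within that limit, but a longer dictionary would have to be split across several requests. A minimal sketch of such a split, where `chunk_variables` and the shortened sample list are hypothetical and not part of the original notebook:

```python
def chunk_variables(variables, max_per_query=50):
    """Split variable names into comma-joined strings of at most max_per_query names.

    The default of 50 is an assumption based on the Census Data API user guide.
    """
    variables = list(variables)
    return [
        ",".join(variables[start:start + max_per_query])
        for start in range(0, len(variables), max_per_query)
    ]


# Illustrative input: five variable names, at most two per query.
sample = ["COUNTY1", "COUNTY1_NAME", "STATE1_NAME", "NONMOVERS", "NONMOVERS_M"]
chunks = chunk_variables(sample, max_per_query=2)
print(chunks)
```

Each string in the returned list could then take the place of `target_variables` in a separate request.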

Below we piece together the components of our API request:
- a base_url.
- a target_county variable holding the county FIPS code, which can be swapped for a county other than The Bronx when needed.
- a target_state variable holding the state FIPS code, which can be swapped for a state other than New York when needed.

In [5]:
base_url = "https://api.census.gov/data/2022/acs/flows?"
target_county = "005"
target_state = "36"
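FIPS codes are fixed-width strings: state codes have two digits and county codes have three, so The Bronx is "005" rather than 5. A small sketch of a helper that zero-pads codes before they go into the request; `format_fips` is a hypothetical name that does not appear in the original notebook:

```python
def format_fips(code, width):
    """Return a FIPS code as a zero-padded string of the given width."""
    text = str(code).strip()
    if not text.isdigit():
        raise ValueError(f"FIPS code must be numeric, got {code!r}")
    if len(text) > width:
        raise ValueError(f"FIPS code {code!r} is longer than {width} digits")
    return text.zfill(width)


county_code = format_fips(5, width=3)     # county codes have three digits
state_code = format_fips("36", width=2)   # state codes have two digits
print(county_code, state_code)
```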

Here we create our query_string, which fills in the values of target_variables, target_county, and target_state from the previous cells.

In [6]:
query_string = f"get={target_variables}&for=county:{target_county}&in=state:{target_state}"
query_string

'get=COUNTY1,COUNTY1_NAME,STATE1_NAME,MCD1,MCD1_NAME,NONMOVERS,NONMOVERS_M,COUNTY2,COUNTY2_NAME,STATE2_NAME,FROMABROAD,FROMABROAD_M,FROMDIFFCTY,FROMDIFFCTY_M,FROMDIFFMCD,FROMDIFFMCD_M,FROMDIFFMETRO,FROMDIFFMETRO_M,FROMDIFFSTATE,FROMDIFFSTATE_M,FROMELSEWHEREUSPR,FROMELSEWHEREUSPR_M&for=county:005&in=state:36'
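The same query string could also be assembled with the standard library's `urllib.parse.urlencode`, which escapes any unexpected characters. This is an alternative sketch rather than the notebook's method; `build_query_string` is a hypothetical helper, and `safe=",:"` keeps the commas and colons unescaped, as in the f-string version:

```python
from urllib.parse import urlencode


def build_query_string(variables, county, state):
    """Build the get/for/in portion of a Census API query."""
    params = {
        "get": ",".join(variables),
        "for": f"county:{county}",
        "in": f"state:{state}",
    }
    # safe=",:" leaves commas and colons as-is instead of percent-encoding them.
    return urlencode(params, safe=",:")


example = build_query_string(["COUNTY1", "NONMOVERS"], county="005", state="36")
print(example)
```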

Before continuing, we must hide our US Census API key so that it is not publicly accessible and cannot be used without our permission. To do this, we save the API key in a .env file in the same location as this notebook and load it with load_dotenv.

In [7]:
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [8]:
# Load the API key from the .env file so it never appears in the notebook
import os
from dotenv import load_dotenv 
load_dotenv()

True

In [9]:
census_api_key = os.getenv("CENSUS_API_KEY")

In [10]:
api_parameter = f"&key={census_api_key}"
api_parameter

'&key=INSERT CENSUS API KEY'
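`os.getenv` returns `None` when the variable is not set, in which case the f-string above would quietly produce `&key=None`. A minimal guard against that, assuming a hypothetical helper `require_env` and a placeholder variable name that are not part of the original notebook:

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to the .env file next to the notebook")
    return value


# Demonstration with a placeholder value rather than a real key.
os.environ["EXAMPLE_CENSUS_API_KEY"] = "placeholder"
example_parameter = f"&key={require_env('EXAMPLE_CENSUS_API_KEY')}"
print(example_parameter)
```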

Finally, we assemble our endpoint in the cell below:

In [11]:
endpoint = base_url + query_string + api_parameter
endpoint

'https://api.census.gov/data/2022/acs/flows?get=COUNTY1,COUNTY1_NAME,STATE1_NAME,MCD1,MCD1_NAME,NONMOVERS,NONMOVERS_M,COUNTY2,COUNTY2_NAME,STATE2_NAME,FROMABROAD,FROMABROAD_M,FROMDIFFCTY,FROMDIFFCTY_M,FROMDIFFMCD,FROMDIFFMCD_M,FROMDIFFMETRO,FROMDIFFMETRO_M,FROMDIFFSTATE,FROMDIFFSTATE_M,FROMELSEWHEREUSPR,FROMELSEWHEREUSPR_M&for=county:005&in=state:36&key=INSERT CENSUS API KEY'
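Displaying the endpoint also displays the API key, so the output has to be edited by hand before the notebook is shared. One way to avoid that is to mask the key before printing. A sketch using a hypothetical `redact_key` helper and a made-up key value:

```python
import re


def redact_key(url):
    """Replace the value of the `key` query parameter with a placeholder."""
    return re.sub(r"([?&]key=)[^&]*", r"\1REDACTED", url)


# "abc123" is a made-up key used only for illustration.
example_url = (
    "https://api.census.gov/data/2022/acs/flows"
    "?get=COUNTY1&for=county:005&in=state:36&key=abc123"
)
print(redact_key(example_url))
```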

To confirm that the endpoint is reachable and the request is valid, we send the request and check the response's status code in the cell below:

In [12]:
response = requests.get(endpoint)
response.status_code

200
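A status code of 200 means the request succeeded; any other code should stop the notebook before `response.json()` is called. A hedged sketch of such a check, where `ensure_ok` is a hypothetical helper and the stand-in objects mimic only the `status_code` and `text` attributes of a `requests` response:

```python
from types import SimpleNamespace


def ensure_ok(response):
    """Raise a RuntimeError unless the response has status code 200."""
    if response.status_code != 200:
        snippet = response.text[:200]  # keep the error message short
        raise RuntimeError(f"Census API returned {response.status_code}: {snippet}")
    return response


# Stand-in objects so the example runs without a network call.
ok = SimpleNamespace(status_code=200, text="[]")
bad = SimpleNamespace(status_code=400, text="error: unknown variable")

ensure_ok(ok)
try:
    ensure_ok(bad)
except RuntimeError as err:
    message = str(err)
    print(message)
```

With `requests`, calling `response.raise_for_status()` is a built-in alternative that raises `requests.HTTPError` for 4xx and 5xx responses.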

Next, we convert the response into JSON and store it in "data". Then we display every row after the first one, which holds the column names:

In [13]:
data = response.json()
data[1:]

[['005',
  'Bronx County',
  'New York',
  None,
  None,
  '1310403',
  '4638',
  None,
  None,
  'Alabama',
  '9919',
  '1396',
  '31106',
  '2730',
  None,
  None,
  None,
  None,
  '8544',
  '1037',
  '103902',
  '4627',
  '36',
  '005'],
 ['005',
  'Bronx County',
  'New York',
  None,
  None,
  '1310403',
  '4638',
  None,
  None,
  'Arizona',
  '9919',
  '1396',
  '31106',
  '2730',
  None,
  None,
  None,
  None,
  '8544',
  '1037',
  '103902',
  '4627',
  '36',
  '005'],
 ['005',
  'Bronx County',
  'New York',
  None,
  None,
  '1310403',
  '4638',
  None,
  None,
  'California',
  '9919',
  '1396',
  '31106',
  '2730',
  None,
  None,
  None,
  None,
  '8544',
  '1037',
  '103902',
  '4627',
  '36',
  '005'],
 ['005',
  'Bronx County',
  'New York',
  None,
  None,
  '1310403',
  '4638',
  None,
  None,
  'Colorado',
  '9919',
  '1396',
  '31106',
  '2730',
  None,
  None,
  None,
  None,
  '8544',
  '1037',
  '103902',
  '4627',
  '36',
  '005'],
 ['005',
  'Bronx County',
 

To make the scraped information easier to read, we load it into a dataframe and then rename its columns to the values listed in the "target_dict" variable:

In [14]:
df = pd.DataFrame(data[1:], columns=data[0])
df

Unnamed: 0,COUNTY1,COUNTY1_NAME,STATE1_NAME,MCD1,MCD1_NAME,NONMOVERS,NONMOVERS_M,COUNTY2,COUNTY2_NAME,STATE2_NAME,...,FROMDIFFMCD,FROMDIFFMCD_M,FROMDIFFMETRO,FROMDIFFMETRO_M,FROMDIFFSTATE,FROMDIFFSTATE_M,FROMELSEWHEREUSPR,FROMELSEWHEREUSPR_M,state,county
0,5,Bronx County,New York,,,1310403,4638,,,Alabama,...,,,,,8544,1037,103902,4627,36,5
1,5,Bronx County,New York,,,1310403,4638,,,Arizona,...,,,,,8544,1037,103902,4627,36,5
2,5,Bronx County,New York,,,1310403,4638,,,California,...,,,,,8544,1037,103902,4627,36,5
3,5,Bronx County,New York,,,1310403,4638,,,Colorado,...,,,,,8544,1037,103902,4627,36,5
4,5,Bronx County,New York,,,1310403,4638,,,Delaware,...,,,,,8544,1037,103902,4627,36,5
5,5,Bronx County,New York,,,1310403,4638,,,District of Columbia,...,,,,,8544,1037,103902,4627,36,5
6,5,Bronx County,New York,,,1310403,4638,,,Florida,...,,,,,8544,1037,103902,4627,36,5
7,5,Bronx County,New York,,,1310403,4638,,,Georgia,...,,,,,8544,1037,103902,4627,36,5
8,5,Bronx County,New York,,,1310403,4638,,,Illinois,...,,,,,8544,1037,103902,4627,36,5
9,5,Bronx County,New York,,,1310403,4638,,,Indiana,...,,,,,8544,1037,103902,4627,36,5
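The API returns a list of lists in which the first inner list holds the column names and every later list is one row, which is why `data[0]` supplies the columns and `data[1:]` supplies the rows. The sketch below shows the same pairing in plain Python on a shortened, illustrative payload (only four of the columns, with values copied from the output above):

```python
# Shortened stand-in for the API payload: header row first, then data rows.
sample_data = [
    ["COUNTY1_NAME", "STATE2_NAME", "NONMOVERS", "FROMDIFFSTATE"],
    ["Bronx County", "Alabama", "1310403", "8544"],
    ["Bronx County", "Arizona", "1310403", "8544"],
]

header, rows = sample_data[0], sample_data[1:]

# Pair each value with its column name, one dictionary per row.
records = [dict(zip(header, row)) for row in rows]
print(records[0])
```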


In [15]:
df_bronx = df.rename(columns = target_dict)
df_bronx

Unnamed: 0,reference_fips_county_code,reference_county_name,reference_state_name,reference_mcd_code,reference_mcd_name,total_same_residence,margin_error_same_residence,migrated_from_county_code,migrated_from_county_name,migrated_from_location_name,...,county_total_movers_diff_mcd,margin_error_movers_diff_mcd,county_total_movers_diff_metro,margin_error_diff_metro_area,county_total_movers_diff_state,margin_error_diff_state,county_total_movers_diff_country,margin_error_diff_country,state,county
0,5,Bronx County,New York,,,1310403,4638,,,Alabama,...,,,,,8544,1037,103902,4627,36,5
1,5,Bronx County,New York,,,1310403,4638,,,Arizona,...,,,,,8544,1037,103902,4627,36,5
2,5,Bronx County,New York,,,1310403,4638,,,California,...,,,,,8544,1037,103902,4627,36,5
3,5,Bronx County,New York,,,1310403,4638,,,Colorado,...,,,,,8544,1037,103902,4627,36,5
4,5,Bronx County,New York,,,1310403,4638,,,Delaware,...,,,,,8544,1037,103902,4627,36,5
5,5,Bronx County,New York,,,1310403,4638,,,District of Columbia,...,,,,,8544,1037,103902,4627,36,5
6,5,Bronx County,New York,,,1310403,4638,,,Florida,...,,,,,8544,1037,103902,4627,36,5
7,5,Bronx County,New York,,,1310403,4638,,,Georgia,...,,,,,8544,1037,103902,4627,36,5
8,5,Bronx County,New York,,,1310403,4638,,,Illinois,...,,,,,8544,1037,103902,4627,36,5
9,5,Bronx County,New York,,,1310403,4638,,,Indiana,...,,,,,8544,1037,103902,4627,36,5
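Every value in the API response arrives as a string (or `None`), so the estimate and margin-of-error columns need converting before any arithmetic. A minimal sketch with `pd.to_numeric` on a small illustrative frame; which columns to treat as numeric is an assumption made for this example:

```python
import pandas as pd

# Illustrative frame with string values, as returned by the API.
sample = pd.DataFrame(
    {
        "migrated_from_location_name": ["Alabama", "Arizona"],
        "total_same_residence": ["1310403", "1310403"],
        "county_total_movers_diff_mcd": [None, None],
    }
)

numeric_columns = ["total_same_residence", "county_total_movers_diff_mcd"]
for column in numeric_columns:
    # errors="coerce" turns values that cannot be parsed into NaN instead of raising.
    sample[column] = pd.to_numeric(sample[column], errors="coerce")

print(sample.dtypes)
```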


Here we count how many rows the dataframe holds for each state or location that movers previously lived in. Each location listed appears once, so these are row counts, not numbers of movers:

In [16]:
df_bronx["migrated_from_location_name"].value_counts()

migrated_from_location_name
Alabama                 1
Virginia                1
North Dakota            1
Ohio                    1
Oklahoma                1
Oregon                  1
Pennsylvania            1
Rhode Island            1
South Carolina          1
Tennessee               1
Texas                   1
Washington              1
New York                1
West Virginia           1
Puerto Rico             1
Africa                  1
Asia                    1
Central America         1
Caribbean               1
Europe                  1
U.S. Island Areas       1
Northern America        1
North Carolina          1
New Mexico              1
Arizona                 1
Kansas                  1
California              1
Colorado                1
Delaware                1
District of Columbia    1
Florida                 1
Georgia                 1
Illinois                1
Indiana                 1
Iowa                    1
Kentucky                1
New Jersey              1
Louisiana 

Finally, we save our df_bronx dataframe as a CSV file in our output folder:

In [17]:
df_bronx.to_csv("../output/bronx_migration_flows_2022.csv")
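By default `to_csv` also writes the dataframe index, which comes back as an `Unnamed: 0` column when the file is read again, and `read_csv` parses FIPS codes such as `005` as the integer 5 unless told otherwise. A hedged sketch of a round trip that avoids both, using an in-memory buffer and a one-row illustrative frame instead of the output folder:

```python
import io

import pandas as pd

sample = pd.DataFrame(
    {
        "reference_fips_county_code": ["005"],
        "reference_county_name": ["Bronx County"],
    }
)

buffer = io.StringIO()
sample.to_csv(buffer, index=False)  # index=False leaves out the row index
buffer.seek(0)

# dtype=str keeps every column as text so leading zeros survive.
restored = pd.read_csv(buffer, dtype=str)
print(restored)
```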