Link for all available US Census APIs: https://www.census.gov/data/developers/data-sets.html

Convert JSON to CSV data: https://www.convertcsv.com/json-to-csv.htm

# Relevant Databases
# PART A - American Community Survey (ACS)

## American Community Survey 1-Year Data (2005-2021)
The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Much of the ACS data provided on the Census Bureau's Web site are available separately by age group, race, Hispanic origin, and sex.

Detailed Tables, Subject Tables, Data Profiles, Comparison Profiles and Selected Population Profiles are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of **65,000 or more**.

## American Community Survey 1-Year Supplemental Data (2014 - 2021)
The supplemental estimates consist of high-level detailed tables tabulated on the 1-year microdata for **geographies with populations of 20,000 or more. The intention of this product is to give areas with smaller populations key estimates that are more current than the 5-Year file**.

These high-level estimates are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 20,000 or more.

Variable coding: https://api.census.gov/data/2021/acs/acsse/variables.html

## American Community Survey 5-Year Data (2009-2021)
The 5-year estimates from the ACS are "period" estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.

The 5-year estimates are available for all geographies down to the block group level. See Supported Geography for details on each product’s published summary levels. **In total, there are 87 different summary levels available with over 578,000 geographic areas. Unlike the 1-year estimates, geographies do not have to meet a particular population threshold in order to be published**. Detailed Tables, Subject Tables, Data Profiles, and Comparison Profiles include the following geographies: nation, all states (including DC and Puerto Rico), all metropolitan areas, all congressional districts (116th Congress), all counties, all places, all tracts and block groups.
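
The population thresholds described for the three ACS products can be summarized in a small helper. This is only a sketch: the function name is hypothetical, and the returned strings simply mirror the path segments used in the API URLs in these notes (`acs/acs1`, `acs/acsse`, `acs/acs5`).

```python
def choose_acs_product(population: int) -> str:
    """Return the most current ACS product published for an area of this size.

    Thresholds follow the notes: 65,000+ for 1-year data, 20,000+ for the
    1-year supplemental data, and no threshold for 5-year data.
    """
    if population >= 65_000:
        return "acs/acs1"   # 1-year estimates
    if population >= 20_000:
        return "acs/acsse"  # 1-year supplemental estimates
    return "acs/acs5"       # 5-year estimates


# Hypothetical population sizes, for illustration only
for pop in (1_500_000, 40_000, 3_200):
    print(pop, choose_acs_product(pop))
```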

ACS has non-overlapping datasets that allow comparisons of current ACS data to past ACS data.  **The 2017-2021 ACS 5-Year estimates can be compared with 2012-2016 ACS 5-Year estimates**.  For information on comparability of the 2017-2021 ACS 5-Year estimates to the 2012-2016 estimates by topic, please visit the Comparing 2021 American Community Survey Data page.

- Detailed Tables contain the most detailed cross-tabulations, many of which are published down to block groups. The data are population counts. There are over 20,000 variables in this dataset.
- Subject Tables provide an overview of the estimates available in a particular topic.  The data are presented as population counts and percentages.  There are over 18,000 variables in this dataset. 
- Data Profiles contain broad social, economic, housing, and demographic information. The data are presented as population counts and percentages. There are over 1,000 variables in this dataset.
- Comparison Profiles are similar to Data Profiles but also include comparisons with past-year data.  The current year data are compared with prior 5-Year data and include statistical significance testing.  There are over 1,000 variables in this dataset.
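
As a rough illustration of the significance testing mentioned for Comparison Profiles, the sketch below compares two non-overlapping estimates using their margins of error. It assumes the usual ACS convention that published margins of error are at the 90% confidence level (so the standard error is the margin divided by 1.645). The function name and the sample numbers are invented, and this is not necessarily the exact procedure the Census Bureau uses for the profiles.

```python
import math

Z_90 = 1.645  # critical value for the 90% confidence level used by ACS margins of error


def is_significant_change(est_new, moe_new, est_old, moe_old, z_crit=Z_90):
    """Return True if two independent estimates differ significantly."""
    se_new = moe_new / Z_90          # margin of error -> standard error
    se_old = moe_old / Z_90
    se_diff = math.sqrt(se_new ** 2 + se_old ** 2)
    if se_diff == 0:
        return est_new != est_old    # no sampling error reported
    z = (est_new - est_old) / se_diff
    return abs(z) > z_crit


# Invented estimates and margins of error, for illustration only
print(is_significant_change(54_000, 1_500, 50_000, 1_400))
print(is_significant_change(50_500, 1_500, 50_000, 1_400))
```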

**I recommend using Comparison Profiles if we intend to use this dataset**.
**Variable coding for 2021 comparison profiles: https://api.census.gov/data/2021/acs/acs5/cprofile/variables.html**

## American Community Survey Migration Flows
These migration flows are derived from the household and group quarter locations sampled in the American Community Survey (ACS) and the responses to the migration questions on the questionnaire.

They are period estimates that measure where people lived when surveyed (current residence) and where they lived 1 year prior (residence 1 year ago). **The data are collected continuously over a 5-year period in order to provide a large enough sample for estimates in smaller geographies. The flow estimates resemble the annual number of movers between counties for the 5-year period data was collected.**

The flow files produced are:

- **County-to-county**: available starting with the 2006-2010 5-year ACS.
- **County/minor civil division (MCD)-to-county/MCD**: provides flows between MCDs for the 12 strong-MCD states and counties in the other states. The strong-MCD states are states where the MCDs, a type of county subdivision, serve as general purpose governments and provide the same government functions as incorporated places. The strong-MCD states are Connecticut, Maine, Massachusetts, Michigan, Minnesota, New Hampshire, New Jersey, New York, Pennsylvania, Rhode Island, Wisconsin, and Vermont. These files have been available starting with the 2006-2010 5-year ACS.
- **Metropolitan statistical area (MSA)-to-MSA**: available starting with the 2009-2013 5-year ACS.

For each dataset starting with the 2006-2010 5-year ACS, the flow files have been crossed by selected characteristics. A characteristic will not be reused in overlapping 5-year ACS datasets. For instance, age was one of the characteristics for 2006-2010. Age flows will not be produced again until the 2011-2015 flows. These flows with characteristics are subject to suppression for disclosure avoidance purposes. County and county/MCD flows are suppressed if they consist of only one or two people in different households or group quarters. Complementary suppression is used for the MSA flows.
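
To make the idea of flows between a residence 1 year ago and a current residence concrete, here is a minimal sketch that turns origin-to-destination flow records into net migration per county. The record layout (origin, destination, movers), the function name, and the numbers are all hypothetical; the actual API uses its own variable names, listed on the variable coding page.

```python
from collections import defaultdict


def net_migration(flows):
    """Sum inflows minus outflows for each area.

    flows: iterable of (residence_1_year_ago, current_residence, movers)
    """
    net = defaultdict(int)
    for origin, destination, movers in flows:
        net[destination] += movers  # people who moved in
        net[origin] -= movers       # people who moved out
    return dict(net)


# Invented flow records, for illustration only
sample_flows = [
    ("County A", "County B", 120),
    ("County B", "County A", 80),
    ("County A", "County C", 30),
]
print(net_migration(sample_flows))
```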

More info on this dataset: https://www.census.gov/data/developers/data-sets/acs-migration-flows.html
Variable coding for 2020 dataset: https://api.census.gov/data/2020/acs/flows/variables.html

**We may or may not need this dataset. Most of the variables are counts of people moving into or out of a geographic division. I think it would be hard to incorporate this dataset into our research, though there might be some ways to do so. However, their interactive mapping tools might be helpful.**

### Example: ACS 2021 1-year data

### Detailed Tables 
Detailed Tables contain the most detailed cross-tabulations published for areas with populations of 65,000 or more. The data are population counts. There are over 31,000 variables in this dataset.

Variable Coding: https://api.census.gov/data/2021/acs/acs1.html and https://api.census.gov/data/2021/acs/acs1/variables.html

```python
import csv

import requests
from bs4 import BeautifulSoup

# Fetch the HTML content from the URL
url = 'https://api.census.gov/data/2021/acs/acs1.html'
response = requests.get(url, timeout=60)
response.raise_for_status()  # stop early on an HTTP error

# Parse the HTML content using BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the first table element on the page
table = soup.find('table')
if table is None:
    raise ValueError(f'No table element found at {url}')

# Split the rows into one header row (th cells) and the data rows (td cells),
# without assuming the page uses thead/tbody
header = []
data = []
for row in table.find_all('tr'):
    data_cells = row.find_all('td')
    header_cells = row.find_all('th')
    if data_cells:
        data.append([td.get_text(strip=True) for td in data_cells])
    elif header_cells and not header:
        header = [th.get_text(strip=True) for th in header_cells]

# Write the data to a CSV file
with open('output.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)
    writer.writerows(data)
```

In the original environment this cell failed with `TypeError: __init__() got an unexpected keyword argument 'strict'`. That error usually comes from an outdated beautifulsoup4 release running on a newer Python 3; upgrading beautifulsoup4 typically resolves it.
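
An alternative to scraping HTML is to read the variable list as JSON and search it by keyword. The sketch below assumes the list is served at the same path with `variables.json` in place of `variables.html`, and that each entry carries `label` and `concept` fields; the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed JSON counterpart of the variables.html page linked in these notes
VARIABLES_URL = "https://api.census.gov/data/2021/acs/acs1/variables.json"


def fetch_variables(url=VARIABLES_URL):
    """Download the variable dictionary (requires network access)."""
    with urlopen(url, timeout=60) as resp:
        return json.load(resp)["variables"]


def search_variables(variables, keyword):
    """Return sorted (name, concept, label) tuples whose text contains keyword."""
    keyword = keyword.lower()
    hits = []
    for name, meta in variables.items():
        label = meta.get("label", "")
        concept = meta.get("concept", "")
        if keyword in f"{label} {concept}".lower():
            hits.append((name, concept, label))
    return sorted(hits)
```

For example, `search_variables(fetch_variables(), "poverty")` would list the poverty-related variables, which is a quicker way to shortlist candidates than reading the full HTML table.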

Geography level examples: https://api.census.gov/data/2021/acs/acs1/examples.html

```python
# e.g.: https://api.census.gov/data/2021/acs/acs1?get=NAME,B05010_001E&for=county:*&in=state:06&key=YOUR_API_KEY
# get=NAME,B05010_001E&for=county:*&in=state:06
# Gets the county name and B05010_001E - RATIO OF INCOME TO POVERTY LEVEL IN THE PAST 12 MONTHS BY NATIVITY OF CHILDREN UNDER 18 YEARS IN FAMILIES AND SUBFAMILIES BY LIVING ARRANGEMENTS AND NATIVITY OF PARENTS
# for all counties in California.
# We could convert the JSON returned by this URL to a CSV file through the converter link given above.
```
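
The query described above can also be built and converted to CSV directly in Python instead of through a web converter. This is a minimal sketch: the helper names are hypothetical, and it relies on the API returning a JSON array of rows whose first row is the header.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/2021/acs/acs1"


def build_query_url(variables, for_clause, in_clause=None, key=None):
    """Assemble a Census API query URL from its parts."""
    params = {"get": ",".join(variables), "for": for_clause}
    if in_clause:
        params["in"] = in_clause
    if key:
        params["key"] = key  # personal API key, if one is used
    # Keep ',', ':' and '*' readable instead of percent-encoding them
    return f"{BASE_URL}?{urlencode(params, safe=',:*')}"


def rows_to_csv(rows):
    """Convert the API's list-of-lists response into CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


url = build_query_url(["NAME", "B05010_001E"], "county:*", "state:06")
print(url)
```

With network access, something like `rows = requests.get(url, timeout=60).json()` followed by `rows_to_csv(rows)` should yield the same table the online converter produces.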

Answers from ChatGPT:

In the 2021 American Community Survey 1-Year Data detailed table, there are several variables that are related to income, race, and ethnicity. Here are some examples:

1. Income-related variables:
- Median household income (in dollars)
- Per capita income (in dollars)
- Percentage of households with income below poverty level
- Percentage of individuals with income below poverty level
- Percentage of households with income between $50,000 and $99,999
- Percentage of households with income over $100,000

2. Race-related variables:
- Total population by race
- Percentage of population by race (e.g., White alone, Black or African American alone, American Indian and Alaska Native alone, Asian alone, Native Hawaiian and Other Pacific Islander alone, Some other race alone, Two or more races)
- Median age by race
- Percentage of population that identifies as Hispanic or Latino

3. Ethnicity-related variables:
- Total population by ethnicity (Hispanic or Latino and Not Hispanic or Latino)
- Percentage of population by ethnicity (Hispanic or Latino and Not Hispanic or Latino)
- Median age by ethnicity
- Percentage of households with limited English proficiency

Note that there are many more variables in the 2021 American Community Survey 1-Year Data detailed table that are related to income, race, and ethnicity. The variables listed above are just a few examples.

### Subject tables
Subject tables provide an overview of the estimates available in a particular topic. The data are presented as population counts and percentages. There are over 16,000 variables in this dataset.

Variable Coding: https://api.census.gov/data/2021/acs/acs1/subject/variables.html

Answers from ChatGPT:

1. Income: 
- Median Household Income 
- Per Capita Income 
- Income Below Poverty Level 
- Median Earnings for Full-Time, Year-Round Workers 

2. Employment: 
- Employment Status 
- Industry 
- Occupation 
- Worked Full-Time, Year-Round 
- Worked Part-Time, Less Than Year-Round 
- Unemployment Rate 

3. Commute: 
- Commuting to Work 
- Mean Travel Time to Work 
- Vehicles Available 
- Means of Transportation to Work 

4. Race: 
- Race Alone or in Combination 
- White 
- Black or African American 
- American Indian and Alaska Native 
- Asian 
- Native Hawaiian and Other Pacific Islander 

5. Ethnicity: 
- Hispanic or Latino Origin 
- Not Hispanic or Latino Origin 
- Hispanic or Latino Origin by Race 
- Not Hispanic or Latino Origin by Race

### Data profiles
Data profiles contain broad social, economic, housing, and demographic information. The data are presented as population counts and percentages. There are over 1,000 variables in this dataset.

Variable Coding: https://api.census.gov/data/2021/acs/acs1/profile/variables.html

Answers from ChatGPT:

Income: 
- Median household income 
- Median family income 
- Per capita income 
- Percentage of households with income below poverty level 
- Percentage of individuals with income below poverty level 

Employment: 
- Percentage of population 16 years and over in labor force 
- Percentage of population 16 years and over employed 
- Percentage of population 16 years and over unemployed 
- Percentage of employed population working in management, business, science, and arts occupations 

Commute: 
- Percentage of workers 16 years and over commuting by car, truck, or van 
- Percentage of workers 16 years and over commuting by public transportation 
- Average commute time in minutes 

Race: 
- Percentage of population by race: White, Black or African American, American Indian and Alaska Native, Asian, Native Hawaiian and Other Pacific Islander, and Some Other Race 

Ethnicity: 
- Percentage of population by Hispanic or Latino origin and by race 
- Percentage of households with a householder who is Hispanic or Latino origin

### Comparison profiles
Comparison profiles are similar to data profiles but also include comparisons with past-year data. The current year data are compared with each of the last four years of data and include statistical significance testing. There are over 1,000 variables in this dataset.

Variable Coding: https://api.census.gov/data/2021/acs/acs1/cprofile/variables.html

**I would recommend pulling data from Comparison Profiles if we choose to work with ACS: the more than 1,000 comparison variables cover most of the topics we want to work on.**

### Selected Population Profiles
Selected Population Profiles provide broad social, economic, and housing profiles for a large number of race, ethnic, ancestry, and country/region of birth groups.  The data are presented as population counts for the total population and various subgroups and percentages.

Variable Coding: https://api.census.gov/data/2021/acs/acs1/spp/variables.html

**I would also recommend pulling data from Selected Population Profiles if we choose ACS, to get some summary numbers.**

**I also attached the 2021-1yr-api-changes (variable changes) in the file.**

**The datasets from other years are pretty much the same.**

# PART B - Population Estimates and Projections

## Population Estimates Program
Each year, the Census Bureau's Population Estimates Program uses current data on births, deaths, and migration to calculate population change since the most recent decennial census and produces a time series of estimates of population, demographic components of change, and housing units. The annual time series of estimates begins with the most recent decennial census data and extends to the vintage year.

**I don't think we need this dataset now. However, if we want to show a relationship between population and changes in air quality in the future, we might come back to this dataset.**

**Link for more info: https://www.census.gov/data/developers/data-sets/popest-popproj.html** 

# PART C - Poverty Statistics: CPS & SAIPE (Time Series: various years)

Link for this dataset: https://www.census.gov/data/developers/data-sets/Poverty-Statistics.html

**Sources of Poverty Data**
- The CPS ASEC provides the most timely and accurate national data on income and is the official source of national poverty estimates, hence it is the preferred source for national analysis.  The CPS ASEC provides a consistent historical time series beginning in **1959 at the national level and can also be used to look at state-level trends and differences going back to 1980. However, the relatively large sampling errors of state-level estimates for smaller states somewhat limit their usefulness.**
- **The American Community Survey (ACS) releases annual subnational estimates of income and poverty for all places, counties, and metropolitan areas with a population of at least 65,000 as well as the nation and the states. The sample size of this survey is over 3 million addresses per year, making the ACS exceptionally useful for subnational analyses.** Three-year period estimates are available for areas and subpopulations as small as 20,000 and five-year period estimates are available for all geographies, including census tracts, block groups and small subgroups of the population. ACS estimates are updated every year. Because of its large sample size, estimates from the fully implemented ACS provide the best survey-based state level income and poverty estimates. The ACS was fully-implemented in 2006.
- The Small Area Income and Poverty Estimates (SAIPE) program produces single-year estimates of median household income and poverty for states and all counties, as well as population and poverty estimates for school districts. Since SAIPE estimates combine ACS data with administrative and other data, **SAIPE estimates generally have lower variance than ACS estimates but are released later because they incorporate ACS data in the models. For counties and school districts, particularly those with populations below 65,000, the SAIPE program provides the most accurate subnational estimates of poverty. For counties, SAIPE generally provides the best single-year estimates of median household income.**

**These datasets might be a great alternative if we want to study race and poverty level over a longer period of time, though they offer only a few variables.**

# PART D - Business Dynamics Statistics (BDS) Time Series (1978 – 2020)

Link: https://www.census.gov/data/developers/data-sets/business-dynamics.html

The Business Dynamics Statistics (BDS) dataset is available as a Census Bureau API. BDS provides annual measures of business dynamics (such as job creation and destruction, establishment births and deaths, and firm startups and shutdowns) for the economy and aggregated by establishment and firm characteristics.

For more information on the BDS, see the BDS homepage. For information about some of the key differences between the data available in the BDS API and the 2020 BDS CSV Datasets, users should refer to the following document: Data Tables & CSV Datasets.

**I left this dataset here for reference in case we want to do some really interesting analysis relating variables like industry evolution and job creation/destruction to air quality.**

# PART E - County Business Patterns (1986-2021) and Nonemployer Statistics (1997-2019)

Link: https://www.census.gov/data/developers/data-sets/cbp-nonemp-zbp.html

County Business Patterns provides annual statistics for businesses with paid employees within the U.S., Puerto Rico, and Island Areas at a detailed geography and industry level.

Statistics are available on business establishments at the U.S. level and by State, County, Metropolitan area, ZIP code levels, and congressional districts (beginning in the 2013 reference year). Data for Puerto Rico and the Island Areas are available at the State and county equivalent levels. CBP covers most NAICS industries.

**For the same reason as above, I left these APIs here for reference.**

# PART F - Longitudinal Employer-Household Dynamics (LEHD)

## Quarterly Workforce Indicators (QWI) (Time Series: 1990 - present)

Link: https://www.census.gov/data/developers/data-sets/qwi.html

The QWI are a set of 32 economic indicators including employment, job creation/destruction, wages, hires, and other measures of employment flows. The QWI are reported based on detailed firm characteristics (geography, industry, age, size) and worker demographics (sex, age, education, race, ethnicity) and are available tabulated to national, state, metropolitan/micropolitan areas, county, and workforce investment areas (WIA). The QWI are unique in their ability to track both firm and worker characteristics over time – enabling analyses such as a longitudinal look at wages by worker sex and age across counties, ranking job creation rates of young firms across NAICS industry groups, and comparing hiring levels by worker race and education levels across a selection of metropolitan areas.

**These datasets might be an alternative if we want to query employment and related variables over a longer period of time.**

# PART G - Annual Public Sector Statistics (1942 to present)

Link: https://www.census.gov/data/developers/data-sets/annual-public-sector-stats.html

The U.S. Census Bureau Public Sector Statistics provides annual and Census of Governments (for years ending in 2 and 7) statistics about state and local governments that are essential to understanding the American economy. These statistics provide the most comprehensive and precise measure of U.S. state and local governments’ economic activity, which comprise nearly 90 thousand governments. It also serves as a statistical benchmark for current economic activity, such as, the National Income and Product Accounts and the Gross Domestic Product. This API is presented as a time-series for data from 2012 forward for all public sector surveys, except Government Organization, which is available for 1942 forward.

Data are available for all state and local governments in the United States. Local governments include:

- counties
- cities
- townships
- special districts (such as water districts, fire districts, library districts, mosquito abatement districts, and so on)
- school districts

These data provide policy analysts, researchers, and the general public with a more complete and clear picture of the public sector, provide information to assist in addressing the issues that concern state and local governments, and serve as the foundation for developing national economic and public policy.

**I don't know what we could do with this dataset right now, though there might be some interesting connections if we want to analyze policy and air quality. So I left this dataset here for reference.**

# PART H - Economic Indicators (Time Series: various years - present)

Link: https://www.census.gov/data/developers/data-sets/economic-indicators.html

The U.S. Census Bureau's economic indicator surveys provide monthly and quarterly data that are timely, reliable, and offer comprehensive measures of the U.S. economy. These surveys produce a variety of statistics covering construction, housing, international trade, retail trade, wholesale trade, services and manufacturing. The survey data provide measures of economic activity that allow analysis of economic performance and inform business investment and policy decisions. Other data included, which are not considered principal economic indicators, are the Quarterly Summary of State & Local Taxes, Quarterly Survey of Public Pensions, and the Manufactured Homes Survey.
For information on the reliability and use of the data, including important notes on estimation and sampling variance, seasonal adjustment, measures of sampling variability, and other information pertinent to the economic indicators, visit the individual programs' webpages linked from the Economic Briefing Room.

**This is another big set of APIs about US economic indicators. I left it here for reference.**

# PART I - Geography Program

## Census Geocoding Services

Link: https://www.census.gov/data/developers/data-sets/Geocoding-services.html

### Geocoding Basics

Geocoding is the process of inputting an address and receiving back latitude/longitude coordinates calculated along an address range. The parts of the address provided to the geocoding application determine the level of detail of the geocode returned. The building number and street name are required. City name, state, and ZIP code are optional.

### Geocoding Services Details
There are two entry points for the geocoding service: single-record submission and batch submission.

The single-record service allows for all of the address parts to be submitted in a single line or as separate fields. The batch service requires each field to exist (either with text or blank) in a delimited form, preceded by a unique ID.
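
A batch input file in that delimited form can be sketched as follows. The column order assumed here (unique ID, street address, city, state, ZIP) reflects my understanding of the batch format and should be checked against the geocoder documentation; the helper name and the sample addresses are only illustrative.

```python
import csv
import io


def build_batch_file(addresses):
    """Return CSV text with one 'id,street,city,state,zip' row per address.

    Missing optional fields are written as blanks so that every field exists.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for addr in addresses:
        writer.writerow([
            addr["id"],
            addr["street"],
            addr.get("city", ""),
            addr.get("state", ""),
            addr.get("zip", ""),
        ])
    return buffer.getvalue()


# Illustrative input records
batch_text = build_batch_file([
    {"id": "1", "street": "4600 Silver Hill Rd", "city": "Washington", "state": "DC", "zip": "20233"},
    {"id": "2", "street": "1600 Pennsylvania Ave NW"},
])
print(batch_text)
```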

The optional inclusion of the Geographic Lookup (GeoLookup) adds information to the result relating to various levels of geography that encompass the latitude/longitude coordinates. GeoLookup results can also be obtained directly by searching on the latitude/longitude coordinates.

The latitude/longitude coordinates returned are based on data loaded into the geocoding engine from a MAF/TIGER benchmark database.
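
A single-record lookup might be sketched like this. The endpoint, the `Public_AR_Current` benchmark name, and the shape of the JSON response (`result.addressMatches[].coordinates` with `x` as longitude and `y` as latitude) are assumptions based on the public geocoder and should be verified against its documentation; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed single-line address endpoint of the Census geocoder
GEOCODER_URL = "https://geocoding.geo.census.gov/geocoder/locations/onelineaddress"


def build_geocode_url(address, benchmark="Public_AR_Current"):
    """Build the request URL for a one-line address lookup."""
    params = {"address": address, "benchmark": benchmark, "format": "json"}
    return f"{GEOCODER_URL}?{urlencode(params)}"


def extract_coordinates(payload):
    """Return (latitude, longitude) of the first match, or None if no match."""
    matches = payload.get("result", {}).get("addressMatches", [])
    if not matches:
        return None
    coords = matches[0]["coordinates"]
    return coords["y"], coords["x"]  # y is latitude, x is longitude


print(build_geocode_url("4600 Silver Hill Rd, Washington, DC 20233"))
```

With network access, the parsed JSON from that URL (for example via `requests.get(...).json()`) could be passed to `extract_coordinates` to get a point for mapping.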

### Audience
This document is intended for application, website, and mobile developers.

## Census TIGERweb GeoServices REST API

Link: https://www.census.gov/data/developers/data-sets/TIGERweb-map-service.html

### GeoServices REST Specification
The GeoServices REST Specification provides a way for Web clients to communicate with geographic information system (GIS) servers through Representational State Transfer (REST) technology.

The specification is:

- A proven and easy to understand method for a broad range of clients and applications to request map, feature, attribute, and image information from a GIS server.
- A JSON-based, REST-ful specification that will make the GIS server instantly usable by thousands of developers working in popular client-side development environments with the ArcGIS Web mapping APIs for JavaScript™, Flex™, Silverlight®, iOS®, and Android™.

Use of the GeoServices REST Specification is subject to the current Open Web Foundation Agreement. The Open Web Foundation (OWF) is an independent non-profit dedicated to the development and protection of open, non-proprietary specifications for web technologies. Terms and conditions of the OWF Agreement are subject to change without notice.

Documentation and specifications for this service are owned and maintained by Esri and can be downloaded from esri.com/opengeoservices.

**These features might be HELPFUL when we build interactive visualizations, and might also help us with the analysis process.**