# CYPLAN101 Fall 2025 - Lab 4: ACS Data Exploration with Python

## Learning Objectives:
* Define U.S. Census and ACS data sources
* Compare decennial census with ACS estimates and identify strengths and limitations
* Identify spatial extent of census data
* Discuss similarities and differences between federal data schemas
* Outline and apply tools that enable access and application of Census Data

---

## Part I. Why Geography Matters

- Geography shapes how the U.S. Census Bureau samples and aggregates data.
- **Geography ≠ visualization only**: it's embedded in data collection, sampling units, and margin-of-error computation.
- Example of misunderstanding: comparing population estimates without aligning geographic levels.

### Geographic Hierarchy of the U.S. Census

- Geographic areas are organized in a geographic hierarchy.

- Larger units, like states, include smaller units, like counties and census tracts.

- This structure is derived from the legal, administrative, or areal relationships of the entities.

<img src="https://www.census.gov/programs-surveys/acs/geography-acs/concepts-definitions/_jcr_content/root/responsivegrid/expcolaccordioncore209/section_1/imagecore.coreimg.jpeg/1678878894635/acs-geographic-hierarchy.jpeg" style="height:500px" />

<p>Image source: census.gov</p>

<!-- - Nation → State → County → Place (City) → [Zip Code Tabulation Area (ZCTA)](https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html) → Tract → Block Group -->
- Tracts: ~4,000 people; Block Groups: ~1,500
- **Note**: ACS does not publish at the *block* level.
- Explore glossary: [Census Glossary](https://www.census.gov/programs-surveys/geography/about/glossary.html)


### FIPS Codes (Federal Information Processing Series)

The Census Bureau has used **FIPS codes** for over 30 years. These are standardized numeric codes used to identify political and statistical areas such as:

* States
* Counties
* Cities and towns
* Native American areas

---


## ❗️ Try it Out ❗️
FIPS codes are usually **assigned alphabetically** within each level of geography. Importantly, they’re **nested**—for example:

* Texas has a state FIPS code of `48`
* Harris County within Texas has a county FIPS of `201`
* So Harris County's full FIPS code is `48201`

This nesting ensures each code is **unique nationwide**.
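
The nesting above can be sketched in a few lines of Python. This is only an illustration: the helper name `build_county_fips` is hypothetical, and storing the codes as zero-padded strings is a choice made here so that leading zeros (as in California's `06`) are not lost.

```python
def build_county_fips(state, county):
    """Return the 5-digit county FIPS code from its state and county parts.

    Accepts ints or strings and zero-pads them to 2 and 3 digits.
    """
    return str(state).zfill(2) + str(county).zfill(3)


print(build_county_fips("48", "201"))  # 48201 (Harris County, TX)
print(build_county_fips(6, 1))         # 06001 (Alameda County, CA)
```

If the codes were stored as integers instead, `06001` would silently become `6001` and no longer match other datasets.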

## Step 1 - Navigate to Census Geocoder Tool
Navigate to the following link:  [https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress?form](https://geocoding.geo.census.gov/geocoder/geographies/onelineaddress?form)

## Step 2 - Type in an address
Retain default values for benchmark and vintage.

## Step 3 - Enter the GEOID and the expected FIPS code for the **TRACT** in the code cell below
Formatting instructions are included in the comments.
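
Tracts are often written as a name with a decimal point (for example, `4236.02`), while the six-digit tract code has no decimal point (`423602`). If you only have the name, the sketch below shows one way to derive the code, assuming the usual layout of a four-digit base number followed by a two-digit suffix. The function name `tract_name_to_code` is hypothetical.

```python
def tract_name_to_code(tract_name):
    """Convert a tract name like '4236.02' or '4001' to a 6-digit code."""
    # Split into the base number and the optional suffix after the decimal point.
    base, _, suffix = str(tract_name).partition(".")
    # Pad the base to 4 digits on the left and the suffix to 2 digits on the right.
    return base.zfill(4) + suffix.ljust(2, "0")


print(tract_name_to_code("4236.02"))  # 423602
print(tract_name_to_code("4001"))     # 400100
```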


In [None]:
# A. Based on the tabular data returned for the address you inputted, what is the expected FIPS code?
state_code = "06"
county_code = "001"
tract_code = "423602"

# The GEOID is the complete FIPS code (concatenation of state + county + tract)
fips_code = state_code + county_code + tract_code

# Printing your FIPS Code
print(f"FIPS Code Breakdown:")
print(f"  State Code (2 digits): {state_code}")
print(f"  County Code (3 digits): {county_code}")
print(f"  Tract Code (6 digits): {tract_code}")
print(f"  Complete FIPS Code (11 digits): {fips_code}")


# B. Based on the tabular data returned for the address you inputted, what GEOID is associated with the address?
my_GEOID = "06001423602"

print(f"The GEOID for my inputted address is: {my_GEOID}")
print(f"Are the GEOID and complete FIPS Code the same? {my_GEOID == fips_code}")


FIPS Code Breakdown:
  State Code (2 digits): 06
  County Code (3 digits): 001
  Tract Code (6 digits): 423602
  Complete FIPS Code (11 digits): 06001423602
The GEOID for my inputted address is: 06001423602
Are the GEOID and complete FIPS Code the same? True


---
### Geographic Identifiers (GEOIDs)


To work with U.S. Census and survey data, we need a way to *consistently identify places*. That’s where **GEOIDs** (Geographic Identifiers) come in.

GEOIDs are numeric codes that uniquely identify all administrative/legal and statistical geographic areas for which the Census Bureau tabulates data.

They link demographic data to specific locations such as states, counties, cities, or even blocks.

These codes are maintained by different organizations, including:

* The **U.S. Census Bureau**
* The **American National Standards Institute (ANSI)**
* The **U.S. Geological Survey (USGS)**
* The **Department of Education**
* Individual **state governments**


From Alaska, the largest state, to the smallest census block in New York City, every geographic area has a unique GEOID.

Data users rely on GEOIDs to accurately **join** demographic data from sources like the American Community Survey (ACS) to different geographic areas for analysis, interpretation, and mapping. Without a shared identifier between geographic and demographic datasets, it becomes much harder to match the right data to the right place, slowing down analysis and increasing the risk of errors.
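
A join on a shared GEOID might look like the sketch below. Both tables are hypothetical stand-ins, and the population values are placeholders rather than real estimates. The GEOIDs are kept as strings so that leading zeros are preserved on both sides of the join.

```python
import pandas as pd

# Hypothetical attribute table; the population values are placeholders.
demographics = pd.DataFrame({
    "GEOID": ["06001", "48201"],
    "population": [100, 200],
})

# Hypothetical geography table keyed by the same string GEOIDs.
places = pd.DataFrame({
    "GEOID": ["06001", "48201"],
    "NAME": ["Alameda County, CA", "Harris County, TX"],
})

# validate="one_to_one" raises an error if a GEOID is duplicated on either side.
joined = places.merge(demographics, on="GEOID", how="left", validate="one_to_one")
print(joined)
```

Using `how="left"` keeps every geography even when no demographic row matches, which makes missing data easy to spot as `NaN`.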

> #### Tip: GEOIDs != Geoid
> In geodesy and GIS, the **geoid** is a model of global mean sea level that is used to measure surface elevations ([NOAA - Geoid](https://oceanservice.noaa.gov/facts/geoid.html)). This has nothing to do with our GEOIDs!

### Putting It Together: How GEOIDs Are Built

GEOIDs often combine multiple codes to represent geographic nesting. For example:


| Area Type          | GEOID Structure                | Example (Alameda County, CA) |
| ------------------ | ------------------------------ | ---------------------------- |
| State              | STATE (2 digits)               | `06`                         |
| County             | STATE + COUNTY (2+3)           | `06001`                      |
| Census Tract       | STATE + COUNTY + TRACT (2+3+6) | `06001400100`                |
| Census Block Group | STATE + COUNTY + TRACT + BG    | `060014001001`               |

**Breakdown**:

* `06` = California
* `001` = Alameda County
* `400100` = Census Tract 4001.00 (in, e.g., Oakland)
* `1` = Block Group 1 within that tract
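
The same breakdown can be sketched as a small function. It is a minimal illustration that only handles the four levels in the table above (other geographies, such as places, use different lengths), and the name `split_geoid` is hypothetical.

```python
def split_geoid(geoid):
    """Split a state, county, tract, or block-group GEOID into its parts."""
    # Widths of each nested component, in order.
    widths = [("state", 2), ("county", 3), ("tract", 6), ("block_group", 1)]
    # Cumulative lengths: state=2, county=5, tract=11, block group=12.
    valid_lengths = {2, 5, 11, 12}
    if len(geoid) not in valid_lengths or not geoid.isdigit():
        raise ValueError(f"Unexpected GEOID: {geoid!r}")
    parts, start = {}, 0
    for name, width in widths:
        if start >= len(geoid):
            break
        parts[name] = geoid[start:start + width]
        start += width
    return parts


print(split_geoid("060014001001"))
# {'state': '06', 'county': '001', 'tract': '400100', 'block_group': '1'}
```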

This hierarchical structure enables precise geographic targeting and seamless joining of tabular data with spatial data in GIS or data analysis tools.

---

## ❗️ Try it Out ❗️
Here you will try downloading data from [data.census.gov](https://data.census.gov) and finding the GEOID in downloaded census data.

## Step 1: Begin by searching for a geography using the basic search tool


## Step 2: Locate a table of your choosing
For example, you might look for the American Community Survey (ACS) Demographic and Housing Estimates (1-year) for Berkeley in 2023 (that's table DP05, for reference).

## Step 3: Download the corresponding data as a `.zip` with GEOIDs and use the code cell below to load it in using pandas.
Remember that you need to unzip the download and upload the `-Data.csv` file into Colab to do this.


In [None]:
import pandas as pd

# Read CSV file and skip the second row (contains variable descriptions)
my_census_export = pd.read_csv('ACSDP1Y2024.DP05-Data.csv', skiprows=[1])

# Display the first 5 rows
print(my_census_export.head())

           GEO_ID                             NAME  DP05_0001E DP05_0001M  \
0  0500000US45015  Berkeley County, South Carolina      264276      *****   

   DP05_0002E  DP05_0002M  DP05_0003E  DP05_0003M  DP05_0004E  DP05_0004M  \
0      131083        1447      133193        1447        98.4         2.2   

   ...  DP05_0104PM  DP05_0105PE  DP05_0105PM  DP05_0106PE  DP05_0106PM  \
0  ...          0.9          (X)          (X)       193732          (X)   

   DP05_0107PE  DP05_0107PM  DP05_0108PE  DP05_0108PM  Unnamed: 434  
0         48.8          0.6         51.2          0.6           NaN  

[1 rows x 435 columns]
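
The output above hints at two quirks of these exports: an empty `Unnamed` column (which suggests each row ends in a trailing comma) and annotation strings such as `*****` or `(X)` in place of numbers. The sketch below shows one way to tidy both. It uses a tiny inline stand-in for the file, and the label row text is illustrative rather than copied from a real export.

```python
import io
import pandas as pd

# Stand-in for a data.census.gov export: a header row, a label row,
# and one data row, each ending in a trailing comma.
sample_csv = io.StringIO(
    "GEO_ID,NAME,DP05_0001E,DP05_0001M,\n"
    "Geography,Geographic Area Name,Estimate label,Margin of Error label,\n"
    '0500000US45015,"Berkeley County, South Carolina",264276,*****,\n'
)

# Skip the label row and keep GEO_ID as text.
df = pd.read_csv(sample_csv, skiprows=[1], dtype={"GEO_ID": str})

# Drop the empty column created by the trailing comma.
df = df.drop(columns=[c for c in df.columns if c.startswith("Unnamed")])

# Turn annotation strings into NaN so the column can be used in arithmetic.
df["DP05_0001M"] = pd.to_numeric(df["DP05_0001M"], errors="coerce")
print(df.dtypes)
```

Coercing to `NaN` discards the meaning of the annotation, so check the table notes on data.census.gov before treating those cells as simply missing.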


## ❗️ Try it Out (Cont'd.) ❗️
## Step 4: Break down those GEOIDs!
Use the code cell below to print a breakdown of one of your table's GEOIDs.  

You'll often see a column called `GEO_ID`. It contains a long identifier like:

```
0500000US06001
```

This code breaks down as follows:

* `050` = Summary level (in this case, county)
* `0000` = Geographic variant and component (often all zeroes for standard geographies)
* `US` = United States
* `06` = California (state FIPS)
* `001` = Alameda County (county FIPS)

So `0500000US06001` uniquely identifies Alameda County, CA in census tabular data. The portion after `US` (`06001`) matches the `GEOID` field used in TIGER/Line shapefiles, so removing the prefix enables clean joins between spatial and demographic data.
>#### Tip: Why GEOIDs Matter
>* GEOIDs allow us to match data to geography **accurately** and **efficiently**.
>* They help avoid confusion when different datasets refer to the same places.
>* They're essential for mapping, analysis, and policy-making across many domains—planning, education, environment, health, and more.
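
To join a data.census.gov table to a file that stores only the short code (for example, `06001`), the part after `US` has to be extracted first. The sketch below shows one way to do that; the helper name `short_geoid` is hypothetical.

```python
def short_geoid(geo_id):
    """Return the part of a long GEO_ID that follows the 'US' marker."""
    prefix, sep, code = geo_id.partition("US")
    if not sep:
        raise ValueError(f"No 'US' marker found in {geo_id!r}")
    return code


print(short_geoid("0500000US06001"))  # 06001
```

Splitting on the `US` marker rather than slicing at fixed positions keeps the helper usable for geographies whose codes are longer than a county's.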

In [None]:
# Specify the row index you want to analyze
row_index = 0  # Change this to the row you want to examine

# Get the GEO_ID value (adjust column name if needed)
geoid = my_census_export['GEO_ID'].iloc[row_index]

print(f"Full GEO_ID: {geoid}")
print(f"Breakdown:")
print(f"  Summary Level: {geoid[:3]}")
print(f"  Geographic Variant: {geoid[3:7]}")
print(f"  Country: {geoid[7:9]}")
print(f"  State FIPS: {geoid[9:11]}")
print(f"  County FIPS: {geoid[11:14]}")

Full GEO_ID: 0500000US45015
Breakdown:
  Summary Level: 050
  Geographic Variant: 0000
  Country: US
  State FIPS: 45
  County FIPS: 015


> Tip: Does the result of this code conform to the standard described above? Are there missing characters? How might this be resolved?

---
## Part II. ACS Products and Geography

In this section we will load in census-related data using the census package.  We'll also introduce a few bonus tools.


> ### Understanding Vintages
> - Boundary changes happen every decade (or more frequently).
> - Use **Geocorr** to understand how 2010 vs 2020 tracts differ.
> - A 5-year ACS product is labeled by the *last* year of its collection period, so always check which years it covers.
> - Vintage example: 2021 ACS 5-year uses 2021 boundaries.
> - [Vintage FAQ](https://www.census.gov/programs-surveys/geography/technical-documentation/vintage.html)
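
Because a multi-year product is named for its final year, the years it covers can be worked out with simple arithmetic. A minimal sketch, with the hypothetical helper name `acs_period`:

```python
def acs_period(end_year, span=5):
    """Return the (first, last) data-collection years for an ACS product."""
    return end_year - span + 1, end_year


first, last = acs_period(2021)
print(f"2021 ACS 5-year covers {first}-{last}")  # 2021 ACS 5-year covers 2017-2021
```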

---

## ❗️ Try it Out ❗️
In this section we'll load in some census data using the census package.

### Step 1: Package Install

We begin by installing the required Python packages. Follow the instructions below to install any packages not already available in your environment.

```
# Open the Terminal in the bottom lefthand corner of your screen.
# Run the following command by typing it in manually (case sensitive!)

pip install census
```

> **What is an API?**
API stands for Application Programming Interface.  APIs deliver requests, retrieve the data, and return it in a structured format. APIs are commonly used across many platforms and services to automate data access and make complex systems easier to work with. For example, when you use a program like Python to access U.S. Census data, you're using an API to ask the Census Bureau's database for specific information—like the population of a county or the median household income.
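
To make this concrete, the sketch below builds (but does not send) the kind of request URL that a Python package assembles on your behalf. It assumes the 2010 decennial Summary File 1 endpoint and the total population variable `P001001`; a real request would also carry your API key as an extra `key` parameter.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2010 decennial census, Summary File 1.
base_url = "https://api.census.gov/data/2010/dec/sf1"

# Ask for the area name and total population of Alameda County, CA.
query = {
    "get": "NAME,P001001",
    "for": "county:001",
    "in": "state:06",
}

# Keep ':' and ',' unescaped so the URL stays readable.
request_url = f"{base_url}?{urlencode(query, safe=':,')}"
print(request_url)
```

Pasting a URL like this into a browser is a quick way to see the structured response that the API returns.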

---

### Step 2: Retrieve & Store Census API Key

To access census data, you'll need a Census API key. You can request one from: [https://api.census.gov/data/key_signup.html](https://api.census.gov/data/key_signup.html)

> * **Organization Name:** University of California, Berkeley
> * **Email Address:** [YOUR BERKELEY EMAIL]
> * Agree to the terms of service


Once you have your key, store it securely as an environment variable.
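
One way to do this inside a notebook session is sketched below. The variable name `CENSUS_API_KEY` and the helper `load_api_key` are assumptions made for illustration. The key is typed into a hidden prompt, so it never appears in the notebook itself.

```python
import os
from getpass import getpass


def load_api_key(env_var="CENSUS_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting once if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed text so the key is not echoed or saved in output.
        key = prompt("Paste your Census API key: ").strip()
        os.environ[env_var] = key  # Available for the rest of this session only.
    return key
```

Calling `load_api_key()` once near the top of the notebook makes the key available to later cells through `os.getenv("CENSUS_API_KEY")`.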

### Step 3: Load Required Libraries into Your Environment
In addition to the census, pandas, and plotly libraries, you will also be importing the `os` library, which helps your Python code talk to the computer it's running on.


In [None]:
from census import Census
import os
import pandas as pd
import plotly.express as px

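# Initialize the Census API client using the key stored in an environment variable.
# "CENSUS_API_KEY" is an assumed name; use the name you stored your key under.
# Never paste the key itself into the notebook.
censusAPIKey = Census(os.getenv("CENSUS_API_KEY"))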

## ❗️ Try it Out (Cont'd.) ❗️
### Step 4: Fetch Tract-Level Population (Decennial) Data

Let's fetch total population (`P001001`) for a census tract in California.

You'll use two parts of the `censusAPIKey` object:

* `sf1` to select the decennial census Summary File 1 dataset

* `state_county_tract` to request Summary File 1 data for a specific state, county, and tract

> **Curious about how to query the Census API?**  
You can access the detailed documentation, including syntax guides and learning materials on the U.S. Census Bureau's website's [Developers](https://www.census.gov/data/developers.html) page.

Using the template code below and the [documentation for the package](https://pypi.org/project/census/#description), write a query to retrieve data using the census library.

> Tip: In this code cell, the `state_fips` and `county_fips` parameters tell the Census API exactly which geographic area you want data for.



In [None]:
# Manipulate this template code to view a preview of a census data dataframe
popData = censusAPIKey.sf1.state_county_tract(
    fields=['P001001'],
    state_fips=state_code,
    county_fips=county_code,
    tract=tract_code,
    year=2010
)

dfPop = pd.DataFrame(popData)
print(dfPop)
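
Once the query returns, it can help to rebuild a GEOID column so the result can be joined to other tables. The sketch below uses a stand-in for the response and assumes each record carries `state`, `county`, and `tract` keys alongside the requested field; the population value is a placeholder, not a real count.

```python
import pandas as pd

# Stand-in for the list of dicts returned by the query (placeholder value).
sample_response = [
    {"P001001": "0", "state": "06", "county": "001", "tract": "423602"},
]

df = pd.DataFrame(sample_response)

# Make sure the population column is numeric, whatever type the API returned.
df["P001001"] = pd.to_numeric(df["P001001"])

# Concatenate the nested codes into the 11-digit tract GEOID.
df["GEOID"] = df["state"] + df["county"] + df["tract"]
print(df[["GEOID", "P001001"]])
```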