# BRFSS Downloader Notebook

> Download CDC BRFSS annual survey ZIPs (1990–2023) to a local folder for later parsing.

**Output folder:** `data/raw/brfss_zips/`

Source: https://www.cdc.gov/brfss/annual_data/annual_data.htm


## 1) Overview
This notebook downloads **Behavioral Risk Factor Surveillance System (BRFSS)** annual survey archives from the CDC.

- Each year is provided as a ZIP that contains a SAS XPORT (`.XPT`) file.
- We **only download** here. Parsing to CSV/Parquet can be done in a separate step.
- Re-runs are **resume-safe**: any ZIP already present in the output folder is skipped.


## 2) Setup

In [2]:
from pathlib import Path
import requests

# Output folder for ZIP files
OUT_DIR = Path("data/raw/brfss_zips")
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Master URL dict (we extend decade by decade)
BRFSS_URLS = {}

def add_urls(new_dict):
    """Extend the BRFSS_URLS mapping with a new block of years; warn on duplicates."""
    overlap = set(new_dict).intersection(BRFSS_URLS)
    if overlap:
        print("⚠️ Warning: duplicate years found:", overlap)
    BRFSS_URLS.update(new_dict)

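def download_urls(url_dict):
    """Download all ZIPs for the provided mapping of year -> URL (skips existing)."""
    for year, url in url_dict.items():
        out_path = OUT_DIR / Path(url).name
        if out_path.exists():
            print(f"{year}: already downloaded ({out_path.name})")
            continue
        # Write to a temporary ".part" file first so an interrupted download
        # is never mistaken for a finished ZIP on the next run.
        tmp_path = out_path.with_name(out_path.name + ".part")
        try:
            print(f"{year}: downloading…")
            # Stream the body in chunks instead of holding ~100 MB in memory.
            with requests.get(url, timeout=180, stream=True) as resp:
                if resp.status_code != 200:
                    print(f"{year}: failed with status {resp.status_code}")
                    continue
                with open(tmp_path, "wb") as fh:
                    for chunk in resp.iter_content(chunk_size=1024 * 1024):
                        fh.write(chunk)
            tmp_path.replace(out_path)
            mb = round(out_path.stat().st_size / (1024 * 1024), 2)
            print(f"{year}: saved {out_path.name}, {mb} MB")
        except (requests.RequestException, OSError) as e:
            print(f"{year}: error {e}")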
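Because `download_urls` decides what to skip purely by checking whether a file exists, a quick integrity pass over the output folder can help catch truncated or corrupt archives. The following is a minimal sketch using only the standard-library `zipfile` module; `check_zip` and `check_folder` are hypothetical helper names, not part of the notebook.

```python
import zipfile
from pathlib import Path


def check_zip(path):
    """Return None if the ZIP looks intact, else a short problem description."""
    path = Path(path)
    if not zipfile.is_zipfile(path):
        return "not a ZIP file (possibly truncated or an HTML error page)"
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first member that fails its CRC check, or None.
            bad_member = zf.testzip()
    except zipfile.BadZipFile as exc:
        return f"unreadable archive: {exc}"
    if bad_member is not None:
        return f"corrupt member: {bad_member}"
    return None


def check_folder(folder):
    """Map each .zip/.ZIP file name in `folder` to its problem (None when intact)."""
    folder = Path(folder)
    return {
        p.name: check_zip(p)
        for p in sorted(folder.iterdir())
        if p.suffix.lower() == ".zip"
    }
```

In the notebook this could be called as `check_folder(OUT_DIR)`; any file with a non-`None` value can be deleted and downloaded again.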

## 3) (Optional) Quick sanity checks for a couple of years
Uncomment and run to verify your environment/network access.

In [3]:
# # 1990 test
# url = "https://www.cdc.gov/brfss/annual_data/1990/files/CDBRFS90XPT.zip"
# resp = requests.get(url, timeout=60)
# print("1990 status:", resp.status_code, "| size MB:", round(len(resp.content)/(1024*1024),2))

# # 1991 test
# url = "https://www.cdc.gov/brfss/annual_data/1991/files/CDBRFS91XPT.zip"
# resp = requests.get(url, timeout=60)
# print("1991 status:", resp.status_code, "| size MB:", round(len(resp.content)/(1024*1024),2))

## 4) Download by decade
### 1990–1999

In [4]:
add_urls({
    1990: "https://www.cdc.gov/brfss/annual_data/1990/files/CDBRFS90XPT.zip",
    1991: "https://www.cdc.gov/brfss/annual_data/1991/files/CDBRFS91XPT.zip",
    1992: "https://www.cdc.gov/brfss/annual_data/1992/files/CDBRFS92XPT.zip",
    1993: "https://www.cdc.gov/brfss/annual_data/1993/files/CDBRFS93XPT.zip",
    1994: "https://www.cdc.gov/brfss/annual_data/1994/files/CDBRFS94XPT.zip",
    1995: "https://www.cdc.gov/brfss/annual_data/1995/files/CDBRFS95XPT.zip",
    1996: "https://www.cdc.gov/brfss/annual_data/1996/files/CDBRFS96XPT.zip",
    1997: "https://www.cdc.gov/brfss/annual_data/1997/files/CDBRFS97XPT.zip",
    1998: "https://www.cdc.gov/brfss/annual_data/1998/files/CDBRFS98XPT.zip",
    1999: "https://www.cdc.gov/brfss/annual_data/1999/files/CDBRFS99XPT.zip",
})
download_urls({y: BRFSS_URLS[y] for y in range(1990, 2000)})

1990: downloading…
1990: saved CDBRFS90XPT.zip, 8.79 MB
1991: downloading…
1991: saved CDBRFS91XPT.zip, 9.0 MB
1992: downloading…
1992: saved CDBRFS92XPT.zip, 9.78 MB
1993: downloading…
1993: saved CDBRFS93XPT.zip, 10.03 MB
1994: downloading…
1994: saved CDBRFS94XPT.zip, 11.89 MB
1995: downloading…
1995: saved CDBRFS95XPT.zip, 11.98 MB
1996: downloading…
1996: saved CDBRFS96XPT.zip, 16.25 MB
1997: downloading…
1997: saved CDBRFS97XPT.zip, 16.87 MB
1998: downloading…
1998: saved CDBRFS98XPT.zip, 21.42 MB
1999: downloading…
1999: saved CDBRFS99XPT.zip, 20.93 MB


### 2000–2009

In [5]:
add_urls({
    2000: "https://www.cdc.gov/brfss/annual_data/2000/files/CDBRFS00XPT.ZIP",
    2001: "https://www.cdc.gov/brfss/annual_data/2001/files/CDBRFS01XPT.zip",
    2002: "https://www.cdc.gov/brfss/annual_data/2002/files/CDBRFS02XPT.ZIP",
    2003: "https://www.cdc.gov/brfss/annual_data/2003/files/CDBRFS03XPT.ZIP",
    2004: "https://www.cdc.gov/brfss/annual_data/2004/files/CDBRFS04XPT.zip",
    2005: "https://www.cdc.gov/brfss/annual_data/2005/files/CDBRFS05XPT.zip",
    2006: "https://www.cdc.gov/brfss/annual_data/2006/files/CDBRFS06XPT.ZIP",
    2007: "https://www.cdc.gov/brfss/annual_data/2007/files/CDBRFS07XPT.ZIP",
    2008: "https://www.cdc.gov/brfss/annual_data/2008/files/CDBRFS08XPT.ZIP",
    2009: "https://www.cdc.gov/brfss/annual_data/2009/files/CDBRFS09XPT.ZIP",
})
download_urls({y: BRFSS_URLS[y] for y in range(2000, 2010)})

2000: downloading…
2000: saved CDBRFS00XPT.ZIP, 25.84 MB
2001: downloading…
2001: saved CDBRFS01XPT.zip, 32.39 MB
2002: downloading…
2002: saved CDBRFS02XPT.ZIP, 46.27 MB
2003: downloading…
2003: saved CDBRFS03XPT.ZIP, 48.19 MB
2004: downloading…
2004: saved CDBRFS04XPT.zip, 41.42 MB
2005: downloading…
2005: saved CDBRFS05XPT.zip, 66.72 MB
2006: downloading…
2006: saved CDBRFS06XPT.ZIP, 60.78 MB
2007: downloading…
2007: saved CDBRFS07XPT.ZIP, 100.81 MB
2008: downloading…
2008: saved CDBRFS08XPT.ZIP, 78.57 MB
2009: downloading…
2009: saved CDBRFS09XPT.ZIP, 107.33 MB


### 2010–2019

In [6]:
add_urls({
    2010: "https://www.cdc.gov/brfss/annual_data/2010/files/CDBRFS10XPT.zip",
    2011: "https://www.cdc.gov/brfss/annual_data/2011/files/LLCP2011XPT.ZIP",
    2012: "https://www.cdc.gov/brfss/annual_data/2012/files/LLCP2012XPT.ZIP",
    2013: "https://www.cdc.gov/brfss/annual_data/2013/files/LLCP2013XPT.ZIP",
    2014: "https://www.cdc.gov/brfss/annual_data/2014/files/LLCP2014XPT.ZIP",
    2015: "https://www.cdc.gov/brfss/annual_data/2015/files/LLCP2015XPT.zip",
    2016: "https://www.cdc.gov/brfss/annual_data/2016/files/LLCP2016XPT.zip",
    2017: "https://www.cdc.gov/brfss/annual_data/2017/files/LLCP2017XPT.zip",
    2018: "https://www.cdc.gov/brfss/annual_data/2018/files/LLCP2018XPT.zip",
    2019: "https://www.cdc.gov/brfss/annual_data/2019/files/LLCP2019XPT.zip",
})
download_urls({y: BRFSS_URLS[y] for y in range(2010, 2020)})

2010: downloading…
2010: saved CDBRFS10XPT.zip, 96.21 MB
2011: downloading…
2011: saved LLCP2011XPT.ZIP, 125.45 MB
2012: downloading…
2012: saved LLCP2012XPT.ZIP, 90.99 MB
2013: downloading…
2013: saved LLCP2013XPT.ZIP, 123.0 MB
2014: downloading…
2014: saved LLCP2014XPT.ZIP, 68.9 MB
2015: downloading…
2015: saved LLCP2015XPT.zip, 94.26 MB
2016: downloading…
2016: saved LLCP2016XPT.zip, 79.5 MB
2017: downloading…
2017: saved LLCP2017XPT.zip, 101.79 MB
2018: downloading…
2018: saved LLCP2018XPT.zip, 69.48 MB
2019: downloading…
2019: saved LLCP2019XPT.zip, 93.43 MB
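The URLs listed in this notebook follow two filename patterns: `CDBRFS{yy}XPT` through 2010 and `LLCP{yyyy}XPT` from 2011 on, with the extension appearing as either `.zip` or `.ZIP`. The sketch below builds both extension variants for a year, which may help when checking a mistyped entry. `candidate_urls` is a hypothetical helper, and the pattern is inferred only from the 1990–2023 URLs in this notebook, so it is an assumption for any other year.

```python
BASE = "https://www.cdc.gov/brfss/annual_data"


def candidate_urls(year):
    """Return the lowercase- and uppercase-extension URL candidates for one year."""
    if year <= 2010:
        # 1990-2010 use a two-digit year, e.g. CDBRFS90XPT, CDBRFS00XPT.
        stem = f"CDBRFS{year % 100:02d}XPT"
    else:
        # 2011 onward use a four-digit year, e.g. LLCP2011XPT.
        stem = f"LLCP{year}XPT"
    return [f"{BASE}/{year}/files/{stem}.{ext}" for ext in ("zip", "ZIP")]
```

The explicit dictionaries remain the source of truth; this helper only offers candidates to compare against them.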


### 2020–2023

In [6]:
add_urls({
    2020: "https://www.cdc.gov/brfss/annual_data/2020/files/LLCP2020XPT.zip",
    2021: "https://www.cdc.gov/brfss/annual_data/2021/files/LLCP2021XPT.zip",
    2022: "https://www.cdc.gov/brfss/annual_data/2022/files/LLCP2022XPT.zip",
    2023: "https://www.cdc.gov/brfss/annual_data/2023/files/LLCP2023XPT.zip",
})
download_urls({y: BRFSS_URLS[y] for y in range(2020, 2024)})

2020: downloading…
2020: saved LLCP2020XPT.zip, 64.37 MB
2021: downloading…
2021: saved LLCP2021XPT.zip, 77.82 MB
2022: downloading…
2022: saved LLCP2022XPT.zip, 80.69 MB
2023: downloading…
2023: saved LLCP2023XPT.zip, 88.92 MB
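After all four blocks have run, it can be useful to confirm that every year in the mapping has a file on disk. The sketch below is one way to do that; `missing_years` is a hypothetical helper that takes the URL mapping and the output folder as arguments.

```python
from pathlib import Path


def missing_years(url_dict, out_dir):
    """Return the sorted years whose ZIP file is not present in `out_dir`."""
    out_dir = Path(out_dir)
    return sorted(
        year
        for year, url in url_dict.items()
        # The local file name is the last path segment of the URL.
        if not (out_dir / url.rsplit("/", 1)[-1]).exists()
    )
```

In the notebook, `missing_years(BRFSS_URLS, OUT_DIR)` should return an empty list once every download has succeeded; any year it reports can be retried by passing that subset of `BRFSS_URLS` to `download_urls`.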


## 5) Next steps (separate script, 02_parse_brfss.py)

- Normalize schemas across years (e.g., `_STATE`, `_CNTY`, demographics), then save per-year CSV/Parquet to `data/raw/brfss_year/`.
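As a rough illustration of the first parsing step, the sketch below extracts the `.XPT` member from one downloaded ZIP and reads it with `pandas.read_sas`. It is not the parse script itself: `extract_xpt` and `load_xpt` are hypothetical helpers, and the destination folder is whatever the caller passes in.

```python
import zipfile
from pathlib import Path


def extract_xpt(zip_path, dest_dir):
    """Extract the first .XPT member of a BRFSS ZIP and return the extracted path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # Defensive: strip whitespace in case a member name carries padding.
            clean_name = Path(info.filename.strip()).name
            if clean_name.lower().endswith(".xpt"):
                target = dest_dir / clean_name
                target.write_bytes(zf.read(info))
                return target
    raise FileNotFoundError(f"no .XPT member found in {zip_path}")


def load_xpt(xpt_path):
    """Read a SAS XPORT file into a DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily so extraction works without pandas

    return pd.read_sas(xpt_path, format="xport")
```

For example, `load_xpt(extract_xpt(OUT_DIR / "LLCP2023XPT.zip", "data/raw/brfss_xpt"))` would load the 2023 file, where `data/raw/brfss_xpt` is an assumed scratch folder. Note that `extract_xpt` reads the whole member into memory before writing it.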

Run `02_parse_brfss_xpt.py` next, with the following arguments. Before running it, change your terminal's working directory to `download_brfss`.