# Maryland Per Capita Personal Income (2013-2023)

Dataset Source: [Maryland Open Data Portal](https://opendata.maryland.gov/Demographic/Maryland-Per-Capita-Personal-Income-Current-Dollar/nv7y-8663/about_data)

This notebook downloads Maryland Per Capita Personal Income data (current dollars) for 2013-2023 and stores it in a SQL Server database.

**Dataset Overview:**
- **Time Period:** 2013-2023
- **Frequency:** Annual
- **Geographic Coverage:** Statewide and all 24 Maryland jurisdictions (23 counties + Baltimore City)
- **Data Source:** U.S. Bureau of Economic Analysis (BEA)
- **Provider:** Maryland Department of Planning

In [75]:
# Import required libraries
import os
import io
import warnings
import pandas as pd
import requests
import dotenv
import mssql_python

dotenv.load_dotenv()

# API Configuration
API_URL = "https://opendata.maryland.gov/api/v3/views/nv7y-8663/query.csv" 
APP_TOKEN = os.getenv("MD_APP_TOKEN")

# Database Configuration
SQL_CONNECTION_STRING = os.getenv("SQL_CONNECTION_STRING")
if not SQL_CONNECTION_STRING:
    raise ValueError(
        "SQL_CONNECTION_STRING environment variable is required. "
        "Set it in your .env file or as an environment variable."
    )

TABLE_NAME = "[Maryland].[dbo].[PerCapitaPersonalIncome]"

print("✓ All imports and configuration loaded")
print("✓ Dataset: Maryland Per Capita Personal Income (Current Dollars)")
print("✓ Source: U.S. Bureau of Economic Analysis (BEA), Maryland Department of Planning")
print("✓ Time Period: 2013-2023")

✓ All imports and configuration loaded
✓ Dataset: Maryland Per Capita Personal Income (Current Dollars)
✓ Source: U.S. Bureau of Economic Analysis (BEA), Maryland Department of Planning
✓ Time Period: 2013-2023


## Step 1: Download Data from Maryland Open Data Portal

In [76]:
# Download data from Maryland Open Data Portal - CSV format
headers = {"X-App-Token": APP_TOKEN} if APP_TOKEN else {}

print("Downloading data from Maryland Open Data Portal...")
response = requests.get(API_URL, params={"accessType": "DOWNLOAD"}, headers=headers, timeout=120)
response.raise_for_status()

# Load CSV data into DataFrame
df = pd.read_csv(io.BytesIO(response.content))

print(f"✓ Dataset loaded: {len(df):,} rows × {len(df.columns)} columns\n")
print("Random sample of 5 records:")
df.sample(min(5, len(df)))

Downloading data from Maryland Open Data Portal...
✓ Dataset loaded: 11 rows × 33 columns

Random sample of 5 records:


Unnamed: 0,:id,:version,:created_at,:updated_at,date_created,reporting_period_start_date,reporting_period_end_data,year,maryland,allegany_county,...,kent_county,montgomery_county,prince_george_s_county,queen_anne_s_county,somerset_county,st_mary_s_county,talbot_county,washington_county,wicomico_county,worcester_county
8,row-if85.3t9s_rqud,rv-v6ag~fepu~ydt5,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,"september 2, 2025",2021-01-01T00:00:00.000,2021-12-31T00:00:00.000,2021,69081,47462,...,69328,90696,53353,77455,36982,64999,84902,53079,46638,64317
6,row-exc3_8jdy~j7uj,rv-kb2b~4jmi.f786,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,"september 2, 2025",2019-01-01T00:00:00.000,2019-12-31T00:00:00.000,2019,61725,41527,...,58929,84805,47116,67826,31310,58400,73957,45840,40228,56374
5,row-uhgh.k8vj-n3hy,rv-fzsj_ixqu-6c94,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,"september 2, 2025",2018-01-01T00:00:00.000,2018-12-31T00:00:00.000,2018,59919,40117,...,58110,83790,45909,63612,30871,56222,68562,44614,40087,55311
10,row-xcwx~qiah~5cw7,rv-ey6v~r5kh~pgdd,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,"september 2, 2025",2023-01-01T00:00:00.000,2023-12-31T00:00:00.000,2023,75391,49182,...,77025,100044,57096,83650,37345,70353,98166,56174,48596,68163
3,row-msff_hhe2~bufc,rv-fxd9_5xx8~6wwy,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,"september 2, 2025",2016-01-01T00:00:00.000,2016-12-31T00:00:00.000,2016,56614,37532,...,53104,80230,43585,59393,29536,53238,65325,42077,38472,52640
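
Before cleaning, it can help to confirm that the download contains the columns the later steps rely on. The following is a minimal sketch and not part of the original notebook: `missing_columns` is a hypothetical helper, the sample CSV is a three-column excerpt built from the 2013 values, and it uses only the standard-library `csv` module rather than pandas.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    present = {name.strip().lower() for name in header}
    return [name for name in required if name not in present]


# Hypothetical excerpt; the real download has 33 columns.
sample_csv = "year,maryland,allegany_county\n2013,51880,33017\n"

print(missing_columns(sample_csv, ["year", "maryland"]))
print(missing_columns(sample_csv, ["year", "baltimore_city"]))
```

In the notebook itself, the same check could be run against `response.text` before calling `pd.read_csv`, raising an error if the returned list is not empty.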


## Step 2: Data Preparation and Cleaning

In [77]:
# Examine the DataFrame structure
print("Dataset Information:")
print(f"Shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nData Types:")
print(df.dtypes)

# Exclude metadata columns for display
metadata_cols_display = ['date_created', 'reporting_period_start_date', 'reporting_period_end_data']
df_display = df[[col for col in df.columns if col not in metadata_cols_display]]

print(f"\nFirst few rows (excluding metadata):")
df_display.head()

Dataset Information:
Shape: (11, 33)

Columns: [':id', ':version', ':created_at', ':updated_at', 'date_created', 'reporting_period_start_date', 'reporting_period_end_data', 'year', 'maryland', 'allegany_county', 'anne_arundel_county', 'baltimore_city', 'baltimore_county', 'calvert_county', 'caroline_county', 'carroll_county', 'cecil_county', 'charles_county', 'dorchester_county', 'frederick_county', 'garrett_county', 'harford_county', 'howard_county', 'kent_county', 'montgomery_county', 'prince_george_s_county', 'queen_anne_s_county', 'somerset_county', 'st_mary_s_county', 'talbot_county', 'washington_county', 'wicomico_county', 'worcester_county']

Data Types:
:id                            object
:version                       object
:created_at                    object
:updated_at                    object
date_created                   object
reporting_period_start_date    object
reporting_period_end_data      object
year                            int64
maryland                  

Unnamed: 0,:id,:version,:created_at,:updated_at,year,maryland,allegany_county,anne_arundel_county,baltimore_city,baltimore_county,...,kent_county,montgomery_county,prince_george_s_county,queen_anne_s_county,somerset_county,st_mary_s_county,talbot_county,washington_county,wicomico_county,worcester_county
0,row-wgc2.cx8z-9r2s,rv-efbf.damm.tjm7,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,2013,51880,33017,56776,39910,50262,...,47548,72296,41436,52996,27472,49188,59495,38715,35215,47114
1,row-tjan.tw8h~sj9w,rv-izns-5bdg.zvpm,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,2014,53084,34736,58365,41963,51577,...,49822,73009,41859,54706,28917,50355,61431,40049,37096,48659
2,row-2ef6_4y4k.hwdu,rv-ng42.b538~8j55,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,2015,55041,35730,60207,43898,53029,...,51640,76987,42714,56402,30064,52185,63704,41130,38461,52995
3,row-msff_hhe2~bufc,rv-fxd9_5xx8~6wwy,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,2016,56614,37532,61667,45271,53894,...,53104,80230,43585,59393,29536,53238,65325,42077,38472,52640
4,row-5skj.favr-rtjv,rv-2gqb_k4wp~ha8c,2025-09-25T18:46:00.698Z,2025-09-25T18:46:00.698Z,2017,58251,38590,63346,46787,55677,...,55942,82175,44600,61992,30815,54151,66408,42968,39691,54627


In [78]:
# Clean column names - convert to lowercase with underscores
df.columns = [col.strip().lower().replace(" ", "_").replace("/", "_").replace("-", "_") for col in df.columns]

# Drop Socrata metadata columns (columns starting with ':')
# .copy() makes df an independent frame so the assignments below do not warn
df = df[[col for col in df.columns if not col.startswith(':')]].copy()

# Coerce year to numeric; unparseable values become NaN and are dropped below
if 'year' in df.columns:
    df['year'] = pd.to_numeric(df['year'], errors='coerce')

# Coerce all income columns to numeric. read_csv usually infers numeric dtypes already;
# the coercion guards against stray non-numeric values, which become NaN.
# All columns except metadata columns should be numeric income values
metadata_cols = ['date_created', 'reporting_period_start_date', 'reporting_period_end_data', 'year']
income_cols = [col for col in df.columns if col not in metadata_cols]

for col in income_cols:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Remove rows where year is null
df = df[df['year'].notna()]

# Sort by year
df = df.sort_values('year').reset_index(drop=True)

print(f"✓ Data cleaned and prepared")
print(f"✓ Final shape: {df.shape}")
print(f"✓ Year range: {df['year'].min():.0f} to {df['year'].max():.0f}")

# Exclude metadata columns for display
metadata_cols_display = ['date_created', 'reporting_period_start_date', 'reporting_period_end_data']
df_display = df[[col for col in df.columns if col not in metadata_cols_display]]

print(f"\nCleaned column names (excluding metadata):")
print(list(df_display.columns))
print(f"\nSample data:")
df_display.head(3)

✓ Data cleaned and prepared
✓ Final shape: (11, 29)
✓ Year range: 2013 to 2023

Cleaned column names (excluding metadata):
['year', 'maryland', 'allegany_county', 'anne_arundel_county', 'baltimore_city', 'baltimore_county', 'calvert_county', 'caroline_county', 'carroll_county', 'cecil_county', 'charles_county', 'dorchester_county', 'frederick_county', 'garrett_county', 'harford_county', 'howard_county', 'kent_county', 'montgomery_county', 'prince_george_s_county', 'queen_anne_s_county', 'somerset_county', 'st_mary_s_county', 'talbot_county', 'washington_county', 'wicomico_county', 'worcester_county']

Sample data:


Unnamed: 0,year,maryland,allegany_county,anne_arundel_county,baltimore_city,baltimore_county,calvert_county,caroline_county,carroll_county,cecil_county,...,kent_county,montgomery_county,prince_george_s_county,queen_anne_s_county,somerset_county,st_mary_s_county,talbot_county,washington_county,wicomico_county,worcester_county
0,2013,51880,33017,56776,39910,50262,52699,38922,51272,40384,...,47548,72296,41436,52996,27472,49188,59495,38715,35215,47114
1,2014,53084,34736,58365,41963,51577,54038,39963,53025,41083,...,49822,73009,41859,54706,28917,50355,61431,40049,37096,48659
2,2015,55041,35730,60207,43898,53029,56469,40828,55190,42619,...,51640,76987,42714,56402,30064,52185,63704,41130,38461,52995
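
The name normalization and metadata filtering in this step can be sketched as two small standalone functions. This is an illustrative sketch only: `clean_column_name` and `keep_data_columns` are hypothetical helpers, and the mixed-case header `"Anne Arundel County"` is an invented input to show the effect (the portal's CSV headers are already snake_case).

```python
def clean_column_name(name):
    """Lowercase a header and replace spaces, slashes, and hyphens with underscores."""
    cleaned = name.strip().lower()
    for ch in (" ", "/", "-"):
        cleaned = cleaned.replace(ch, "_")
    return cleaned


def keep_data_columns(names):
    """Clean the names, then drop Socrata system fields, which start with ':'."""
    cleaned = [clean_column_name(n) for n in names]
    return [n for n in cleaned if not n.startswith(":")]


# Hypothetical header list mixing system fields and data columns
raw = [":id", ":updated_at", "Year", "Anne Arundel County", "baltimore_city"]
print(keep_data_columns(raw))
```

Keeping the rule in one function makes it easy to test against new headers if the portal changes its naming.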


## Step 3: Load Data to SQL Server Database

In [79]:
# Connect to SQL Server and load data
print("✓ Connecting to database...")
conn = mssql_python.connect(SQL_CONNECTION_STRING)
cursor = conn.cursor()

# Create table if it doesn't exist. OBJECT_ID resolves the full three-part name,
# so the check works even when the connection's default database is not Maryland.
create_table_sql = f"""
IF OBJECT_ID(N'{TABLE_NAME}', N'U') IS NULL
BEGIN
    CREATE TABLE {TABLE_NAME} (
        Year INT,
        Maryland DECIMAL(18,2),
        Allegany DECIMAL(18,2),
        AnneArundel DECIMAL(18,2),
        BaltimoreCity DECIMAL(18,2),
        Baltimore DECIMAL(18,2),
        Calvert DECIMAL(18,2),
        Caroline DECIMAL(18,2),
        Carroll DECIMAL(18,2),
        Cecil DECIMAL(18,2),
        Charles DECIMAL(18,2),
        Dorchester DECIMAL(18,2),
        Frederick DECIMAL(18,2),
        Garrett DECIMAL(18,2),
        Harford DECIMAL(18,2),
        Howard DECIMAL(18,2),
        Kent DECIMAL(18,2),
        Montgomery DECIMAL(18,2),
        PrinceGeorges DECIMAL(18,2),
        QueenAnnes DECIMAL(18,2),
        Somerset DECIMAL(18,2),
        StMarys DECIMAL(18,2),
        Talbot DECIMAL(18,2),
        Washington DECIMAL(18,2),
        Wicomico DECIMAL(18,2),
        Worcester DECIMAL(18,2)
    )
    PRINT 'Table created'
END
"""

cursor.execute(create_table_sql)
conn.commit()
print("✓ Table verified/created successfully")

# Truncate table before inserting to ensure clean data load
cursor.execute(f"TRUNCATE TABLE {TABLE_NAME}")
conn.commit()
print("✓ Table truncated - ready for fresh data load")

# Prepare insert statement with all columns
# Map DataFrame columns to PascalCase SQL column names (excluding metadata)
column_mapping = {
    'year': 'Year',
    'maryland': 'Maryland',
    'allegany_county': 'Allegany',
    'anne_arundel_county': 'AnneArundel',
    'baltimore_city': 'BaltimoreCity',
    'baltimore_county': 'Baltimore',
    'calvert_county': 'Calvert',
    'caroline_county': 'Caroline',
    'carroll_county': 'Carroll',
    'cecil_county': 'Cecil',
    'charles_county': 'Charles',
    'dorchester_county': 'Dorchester',
    'frederick_county': 'Frederick',
    'garrett_county': 'Garrett',
    'harford_county': 'Harford',
    'howard_county': 'Howard',
    'kent_county': 'Kent',
    'montgomery_county': 'Montgomery',
    'prince_george_s_county': 'PrinceGeorges',
    'queen_anne_s_county': 'QueenAnnes',
    'somerset_county': 'Somerset',
    'st_mary_s_county': 'StMarys',
    'talbot_county': 'Talbot',
    'washington_county': 'Washington',
    'wicomico_county': 'Wicomico',
    'worcester_county': 'Worcester'
}

# Exclude metadata columns from insert
metadata_cols_exclude = ['date_created', 'reporting_period_start_date', 'reporting_period_end_data']
ordered_cols = [col for col in df.columns if col not in metadata_cols_exclude]
sql_column_names = [column_mapping[col] for col in ordered_cols]
placeholders = ', '.join(['?'] * len(ordered_cols))
column_names = ', '.join(f'[{col}]' for col in sql_column_names)
insert_sql = f"INSERT INTO {TABLE_NAME} ({column_names}) VALUES ({placeholders})"

# Convert DataFrame rows to tuples of native Python values for the SQL driver
def to_native(val):
    """Return None for missing values and a plain Python scalar otherwise."""
    if pd.isna(val):
        return None
    if hasattr(val, "item"):  # NumPy / pandas scalar -> int, float, ...
        return val.item()
    if isinstance(val, (int, float, str)):
        return val
    return str(val)  # fallback: stringify any other object

records = [
    tuple(to_native(row[col]) for col in ordered_cols)
    for _, row in df.iterrows()
]

# Insert records one by one (executemany has type compatibility issues)
total_records = len(records)
total_inserted = 0

print(f"\n✓ Starting insert of {total_records:,} records...\n")

# Process records individually
for idx, record in enumerate(records, 1):
    cursor.execute(insert_sql, record)
    total_inserted += 1
    
    # Commit every 100 records and show progress
    if idx % 100 == 0:
        conn.commit()
        print(f"  ✓ Progress: {total_inserted:,} / {total_records:,} records inserted ({total_inserted/total_records*100:.1f}%)")

# Final commit for any remaining records
conn.commit()
print(f"\n✓ All {total_inserted:,} records committed successfully")

cursor.close()
conn.close()

print(f"\n✓ Successfully loaded {total_inserted:,} records to {TABLE_NAME}")
print(f"✓ Year range: {df['year'].min():.0f} to {df['year'].max():.0f}")


✓ Connecting to database...
✓ Table verified/created successfully
✓ Table truncated - ready for fresh data load

✓ Starting insert of 11 records...


✓ All 11 records committed successfully

✓ Successfully loaded 11 records to [Maryland].[dbo].[PerCapitaPersonalIncome]
✓ Year range: 2013 to 2023
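
The hand-written column mapping follows a regular pattern, so it could also be derived from the cleaned names. The sketch below is one possible approach under that assumption, not the notebook's method: `to_sql_name` and `build_insert` are hypothetical helpers, and the rule (drop the `_county` suffix, fold a lone `s` into the preceding word, capitalize each part) is inferred from the mapping shown in the cell.

```python
def to_sql_name(column):
    """Turn a cleaned column name into a PascalCase SQL name without the county suffix."""
    base = column[: -len("_county")] if column.endswith("_county") else column
    result = ""
    for part in base.split("_"):
        if part == "s":  # possessive left over from "George's", "Anne's", "Mary's"
            result += "s"
        else:
            result += part.capitalize()
    return result


def build_insert(table, columns):
    """Build a parameterized INSERT with one '?' placeholder per column."""
    names = ", ".join(f"[{to_sql_name(c)}]" for c in columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})"


print(to_sql_name("prince_george_s_county"))
print(build_insert("[Maryland].[dbo].[PerCapitaPersonalIncome]",
                   ["year", "maryland", "st_mary_s_county"]))
```

An explicit dictionary, as used in the cell, is easier to audit; a derived mapping avoids drift if jurisdictions are added, at the cost of relying on the naming rule holding.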


## Step 4: Verify Data in SQL Server

In [80]:
# Read data back from SQL to verify
query = f"SELECT * FROM {TABLE_NAME} ORDER BY year"

# Suppress the pandas SQLAlchemy warning for mssql_python connections
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning, message=".*pandas only supports SQLAlchemy.*")
    conn = mssql_python.connect(SQL_CONNECTION_STRING)
    try:
        df_verify = pd.read_sql(query, conn)
    finally:
        conn.close()

# Exclude metadata columns to focus on year and income data
metadata_cols_verify = ['date_created', 'reporting_period_start_date', 'reporting_period_end_data']
df_verify = df_verify[[col for col in df_verify.columns if col not in metadata_cols_verify]]

print(f"✓ Verification complete")
print(f"✓ Records in database: {len(df_verify):,}")
print(f"✓ Columns (excluding metadata): {len(df_verify.columns)}")
print(f"\nFirst 5 records from database:")
df_verify.head()

✓ Verification complete
✓ Records in database: 11
✓ Columns (excluding metadata): 26

First 5 records from database:


Unnamed: 0,Year,Maryland,Allegany,AnneArundel,BaltimoreCity,Baltimore,Calvert,Caroline,Carroll,Cecil,...,Kent,Montgomery,PrinceGeorges,QueenAnnes,Somerset,StMarys,Talbot,Washington,Wicomico,Worcester
0,2013,51880.0,33017.0,56776.0,39910.0,50262.0,52699.0,38922.0,51272.0,40384.0,...,47548.0,72296.0,41436.0,52996.0,27472.0,49188.0,59495.0,38715.0,35215.0,47114.0
1,2014,53084.0,34736.0,58365.0,41963.0,51577.0,54038.0,39963.0,53025.0,41083.0,...,49822.0,73009.0,41859.0,54706.0,28917.0,50355.0,61431.0,40049.0,37096.0,48659.0
2,2015,55041.0,35730.0,60207.0,43898.0,53029.0,56469.0,40828.0,55190.0,42619.0,...,51640.0,76987.0,42714.0,56402.0,30064.0,52185.0,63704.0,41130.0,38461.0,52995.0
3,2016,56614.0,37532.0,61667.0,45271.0,53894.0,57748.0,41882.0,56627.0,43736.0,...,53104.0,80230.0,43585.0,59393.0,29536.0,53238.0,65325.0,42077.0,38472.0,52640.0
4,2017,58251.0,38590.0,63346.0,46787.0,55677.0,59271.0,43893.0,58636.0,45066.0,...,55942.0,82175.0,44600.0,61992.0,30815.0,54151.0,66408.0,42968.0,39691.0,54627.0
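
Counting rows confirms that something was loaded, but not that the values survived the round trip. A stricter check could compare values year by year. The sketch below is a simplified illustration: `mismatched_years` is a hypothetical helper, both sides are represented as plain lists of dicts with the same keys, and the database side uses `Decimal` on the assumption that the driver returns `DECIMAL(18,2)` columns that way.

```python
from decimal import Decimal


def mismatched_years(source_rows, db_rows, value_key):
    """Return years whose value differs between source and database, or is missing."""
    db_by_year = {int(r["Year"]): Decimal(str(r[value_key])) for r in db_rows}
    problems = []
    for row in source_rows:
        year = int(row["Year"])
        expected = Decimal(str(row[value_key]))
        if db_by_year.get(year) != expected:
            problems.append(year)
    return problems


# Maryland statewide values for 2013-2015, taken from the outputs above
source = [
    {"Year": 2013, "Maryland": 51880},
    {"Year": 2014, "Maryland": 53084},
    {"Year": 2015, "Maryland": 55041},
]
# Simulated database result with the 2015 row missing
from_db = [
    {"Year": 2013, "Maryland": Decimal("51880.00")},
    {"Year": 2014, "Maryland": Decimal("53084.00")},
]

print(mismatched_years(source, from_db, "Maryland"))
```

Converting through `str` before building each `Decimal` avoids binary floating-point noise when the source value is a float.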


## Summary

**Data Pipeline Complete! ✓**

This notebook successfully:
1. **Downloaded** Maryland Per Capita Personal Income data from the Maryland Open Data Portal
2. **Cleaned and transformed** the data (normalized column names, converted data types)
3. **Loaded** the data into SQL Server database table `[Maryland].[dbo].[PerCapitaPersonalIncome]`
4. **Verified** the data was successfully stored

**Dataset Details:**
- **Source:** U.S. Bureau of Economic Analysis (BEA), Maryland Department of Planning
- **Time Period:** 2013-2023 (Annual)
- **Geographic Coverage:** Maryland statewide + all 23 counties + Baltimore City (24 jurisdictions)
- **Data Fields:** Year, per capita personal income for each jurisdiction

**Next Steps:**
- Use this data for economic analysis and comparisons across Maryland jurisdictions
- Join with other Maryland datasets (e.g., Operating Budget) for comprehensive analysis
- Create visualizations and trend analysis for per capita income changes over time
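
As a starting point for the trend analysis mentioned above, total and annualized growth can be computed from two endpoint values. This is a minimal sketch: the function names are hypothetical, and the inputs are the statewide 2013 and 2023 figures shown in the notebook outputs. Because the dataset is in current dollars, the results are nominal and not adjusted for inflation.

```python
def total_growth(start_value, end_value):
    """Fractional change from start to end, e.g. 0.10 for a 10% increase."""
    return (end_value - start_value) / start_value


def annualized_growth(start_value, end_value, years):
    """Compound annual growth rate over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Statewide per capita personal income from the outputs above
maryland_2013 = 51880
maryland_2023 = 75391

print(f"Total growth 2013-2023: {total_growth(maryland_2013, maryland_2023):.1%}")
print(f"Annualized growth: {annualized_growth(maryland_2013, maryland_2023, 10):.2%}")
```

The same two functions apply to any jurisdiction column, which makes it straightforward to rank counties by growth once the data is read back from SQL Server.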