In [1]:
'''
Mapping of the 35 CES National not-seasonally-adjusted (CEU) series IDs to
QCEW CSV industry slice parameters.

CES series ID structure (program=CE):
  positions 1-2: prefix ('CE')
  position  3:   seasonal adjustment ('S'=SA, 'U'=NSA)
  positions 4-5: supersector code
  positions 6-11: industry code (6 digits, zero-padded)
  positions 12-13: data type ('01' = all employees, thousands)


CES → QCEW exact mapping analysis. 27 of the 35 CEU series map to a single QCEW row:

 Totals:
   (none -- both Total nonfarm and Total private have ag scope issues)

 Domains:
   CEU0800000001  Private service-providing  → QCEW(102, own=5, agglvl=12)

 Supersectors (9 of 10 -- Mining & Logging is inexact):
   CEU2000000001  Construction              → QCEW(1012, own=5, agglvl=13)
   CEU3000000001  Manufacturing             → QCEW(1013, own=5, agglvl=13)
   CEU4000000001  TTU                       → QCEW(1021, own=5, agglvl=13)
   CEU5000000001  Information               → QCEW(1022, own=5, agglvl=13)
   CEU5500000001  Financial activities      → QCEW(1023, own=5, agglvl=13)
   CEU6000000001  Prof/business services    → QCEW(1024, own=5, agglvl=13)
   CEU6500000001  Ed/health services        → QCEW(1025, own=5, agglvl=13)
   CEU7000000001  Leisure/hospitality       → QCEW(1026, own=5, agglvl=13)
   CEU8000000001  Other services            → QCEW(1027, own=5, agglvl=13)

 Government (3 of 4 -- Total Government is inexact):
   CEU9091000001  Federal                   → QCEW(10, own=1, agglvl=11)
   CEU9092000001  State                     → QCEW(10, own=2, agglvl=11)
   CEU9093000001  Local                     → QCEW(10, own=3, agglvl=11)

 Sectors (14 of 16 -- Durable/Nondurable goods are inexact):
   CEU1021000001  Mining (NAICS 21)         → QCEW(21, own=5, agglvl=14)
   CEU4142000001  Wholesale trade (42)      → QCEW(42, own=5, agglvl=14)
   CEU4200000001  Retail trade (44-45)      → QCEW(44-45, own=5, agglvl=14)
   CEU4300000001  Transport/warehouse(48-49)→ QCEW(48-49, own=5, agglvl=14)
   CEU4422000001  Utilities (22)            → QCEW(22, own=5, agglvl=14)
   CEU5552000001  Finance/insurance (52)    → QCEW(52, own=5, agglvl=14)
   CEU5553000001  Real estate (53)          → QCEW(53, own=5, agglvl=14)
   CEU6054000001  Prof/tech services (54)   → QCEW(54, own=5, agglvl=14)
   CEU6055000001  Mgmt of companies (55)    → QCEW(55, own=5, agglvl=14)
   CEU6056000001  Admin/waste services (56) → QCEW(56, own=5, agglvl=14)
   CEU6561000001  Educational services (61) → QCEW(61, own=5, agglvl=14)
   CEU6562000001  Health care (62)          → QCEW(62, own=5, agglvl=14)
   CEU7071000001  Arts/entertainment (71)   → QCEW(71, own=5, agglvl=14)
   CEU7072000001  Accommodation/food (72)   → QCEW(72, own=5, agglvl=14)

QCEW CSV industry slice URL pattern:
  https://data.bls.gov/cew/data/api/{year}/{qtr}/industry/{industry_code}.csv

Within each CSV slice, filter on:
  area_fips, own_code, agglvl_code, size_code
  
CES → QCEW inexact mapping analysis. 8 CEU series that cannot be retrieved from QCEW 
in a single row-filter, and provides the exact QCEW arithmetic needed to reconstruct
the CES concept.

Key structural mismatches:
  1. Agriculture/logging scope: CES "Mining & Logging" = NAICS 21 + 1133.
     QCEW supersector 1011 = ALL of NAICS 11 + NAICS 21. This propagates
     upward to Goods-producing, Total private, and Total nonfarm.

  2. Government classification: CES treats government as a flat supersector
     (90) regardless of industry. QCEW distributes government employees
     across industries by ownership code. CES "Service-providing" =
     private service-providing + ALL government, but QCEW service-providing
     domain (102) with own=0 misses government workers in goods-producing
     industries (federal construction workers, government-owned mines, etc.)

  3. Durable/nondurable manufacturing: BLS-defined aggregations that exist
     in CES but not in QCEW's classification hierarchy.

QCEW filter notation used below:
  QCEW(industry, own, agglvl) means: filter the industry CSV slice to
  area_fips='US000', own_code=own, agglvl_code=agglvl, size_code='0'.
  When agglvl is omitted, it's implied by the industry level.
  NAICS codes at 4+ digits use agglvl=16 (4-digit) or 17 (5-digit).

References:
  - Industry codes: https://www.bls.gov/cew/classifications/industry/industry-titles.htm
  - Ownership codes: https://www.bls.gov/cew/classifications/ownerships/ownership-titles.htm
  - Aggregation levels: https://www.bls.gov/cew/classifications/aggregation/agg-level-titles.htm
  - CSV slices: https://www.bls.gov/cew/additional-resources/open-data/csv-data-slices.htm
'''

import polars as pl

In [2]:
# ── QCEW ownership codes ────────────────────────────────────────────────
# 0 = Total Covered
# 5 = Private
# 1 = Federal Government
# 2 = State Government
# 3 = Local Government
# 8 = Total Government

# ── QCEW aggregation level codes (national only) ────────────────────────
# 10 = National, Total Covered
# 11 = National, Total -- by ownership sector
# 12 = National, by Domain -- by ownership sector
# 13 = National, by Supersector -- by ownership sector
# 14 = National, by NAICS Sector -- by ownership sector

# ── QCEW industry codes (above-NAICS BLS-defined) ───────────────────────
# 10     = Total, all industries
# 101    = Goods-producing (domain)
# 102    = Service-providing (domain)
# 1011   = Natural resources and mining (supersector)
# 1012   = Construction (supersector)
# 1013   = Manufacturing (supersector)
# 1021   = Trade, transportation, and utilities (supersector)
# 1022   = Information (supersector)
# 1023   = Financial activities (supersector)
# 1024   = Professional and business services (supersector)
# 1025   = Education and health services (supersector)
# 1026   = Leisure and hospitality (supersector)
# 1027   = Other services (supersector)
# 1028   = Public administration (supersector)

# ── The mapping ──────────────────────────────────────────────────────────
# Each entry maps a CEU series ID to the QCEW CSV slice parameters needed
# to retrieve the corresponding record.
#
# Fields:
#   ces_series_id     - The not-seasonally-adjusted CES series ID
#   ces_description   - Human-readable description of the CES concept
#   ces_level         - 'total' | 'domain' | 'supersector' | 'sector' | 'government'
#   qcew_industry     - QCEW industry_code to use in the CSV slice URL
#   qcew_industry_url - URL-safe version (hyphens → underscores)
#   own_code          - QCEW ownership code to filter on
#   agglvl_code       - QCEW aggregation level code to filter on
#   area_fips         - Always 'US000' (national)
#   size_code         - Always '0' (all sizes)
#   notes             - Mapping caveats

EXACT_SERIES = [
    # ── Total / aggregate levels ─────────────────────────────────────────
    # Inexact:
    #   'ces_series_id': 'CEU0000000001'
    #   'ces_description': 'Total Non-Farm'
    # ── Domains ──────────────────────────────────────────────────────────
    # Inexact:
    #   'ces_series_id': 'CEU0500000001'
    #   'ces_description': 'Total Private'
    # Inexact:
    #   'ces_series_id': 'CEU0600000001'
    #   'ces_description': 'Goods-Producing Industries'
    # Inexact:
    #   'ces_series_id': 'CEU0700000001'
    #   'ces_description': 'Service-Providing Industries'
    {
        'ces_series_id': 'CEU0800000001',
        'ces_description': 'Private Service-Providing',
        'ces_level': 'domain',
        'ces_code': '08',
        'qcew_sign': '+',
        'qcew_industry': '102',
        'qcew_industry_url': '102',
        'own_code': '5',
        'agglvl_code': '12',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },

    # ── Supersectors (private) ───────────────────────────────────────────
    # Inexact:
    #   'ces_series_id': 'CEU1000000001'
    #   'ces_description': 'Natural Resources and Mining'
    {
        'ces_series_id': 'CEU2000000001',
        'ces_description': 'Construction',
        'ces_level': 'supersector',
        'ces_code': '20',
        'qcew_sign': '+',
        'qcew_industry': '1012',
        'qcew_industry_url': '1012',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU3000000001',
        'ces_description': 'Manufacturing',
        'ces_level': 'supersector',
        'ces_code': '30',
        'qcew_sign': '+',
        'qcew_industry': '1013',
        'qcew_industry_url': '1013',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU4000000001',
        'ces_description': 'Trade, Transportation, and Utilities',
        'ces_level': 'supersector',
        'ces_code': '40',
        'qcew_sign': '+',
        'qcew_industry': '1021',
        'qcew_industry_url': '1021',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU5000000001',
        'ces_description': 'Information',
        'ces_level': 'supersector',
        'ces_code': '50',
        'qcew_sign': '+',
        'qcew_industry': '1022',
        'qcew_industry_url': '1022',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU5500000001',
        'ces_description': 'Financial Activities',
        'ces_level': 'supersector',
        'ces_code': '55',
        'qcew_sign': '+',
        'qcew_industry': '1023',
        'qcew_industry_url': '1023',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU6000000001',
        'ces_description': 'Professional and Business Services',
        'ces_level': 'supersector',
        'ces_code': '60',
        'qcew_sign': '+',
        'qcew_industry': '1024',
        'qcew_industry_url': '1024',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU6500000001',
        'ces_description': 'Education and Health Services',
        'ces_level': 'supersector',
        'ces_code': '65',
        'qcew_sign': '+',
        'qcew_industry': '1025',
        'qcew_industry_url': '1025',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': (
            'CES "Education and health services" is private only. '
            'QCEW supersector 1025 with own_code=5 matches this.'
        ),
    },
    {
        'ces_series_id': 'CEU7000000001',
        'ces_description': 'Leisure and Hospitality',
        'ces_level': 'supersector',
        'ces_code': '70',
        'qcew_sign': '+',
        'qcew_industry': '1026',
        'qcew_industry_url': '1026',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU8000000001',
        'ces_description': 'Other Services',
        'ces_level': 'supersector',
        'ces_code': '80',
        'qcew_sign': '+',
        'qcew_industry': '1027',
        'qcew_industry_url': '1027',
        'own_code': '5',
        'agglvl_code': '13',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    # Inexact:
    #   'ces_series_id': 'CEU9000000001'
    #   'ces_description': 'Government'
    # ── Sectors within supersectors (private, NAICS-based) ───────────────
    # Inexact:
    #   'ces_series_id': 'CEU3100000001'
    #   'ces_description': 'Durable Goods'
    # Inexact:
    #   'ces_series_id': 'CEU3200000001'
    #   'ces_description': 'Nondurable Goods'
    {
        'ces_series_id': 'CEU1021000001',
        'ces_description': 'Mining, Quarrying, and Oil and Gas Extraction',
        'ces_level': 'sector',
        'ces_code': '21',
        'qcew_sign': '+',
        'qcew_industry': '21',
        'qcew_industry_url': '21',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU4142000001',
        'ces_description': 'Wholesale Trade',
        'ces_level': 'sector',
        'ces_code': '42',
        'qcew_sign': '+',
        'qcew_industry': '42',
        'qcew_industry_url': '42',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU4200000001',
        'ces_description': 'Retail Trade',
        'ces_level': 'sector',
        'ces_code': '44',
        'qcew_sign': '+',
        'qcew_industry': '44-45',
        'qcew_industry_url': '44_45',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU4300000001',
        'ces_description': 'Transportation and Warehousing',
        'ces_level': 'sector',
        'qcew_sign': '+',
        'qcew_industry': '48-49',
        'qcew_industry_url': '48_49',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU4422000001',
        'ces_description': 'Utilities',
        'ces_level': 'sector',
        'ces_code': '22',
        'qcew_sign': '+',
        'qcew_industry': '22',
        'qcew_industry_url': '22',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': (
            'Utilities (NAICS 22) is in CES supersector 40 '
            '(Trade/Transportation/Utilities) but NAICS places it '
            'in a standalone sector. QCEW sector code 22 is correct.'
        ),
    },
    {
        'ces_series_id': 'CEU5552000001',
        'ces_description': 'Finance and Insurance',
        'ces_level': 'sector',
        'ces_code': '52',
        'qcew_sign': '+',
        'qcew_industry': '52',
        'qcew_industry_url': '52',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU5553000001',
        'ces_description': 'Real Estate and Rental and Leasing',
        'ces_level': 'sector',
        'ces_code': '53',
        'qcew_sign': '+',
        'qcew_industry': '53',
        'qcew_industry_url': '53',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU6054000001',
        'ces_description': 'Professional, Scientific, and Technical Services',
        'ces_level': 'sector',
        'ces_code': '54',
        'qcew_sign': '+',
        'qcew_industry': '54',
        'qcew_industry_url': '54',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU6055000001',
        'ces_description': 'Management of Companies and Enterprises',
        'ces_level': 'sector',
        'ces_code': '55',
        'qcew_sign': '+',
        'qcew_industry': '55',
        'qcew_industry_url': '55',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU6056000001',
        'ces_description': 'Administrative and Support and Waste Management and Remediation Services',
        'ces_level': 'sector',
        'ces_code': '56',
        'qcew_sign': '+',
        'qcew_industry': '56',
        'qcew_industry_url': '56',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU6561000001',
        'ces_description': 'Private Educational Services',
        'ces_level': 'sector',
        'ces_code': '61',
        'qcew_sign': '+',
        'qcew_industry': '61',
        'qcew_industry_url': '61',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': (
            'CES private educational services only. Matches QCEW NAICS '
            '61 with own_code=5 (private).'
        ),
    },
    {
        'ces_series_id': 'CEU6562000001',
        'ces_description': 'Health Care and Social Assistance',
        'ces_level': 'sector',
        'ces_code': '62',
        'qcew_sign': '+',
        'qcew_industry': '62',
        'qcew_industry_url': '62',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU7071000001',
        'ces_description': 'Arts, Entertainment, and Recreation',
        'ces_level': 'sector',
        'ces_code': '71',
        'qcew_sign': '+',
        'qcew_industry': '71',
        'qcew_industry_url': '71',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU7072000001',
        'ces_description': 'Accommodation and Food Services',
        'ces_level': 'sector',
        'ces_code': '72',
        'qcew_sign': '+',
        'qcew_industry': '72',
        'qcew_industry_url': '72',
        'own_code': '5',
        'agglvl_code': '14',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU9091000001',
        'ces_description': 'Federal Government',
        'ces_level': 'sector',
        'ces_code': '91',
        'qcew_sign': '+',
        'qcew_industry': '10',
        'qcew_industry_url': '10',
        'own_code': '1',
        'agglvl_code': '11',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': (
            'QCEW federal government coverage excludes military and '
            'some intelligence agency employees. CES federal government '
            'also excludes military but uses different source data (OPM).'
        ),
    },
    {
        'ces_series_id': 'CEU9092000001',
        'ces_description': 'State Government',
        'ces_level': 'sector',
        'ces_code': '92',
        'qcew_sign': '+',
        'qcew_industry': '10',
        'qcew_industry_url': '10',
        'own_code': '2',
        'agglvl_code': '11',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
    {
        'ces_series_id': 'CEU9093000001',
        'ces_description': 'Local Government',
        'ces_level': 'sector',
        'ces_code': '93',
        'qcew_sign': '+',
        'qcew_industry': '10',
        'qcew_industry_url': '10',
        'own_code': '3',
        'agglvl_code': '11',
        'area_fips': 'US000',
        'size_code': '0',
        'notes': '',
    },
]

In [3]:
def get_mapping_df() -> pl.DataFrame:
    
    '''Return the CES-to-QCEW mapping as a Polars DataFrame.'''
    
    return pl.DataFrame(EXACT_SERIES)


def build_qcew_url(year: int, qtr: int, industry_url: str) -> str:
    
    '''
    Build a QCEW CSV industry slice URL.

    Parameters
    ----------
    year : int
        Reference year (e.g., 2024).
    qtr : int
        Quarter (1-4).
    industry_url : str
        URL-safe industry code (e.g., '10', '44_45', '48_49').

    Returns
    -------
    str
        Full URL to the QCEW CSV industry slice.
    '''
    
    return (
        f'https://data.bls.gov/cew/data/api/{year}/{qtr}'
        f'/industry/{industry_url}.csv'
    )


def get_mappable_series() -> pl.DataFrame:
    
    '''Return only the series that have a direct QCEW equivalent.'''
    
    return get_mapping_df().filter(pl.col('qcew_industry').is_not_null())


def get_unmappable_series() -> pl.DataFrame:
    
    '''Return series that have no direct QCEW equivalent.'''

    return get_mapping_df().filter(pl.col('qcew_industry').is_null())

In [4]:
exact_df = get_mapping_df()

print(f'Exactly Mapped CEU series: {len(exact_df)} (of 35 Total CEU series)')

Exactly Mapped CEU series: 27 (of 35 Total CEU series)


In [5]:
# ── Mismatch type A: agriculture/logging scope ───────────────────────────
#
# Root cause: CES supersector 10 ("Mining and Logging") is defined as:
#   NAICS 21 (Mining, quarrying, oil & gas) + NAICS 1133 (Logging)
#
# QCEW supersector 1011 ("Natural resources and mining") is defined as:
#   ALL of NAICS 11 (Agriculture, forestry, fishing, hunting) + NAICS 21
#
# The wedge = NAICS 11 minus NAICS 1133 (i.e., agriculture, forestry
# support, fishing, hunting, timber tract ops, forest nurseries).
#
# This difference propagates upward through every aggregate that
# contains Mining & Logging: Goods-producing, Total private, Total nonfarm.

# ── Mismatch type B: government classification ───────────────────────────
#
# Root cause: In CES, supersector 90 (Government) contains ALL government
# employees regardless of what industry function they perform. A federal
# construction worker is in CES Government, not CES Construction.
#
# In QCEW, that same federal construction worker appears under
# NAICS 23 (Construction) with own_code=1 (Federal Government).
#
# This means CES "Service-providing" (CEU07) = private service supersectors
# + ALL government, which cannot be matched by QCEW(102, own=0) because
# that misses government employees in goods-producing industries.

# ── Mismatch type C: no QCEW equivalent ──────────────────────────────────
#
# CES publishes durable and nondurable manufacturing aggregates that are
# BLS-defined groupings of 3-digit NAICS industries. QCEW has no
# corresponding aggregate code.


INEXACT_SERIES = [
    # ─────────────────────────────────────────────────────────────────────
    # TYPE A: Agriculture/logging scope mismatch
    # ─────────────────────────────────────────────────────────────────────
    {
        'ces_series_id': 'CEU1000000001',
        'ces_description': 'Natural Resources and Mining',
        'ces_level': 'supersector',
        'ces_code': '10',
        'mismatch_type': 'A',
        'mismatch_reason': (
            'CES = NAICS 21 (Mining) + NAICS 1133 (Logging). '
            'QCEW 1011 = all of NAICS 11 + NAICS 21. '
            'QCEW includes agriculture, fishing, hunting, forestry '
            'support activities, timber tract ops, and forest nurseries '
            'that CES excludes.'
        ),
        'formula': 'QCEW(21, own=5, agglvl=14) + QCEW(1133, own=5, agglvl=16)',
        'formula_verbose': (
            'Sum private NAICS 21 (mining sector) and private NAICS 1133 '
            '(logging, 4-digit industry). Requires two QCEW industry '
            'slice calls: industry/21.csv and industry/1133.csv.'
        ),
        'qcew_calls': [
            {
                'industry_url': '21',
                'own_code': '5',
                'agglvl_code': '14',
                'sign': '+',
                'description': 'Mining (NAICS sector 21, private)',
            },
            {
                'industry_url': '1133',
                'own_code': '5',
                'agglvl_code': '16',
                'sign': '+',
                'description': 'Logging (NAICS 4-digit 1133, private)',
            },
        ],
    },
    {
        'ces_series_id': 'CEU0600000001',
        'ces_description': 'Goods-producing',
        'ces_level': 'domain',
        'ces_code': '06',
        'mismatch_type': 'A',
        'mismatch_reason': (
            'CES goods-producing = Mining & Logging + Construction + '
            'Manufacturing (all private). QCEW domain 101 includes '
            'all of NAICS 11 in "Natural resources and mining". '
            'Inherits the agriculture scope mismatch from Mining & Logging.'
        ),
        'formula': (
            'QCEW(101, own=5, agglvl=12) '
            '- QCEW(11, own=5, agglvl=14) '
            '+ QCEW(1133, own=5, agglvl=16)'
        ),
        'formula_verbose': (
            'Start with QCEW goods-producing domain (private), subtract '
            'all of NAICS 11 (agriculture/forestry/fishing/hunting, private), '
            'then add back logging (NAICS 1133, private).'
        ),
        'qcew_calls': [
            {
                'industry_url': '101',
                'own_code': '5',
                'agglvl_code': '12',
                'sign': '+',
                'description': 'Goods-producing domain (private)',
            },
            {
                'industry_url': '11',
                'own_code': '5',
                'agglvl_code': '14',
                'sign': '-',
                'description': 'NAICS 11 Agriculture etc. (private)',
            },
            {
                'industry_url': '1133',
                'own_code': '5',
                'agglvl_code': '16',
                'sign': '+',
                'description': 'Logging (NAICS 1133, private)',
            },
        ],
    },
    {
        'ces_series_id': 'CEU0500000001',
        'ces_description': 'Total private',
        'ces_level': 'domain',
        'ces_code': '05',
        'mismatch_type': 'A',
        'mismatch_reason': (
            'CES total private = all private nonfarm. QCEW total private '
            '(industry=10, own=5) includes UI-covered agriculture '
            'establishments. Inherits the agriculture scope mismatch.'
        ),
        'formula': (
            'QCEW(10, own=5, agglvl=11) '
            '- QCEW(11, own=5, agglvl=14) '
            '+ QCEW(1133, own=5, agglvl=16)'
        ),
        'formula_verbose': (
            'Start with QCEW total private, subtract all of private '
            'NAICS 11, add back private logging (NAICS 1133).'
        ),
        'qcew_calls': [
            {
                'industry_url': '10',
                'own_code': '5',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'Total, all industries (private)',
            },
            {
                'industry_url': '11',
                'own_code': '5',
                'agglvl_code': '14',
                'sign': '-',
                'description': 'NAICS 11 Agriculture etc. (private)',
            },
            {
                'industry_url': '1133',
                'own_code': '5',
                'agglvl_code': '16',
                'sign': '+',
                'description': 'Logging (NAICS 1133, private)',
            },
        ],
    },
    {
        'ces_series_id': 'CEU0000000001',
        'ces_description': 'Total nonfarm',
        'ces_level': 'national',
        'ces_code': '00',
        'mismatch_type': 'A',
        'mismatch_reason': (
            'CES total nonfarm = total private nonfarm + all government. '
            'QCEW total covered includes UI-covered agriculture across '
            'all ownerships. Must subtract NAICS 11 by individual '
            'ownership (own=0 not published at agglvl=14) and add '
            'back private logging.'
        ),
        'formula': (
            'QCEW(10, own=0, agglvl=10) '
            '- sum(QCEW(11, own=x, agglvl=14) for x in [5,1,2,3]) '
            '+ QCEW(1133, own=5, agglvl=16)'
        ),
        'formula_verbose': (
            'Start with QCEW total covered (all ownerships), subtract '
            'NAICS 11 by summing across individual ownerships '
            '(no "all ownerships" row exists at agglvl=14), '
            'then add back private logging (NAICS 1133).'
        ),
        'qcew_calls': [
            {
                'industry_url': '10',
                'own_code': '0',
                'agglvl_code': '10',
                'sign': '+',
                'description': 'Total covered (all ownerships)',
            },
            {
                'industry_url': '11',
                'own_code': '5',
                'agglvl_code': '14',
                'sign': '-',
                'description': 'NAICS 11 Agriculture etc. (private)',
            },
            {
                'industry_url': '11',
                'own_code': '1',
                'agglvl_code': '14',
                'sign': '-',
                'description': 'NAICS 11 Agriculture etc. (federal)',
            },
            {
                'industry_url': '11',
                'own_code': '2',
                'agglvl_code': '14',
                'sign': '-',
                'description': 'NAICS 11 Agriculture etc. (state)',
            },
            {
                'industry_url': '11',
                'own_code': '3',
                'agglvl_code': '14',
                'sign': '-',
                'description': 'NAICS 11 Agriculture etc. (local)',
            },
            {
                'industry_url': '1133',
                'own_code': '5',
                'agglvl_code': '16',
                'sign': '+',
                'description': 'Logging (NAICS 1133, private only)',
            },
        ],
    },

    # ─────────────────────────────────────────────────────────────────────
    # TYPE B: Government classification mismatch
    # ─────────────────────────────────────────────────────────────────────
    {
        'ces_series_id': 'CEU0700000001',
        'ces_description': 'Service-providing',
        'ces_level': 'domain',
        'ces_code': '07',
        'mismatch_type': 'B',
        'mismatch_reason': (
            'CES service-providing = private service supersectors + ALL '
            'government. QCEW(102, own=0) only captures government '
            'employees in service-providing industries, missing government '
            'workers in goods-producing industries (federal construction, '
            'government mining, military base manufacturing, etc.).'
        ),
        'formula': (
            'QCEW(102, own=5, agglvl=12) '
            '+ QCEW(10, own=1, agglvl=11) '
            '+ QCEW(10, own=2, agglvl=11) '
            '+ QCEW(10, own=3, agglvl=11)'
        ),
        'formula_verbose': (
            'Sum private service-providing domain + federal + state + '
            'local government (all industries). own_code=8 (Total '
            'Government) is not published in CSV slices, so sum '
            'individual government ownerships instead.'
        ),
        'qcew_calls': [
            {
                'industry_url': '102',
                'own_code': '5',
                'agglvl_code': '12',
                'sign': '+',
                'description': 'Service-providing domain (private only)',
            },
            {
                'industry_url': '10',
                'own_code': '1',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'Federal government (all industries)',
            },
            {
                'industry_url': '10',
                'own_code': '2',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'State government (all industries)',
            },
            {
                'industry_url': '10',
                'own_code': '3',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'Local government (all industries)',
            },
        ],
    },
    {
        'ces_series_id': 'CEU9000000001',
        'ces_description': 'Government',
        'ces_level': 'supersector',
        'ces_code': '90',
        'mismatch_type': 'B',
        'mismatch_reason': (
            'own_code=8 (Total Government) is not published in QCEW '
            'CSV data slices. Must sum individual government ownerships.'
        ),
        'formula': (
            'QCEW(10, own=1, agglvl=11) '
            '+ QCEW(10, own=2, agglvl=11) '
            '+ QCEW(10, own=3, agglvl=11)'
        ),
        'formula_verbose': (
            'Sum federal + state + local government employment '
            'across all industries.'
        ),
        'qcew_calls': [
            {
                'industry_url': '10',
                'own_code': '1',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'Federal government (all industries)',
            },
            {
                'industry_url': '10',
                'own_code': '2',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'State government (all industries)',
            },
            {
                'industry_url': '10',
                'own_code': '3',
                'agglvl_code': '11',
                'sign': '+',
                'description': 'Local government (all industries)',
            },
        ],
    },

    # ─────────────────────────────────────────────────────────────────────
    # TYPE C: No QCEW equivalent — must build from 3-digit NAICS
    # ─────────────────────────────────────────────────────────────────────
    {
        'ces_series_id': 'CEU3100000001',
        'ces_description': 'Durable goods manufacturing',
        'ces_level': 'sector',
        'ces_code': '31',
        'mismatch_type': 'C',
        'mismatch_reason': (
            'BLS-defined aggregation with no QCEW equivalent. '
            'Must sum 10 individual NAICS 3-digit industries.'
        ),
        'formula': (
            'sum of QCEW(x, own=5, agglvl=15) for x in '
            '[321, 327, 331, 332, 333, 334, 335, 336, 337, 339]'
        ),
        'formula_verbose': (
            'Sum private employment across: '
            '321 Wood products, '
            '327 Nonmetallic mineral products, '
            '331 Primary metals, '
            '332 Fabricated metal products, '
            '333 Machinery, '
            '334 Computer and electronic products, '
            '335 Electrical equipment and appliances, '
            '336 Transportation equipment, '
            '337 Furniture and related products, '
            '339 Miscellaneous manufacturing.'
        ),
        'qcew_calls': [
            {
                'industry_url': code,
                'own_code': '5',
                'agglvl_code': '15',
                'sign': '+',
                'description': desc,
            }
            for code, desc in [
                ('321', 'Wood products'),
                ('327', 'Nonmetallic mineral products'),
                ('331', 'Primary metals'),
                ('332', 'Fabricated metal products'),
                ('333', 'Machinery'),
                ('334', 'Computer and electronic products'),
                ('335', 'Electrical equipment and appliances'),
                ('336', 'Transportation equipment'),
                ('337', 'Furniture and related products'),
                ('339', 'Miscellaneous manufacturing'),
            ]
        ],
    },
    {
        'ces_series_id': 'CEU3200000001',
        'ces_description': 'Nondurable goods manufacturing',
        'ces_level': 'sector',
        'ces_code': '32',
        'mismatch_type': 'C',
        'mismatch_reason': (
            'BLS-defined aggregation with no QCEW equivalent. '
            'Must sum 11 individual NAICS 3-digit industries.'
        ),
        'formula': (
            'sum of QCEW(x, own=5, agglvl=15) for x in '
            '[311, 312, 313, 314, 315, 316, 322, 323, 324, 325, 326]'
        ),
        'formula_verbose': (
            'Sum private employment across: '
            '311 Food manufacturing, '
            '312 Beverage and tobacco, '
            '313 Textile mills, '
            '314 Textile product mills, '
            '315 Apparel, '
            '316 Leather and allied products, '
            '322 Paper, '
            '323 Printing, '
            '324 Petroleum and coal products, '
            '325 Chemicals, '
            '326 Plastics and rubber products.'
        ),
        'qcew_calls': [
            {
                'industry_url': code,
                'own_code': '5',
                'agglvl_code': '15',
                'sign': '+',
                'description': desc,
            }
            for code, desc in [
                ('311', 'Food manufacturing'),
                ('312', 'Beverage and tobacco'),
                ('313', 'Textile mills'),
                ('314', 'Textile product mills'),
                ('315', 'Apparel'),
                ('316', 'Leather and allied products'),
                ('322', 'Paper'),
                ('323', 'Printing'),
                ('324', 'Petroleum and coal products'),
                ('325', 'Chemicals'),
                ('326', 'Plastics and rubber products'),
            ]
        ],
    },
]


In [6]:
def get_inexact_df() -> pl.DataFrame:

    '''Return a summary DataFrame of inexact mappings.'''
    
    rows = []
    for entry in INEXACT_SERIES:
        n_calls = len(entry['qcew_calls'])
        calls_summary = ' | '.join(
            f"{c['sign']} {c['industry_url']}(own={c['own_code']},agg={c['agglvl_code']})"
            for c in entry['qcew_calls']
        )
        rows.append({
            'ces_series_id': entry['ces_series_id'],
            'ces_description': entry['ces_description'],
            'ces_level': entry['ces_level'],
            'ces_code': entry['ces_code'],
            'mismatch_type': entry['mismatch_type'],
            'n_qcew_calls': n_calls,
            'formula': entry['formula'],
        })
    
    return pl.DataFrame(rows)


def get_qcew_calls_df() -> pl.DataFrame:
    
    '''Return a flat DataFrame of all QCEW calls needed for inexact series.'''
    
    rows = []
    for entry in INEXACT_SERIES:
        for call in entry['qcew_calls']:
            rows.append({
                'ces_series_id': entry['ces_series_id'],
                'ces_description': entry['ces_description'],
                'ces_level': entry['ces_level'],
                'ces_code': entry['ces_code'],
                'sign': call['sign'],
                'qcew_industry_url': call['industry_url'],
                'own_code': call['own_code'],
                'agglvl_code': call['agglvl_code'],
                'call_description': call['description'],
            })
    
    return pl.DataFrame(rows)


def get_unique_qcew_fetches() -> pl.DataFrame:
    
    '''
    Return the deduplicated set of QCEW CSV slices that must be fetched
    to reconstruct all inexact CES series. Useful for minimizing API calls.
    
    '''
    df = get_qcew_calls_df()
    
    return (
        df
        .select(
            ces_series_id=pl.col('ces_series_id'),
            ces_description=pl.col('ces_description'),
            ces_level=pl.col('ces_level'),
            ces_code=pl.col('ces_code'),
            qcew_sign=pl.col('sign'),
            qcew_industry=pl.col('qcew_industry_url'),
            qcew_industry_url=pl.col('qcew_industry_url'),
            own_code=pl.col('own_code'),
            agglvl_code=pl.col('agglvl_code'),
            area_fips=pl.lit('US000', pl.Utf8),
            size_code=pl.lit('0', pl.Utf8),
            notes=pl.lit('', pl.Utf8),
        )
        .unique()
        .sort('qcew_industry_url')
    )

In [7]:
print('8 of 35 CEU series require multi-call QCEW construction:')

inexact_df = get_unique_qcew_fetches()

8 of 35 CEU series require multi-call QCEW construction:


In [8]:
area_list = [
    'US', '01', '02', '04', '05', '06', '08', '09', '10', '11', '12', '13', 
    '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', 
    '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', 
    '39', '40', '41', '42', '44', '45', '46', '47', '48', '49', '50', '51', 
    '53', '54', '55', '56', '72', '78', 
]

areas = [f'{a}000' for a in area_list]

In [9]:
all_series_calls = (
    pl
    .concat([
        exact_df, 
        inexact_df
    ])
    .with_columns(
        qcew_sign=pl.when(pl.col('qcew_sign').eq('+'))
                    .then(pl.lit(1, pl.Int8))
                    .when(pl.col('qcew_sign').eq('-'))
                    .then(pl.lit(-1, pl.Int8))
                    .otherwise(pl.lit(None, pl.Int8))
    )
    .sort('ces_series_id', 'qcew_industry')
)

In [10]:
from eco_stats.api.bls.qcew import QCEWClient

START_YEAR = 2016
END_YEAR = 2025
END_QTR = 2

qcew = QCEWClient()

unique_industries = (
    all_series_calls
    .select('qcew_industry_url')
    .unique()
    .sort('qcew_industry_url')
    .to_series()
    .to_list()
)
n = len(unique_industries)
print(f'{n} unique industry slices to fetch ({START_YEAR}Q1 - {END_YEAR}Q{END_QTR})\n')

industry_cache: dict[str, pl.DataFrame] = {}
for i, code in enumerate(unique_industries, 1):
    df = (
        qcew
        .get_industry(
            industry_code=code,
            start_year=START_YEAR,
            end_year=END_YEAR,
        )
        .filter(
            pl.col('year').is_between(START_YEAR, END_YEAR)
        )
        .select(
            year=pl.col('year')
                   .cast(pl.UInt16),
            qtr=pl.col('qtr')
                  .cast(pl.UInt8),
            own_code=pl.col('own_code'),
            agglvl_code=pl.col('agglvl_code'),
            area_fips=pl.col('area_fips'),
            size_code=pl.col('size_code'),
            industry_code=pl.col('industry_code'),
            num_estabs=pl.col('qtrly_estabs'),
            emp_m1=pl.col('month1_emplvl')
                     .cast(pl.Float64),
            emp_m2=pl.col('month2_emplvl')
                     .cast(pl.Float64),
            emp_m3=pl.col('month3_emplvl')
                     .cast(pl.Float64),
        )
    )

    industry_cache[code] = df
    print(f'  [{i:2d}/{n}] {code:>5}: {df.height:>6,} rows')

print(f'\nFetched {len(industry_cache)} industry slices.')

49 unique industry slices to fetch (2016Q1 - 2025Q2)

  [ 1/49]    10: 731,634 rows
  [ 2/49]   101: 253,618 rows
  [ 3/49]  1012: 244,420 rows
  [ 4/49]  1013: 145,426 rows
  [ 5/49]   102: 516,600 rows
  [ 6/49]  1021: 380,516 rows
  [ 7/49]  1022: 225,080 rows
  [ 8/49]  1023: 211,274 rows
  [ 9/49]  1024: 257,698 rows
  [10/49]  1025: 384,072 rows
  [11/49]  1026: 243,338 rows
  [12/49]  1027: 194,190 rows
  [13/49]    11: 148,594 rows
  [14/49]  1133: 70,094 rows
  [15/49]    21: 114,764 rows
  [16/49]    22: 212,800 rows
  [17/49]   311: 120,400 rows
  [18/49]   312: 85,512 rows
  [19/49]   313: 42,318 rows
  [20/49]   314: 70,804 rows
  [21/49]   315: 52,378 rows
  [22/49]   316: 33,852 rows
  [23/49]   321: 107,552 rows
  [24/49]   322: 59,310 rows
  [25/49]   323: 102,886 rows
  [26/49]   324: 48,116 rows
  [27/49]   325: 95,688 rows
  [28/49]   326: 85,770 rows
  [29/49]   327: 112,602 rows
  [30/49]   331: 64,826 rows
  [31/49]   332: 120,774 rows
  [32/49]   333: 108,204 ro

In [15]:
results = []
missing = []

for row in all_series_calls.iter_rows(named=True):
    
    idf = industry_cache.get(row['qcew_industry_url'])
    
    if idf is None or idf.height == 0:
        missing.append((row['ces_series_id'], row['qcew_industry_url']))
        continue

    filtered = (
        idf
        .filter(
            pl.col('own_code').eq(row['own_code']),
            pl.col('agglvl_code').eq(row['agglvl_code']),
            #pl.col('area_fips').is_in(areas),
            pl.col('size_code').eq(row['size_code']),
        )
        .with_columns(
            ces_series_id=pl.lit(row['ces_series_id']),
            ces_description=pl.lit(row['ces_description']),
            ces_level=pl.lit(row['ces_level']),
            ces_code=pl.lit(row['ces_code']),
            qcew_sign=pl.lit(row['qcew_sign'])
        )
    )

    if filtered.height > 0:
        results.append(filtered)
    else:
        missing.append((row['ces_series_id'], row['qcew_industry_url']))

In [18]:
all_series_calls

ces_series_id,ces_description,ces_level,ces_code,qcew_sign,qcew_industry,qcew_industry_url,own_code,agglvl_code,area_fips,size_code,notes
str,str,str,str,i8,str,str,str,str,str,str,str
"""CEU0000000001""","""Total nonfarm""","""national""","""00""",1,"""10""","""10""","""0""","""10""","""US000""","""0""",""""""
"""CEU0000000001""","""Total nonfarm""","""national""","""00""",-1,"""11""","""11""","""1""","""14""","""US000""","""0""",""""""
"""CEU0000000001""","""Total nonfarm""","""national""","""00""",-1,"""11""","""11""","""5""","""14""","""US000""","""0""",""""""
"""CEU0000000001""","""Total nonfarm""","""national""","""00""",-1,"""11""","""11""","""2""","""14""","""US000""","""0""",""""""
"""CEU0000000001""","""Total nonfarm""","""national""","""00""",-1,"""11""","""11""","""3""","""14""","""US000""","""0""",""""""
…,…,…,…,…,…,…,…,…,…,…,…
"""CEU9000000001""","""Government""","""supersector""","""90""",1,"""10""","""10""","""1""","""11""","""US000""","""0""",""""""
"""CEU9000000001""","""Government""","""supersector""","""90""",1,"""10""","""10""","""3""","""11""","""US000""","""0""",""""""
"""CEU9091000001""","""Federal Government""","""sector""","""91""",1,"""10""","""10""","""1""","""11""","""US000""","""0""","""QCEW federal government covera…"
"""CEU9092000001""","""State Government""","""sector""","""92""",1,"""10""","""10""","""2""","""11""","""US000""","""0""",""""""


In [None]:
qcew_data = (
    pl
    .concat(
        results, 
        how='vertical_relaxed'
    )
    .with_columns(
        ref_date=pl.lit(None, pl.Date),
        ref_year=pl.col('year'),
        ref_month=pl.lit(None, pl.UInt8),
        state_fips=pl.col('area_fips')
                     .str.slice(0, 2)
    )
    .with_columns(
        geographic_type=pl.when(pl.col('state_fips').eq('US'))
                          .then(pl.lit('national', pl.Utf8))
                          .otherwise(pl.lit('state', pl.Utf8)),
        geographic_code=pl.when(pl.col('state_fips').eq('US'))
                          .then(pl.lit('00', pl.Utf8))
                          .otherwise(pl.col('state_fips'))
    )
    .with_columns(
        industry_type=pl.col('ces_level'),
        industry_code=pl.col('ces_code'),
    )
    .unpivot(
        ['emp_m1', 'emp_m2', 'emp_m3'],
        index=[
            'ref_date', 'ref_year', 'ref_month', 'qtr', 
            'ces_series_id', 'ces_description', 
            'geographic_type', 'geographic_code',
            'industry_type', 'industry_code',
            'area_fips', 'qcew_sign', 'num_estabs'
        ],
        variable_name='month',
        value_name='employment'
    )
    .with_columns(
        month=pl.col('month')
                .str.replace('emp_m', '', literal=True)
                .cast(pl.UInt8),
        employment=pl.col('employment')
                     .mul(pl.col('qcew_sign')),
        num_estabs=pl.col('num_estabs')
                     .mul(pl.col('qcew_sign'))
    )
    .with_columns(
        ref_month=pl.col('qtr')
                    .sub(1)
                    .mul(3)
                    .add(pl.col('month')),
        revision=pl.lit(None, pl.Utf8),
        vintage_date=pl.lit(None, pl.Date)
    )
    .with_columns(
        ref_date=pl.date(
            pl.col('ref_year'),
            pl.col('ref_month'),
            pl.lit(12, pl.UInt8)
        )
    )
    .sort('ces_series_id', 'ref_date')
)

n_expected = all_series_calls.height
n_series = len(results)
n_qtrs = qcew_data.select('ref_date').unique().height

print(f'{n_series} of {n_expected} series retrieved across {n_qtrs} months')
print(f'{qcew_data.height:,} total rows')

if missing:
    print(f'\nMissing ({len(missing)}):')
    for sid, ind in missing:
        print(f'  {sid} (industry/{ind})')

qcew_data

In [None]:
qcew = (
    qcew_data
    .sort(
        'ref_date', 
        'geographic_type', 'geographic_code',
        'industry_type', 'industry_code'
    )
    .group_by(
        'ref_date', 'ref_year', 'ref_month', 
        'revision', 'vintage_date',
        'geographic_type', 'geographic_code',
        'industry_type', 'industry_code',
        maintain_order=True
    )
    .agg(
        num_estabs=pl.col('num_estabs').sum(),
        employment=pl.col('employment').sum()
    )
)

In [None]:
qcew

In [None]:
3990/114