# Insights - Total population by year

## Introduction

This process is dependent on upstream processes. See the "Prerequisites and usage notes" section below.

The workflow defined herein is identified as workflow ID #90 in the [Data Team Master Document List](https://morpc1.sharepoint.com/:x:/s/GISteam/EfC4j3HhohZCrSZzxJdyt5cBFEqVD7zHick8ZW0INqgCYA?e=0WhrAI). References to document list identifiers are denoted by a number in brackets, e.g. [90].

## Process outline

  1. Load input dataset
  2. Extract required population facts
  3. Transform population facts to comply with output schema
  4. Export output dataset
  5. Create resource file

## Prerequisites and usage notes

  - Outputs of one or more upstream workflows must be available at the indicated paths. Make sure that those outputs are up to date prior to running this script. 
  - This script includes several intentional RuntimeError instances that may be triggered to alert the user to conditions that may require their attention. If the script triggers one of these errors, review the error, verify that the condition is acceptable or resolve any issues, then proceed.

## Setup

### Import required packages

In [1]:
import os
import shutil
import sys
import pandas as pd
import frictionless
import datetime
import matplotlib
from matplotlib import pyplot as plt
sys.path.append(os.path.normpath("../morpc-common"))
import morpc
import morpcCensus

### User-specified parameters

In [2]:
MORPC_ESTIMATE_YEAR_RANGE = [2024, 2024]

MORPC_FORECAST_VINTAGE = 2023
MORPC_FORECAST_YEAR_RANGE = [2025, 2050]
MORPC_FORECAST_YEAR_INTERVAL = 5

DECENNIAL_YEAR_RANGE = [1980, 2020]
INTERCENSAL_YEAR_RANGE = [2000, 2019]

PEP_YEAR_RANGE = [INTERCENSAL_YEAR_RANGE[1]+1, MORPC_ESTIMATE_YEAR_RANGE[0]-1]

MAX_POP_THRESHOLD = 5000
MIN_POP_THRESHOLD = 100

# When STALE_DATA_INTERRUPT == True, the script will produce a RuntimeError in certain situations where the input 
# data may be stale and updates might be required prior to running the script.  Otherwise, a warning will be generated 
# but script execution will continue.  Regardless of whether an error or warning occurs, be sure to verify the readiness 
# of all input data.
STALE_DATA_INTERRUPT = True

# You can change where the input data is sourced and archived by changing the following directory and file names.  
# This typically is not necessary and may break other scripts that depend on outputs from this one. Source data 
# will be copied to this location.  Input data will be deleted following successful completion of the script 
# unless PRESERVE_INPUT_DATA == True.
INPUT_DIR = "./input_data"

# You can change where the output data is stored by changing the following directory and file names.  This 
# typically is not necessary and may break other scripts that depend on outputs from this one.
OUTPUT_DIR = "./output_data"

### Static parameters

Create a map to convert human-readable source descriptions to shortened codes to save space.

In [89]:
GEO_TYPE_LIST = ["COUNTY","PLACE","COUNTY-TOWNSHIP-REMAINDER"]

SOURCE_MAP = {
    "Census Intercensal Estimates":"CENINT",
    "Census Population Estimates Program":"CENPEP",
    "Mid-Ohio Regional Planning Commission":"MORPC"
}

SOURCE_MAP_REVERSED = {value: key for key, value in SOURCE_MAP.items()}

GEO_TYPE_LABELS = {
    "REGION15":"",
    "COUNTY":"",
    "COUNTY-TOWNSHIP-REMAINDER":" (unincorporated)",
    "PLACE":""
}

CHART_DIRNAME = "charts"
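
As a brief illustration of how the two source maps relate, the sketch below translates a column of source codes back to their descriptions. The toy DataFrame and the `SOURCE_LABEL` column name are made up for this example; only `SOURCE_MAP` and `SOURCE_MAP_REVERSED` come from the cell above.

```python
import pandas as pd

SOURCE_MAP = {
    "Census Intercensal Estimates": "CENINT",
    "Census Population Estimates Program": "CENPEP",
    "Mid-Ohio Regional Planning Commission": "MORPC",
}
SOURCE_MAP_REVERSED = {value: key for key, value in SOURCE_MAP.items()}

# Hypothetical records, for illustration only
toy = pd.DataFrame({"SOURCE": ["CENINT", "MORPC", "CENPEP"]})

# Decode the short codes back to human-readable descriptions
toy["SOURCE_LABEL"] = toy["SOURCE"].map(SOURCE_MAP_REVERSED)
```

Because the codes are unique, the reversed dictionary is a lossless inverse of the original map.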

### Define inputs

The following datasets are required by this notebook. They will be retrieved from the specified location and temporarily stored in INPUT_DIR. They will be deleted following successful completion of the script unless PRESERVE_INPUT_DATA == True.

#### Create input data directory

Create input data directory if it doesn't exist.

In [4]:
inputDir = os.path.normpath(INPUT_DIR)
if not os.path.exists(inputDir):
    os.makedirs(inputDir)
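
The existence check and the directory creation can also be collapsed into a single call. The sketch below is an equivalent alternative rather than a required change; the helper name `ensure_dir` is hypothetical and does not exist in the `morpc` package.

```python
import os

def ensure_dir(path):
    """Create the directory (and any missing parents); return the normalized path."""
    normalized = os.path.normpath(path)
    # exist_ok=True suppresses the error when the directory already exists
    os.makedirs(normalized, exist_ok=True)
    return normalized
```

With this helper, the cell above would reduce to `inputDir = ensure_dir(INPUT_DIR)`.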

#### MORPC counties reference data [81]

Reference data for counties in the MORPC region will be loaded automatically as a morpc.countyLookup() object (see below).

#### MORPC combined population facts [286]

In [5]:
COMBINED_POP_FACTS_RESOURCE_PATH = "../morpc-pop-collect/output_data/morpc-pop-collect.resource.yaml"
print("Resource file: {}".format(COMBINED_POP_FACTS_RESOURCE_PATH))

Resource file: ../morpc-pop-collect/output_data/morpc-pop-collect.resource.yaml


#### MORPC geography lookup table [375]

In [6]:
GEOS_LOOKUP_RESOURCE_PATH = "../morpc-geos-collect/output_data/morpc-geos-lookup.resource.yaml"
print("Resource file: {}".format(GEOS_LOOKUP_RESOURCE_PATH))

Resource file: ../morpc-geos-collect/output_data/morpc-geos-lookup.resource.yaml


### Define outputs

#### Create output data directory

Create output data directory if it doesn't exist.

In [7]:
outputDir = os.path.normpath(OUTPUT_DIR)
if not os.path.exists(outputDir):
    os.makedirs(outputDir)   

In [8]:
chartDir = os.path.join(outputDir, CHART_DIRNAME)
if not os.path.exists(chartDir):
    os.makedirs(chartDir)    

#### Insights total population by year [287]

In [9]:
INSIGHTS_POP_TABLE_FILENAME = "morpc-insights-pop-temporal.csv"
INSIGHTS_POP_TABLE_PATH = os.path.join(outputDir, INSIGHTS_POP_TABLE_FILENAME)
INSIGHTS_POP_TABLE_SCHEMA_PATH = INSIGHTS_POP_TABLE_PATH.replace(".csv",".schema.yaml")
INSIGHTS_POP_TABLE_RESOURCE_PATH = INSIGHTS_POP_TABLE_PATH.replace(".csv",".resource.yaml")
print("Data: {}".format(INSIGHTS_POP_TABLE_PATH))
print("Schema: {}".format(INSIGHTS_POP_TABLE_SCHEMA_PATH))
print("Resource file: {}".format(INSIGHTS_POP_TABLE_RESOURCE_PATH))

Data: output_data\morpc-insights-pop-temporal.csv
Schema: output_data\morpc-insights-pop-temporal.schema.yaml
Resource file: output_data\morpc-insights-pop-temporal.resource.yaml


## Prepare input data

### Load county reference data

In [10]:
countyLookup = morpc.countyLookup(scope="15-County Region")

Loading data for MORPC 15-County region only


In [11]:
",".join(countyLookup.list_ids())

'39041,39045,39047,39049,39073,39083,39089,39091,39097,39101,39117,39127,39129,39141,39159'

### Combined population facts

In [12]:
(combinedPopRaw, combinedPopResource, combinedPopSchema) = morpc.frictionless_load_data(COMBINED_POP_FACTS_RESOURCE_PATH, archiveDir=inputDir, validate=True, verbose=True)

morpc.load_frictionless_data | INFO | Loading Frictionless Resource file at location ..\morpc-pop-collect\output_data\morpc-pop-collect.resource.yaml
morpc.load_frictionless_data | INFO | Copying data, resource file, and schema to directory input_data
morpc.load_frictionless_data | INFO | --> Data file: input_data\morpc-pop-collect.csv
morpc.load_frictionless_data | INFO | --> Resource file: input_data\morpc-pop-collect.resource.yaml
morpc.load_frictionless_data | INFO | --> Schema file: input_data\morpc-pop-collect.schema.yaml
morpc.load_frictionless_data | INFO | Validating resource including data and schema.
morpc.frictionless_validate_resource | INFO | Validating resource on disk (including data and schema). This may take some time.
morpc.frictionless_validate_resource | INFO | Resource is valid
morpc.load_frictionless_data | INFO | Loading data.
frictionless_cast_field_types | INFO | Casting field POP as type integer.
frictionless_cast_field_types | INFO | Casting field GEOIDFQ as

In [13]:
combinedPop = combinedPopRaw.copy()

In [14]:
combinedPop["GEO_TYPE"] = combinedPop["SUMLEVEL"].map(morpc.HIERARCHY_STRING_LOOKUP)
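
The cell above relies on `Series.map` with a lookup dictionary. The sketch below shows the behavior on toy data. The contents of `morpc.HIERARCHY_STRING_LOOKUP` are not shown in this notebook, so the stand-in dictionary and its keys are assumptions made purely for illustration.

```python
import pandas as pd

# Stand-in for morpc.HIERARCHY_STRING_LOOKUP (assumed contents, illustration only)
HYPOTHETICAL_LOOKUP = {"050": "COUNTY", "160": "PLACE"}

toy = pd.DataFrame({"SUMLEVEL": ["050", "160", "999"]})
toy["GEO_TYPE"] = toy["SUMLEVEL"].map(HYPOTHETICAL_LOOKUP)

# Codes missing from the lookup map to NaN, so an isin() filter drops them
kept = toy.loc[toy["GEO_TYPE"].isin(["COUNTY", "PLACE"])]
```

Records whose summary level is absent from the lookup receive a missing `GEO_TYPE` and are removed by the `isin` filter applied to `GEO_TYPE_LIST`.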

In [15]:
combinedPop = combinedPop.drop(columns=["SUMLEVEL","LAST_UPDATED"]) 

In [16]:
combinedPop = combinedPop.loc[combinedPop["GEO_TYPE"].isin(GEO_TYPE_LIST)]

In [17]:
combinedPop.head()

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE
0,111714,0500000US39041,2000,Y-JUN,2009,,ESTIMATE,,,,CENPEP,COUNTY
1,111759,0500000US39041,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY
2,119098,0500000US39041,2001,Y-JUN,2009,,ESTIMATE,,,,CENPEP,COUNTY
3,118646,0500000US39041,2001,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY
4,127011,0500000US39041,2002,Y-JUN,2009,,ESTIMATE,,,,CENPEP,COUNTY


### Geography lookup table

In [18]:
(geoLookupRaw, geoLookupResource, geoLookupSchema) = morpc.frictionless_load_data(GEOS_LOOKUP_RESOURCE_PATH, archiveDir=inputDir, validate=True, verbose=True)

morpc.load_frictionless_data | INFO | Loading Frictionless Resource file at location ..\morpc-geos-collect\output_data\morpc-geos-lookup.resource.yaml
morpc.load_frictionless_data | INFO | Copying data, resource file, and schema to directory input_data
morpc.load_frictionless_data | INFO | --> Data file: input_data\morpc-geos-lookup.csv
morpc.load_frictionless_data | INFO | --> Resource file: input_data\morpc-geos-lookup.resource.yaml
morpc.load_frictionless_data | INFO | --> Schema file: input_data\morpc-geos-lookup.schema.yaml
morpc.load_frictionless_data | INFO | Validating resource including data and schema.
morpc.frictionless_validate_resource | INFO | Validating resource on disk (including data and schema). This may take some time.
morpc.frictionless_validate_resource | INFO | Resource is valid
morpc.load_frictionless_data | INFO | Loading data.
frictionless_cast_field_types | INFO | Casting field GEOIDFQ as type string.
frictionless_cast_field_types | INFO | Casting field GEOID 

In [19]:
geoLookup = geoLookupRaw.copy() \
    .set_index("GEOIDFQ")

In [20]:
geoLookup.head()

GEOIDFQ,GEOID,SUMLEVEL,GEOTYPE,NAME,SOURCE,STATEFP,COUNTYFP,COUSUBFP,PLACEFP,TRACTCE,CLASSFP,MUNITYPE,PLACECOMBO
0500000US39041,39041,50,COUNTY,Delaware,CENSUS,39,41,,,,H1,,
0500000US39045,39045,50,COUNTY,Fairfield,CENSUS,39,45,,,,H1,,
0500000US39047,39047,50,COUNTY,Fayette,CENSUS,39,47,,,,H1,,
0500000US39049,39049,50,COUNTY,Franklin,CENSUS,39,49,,,,H1,,
0500000US39073,39073,50,COUNTY,Hocking,CENSUS,39,73,,,,H1,,


## Transform data

### Load output schema

In [21]:
insightsPopSchema = morpc.frictionless_load_schema(INSIGHTS_POP_TABLE_SCHEMA_PATH)
insightsPopSchema

{'fields': [{'name': 'GEOIDFQ',
             'type': 'string',
             'description': 'Unique identifier for the geography as issued by '
                            'MORPC.  These are identical to fully-qualified '
                            'Census-issued GEOIDs for Census geographies.'},
            {'name': 'Name',
             'type': 'string',
             'description': 'Name of the geography.'},
            {'name': 'Geography type',
             'type': 'string',
             'description': 'Code which designates the summary level '
                            '(geography type) for which the GEOID applies.  '
                            'The combination of GEO_TYPE and GEOID uniquely '
                            'identify the geography for the record.'},
            {'name': 'Date',
             'type': 'date',
             'description': 'ISO8601-compliant date string that identifies the '
                            'reference date for which the estimate applies'},
  

### Create list to collect extracted data

In [22]:
extractedData = []

### Extract decennial census counts (NOT IMPLEMENTED)

### Extract intercensal estimates

Create list of years from user-specified range.

In [23]:
intercensalRange = list(range(INTERCENSAL_YEAR_RANGE[0], INTERCENSAL_YEAR_RANGE[1]+1))
print("Including intercensal estimates for years: {}".format(", ".join([str(x) for x in intercensalRange])))

Including intercensal estimates for years: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019


Extract intercensal data from combined table.

In [24]:
intercensal = combinedPop.loc[combinedPop["SOURCE"] == "CENINT"].copy()

Reference period and vintage period were stored as strings, but for intercensal estimates we can treat them as integers. Convert them now.

In [25]:
intercensal = intercensal.astype({
    "REFERENCE_PERIOD":"int",
    "VINTAGE_PERIOD":"int"
})

Verify that data is available for the specified years.

In [26]:
if(not set(intercensalRange).issubset(set(intercensal["REFERENCE_PERIOD"]))):
    print("ERROR | Set of intercensal years for which data is available does not match set derived from specified range.")
    print("Specified range: {}".format(INTERCENSAL_YEAR_RANGE))
    print("Specified set: {}".format(set(intercensalRange)))
    print("Available set: {}".format(set(intercensal["REFERENCE_PERIOD"])))
    raise RuntimeError
else:
    print("INFO | Intercensal data is available for all years in specified range.")

INFO | Intercensal data is available for all years in specified range.
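
This availability check is repeated for each data source in the sections that follow. One possible way to factor it out is sketched below. The function name `require_years` is hypothetical, and placing the details in the exception message (rather than printing them first) is a design choice of this sketch, not the notebook's behavior.

```python
def require_years(requested, available, label):
    """Raise RuntimeError if any requested year is absent from the available years."""
    missing = sorted(set(requested) - set(available))
    if missing:
        raise RuntimeError(
            "{} data is missing for years: {}".format(
                label, ", ".join(str(year) for year in missing)
            )
        )
    print("INFO | {} data is available for all years in specified range.".format(label))
```

A call such as `require_years(intercensalRange, intercensal["REFERENCE_PERIOD"], "Intercensal")` would then replace the if/else block, and the error would name the missing years directly.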


Extract only the estimates for the specified years.

In [27]:
intercensal = morpc.extract_vintage(intercensal, refPeriods=intercensalRange)
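
The implementation of `morpc.extract_vintage` is not shown in this notebook. Judging from the vintage listings printed in later cells, it appears to keep the most recent vintage for each requested reference period. The sketch below shows one way to get that result in plain pandas. The name `keep_latest_vintage` is hypothetical, the population values are toy numbers, and the real function may differ in detail.

```python
import pandas as pd

def keep_latest_vintage(df, ref_periods):
    """Keep rows in ref_periods whose vintage is the newest for that geography and period."""
    subset = df.loc[df["REFERENCE_PERIOD"].isin(ref_periods)]
    newest = subset.groupby(["GEOIDFQ", "REFERENCE_PERIOD"])["VINTAGE_PERIOD"].transform("max")
    return subset.loc[subset["VINTAGE_PERIOD"] == newest]

# Toy data: two vintages for 2000, one for 2001
toy = pd.DataFrame({
    "GEOIDFQ": ["0500000US39041"] * 3,
    "REFERENCE_PERIOD": [2000, 2000, 2001],
    "VINTAGE_PERIOD": [2009, 2012, 2012],
    "POP": [100, 110, 120],
})
latest = keep_latest_vintage(toy, [2000, 2001])
```

Only the 2012-vintage rows survive, which leaves one record per geography and reference period.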

Verify that all reference periods are present and that there is only a single vintage for each reference period.

In [28]:
for year in intercensal["REFERENCE_PERIOD"].unique():
    temp = intercensal.loc[intercensal["REFERENCE_PERIOD"] == year]
    print("{0}: {1}".format(year, ",".join(temp["VINTAGE_PERIOD"].unique().astype("str"))))

2000: 2012
2001: 2012
2002: 2012
2003: 2012
2004: 2012
2005: 2012
2006: 2012
2007: 2012
2008: 2012
2009: 2012
2010: 2024
2011: 2024
2012: 2024
2013: 2024
2014: 2024
2015: 2024
2016: 2024
2017: 2024
2018: 2024
2019: 2024


Construct DATE field from reference period and reference period frequency.

In [29]:
if(intercensal["REFERENCE_PERIOD_FREQ"].unique().shape[0] == 1):
    freq = intercensal["REFERENCE_PERIOD_FREQ"].iat[0]
    print("INFO | Detected reference period frequency {}".format(freq))
else:
    print("ERROR | Multiple reference period frequencies are not supported.")
    raise RuntimeError

try:
    # Hopefully this works properly with newer versions of pandas, but it has not been tested
    periodIndex = pd.PeriodIndex(intercensal["REFERENCE_PERIOD"]+1, freq=freq)
except Exception:
    # This works with older versions of pandas. Note the +1 offset. For whatever reason, pandas converts 2000 Y-JUN (for example)
    # to a timestamp of 1999-07-01 so we add one year to the reference period so the timestamp becomes 2000-07-01.
    print("WARNING | Error occurred when attempting to create period index using 'Y-' format. Trying legacy 'A-' format.")
    periodIndex = pd.PeriodIndex(intercensal["REFERENCE_PERIOD"]+1, freq=freq.replace("Y-","A-"))
intercensal["DATE"] = periodIndex.to_timestamp()

INFO | Detected reference period frequency Y-JUN
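
The reason for the +1 offset is easier to see in isolation. An annual period labelled 2000 with a June year-end runs from July 1999 through June 2000, and `to_timestamp()` returns the start of the period by default. The sketch below reproduces this using the same alias fallback. The helper name `year_start_timestamps` is hypothetical, and passing the years as strings is a choice made for this sketch (the notebook passes integers).

```python
import pandas as pd

def year_start_timestamps(years, freq):
    """Convert year labels to the timestamp at the start of each annual period."""
    labels = [str(year) for year in years]
    try:
        index = pd.PeriodIndex(labels, freq=freq)
    except (ValueError, KeyError):
        # Older pandas releases spell the annual alias "A-" rather than "Y-"
        index = pd.PeriodIndex(labels, freq=freq.replace("Y-", "A-"))
    return index.to_timestamp()

no_offset = year_start_timestamps([2000], "Y-JUN")        # period ending June 2000
with_offset = year_start_timestamps([2000 + 1], "Y-JUN")  # period ending June 2001
```

Without the offset the timestamp falls on 1999-07-01. Adding one year moves it to 2000-07-01, the mid-year reference date of the 2000 estimate.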


Show the data.

In [30]:
intercensal.head()

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE,DATE
1,111759,0500000US39041,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
62,123485,0500000US39045,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
123,28495,0500000US39047,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
184,1072018,0500000US39049,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
245,28262,0500000US39073,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01


In [31]:
extractedData.append(intercensal)

### Extract PEP estimates

Create list of years from user-specified range.

In [32]:
pepRange = list(range(PEP_YEAR_RANGE[0], PEP_YEAR_RANGE[1]+1))
print("Including PEP estimates for years: {}".format(", ".join([str(x) for x in pepRange])))

Including PEP estimates for years: 2020, 2021, 2022, 2023


Extract PEP estimates from combined table.

In [33]:
pep = combinedPop.loc[combinedPop["SOURCE"] == "CENPEP"].copy()

Reference period and vintage period were stored as strings, but for Census PEP estimates we can treat them as integers. Convert them now.

In [34]:
pep = pep.astype({
    "REFERENCE_PERIOD":"int",
    "VINTAGE_PERIOD":"int"
})

Verify that data is available for the specified years.

In [35]:
if(not set(pepRange).issubset(set(pep["REFERENCE_PERIOD"]))):
    print("ERROR | Set of Census PEP years for which data is available does not match set derived from specified range.")
    print("Specified range: {}".format(PEP_YEAR_RANGE))
    print("Specified set: {}".format(set(pepRange)))
    print("Available set: {}".format(set(pep["REFERENCE_PERIOD"])))
    raise RuntimeError
else:
    print("INFO | PEP data is available for all years in specified range.")

INFO | PEP data is available for all years in specified range.


Extract only the estimates for the specified years.

In [36]:
pep = morpc.extract_vintage(pep, refPeriods=pepRange)

Verify that all reference periods are present and that there is only a single vintage for each reference period.

In [37]:
for year in pep["REFERENCE_PERIOD"].unique():
    temp = pep.loc[pep["REFERENCE_PERIOD"] == year]
    print("{0}: {1}".format(year, ",".join(temp["VINTAGE_PERIOD"].unique().astype("str"))))

2020: 2023
2021: 2023
2022: 2023
2023: 2023


Construct DATE field from reference period and reference period frequency.

In [38]:
if(pep["REFERENCE_PERIOD_FREQ"].unique().shape[0] == 1):
    freq = pep["REFERENCE_PERIOD_FREQ"].iat[0]
    print("INFO | Detected reference period frequency {}".format(freq))
else:
    print("ERROR | Multiple reference period frequencies are not supported.")
    raise RuntimeError

try:
    # Hopefully this works properly with newer versions of pandas, but it has not been tested
    periodIndex = pd.PeriodIndex(pep["REFERENCE_PERIOD"]+1, freq=freq)
except Exception:
    # This works with older versions of pandas. Note the +1 offset. For whatever reason, pandas converts 2000 Y-JUN (for example)
    # to a timestamp of 1999-07-01 so we add one year to the reference period so the timestamp becomes 2000-07-01.
    print("WARNING | Error occurred when attempting to create period index using 'Y-' format. Trying legacy 'A-' format.")
    periodIndex = pd.PeriodIndex(pep["REFERENCE_PERIOD"]+1, freq=freq.replace("Y-","A-"))
pep["DATE"] = periodIndex.to_timestamp()

INFO | Detected reference period frequency Y-JUN


Show the data.

In [39]:
pep.head()

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE,DATE
43,215166,0500000US39041,2020,Y-JUN,2023,,ESTIMATE,,,,CENPEP,COUNTY,2020-07-01
104,159448,0500000US39045,2020,Y-JUN,2023,,ESTIMATE,,,,CENPEP,COUNTY,2020-07-01
165,28975,0500000US39047,2020,Y-JUN,2023,,ESTIMATE,,,,CENPEP,COUNTY,2020-07-01
226,1324441,0500000US39049,2020,Y-JUN,2023,,ESTIMATE,,,,CENPEP,COUNTY,2020-07-01
287,28040,0500000US39073,2020,Y-JUN,2023,,ESTIMATE,,,,CENPEP,COUNTY,2020-07-01


In [40]:
extractedData.append(pep)

### Extract MORPC county estimates

Because county estimates and sub-county estimates are generated (and regenerated) at different times, it is necessary to process each separately. Start with county estimates.

Create list of years from user-specified range.

In [41]:
morpcEstimatesRange = list(range(MORPC_ESTIMATE_YEAR_RANGE[0], MORPC_ESTIMATE_YEAR_RANGE[1]+1))
print("Including MORPC estimates for years: {}".format(", ".join([str(x) for x in morpcEstimatesRange])))

Including MORPC estimates for years: 2024


Extract MORPC county estimates from combined table.

In [42]:
morpcCountyEstimates = combinedPop.loc[
    (combinedPop["SOURCE"] == "MORPC") & 
    (combinedPop["VALUE_TYPE"] == "ESTIMATE") &
    (combinedPop["GEO_TYPE"] == "COUNTY")
].copy()

Reference period and vintage period were stored as strings, but for MORPC estimates we can treat them as integers. Convert them now.

In [43]:
morpcCountyEstimates = morpcCountyEstimates.astype({
    "REFERENCE_PERIOD":"int",
    "VINTAGE_PERIOD":"int"
})

Verify that data is available for the specified years.

In [44]:
if(not set(morpcEstimatesRange).issubset(set(morpcCountyEstimates["REFERENCE_PERIOD"]))):
    print("ERROR | Set of MORPC estimate years for which data is available does not match set derived from specified range.")
    print("Specified range: {}".format(MORPC_ESTIMATE_YEAR_RANGE))
    print("Specified set: {}".format(set(morpcEstimatesRange)))
    print("Available set: {}".format(set(morpcCountyEstimates["REFERENCE_PERIOD"])))
    raise RuntimeError
else:
    print("INFO | MORPC county estimates data is available for all years in specified range.")

INFO | MORPC county estimates data is available for all years in specified range.


Extract only the estimates for the specified years.

In [45]:
morpcCountyEstimates = morpc.extract_vintage(morpcCountyEstimates, refPeriods=morpcEstimatesRange)

Verify that all reference periods are present and that there is only a single vintage for each reference period.

In [46]:
for year in morpcCountyEstimates["REFERENCE_PERIOD"].unique():
    temp = morpcCountyEstimates.loc[morpcCountyEstimates["REFERENCE_PERIOD"] == year]
    print("{0}: {1}".format(year, ",".join(temp["VINTAGE_PERIOD"].unique().astype("str"))))

2024: 2024


Construct DATE field from reference period and reference period frequency.

In [47]:
if(morpcCountyEstimates["REFERENCE_PERIOD_FREQ"].unique().shape[0] == 1):
    freq = morpcCountyEstimates["REFERENCE_PERIOD_FREQ"].iat[0]
    print("INFO | Detected reference period frequency {}".format(freq))
else:
    print("ERROR | Multiple reference period frequencies are not supported.")
    raise RuntimeError

try:
    # Hopefully this works properly with newer versions of pandas, but it has not been tested
    periodIndex = pd.PeriodIndex(morpcCountyEstimates["REFERENCE_PERIOD"], freq=freq)
except Exception:
    # This works with older versions of pandas.
    print("WARNING | Error occurred when attempting to create period index using 'Y-' format. Trying legacy 'A-' format.")
    periodIndex = pd.PeriodIndex(morpcCountyEstimates["REFERENCE_PERIOD"], freq=freq.replace("Y-","A-"))
morpcCountyEstimates["DATE"] = periodIndex.to_timestamp()

INFO | Detected reference period frequency Y-DEC


Show the data.

In [48]:
morpcCountyEstimates.head()

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE,DATE
54,234305,0500000US39041,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY,2024-01-01
115,166534,0500000US39045,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY,2024-01-01
176,28792,0500000US39047,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY,2024-01-01
237,1328013,0500000US39049,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY,2024-01-01
298,27505,0500000US39073,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY,2024-01-01


In [49]:
extractedData.append(morpcCountyEstimates)

### Extract MORPC sub-county estimates

Because county estimates and sub-county estimates are generated (and regenerated) at different times, it is necessary to process each separately.

Create list of years from user-specified range.

In [50]:
morpcEstimatesRange = list(range(MORPC_ESTIMATE_YEAR_RANGE[0], MORPC_ESTIMATE_YEAR_RANGE[1]+1))
print("Including MORPC estimates for years: {}".format(", ".join([str(x) for x in morpcEstimatesRange])))

Including MORPC estimates for years: 2024


Extract MORPC sub-county estimates from combined table.

In [51]:
morpcSubCountyEstimates = combinedPop.loc[
    (combinedPop["SOURCE"] == "MORPC") & 
    (combinedPop["VALUE_TYPE"] == "ESTIMATE") &
    (combinedPop["GEO_TYPE"] != "COUNTY")
].copy()

Reference period and vintage period were stored as strings, but for MORPC estimates we can treat them as integers. Convert them now.

In [52]:
morpcSubCountyEstimates = morpcSubCountyEstimates.astype({
    "REFERENCE_PERIOD":"int",
    "VINTAGE_PERIOD":"int"
})

Verify that data is available for the specified years.

In [53]:
if(not set(morpcEstimatesRange).issubset(set(morpcSubCountyEstimates["REFERENCE_PERIOD"]))):
    print("ERROR | Set of MORPC estimate years for which data is available does not match set derived from specified range.")
    print("Specified range: {}".format(MORPC_ESTIMATE_YEAR_RANGE))
    print("Specified set: {}".format(set(morpcEstimatesRange)))
    print("Available set: {}".format(set(morpcSubCountyEstimates["REFERENCE_PERIOD"])))
    raise RuntimeError
else:
    print("INFO | MORPC sub-county estimates data is available for all years in specified range.")

INFO | MORPC sub-county estimates data is available for all years in specified range.


Extract only the estimates for the specified years.

In [54]:
morpcSubCountyEstimates = morpc.extract_vintage(morpcSubCountyEstimates, refPeriods=morpcEstimatesRange)

Verify that all reference periods are present and that there is only a single vintage for each reference period.

In [55]:
for year in morpcSubCountyEstimates["REFERENCE_PERIOD"].unique():
    temp = morpcSubCountyEstimates.loc[morpcSubCountyEstimates["REFERENCE_PERIOD"] == year]
    print("{0}: {1}".format(year, ",".join(temp["VINTAGE_PERIOD"].unique().astype("str"))))

2024: 2024


Construct DATE field from reference period and reference period frequency.

In [56]:
if(morpcSubCountyEstimates["REFERENCE_PERIOD_FREQ"].unique().shape[0] == 1):
    freq = morpcSubCountyEstimates["REFERENCE_PERIOD_FREQ"].iat[0]
    print("INFO | Detected reference period frequency {}".format(freq))
else:
    print("ERROR | Multiple reference period frequencies are not supported.")
    raise RuntimeError

try:
    # Hopefully this works properly with newer versions of pandas, but it has not been tested
    periodIndex = pd.PeriodIndex(morpcSubCountyEstimates["REFERENCE_PERIOD"], freq=freq)
except Exception:
    # This works with older versions of pandas.
    print("WARNING | Error occurred when attempting to create period index using 'Y-' format. Trying legacy 'A-' format.")
    periodIndex = pd.PeriodIndex(morpcSubCountyEstimates["REFERENCE_PERIOD"], freq=freq.replace("Y-","A-"))
morpcSubCountyEstimates["DATE"] = periodIndex.to_timestamp()

INFO | Detected reference period frequency Y-DEC


Show the data.

In [57]:
morpcSubCountyEstimates.head()

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE,DATE
13680,6367,0700000US390410577499999,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY-TOWNSHIP-REMAINDER,2024-01-01
13690,9414,0700000US390410578899999,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY-TOWNSHIP-REMAINDER,2024-01-01
13700,1467,0700000US390410942899999,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY-TOWNSHIP-REMAINDER,2024-01-01
13710,12489,0700000US390411814099999,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY-TOWNSHIP-REMAINDER,2024-01-01
13720,2225,0700000US390412144899999,2024,Y-DEC,2024,Y-DEC,ESTIMATE,,,,MORPC,COUNTY-TOWNSHIP-REMAINDER,2024-01-01


In [58]:
extractedData.append(morpcSubCountyEstimates)

### Extract MORPC forecasts

Create list of years from user-specified range.

In [59]:
morpcForecastsRange = list(range(MORPC_FORECAST_YEAR_RANGE[0], MORPC_FORECAST_YEAR_RANGE[1]+1, MORPC_FORECAST_YEAR_INTERVAL))
print("Including MORPC forecasts for years: {}".format(", ".join([str(x) for x in morpcForecastsRange])))

Including MORPC forecasts for years: 2025, 2030, 2035, 2040, 2045, 2050


Extract MORPC forecasts from combined table.

In [60]:
morpcForecasts = combinedPop.loc[(combinedPop["SOURCE"] == "MORPC") & (combinedPop["VALUE_TYPE"] == "FORECAST")].copy()

Reference period and vintage period were stored as strings, but for MORPC forecasts we can treat them as integers. Convert them now.

In [61]:
morpcForecasts = morpcForecasts.astype({
    "REFERENCE_PERIOD":"int",
    "VINTAGE_PERIOD":"int"
})

Verify that data is available for the specified years.

In [62]:
if(not set(morpcForecastsRange).issubset(set(morpcForecasts["REFERENCE_PERIOD"]))):
    print("ERROR | Set of MORPC forecast years for which data is available does not match set derived from specified range.")
    print("Specified range: {}".format(MORPC_FORECAST_YEAR_RANGE))
    print("Specified set: {}".format(set(morpcForecastsRange)))
    print("Available set: {}".format(set(morpcForecasts["REFERENCE_PERIOD"])))
    raise RuntimeError
else:
    print("INFO | MORPC forecasts data is available for all years in specified range.")

INFO | MORPC forecasts data is available for all years in specified range.


Extract only the forecasts for the specified years. The forecasts for different geographies come from different sources, so it is necessary to extract the latest vintage separately for each geography type.

In [63]:
extractIndex = []
for group in morpcForecasts["GEO_TYPE"].unique():
    temp = morpcForecasts.loc[morpcForecasts["GEO_TYPE"] == group].copy()
    temp = morpc.extract_vintage(temp, refPeriods=morpcForecastsRange)
    extractIndex += list(temp.index)
morpcForecasts = morpcForecasts.loc[extractIndex].copy()
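
The loop above gathers index labels one geography type at a time. The same per-geography selection can be written without an explicit loop, as sketched below on toy data. This assumes that "latest vintage" means the maximum `VINTAGE_PERIOD` within each geography type and reference period. It does not call `morpc.extract_vintage`, whose behavior may differ in detail.

```python
import pandas as pd

# Toy forecasts: counties have vintages 2022 and 2023, places only 2024
toy = pd.DataFrame({
    "GEO_TYPE": ["COUNTY", "COUNTY", "PLACE"],
    "REFERENCE_PERIOD": [2025, 2025, 2025],
    "VINTAGE_PERIOD": [2022, 2023, 2024],
    "POP": [100, 105, 20],
})

# Newest vintage within each (geography type, reference period) pair
newest = toy.groupby(["GEO_TYPE", "REFERENCE_PERIOD"])["VINTAGE_PERIOD"].transform("max")
selected = toy.loc[toy["VINTAGE_PERIOD"] == newest]
```

A single global maximum would have discarded every county row in this example, which is why the selection has to be made per geography type.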

For each geography type, verify that all reference periods are present and that there is only a single vintage for each reference period.

In [64]:
for group in morpcForecasts["GEO_TYPE"].unique():
    print(group)
    for year in morpcForecasts["REFERENCE_PERIOD"].unique():
        temp = morpcForecasts.loc[(morpcForecasts["REFERENCE_PERIOD"] == year) & (morpcForecasts["GEO_TYPE"] == group)]
        print("{0}: {1}".format(year, ",".join(temp["VINTAGE_PERIOD"].unique().astype("str"))))

COUNTY
2025: 2023
2030: 2023
2035: 2023
2040: 2023
2045: 2023
2050: 2023
COUNTY-TOWNSHIP-REMAINDER
2025: 2024
2030: 2024
2035: 2024
2040: 2024
2045: 2024
2050: 2024
PLACE
2025: 2024
2030: 2024
2035: 2024
2040: 2024
2045: 2024
2050: 2024


Construct DATE field from reference period and reference period frequency.

In [65]:
if(morpcForecasts["REFERENCE_PERIOD_FREQ"].unique().shape[0] == 1):
    freq = morpcForecasts["REFERENCE_PERIOD_FREQ"].iat[0]
    print("INFO | Detected reference period frequency {}".format(freq))
else:
    print("ERROR | Multiple reference period frequencies are not supported.")
    raise RuntimeError

try:
    # Hopefully this works properly with newer versions of pandas, but it has not been tested
    periodIndex = pd.PeriodIndex(morpcForecasts["REFERENCE_PERIOD"]+1, freq=freq)
except Exception:
    # This works with older versions of pandas. Note the +1 offset. For whatever reason, pandas converts 2000 Y-JUN (for example)
    # to a timestamp of 1999-07-01 so we add one year to the reference period so the timestamp becomes 2000-07-01.
    print("WARNING | Error occurred when attempting to create period index using 'Y-' format. Trying legacy 'A-' format.")
    periodIndex = pd.PeriodIndex(morpcForecasts["REFERENCE_PERIOD"]+1, freq=freq.replace("Y-","A-"))
morpcForecasts["DATE"] = periodIndex.to_timestamp()

INFO | Detected reference period frequency Y-JUN


Show the data.

In [66]:
morpcForecasts.head()

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE,DATE
55,247016,0500000US39041,2025,Y-JUN,2023,,FORECAST,0.8,248441,238464,MORPC,COUNTY,2025-07-01
116,169183,0500000US39045,2025,Y-JUN,2023,,FORECAST,0.8,171704,163990,MORPC,COUNTY,2025-07-01
177,29419,0500000US39047,2025,Y-JUN,2023,,FORECAST,0.8,29855,28912,MORPC,COUNTY,2025-07-01
238,1390127,0500000US39049,2025,Y-JUN,2023,,FORECAST,0.8,1409654,1333445,MORPC,COUNTY,2025-07-01
299,27965,0500000US39073,2025,Y-JUN,2023,,FORECAST,0.8,28660,27122,MORPC,COUNTY,2025-07-01


In [67]:
extractedData.append(morpcForecasts)

### Combine extracted data

In [68]:
combinedData = pd.concat(extractedData, axis="index")

In [69]:
combinedData

Unnamed: 0,POP,GEOIDFQ,REFERENCE_PERIOD,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LEVEL,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,SOURCE,GEO_TYPE,DATE
1,111759,0500000US39041,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
62,123485,0500000US39045,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
123,28495,0500000US39047,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
184,1072018,0500000US39049,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
245,28262,0500000US39073,2000,Y-JUN,2012,,ESTIMATE,,,,CENINT,COUNTY,2000-07-01
...,...,...,...,...,...,...,...,...,...,...,...,...,...
30534,5338,1600000US3983580,2050,Y-JUN,2024,,FORECAST,,,,MORPC,PLACE,2050-07-01
30684,210,1600000US3984182,2050,Y-JUN,2024,,FORECAST,,,,MORPC,PLACE,2050-07-01
30738,23060,1600000US3984742,2050,Y-JUN,2024,,FORECAST,,,,MORPC,PLACE,2050-07-01
30792,1032,1600000US3985414,2050,Y-JUN,2024,,FORECAST,,,,MORPC,PLACE,2050-07-01


### Compute totals for 15-county region

In [70]:
regionTotals = combinedData.copy()
regionTotals["SUMLEVEL"] = regionTotals["GEOIDFQ"].apply(lambda x:x[0:3])
regionTotals = regionTotals.loc[regionTotals["SUMLEVEL"] == morpc.SUMLEVEL_LOOKUP["COUNTY"]].copy().drop(columns="SUMLEVEL")
regionTotals = regionTotals.drop(columns=["GEOIDFQ","GEO_TYPE","SOURCE","CONF_LEVEL"]).groupby(["REFERENCE_PERIOD","DATE"]).agg({
    "POP":"sum",
    "REFERENCE_PERIOD_FREQ":"first",
    "VINTAGE_PERIOD":"first",
    "VINTAGE_PERIOD_FREQ":"first",
    "VALUE_TYPE":"first",
    "CONF_LIMIT_UPPER":"sum",
    "CONF_LIMIT_LOWER":"sum"
}).reset_index()
regionTotals["GEOIDFQ"] = "M010000US001"
regionTotals["GEO_TYPE"] = morpc.HIERARCHY_STRING_LOOKUP["M01"]
regionTotals["SOURCE"] = "MORPC"
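# Summing confidence limits that are all missing yields 0 for the estimate years, so reset those values to missing.
regionTotals.loc[regionTotals["VALUE_TYPE"] == "ESTIMATE", ["CONF_LIMIT_UPPER","CONF_LIMIT_LOWER"]] = None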
regionTotals["CONF_LEVEL"] = None
regionTotals["CONF_LEVEL"] = regionTotals["CONF_LEVEL"].astype("float")
regionTotals.head()

Unnamed: 0,REFERENCE_PERIOD,DATE,POP,REFERENCE_PERIOD_FREQ,VINTAGE_PERIOD,VINTAGE_PERIOD_FREQ,VALUE_TYPE,CONF_LIMIT_UPPER,CONF_LIMIT_LOWER,GEOIDFQ,GEO_TYPE,SOURCE,CONF_LEVEL
0,2000,2000-07-01,1950805,Y-JUN,2012,,ESTIMATE,,,M010000US001,REGION15,MORPC,
1,2001,2001-07-01,1976410,Y-JUN,2012,,ESTIMATE,,,M010000US001,REGION15,MORPC,
2,2002,2002-07-01,1997739,Y-JUN,2012,,ESTIMATE,,,M010000US001,REGION15,MORPC,
3,2003,2003-07-01,2022581,Y-JUN,2012,,ESTIMATE,,,M010000US001,REGION15,MORPC,
4,2004,2004-07-01,2043896,Y-JUN,2012,,ESTIMATE,,,M010000US001,REGION15,MORPC,
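
The confidence limits for estimate years are reset after the aggregation because pandas sums an all-missing group to 0 rather than leaving it missing. A minimal sketch of that behavior, using toy values that are not taken from the real dataset:

```python
import numpy as np
import pandas as pd

# Toy data: two "counties" per year; estimates have no confidence limits
toy = pd.DataFrame({
    "REFERENCE_PERIOD": [2000, 2000, 2025, 2025],
    "VALUE_TYPE": ["ESTIMATE", "ESTIMATE", "FORECAST", "FORECAST"],
    "CONF_LIMIT_UPPER": [np.nan, np.nan, 120.0, 80.0],
})

toyTotals = toy.groupby("REFERENCE_PERIOD").agg({
    "VALUE_TYPE": "first",
    "CONF_LIMIT_UPPER": "sum",
}).reset_index()

# The all-missing group sums to 0.0, which would be misread as a real limit
sumBeforeReset = toyTotals.loc[0, "CONF_LIMIT_UPPER"]

# Reset the limits for estimate years to missing, as in the cell above
toyTotals.loc[toyTotals["VALUE_TYPE"] == "ESTIMATE", "CONF_LIMIT_UPPER"] = np.nan
print(toyTotals)
```

An alternative would be to pass `min_count=1` to the sum so that all-missing groups stay missing, but the explicit reset keeps the intent visible.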


In [71]:
combinedData = pd.concat([combinedData, regionTotals], axis="index")

### Reformat combined data for output

Create a working dataframe to prepare for export.

In [72]:
insightsPop = combinedData.copy()

Merge the geography name and the summary level code from the geography lookup table, aligning on fully-qualified GEOID.

In [73]:
insightsPop = insightsPop.merge(geoLookup.reset_index()[["GEOIDFQ","NAME","SUMLEVEL"]], on="GEOIDFQ")
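
Because `merge` defaults to an inner join, any record whose GEOIDFQ is missing from the lookup table is dropped silently. The sketch below shows one way to check for that using toy data. The identifiers and values are made up, and whether the real lookup table covers every GEOIDFQ is not established here.

```python
import pandas as pd

# Toy stand-ins for the population facts and the geography lookup table
facts = pd.DataFrame({"GEOIDFQ": ["GEO_A", "GEO_B"], "POP": [1000, 2000]})
lookup = pd.DataFrame({"GEOIDFQ": ["GEO_A"], "NAME": ["Alpha"], "SUMLEVEL": ["050"]})

# Default inner join: GEO_B has no match in the lookup and is dropped
inner = facts.merge(lookup, on="GEOIDFQ")

# A left join with indicator=True exposes the unmatched records instead
checked = facts.merge(lookup, on="GEOIDFQ", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "GEOIDFQ"].tolist()
print("Rows kept by inner join:", len(inner))
print("GEOIDs missing from lookup:", unmatched)
```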

In [74]:
townshipsTemp = insightsPop.loc[insightsPop["SUMLEVEL"] == morpc.SUMLEVEL_LOOKUP["COUNTY-TOWNSHIP-REMAINDER"]].copy()
townshipsTemp["NAME"] = townshipsTemp["NAME"] + " Township"
insightsPop.update(townshipsTemp, overwrite=True)

In [75]:
countyTemp = insightsPop.loc[insightsPop["SUMLEVEL"] == morpc.SUMLEVEL_LOOKUP["COUNTY"]].copy()
countyTemp["NAME"] = countyTemp["NAME"] + " County"
insightsPop.update(countyTemp, overwrite=True)

In [76]:
insightsPop = insightsPop.filter(items=["POP","GEOIDFQ","GEO_TYPE","NAME","DATE","VALUE_TYPE","CONF_LIMIT_UPPER","CONF_LIMIT_LOWER","SOURCE","VINTAGE_PERIOD"], axis="columns")

In [77]:
insightsPop = insightsPop.pivot(index=["GEOIDFQ","GEO_TYPE","NAME","DATE","CONF_LIMIT_UPPER","CONF_LIMIT_LOWER","SOURCE","VINTAGE_PERIOD"], columns="VALUE_TYPE", values="POP").reset_index()
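
The pivot spreads the VALUE_TYPE categories into separate columns so that each record carries either an estimate or a forecast. A minimal sketch with toy values (not from the real dataset) and a reduced set of index columns:

```python
import pandas as pd

# Toy long-format data: one estimate and one forecast for the same geography
longData = pd.DataFrame({
    "GEOIDFQ": ["GEO_A", "GEO_A"],
    "DATE": ["2024-07-01", "2025-07-01"],
    "VALUE_TYPE": ["ESTIMATE", "FORECAST"],
    "POP": [100, 110],
})

# Each distinct VALUE_TYPE becomes its own column; cells without a value are missing
wideData = longData.pivot(index=["GEOIDFQ", "DATE"], columns="VALUE_TYPE", values="POP").reset_index()
print(wideData)
```

Note that `pivot` raises an error if the index columns do not uniquely identify each record for a given VALUE_TYPE, so the choice of index columns matters.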

In [78]:
insightsPop = insightsPop.rename(columns={
    "NAME":"Name",
    "GEO_TYPE":"Geography type",
    "DATE":"Date",
    "ESTIMATE":"Historical",
    "FORECAST":"Forecast",
    "CONF_LIMIT_UPPER":"Confidence limit (upper)",
    "CONF_LIMIT_LOWER":"Confidence limit (lower)",
    "SOURCE":"Source",
    "VINTAGE_PERIOD":"Vintage year"
})

In [79]:
insightsPop = insightsPop.filter(items=insightsPopSchema.field_names, axis="columns")

In [80]:
insightsPop = morpc.cast_field_types(insightsPop, insightsPopSchema)

frictionless_cast_field_types | INFO | Casting field GEOIDFQ as type string.
frictionless_cast_field_types | INFO | Casting field Name as type string.
frictionless_cast_field_types | INFO | Casting field Geography type as type string.
frictionless_cast_field_types | INFO | Casting field Date as type date.
frictionless_cast_field_types | INFO | Casting field Historical as type integer.
frictionless_cast_field_types | INFO | Casting field Forecast as type integer.
frictionless_cast_field_types | INFO | Casting field Confidence limit (upper) as type integer.
frictionless_cast_field_types | INFO | Casting field Confidence limit (lower) as type integer.
frictionless_cast_field_types | INFO | Casting field Source as type string.
frictionless_cast_field_types | INFO | Casting field Vintage year as type integer.


In [81]:
insightsPop.head()

Unnamed: 0,GEOIDFQ,Name,Geography type,Date,Historical,Forecast,Confidence limit (upper),Confidence limit (lower),Source,Vintage year
0,0500000US39041,Delaware County,COUNTY,2000-07-01,111759,,,,CENINT,2012
1,0500000US39041,Delaware County,COUNTY,2001-07-01,118646,,,,CENINT,2012
2,0500000US39041,Delaware County,COUNTY,2002-07-01,126172,,,,CENINT,2012
3,0500000US39041,Delaware County,COUNTY,2003-07-01,133842,,,,CENINT,2012
4,0500000US39041,Delaware County,COUNTY,2004-07-01,141348,,,,CENINT,2012


In [82]:
insightsPop = insightsPop.sort_values(["Geography type","Name","Date"])

## Export data

In [83]:
insightsPop.to_csv(INSIGHTS_POP_TABLE_PATH, index=False)

## Create resource file for exported data

In [84]:
insightsPopResource = morpc.frictionless_create_resource(INSIGHTS_POP_TABLE_FILENAME, 
    resourcePath=INSIGHTS_POP_TABLE_RESOURCE_PATH,
    title="MORPC Insights | Historic and Forecasted Population by Year", 
    name="morpc_insights_pop_temporal", 
    description="This dataset provides the best available historical and forecasted population estimates for the Central Ohio region and the counties and communities therein.  Estimates are compiled from a variety of sources including the following: {}.  Note that different sources provide estimates as of different days throughout the year, so be sure to note the entire date in the Date field.  The Vintage year field refers to the time that the estimate was produced or released.  Earlier and (perhaps) later vintages may be available from the original sources.".format(", ".join(["{1} ({0})".format(value, key) for key, value in SOURCE_MAP.items()])),
    writeResource=True,
    validate=True
)

morpc.frictionless_create_resource | INFO | Format not specified. Using format derived from data file extension: csv
morpc.frictionless_create_resource | INFO | Schema path not specified. Using path derived from data file path: morpc-insights-pop-temporal.schema.yaml
morpc.frictionless_create_resource | INFO | Writing Frictionless Resource file to output_data\morpc-insights-pop-temporal.resource.yaml
morpc.frictionless_create_resource | INFO | Validating resource on disk.
morpc.frictionless_validate_resource | INFO | Validating resource on disk (including data and schema). This may take some time.
morpc.frictionless_validate_resource | INFO | Resource is valid


## Generate static charts

In [91]:
%matplotlib agg

for f in os.scandir(chartDir):
    os.remove(f)

for geoid in insightsPop["GEOIDFQ"].unique():
    temp = insightsPop.loc[insightsPop["GEOIDFQ"] == geoid].copy()
    temp['year'] = temp["Date"].dt.year
    #temp = temp.loc[(temp['year'] % 5 == 0) | (temp['year'].isin(range(MORPC_ESTIMATE_YEAR_RANGE[0], MORPC_ESTIMATE_YEAR_RANGE[1]+1)))]
    forecastVintages = list(temp.loc[temp["Forecast"].notna(), "Vintage year"].unique())
    if(len(forecastVintages) > 1):
        forecastVintage = "multiple"
    elif(len(forecastVintages) == 0):
        forecastVintage = None
    else: 
        forecastVintage = forecastVintages[0]
    maxPop = temp[["Historical","Forecast","Confidence limit (upper)"]].max().max()
    minPop = temp[["Historical","Forecast","Confidence limit (lower)"]].min().min()
    if(temp.shape[0] == 1):
        continue
    if(maxPop < MAX_POP_THRESHOLD):
        continue
    if(minPop < MIN_POP_THRESHOLD):
        print(minPop)
        continue
    geoName = temp.iloc[0]["Name"]
    geoType = temp.iloc[0]["Geography type"]
    geoLabel = "Population - {}{}".format(geoName, GEO_TYPE_LABELS[geoType]) 
    PLOTWIDTH = 8
    fig,ax = plt.subplots(figsize=(PLOTWIDTH,PLOTWIDTH/16*9))
    if(not temp["Confidence limit (upper)"].isnull().all()):
        temp.plot(ax=ax, x="Date", y="Confidence limit (upper)", label="Confidence limit", color="grey")
        temp.plot(ax=ax, x="Date", y="Confidence limit (lower)", legend=False, color="grey")
    temp.plot(ax=ax, x="Date", y="Historical", marker="o", color=morpc.CONST_MORPC_COLORS["darkblue"], label="Historical best estimate")
    if(forecastVintage is not None):
        temp.plot(ax=ax, x="Date", y="Forecast", marker="o", color=morpc.CONST_MORPC_COLORS["darkgreen"])
    ax.set_title(geoLabel, fontsize=14)
    ax.set_xlabel(None)
    ax.set_ylabel(None)
    ax.grid(visible=True, color="lightgrey")
    ax.set_ylim(ymin=0, ymax=maxPop*1.1)
    ax.get_yaxis().set_major_formatter(matplotlib.ticker.FuncFormatter(lambda x, p: format(int(x), ',')))
    ax.figure.savefig(os.path.join(chartDir, "{}.svg".format(geoid)))
    plt.close(ax.figure)

%matplotlib inline

0
0


## Generate Insights catalog content

In [None]:
columnNames=["TileID","TilesetID","GeographyType","GeographyName","Category","Headline","Commentary","ThumbnailURL","Contributor","Vintage","UpdateInterval","ShareURL","DataProductURL","MoreInformationURL"]

In [None]:
catalog = insightsPop[["GEOIDFQ","Name","Geography type"]].copy() \
    .groupby("GEOIDFQ").first() \
    .reset_index() \
    .rename(columns={"Name":"GeographyName","Geography type":"GeographyType"})
catalog["GeographyType"] = catalog["GeographyType"].map({
    "REGION15":"Region",
    "COUNTY":"County",
    "COUNTY-TOWNSHIP-REMAINDER":"Community",
    "PLACE":"Community"
})
catalog["TileID"] = None
catalog["TilesetID"] = None
catalog["Category"] = None
catalog["Headline"] = "TBD"
catalog["Commentary"] = "TBD"
catalog["ThumbnailURL"] = catalog["GEOIDFQ"].apply(lambda geoid:"https://raw.githubusercontent.com/morpc-insights/pop-temporal/refs/heads/main/output_data/charts/{}.svg".format(geoid))
catalog["Contributor"] = "Mid-Ohio Regional Planning Commission"
catalog["Vintage"] = str(datetime.date.today().year)
catalog["UpdateInterval"] = "annually"
catalog["ShareURL"] = None
catalog["DataProductURL"] = catalog["GeographyName"].apply(lambda GeographyName:"https://morpc.maps.arcgis.com/apps/dashboards/d1291225631545438293bdfcfffaef6b#region={}".format(GeographyName.replace(" ","%20")))
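# Default fact sheet for all geographies; overridden for counties and the region below.
catalog["MoreInformationURL"] = "https://raw.githubusercontent.com/morpc-insights/pop-temporal/refs/heads/main/fact_sheets/population.pdf"
# GeographyType was remapped above, so filter on the mapped labels rather than the original codes.
temp = catalog.loc[catalog["GeographyType"].isin(["County","Region"])].copy()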
temp["MoreInformationURL"] = temp["GEOIDFQ"].apply(lambda geoid:"https://raw.githubusercontent.com/morpc-insights/pop-temporal/refs/heads/main/fact_sheets/{}.pdf".format(geoid))
catalog.update(temp)
catalog = catalog.filter(items=columnNames, axis="columns")
catalog.head()
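
The DataProductURL above encodes only spaces. If a geography name ever contains other reserved characters, a general-purpose encoder may be safer. The following is a sketch using the standard library. `dashboard_url` is a hypothetical helper, and whether the dashboard accepts fully percent-encoded names in the URL fragment is an assumption that should be verified.

```python
from urllib.parse import quote

BASE_URL = "https://morpc.maps.arcgis.com/apps/dashboards/d1291225631545438293bdfcfffaef6b#region={}"

def dashboard_url(geographyName):
    """Build a dashboard link with the geography name percent-encoded (hypothetical helper)."""
    # safe="" encodes every reserved character, including "/" and "&"
    return BASE_URL.format(quote(geographyName, safe=""))

print(dashboard_url("Delaware County"))
```

For names that contain only letters and spaces, this produces the same URL as the `replace(" ","%20")` approach.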

In [None]:
catalog.to_excel("catalog.xlsx", index=False)