# Person Record File Workflow

## Overview
Python functions that use the Census API to obtain and clean the data required for the Person Record File.

The workflow produces a Person Record (PREC) file that can be linked to the Housing Unit Inventory. The file includes person-level records with sex, age, race, and ethnicity.

The records are based on block-level data from the 2010 Census.

The output of this workflow is a CSV file with the person record file.

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE) for the housing unit allocation model.

IN-CORE is an open-source Python package that can be used to model the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

Users can modify the code to select one county or multiple counties.

## Description of Program
- program:    ncoda_07dv1_run_PREC_workflow
- task:       Obtain and clean data for Person Record File.
- See the GitHub commits for a description of program updates
- Current Version:    2022-12-02
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

## Step 1: Select the County or Counties
The person record functions run for individual counties in the United States.
To select a county, enter its county FIPS code in the data dictionary below.
To select multiple counties, enter each county FIPS code as a separate numbered entry in the data dictionary.
- For each county, include the county name. The name is used in the codebook.
- For each community (one county or a group of counties), include the community name. The name is used in the codebook.


### Example of data dictionary for one community with one county
```
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}}}
```

### Example of data dictionary for one community with multiple counties
```
communities = {'Joplin_MO' : {
                    'community_name' : 'Joplin, MO',
                    'counties' : { 
                        1 : {'FIPS Code' : '29097', 'Name' : 'Jasper County, MO'},
                        2 : {'FIPS Code' : '29145', 'Name' : 'Newton County, MO'}}}}
```

### How to locate your county FIPS code:
- County FIPS codes can be found online at [USDA County FIPS codes](https://www.nrcs.usda.gov/wps/portal/nrcs/detail/national/home/?cid=nrcs143_013697)
- Google search "FIPS code for county [county name]"
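
A county FIPS code is a five-digit string: the first two digits identify the state and the last three identify the county within that state. The sketch below illustrates that structure and why the code should be stored as a string (a code such as '01097' loses its leading zero when stored as a number). The helper name `split_county_fips` is hypothetical and is not part of the package.

```python
def split_county_fips(fips_code):
    """Split a 5-digit county FIPS code into (state part, county part)."""
    if not (isinstance(fips_code, str) and len(fips_code) == 5 and fips_code.isdigit()):
        raise ValueError(f"Expected a 5-digit string, got {fips_code!r}")
    return fips_code[:2], fips_code[2:]

# Robeson County, NC (FIPS code taken from the example dictionary above)
state_fips, county_fips = split_county_fips('37155')
print(state_fips, county_fips)

# A FIPS code that was read as an integer can be padded back to five digits
restored = str(1097).zfill(5)
print(restored)
```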

## Step 2: Edit the data dictionary
Modify the data dictionary below with your county FIPS code, county name, and community name.

In [None]:
# Example of data dictionary for one community with one county
communities = {'Beaumont_TX' : {
                    'community_name' : 'Beaumont, TX',
                    'counties' : { 
                        1 : {'FIPS Code' : '48245', 'Name' : 'Jefferson County, TX'}}}}
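
Before running the full workflow, it may help to check that an edited dictionary still has the expected shape. The following is a minimal sketch, assuming the only required keys are the ones shown in the examples ('community_name', 'counties', 'FIPS Code', 'Name'). The function `find_dictionary_problems` is a hypothetical helper, not a function from the package.

```python
REQUIRED_COUNTY_KEYS = {'FIPS Code', 'Name'}  # keys used in the examples above

def find_dictionary_problems(communities):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for key, community in communities.items():
        if 'community_name' not in community:
            problems.append(f"{key}: missing 'community_name'")
        counties = community.get('counties', {})
        if not counties:
            problems.append(f"{key}: no counties listed")
        for number, county in counties.items():
            missing = REQUIRED_COUNTY_KEYS - set(county)
            if missing:
                problems.append(f"{key}, county {number}: missing {sorted(missing)}")
                continue
            fips = county['FIPS Code']
            if not (isinstance(fips, str) and len(fips) == 5 and fips.isdigit()):
                problems.append(f"{key}, county {number}: FIPS Code {fips!r} is not a 5-digit string")
    return problems

example_communities = {'Beaumont_TX': {
    'community_name': 'Beaumont, TX',
    'counties': {
        1: {'FIPS Code': '48245', 'Name': 'Jefferson County, TX'}}}}

print(find_dictionary_problems(example_communities))
```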

## Step 3: Run all of the code blocks in the notebook
To run all of the code blocks in the notebook, find the "Run All" option:
1. If there is a "Run All" button at the top of the notebook, click it.
2. Otherwise, open the "Run" menu and select "Run All Cells".

After all of the code runs (approximately 2 minutes per 50,000 people), the output files (CSV and codebook) will be generated and saved in the folder "OutputData" in the directory where the notebook is saved.
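
The run-time figure above scales linearly with population, so a rough estimate can be computed ahead of time. This is a sketch of that arithmetic only; the population value used below is a made-up illustration, and actual run time will also depend on the machine and on the Census API response time.

```python
def estimate_runtime_minutes(population, minutes_per_50k=2.0):
    """Rough run time, using the 'about 2 minutes per 50,000 people' figure."""
    return population / 50_000 * minutes_per_50k

# Hypothetical community of 250,000 people
print(estimate_runtime_minutes(250_000))  # 10.0
```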

The notebook produces a log file which can be reviewed to see the full workflow process. The workflow depends on internet access to the Census API, which is a publicly available service.
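
After a run, one way to locate the generated files is to list the contents of the output folder grouped by file extension. The sketch below assumes the folder is named "OutputData" as described above. The exact file names, and whether the log file is written to the same folder, are not specified here, so nothing about them is assumed. The helper `group_by_extension` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path

def group_by_extension(filenames):
    """Group file names by lowercase extension, e.g. {'.csv': [...], '.log': [...]}."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        groups[Path(name).suffix.lower()].append(name)
    return dict(groups)

output_folder = Path('OutputData')
if output_folder.is_dir():
    names = [p.name for p in output_folder.iterdir() if p.is_file()]
    for extension, files in group_by_extension(names).items():
        print(extension or '(no extension)', files)
else:
    print('OutputData not found; run the workflow first.')
```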


### Set up the notebook environment to access the cloned GitHub package
This notebook uses functions that are in development. The current version of the package is available at:

https://github.com/npr99/intersect-community-data

Nathanael Rosenheim. (2022). npr99/intersect-community-data. Zenodo. https://doi.org/10.5281/zenodo.6476122

Permanent copies of the package and example datasets are available in the DesignSafe-CI repository:

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

In [None]:
import numpy as np
import pandas as pd
import os # For saving output to path
import sys
import scooby # Reports Python environment

In [None]:
# Generate report of Python environment
print(scooby.Report(additional=['pandas']))

In [None]:
# To replicate this notebook, clone the GitHub package to a folder that is a sibling of this notebook.
# To access a sibling package, append the parent directory ('..') to the system path list,
# or append the path of the directory that contains the cloned repository.
# This step is not required when the package is in a folder below the notebook file.
github_code_path  = ""
sys.path.append(github_code_path)
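
The path step above can be made a little more defensive by resolving the directory first and appending it only when it exists and is not already on the path. This is a sketch of one way to do that with the standard library; `add_to_sys_path` is a hypothetical helper, and '..' is used only because the comments above describe a sibling-folder layout.

```python
import sys
from pathlib import Path

def add_to_sys_path(directory):
    """Append directory to sys.path if it exists and is not already listed."""
    resolved = Path(directory).resolve()
    if not resolved.is_dir():
        return False
    if str(resolved) not in sys.path:
        sys.path.append(str(resolved))
    return True

# '..' is the parent directory, which contains the siblings of the notebook folder
added = add_to_sys_path('..')
print(added)
```

If the function returns `False`, the path does not point to an existing directory and the later `import` of the package would likely fail.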

In [None]:
os.getcwd()

In [None]:
# Turn on autoreload so that edits to the package submodules are picked up
# without restarting the kernel
%load_ext autoreload
%autoreload 2
# Import the person record generator from the cloned package
from pyncoda.ncoda_07e_generate_prec import generate_prec_functions

In [None]:
version = '3.0.0'
version_text = 'v3-0-0'

# Output folder: a short folder name keeps the file paths from getting too long.
# Files from this program are saved with the program name,
# which helps to follow the overall workflow.
outputfolder = "OutputData"
# Make the directory to save output (no error if it already exists)
os.makedirs(outputfolder, exist_ok=True)

# Set random seed for reproducibility
seed = 1000
# Base vintage: the Census year that the person records are based on
basevintage = 2010

generate_prec_df = generate_prec_functions(
                    communities =   communities,
                    seed =          seed,
                    version =       version,
                    version_text=   version_text,
                    basevintage=    basevintage,
                    outputfolder=   outputfolder
                    )

prec_df = generate_prec_df.generate_prec_v300()
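
The fixed seed is what makes repeated runs comparable. How the package uses the seed internally is not shown in this notebook, so the sketch below only illustrates the general principle with NumPy: two generators created from the same seed produce the same sequence of draws.

```python
import numpy as np

seed = 1000  # same value as in the cell above

# Two independent generators started from the same seed
first_run = np.random.default_rng(seed).integers(0, 100, size=5)
second_run = np.random.default_rng(seed).integers(0, 100, size=5)

# The draws are identical, so any result built from them is reproducible
print(np.array_equal(first_run, second_run))  # True
```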


In [None]:
prec_df.head()

In [None]:
# Import only the class used below instead of every name in the module
from pyncoda.ncoda_04c_poptableresults import PopResultsTable

In [None]:
# Print each community name; community_name keeps the last value for use in the table below
for community in communities.keys():
    community_name = communities[community]['community_name']
    print(community_name)

In [None]:
PopResultsTable.pop_results_table(
                  input_df = prec_df, 
                  who = "Total Population by Persons", 
                  what = "by Race, Ethnicity",
                  where = community_name,
                  when = "2010",
                  row_index = "Race Ethnicity",
                  col_index = 'Hispanic')
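
For readers who want to cross-check the table independently, a similar count of persons by two categories can be produced with plain pandas. This is not the implementation of `PopResultsTable`; it is a sketch using `pd.crosstab` on a small made-up data frame. The column names mirror the `row_index` and `col_index` arguments above, and the category labels are invented for illustration.

```python
import pandas as pd

# Toy person records; each row is one person (values are illustrative only)
toy_df = pd.DataFrame({
    'Race Ethnicity': ['White', 'Black', 'White', 'Black', 'White'],
    'Hispanic': ['Not Hispanic', 'Not Hispanic', 'Hispanic',
                 'Not Hispanic', 'Not Hispanic'],
})

# Count persons in each combination and add row and column totals
counts = pd.crosstab(toy_df['Race Ethnicity'], toy_df['Hispanic'],
                     margins=True, margins_name='Total')
print(counts)
```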

In [None]:
# List of all communities available in IN-CORE
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}},                   
                'Shelby_TN' : {
                    'community_name' : 'Memphis, TN',
                    'counties' : { 
                        1 : {'FIPS Code' : '47157', 'Name' : 'Shelby County, TN'}}},
                'Joplin_MO' : {
                    'community_name' : 'Joplin, MO',
                    'counties' : { 
                        1 : {'FIPS Code' : '29097', 'Name' : 'Jasper County, MO'},
                        2 : {'FIPS Code' : '29145', 'Name' : 'Newton County, MO'}}},
                'Seaside_OR' : {
                    'community_name' : 'Seaside, OR',
                    'counties' : { 
                        1 : {'FIPS Code' : '41007', 'Name' : 'Clatsop County, OR'}}},                   
                'Galveston_TX' : {
                    'community_name' : 'Galveston, TX',
                    'counties' : { 
                        1 : {'FIPS Code' : '48167', 'Name' : 'Galveston County, TX'}}},
                'Mobile_AL' : {
                    'community_name' : 'Mobile, AL',
                    'counties' : { 
                        1 : {'FIPS Code' : '01097', 'Name' : 'Mobile County, AL'}}}                    
                }
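
A nested dictionary like the one above can be hard to scan when it holds several communities. The sketch below flattens it into one row per county with pandas, which makes it easy to review the FIPS codes before a run. `communities_to_table` and its column names are hypothetical, and only one community from the list is repeated here to keep the example short.

```python
import pandas as pd

def communities_to_table(communities):
    """Return a DataFrame with one row per county in the communities dictionary."""
    rows = []
    for key, community in communities.items():
        for number, county in community['counties'].items():
            rows.append({
                'community_key': key,
                'community_name': community['community_name'],
                'county_number': number,
                'county_fips': county['FIPS Code'],
                'county_name': county['Name'],
            })
    return pd.DataFrame(rows)

example_communities = {'Joplin_MO': {
    'community_name': 'Joplin, MO',
    'counties': {
        1: {'FIPS Code': '29097', 'Name': 'Jasper County, MO'},
        2: {'FIPS Code': '29145', 'Name': 'Newton County, MO'}}}}

county_table = communities_to_table(example_communities)
print(county_table)
```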