# Housing Unit Allocation with Person Record File Full Workflow

## Overview
This code works with the National Structures Inventory to run the housing unit allocation (HUA) and the person record file (PREC) workflow.
The HUA process is generalizable to any county in the United States and works with any file that provides structure locations and basic information about the buildings.
The process is designed to work with [IN-CORE](https://incore.ncsa.illinois.edu/), a community resilience modeling environment.
Using IN-CORE requires an account and access to the IN-CORE Dataservice.

Functions are provided to obtain and clean data required for the version 2 Housing Unit Allocation. 

## Required Inputs
If you use the National Structures Inventory (NSI), the program requires no input files.
    
## Output Description
The output of this workflow is a CSV file with the housing unit inventory allocated to a building inventory using the housing unit allocation model.

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE).

IN-CORE is an open-source Python package that can be used to model the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

## Description of Program
- program:    ncoda_07fv1_HUA_PREC_NSI
- task:       Start with NSI building inventory, run housing unit allocation algorithm, and then run person record file algorithm
- See github commits for description of program updates
- Current Version: v1
    - 2024-02-20 - Combined code from 07c, 07d, and 07e into one notebook
    - 2024-05-22 - Removed the drop-down menu, which did not work consistently
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

## Required Citations:
Rosenheim, Nathanael, Roberto Guidotti, Paolo Gardoni & Walter Gillis Peacock. (2021). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. _Sustainable and Resilient Infrastructure_. 6(6), 385-401. https://doi.org/10.1080/23789689.2019.1681821

Rosenheim, Nathanael. (2021). Detailed Household and Housing Unit Characteristics: Data and Replication Code. _DesignSafe-CI_. https://doi.org/10.17603/ds2-jwf6-s535

In [1]:
# Turn on autoreload so that edited pyncoda submodules are reloaded automatically
%load_ext autoreload
%autoreload 2
from pyncoda.ncoda_00g_community_options import *
from IPython.display import display

### How to set up the Community Dictionary
Please review the python code in the file pyncoda/ncoda_00g_community_options.py

In this file you will find a collection of data dictionaries that show various ways to set up the inputs for the Housing Unit Allocation process.

The basic dictionary includes the name of the community, the county FIPS code, your input building inventory file, and key variables in the building inventory file.
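
As a rough illustration, an entry might look like the sketch below. The key names `community_name`, `counties`, `FIPS Code`, `Name`, `building_inventory`, and `use_incore` are the ones read by the code cells in this notebook. The entry ID `Example_OR_NSI` and the variable `example_communities` are hypothetical, and the real file may require additional keys (for example, the building inventory file name and its key variables).

```python
# Hypothetical community entry: only the key names mirror those used in this notebook.
example_communities = {
    "Example_OR_NSI": {
        "community_name": "Example, OR",
        "counties": {
            1: {"FIPS Code": "41007", "Name": "Clatsop County, OR"},
        },
        "building_inventory": {"use_incore": False},
    }
}

# Walk the structure the same way the workflow loop does.
for example_id, entry in example_communities.items():
    for county in entry["counties"].values():
        print(example_id, entry["community_name"], county["Name"], county["FIPS Code"])
```

Keeping the FIPS code as a string preserves leading zeros for states such as Alabama (`01`).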

In [2]:
# select a community from this list
# if your community is not in this list, add it to the file ncoda_00g_community_options.py
list_community_options(communities_dictionary)

['Lumberton, NC: IN-CORE Building inventory for Robeson County, NC',
 'Galveston, TX: IN-CORE Building inventory for Galveston County, TX',
 'Galveston, TX: NSI Building inventory for Galveston County, TX',
 'Galveston, TX: IN-CORE Building inventory for Galveston Island, TX',
 'Mayfield, KY: NSI Building inventory for Graves County, KY',
 'Beaumont, TX: NSI Building inventory for Jefferson County, TX',
 'Beaumont, TX: Safayet Building inventory for Jefferson County, TX',
 'Pentwater, MI: NSI Building inventory for Oceana County, MI',
 'Seaside, OR: NSI Building inventory for Clatsop County, OR',
 'Lane County, OR: NSI Building inventory for Lane County, OR',
 'Benton County, OR: NSI Building inventory for Benton County, OR',
 'Southeast Texas Urban Integrated Field Lab: NSI Building inventory for Southeast Texas',
 'Brazos County, TX: NSI Building inventory for Brazos County, TX']

In [3]:
community_id_by_name = 'Seaside, OR: NSI Building inventory for Clatsop County, OR'

In [4]:
community_id, focalplace, countyname, countyfips = get_community_id_by_name(community_id_by_name)
communities = {community_id : communities_dictionary[community_id]}

Selected community ID: Seaside_OR_NSI
Seaside, OR is in OREGON
Focal place: Seaside
Seaside, OR is in Clatsop County, OR with FIPS code 41007
Use IN-CORE: False


## Setup Python Environment

In [5]:
import pandas as pd
import geopandas as gpd # For reading in shapefiles
import numpy as np
import sys # For displaying package versions
import os # For managing directories and file paths if drive is mounted
import scooby # Reports Python environment

import contextily as cx # For adding basemap tiles to plot
import matplotlib.pyplot as plt # For plotting and making graphs

In [6]:
# Import reusable functions from the pyncoda package
from pyncoda.ncoda_00d_cleanvarsutils import *
from pyncoda.ncoda_04c_poptableresults import *
from pyncoda.ncoda_07i_process_communities import process_community_workflow

In [7]:
# Generate report of Python environment
base_packages = ['pandas','ipyleaflet','seaborn','contextily']
incore_packages = ['pyincore','pyincore_viz']
check_packages = base_packages + incore_packages
print(scooby.Report(additional=check_packages))


--------------------------------------------------------------------------------
  Date: Thu Feb 20 13:30:39 2025 Central Standard Time

                OS : Windows (10 10.0.22631 SP0 Multiprocessor Free)
            CPU(s) : 16
           Machine : AMD64
      Architecture : 64bit
               RAM : 31.7 GiB
       Environment : Jupyter

  Python 3.10.14 | packaged by Anaconda, Inc. | (main, May  6 2024, 19:44:50)
  [MSC v.1916 64 bit (AMD64)]

            pandas : 2.2.2
        ipyleaflet : Module not found
           seaborn : 0.13.2
        contextily : 1.6.0
          pyincore : Module not found
      pyincore_viz : Module not found
             numpy : 1.26.4
             scipy : 1.13.1
           IPython : 8.25.0
        matplotlib : 3.8.4
            scooby : 0.10.0

  Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303
  for Intel(R) 64 architecture applications
--------------------------------------------------------------------------------


In [8]:
# Check working directory - good practice for relative path access
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\GitHub\\intersect-community-data'

## Run Base Housing Unit Inventory for 2020

In [9]:
# Import the directory helper and the base inventory class from pyncoda
from pyncoda.ncoda_00b_directory_design import directory_design
from pyncoda.CommunitySourceData.api_census_gov.acg_01a_BaseInventory import BaseInventory

seed = 9876
version = '2.0.0'
version_text = 'v2-0-0'
basevintage = 2010
outputfolder ="OutputData"
outputfolders = {}
savefiles = True
use_incore = False

In [10]:
mutually_exclusive_varstems_roots_dictionary_lists = {}
new_char_dictionaries = {}
new_char_dictionaries['family'] = {}
new_char_dictionaries['Hispanic'] = {}

In [11]:
from pyncoda.CommunitySourceData.api_census_gov.acg_00b_hui_block2010 import *
from pyncoda.CommunitySourceData.api_census_gov.acg_00c_hispan_block2010 import *

mutually_exclusive_varstems_roots_dictionary_lists[2010] = [tenure_size_H16_varstem_roots,
                                                    vacancy_status_H5_varstem_roots,
                                                    group_quarters_P42_varstem_roots]

new_char_dictionaries['family'][2010] = [family_byrace_P18_varstem_roots]
new_char_dictionaries['Hispanic'][2010] = [tenure_size_H16HAI_varstem_roots,
                                        hispan_byrace_H7_varstem_roots,
                                        tenure_byhispan_H15_varstem_roots
                                        ]

In [12]:
# Variable stem dictionaries for the 2020 Census
from pyncoda.CommunitySourceData.api_census_gov.acg_00b_hui_block2020 import *
from pyncoda.CommunitySourceData.api_census_gov.acg_00c_hispan_block2020 import *

mutually_exclusive_varstems_roots_dictionary_lists[2020] = [tenure_size_H12_2020_varstem_roots,
                                                    vacancy_status_H5_2020_varstem_roots,
                                                    group_quarters_P18_2020_varstem_roots]

new_char_dictionaries['family'][2020] = [family_byrace_P16_2020_varstem_roots]
new_char_dictionaries['Hispanic'][2020] = [tenure_size_H12HAI_2020_varstem_roots,
                                        hispan_byrace_H7_2020_varstem_roots,
                                        tenure_byhispan_H11_2020_varstem_roots
                                        ]


In [43]:
vintage = 2010
for community in communities.keys():

    # Create empty container to store outputs for IN-CORE
    # Will use these to combine multiple counties
    hui_incore_county_df = {}
    title = f"Housing Unit Inventory v{version} data for {communities[community]['community_name']}"
    print("Generating",title)
    output_filename = f'hui_{version_text}_{community}_{basevintage}_rs{seed}'

    county_list = ''

    # set output folder
    outputfolders = directory_design(
                        state_county_name = community,
                        outputfolder = outputfolder)

    # community dictionary
    community_dict = communities[community]
    use_incore = community_dict['building_inventory']['use_incore']


    # Workflow for generating HUI data for IN-CORE
    for county in communities[community]['counties'].keys():
        state_county = communities[community]['counties'][county]['FIPS Code']
        state_county_name  = communities[community]['counties'][county]['Name']
        print(state_county_name,': county FIPS Code',state_county)
        county_list = county_list + state_county_name+': county FIPS Code '+state_county

        print("Generating HUI data for",state_county_name)

        # create output folders for hui data generation
        outputfolders = directory_design(state_county_name = community,
                                            outputfolder = outputfolder)
                          
        # Generate base housing unit inventory
        block_df = {}
        block_df['core'] = BaseInventory.get_apidata(state_county = state_county, 
                                                    geo_level = 'Block',
                                                    vintage = str(vintage),
                                                    mutually_exclusive_varstems_roots_dictionaries =
                                                                        mutually_exclusive_varstems_roots_dictionary_lists[vintage],
                                                    outputfolders = outputfolders,
                                                    outputfile = "CoreHUI")
        
        block_df['family'] = BaseInventory.graft_on_new_char(base_inventory= block_df['core'],
                                                            state_county = state_county,
                                                            new_char = 'family',
                                                            new_char_dictionaries = new_char_dictionaries['family'][vintage],
                                                            basevintage = str(vintage), 
                                                            basegeolevel = 'Block',
                                                            outputfile = "hui",
                                                            outputfolders = outputfolders)

       
        block_df['hispan'] = BaseInventory.graft_on_new_char(base_inventory= block_df['family'],
                                                        state_county = state_county,
                                                        new_char = 'hispan',
                                                        new_char_dictionaries = new_char_dictionaries['Hispanic'][vintage],
                                                        basevintage = str(vintage), 
                                                        basegeolevel = 'Block',
                                                        outputfile = "hui",
                                                        outputfolders = outputfolders)


Generating Housing Unit Inventory v2.0.0 data for Seaside, OR
Clatsop County, OR : county FIPS Code 41007
Generating HUI data for Clatsop County, OR
{'top': 'OutputData/Seaside_OR_NSI', 'logfiles': 'OutputData/Seaside_OR_NSI/00_logfiles', 'CommunitySourceData': 'OutputData/Seaside_OR_NSI/01_CommunitySourceData', 'TidyCommunitySourceData': 'OutputData/Seaside_OR_NSI/02_TidyCommunitySourceData', 'BaseInventory': 'OutputData/Seaside_OR_NSI/03_BaseInventory', 'RandomMerge': 'OutputData/Seaside_OR_NSI/04_RandomMerge', 'Verify': 'OutputData/Seaside_OR_NSI/05_Verify', 'Explore': 'OutputData/Seaside_OR_NSI/06_Explore', 'Uncertainty_propagation': 'OutputData/Seaside_OR_NSI/07_Uncertainty_propagation', 'Validation': 'OutputData/Seaside_OR_NSI/08_Validation'}
['numprec', 'ownershp', 'family', 'byracehispan']

**********************************
Obtain data from Census API TENURE BY HOUSEHOLD SIZE
    Obtaining data for H016 TENURE BY HOUSEHOLD SIZE by Total
       Census API data from: https://api

In [44]:
block_df['core'].head()

Unnamed: 0,huid,Block2010,Block2010str,numprec,ownershp,family,race,hispan,vacancy,gqtype,hu_counter
0,B410079501001001H001,410079501001001,B410079501001001,1.0,1.0,0.0,1.0,0.0,,,1
1,B410079501001001H002,410079501001001,B410079501001001,2.0,1.0,-999.0,1.0,0.0,,,2
2,B410079501001001H003,410079501001001,B410079501001001,2.0,2.0,-999.0,1.0,0.0,,,3
3,B410079501001003H001,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,,1
4,B410079501001003H002,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,,2
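
The `huid` values above appear to combine the block string (`Block2010str`), the letter `H`, and the within-block counter (`hu_counter`) padded to three digits. The sketch below reproduces that pattern on toy data. It is inferred from the displayed output only and is not taken from the pyncoda source, so treat the construction as an assumption.

```python
import pandas as pd

# Toy data: block strings copied from the output above.
toy = pd.DataFrame(
    {"Block2010str": ["B410079501001001"] * 3 + ["B410079501001003"] * 2}
)

# Number the housing units within each block, starting at 1.
toy["hu_counter"] = toy.groupby("Block2010str").cumcount() + 1

# Compose the ID: block string + "H" + zero-padded counter.
toy["huid"] = toy["Block2010str"] + "H" + toy["hu_counter"].astype(str).str.zfill(3)

print(toy["huid"].tolist())
```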


In [47]:
block_df['family'].head()

Unnamed: 0,huid,Block2010,Block2010str,numprec,ownershp,family,race,hispan,vacancy,gqtype,hu_counter,family_flag,family_flagset,totalprob_family,familybyP18,hucount_familybyP18,sumby_familybyP18,prob_familybyP18,familybyP18_counter,hucount_familybyP18updated
0,B410079501001001H001,410079501001001,B410079501001001,1.0,1.0,0.0,1.0,0.0,,,1.0,family set to 0 by core hui,1.0,0.0,,,,,,
1,B410079501001003H001,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,,1.0,family set to 0 by core hui,1.0,0.0,,,,,,
2,B410079501001003H002,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,,2.0,family set to 0 by core hui,1.0,0.0,,,,,,
3,B410079501001004H001,410079501001004,B410079501001004,1.0,2.0,0.0,1.0,0.0,,,1.0,family set to 0 by core hui,1.0,0.0,,,,,,
4,B410079501001004H002,410079501001004,B410079501001004,1.0,2.0,0.0,1.0,0.0,,,2.0,family set to 0 by core hui,1.0,0.0,,,,,,


In [48]:
block_df['hispan'].head()

Unnamed: 0,huid,Block2010,Block2010str,numprec,ownershp,family,race,hispan,vacancy,gqtype,...,hispanbyH015,hucount_hispanbyH015,sumby_hispanbyH015,prob_hispanbyH015,hispanbyH16HAI_counter,hispanbyH007_counter,hispanbyH015_counter,hucount_hispanbyH16HAIupdated,hucount_hispanbyH007updated,hucount_hispanbyH015updated
0,B410079501001001H001,410079501001001,B410079501001001,1.0,1.0,0.0,1.0,0.0,,,...,,,,,,,,,,
1,B410079501001003H001,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,,...,,,,,,,,,,
2,B410079501001003H002,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,,...,,,,,,,,,,
3,B410079501001004H001,410079501001004,B410079501001004,1.0,2.0,0.0,1.0,0.0,,,...,,,,,,,,,,
4,B410079501001004H002,410079501001004,B410079501001004,1.0,2.0,0.0,1.0,0.0,,,...,,,,,,,,,,


In [49]:
from pyncoda.ncoda_04c_poptableresults import *
# add race ethnicity to data frame for better map legends
hui_race_df = PopResultsTable.add_race_ethnicity_to_pop_df(block_df['hispan'])

In [50]:
hui_race_df.head()

Unnamed: 0,huid,Block2010,Block2010str,numprec,ownershp,family,race,hispan,vacancy,gqtype,...,hucount_hispanbyH015,sumby_hispanbyH015,prob_hispanbyH015,hispanbyH16HAI_counter,hispanbyH007_counter,hispanbyH015_counter,hucount_hispanbyH16HAIupdated,hucount_hispanbyH007updated,hucount_hispanbyH015updated,Race Ethnicity
0,B410079501001001H001,410079501001001,B410079501001001,1.0,1.0,0.0,1.0,0.0,,0.0,...,,,,,,,,,,"1 White alone, Not Hispanic"
1,B410079501001003H001,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,0.0,...,,,,,,,,,,"1 White alone, Not Hispanic"
2,B410079501001003H002,410079501001003,B410079501001003,1.0,2.0,0.0,1.0,0.0,,0.0,...,,,,,,,,,,"1 White alone, Not Hispanic"
3,B410079501001004H001,410079501001004,B410079501001004,1.0,2.0,0.0,1.0,0.0,,0.0,...,,,,,,,,,,"1 White alone, Not Hispanic"
4,B410079501001004H002,410079501001004,B410079501001004,1.0,2.0,0.0,1.0,0.0,,0.0,...,,,,,,,,,,"1 White alone, Not Hispanic"


In [51]:
where = communities[community_id]['community_name']
print(where, focalplace, countyname, countyfips)

Seaside, OR Seaside Clatsop County, OR 41007


In [52]:
PopResultsTable.pop_results_table(
                  input_df = hui_race_df, 
                  who = "Total Population by Households", 
                  what = "by Race, Ethnicity",
                  where = countyname,
                  when = "2020",
                  row_index = "Race Ethnicity",
                  col_index = 'Tenure Status')

Race Ethnicity,1 Owner Occupied (%),2 Renter Occupied (%),Total Population by Households (%)
"1 White alone, Not Hispanic","21,017 (93.7%)","11,027 (81.6%)","32,044 (89.2%)"
"2 Black alone, Not Hispanic",55 (0.2%),42 (0.3%),97 (0.3%)
"3 American Indian and Alaska Native alone, Not Hispanic",136 (0.6%),149 (1.1%),285 (0.8%)
"4 Asian alone, Not Hispanic",248 (1.1%),92 (0.7%),340 (0.9%)
"5 Other Race, Not Hispanic",369 (1.6%),424 (3.1%),793 (2.2%)
"6 Any Race, Hispanic",607 (2.7%),"1,772 (13.1%)","2,379 (6.6%)"
Total,"22,432 (100.0%)","13,506 (100.0%)","35,938 (100.0%)"
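
The table above reports person counts with column percentages. The internals of `PopResultsTable.pop_results_table` are not shown in this notebook, so the following is only an independent sketch of how a table of this shape could be assembled with plain pandas. The toy data and the names `households`, `counts`, `percents`, and `table` are illustrative assumptions.

```python
import pandas as pd

# Toy household records: numprec is the number of persons in each household.
households = pd.DataFrame({
    "Race Ethnicity": ["1 White", "1 White", "6 Hispanic", "6 Hispanic"],
    "Tenure Status": ["1 Owner", "2 Renter", "2 Renter", "2 Renter"],
    "numprec": [2, 1, 4, 3],
})

# Sum persons by row and column category, adding marginal totals.
counts = pd.crosstab(
    households["Race Ethnicity"],
    households["Tenure Status"],
    values=households["numprec"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
).fillna(0).astype(int)

# Column percentages: each cell divided by its column total.
percents = counts.div(counts.loc["Total"], axis=1) * 100

# Combine counts and percentages into "count (percent%)" strings.
table = pd.DataFrame(
    {
        col: [f"{int(c):,} ({p:.1f}%)" for c, p in zip(counts[col], percents[col])]
        for col in counts.columns
    },
    index=counts.index,
)
print(table)
```

Summing `numprec` rather than counting rows gives population totals instead of household totals, which matches the "Total Population by Households" label.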


In [53]:
block_df['core']['gqtype'].value_counts()

gqtype
7.0    26
3.0     3
1.0     1
6.0     1
2.0     1
5.0     1
Name: count, dtype: int64

In [54]:
# fill in missing values for group quarters with 0
block_df['core']['gqtype'] = block_df['core']['gqtype'].fillna(0)
# make an integer
block_df['core']['gqtype'] = block_df['core']['gqtype'].astype(int)
block_df['core']['gqtype'].value_counts()

gqtype
0    21546
7       26
3        3
1        1
6        1
2        1
5        1
Name: count, dtype: int64
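
The fill step has to come before the integer cast because a float column that still contains `NaN` cannot be converted to a plain integer dtype. The toy sketch below shows the failure, the fill-then-cast approach used above, and pandas' nullable `Int64` dtype as an alternative that keeps missing values distinct from 0. The series `gq` is made-up data.

```python
import numpy as np
import pandas as pd

gq = pd.Series([7.0, np.nan, 3.0], name="gqtype")

# Casting directly fails while NaN values are present.
try:
    gq.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Option 1 (used in this notebook): treat missing as 0, then cast.
filled = gq.fillna(0).astype(int)

# Option 2: nullable integer dtype keeps missing values as <NA>.
nullable = gq.astype("Int64")

print(cast_failed, filled.tolist(), nullable.isna().sum())
```

Filling with 0 is convenient when 0 already means "not group quarters", while `Int64` is safer when a missing code should remain distinguishable from a real category.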

In [55]:
block_df['family']['gqtype'].value_counts()

gqtype
7.0    26
3.0     3
1.0     1
6.0     1
2.0     1
5.0     1
Name: count, dtype: int64

In [56]:
block_df['hispan']['gqtype'].value_counts()

gqtype
0.0    15041
1.0      701
Name: count, dtype: int64

In [57]:
block_df['core']['All'] = 'All'

PopResultsTable.pop_results_table(
                  input_df = block_df['core'], 
                  who = "Total Population by Households", 
                  what = "by Group Quarters Type",
                  where = countyname,
                  when = "2020",
                  row_index = 'Group Quarters Type',
                  col_index = 'All')

Group Quarters Type,All (%),Total Population by Households (%)
0. NA (non-group quarters),"32,044 (96.7%)","32,044 (96.7%)"
1. Correctional facilities for adults,"1,080 (3.3%)","1,080 (3.3%)"
Total,"33,124 (100.0%)","33,124 (100.0%)"


In [None]:
hui_race_df['All'] = 'All'

PopResultsTable.pop_results_table(
                  input_df = hui_race_df, 
                  who = "Total Population by Households", 
                  what = "by Group Quarters Type",
                  where = countyname,
                  when = "2020",
                  row_index = 'Group Quarters Type',
                  col_index = 'All')

In [36]:
# list housing units in block 410079502001025 (includes HUID B410079502001025H099)
block_df['core'][['Block2020','huid','gqtype']][block_df['core']['Block2020'] == '410079502001025']

Unnamed: 0,Block2020,huid,gqtype
1726,410079502001025,B410079502001025H001,
1727,410079502001025,B410079502001025H002,
1728,410079502001025,B410079502001025H003,
1729,410079502001025,B410079502001025H004,
1730,410079502001025,B410079502001025H005,
...,...,...,...
15229,410079502001025,B410079502001025H098,
23017,410079502001025,B410079502001025H099,7.0
23018,410079502001025,B410079502001025H100,7.0
23019,410079502001025,B410079502001025H101,7.0


In [30]:
# list housing units in block 410079502001025 (includes HUID B410079502001025H099)
block_df['family'][['Block2020','huid','gqtype']][block_df['family']['Block2020'] == '410079502001025'].sort_values(by = 'huid')

Unnamed: 0,Block2020,huid,gqtype
11194,410079502001025,B410079502001025H001,
11193,410079502001025,B410079502001025H002,
17440,410079502001025,B410079502001025H003,
17439,410079502001025,B410079502001025H004,
577,410079502001025,B410079502001025H005,
...,...,...,...
11202,410079502001025,B410079502001025H098,
10911,410079502001025,B410079502001025H099,7.0
10912,410079502001025,B410079502001025H100,7.0
10913,410079502001025,B410079502001025H101,7.0


In [31]:
# list housing units in block 410079502001025 (includes HUID B410079502001025H099)
block_df['hispan'][['Block2020','huid','gqtype']][block_df['hispan']['Block2020'] == '410079502001025'].sort_values(by = 'huid')

Unnamed: 0,Block2020,huid,gqtype
10677,410079502001025,B410079502001025H001,0.0
10676,410079502001025,B410079502001025H002,0.0
15602,410079502001025,B410079502001025H003,0.0
15601,410079502001025,B410079502001025H004,0.0
577,410079502001025,B410079502001025H005,0.0
...,...,...,...
10685,410079502001025,B410079502001025H098,1.0
10416,410079502001025,B410079502001025H099,
10417,410079502001025,B410079502001025H100,
10418,410079502001025,B410079502001025H101,
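
The three listings above show `gqtype` differing between the core and Hispanic data frames for the same `huid`. Rather than comparing listings by eye, one option is to merge the frames on `huid` and flag the rows that disagree. The sketch below uses toy frames whose values mimic the pattern in the output above. The shortened IDs and the names `core_toy`, `hispan_toy`, and `mismatch` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frames mimicking the pattern above (shortened, hypothetical IDs).
core_toy = pd.DataFrame(
    {"huid": ["H001", "H098", "H099"], "gqtype": [np.nan, np.nan, 7.0]}
)
hispan_toy = pd.DataFrame(
    {"huid": ["H001", "H098", "H099"], "gqtype": [0.0, 1.0, np.nan]}
)

# Align the two versions of gqtype side by side.
merged = core_toy.merge(hispan_toy, on="huid", suffixes=("_core", "_hispan"))

# Treat missing as 0 so that NaN versus 0 is not flagged as a difference.
left = merged["gqtype_core"].fillna(0)
right = merged["gqtype_hispan"].fillna(0)
mismatch = merged[left != right]

print(mismatch)
```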


In [23]:
# describe the variable gqtype
hui_race_df['gqtype'].value_counts()

gqtype
0.0    16502
1.0     1031
Name: count, dtype: int64

In [24]:

# describe the variable vacancy
hui_race_df['vacancy'].value_counts()

vacancy
5.0    4128
7.0     499
1.0     451
3.0     222
2.0     105
4.0      79
Name: count, dtype: int64

In [25]:
hui_race_df = PopResultsTable.add_vacancy_to_pop_df(hui_race_df)

PopResultsTable.pop_results_table(
                  input_df = hui_race_df, 
                  who = "Total Households", 
                  what = "by Vacancy Type",
                  where = countyname,
                  when = "2020",
                  row_index = 'Vacancy Type',
                  col_index = 'All')

Vacancy Type,All (%),Total Households (%)
1 For Rent,451 (8.2%),451 (8.2%)
"2 Rented, not occupied",105 (1.9%),105 (1.9%)
3 For sale only,222 (4.0%),222 (4.0%)
"4 Sold, not occupied",79 (1.4%),79 (1.4%)
"5 For seasonal, recreational, or occasional use","4,128 (75.3%)","4,128 (75.3%)"
7 Other vacant,499 (9.1%),499 (9.1%)
Total,"5,484 (100.0%)","5,484 (100.0%)"


In [26]:
print("Total Population by Race and Ethnicity:")
print(f"https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=DECENNIALDHC2020.P5")

print("Total Vacancy Status:")
print(f"https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=DECENNIALDHC2020.H5")

Total Population by Race and Ethnicity:
https://data.census.gov/cedsci/table?g=050XX00US41007&tid=DECENNIALDHC2020.P5
Total Vacancy Status:
https://data.census.gov/cedsci/table?g=050XX00US41007&tid=DECENNIALDHC2020.H5
