# Housing Unit Allocation with Person Record File Full Workflow

## Overview
This notebook uses the National Structures Inventory (NSI) to run the housing unit allocation (HUA) and person record file (PREC) workflow.
The HUA process is generalizable to any county in the United States. The HUA process will work for any file that has locations of structures and some basic information about the buildings.
The process is designed to work with [IN-CORE](https://incore.ncsa.illinois.edu/), a community resilience modeling environment.
Using IN-CORE requires an account and access to the IN-CORE Dataservice.

Functions are provided to obtain and clean data required for the version 2 Housing Unit Allocation. 

## Required Inputs
No input files are required when using the National Structures Inventory (NSI).
    
## Output Description
The output of this workflow is a CSV file with the housing unit inventory allocated to a building inventory using the housing unit allocation model.

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE).

IN-CORE is an open-source Python package that can be used to model the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

## Description of Program
- program:    ncoda_07fv1_HUA_PREC_NSI
- task:       Start with NSI building inventory, run housing unit allocation algorithm, and then run person record file algorithm
- See GitHub commits for description of program updates
- Current Version: v1
- 2024-02-20 - Combine code from 07c, 07d, and 07e into one notebook
- 2024-05-22 - removed the drop down menu, did not work consistently
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

## Required Citations:
Rosenheim, Nathanael, Roberto Guidotti, Paolo Gardoni & Walter Gillis Peacock. (2021). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. _Sustainable and Resilient Infrastructure_. 6(6), 385-401. https://doi.org/10.1080/23789689.2019.1681821

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” _DesignSafe-CI_. 
https://doi.org/10.17603/ds2-jwf6-s535.

In [None]:
# To reload submodules need to use this magic command to set autoreload on
%load_ext autoreload
%autoreload 2
from pyncoda.ncoda_00g_community_options import *
from IPython.display import display

### How to set up the Community Dictionary
Please review the Python code in the file pyncoda/ncoda_00g_community_options.py

This file contains a collection of data dictionaries that show various ways to set up the inputs for the Housing Unit Allocation process.

The basic dictionary includes the name of the community, the county FIPS code, your input building inventory file, and key variables in the building inventory file.
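As a rough illustration of the structure described above, the sketch below builds one hypothetical entry and reads it the same way later cells do. Only the keys `community_name`, `building_inventory`, `id`, and `bldg_uniqueid` appear in this notebook; the community id `Brazos_TX`, the keys `county_fips` and `filename`, and all values are assumptions for illustration. The entries in ncoda_00g_community_options.py may be laid out differently.

```python
# Hypothetical entry for illustration; check ncoda_00g_community_options.py
# for the actual layout used by pyncoda.
example_communities_dictionary = {
    "Brazos_TX": {                               # assumed community id
        "community_name": "Brazos County, TX",   # key read later in this notebook
        "county_fips": "48041",                  # assumed key name
        "building_inventory": {
            "id": "NSI",                         # key read later in this notebook
            "filename": "my_buildings.csv",      # assumed key name and value
            "bldg_uniqueid": "fd_id",            # key read later; value assumed
        },
    }
}

# Select one community, mirroring how later cells build `communities`
community_id = "Brazos_TX"
communities = {community_id: example_communities_dictionary[community_id]}

# Look up the column that uniquely identifies each building
bldg_uniqueid = communities[community_id]["building_inventory"]["bldg_uniqueid"]
print(bldg_uniqueid)
```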

In [None]:
# select a community from this list
# if your community is not in this list, add it to the file ncoda_00g_community_options.py
list_community_options(communities_dictionary)

In [None]:
community_id_by_name = 'Brazos County, TX: NSI Building inventory for Brazos County, TX'

In [None]:
community_id, focalplace, countyname, countyfips = get_community_id_by_name(community_id_by_name)
communities = {community_id : communities_dictionary[community_id]}

## Setup Python Environment

In [None]:
import pandas as pd
import geopandas as gpd # For reading in shapefiles
import numpy as np
import sys # For displaying package versions
import os # For managing directories and file paths if drive is mounted
import scooby # Reports Python environment

import contextily as cx # For adding basemap tiles to plot
import matplotlib.pyplot as plt # For plotting and making graphs

In [None]:
# Import reusable functions and classes from the pyncoda package
from pyncoda.ncoda_00d_cleanvarsutils import *
from pyncoda.ncoda_04c_poptableresults import *
from pyncoda.ncoda_07i_process_communities import process_community_workflow

In [None]:
# Generate report of Python environment
base_packages = ['pandas','ipyleaflet','seaborn','contextily']
incore_packages = ['pyincore','pyincore_viz']
check_packages = base_packages + incore_packages
print(scooby.Report(additional=check_packages))

In [None]:
# Check working directory - good practice for relative path access
os.getcwd()

## Run Housing Unit Allocation
The next cell produces these outputs:
1. Housing Unit Inventory
2. Address Point Inventory
3. Housing Unit Allocation

In [None]:
workflow = process_community_workflow(communities)
hua_hui_gdf = workflow.process_communities()

## Run Person Record File

In [None]:
version = '3.0.0'
version_text = 'v3-0-0'

# Import the person record file generator from the pyncoda package
from pyncoda.ncoda_07e_generate_prec import generate_prec_functions

# Save output to a folder with a short name to avoid long file paths.
# Files from this program are saved with the program name,
# which helps to follow the overall workflow.
outputfolder = "OutputData"
# Make directory to save output (no error if it already exists)
os.makedirs(outputfolder, exist_ok=True)

# Set random seed for reproducibility
seed = 1000
basevintage = 2010

generate_prec_df = generate_prec_functions(
                    communities =   communities,
                    seed =          seed,
                    version =       version,
                    version_text=   version_text,
                    basevintage=    basevintage,
                    outputfolder=   outputfolder
                    )

prec_df = generate_prec_df.generate_prec_v300()

## Explore and Validate Housing Unit Allocation


### Look at population characteristics and compare to US Census

In [None]:
where = communities[community_id]['community_name']
print(where, focalplace, countyname, countyfips)

In [None]:
# add race ethnicity and household income to data frame for better map legends
hua_hui_race_gdf = PopResultsTable.add_race_ethnicity_to_pop_df(hua_hui_gdf)
# pass the previous result along so the race ethnicity columns are not overwritten
hua_hui_race_gdf = PopResultsTable.add_hhinc_df(hua_hui_race_gdf)

# get the name of the unique building id column from the community dictionary
bldg_uniqueid = communities[community_id]['building_inventory']['bldg_uniqueid']
# add category for missing building id
buildingdata_conditions = {'cat_var' : {'variable_label' : 'Building Data Availability',
                         'notes' : 'Does Housing Unit have building data?'},
              'condition_list' : {
                1 : {'condition': f"(df['{bldg_uniqueid}'] == 'missing building id')", 'value_label': "0 Missing Building Data"},
                2 : {'condition': f"(df['{bldg_uniqueid}'] != 'missing building id')", 'value_label': "1 Building Data Available"}}
            }
hua_hui_gdf = add_label_cat_conditions_df(hua_hui_gdf, conditions = buildingdata_conditions)

In [None]:
hua_hui_gdf.head()

In [None]:
# set dataframe for focal place
focalplace_hua_hui_gdf =  hua_hui_gdf.loc[hua_hui_gdf['placeNAME10'] == focalplace].copy(deep=True)

In [None]:
PopResultsTable.pop_results_table(
                  input_df = hua_hui_gdf, 
                  who = "Total Population by Households", 
                  what = "by Race, Ethnicity",
                  where = countyname,
                  when = "2010",
                  row_index = "Race Ethnicity",
                  col_index = 'Tenure Status')

In [None]:
PopResultsTable.pop_results_table(
                  input_df = hua_hui_gdf, 
                  who = "Total Population by Households", 
                  what = "by Race, Ethnicity",
                  where = countyname,
                  when = "2010",
                  row_index = "Race Ethnicity",
                  col_index = 'Hispanic')

In [None]:
PopResultsTable.pop_results_table(
                  input_df = prec_df, 
                  who = "Total Population by Persons", 
                  what = "by Race, Ethnicity",
                  where = countyname,
                  when = "2010",
                  row_index = "Race",
                  col_index = 'Hispanic')

In [None]:
PopResultsTable.pop_results_table(hua_hui_gdf, 
                  who = "Total Population by Households", 
                  what = "by Race, Ethnicity",
                  where = where,
                  when = "2010",
                  row_index = "Race Ethnicity",
                  col_index = 'Building Data Availability_str',
                  row_percent = '0 Missing Building Data')

In [None]:
try:
    print("Attempting to generate the population results table...")
    table1 = PopResultsTable.pop_results_table(
        focalplace_hua_hui_gdf,
        who="Total Population by Households",
        what="by Tenure",
        where=focalplace,
        when="2010",
        row_index="Tenure Status",
        col_index="Building Data Availability_str",
        row_percent="0 Missing Building Data"
    )
    print("Population results table generated successfully.")
except Exception as e:
    table1 = "no table generated"
    print(f"No Missing Building Data: {e}")

table1

In [None]:
PopResultsTable.pop_results_table(hua_hui_gdf, 
                  who = "Median Household Income", 
                  what = "by Race, Ethnicity",
                  where = where,
                  when = "2010",
                  row_index = "Race Ethnicity",
                  col_index = 'Tenure Status')

In [None]:
PopResultsTable.pop_results_table(focalplace_hua_hui_gdf, 
                  who = "Median Household Income", 
                  what = "by Race, Ethnicity",
                  where = focalplace,
                  when = "2010",
                  row_index = "Race Ethnicity",
                  col_index = 'Tenure Status')

#### Validate the Housing Unit Allocation has worked
The population totals for the community should closely match the counts from the 2010 Decennial Census.
To confirm, compare the tables above with the corresponding tables on data.census.gov. The next cell prints links to those tables for this county.

In [None]:
print("Total Population by Race and Ethnicity:")
print(f"https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=DECENNIALSF12010.P5")

print("Median Income by Race and Ethnicity:")
print(f"All Households: https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=ACSDT5Y2012.B19013")
print(f"Black Households: https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=ACSDT5Y2012.B19013B")
print(f"White, not Hispanic Households: https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=ACSDT5Y2012.B19013H")
print(f"Hispanic Households: https://data.census.gov/cedsci/table?g=050XX00US{countyfips}&tid=ACSDT5Y2012.B19013I")

Differences between the housing unit allocation and the Census count may be due to differences between political boundaries and the building inventory. See Rosenheim et al. (2021) for more details.
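One way to quantify "closely" is a signed percent difference between an allocated total and the matching Census total. The sketch below is a minimal example and not part of pyncoda; the function name `percent_difference` and both counts are placeholders chosen for illustration. Substitute a total from the tables above and the value from data.census.gov.

```python
def percent_difference(allocated: float, census: float) -> float:
    """Return how far the allocated count is from the census count, in percent."""
    if census == 0:
        raise ValueError("census count must be nonzero")
    return 100.0 * (allocated - census) / census


# Placeholder values for illustration only; these are not real results
allocated_total = 9_900
census_total = 10_000

diff = percent_difference(allocated_total, census_total)
print(f"Allocated total differs from Census by {diff:.1f}%")
```

A negative value means the allocation placed fewer people than the Census counted, which is the direction expected when some housing units have no matching building.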

The housing unit allocation, together with the building results, becomes the input for social science models such as the population dislocation model.

## Explore Maps

In [None]:
#mapname = 'hhincdotmap'
mapname = 'hhracedotmap'
# Map column
#map_var = 'Household Income Group'
map_var = 'Race Ethnicity'
place = focalplace

# Boolean masks apply the same filters as eval() on strings,
# and still work when a place name contains a quote character
has_race = hua_hui_race_gdf['race'] >= 1
in_place = hua_hui_race_gdf['placeNAME10'] == place

county_hua_gdf = hua_hui_race_gdf.loc[has_race].copy(deep=True)
county_hua_gdf = county_hua_gdf.to_crs(epsg=4326)
focal_place_hua_gdf = hua_hui_race_gdf.loc[has_race & in_place].copy(deep=True)
focal_place_hua_gdf = focal_place_hua_gdf.to_crs(epsg=4326)

In [None]:
from pyncoda.ncoda_04b_foliummaps import *
# plot png file
from IPython.display import Image

bldg_inv_id = communities[community_id]['building_inventory']['id']
outputfolder = 'OutputData'
community = communities[community_id]['community_name']

county_map = plot_dotmap_map(gdf=county_hua_gdf,
                        mapname=mapname,
                        map_var=map_var,
                        bldg_inv_id=bldg_inv_id,
                        community=community_id,
                        place = community,
                        outputfolder=outputfolder,
                        condition_id = "1",
                        basemap_source = cx.providers.CartoDB.Positron)

In [None]:
# get xlim and ylim for focal place
xlim = [focal_place_hua_gdf.total_bounds[0], focal_place_hua_gdf.total_bounds[2]]
ylim = [focal_place_hua_gdf.total_bounds[1], focal_place_hua_gdf.total_bounds[3]]
print(xlim, ylim)

In [None]:

focal_place_map = plot_dotmap_map(gdf=county_hua_gdf,
                        mapname=mapname,
                        map_var=map_var,
                        bldg_inv_id=bldg_inv_id,
                        community=community_id,
                        place = focalplace,
                        outputfolder=outputfolder,
                        condition_id = "2",
                        basemap_source = cx.providers.CartoDB.Positron,
                        xlim = xlim,
                        ylim = ylim,
                        focal_gdf = focal_place_hua_gdf)

In [None]:
Image(focal_place_map+'.png', width= 800, height=800)

### Explore by Income

In [None]:
mapname = 'hhincdotmap'

# Map column
map_var = 'Household Income Group'

# Ensure 'hhinc' is numeric
hua_hui_race_gdf['hhinc'] = pd.to_numeric(hua_hui_race_gdf['hhinc'], errors='coerce')

# Keep households with a positive income (rows coerced to NaN are dropped)
has_income = hua_hui_race_gdf['hhinc'] >= 1

county_hua_gdf = hua_hui_race_gdf.loc[has_income].copy(deep=True)
county_hua_gdf = county_hua_gdf.to_crs(epsg=4326)

In [None]:
# get xlim and ylim for focal place
xlim = [focal_place_hua_gdf.total_bounds[0], focal_place_hua_gdf.total_bounds[2]]
ylim = [focal_place_hua_gdf.total_bounds[1], focal_place_hua_gdf.total_bounds[3]]

focal_place_map = plot_dotmap_map(gdf=county_hua_gdf,
                        mapname=mapname,
                        map_var=map_var,
                        bldg_inv_id=bldg_inv_id,
                        community=community_id,
                        place = focalplace,
                        outputfolder=outputfolder,
                        condition_id = "2",
                        basemap_source = cx.providers.CartoDB.Positron,
                        xlim = xlim,
                        ylim = ylim,
                        focal_gdf = focal_place_hua_gdf)

In [None]:
Image(focal_place_map+'.png', width= 800, height=800)

## View Codebook
The Housing Unit Allocation methodology generates a codebook for the housing unit inventory.

Look in the OutputData folder to find the codebook.
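To locate the codebook without browsing the folder by hand, a short search like the sketch below may help. It assumes the codebook file name contains the word "codebook", which this notebook does not confirm; the helper `find_codebook_files` is hypothetical and not part of pyncoda. Change the keyword if the search returns nothing.

```python
from pathlib import Path


def find_codebook_files(folder: str, keyword: str = "codebook") -> list[str]:
    """Return sorted paths (relative to folder) of files whose name contains keyword."""
    root = Path(folder)
    if not root.is_dir():
        return []  # folder has not been created yet
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")  # search subfolders as well
        if p.is_file() and keyword.lower() in p.name.lower()
    )


# "OutputData" is the output folder name set earlier in this notebook
print(find_codebook_files("OutputData"))
```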