# Housing Unit Allocation v2 Workflow

## Overview
Functions to obtain and clean data required for the version 2 Housing Unit Allocation. 
The workflow improves on the [housing unit allocation algorithm found in pyincore](https://github.com/IN-CORE/pyincore/blob/develop/pyincore/analyses/housingunitallocation/housingunitallocation.py).

### Issues found in the original HUA:
1. Overpredicting the number of housing units in a building
2. Not assigning one housing unit per structure before assigning additional housing units
3. Matching extra housing units based on tenure: if a block has two renters, the extra renter is matched with the other renter.
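
The second issue can be illustrated with a small toy example. The sketch below is not the pyincore or pyncoda implementation; the function name `allocate_units` and the capacity dictionary are hypothetical and only show the idea of giving every structure one housing unit before any structure receives a second one.

```python
def allocate_units(capacities, n_units):
    """Toy allocation: give every structure one unit before any gets a second.

    capacities: dict mapping a structure id to the number of units it can hold.
    n_units: number of housing units to place in the block.
    Returns a dict mapping structure id to the number of units assigned.
    """
    assigned = {sid: 0 for sid in capacities}
    remaining = n_units
    round_no = 1
    # Round k only considers structures that can hold at least k units.
    while remaining > 0:
        eligible = [sid for sid, cap in capacities.items() if cap >= round_no]
        if not eligible:
            break  # more units than capacity; the rest stay unassigned
        for sid in eligible:
            if remaining == 0:
                break
            assigned[sid] += 1
            remaining -= 1
        round_no += 1
    return assigned


# Hypothetical block with three structures and four housing units.
example = allocate_units({"A": 1, "B": 3, "C": 1}, n_units=4)
print(example)
```

In this toy block, structures A and C each receive one unit, and the multi-unit structure B only receives its second unit after every structure has one.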

### Resources and references:
For an overview of the housing unit allocation method see:

Rosenheim, N., Guidotti, R., Gardoni, P., & Peacock, W. G. (2021). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. Sustainable and Resilient Infrastructure, 6(6), 385-401.

## Required Inputs
The program requires the following inputs:
1. Housing unit inventory file from ncoda_06dv1_run_HUI_v2_workflow.ipynb
2. Building inventory file from pyincore
    - A future version of ICD will provide tools for generating a building inventory file
    - The current version requires users to have an IN-CORE account
    
## Output Description
The output of this workflow is a CSV file with the housing unit inventory merged with a building inventory and a codebook that describes the data.
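
When the output CSV is read back into pandas, the `Block2010` column is worth reading as text, because the workflow stores it as a zero-padded 15-character string. The sketch below uses an in-memory stand-in for the output file; the `huid` values and the first block ID are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for the output CSV; the rows are invented for illustration.
csv_text = "huid,Block2010\nH1,010010201001000\nH2,371559608012070\n"

# Read the block ID as text so that a leading zero is not dropped.
hua_df = pd.read_csv(io.StringIO(csv_text), dtype={"Block2010": str})
print(hua_df["Block2010"].tolist())
```

Without the `dtype` argument pandas would infer an integer column, and block IDs in states whose FIPS code starts with zero would lose their first character.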

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE) for the housing unit allocation model.

IN-CORE is an open-source Python package for modeling the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

## Description of Program
- program:    ncoda_06ev2_run_HUA_workflow
- task:       Run the Housing Unit Allocation Workflow
- See github commits for description of program updates
- Current Version:    2022-06-22 - v2 workflow
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

## Setup Python Environment

In [1]:
import pandas as pd
import geopandas as gpd # For reading in shapefiles
import numpy as np
import sys # For displaying package versions
import os # For managing directories and file paths if drive is mounted

from pyincore import IncoreClient, Dataset, FragilityService, MappingSet, DataService
from pyincore.analyses.buildingdamage.buildingdamage import BuildingDamage

from pyincore_viz.geoutil import GeoUtil as viz

In [2]:
import scooby # Reports Python environment

In [3]:
# Generate report of Python environment
print(scooby.Report(additional=['pandas']))


--------------------------------------------------------------------------------
  Date: Wed Jun 22 12:40:15 2022 Eastern Daylight Time

                OS : Windows
            CPU(s) : 12
           Machine : AMD64
      Architecture : 64bit
               RAM : 31.6 GiB
       Environment : Jupyter

  Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:45)
  [MSC v.1929 64 bit (AMD64)]

            pandas : 1.4.2
             numpy : 1.22.3
             scipy : 1.8.0
           IPython : 8.3.0
        matplotlib : 3.5.2
            scooby : 0.5.12
--------------------------------------------------------------------------------


In [4]:
# Check working directory - good practice for relative path access
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data'

## Step 1: Set up pyincore and read in data
Reading data from IN-CORE requires an IN-CORE account. Registration is free:

https://incore.ncsa.illinois.edu/

In [5]:
client = IncoreClient()
# IN-CORE caches files on the local machine; it might be necessary to clear the cache
#client.clear_cache() 

Connection successful to IN-CORE services. pyIncore version detected: 1.4.1


In [6]:
# create data_service object for loading files
data_service = DataService(client)

### Read in Building Inventory

In [7]:
# Building inventory
# bldg_inv_id = "62ab7dcbf328861e25ffea9e" # New building inventory
bldg_inv_id = "6036c2a9e379f22e1658d451" # Old building inventory
# load building inventory
bldg_inv = Dataset.from_data_service(bldg_inv_id, data_service)
filename = bldg_inv.get_file_path('shp')
print("The IN-CORE Dataservice has saved the Building Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Building Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\6036c2a9e379f22e1658d451\lumberton_building_inventory_w_strcid\lumberton_building_inventory_w_strcid.shp


In [8]:
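bldg_inv_gdf = gpd.read_file(filename)

# Label the coordinates as WGS 84 (EPSG:4326); this assigns the CRS without reprojecting
bldg_inv_gdf = bldg_inv_gdf.set_crs("epsg:4326", allow_override=True)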

In [9]:
# Check Unique ID
bldg_inv_gdf[['guid','strctid']].astype(str).describe().T

         count unique                                     top freq
guid     20091  20091    efd13166-d7a0-476b-ada5-c55cea1f0184    1
strctid  20091  20091  STefd13166-d7a0-476b-ada5-c55cea1f0184    1
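
The check above relies on reading the `count` and `unique` columns by eye. A minimal sketch of an explicit check follows; the helper name `assert_unique_id` and the toy data are hypothetical and not part of pyincore or pyncoda.

```python
import pandas as pd


def assert_unique_id(df, column):
    """Raise ValueError if `column` has missing or duplicated values."""
    if df[column].isna().any():
        raise ValueError(f"{column} has missing values")
    if not df[column].is_unique:
        dupes = df.loc[df[column].duplicated(keep=False), column].unique()
        raise ValueError(f"{column} has duplicated values: {list(dupes)[:5]}")


# Toy stand-in for the building inventory.
toy = pd.DataFrame({"guid": ["a", "b", "c"], "strctid": ["STa", "STb", "STc"]})
assert_unique_id(toy, "guid")
assert_unique_id(toy, "strctid")
```

The same call could be applied to `bldg_inv_gdf` for `guid` and `strctid`, which stops the notebook early if an ID column is not a valid key.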


### Read in Housing Unit Inventory

For more information see:

Rosenheim, N., Guidotti, R., Gardoni, P., & Peacock, W. G. (2021). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. Sustainable and Resilient Infrastructure, 6(6), 385-401. https://doi.org/10.1080/23789689.2019.1681821

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. https://doi.org/10.17603/ds2-jwf6-s535.

In [10]:
# Housing Unit inventory
housing_unit_inv_id = "6262ef3204ce841cbeb30993"
# load housing unit inventory dataset and get the path to the local CSV file
housing_unit_inv = Dataset.from_data_service(housing_unit_inv_id, data_service)
filename = housing_unit_inv.get_file_path('csv')
print("The IN-CORE Dataservice has saved the Housing Unit Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Housing Unit Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\6262ef3204ce841cbeb30993\hui_v2-0-0_Lumberton_NC_2010_rs1000\hui_v2-0-0_Lumberton_NC_2010_rs1000.csv


In [11]:
housing_unit_inv_df = pd.read_csv(filename, header="infer")

In [12]:
housing_unit_inv_df['huid'].describe()

count                    52801
unique                   52801
top       B371559601011003H001
freq                         1
Name: huid, dtype: object

### Read in Address Point Inventory
The address point inventory is an intermediate file based on the building inventory. The address point inventory acts as the bridge between the building inventory and the housing unit inventory.
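
To make the bridge idea concrete, the sketch below builds a toy address point table with one row per potential housing unit in each building. It is an illustration only: the `n_units` column is hypothetical, and the ID pattern (`strctid` followed by `AP` and a six-digit counter) is inferred from the IDs shown in this notebook rather than taken from the code that produced the inventory.

```python
import pandas as pd

# Toy building inventory; n_units is a hypothetical column.
bldg = pd.DataFrame({
    "guid": ["g1", "g2"],
    "n_units": [1, 3],
})
bldg["strctid"] = "ST" + bldg["guid"]

# Repeat each building row once per unit, then number the copies 0, 1, 2, ...
addpt = bldg.loc[bldg.index.repeat(bldg["n_units"])].reset_index(drop=True)
addpt["counter"] = addpt.groupby("strctid").cumcount()
addpt["addrptid"] = (
    addpt["strctid"] + "AP" + addpt["counter"].astype(str).str.zfill(6)
)
print(addpt[["guid", "addrptid"]])
```

Each address point keeps the building `guid`, so a housing unit matched to an address point can be traced back to a building.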

In [13]:
# Address Point inventory
addpt_inv_id = "60aac382088dfa3b65030b16"
# load address point inventory dataset and get the path to the local CSV file
addpt_inv = Dataset.from_data_service(addpt_inv_id, data_service)
filename = addpt_inv.get_file_path('csv')
print("The IN-CORE Dataservice has saved the Address Point Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Address Point Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\60aac382088dfa3b65030b16\IN-CORE_2fv1_Lumberton_Inventories_addresspointinventory\IN-CORE_2fv1_Lumberton_Inventories_addresspointinventory.csv


In [15]:
addpt_inv_df = pd.read_csv(filename, header="infer")
addpt_inv_df['addrptid'].describe()

count                                              61505
unique                                             61505
top       ST2d32aeff-7b75-47e6-b7a5-4f4adca4b021AP000000
freq                                                   1
Name: addrptid, dtype: object

## Step 2: Housing Unit Allocation v2

### Setup notebook environment to access Cloned Github Package
This notebook uses functions that are in development. The current version of the package is available at:

https://github.com/npr99/intersect-community-data

Nathanael Rosenheim. (2022). npr99/intersect-community-data. Zenodo. https://doi.org/10.5281/zenodo.6476122

A permanent copy of the package and example datasets are available in the DesignSafe-CI repository:

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

In [16]:
# To replicate this notebook, clone the GitHub package to a folder that is a sibling of this notebook.
# To import from a sibling folder, append the parent directory ('..') to the system path list.
# This step is not required when the package is in a folder below the notebook file.
# Set github_code_path to the directory that contains the cloned repository.
github_code_path = ""
sys.path.append(github_code_path)
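
A slightly more defensive variant of the path setup is sketched below. The helper name `add_to_sys_path` is hypothetical, and the example assumes the cloned repository sits in the parent of the working directory; adjust the folder to match the local layout.

```python
import sys
from pathlib import Path


def add_to_sys_path(folder):
    """Append `folder` (resolved to an absolute path) to sys.path once."""
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Assumption: the cloned repository is in the parent of the working directory.
repo_parent = add_to_sys_path("..")
print(repo_parent)
```

Resolving to an absolute path keeps the import working if the working directory changes later in the session, and the membership test avoids duplicate entries when the cell is re-run.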

In [17]:
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data'

In [18]:
# Turn on autoreload so that edits to the imported submodules are picked up without restarting the kernel
%load_ext autoreload
%autoreload 2
# Import the workflow functions from the cloned pyncoda package
from pyncoda.ncoda_05a_hua_functions \
    import hua_workflow_functions
from pyncoda.ncoda_00b_directory_design import directory_design
from pyncoda.ncoda_06c_Codebook import *
from pyncoda.ncoda_04a_Figures import *

from pyncoda.CommunitySourceData.api_census_gov.acg_00e_incore_huiv2 \
    import incore_v2_DataStructure

### Setup Housing Unit Allocation

In [19]:
# Example of data dictionary for one community with one county
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}}}
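
The nested dictionary can be flattened with a small generator, which is convenient when a community spans several counties. This is a sketch; `iter_counties` is a hypothetical helper and not part of pyncoda.

```python
def iter_counties(communities):
    """Yield (community key, county FIPS code, county name) for every county."""
    for key, community in communities.items():
        for county in community["counties"].values():
            yield key, county["FIPS Code"], county["Name"]


communities = {"Lumberton_NC": {
    "community_name": "Lumberton, NC",
    "counties": {1: {"FIPS Code": "37155", "Name": "Robeson County, NC"}}}}

for key, fips, name in iter_counties(communities):
    # A county FIPS code is 2 state digits plus 3 county digits, kept as text.
    assert len(fips) == 5 and fips.isdigit()
    print(key, fips, name)
```

Additional counties would be added under `'counties'` with keys 2, 3, and so on, following the same `'FIPS Code'` and `'Name'` structure.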

In [20]:
version = '2.0.0'
version_text = 'v2-0-0'

# Output folder - use a short folder name to avoid overly long file paths.
# Files from this program are saved with the program name,
# which helps to follow the overall workflow.
outputfolder = "OutputData"
# Make directory to save output
os.makedirs(outputfolder, exist_ok=True)

# Set random seed for reproducibility
seed = 1000
basevintage = 2010
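
The role of the seed can be shown with a short sketch. It uses NumPy's `default_rng`; whether pyncoda draws its random numbers this way is an assumption, and `shuffled_ids` is a hypothetical helper with invented IDs.

```python
import numpy as np


def shuffled_ids(ids, seed):
    """Return `ids` in a random order that depends only on `seed`."""
    rng = np.random.default_rng(seed)
    return list(rng.permutation(ids))


ids = ["H001", "H002", "H003", "H004"]
first = shuffled_ids(ids, seed=1000)
second = shuffled_ids(ids, seed=1000)
print(first == second)
```

Because both calls start from the same seed, they return the same order, which is what makes a randomized allocation repeatable across runs.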


### Run Housing Unit Allocation

In [54]:
for community in communities.keys():
    # Create empty container to store outputs for in-core
    # Will use these to combine multiple counties
    hua_incore_county_df = {}
    print("Setting up Housing Unit Inventory for",communities[community]['community_name'])
    for county in communities[community]['counties'].keys():
        state_county = communities[community]['counties'][county]['FIPS Code']
        state_county_name  = communities[community]['counties'][county]['Name']
        print(state_county_name,': county FIPS Code',state_county)
    
        outputfolders = directory_design(state_county_name = state_county_name,
                                            outputfolder = outputfolder)
                                            
        generate_df = hua_workflow_functions(
            hui_df = housing_unit_inv_df,
            addpt_df=addpt_inv_df,
            bldg_df=bldg_inv_gdf,
            state_county = state_county,
            state_county_name= state_county_name,
            seed = seed,
            version = version,
            version_text = version_text,
            basevintage = basevintage,
            outputfolder = outputfolder,
            outputfolders = outputfolders)

        # Generate base housing unit inventory
        base_hua_df = generate_df.run_hua_workflow(savelog=False)

        # Save version for IN-CORE in v2 format
        hua_incore_county_df[state_county] = base_hua_df['primary']

    # combine multiple counties
    hua_incore_df = pd.concat(hua_incore_county_df.values(), 
                                    ignore_index=True, axis=0)

    # Convert HUA to geodataframe format
    hua_incore_gdf = gpd.GeoDataFrame(
        hua_incore_df, geometry=gpd.points_from_xy(hua_incore_df.x, hua_incore_df.y))

    # Merge building inventory with housing unit allocation results
    huav2_gdf = pd.merge(left = hua_incore_gdf, 
                        right = bldg_inv_gdf[['guid','archetype','geometry']], 
                        on='guid', how='outer')

    # The merge leaves two geometry columns: geometry_x (address point X,Y)
    # and geometry_y (building). Use geometry_y unless missing - then use geometry_x
    huav2_gdf['geometry'] = huav2_gdf['geometry_y']
    huav2_gdf.loc[huav2_gdf['geometry'].isnull(), 'geometry'] = huav2_gdf['geometry_x']
    # drop geometry_x and geometry_y columns
    huav2_gdf.drop(columns=['geometry_x','geometry_y'], inplace=True)

    # Convert Block2010 to a 15-character string
    # Missing values get a placeholder: county FIPS 37155 followed by zeros
    huav2_gdf['Block2010'] = huav2_gdf['Block2010'].fillna(371550000000000)
    huav2_gdf['Block2010'] = huav2_gdf['Block2010'].apply(lambda x : str(int(x)).zfill(15))

    # Save results for the community
    output_filename = f'hua_{version_text}_{community}_{basevintage}_rs{seed}'
    csv_filepath = os.path.join(outputfolders['top'], output_filename+'.csv')
    # Resolve against the working directory; sys.path[0] is not guaranteed to point there
    savefile = os.path.join(os.getcwd(), csv_filepath)
    huav2_gdf.to_csv(savefile, index=False)

    # Save second set of files in common directory
    common_directory = outputfolders['top']+"/../"+output_filename
    huav2_gdf.to_csv(common_directory+'.csv', index=False)


Setting up Housing Unit Inventory for Lumberton, NC
Robeson County, NC : county FIPS Code 37155

***************************************
    Run Housing Unit Allocation for Robeson County, NC
***************************************


***************************************
    Merge housing unit and address point data with first 3 counters.
***************************************

Round 1

***************************************
***************************************

Performing random merge at geography level: Block

***************************************
***************************************


***************************************
***************************************

Attempt to merge hui on all common group vars.

***************************************
***************************************

Running random merge by ['Block2010', 'huicounter1', 'ownershp1']

***************************************
    Setting up  primary data with primary key and flags
********************
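
The log refers to a "random merge" by group variables. A toy version of that idea is sketched below: rows are shuffled, numbered within each group, and then merged on the group variables plus the counter. This is not the pyncoda implementation; the function `random_merge`, the `_counter` column, and the data are hypothetical.

```python
import pandas as pd


def random_merge(left, right, by, seed):
    """Toy one-to-one random merge of `left` and `right` within groups `by`."""
    def add_counter(df, seed):
        # Shuffle the rows, then number them 0, 1, 2, ... inside each group.
        shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
        shuffled["_counter"] = shuffled.groupby(by).cumcount()
        return shuffled

    merged = pd.merge(add_counter(left, seed), add_counter(right, seed + 1),
                      on=by + ["_counter"], how="outer")
    return merged.drop(columns="_counter")


# Invented housing units and address points in two blocks.
hui = pd.DataFrame({"huid": ["H1", "H2", "H3"],
                    "Block2010": ["b1", "b1", "b2"]})
addpt = pd.DataFrame({"addrptid": ["A1", "A2", "A3"],
                      "Block2010": ["b1", "b1", "b1"]})
matched = random_merge(hui, addpt, by=["Block2010"], seed=1000)
print(matched)
```

In this toy data, block `b1` has two housing units and three address points, so one address point stays unmatched, and the housing unit in `b2` has no address point in its block. Rows like these would need a later round at a coarser geography or with fewer group variables.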

In [55]:
from pyncoda.ncoda_04b_foliummaps import *

In [56]:
# Condition 1: select housing units in Census block 371559608012070
condition1 = (huav2_gdf['Block2010'] == '371559608012070')
gdf1 = huav2_gdf.loc[condition1].copy()
gdf1[['huid','guid','addrptid']].astype(str).describe().T

          count unique                   top freq
huid          4      4  B371559608012070H004    1
guid          4      4                          1
addrptid      4      4                          1


In [57]:
folium_marker_layer_map(gdf = gdf1,
                        gdfvar = 'numprec',
                        layername = "households",
                        color_levels = [0,1,2,3,4,5,6,7])