# Address Point Inventory Workflow

## Overview
This notebook obtains and cleans the data required for the Address Point Inventory, a key component of the Housing Unit Allocation process. The address point inventory predicts the number of housing units in each structure in a building inventory.


### Resources and references:
For an overview of the address point inventory and the housing unit allocation method, see:

Rosenheim, N., Guidotti, R., Gardoni, P., & Peacock, W. G. (2021). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. Sustainable and Resilient Infrastructure, 6(6), 385-401.

## Required Inputs
The program uses the following inputs:
1. US Census block data, which the program obtains and cleans.
2. A geocoded building inventory.
    - A future version of ICD will provide tools for generating a building inventory file.
    - The current version requires users to have an IN-CORE account.
3. A Housing Unit Inventory, which provides the expected address point counts by block.

The program then uses the block data, expected counts, and building inventory to generate an address point inventory.
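
To make the idea of combining expected counts with a building inventory concrete, the sketch below splits one block's expected housing unit count across the buildings in that block. It is only an illustration: the proportional-to-floor-area rule, the `allocate_units` function, and the toy building IDs are assumptions and are not the algorithm implemented in `predict_residential_addresspoints`.

```python
from math import floor


def allocate_units(expected_units, building_areas):
    """Split a block's expected housing unit count across its buildings.

    Hypothetical rule (not the pyncoda algorithm): shares are proportional
    to floor area and rounded with the largest-remainder method, so the
    allocated units always sum to expected_units.
    """
    total_area = sum(building_areas.values())
    if total_area <= 0 or expected_units <= 0:
        return {guid: 0 for guid in building_areas}

    # Fractional share of units for each building
    raw = {g: expected_units * a / total_area for g, a in building_areas.items()}
    # Start from the whole-number part of each share
    alloc = {g: floor(v) for g, v in raw.items()}
    leftover = expected_units - sum(alloc.values())
    # Hand the remaining units to the buildings with the largest fractions
    by_fraction = sorted(raw, key=lambda g: raw[g] - alloc[g], reverse=True)
    for g in by_fraction[:leftover]:
        alloc[g] += 1
    return alloc


# Toy block: three buildings (square feet) and 5 expected housing units
block_buildings = {"bldg-a": 1000.0, "bldg-b": 2500.0, "bldg-c": 500.0}
estimate = allocate_units(5, block_buildings)
print(estimate)
```

Whatever rule is used, the useful property to check is that the building-level estimates add back up to the block-level expected count.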
    
## Output Description
The output of this workflow is a CSV file with the address point inventory and a codebook that describes the data.

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE) for the housing unit allocation model.

IN-CORE is an open-source Python package that can be used to model the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

## Description of Program
- program:    ncoda_07bv1_addresspoint_workflow
- task:       Run the Address Point Workflow
- See github commits for description of program updates
- Current Version:    
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

## Setup Python Environment

In [1]:
import pandas as pd
import geopandas as gpd # For reading in shapefiles
import numpy as np
import sys # For displaying package versions
import os # For managing directories and file paths if drive is mounted

from pyincore import IncoreClient, Dataset, FragilityService, MappingSet, DataService
from pyincore.analyses.buildingdamage.buildingdamage import BuildingDamage

from pyincore_viz.geoutil import GeoUtil as viz

In [2]:
import scooby # Reports Python environment

In [3]:
# Generate report of Python environment
print(scooby.Report(additional=['pandas','pyincore','pyincore_viz']))


--------------------------------------------------------------------------------
  Date: Thu Jun 30 17:17:12 2022 Eastern Daylight Time

                OS : Windows
            CPU(s) : 12
           Machine : AMD64
      Architecture : 64bit
               RAM : 31.6 GiB
       Environment : Jupyter

  Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:45)
  [MSC v.1929 64 bit (AMD64)]

            pandas : 1.4.2
          pyincore : 1.4.1
      pyincore_viz : Version unknown
             numpy : 1.22.3
             scipy : 1.8.0
           IPython : 8.3.0
        matplotlib : 3.5.2
            scooby : 0.5.12
--------------------------------------------------------------------------------


In [4]:
# Check working directory - good practice for relative path access
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data\\WorkNPR'

## Step 1: Set up pyincore and read in data
This workflow reads its input data from IN-CORE, so an IN-CORE account is required. Registration is free at:

https://incore.ncsa.illinois.edu/

In [5]:
client = IncoreClient()
# IN-CORE caches files on the local machine; it might be necessary to clear the cache
#client.clear_cache() 

Connection successful to IN-CORE services. pyIncore version detected: 1.4.1


In [6]:
# create data_service object for loading files
data_service = DataService(client)

### Read in Building Inventory

In [7]:
# Building inventory
# bldg_inv_id = "62ab7dcbf328861e25ffea9e" # New building inventory
bldg_inv_id = "6036c2a9e379f22e1658d451" # Old building inventory
# load building inventory
bldg_inv = Dataset.from_data_service(bldg_inv_id, data_service)
filename = bldg_inv.get_file_path('shp')
print("The IN-CORE Dataservice has saved the Building Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Building Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\6036c2a9e379f22e1658d451\lumberton_building_inventory_w_strcid\lumberton_building_inventory_w_strcid.shp


In [8]:
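bldg_inv_gdf = gpd.read_file(filename)

# Label the coordinates as WGS 84 (EPSG:4326).
# Assigning .crs only sets the label; it does not reproject the geometry.
from pyproj import CRS
bldg_inv_gdf.crs = CRS("epsg:4326")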

In [9]:
# Check Unique ID
bldg_inv_gdf[['guid','strctid']].astype(str).describe().T

| | count | unique | top | freq |
|---|---|---|---|---|
| guid | 20091 | 20091 | efd13166-d7a0-476b-ada5-c55cea1f0184 | 1 |
| strctid | 20091 | 20091 | STefd13166-d7a0-476b-ada5-c55cea1f0184 | 1 |
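
The `describe()` output above shows that `count` equals `unique` for both ID columns. The same check can be made explicit so that it fails loudly when an ID is missing or repeated. This is a minimal sketch on a toy DataFrame; `check_unique_id` is a hypothetical helper, not part of pyincore or pyncoda.

```python
import pandas as pd


def check_unique_id(df, id_col):
    """Return True when id_col has no missing values and no duplicates."""
    ids = df[id_col]
    return bool(ids.notna().all() and ids.is_unique)


# Toy data: 'guid' is unique, 'strctid' contains a repeated value
toy = pd.DataFrame({
    "guid": ["a1", "b2", "c3"],
    "strctid": ["STa1", "STb2", "STb2"],
})

guid_ok = check_unique_id(toy, "guid")
strctid_ok = check_unique_id(toy, "strctid")
print(guid_ok, strctid_ok)
```

In the notebook, a line such as `assert check_unique_id(bldg_inv_gdf, 'guid')` would stop the run before a duplicated ID could affect later merges.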


### Read in Housing Unit Inventory

In [10]:
# Housing Unit inventory
housing_unit_inv_id = "6262ef3204ce841cbeb30993"
# load housing unit inventory as pandas dataframe
housing_unit_inv = Dataset.from_data_service(housing_unit_inv_id, data_service)
filename = housing_unit_inv.get_file_path('csv')
print("The IN-CORE Dataservice has saved the Housing Unit Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Housing Unit Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\6262ef3204ce841cbeb30993\hui_v2-0-0_Lumberton_NC_2010_rs1000\hui_v2-0-0_Lumberton_NC_2010_rs1000.csv


In [11]:
housing_unit_inv_df = pd.read_csv(filename, header="infer")

In [12]:
housing_unit_inv_df['huid'].describe()

count                    52801
unique                   52801
top       B371559601011003H001
freq                         1
Name: huid, dtype: object
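
The example `huid` above appears to combine a block ID string with a within-block counter. The sketch below splits an ID of that form and shows how a numeric block ID becomes the `B`-prefixed, zero-padded string used later in this notebook. The exact `huid` layout (`B` + 15 digits + `H` + 3 digits) is an assumption inferred from the single value shown, and `split_huid` and `block_id_to_str` are hypothetical helper names.

```python
import re

# Assumed layout, inferred from 'B371559601011003H001':
# 'B' + 15-digit block ID + 'H' + 3-digit counter
HUID_PATTERN = re.compile(r"^(B\d{15})H(\d{3})$")


def split_huid(huid):
    """Split a housing unit ID into (block ID string, counter)."""
    match = HUID_PATTERN.match(huid)
    if match is None:
        raise ValueError(f"Unexpected huid format: {huid!r}")
    block_str, counter = match.groups()
    return block_str, int(counter)


def block_id_to_str(blockid):
    """Convert a numeric block ID to the 'B' + 15-digit string form.

    Zero-padding restores the leading zero that is lost when a block ID
    from a state with FIPS code below 10 is stored as a number.
    """
    return "B" + str(int(blockid)).zfill(15)


block_str, counter = split_huid("B371559601011003H001")
print(block_str, counter)
print(block_id_to_str(371559601011003))
```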

## Step 2: Obtain and Clean Block Data
This step uses functions that read in Census block data and clean it. Cleaning adds the PUMA ID, the Place ID, and housing unit counts (including group quarters).

### Setup notebook environment to access Cloned Github Package
This notebook uses functions that are in development. The current version of the package is available at:

https://github.com/npr99/intersect-community-data

Nathanael Rosenheim. (2022). npr99/intersect-community-data. Zenodo. https://doi.org/10.5281/zenodo.6476122

A permanent copy of the package and example datasets are available in the DesignSafe-CI repository:

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

In [13]:
# To replicate this notebook, clone the GitHub package to a folder that is a sibling of this notebook.
# To access the sibling package, append the parent directory ('..') to the system path list,
# that is, the path of the directory that includes the GitHub repository.
# This step is not required when the package is in a folder below the notebook file.
github_code_path  = "../"
sys.path.append(github_code_path)
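
Because a relative path such as `"../"` depends on the current working directory, it can help to resolve the path and confirm it exists before appending it. The following is a minimal sketch using only the standard library; `add_to_sys_path` is a hypothetical helper, not part of pyncoda.

```python
import sys
from pathlib import Path


def add_to_sys_path(code_path):
    """Resolve code_path, check it is a directory, and append it once."""
    resolved = Path(code_path).resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"Directory not found: {resolved}")
    resolved_str = str(resolved)
    # Avoid adding the same entry again when the cell is re-run
    if resolved_str not in sys.path:
        sys.path.append(resolved_str)
    return resolved_str


parent_dir = add_to_sys_path("..")
print("Added to sys.path:", parent_dir)
```

Using an absolute path also makes an import failure easier to diagnose, since the printed location can be compared directly with where the repository was cloned.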

In [16]:
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data\\WorkNPR'

In [17]:
# Turn on autoreload so that changes to the imported submodules are reloaded automatically
%load_ext autoreload
%autoreload 2
# Import reusable functions from the pyncoda package
from pyncoda.ncoda_00b_directory_design import directory_design
from pyncoda.ncoda_00e_geoutilities import *
from pyncoda.ncoda_02b_cleanblockdata import *
from pyncoda.ncoda_02d_addresspoint import *


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


### Setup Community and Output Directory

In [18]:
# Example of data dictionary for one community with one county
# Check how to capitalize the state name at
# https://www2.census.gov/geo/tiger/TIGER2020PL/STATE/
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'STATE' : 'NORTH_CAROLINA',
                    'years' : ['2010'],
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}}}
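
The nested dictionary above can be flattened into one record per county, which is a convenient place to validate the five-digit FIPS code (two-digit state code followed by a three-digit county code). This is a sketch only; `list_counties` is a hypothetical helper, not a pyncoda function.

```python
communities = {'Lumberton_NC': {
                    'community_name': 'Lumberton, NC',
                    'STATE': 'NORTH_CAROLINA',
                    'years': ['2010'],
                    'counties': {
                        1: {'FIPS Code': '37155', 'Name': 'Robeson County, NC'}}}}


def list_counties(communities):
    """Return one record per county with the FIPS code split into parts."""
    rows = []
    for community_id, info in communities.items():
        for county in info['counties'].values():
            fips = county['FIPS Code']
            # County FIPS codes are 5 digits: 2 for the state, 3 for the county
            if len(fips) != 5 or not fips.isdigit():
                raise ValueError(f"Invalid county FIPS code: {fips!r}")
            rows.append({'community': community_id,
                         'state_fips': fips[:2],
                         'county_fips': fips[2:],
                         'name': county['Name']})
    return rows


county_rows = list_counties(communities)
print(county_rows)
```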

In [19]:
version = '2.0.0'
version_text = 'v2-0-0'

# Output folder: use a short folder name to avoid overly long file paths
# Files from this program will be saved with the program name;
# this helps to follow the overall workflow
outputfolder = "OutputData"
# Make directory to save output
if not os.path.exists(outputfolder):
    os.mkdir(outputfolder)

# Set random seed for reproducibility
seed = 1000
basevintage = 2010
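
The settings above feed the output file name built later in the workflow (`addpt_{version_text}_{community}_{basevintage}_{year}`). The sketch below derives `version_text` from `version` so the two cannot drift apart, and uses `os.makedirs(..., exist_ok=True)` as an alternative to the exists-then-`mkdir` pattern. The helper names are hypothetical, and a temporary directory stands in for the real output folder.

```python
import os
import tempfile


def version_to_text(version):
    """Turn '2.0.0' into the file-name-safe form 'v2-0-0'."""
    return "v" + version.replace(".", "-")


def build_output_name(version, community, basevintage, year):
    """Build the address point file name stem used by this workflow."""
    return f"addpt_{version_to_text(version)}_{community}_{basevintage}_{year}"


# exist_ok=True makes the call safe to repeat when the cell is re-run
with tempfile.TemporaryDirectory() as tmp:
    outputfolder = os.path.join(tmp, "OutputData")
    os.makedirs(outputfolder, exist_ok=True)
    os.makedirs(outputfolder, exist_ok=True)  # second call does not raise
    folder_created = os.path.isdir(outputfolder)

output_name = build_output_name("2.0.0", "Lumberton_NC", 2010, "2010")
print(output_name)
```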


### Run Clean Block Data

In [23]:
for community in communities.keys():
    # Loop over years
    # The current version only works for 2010
    # Future versions could include 2000, 2020
    for year in communities[community]['years']:
        yr = year[2:4]
        # Create empty container to store outputs for in-core
        # Will use these to combine multiple counties
        hua_incore_county_df = {}
        print("Setting up Housing Unit Inventory for",communities[community]['community_name'])
        for county in communities[community]['counties'].keys():
            state_county = communities[community]['counties'][county]['FIPS Code']
            state_county_name  = communities[community]['counties'][county]['Name']
            state_caps = communities[community]['STATE']
            print(state_county_name,': county FIPS Code',state_county)
        
            outputfolders = directory_design(state_county_name = state_county_name,
                                                outputfolder = outputfolder)

            # Set up Census Block Data with PUMA and Place IDs
            output_folder = outputfolders['CommunitySourceData']


            census_block_place_puma_gdf = \
                obtain_join_block_place_puma_data(
                                county_fips = state_county,
                                state = state_caps,
                                year = year,
                                output_folder = output_folder,
                                replace = False)

            # Save map of Census Block Data
            block_map = single_layer_folium_map(gdf = census_block_place_puma_gdf,
                                layer_name = 'Census Blocks 2010',
                                output_folder = output_folder)

            # Merge Building Inventory and Census Block Data
            join_column_list = [f'BLOCKID{yr}',f'BLOCKID{yr}_str',
                                f'placeGEOID{yr}',f'placeNAME{yr}']
            geolevel = 'block'
            
            # add representative point to buildings
            bldg_inv_gdf_point = add_representative_point(bldg_inv_gdf,year=year)
            builidng_to_block_gdf = spatial_join_points_to_poly(
                        points_gdf = bldg_inv_gdf_point,
                        polygon_gdf = census_block_place_puma_gdf,
                        point_var = f'rppnt{yr}4326',
                        poly_var = f'blk{yr}4326',
                        geolevel = geolevel,
                        join_column_list = join_column_list)

            # Define the residential archetypes used by the address point algorithm
            residential_archetypes = { 
                1 : 'One-story sf residential building on a crawlspace foundation',
                2 : 'One-story mf residential building on a slab-on-grade foundation',
                3 : 'Two-story sf residential building on a crawlspace foundation',
                4 : 'Two-story mf residential building on a slab-on-grade foundation'}

            # Housing unit inventory needs the block string variable
            housing_unit_inv_df[f'BLOCKID{yr}_str'] = \
                       housing_unit_inv_df['blockid'].\
                           apply(lambda x : "B"+str(int(x)).zfill(15))

            # Run Address Point Algorithm
            huesimate_df = predict_residential_addresspoints(
                            builidng_to_block_gdf = builidng_to_block_gdf,
                            hui_df = housing_unit_inv_df,
                            hui_blockid = f'BLOCKID{yr}_str',
                            bldg_blockid = 'blockBLOCKID10_str',
                            bldg_uniqueid = 'guid',
                            placename_var = 'blockplaceNAME10',
                            archetype_var = 'archetype',
                            residential_archetypes = residential_archetypes,
                            building_area_var = 'sq_foot',
                            building_area_cutoff = 300,
                            )

            # Check errors
            #huesimate_df['apcount'].groupby(huesimate_df['ErrorCheck1_int']).describe()
            #huesimate_df['apcount'].groupby(huesimate_df['ErrorCheck2_int']).describe()
            #huesimate_df['apcount'].groupby(huesimate_df['ErrorCheck3_int']).describe()

            # Add Single Family Dummy Variable
            condition1 = (huesimate_df["huestimate"] > 1)
            condition2 = ~(huesimate_df["guid"].isna())
            condition = condition1 & condition2
            huesimate_df.loc[condition,'d_sf'] = 0
            condition1 = (huesimate_df["huestimate"] == 1)
            condition = condition1 & condition2
            huesimate_df.loc[condition, 'd_sf'] = 1 
            #pd.crosstab(huesimate_df['huestimate'], 
            #    huesimate_df['d_sf'],
            #    margins=True, 
            #    margins_name="Total")

            # Identify Unincorporated Areas with Place Name
            # There are many address points that fall just outside of city limits 
            # in unincorporated places.
            # For these areas use the county information to label 
            # the place names as the County Name.
            huesimate_df.loc[(huesimate_df['blockplaceNAME10'].isna()),
                        'blockplaceNAME10'] = f"Unincorporated {state_county_name}"

            ### Next Steps
            # Create an address point inventory that has an address point for each
            # housing unit that does not have a building.
            # https://github.com/npr99/IN-CORE_notebooks/blob/main/IN_CORE_2dv2_Lumberton_AddressPointInventory.ipynb
            # https://github.com/npr99/IN-CORE_notebooks/blob/main/IN-CORE_1dv2_Joplin_EstimateAddressPoints_2019-07-11.ipynb
            # Would like to give each address point a unique lat/lon;
            # this is important for areas without buildings

            #Save results for community name
            output_filename = f'addpt_{version_text}_{community}_{basevintage}_{year}'
            csv_filepath = outputfolders['top']+"/"+output_filename+'.csv'
            savefile = sys.path[0]+"/"+csv_filepath
            huesimate_df.to_csv(savefile, index=False)

            # Save second set of files in common directory
            common_directory = outputfolders['top']+"/../"+output_filename
            huesimate_df.to_csv(common_directory+'.csv', index=False)


Setting up Housing Unit Inventory for Lumberton, NC
Robeson County, NC : county FIPS Code 37155
Block data already exists for  37155
Converting blk104269 to Geodataframe
Polygon file has 5799 block polygons.
Identified 2155 block polygons to spatially join.
....................6070 Buildings have Residential Archetype 1
10273 Buildings have Residential Archetype 2
249 Buildings have Residential Archetype 3
1391 Buildings have Residential Archetype 4
20 Buildings have building_area_var less than 300
0 Buildings have  sq_foot _by_AP less than 300
17968 Buildings assigned residential.
Total number of expected housing unit address points in county: 52801
779 buildings are in blocks with no housing units.
For Round 1 Estimated Residential Address Points correlation: 0.524
1. HU=0 had 218 observations
2. HU=AP had 398 observations
3. HU<AP had 436 observations
8. HU > 0, Building Count = Missing had 0 observations
9. HU > 0, Building Count = 0 had 2753 observations
10. HU = 0, Building Count
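
The single-family flag and the unincorporated place-name fill in the cell above can also be written in a vectorized form. The sketch below reproduces the same rules on a toy DataFrame: `d_sf` is 1 when a building has exactly one estimated housing unit, 0 when it has more than one, and missing otherwise. The toy rows are invented for illustration; the column names follow the notebook.

```python
import numpy as np
import pandas as pd

# Toy rows: a single-family building, a multi-unit building,
# a block record with no building, and a building with no housing units
toy = pd.DataFrame({
    "guid": ["g1", "g2", None, "g4"],
    "huestimate": [1, 3, 2, 0],
    "blockplaceNAME10": ["Lumberton", None, None, "Lumberton"],
})

has_building = toy["guid"].notna()
toy["d_sf"] = np.select(
    [has_building & (toy["huestimate"] == 1),   # single family
     has_building & (toy["huestimate"] > 1)],   # more than one unit
    [1.0, 0.0],
    default=np.nan,                             # no building or no units
)

# Label records outside any Census place with the county name
state_county_name = "Robeson County, NC"
toy["blockplaceNAME10"] = toy["blockplaceNAME10"].fillna(
    f"Unincorporated {state_county_name}")

print(toy)
```

Keeping the "neither" case as missing, rather than 0, preserves the distinction between multi-unit buildings and records that have no building at all.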

In [22]:
huesimate_df.head(1).T

| | 0 |
|---|---|
| blockBLOCKID10_str | B371559601011002 |
| guid | |
| blockplaceNAME10 | |
| archetype | |
| residential | |
| apcount | 1.0 |
| bldgcount | |
| huestimate | |
| DiffCount3 | 1.0 |
| bldgcountv3_sum | 0.0 |


In [25]:
# look for observations with blockid = B371559616011029
huesimate_df[huesimate_df['blockBLOCKID10_str'] == 'B371559616011029'].head(1).T

| | 19158 |
|---|---|
| blockBLOCKID10_str | B371559616011029 |
| guid | 8f82154f-5a49-423c-a063-454c1183a8ca |
| blockplaceNAME10 | |
| archetype | 2.0 |
| residential | 1.0 |
| apcount | 53.0 |
| bldgcount | 1.0 |
| huestimate | 2.0 |
| DiffCount3 | 2.0 |
| bldgcountv3_sum | 51.0 |
