# Address Point Inventory Workflow

## Overview
Functions to obtain and clean the data required for the Address Point Inventory, a key component of the Housing Unit Allocation process. The address point inventory predicts the number of housing units in each structure in a building inventory.


### Resources and references:
For an overview of the address point inventory and the housing unit allocation method, see:

Rosenheim, N., Guidotti, R., Gardoni, P., & Peacock, W. G. (2021). Integration of detailed household and housing unit characteristic data with critical infrastructure for post-hazard resilience modeling. Sustainable and Resilient Infrastructure, 6(6), 385-401.

## Required Inputs
The program requires the following inputs:
1. US Census block data, which the program obtains and cleans
2. A geocoded building inventory
    - Future versions of ICD will provide tools for generating a building inventory file
    - The current version requires users to have an IN-CORE account
3. A Housing Unit Inventory, which provides the expected address point counts by block

The program uses the block data, expected counts, and building inventory to generate an address point inventory.
    
## Output Description
The output of this workflow is a CSV file with the address point inventory and a codebook that describes the data.

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE) for the housing unit allocation model.

IN-CORE is an open source Python package that can be used to model the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

## Description of Program
- program:    ncoda_07bv1_addresspoint_workflow
- task:       Run the Address Point Workflow
- See github commits for description of program updates
- Current Version:    
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE), Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

## Setup Python Environment

In [1]:
import pandas as pd
import geopandas as gpd # For reading in shapefiles
import numpy as np
import sys # For displaying package versions
import os # For managing directories and file paths if drive is mounted

from pyincore import IncoreClient, Dataset, FragilityService, MappingSet, DataService
from pyincore.analyses.buildingdamage.buildingdamage import BuildingDamage

from pyincore_viz.geoutil import GeoUtil as viz

In [2]:
import scooby # Reports Python environment

In [3]:
# Generate report of Python environment
print(scooby.Report(additional=['pandas','pyincore','pyincore_viz']))


--------------------------------------------------------------------------------
  Date: Thu Sep 15 16:17:27 2022 Central Daylight Time

                OS : Windows
            CPU(s) : 12
           Machine : AMD64
      Architecture : 64bit
               RAM : 31.6 GiB
       Environment : Jupyter

  Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:45)
  [MSC v.1929 64 bit (AMD64)]

            pandas : 1.4.2
          pyincore : 1.4.1
      pyincore_viz : Version unknown
             numpy : 1.22.3
             scipy : 1.8.0
           IPython : 8.3.0
        matplotlib : 3.5.2
            scooby : 0.5.12
--------------------------------------------------------------------------------


In [4]:
# Check working directory - good practice for relative path access
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data'

## Step 1: Set up pyincore and read in data
Accessing the IN-CORE data services used in this step requires an IN-CORE account:

https://incore.ncsa.illinois.edu/

Registration is free.

In [5]:
client = IncoreClient()
# IN-CORE caches files on the local machine; it might be necessary to clear the cache
#client.clear_cache() 

Connection successful to IN-CORE services. pyIncore version detected: 1.4.1


In [6]:
# create data_service object for loading files
data_service = DataService(client)

### Read in Building Inventory

In [7]:
# Building inventory
bldg_inv_id = "63053ddaf5438e1f8c517fed" # Updated Galveston Inventory
# load building inventory
bldg_inv = Dataset.from_data_service(bldg_inv_id, data_service)
filename = bldg_inv.get_file_path('shp')
print("The IN-CORE Dataservice has saved the Building Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Building Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\63053ddaf5438e1f8c517fed\galveston_bldgs_w_guid\galveston_bldgs_w_guid.shp


In [8]:
bldg_inv_gdf = gpd.read_file(filename)

from pyproj import CRS
bldg_inv_gdf.crs = CRS("epsg:4326")

In [9]:
# Check Unique ID
bldg_inv_gdf[['guid','strctid']].astype(str).describe().T

Unnamed: 0,count,unique,top,freq
guid,172534,172534,1815653a-7b70-44ce-8544-e975596bdf82,1
strctid,172534,1,,172534
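
The unique-ID check above can also be written as an explicit test. The following is a minimal sketch on a toy table, not part of the original workflow; the helper name `check_primary_key` and the toy values are assumptions made for illustration.

```python
import pandas as pd

def check_primary_key(df: pd.DataFrame, key: str) -> dict:
    """Summarize whether `key` is non-missing and unique in `df` (hypothetical helper)."""
    col = df[key]
    return {
        "rows": len(df),
        "missing": int(col.isna().sum()),
        "unique": int(col.nunique(dropna=True)),
        "is_primary_key": bool(col.notna().all() and col.is_unique),
    }

# Toy stand-in for the building inventory (values are made up)
toy_bldg = pd.DataFrame({"guid": ["a1", "b2", "c3"], "strctid": ["", "", ""]})

guid_check = check_primary_key(toy_bldg, "guid")        # one distinct value per row
strctid_check = check_primary_key(toy_bldg, "strctid")  # a single repeated value
print(guid_check)
print(strctid_check)
```

In the output above, `guid` has as many unique values as rows, while `strctid` holds a single empty value, so `guid` is the column that can serve as the building key.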


In [10]:
bldg_inv_gdf.head(1).T

Unnamed: 0,0
strctid,
parid,0
struct_typ,
year_built,0
no_stories,1
a_stories,0
b_stories,0
bsmt_type,0
sq_foot,2574
gsq_foot,0


In [11]:
bldg_inv_gdf['arch_wind'].describe()

count    172534.000000
mean          2.954728
std           2.555909
min           1.000000
25%           1.000000
50%           3.000000
75%           3.000000
max          19.000000
Name: arch_wind, dtype: float64

In [12]:
# create pandas table with counts of observations by arch_flood, arch_wind, arch_sw
bldg_inv_gdf[['arch_flood','arch_wind','arch_sw','guid']].groupby(['arch_flood','arch_wind','arch_sw']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,guid
arch_flood,arch_wind,arch_sw,Unnamed: 3_level_1
1,1,1,51505
1,1,20,2
2,3,3,68616
3,2,2,12549
3,2,20,332
4,4,4,27913
4,4,20,478
5,6,6,6253
5,6,20,106
6,15,15,649


In [13]:
bldg_inv_gdf['sq_foot'].describe()

count    172534.000000
mean       2409.341515
std        5855.639450
min          32.000000
25%         985.000000
50%        1942.000000
75%        2791.000000
max      630214.000000
Name: sq_foot, dtype: float64

### Read in Housing Unit Inventory

In [14]:
# Housing Unit inventory
housing_unit_inv_id = "626322a7e74a5c2dfb3a72b0"
# load housing unit inventory as pandas dataframe
housing_unit_inv = Dataset.from_data_service(housing_unit_inv_id, data_service)
filename = housing_unit_inv.get_file_path('csv')
print("The IN-CORE Dataservice has saved the Housing Unit Inventory on your local machine: "+filename)

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the Housing Unit Inventory on your local machine: C:\Users\nathanael99\.incore\cache_data\626322a7e74a5c2dfb3a72b0\hui_v2-0-0_Galveston_TX_2010_rs1000\hui_v2-0-0_Galveston_TX_2010_rs1000.csv


In [15]:
housing_unit_inv_df = pd.read_csv(filename, header="infer")

In [16]:
housing_unit_inv_df['huid'].describe()

count                   132553
unique                  132553
top       B481677201001000H001
freq                         1
Name: huid, dtype: object

In [17]:
housing_unit_inv_df.head()

Unnamed: 0,huid,blockid,bgid,tractid,FIPScounty,numprec,ownershp,race,hispan,family,vacancy,gqtype,incomegroup,hhinc,randincome,poverty
0,B481677201001000H001,481677201001000,481677201001,48167720100,48167,1,1.0,1.0,0.0,0.0,0,0,6,3,31459.0,0.0
1,B481677201001000H002,481677201001000,481677201001,48167720100,48167,1,1.0,1.0,0.0,0.0,0,0,6,3,34695.0,0.0
2,B481677201001000H003,481677201001000,481677201001,48167720100,48167,1,1.0,1.0,0.0,0.0,0,0,7,3,38776.0,0.0
3,B481677201001000H004,481677201001000,481677201001,48167720100,48167,1,1.0,1.0,0.0,0.0,0,0,10,3,52398.0,0.0
4,B481677201001000H005,481677201001000,481677201001,48167720100,48167,1,1.0,1.0,0.0,0.0,0,0,11,3,69564.0,0.0


## Step 2: Obtain and Clean Block Data and Estimate Housing Units by Building
Functions to read in block data and clean it. Cleaning adds PUMA ID, Place ID, and Housing unit counts (including group quarters)
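
The steps in this section match housing units and buildings to blocks through a string block ID (`BLOCKID10_str`), which is the letter "B" followed by the block ID zero-padded to 15 digits. The sketch below illustrates that formatting on toy values; the helper name `block_id_to_str` and the second (Alabama-style) ID are assumptions added for illustration.

```python
import pandas as pd

def block_id_to_str(block_id) -> str:
    """Format a numeric census block ID as 'B' + 15 zero-padded digits (hypothetical helper)."""
    return "B" + str(int(block_id)).zfill(15)

# Numeric block IDs lose a leading zero for states with FIPS codes below 10,
# so the second toy ID has only 14 digits when stored as a number.
toy_hui = pd.DataFrame({"blockid": [481677201001000, 10010201001000.0]})
toy_hui["BLOCKID10_str"] = toy_hui["blockid"].apply(block_id_to_str)
print(toy_hui["BLOCKID10_str"].tolist())
```

Using a string key avoids the leading-zero loss and the float formatting that a numeric block ID can pick up when it is read from a CSV file.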

### Setup notebook environment to access Cloned Github Package
This notebook uses functions that are in development. The current version of the package is available at:

https://github.com/npr99/intersect-community-data

Nathanael Rosenheim. (2022). npr99/intersect-community-data. Zenodo. https://doi.org/10.5281/zenodo.6476122

A permanent copy of the package and example datasets are available in the DesignSafe-CI repository:

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

In [18]:
# To replicate this notebook, clone the GitHub package to a folder that is a sibling of this notebook.
# To access the sibling package, append the parent directory ('..') to the system path list,
# that is, the path of the directory that includes the GitHub repository.
# This step is not required when the package is in a folder below the notebook file.
github_code_path  = "../"
sys.path.append(github_code_path)

In [19]:
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data'

In [20]:
# To reload submodules need to use this magic command to set autoreload on
%load_ext autoreload
%autoreload 2
# open, read, and execute python program with reusable commands
from pyncoda.ncoda_00b_directory_design import directory_design
from pyncoda.ncoda_00e_geoutilities import *
from pyncoda.ncoda_02b_cleanblockdata import *
from pyncoda.ncoda_02d_addresspoint import *
from pyncoda.ncoda_07a_generate_hui import *


### Setup Community and Output Directory

In [21]:
# Example of data dictionary for one community with one county
# Check how to capitalize the state name at
## https://www2.census.gov/geo/tiger/TIGER2020PL/STATE/
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'STATE' : 'NORTH_CAROLINA',
                    'years' : ['2010'],
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}}}
# This notebook runs Galveston, TX: the assignment below replaces the Lumberton example above
communities = {'Galveston_TX' : {
                    'community_name' : 'Galveston, TX',
                    'STATE' : 'TEXAS',
                    'years' : ['2010'],
                    'counties' : { 
                        1 : {'FIPS Code' : '48167', 'Name' : 'Galveston County, TX'}}}}

In [22]:
version = '2.0.0'
version_text = 'v2-0-0'

# Save Outputfolder - due to long folder name paths output saved to folder with shorter name
# files from this program will be saved with the program name - 
# this helps to follow the overall workflow
outputfolder = "OutputData"
# Make directory to save output (no error if it already exists)
os.makedirs(outputfolder, exist_ok=True)

# Set random seed for reproducibility
seed = 1000
basevintage = 2010


### Run Clean Block Data

In [23]:
for community in communities.keys():
    # Loop over years
    # The current version only works for 2010
    # Future versions could include 2000, 2020
    for year in communities[community]['years']:
        yr = year[2:4]
        # Create empty container to store outputs for in-core
        # Will use these to combine multiple counties
        hua_incore_county_df = {}
        print("Setting up Housing Unit Inventory for",communities[community]['community_name'])
        for county in communities[community]['counties'].keys():
            state_county = communities[community]['counties'][county]['FIPS Code']
            state_county_name  = communities[community]['counties'][county]['Name']
            state_caps = communities[community]['STATE']
            print(state_county_name,': county FIPS Code',state_county)
        
            outputfolders = directory_design(state_county_name = state_county_name,
                                                outputfolder = outputfolder)

            # Set up Census Block Data with PUMA and Place IDs
            output_folder = outputfolders['CommunitySourceData']


            census_block_place_puma_gdf = \
                obtain_join_block_place_puma_data(
                                county_fips = state_county,
                                state = state_caps,
                                year = year,
                                output_folder = output_folder,
                                replace = False)

            # Save map of Census Block Data
            # (named block_map to avoid shadowing the built-in map function)
            block_map = single_layer_folium_map(gdf = census_block_place_puma_gdf,
                                layer_name = 'Census Blocks 2010',
                                output_folder = output_folder)

            # Merge Building Inventory and Census Block Data
            join_column_list = [f'BLOCKID{yr}',f'BLOCKID{yr}_str',
                                f'placeGEOID{yr}',f'placeNAME{yr}']
            geolevel = 'block'
            
            # add representative point to buildings
            bldg_inv_gdf_point = add_representative_point(bldg_inv_gdf,year=year)
            building_to_block_gdf = spatial_join_points_to_poly(
                        points_gdf = bldg_inv_gdf_point,
                        polygon_gdf = census_block_place_puma_gdf,
                        point_var = f'rppnt{yr}4326',
                        poly_var = f'blk{yr}4326',
                        geolevel = geolevel,
                        join_column_list = join_column_list)

            # Define the residential archetypes used by the Address Point Algorithm
            residential_archetypes = { 
                1 : 'One-story sf residential building on a crawlspace foundation',
                2 : 'One-story mf residential building on a slab-on-grade foundation',
                3 : 'Two-story sf residential building on a crawlspace foundation',
                4 : 'Two-story mf residential building on a slab-on-grade foundation'}

            # Housing unit inventory needs the block string variable:
            # "B" followed by the block ID zero-padded to 15 digits
            housing_unit_inv_df[f'BLOCKID{yr}_str'] = \
                       housing_unit_inv_df['blockid'].\
                           apply(lambda x : "B"+str(int(x)).zfill(15))

            # Run Address Point Algorithm
            huesimate_df = predict_residential_addresspoints(
                            building_to_block_gdf = building_to_block_gdf,
                            hui_df = housing_unit_inv_df,
                            hui_blockid = f'BLOCKID{yr}_str',
                            bldg_blockid = 'blockBLOCKID10_str',
                            bldg_uniqueid = 'guid',
                            placename_var = 'blockplaceNAME10',
                            archetype_var = 'arch_flood',
                            residential_archetypes = residential_archetypes,
                            building_area_var = 'sq_foot',
                            building_area_cutoff = 300,
                            )

            # Check errors
            #huesimate_df['apcount'].groupby(huesimate_df['ErrorCheck1_int']).describe()
            #huesimate_df['apcount'].groupby(huesimate_df['ErrorCheck2_int']).describe()
            #huesimate_df['apcount'].groupby(huesimate_df['ErrorCheck3_int']).describe()

            # Add Single Family Dummy Variable
            condition1 = (huesimate_df["huestimate"] > 1)
            condition2 = ~(huesimate_df["guid"].isna())
            condition = condition1 & condition2
            huesimate_df.loc[condition,'d_sf'] = 0
            condition1 = (huesimate_df["huestimate"] == 1)
            condition = condition1 & condition2
            huesimate_df.loc[condition, 'd_sf'] = 1 
            #pd.crosstab(huesimate_df['huestimate'], 
            #    huesimate_df['d_sf'],
            #    margins=True, 
            #    margins_name="Total")

            # Identify Unincorporated Areas with Place Name
            # There are many address points that fall just outside of city limits 
            # in Unincorporated places.
            # For these areas use the county information to label 
            # the place names as the County Name.
            huesimate_df.loc[(huesimate_df['blockplaceNAME10'].isna()),
                        'blockplaceNAME10'] = f"Unincorporated {state_county_name}"

            ### Next Steps
            # Create an address point inventory that has address point for each
            # housing unit that does not have a building.
            # https://github.com/npr99/IN-CORE_notebooks/blob/main/IN_CORE_2dv2_Lumberton_AddressPointInventory.ipynb
            # https://github.com/npr99/IN-CORE_notebooks/blob/main/IN-CORE_1dv2_Joplin_EstimateAddressPoints_2019-07-11.ipynb
            # Would like to give each address point a unique lat lon
            ## important for areas without buildings

            #Save results for community name
            output_filename = f'huest_{version_text}_{community}_{basevintage}_{year}_{bldg_inv_id}'
            csv_filepath = outputfolders['top']+"/"+output_filename+'.csv'
            savefile = sys.path[0]+"/"+csv_filepath
            huesimate_df.to_csv(savefile, index=False)

            # Save second set of files in common directory
            common_directory = outputfolders['top']+"/../"+output_filename
            huesimate_df.to_csv(common_directory+'.csv', index=False)


Setting up Housing Unit Inventory for Galveston, TX
Galveston County, TX : county FIPS Code 48167
Checking if file exists: OutputData/GalvestonCounty_TX/01_CommunitySourceData/tl_2010_48167_tabblockplacepuma10EPSG4269.csv
File exists. Skipping. Use replace=True to overwrite.
Block data already exists for  48167
Converting blk104269 to Geodataframe
Polygon file has 9595 block polygons.
Identified 9584 block polygons to spatially join.
................................................................................................
51507 Buildings have Residential Archetype 1
68616 Buildings have Residential Archetype 2
12881 Buildings have Residential Archetype 3
28391 Buildings have Residential Archetype 4
15057 Buildings have building_area_var less than 300
0 Buildings have  sq_foot _by_AP less than 300
147360 Buildings assigned residential.
Total number of expected housing unit address points in county: 132553
8007 buildings are in blocks with no housing units.
For Round 1 Estimated Re

In [24]:
huesimate_df.head(1).T

Unnamed: 0,0
blockBLOCKID10_str,B481677201001000
guid,1815653a-7b70-44ce-8544-e975596bdf82
blockplaceNAME10,Friendswood
arch_flood,1.0
residential,1.0
apcount,229.0
bldgcount,1.0
huestimate,1.0
DiffCount3,-105.0
bldgcountv3_sum,334.0


In [25]:
# look for observations with blockid = B481677201001000
huesimate_df[huesimate_df['blockBLOCKID10_str'] == 'B481677201001000'].head(1).T

Unnamed: 0,0
blockBLOCKID10_str,B481677201001000
guid,1815653a-7b70-44ce-8544-e975596bdf82
blockplaceNAME10,Friendswood
arch_flood,1.0
residential,1.0
apcount,229.0
bldgcount,1.0
huestimate,1.0
DiffCount3,-105.0
bldgcountv3_sum,334.0


In [26]:
huesimate_df[['blockplaceNAME10','guid']].groupby(['blockplaceNAME10']).count()

Unnamed: 0_level_0,guid
blockplaceNAME10,Unnamed: 1_level_1
Bacliff,5145
Bayou Vista,1376
Bolivar Peninsula,5564
Clear Lake Shores,911
Dickinson,9416
Friendswood,12576
Galveston,28229
Hitchcock,5144
Jamaica Beach,1313
Kemah,1500


## Step 3: Expand Building Inventory to Address Points

In [27]:
# Confirm Primary Key is Unique and Non-Missing
huesimate_df.guid.describe()

count                                   172534
unique                                  172534
top       1815653a-7b70-44ce-8544-e975596bdf82
freq                                         1
Name: guid, dtype: object

### Keep primary columns
Only a few columns are needed to generate the address point inventory. The variable `huestimate` provides the estimated number of housing units in each building.

In [28]:
huesimate_df.head()

Unnamed: 0,blockBLOCKID10_str,guid,blockplaceNAME10,arch_flood,residential,apcount,bldgcount,huestimate,DiffCount3,bldgcountv3_sum,ErrorCheck1_int,ErrorCheck2_int,ErrorCheck3_int,d_sf
0,B481677201001000,1815653a-7b70-44ce-8544-e975596bdf82,Friendswood,1.0,1.0,229.0,1.0,1.0,-105.0,334.0,3.0,3.0,3.0,1.0
1,B481677201001000,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,Friendswood,1.0,1.0,229.0,1.0,1.0,-105.0,334.0,3.0,3.0,3.0,1.0
2,B481677201001000,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,Friendswood,1.0,1.0,229.0,1.0,1.0,-105.0,334.0,3.0,3.0,3.0,1.0
3,B481677201001000,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,Friendswood,3.0,1.0,229.0,1.0,1.0,-105.0,334.0,3.0,3.0,3.0,1.0
4,B481677201001000,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,Friendswood,3.0,1.0,229.0,1.0,1.0,-105.0,334.0,3.0,3.0,3.0,1.0
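
The `d_sf` column in the table above is the single-family dummy set in Step 2: 1 when a building has exactly one estimated housing unit, 0 when it has more than one, and missing otherwise. The sketch below reproduces that rule on toy data using `numpy.select`, which is a swapped-in alternative to the paired `.loc` assignments in the notebook; the toy rows are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "guid": ["a", "b", None, "d"],
    "huestimate": [1.0, 4.0, 1.0, 0.0],
})

has_bldg = toy["guid"].notna()
toy["d_sf"] = np.select(
    [has_bldg & (toy["huestimate"] == 1),   # single-family
     has_bldg & (toy["huestimate"] > 1)],   # multi-family
    [1.0, 0.0],
    default=np.nan,                         # no building or zero housing units
)
print(toy)
```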


In [29]:
select_cols = ['guid','blockBLOCKID10_str','huestimate']
huesimate_df_cols = huesimate_df[select_cols]
huesimate_df_cols.head()

Unnamed: 0,guid,blockBLOCKID10_str,huestimate
0,1815653a-7b70-44ce-8544-e975596bdf82,B481677201001000,1.0
1,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,B481677201001000,1.0
2,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,B481677201001000,1.0
3,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,B481677201001000,1.0
4,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,B481677201001000,1.0


In [30]:
census_block_place_puma_gdf.head(1).T

Unnamed: 0,0
Unnamed: 0,0
BLOCKID10,481677240001060
BLOCKID10_str,B481677240001060
STATEFP10,48
COUNTYFP10,167
TRACTCE10,724000
BLOCKCE10,1060
GEOID10,481677240001060
NAME10,Block 1060
MTFCC10,G5040


In [31]:
select_cols = ['BLOCKID10_str','BLOCKID10','geometry','rppnt104269']
census_blocks_df_cols = census_block_place_puma_gdf[select_cols]
census_blocks_df_cols.head()

Unnamed: 0,BLOCKID10_str,BLOCKID10,geometry,rppnt104269
0,B481677240001060,481677240001060,"POLYGON ((-94.85595 29.33600, -94.84383 29.335...",POINT (-94.84710991815908 29.3317505)
1,B481677219001117,481677219001117,"POLYGON ((-94.98080 29.41622, -94.98030 29.415...",POINT (-94.98065093828714 29.416070499999996)
2,B481677261001036,481677261001036,"POLYGON ((-94.95002 29.21186, -94.94954 29.211...",POINT (-94.94436799789915 29.212448)
3,B481677261001179,481677261001179,"POLYGON ((-94.94956 29.21201, -94.94953 29.212...",POINT (-94.94943020212766 29.211959)
4,B481677226001020,481677226001020,"POLYGON ((-94.96813 29.38813, -94.96771 29.388...",POINT (-94.96635637516354 29.3900445)


## Add Total Housing Units in Block

In [32]:
housing_unit_inv_df.head(1).T

Unnamed: 0,0
huid,B481677201001000H001
blockid,481677201001000
bgid,481677201001
tractid,48167720100
FIPScounty,48167
numprec,1
ownershp,1.0
race,1.0
hispan,0.0
family,0.0


In [33]:
# Count expected housing unit address points by block
hui_blockid = 'blockid'
bldg_blockid = 'BLOCKID10'
hua_block_counts = housing_unit_inv_df[[hui_blockid,'huid']].groupby(hui_blockid).agg('count')
hua_block_counts.reset_index(inplace = True)
hua_block_counts = hua_block_counts.\
    rename(columns={'huid': "tothupoints", hui_blockid : bldg_blockid })
# Sum tothupoints
hua_apcount = hua_block_counts['tothupoints'].sum()
print("Total number of expected housing unit address points in county:",hua_apcount)

# Merge address point counts by block with the census block data
# An outer join keeps blocks that have no housing units
census_blocks_df_cols = pd.merge(left = hua_block_counts,
                   right = census_blocks_df_cols,
                   on = bldg_blockid,
                   how = 'outer')

# Fill in missing tothupoints with 0 values
census_blocks_df_cols['tothupoints'] = census_blocks_df_cols['tothupoints'].fillna(value=0)

Total number of expected housing unit address points in county: 132553
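
As a check on the counting and merge logic, the same steps can be run on a toy example where the answer is easy to verify by hand. This is a sketch with made-up IDs, not output from the Galveston data.

```python
import pandas as pd

# Toy housing unit inventory: one row per housing unit
toy_hui = pd.DataFrame({
    "huid": ["B1H001", "B1H002", "B2H001"],
    "blockid": [1, 1, 2],
})
# Toy block list: block 3 has no housing units
toy_blocks = pd.DataFrame({"BLOCKID10": [1, 2, 3]})

# Count housing units per block and rename to match the block table
counts = (toy_hui.groupby("blockid")["huid"].count()
          .rename("tothupoints").reset_index()
          .rename(columns={"blockid": "BLOCKID10"}))

# Outer join keeps block 3, whose missing count becomes 0
merged = toy_blocks.merge(counts, on="BLOCKID10", how="outer")
merged["tothupoints"] = merged["tothupoints"].fillna(0).astype(int)
print(merged)
```

The sum of `tothupoints` should always equal the number of rows in the housing unit inventory, which is the check the notebook prints above.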


In [34]:
census_blocks_df_cols.head(1).T

Unnamed: 0,0
BLOCKID10,481677201001000
tothupoints,229.0
BLOCKID10_str,B481677201001000
geometry,"POLYGON ((-95.217556 29.549383, -95.2171889999..."
rppnt104269,POINT (-95.2110716827504 29.554252499999997)


### Prepare Building Inventory to Expand Based on Housing Unit Estimate
For the address point inventory to work there needs to be one observation for each possible housing unit. This means that for buildings that have multiple housing units there will be one address point for each housing unit.

For places that have people but no buildings in the inventory, the address point inventory will provide details on housing units impacted outside of the study area.
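
The expansion described above can be sketched on a toy table: each building row is repeated once per estimated housing unit, and buildings with zero or missing estimates are kept as a single row. The column names follow the notebook, the data are made up, and `.loc` with a repeated index is used here as an equivalent of the `reindex` call in the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "guid": ["a", "b", "c"],
    "huestimate": [0.0, 1.0, 3.0],
})

# Keep buildings with zero (or missing) estimates by repeating them once
toy["expandvar"] = toy["huestimate"].where(toy["huestimate"] > 0, 1).astype(int)

# Repeat each index label expandvar times, then select those rows
expanded = toy.loc[toy.index.repeat(toy["expandvar"])].reset_index(drop=True)
print(expanded)
```

After the expansion, building `c` appears three times, so the number of rows equals the sum of `expandvar`.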

In [35]:
huesimate_df_cols.huestimate.describe()

count    172507.000000
mean          1.049250
std           2.031178
min           0.000000
25%           1.000000
50%           1.000000
75%           1.000000
max         295.000000
Name: huestimate, dtype: float64

In [36]:
# If huestimate is used directly to expand the dataset, buildings with zero estimated housing units will be lost.
# To keep all buildings add an expand variable
# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
huesimate_df_cols = huesimate_df_cols.copy()
huesimate_df_cols.loc[(huesimate_df_cols['huestimate']==0),'expandvar'] = 1
huesimate_df_cols.loc[(huesimate_df_cols['huestimate']>0),'expandvar'] = huesimate_df_cols['huestimate']
# Check to make sure expand variable was generated correctly
pd.crosstab(huesimate_df_cols['expandvar'].loc[huesimate_df_cols['expandvar']<=3],
            huesimate_df_cols['huestimate'], margins=True, margins_name="Total")

huestimate,0.0,1.0,2.0,3.0,Total
expandvar,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1.0,29082,124997,0,0,154079
2.0,0,0,14783,0,14783
3.0,0,0,0,1705,1705
Total,29082,124997,14783,1705,170567


## Expand GUID List
Use the expand variable to expand the building inventory.

In [37]:
huesimate_df_cols['expandvar'].describe()

count    172507.000000
mean          1.217835
std           1.977896
min           1.000000
25%           1.000000
50%           1.000000
75%           1.000000
max         295.000000
Name: expandvar, dtype: float64

In [38]:
huesimate_df_cols.loc[(huesimate_df_cols['expandvar']<0)]

Unnamed: 0,guid,blockBLOCKID10_str,huestimate,expandvar


In [39]:
huesimate_df_cols.loc[(huesimate_df_cols.expandvar.isna())]


Unnamed: 0,guid,blockBLOCKID10_str,huestimate,expandvar
569,,B481677201001008,,
596,,B481677201001011,,
4554,,B481677202002022,,
11370,,B481677204001028,,
15078,,B481677205022000,,
...,...,...,...,...
164663,f0245f4c-e2f7-4e79-88a9-47cb9a1712f8,,,
164664,d00bed35-370f-433b-bf05-b944d775cae2,,,
164665,aeba38f0-2848-438e-bdf4-0d0b075485a9,,,
164666,663cfc4d-6c00-4017-91f2-13cf3dcbc74f,,,


In [40]:
# Rows with a missing expand variable are kept as a single observation
huesimate_df_cols.loc[huesimate_df_cols.expandvar.isna(), 'expandvar'] = 1
huesimate_df_cols.loc[(huesimate_df_cols.expandvar.isna())]

Unnamed: 0,guid,blockBLOCKID10_str,huestimate,expandvar


In [41]:
# The address point inventory is the expanded housing unit estimate dataframe
# Repeat each row expandvar times with Index.repeat(); cast to int so the repeat counts are explicit
huesimate_df_cols_expand = huesimate_df_cols.reindex(
    huesimate_df_cols.index.repeat(huesimate_df_cols['expandvar'].astype(int)))

In [42]:
huesimate_df_cols_expand.guid.describe()

count                                   210112
unique                                  172534
top       6396008f-530a-481c-9757-93f7d58391f9
freq                                       295
Name: guid, dtype: object

In [43]:
huesimate_df_cols_expand.head()

Unnamed: 0,guid,blockBLOCKID10_str,huestimate,expandvar
0,1815653a-7b70-44ce-8544-e975596bdf82,B481677201001000,1.0,1.0
1,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,B481677201001000,1.0,1.0
2,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,B481677201001000,1.0,1.0
3,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,B481677201001000,1.0,1.0
4,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,B481677201001000,1.0,1.0


## Expand Residential Address Point Count File
Use the address point count variable (`tothupoints`) to expand the residential address point count file.

In [44]:
census_blocks_df_cols['BLOCKID10'].describe()

count    9.595000e+03
mean     4.816772e+14
std      1.525203e+08
min      4.816772e+14
25%      4.816772e+14
50%      4.816772e+14
75%      4.816772e+14
max      4.816799e+14
Name: BLOCKID10, dtype: float64

In [45]:
census_blocks_df_cols['tothupoints'].describe()

count    9595.000000
mean       13.814799
std        34.063595
min         0.000000
25%         0.000000
50%         5.000000
75%        14.000000
max       923.000000
Name: tothupoints, dtype: float64

In [46]:
# The expand variable cannot have missing values
census_blocks_df_cols.loc[(census_blocks_df_cols['tothupoints'].isna()),'expandvar'] = 0
census_blocks_df_cols.loc[(census_blocks_df_cols['tothupoints']>=0),'expandvar'] = census_blocks_df_cols['tothupoints']
# Check to make sure expand variable was generated correctly
census_blocks_df_cols['expandvar'].describe()

count    9595.000000
mean       13.814799
std        34.063595
min         0.000000
25%         0.000000
50%         5.000000
75%        14.000000
max       923.000000
Name: expandvar, dtype: float64

In [47]:
census_blocks_df_cols.loc[(census_blocks_df_cols.expandvar.isna())]

Unnamed: 0,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar


In [48]:
# Expand data using repeat method
census_blocks_df_cols_expand = census_blocks_df_cols.reindex(
    census_blocks_df_cols.index.repeat(census_blocks_df_cols['expandvar']))

In [49]:
census_blocks_df_cols_expand.BLOCKID10.describe()

count    1.325530e+05
mean     4.816772e+14
std      1.862878e+07
min      4.816772e+14
25%      4.816772e+14
50%      4.816772e+14
75%      4.816772e+14
max      4.816773e+14
Name: BLOCKID10, dtype: float64

In [50]:
census_blocks_df_cols_expand.head()

Unnamed: 0,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0


## Merge Two Address Point Files
Combining the address points based on the building inventory with the address points based on the 2010 Census will create one file that has address points for the entire county.

The combined file will show where the building inventory may be missing information within the study community. The combined file will also help to show the populations impacted both inside the study community and in neighboring areas.

To merge the two files, a counter by block ID needs to be added to each file.
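
The counter-based merge can be sketched on toy data as follows. This is an illustration under stated assumptions rather than the notebook's own merge cell: the shared key name `blockid_str` and the use of `indicator=True` are assumptions, and the toy rows are made up.

```python
import pandas as pd

# Expanded building-based address points: 2 in block B1, none in B2
bldg_ap = pd.DataFrame({
    "guid": ["a", "b"],
    "blockid_str": ["B1", "B1"],
})
# Expanded census-based address points: 3 expected in B1, 1 in B2
census_ap = pd.DataFrame({
    "blockid_str": ["B1", "B1", "B1", "B2"],
})

# Number the address points within each block: 0, 1, 2, ...
for df in (bldg_ap, census_ap):
    df["blockidcounter"] = df.groupby("blockid_str").cumcount()

# Full outer join on block ID and counter keeps observations from both files
combined = bldg_ap.merge(census_ap, on=["blockid_str", "blockidcounter"],
                         how="outer", indicator=True)
print(combined)
```

Rows flagged `right_only` are expected housing units with no matching building, which is where the building inventory may be missing information; rows flagged `left_only` would be building-based address points beyond the census count for that block.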

In [51]:
# Add counter by block id - use cumulative count method
census_blocks_df_cols_expand['blockidcounter'] = census_blocks_df_cols_expand.groupby('BLOCKID10').cumcount()

In [52]:
census_blocks_df_cols_expand.head()

Unnamed: 0,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar,blockidcounter
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,0
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,1
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,2
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,3
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,4


In [53]:
# Add counter by block id - use cumulative count method
huesimate_df_cols_expand['blockidcounter'] = huesimate_df_cols_expand.groupby('blockBLOCKID10_str').cumcount()
huesimate_df_cols_expand.head()

Unnamed: 0,guid,blockBLOCKID10_str,huestimate,expandvar,blockidcounter
0,1815653a-7b70-44ce-8544-e975596bdf82,B481677201001000,1.0,1.0,0
1,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,B481677201001000,1.0,1.0,1
2,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,B481677201001000,1.0,1.0,2
3,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,B481677201001000,1.0,1.0,3
4,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,B481677201001000,1.0,1.0,4


In [54]:
census_blocks_df_cols_expand.head()

Unnamed: 0,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar,blockidcounter
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,0
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,1
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,2
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,3
0,481677201001000,229.0,B481677201001000,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),229.0,4


In [55]:
# Merge 2 files based on blockid and blockid counter - keep all observations from both files with full outer join
address_point_inventory = pd.merge(left = huesimate_df_cols_expand, 
                                  right = census_blocks_df_cols_expand,
                                  left_on=['blockBLOCKID10_str','blockidcounter'], 
                                  right_on=['BLOCKID10_str','blockidcounter'], how='outer')
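
The counter-plus-outer-join pattern can be checked on a toy example. The sketch below is illustrative only: the frames `bldg` and `census` and their values are made up, and `indicator=True` is an optional pandas argument that this workflow does not use.

```python
import pandas as pd

# Toy data (hypothetical): block B1 has 2 building-based points but 3 census-based points
bldg = pd.DataFrame({"guid": ["a", "b"], "block": ["B1", "B1"]})
census = pd.DataFrame({"block": ["B1", "B1", "B1"], "tothupoints": [3, 3, 3]})

# Number the rows within each block: 0, 1, 2, ...
bldg["blockidcounter"] = bldg.groupby("block").cumcount()
census["blockidcounter"] = census.groupby("block").cumcount()

# Full outer join on block and counter; _merge records which side each row came from
merged = pd.merge(bldg, census, on=["block", "blockidcounter"],
                  how="outer", indicator=True)
print(merged[["guid", "block", "blockidcounter", "_merge"]])
```

Rows marked `right_only` are census address points with no matching building, which is the case the next cell looks for with `guid.isna()`.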

In [56]:
# Check merge - examples where Building Id is missing
displaycols = ['guid','BLOCKID10']
condition = address_point_inventory['guid'].isna()
address_point_inventory[displaycols].loc[condition].head()

Unnamed: 0,guid,BLOCKID10
569,,481677200000000.0
596,,481677200000000.0
4908,,481677200000000.0
12415,,481677200000000.0
16523,,481677200000000.0


In [57]:
# Check merge - examples where there is no census data
displaycols = ['guid','BLOCKID10']
condition = address_point_inventory['tothupoints'].isnull()
address_point_inventory[displaycols].loc[condition].head()

Unnamed: 0,guid,BLOCKID10
229,b83de3c7-c3f9-4080-9a43-21d363c57037,
230,6d5cb41e-aaea-4f52-b438-d93b6e9eefc8,
231,4c5db572-6765-4256-840a-315e428d4a0e,
232,32b40363-687b-46c7-94c1-2d4c98fa42f7,
233,5b4216ce-4f5c-43a8-9ed3-ba554108de47,


# Fix issue with missing blockid vs BLOCKID10

In [58]:
address_point_inventory.loc[address_point_inventory.guid == "b83de3c7-c3f9-4080-9a43-21d363c57037"]

Unnamed: 0,guid,blockBLOCKID10_str,huestimate,expandvar_x,blockidcounter,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar_y
229,b83de3c7-c3f9-4080-9a43-21d363c57037,B481677201001000,1.0,1.0,229,,,,,,


In [59]:
address_point_inventory.loc[address_point_inventory.BLOCKID10_str.isna(),
                                                                'BLOCKID10_str'] = address_point_inventory['blockBLOCKID10_str']

In [60]:
cols = [col for col in address_point_inventory]
cols

['guid',
 'blockBLOCKID10_str',
 'huestimate',
 'expandvar_x',
 'blockidcounter',
 'BLOCKID10',
 'tothupoints',
 'BLOCKID10_str',
 'geometry',
 'rppnt104269',
 'expandvar_y']

#### The Address Point ID is based on the building id first then the block id
In the best case every address point is connected to a building. Where the building ID is missing, the address point ID is based on the Census Block ID instead.

In [61]:
address_point_inventory.loc[(address_point_inventory['guid'].isna()),
                            'strctid'] = address_point_inventory.apply(lambda x: "C"+ str(x['BLOCKID10_str']).zfill(36), axis=1)
address_point_inventory.loc[(address_point_inventory['guid'].notna()),
                            'strctid'] = address_point_inventory.apply(lambda x: "ST"+ str(x['guid']).zfill(36), axis=1)
# Preview the structure ID (strctid)
address_point_inventory[['strctid']].head(10)

Unnamed: 0,strctid
0,ST1815653a-7b70-44ce-8544-e975596bdf82
1,ST34d0d761-3d71-4b0f-bc7b-0ddccd509a25
2,ST68500aae-0ff1-4332-b9c2-1b3c5eeda70d
3,ST21151b3e-c5b5-4a8d-9e9d-13c77b203a51
4,ST5691e976-f19c-4bdb-9f9c-9b5da7d8a446
5,ST84b6b57a-3150-4d32-8bad-8c74771afb44
6,STe09242b5-4b90-4d27-94eb-ccac84becbe5
7,ST0bbc716a-4439-4ccc-9de0-a906ebd59d7a
8,STd9768ff3-aaeb-4c97-85cf-866a97f31707
9,STba114268-7dea-4a50-b609-779203296136


In [62]:
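# Sort address points by the first part of the address point ID
# (sort_values returns a new DataFrame; the result is not assigned here, so row order is unchanged)
address_point_inventory.sort_values(by=['strctid'])
# Add counter by building or block (cumcount does not require sorted input)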
address_point_inventory['apcounter'] = address_point_inventory.groupby('strctid').cumcount()

# Are there any examples where the address point counter does not equal the block ID counter?
displaycols = ['guid','BLOCKID10_str','tothupoints','blockidcounter','apcounter']
condition = address_point_inventory['blockidcounter']!=address_point_inventory['apcounter']
address_point_inventory[displaycols].loc[condition].head()

Unnamed: 0,guid,BLOCKID10_str,tothupoints,blockidcounter,apcounter
1,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,B481677201001000,229.0,1,0
2,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,B481677201001000,229.0,2,0
3,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,B481677201001000,229.0,3,0
4,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,B481677201001000,229.0,4,0
5,84b6b57a-3150-4d32-8bad-8c74771afb44,B481677201001000,229.0,5,0


In [63]:
address_point_inventory.strctid.describe()

count                                     210678
unique                                    172648
top       ST6396008f-530a-481c-9757-93f7d58391f9
freq                                         295
Name: strctid, dtype: object

A unique ID for each address point requires a combination of values. The first part of the address point ID is either the building ID or the block ID. Within each building or Census block, the counter variable distinguishes the individual address points.
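
As a minimal sketch of this idea, the example below builds the combined ID with vectorized string operations instead of a row-wise `apply`. The toy IDs are made up; the column names follow the ones used in this notebook.

```python
import pandas as pd

# Toy data (hypothetical): one building with two address points, one block-based point
df = pd.DataFrame({
    "strctid": ["ST" + "a" * 36, "ST" + "a" * 36, "C" + "B481677201001000".zfill(36)],
})

# Counter within each structure or block ID
df["apcounter"] = df.groupby("strctid").cumcount()

# Structure ID + "AP" + zero-padded counter
df["addrptid"] = df["strctid"] + "AP" + df["apcounter"].astype(str).str.zfill(6)

# The combined key should be unique even though strctid repeats
print(df["strctid"].is_unique, df["addrptid"].is_unique)
```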

In [64]:
address_point_inventory['addrptid'] = address_point_inventory.apply(lambda x: x['strctid'] + "AP" +
                                                                 str(int(x['apcounter'])).zfill(6), axis=1)
# Move Primary Key Column to first Column
cols = ['addrptid']  + [col for col in address_point_inventory if col != 'addrptid']
address_point_inventory = address_point_inventory[cols]
address_point_inventory[['addrptid','BLOCKID10_str', 'apcounter']].head(6)

Unnamed: 0,addrptid,BLOCKID10_str,apcounter
0,ST1815653a-7b70-44ce-8544-e975596bdf82AP000000,B481677201001000,0
1,ST34d0d761-3d71-4b0f-bc7b-0ddccd509a25AP000000,B481677201001000,0
2,ST68500aae-0ff1-4332-b9c2-1b3c5eeda70dAP000000,B481677201001000,0
3,ST21151b3e-c5b5-4a8d-9e9d-13c77b203a51AP000000,B481677201001000,0
4,ST5691e976-f19c-4bdb-9f9c-9b5da7d8a446AP000000,B481677201001000,0
5,ST84b6b57a-3150-4d32-8bad-8c74771afb44AP000000,B481677201001000,0


In [65]:
# Confirm Primary Key is Unique and Non-Missing
address_point_inventory.addrptid.describe()

count                                             210678
unique                                            210678
top       ST1815653a-7b70-44ce-8544-e975596bdf82AP000000
freq                                                   1
Name: addrptid, dtype: object

#### Generate Flag Variables
For the merged dataset identify cases where either building or census data is missing.
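
The flag logic can also be written as one expression. This is a sketch under assumptions, not the workflow's method: it uses `numpy.select` on made-up data, with the same flag codes (0 to 3) as the cell that follows.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): None marks a missing value
df = pd.DataFrame({
    "guid": ["a", "b", None, "c"],
    "tothupoints": [10.0, None, 5.0, None],
    "BLOCKID10_str": ["B1", "B1", "B2", None],
})

# np.select takes the first condition that matches, so the conditions are listed
# in reverse order of sequential .loc assignments (where later assignments win)
conditions = [
    df["BLOCKID10_str"].isna(),   # 3: no block ID at all
    df["guid"].isna(),            # 2: no building ID
    df["tothupoints"].isna(),     # 1: no census count
]
df["flag_ap"] = np.select(conditions, [3, 2, 1], default=0)
print(df)
```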

In [66]:
# Create Address Point Flag Variable
address_point_inventory['flag_ap'] = 0
address_point_inventory.loc[(address_point_inventory['tothupoints'].isnull()),'flag_ap'] = 1
address_point_inventory.loc[(address_point_inventory['guid'].isna()),'flag_ap'] = 2
address_point_inventory.loc[(address_point_inventory['BLOCKID10_str'].isnull()),'flag_ap'] = 3
# Check the number of observations in each flag category
address_point_inventory.groupby(['flag_ap']).count()

Unnamed: 0_level_0,addrptid,guid,blockBLOCKID10_str,huestimate,expandvar_x,blockidcounter,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar_y,strctid,apcounter
flag_ap,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
0,131987,131987,131987,131987,131987,131987,131987,131987,131987,131987,131987,131987,131987,131987
1,78098,78098,78098,78098,78098,78098,0,0,78098,0,0,0,78098,78098
2,566,0,114,0,114,566,566,566,566,566,566,566,566,566
3,27,27,0,0,27,27,0,0,0,0,0,0,27,27


## Identify observations that represent the primary building
In some future analyses it may be of interest to run cross tabulations on just the buildings, instead of all of the address points. Buildings can be identified with the address point counter (apcounter) and the address point flag (flag_ap). If the counter is 0 and the flag is 0 or 1 then the address point observation is the first address point in a building.
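
A small sketch of how such an indicator could be used to count buildings. The data are made up for illustration; the column names match the notebook.

```python
import pandas as pd

# Toy data (hypothetical): building "a" has two address points, "b" has one,
# and the last row is a census-only point (flag_ap == 2) with no building
df = pd.DataFrame({
    "strctid": ["STa", "STa", "STb", "CB1"],
    "apcounter": [0, 1, 0, 0],
    "flag_ap": [0, 0, 1, 2],
})

# 1 for the first address point of a building, 0 otherwise
df["bldgobs"] = ((df["apcounter"] == 0) & (df["flag_ap"] <= 1)).astype(int)

# Summing the indicator counts buildings rather than address points
n_buildings = int(df["bldgobs"].sum())
print(n_buildings)
```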

In [67]:
# create a binary variable 0 - not the primary building observation, 1 - use to count buildings
address_point_inventory['bldgobs'] = 0
# If the ap count is 0 and the flag is 0 or 1 then the bldgobs should be 1
address_point_inventory.loc[(address_point_inventory['apcounter'] == 0) &
                            (address_point_inventory['flag_ap'] <= 1), 'bldgobs'] = 1
# Check new variable
address_point_inventory.groupby(['bldgobs']).count()

Unnamed: 0_level_0,addrptid,guid,blockBLOCKID10_str,huestimate,expandvar_x,blockidcounter,BLOCKID10,tothupoints,BLOCKID10_str,geometry,rppnt104269,expandvar_y,strctid,apcounter,flag_ap
bldgobs,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
0,38171,37605,37692,37578,37719,38171,29574,29574,38144,29574,29574,29574,38171,38171,38171
1,172507,172507,172507,172507,172507,172507,102979,102979,172507,102979,102979,102979,172507,172507,172507


## Set Geometry for Address Points
The location of the address point will be important for identifying the hazard impact. There are two options for the address point location.

- If there is a building representative point, use the building representative point.
- If the building data is missing, use the representative point from the census block.
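
The fallback rule can be sketched on plain WKT strings with `Series.fillna`. The coordinates below are made up for illustration, and this is only one possible way to express the rule.

```python
import pandas as pd

# Toy data (hypothetical WKT strings): the second row has no building point
df = pd.DataFrame({
    "building_geometry": ["POINT (-95.21749 29.54973)", None],
    "rppnt104269": ["POINT (-95.21107 29.55425)", "POINT (-95.21107 29.55425)"],
})

# Prefer the building point; fall back to the block representative point
df["geometry"] = df["building_geometry"].fillna(df["rppnt104269"])
print(df["geometry"].tolist())
```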

In [68]:
address_point_inventory[['guid','geometry','rppnt104269']].head()


Unnamed: 0,guid,geometry,rppnt104269
0,1815653a-7b70-44ce-8544-e975596bdf82,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
1,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
2,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
3,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
4,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)


In [69]:
bldg_inv_gdf[['guid','geometry']].head()

Unnamed: 0,guid,geometry
0,1815653a-7b70-44ce-8544-e975596bdf82,POINT (-95.21749 29.54973)
1,df63f574-8e9b-426b-aa3b-b3757cb699b5,POINT (-95.21780 29.54901)
2,a743ae24-4209-44e2-b11e-7a872f071ae9,POINT (-95.21771 29.54766)
3,59ed0339-c8e3-4fcd-9b5a-c1487b035d3b,POINT (-95.21894 29.54202)
4,5cc8a749-21ca-4073-8626-4ae7332cc0dd,POINT (-95.21891 29.54143)


In [70]:
# Merge guid and geometry from building inventory to address point inventory
address_point_inventory_geo = pd.merge(left = address_point_inventory,
                                    right = bldg_inv_gdf[['guid','geometry']],
                                    left_on=['guid'],
                                    right_on=['guid'], 
                                    how='left')
# Rename geometry column to block geometry
address_point_inventory_geo.rename(columns={'geometry_x':'block10_geometry'}, inplace=True)

# Rename geometry column to building geometry
address_point_inventory_geo.rename(columns={'geometry_y':'building_geometry'}, inplace=True)

address_point_inventory_geo[['guid','building_geometry','block10_geometry','rppnt104269']].head()


Unnamed: 0,guid,building_geometry,block10_geometry,rppnt104269
0,1815653a-7b70-44ce-8544-e975596bdf82,POINT (-95.21749 29.54973),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
1,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,POINT (-95.20944 29.55829),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
2,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,POINT (-95.20865 29.55603),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
3,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,POINT (-95.20867 29.55600),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)
4,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,POINT (-95.21040 29.55450),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997)


### Identify Residential Address Points
An address point is likely to be residential if it has an estimate for the number of housing units, or if the building data is missing.

The knowledge that an address point is residential will help prioritize the allocation of housing units to address points.

For address points in buildings with more than one housing unit, the number of housing units also provides a way to prioritize renters and owners, with renters more likely to be allocated to buildings with greater numbers of housing units.
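
The residential rule can be expressed as a single boolean condition. The sketch below uses made-up values; note that a missing `huestimate` compares as False, so only the missing building ID makes the last row residential.

```python
import pandas as pd

# Toy data (hypothetical): building with 2 units, building with 0 units, census-only point
df = pd.DataFrame({
    "guid": ["a", "b", None],
    "huestimate": [2.0, 0.0, None],
})

# Residential if the building ID is missing or the housing unit estimate is positive
df["residential"] = (df["guid"].isna() | (df["huestimate"] > 0)).astype(int)
print(df)
```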

In [71]:

address_point_inventory_geo['residential'] = 0
# If the building id is missing then the address point is residential
address_point_inventory_geo.loc[(address_point_inventory_geo['guid'].isna()),'residential'] = 1
# If the housing unit estimate (huestimate) is greater than 0 then the address point is residential
address_point_inventory_geo.loc[(address_point_inventory_geo['huestimate']>0),'residential'] = 1
# Check new variable
address_point_inventory_geo[['flag_ap','residential']].groupby(['flag_ap']).sum()

Unnamed: 0_level_0,residential
flag_ap,Unnamed: 1_level_1
0,120531
1,60472
2,566
3,0


## Keep primary columns
The address point county file has many columns but only a few are needed to generate the address point inventory.

In [72]:
## Create block id variable from substring of BLOCKID10_str
address_point_inventory_geo['blockid'] = address_point_inventory_geo['BLOCKID10_str'].str[1:16]
address_point_inventory_geo[['blockid','BLOCKID10_str']].head()

Unnamed: 0,blockid,BLOCKID10_str
0,481677201001000,B481677201001000
1,481677201001000,B481677201001000
2,481677201001000,B481677201001000
3,481677201001000,B481677201001000
4,481677201001000,B481677201001000


In [73]:
select_cols = ['addrptid','strctid','guid','blockid','BLOCKID10_str','building_geometry','block10_geometry','rppnt104269',
    'huestimate','residential','bldgobs','flag_ap']
address_point_inventory_cols = address_point_inventory_geo[select_cols]
address_point_inventory_cols.head()

Unnamed: 0,addrptid,strctid,guid,blockid,BLOCKID10_str,building_geometry,block10_geometry,rppnt104269,huestimate,residential,bldgobs,flag_ap
0,ST1815653a-7b70-44ce-8544-e975596bdf82AP000000,ST1815653a-7b70-44ce-8544-e975596bdf82,1815653a-7b70-44ce-8544-e975596bdf82,481677201001000,B481677201001000,POINT (-95.21749 29.54973),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),1.0,1,1,0
1,ST34d0d761-3d71-4b0f-bc7b-0ddccd509a25AP000000,ST34d0d761-3d71-4b0f-bc7b-0ddccd509a25,34d0d761-3d71-4b0f-bc7b-0ddccd509a25,481677201001000,B481677201001000,POINT (-95.20944 29.55829),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),1.0,1,1,0
2,ST68500aae-0ff1-4332-b9c2-1b3c5eeda70dAP000000,ST68500aae-0ff1-4332-b9c2-1b3c5eeda70d,68500aae-0ff1-4332-b9c2-1b3c5eeda70d,481677201001000,B481677201001000,POINT (-95.20865 29.55603),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),1.0,1,1,0
3,ST21151b3e-c5b5-4a8d-9e9d-13c77b203a51AP000000,ST21151b3e-c5b5-4a8d-9e9d-13c77b203a51,21151b3e-c5b5-4a8d-9e9d-13c77b203a51,481677201001000,B481677201001000,POINT (-95.20867 29.55600),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),1.0,1,1,0
4,ST5691e976-f19c-4bdb-9f9c-9b5da7d8a446AP000000,ST5691e976-f19c-4bdb-9f9c-9b5da7d8a446,5691e976-f19c-4bdb-9f9c-9b5da7d8a446,481677201001000,B481677201001000,POINT (-95.21040 29.55450),"POLYGON ((-95.21756 29.54938, -95.21719 29.551...",POINT (-95.2110716827504 29.554252499999997),1.0,1,1,0


### Merge Address Point inventory with Building and Census Data
To analyze the impact of the hazard the address point inventory needs to include building information and census place information. The building information will include building type, year built, and appraised values (when available). The Census information will include city name and count information.
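
Because several address points can share one building, these merges are many-to-one. As a hedged sketch (the `validate` argument is a pandas option that this workflow does not use, and the data are made up), the merge itself can enforce that the right-hand key is unique:

```python
import pandas as pd

# Toy data (hypothetical): two address points share building "a"
addpt = pd.DataFrame({"addrptid": ["STaAP000000", "STaAP000001", "STbAP000000"],
                      "guid": ["a", "a", "b"]})
bldg = pd.DataFrame({"guid": ["a", "b"], "arch_flood": [1, 2]})

# validate="many_to_one" raises pandas.errors.MergeError if guid repeats in bldg,
# so the left merge cannot silently duplicate address points
merged = pd.merge(addpt, bldg, on="guid", how="left", validate="many_to_one")
print(len(addpt), len(merged))
```

This complements the `describe()` checks on the primary keys in the cells that follow.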

In [74]:
# Keep columns for merge
merge_cols = ['guid','arch_flood']
building_df_merge_cols = bldg_inv_gdf[merge_cols]
building_df_merge_cols.head()

Unnamed: 0,guid,arch_flood
0,1815653a-7b70-44ce-8544-e975596bdf82,1
1,df63f574-8e9b-426b-aa3b-b3757cb699b5,1
2,a743ae24-4209-44e2-b11e-7a872f071ae9,1
3,59ed0339-c8e3-4fcd-9b5a-c1487b035d3b,1
4,5cc8a749-21ca-4073-8626-4ae7332cc0dd,1


In [75]:
# Confirm Primary Key is Unique and Non-Missing
building_df_merge_cols.guid.describe()

count                                   172534
unique                                  172534
top       1815653a-7b70-44ce-8544-e975596bdf82
freq                                         1
Name: guid, dtype: object

In [76]:
# merge selected columns from building inventory to address point inventory
address_point_inventory_cols_bldg = pd.merge(address_point_inventory_cols, building_df_merge_cols,
                                  left_on='guid', right_on='guid', how='left')

Merge Select Columns from Census Block Data

In [77]:
census_block_place_puma_gdf.head(1).T

Unnamed: 0,0
Unnamed: 0,0
BLOCKID10,481677240001060
BLOCKID10_str,B481677240001060
STATEFP10,48
COUNTYFP10,167
TRACTCE10,724000
BLOCKCE10,1060
GEOID10,481677240001060
NAME10,Block 1060
MTFCC10,G5040


In [78]:
# For the merge only need a select number of columns
merge_cols = ['BLOCKID10_str','placeGEOID10','placeNAME10','COUNTYFP10']
census_blocks_df_merge_cols = census_block_place_puma_gdf[merge_cols]
census_blocks_df_merge_cols.head()

Unnamed: 0,BLOCKID10_str,placeGEOID10,placeNAME10,COUNTYFP10
0,B481677240001060,4828068.0,Galveston,167
1,B481677219001117,4872392.0,Texas City,167
2,B481677261001036,4828068.0,Galveston,167
3,B481677261001179,4828068.0,Galveston,167
4,B481677226001020,4872392.0,Texas City,167


In [79]:
# Confirm Primary Key is Unique and Non-Missing
census_blocks_df_merge_cols.BLOCKID10_str.describe()

count                 9595
unique                9595
top       B481677240001060
freq                     1
Name: BLOCKID10_str, dtype: object

In [80]:
# merge selected columns from census block data to address point inventory
address_point_inventory_cols_bldg_block = pd.merge(address_point_inventory_cols_bldg, 
                                  census_blocks_df_merge_cols,
                                  left_on='BLOCKID10_str', right_on='BLOCKID10_str', how='left')

In [81]:
address_point_inventory_cols_bldg_block[['placeNAME10','guid']].groupby(['placeNAME10']).count()

Unnamed: 0_level_0,guid
placeNAME10,Unnamed: 1_level_1
Bacliff,5508
Bayou Vista,1380
Bolivar Peninsula,5844
Clear Lake Shores,1053
Dickinson,11507
Friendswood,13738
Galveston,44166
Hitchcock,5793
Jamaica Beach,1651
Kemah,1575


### Identify Unincorporated Areas with Place Name
There are many address points that fall just outside of city limits in unincorporated places. For these areas use the county information to label the place names as the County Name.
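
A minimal sketch of this labeling rule on made-up rows. The variables `county_fips` and `county_label` are names introduced here for illustration only.

```python
import pandas as pd

# Toy data (hypothetical): the second row is in the county but outside any place,
# the third row has no county match at all
df = pd.DataFrame({
    "placeNAME10": ["Galveston", None, None],
    "COUNTYFP10": [167.0, 167.0, None],
})

county_fips = 167
county_label = "Unincorporated Galveston County"

# Label only rows that are inside the county and have no place name
in_county_no_place = df["placeNAME10"].isna() & (df["COUNTYFP10"] == county_fips)
df.loc[in_county_no_place, "placeNAME10"] = county_label
print(df)
```

Rows with a missing county code are left unlabeled, so they can still be located later.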

In [3]:
state_county = '48167'
fips = int(str(state_county)[2:5])
print(fips)

167


In [82]:
address_point_inventory_cols_bldg_block.loc[(address_point_inventory_cols_bldg_block['placeNAME10'].isna()) &
                                            (address_point_inventory_cols_bldg_block['COUNTYFP10'] == 167),
                                             'placeNAME10'] = "Unincorporated Galveston County"
# Check new variable
pd.crosstab(address_point_inventory_cols_bldg_block['placeNAME10'], 
            address_point_inventory_cols_bldg_block['COUNTYFP10'], margins=True, margins_name="Total")

COUNTYFP10,167.0,Total
placeNAME10,Unnamed: 1_level_1,Unnamed: 2_level_1
Bacliff,5518,5518
Bayou Vista,1380,1380
Bolivar Peninsula,5860,5860
Clear Lake Shores,1053,1053
Dickinson,11516,11516
Friendswood,13747,13747
Galveston,44435,44435
Hitchcock,5810,5810
Jamaica Beach,1651,1651
Kemah,1577,1577


### Save Work as CSV
A CSV file with the Well Known Text (WKT) geometry provides flexibility for saving and working with files.

In [83]:
# Move Foreign Key Columns Block ID State, County, Tract to first Columns
first_columns = ['addrptid','guid','strctid','blockid','placeGEOID10','placeNAME10','COUNTYFP10']
cols = first_columns + [col for col in address_point_inventory_cols_bldg_block if col not in first_columns]
address_point_inventory_cols_bldg_block = address_point_inventory_cols_bldg_block[cols]

In [84]:
#Save results for community name
output_filename = f'addpt_{version_text}_{community}_{basevintage}_{year}_{bldg_inv_id}'
csv_filepath = outputfolders['top']+"/"+output_filename+'.csv'
savefile = sys.path[0]+"/"+csv_filepath
address_point_inventory_cols_bldg_block.to_csv(savefile, index=False)

# Save second set of files in common directory
common_directory = outputfolders['top']+"/../"+output_filename
address_point_inventory_cols_bldg_block.to_csv(common_directory+'.csv', index=False)

#### Add X Y variables
To be consistent with previous address point inventories add X and Y variables

The issue with missing geometries could not be fixed earlier because the data frames were geodataframes. Now that the data have been saved to CSV and read back as a regular data frame, the geometry columns are plain strings and can be used to fix the issue.
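
The sketch below illustrates why the CSV round trip helps: geometry written to CSV comes back as plain WKT text. The regular expression is a dependency-free stand-in that handles simple `POINT (x y)` strings only; it is an assumption for illustration, and the workflow itself uses `shapely.wkt.loads`.

```python
import io
import pandas as pd

# Toy data (hypothetical): one address point with a WKT point geometry
df = pd.DataFrame({"addrptid": ["STaAP000000"],
                   "geometry": ["POINT (-95.21749 29.54973)"]})

# Write to an in-memory CSV and read it back
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
back = pd.read_csv(buffer)

# After the round trip the geometry column holds plain text,
# so string methods can pull out the coordinates
pattern = r"POINT \((?P<x>-?[\d.]+) (?P<y>-?[\d.]+)\)"
coords = back["geometry"].str.extract(pattern).astype(float)
print(coords)
```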


In [85]:
## read in the address point inventory csv file
address_point_df = pd.read_csv(common_directory+'.csv')

In [86]:
address_point_df.head(1).T

Unnamed: 0,0
addrptid,ST1815653a-7b70-44ce-8544-e975596bdf82AP000000
guid,1815653a-7b70-44ce-8544-e975596bdf82
strctid,ST1815653a-7b70-44ce-8544-e975596bdf82
blockid,481677201001000.0
placeGEOID10,4827648.0
placeNAME10,Friendswood
COUNTYFP10,167.0
BLOCKID10_str,B481677201001000
building_geometry,POINT (-95.21749326252923 29.549727920274638)
block10_geometry,"POLYGON ((-95.217556 29.549383, -95.2171889999..."


In [87]:
# Set Address Point Geometry
# The default geometry is the building representative point
address_point_df['geometry'] = address_point_df['building_geometry']
# When the building representative point is missing use the Census Block Representative Point
address_point_df.loc[(address_point_df['geometry'].isnull()),'geometry'] = address_point_df['rppnt104269']

In [88]:
# Convert Data Frame to Geodataframe
address_point_gdf = gpd.GeoDataFrame(address_point_df)

# Use shapely.wkt loads to convert WKT to GeoSeries
from shapely.wkt import loads

address_point_gdf['geometry'] = address_point_gdf['geometry'].apply(lambda x: loads(x))

In [89]:
address_point_gdf['x'] = address_point_gdf['geometry'].x
address_point_gdf['y'] = address_point_gdf['geometry'].y
address_point_gdf[['geometry','x','y']].head(10)

Unnamed: 0,geometry,x,y
0,POINT (-95.21749 29.54973),-95.217493,29.549728
1,POINT (-95.20944 29.55829),-95.209442,29.558292
2,POINT (-95.20865 29.55603),-95.208647,29.556027
3,POINT (-95.20867 29.55600),-95.208671,29.556004
4,POINT (-95.21040 29.55450),-95.210402,29.554502
5,POINT (-95.21046 29.55448),-95.210461,29.554481
6,POINT (-95.21590 29.55434),-95.215905,29.554339
7,POINT (-95.21019 29.55426),-95.210192,29.554259
8,POINT (-95.20837 29.55858),-95.208373,29.558576
9,POINT (-95.20790 29.55827),-95.207897,29.558275


### Fix issue with Blockid with trailing .0
The blockid variable has a trailing .0 that needs to be removed.
The place GEOID and county FIPS variables have the same issue.

In [90]:
address_point_gdf.head(1).T

Unnamed: 0,0
addrptid,ST1815653a-7b70-44ce-8544-e975596bdf82AP000000
guid,1815653a-7b70-44ce-8544-e975596bdf82
strctid,ST1815653a-7b70-44ce-8544-e975596bdf82
blockid,481677201001000.0
placeGEOID10,4827648.0
placeNAME10,Friendswood
COUNTYFP10,167.0
BLOCKID10_str,B481677201001000
building_geometry,POINT (-95.21749326252923 29.549727920274638)
block10_geometry,"POLYGON ((-95.217556 29.549383, -95.2171889999..."


### ISSUE - There are buildings on the edge of the county
These observations do not geocode inside the county boundary.

A column with missing values cannot be converted to integer, so pandas stores it as a float and every value shows a trailing .0.
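
The following sketch shows the behavior on a made-up series. Dropping the missing rows before casting is similar in spirit to the cells below; the nullable `Int64` dtype is an alternative that this workflow does not use.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one row did not match a county, so the value is missing
county = pd.Series([167, np.nan], name="COUNTYFP10")
print(county.dtype)  # a float dtype, because NaN cannot be stored in an int64 column

# Option 1: drop the rows with missing values, then cast to a plain integer
dropped = county.dropna().astype(int)

# Option 2 (alternative, an assumption): nullable integer dtype keeps the missing row
nullable = county.astype("Int64")
print(dropped.tolist(), nullable.tolist())
```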

In [91]:
# Locate observations with missing COUNTYFP10
address_point_gdf.loc[(address_point_gdf['COUNTYFP10'].isna()),['COUNTYFP10','placeNAME10','placeGEOID10','geometry']]

Unnamed: 0,COUNTYFP10,placeNAME10,placeGEOID10,geometry
202219,,,,POINT (-95.21780 29.54901)
202220,,,,POINT (-95.21894 29.54202)
202221,,,,POINT (-95.22080 29.53290)
202222,,,,POINT (-95.22137 29.52884)
202223,,,,POINT (-95.22203 29.52607)
202224,,,,POINT (-95.22255 29.52248)
202225,,,,POINT (-95.22371 29.51618)
202226,,,,POINT (-95.22389 29.51562)
202227,,,,POINT (-95.22399 29.51476)
202228,,,,POINT (-95.22219 29.52471)


In [92]:
# Count the number of observations with missing COUNTYFP10
address_point_gdf.loc[(address_point_gdf['COUNTYFP10'].isna()),['COUNTYFP10','placeNAME10','placeGEOID10','geometry']].shape

(27, 4)

In [93]:
# Drop observations with missing COUNTYFP10
address_point_gdfv2 = address_point_gdf.dropna(subset=['COUNTYFP10'])

In [94]:
# Remove .0 from data
address_point_gdfv2 = address_point_gdfv2.applymap(lambda cell: int(cell) if str(cell).endswith('.0') else cell)


In [97]:
# drop columns not needed for analysis
address_point_gdfv2.drop(['geometry','building_geometry','block10_geometry','rppnt104269'], axis=1, inplace=True)

In [101]:
address_point_gdfv2.head(1).T

Unnamed: 0,0
addrptid,ST1815653a-7b70-44ce-8544-e975596bdf82AP000000
guid,1815653a-7b70-44ce-8544-e975596bdf82
strctid,ST1815653a-7b70-44ce-8544-e975596bdf82
blockid,481677201001000
placeGEOID10,4827648.0
placeNAME10,Friendswood
COUNTYFP10,167
BLOCKID10_str,B481677201001000
huestimate,1.0
residential,1


In [102]:
# Resave results for community name
output_filename = f'addpt_{version_text}_{community}_{basevintage}_{year}_{bldg_inv_id}'
csv_filepath = outputfolders['top']+"/"+output_filename+'.csv'
savefile = sys.path[0]+"/"+csv_filepath
address_point_gdfv2.to_csv(savefile, index=False)

# Save second set of files in common directory
common_directory = outputfolders['top']+"/../"+output_filename
address_point_gdfv2.to_csv(common_directory+'.csv', index=False)

## Upload Address Point Inventory to IN-CORE

In [103]:
generate_hui_df = generate_hui_functions(
                    communities =   communities,
                    seed =          seed,
                    version =       version,
                    version_text=   version_text,
                    basevintage=    basevintage,
                    outputfolder=   outputfolder
                    )

In [104]:
county_list = 'Galveston County, TX'
# Upload CSV file to IN-CORE and save dataset_id
# note you have to put the correct dataType as well as format
title = "Address Point Inventory v2.0.0 data for Galveston, TX"
addpt_description =  '\n'.join(["2010 Address Point Inventory v2.0.0 with required IN-CORE columns. " 
        "Compatible with pyincore v1.4. " 
        "Unit of observation is address point. " 
        "Each address point is associated with a building in the building inventory. "
        "Building Inventory ID is the last part of the address point filename. " 
        "Rosenheim, Nathanael. (2022). npr99/intersect-community-data. Zenodo. " 
        "https://doi.org/10.5281/zenodo.6476122. "
        "File includes data for "+county_list])

dataset_metadata = {
    "title":title,
    "description": addpt_description,
    "dataType": "incore:addressPoints",
    "format": "table"
    }

data_service = generate_hui_df.loginto_incore_dataservice()
created_dataset = data_service.create_dataset(properties = dataset_metadata)
dataset_id = created_dataset['id']
print('dataset is created with id ' + dataset_id)

## Attach files to the dataset created
files = [csv_filepath]
full_dataset = data_service.add_files_to_dataset(dataset_id, files)

print('The file(s): '+ output_filename +" have been uploaded to IN-CORE")
print("Dataset now on IN-CORE, use dataset_id:",dataset_id)
print("Dataset is only in personal account, contact IN-CORE to make public")

Connection successful to IN-CORE services. pyIncore version detected: 1.4.1
dataset is created with id 632397becd619334caa8fc78


HTTPError: 403 Client Error: Forbidden for url: https://incore.ncsa.illinois.edu/data/api/datasets/632397becd619334caa8fc78/files