<a href="https://colab.research.google.com/github/npr99/IN-CORE_notebooks/blob/main/IN_CORE_NCSA_Galveston_Building_Inventory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Update Galveston Building Inventory
The Galveston Building inventory needs to have the required attribute columns based on IN-CORE standards.

The inventory needs to be in a shapefile format.

The inventory needs to include the variables needed for the Building Damage and the Housing Unit Allocation algorithms.



### Variables needed for building damage
LHSM: Elevation of the lowest horizontal structural member (units in ft)

age_group: Age group of the building (1, 2, 3, and 4 represent the age groups pre-1974, 1974–1987, 1987–1995, and 1995–2008, respectively)

G_elev: Elevation of the building with respect to the ground (units in meters)
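
Because LHSM is in feet and G_elev is in meters, a quick check of the three damage variables can catch missing columns or mixed units early. The following is a minimal sketch, not part of IN-CORE: the helper name `check_damage_columns`, the `G_elev_ft` column, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical helper for illustration; the name and the checks are assumptions.
REQUIRED_COLUMNS = ["LHSM", "age_group", "G_elev"]
FT_PER_M = 3.28084  # feet per meter

def check_damage_columns(df):
    """Return numeric copies of the damage variables after basic checks."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    # Values read from a CSV may arrive as strings, so convert them to numbers
    out = df[REQUIRED_COLUMNS].apply(pd.to_numeric, errors="coerce")
    if not out["age_group"].isin([1, 2, 3, 4]).all():
        raise ValueError("age_group must be 1, 2, 3, or 4")
    # G_elev is in meters while LHSM is in feet; add a feet version for comparison
    out["G_elev_ft"] = out["G_elev"] * FT_PER_M
    return out

sample = pd.DataFrame({"LHSM": ["14", "16"],
                       "age_group": ["4", "2"],
                       "G_elev": ["2.411895", "1.903827"]})
checked = check_damage_columns(sample)
```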

### Comment from Michal Ondrejcek at NCSA
The latest Building Inventory of a shape file is of version 6, we use the type
ergo:buildingInventoryVer6 which has attributes shown in the picture here (I am using QGIS to open Joplin building inventory).
It would be nice if we can use the same set of attributes. We noticed that, for example Lumberton testbed has slightly different column/attribute names and a few extra ones. If the extra are empty we don’t have to add them to version 6. I think the same should apply for Galveston. If we have to expand the list of building attributes we would call it version 7.
The most important attribute for pyIncore is guid which is used for merging different data sets. In the past guid was generated by NCSA. It gets automatically created when a user uploads Building inventory shape file (not csv) to INCORE’s Dataservice assuming guid attribute does not exist. We can, for example convert your csv with geo reference to shape file and upload it and send back a guid column or you can upload the shape file yourself.
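
The comment above says the guid is normally created by IN-CORE's Dataservice on upload. If a placeholder identifier is wanted locally before upload, one option is a random UUID per row. This is only a sketch under that assumption: it uses Python's standard `uuid` module, and it may not match how IN-CORE generates its own guid values.

```python
import uuid
import pandas as pd

sample = pd.DataFrame({"strctid": ["XREF0121-0001-0004-000",
                                   "XREF0121-0003-0000-000"]})

# Assumption: a random version-4 UUID string stands in for the guid;
# IN-CORE's own guid generation on upload may differ.
sample["guid"] = [str(uuid.uuid4()) for _ in range(len(sample))]
```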

In [None]:
import pandas as pd   # Pandas for data frame manipulation
import geopandas as gpd # For reading in shapefiles
import numpy as np # For filling in missing values

In [None]:
bldg_invtry_updated_gdf = gpd.read_file('Fereshtehnejad et al 2020 Building Inventory/BuildingInventoryUpdated.csv')

In [None]:
bldg_invtry_updated_gdf.head()

Unnamed: 0,Lon,Lat,xref,age_group,LHSM,G_elev,geometry
0,-95.08369,29.11267,0121-0001-0004-000,4,14,2.411895,
1,-95.0914,29.11505,0121-0003-0000-000,4,16,1.903827,
2,-95.07529,29.11732,0121-0022-0000-000,4,16,1.251554,
3,-95.0755,29.11719,0121-0023-0000-000,4,16,1.251554,
4,-95.06448,29.12661,0121-0024-0000-000,2,13,2.178023,


In [None]:
bldg_invtry_updated_gdf['xref'].describe()

count                  18962
unique                 18962
top       6381-0000-1562-000
freq                       1
Name: xref, dtype: object

## Add link between Address Point Inventory and Building Inventory
The Housing Unit Allocation links to the Building Inventory through the structure ID.

Structure ID is a critical variable that links the building inventory with the address point inventory. We can explore the address point inventory next and see that the variable strctid is XREF with the string XREF added to the front.

In [None]:
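# Prefix each xref with "XREF"; zfill(18) left-pads any xref shorter than 18 characters with zeros
bldg_invtry_updated_gdf['strctid'] = bldg_invtry_updated_gdf['xref'].apply(lambda x: "XREF" + str(x).zfill(18))
bldg_invtry_updated_gdf[['strctid','xref']].head()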

Unnamed: 0,strctid,xref
0,XREF0121-0001-0004-000,0121-0001-0004-000
1,XREF0121-0003-0000-000,0121-0003-0000-000
2,XREF0121-0022-0000-000,0121-0022-0000-000
3,XREF0121-0023-0000-000,0121-0023-0000-000
4,XREF0121-0024-0000-000,0121-0024-0000-000


In [None]:
bldg_invtry_updated_gdf[['strctid','xref']].describe() 

Unnamed: 0,strctid,xref
count,18962,18962
unique,18962,18962
top,XREF3505-0384-0010-002,6381-0000-1562-000
freq,1,1


## Confirm Structure ID match

In [None]:
addpt_gdf = gpd.read_file('Rosenheim Datasets/IN-CORE_2bv1_GalvestonAddrI_2020-06-30.csv')
addpt_gdf.head()

Unnamed: 0,addrptid,strctid,parid,blockid,blockidstr,huestimate,residential,bldgobs,yrblt,aprbldg,x,y,geometry
0,XREF0121-0001-0004-000AP001,XREF0121-0001-0004-000,XREF0121-0001-0004-000,481677261002122,CB481677261002122,1,1,1,,,-95.08369,29.11267,
1,XREF0121-0003-0000-000AP001,XREF0121-0003-0000-000,XREF0121-0003-0000-000,481677261002009,CB481677261002009,1,1,1,,,-95.0914,29.11505,
2,XREF0121-0022-0000-000AP001,XREF0121-0022-0000-000,XREF0121-0022-0000-000,481677261002121,CB481677261002121,1,1,1,2005.0,,-95.07529,29.11732,
3,XREF0121-0023-0000-000AP001,XREF0121-0023-0000-000,XREF0121-0023-0000-000,481677261002121,CB481677261002121,1,1,1,2005.0,,-95.0755,29.11719,
4,XREF0121-0024-0000-000AP001,XREF0121-0024-0000-000,XREF0121-0024-0000-000,481677261002031,CB481677261002031,1,1,1,,,-95.06448,29.12661,


In [None]:
addpt_gdf['strctid'].describe()

count                      32501
unique                     18962
top       XREF7205-0000-0136-002
freq                         428
Name: strctid, dtype: object

### Notice Strctid is XREF
The Structure ID should be XREF value with "XREF" added to the front.
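
Comparing `describe()` output shows that the counts of unique IDs agree, but not that the IDs themselves are the same. A set comparison makes the match explicit. The sketch below uses small made-up frames (`bldg`, `addpt`) in place of the real inventories and omits the `zfill(18)` padding for brevity.

```python
import pandas as pd

# Toy stand-ins for the building and address point inventories
bldg = pd.DataFrame({"xref": ["0121-0001-0004-000", "0121-0003-0000-000"]})
addpt = pd.DataFrame({"strctid": ["XREF0121-0001-0004-000",
                                  "XREF0121-0001-0004-000",
                                  "XREF0121-0003-0000-000"]})

bldg["strctid"] = "XREF" + bldg["xref"].astype(str)

bldg_ids = set(bldg["strctid"])
addpt_ids = set(addpt["strctid"])

# IDs present in one inventory but not the other
only_in_bldg = bldg_ids - addpt_ids
only_in_addpt = addpt_ids - bldg_ids
ids_match = not only_in_bldg and not only_in_addpt
```

If `ids_match` is false, the two difference sets list the structure IDs that would fail to link.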

## Check Address Point Inventory merge with Building Inventory

In [None]:
bldg_addpt_merged_gdf = bldg_invtry_updated_gdf.merge(addpt_gdf, how= 'left',
                                 left_on='strctid', right_on='strctid')
bldg_addpt_merged_gdf['strctid'].describe()

count                      32501
unique                     18962
top       XREF7205-0000-0136-002
freq                         428
Name: strctid, dtype: object
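
The merged count (32,501) equals the address point count because each building can match several address points. pandas can verify that relationship and flag unmatched buildings during the merge itself, using the `validate` and `indicator` parameters of `DataFrame.merge`. A minimal sketch with made-up data:

```python
import pandas as pd

bldg = pd.DataFrame({"strctid": ["A", "B", "C"], "LHSM": [14, 16, 13]})
addpt = pd.DataFrame({"strctid": ["A", "A", "B"], "huestimate": [1, 1, 2]})

# validate raises MergeError if strctid is duplicated on the building side;
# indicator adds a _merge column showing where each row came from
merged = bldg.merge(addpt, how="left", on="strctid",
                    validate="one_to_many", indicator=True)

match_counts = merged["_merge"].value_counts()
```

Rows labeled `left_only` in `_merge` are buildings with no address point.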

# Keep Unique Strctid
The building inventory should only have one observation for each structure. 

In [None]:
# Number the rows within each strctid: 0 for the first address point, 1 for the second, and so on
bldg_addpt_merged_gdf['count'] = bldg_addpt_merged_gdf.groupby('strctid').cumcount().astype(int)
bldg_addpt_merged_gdf[['strctid','count','yrblt','huestimate']].head()

Unnamed: 0,strctid,count,yrblt,huestimate
0,XREF0121-0001-0004-000,0,,1
1,XREF0121-0003-0000-000,0,,1
2,XREF0121-0022-0000-000,0,2005.0,1
3,XREF0121-0023-0000-000,0,2005.0,1
4,XREF0121-0024-0000-000,0,,1


In [None]:
bldg_addpt_merged_gdf[['count','yrblt','huestimate']].describe()

Unnamed: 0,count
count,32501.0
mean,20.717332
std,47.752719
min,0.0
25%,0.0
50%,0.0
75%,10.0
max,427.0


In [None]:
bldg_unique_gdf = bldg_addpt_merged_gdf.loc[bldg_addpt_merged_gdf['count']==0]
bldg_unique_gdf[['count','yrblt','huestimate']].describe()

Unnamed: 0,count
count,18962.0
mean,0.0
std,0.0
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,0.0


In [None]:
#check unique id
bldg_unique_gdf['strctid'].describe()

count                      18962
unique                     18962
top       XREF3505-0384-0010-002
freq                           1
Name: strctid, dtype: object
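
Keeping the rows where the within-group counter is 0 is equivalent to keeping the first row for each strctid. As a cross-check, the same result can be obtained with `drop_duplicates`. A small sketch on made-up data:

```python
import pandas as pd

merged = pd.DataFrame({"strctid": ["A", "A", "B", "C", "C", "C"],
                       "addrptid": ["A1", "A2", "B1", "C1", "C2", "C3"]})

# Approach used in this notebook: first row of each group has cumcount 0
by_cumcount = merged.loc[merged.groupby("strctid").cumcount() == 0]

# Alternative: drop repeated strctid values, keeping the first occurrence
by_drop = merged.drop_duplicates(subset="strctid", keep="first")

same = by_cumcount.equals(by_drop)
```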

In [None]:
bldg_unique_gdf.head()

Unnamed: 0,Lon,Lat,xref,age_group,LHSM,G_elev,geometry_x,strctid,addrptid,parid,...,blockidstr,huestimate,residential,bldgobs,yrblt,aprbldg,x,y,geometry_y,count
0,-95.08369,29.11267,0121-0001-0004-000,4,14,2.411895,,XREF0121-0001-0004-000,XREF0121-0001-0004-000AP001,XREF0121-0001-0004-000,...,CB481677261002122,1,1,1,,,-95.08369,29.11267,,0
1,-95.0914,29.11505,0121-0003-0000-000,4,16,1.903827,,XREF0121-0003-0000-000,XREF0121-0003-0000-000AP001,XREF0121-0003-0000-000,...,CB481677261002009,1,1,1,,,-95.0914,29.11505,,0
2,-95.07529,29.11732,0121-0022-0000-000,4,16,1.251554,,XREF0121-0022-0000-000,XREF0121-0022-0000-000AP001,XREF0121-0022-0000-000,...,CB481677261002121,1,1,1,2005.0,,-95.07529,29.11732,,0
3,-95.0755,29.11719,0121-0023-0000-000,4,16,1.251554,,XREF0121-0023-0000-000,XREF0121-0023-0000-000AP001,XREF0121-0023-0000-000,...,CB481677261002121,1,1,1,2005.0,,-95.0755,29.11719,,0
4,-95.06448,29.12661,0121-0024-0000-000,2,13,2.178023,,XREF0121-0024-0000-000,XREF0121-0024-0000-000AP001,XREF0121-0024-0000-000,...,CB481677261002031,1,1,1,,,-95.06448,29.12661,,0


# Clean merged data to match ergo:buildingInventoryVer6

The latest Building Inventory shapefile type is version 6; we use the type ergo:buildingInventoryVer6.

https://opensource.ncsa.illinois.edu/confluence/display/INCORE1/Building+Inventory+Datatype+Schema

Variables to include
- guid (added by IN-CORE)
- strctid
- archtype
- parid
- struct_typ
- year_built
- no_stories
- a_stories
- b_stories
- bsmt_type
- sq_foot
- gsq_foot
- occ_type
- occ_detail
- major_occ
- broad_occ
- appr_bldg
- repl_cst
- str_cst
- nstra_cst
- nstrd_cst
- dgn_lvl
- cont_val
- efacility
- dwell_unit
- str_typ2
- occ_typ2
- appr_land
- appr_tot
- types
- failure
- fun
- latitude
- longitude
- geometry in EPSG 4326


In [None]:
building_inventory = bldg_unique_gdf[['strctid','Lon','Lat']].copy()

In [None]:
building_inventory.head()

Unnamed: 0,strctid,Lon,Lat
0,XREF0121-0001-0004-000,-95.08369,29.11267
1,XREF0121-0003-0000-000,-95.0914,29.11505
2,XREF0121-0022-0000-000,-95.07529,29.11732
3,XREF0121-0023-0000-000,-95.0755,29.11719
4,XREF0121-0024-0000-000,-95.06448,29.12661


In [None]:
# create list of all required ergo:buildingInventoryVer6 columns
incore_columns = ['archtype', 
                  'parid', 
                  'struct_typ', 
                  'year_built', 
                  'no_stories', 
                  'a_stories', 
                  'b_stories', 
                  'bsmt_type', 
                  'sq_foot', 
                  'gsq_foot', 
                  'occ_type', 
                  'occ_detail', 
                  'major_occ', 
                  'broad_occ', 
                  'appr_bldg', 
                  'repl_cst', 
                  'str_cst', 
                  'nstra_cst', 
                  'nstrd_cst', 
                  'dgn_lvl', 
                  'cont_val', 
                  'efacility', 
                  'dwell_unit', 
                  'str_typ2', 
                  'occ_typ2', 
                  'appr_land', 
                  'appr_tot', 
                  'types', 
                  'failure', 
                  'fun']

for column in incore_columns :
    building_inventory[column] = np.nan

In [None]:
building_inventory.head()

Unnamed: 0,strctid,Lon,Lat,archtype,parid,struct_typ,year_built,no_stories,a_stories,b_stories,...,cont_val,efacility,dwell_unit,str_typ2,occ_typ2,appr_land,appr_tot,types,failure,fun
0,XREF0121-0001-0004-000,-95.08369,29.11267,,,,,,,,...,,,,,,,,,,
1,XREF0121-0003-0000-000,-95.0914,29.11505,,,,,,,,...,,,,,,,,,,
2,XREF0121-0022-0000-000,-95.07529,29.11732,,,,,,,,...,,,,,,,,,,
3,XREF0121-0023-0000-000,-95.0755,29.11719,,,,,,,,...,,,,,,,,,,
4,XREF0121-0024-0000-000,-95.06448,29.12661,,,,,,,,...,,,,,,,,,,
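
An alternative to the loop is `DataFrame.reindex`, which adds any missing columns as NaN in one step and fixes the column order at the same time. The sketch below uses a shortened column list and one made-up row for illustration.

```python
import pandas as pd

inventory = pd.DataFrame({"strctid": ["XREF0121-0001-0004-000"],
                          "Lon": ["-95.08369"],
                          "Lat": ["29.11267"]})

# Shortened stand-in for the full ergo:buildingInventoryVer6 column list
schema_columns = ["archtype", "parid", "struct_typ"]

# Columns not already present are created and filled with NaN
inventory = inventory.reindex(columns=list(inventory.columns) + schema_columns)
```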


## Assume all buildings (18,962) are wood frame

Tomiczek et al. (2014) and Fereshtehnejad et al. (2020) assume all buildings in the inventory are wood frame, coded here as W1.

In [None]:
building_inventory['struct_typ'] = 'W1'
building_inventory.head()

Unnamed: 0,strctid,Lon,Lat,archtype,parid,struct_typ,year_built,no_stories,a_stories,b_stories,...,cont_val,efacility,dwell_unit,str_typ2,occ_typ2,appr_land,appr_tot,types,failure,fun
0,XREF0121-0001-0004-000,-95.08369,29.11267,,,W1,,,,,...,,,,,,,,,,
1,XREF0121-0003-0000-000,-95.0914,29.11505,,,W1,,,,,...,,,,,,,,,,
2,XREF0121-0022-0000-000,-95.07529,29.11732,,,W1,,,,,...,,,,,,,,,,
3,XREF0121-0023-0000-000,-95.0755,29.11719,,,W1,,,,,...,,,,,,,,,,
4,XREF0121-0024-0000-000,-95.06448,29.12661,,,W1,,,,,...,,,,,,,,,,


## Save 2 Files to Share with NCSA

File 1: Building Inventory Shape File with Required Columns

File 2: Additional building data required for Fereshtehnejad et al 2020 fragility functions

### Convert Dataframe to Geodataframe
Create a GeoDataFrame that can be saved as a shapefile.

Lon and Lat need to be floats; they were read from the CSV as strings (dtype object).

In [None]:
building_inventory.Lon.describe()

count         18962
unique        10742
top       -94.78381
freq             10
Name: Lon, dtype: object

In [None]:
building_inventory.Lat.describe()

count        18962
unique        9069
top       29.29523
freq            13
Name: Lat, dtype: object

In [None]:
building_inventory.Lon = building_inventory.Lon.astype(float)
building_inventory.Lon.describe()

count    18962.000000
mean       -94.862251
std          0.088154
min        -95.109040
25%        -94.928905
50%        -94.823750
75%        -94.800640
max        -94.741170
Name: Lon, dtype: float64

In [None]:
building_inventory.Lat = building_inventory.Lat.astype(float)
building_inventory.Lat.describe()

count    18962.000000
mean        29.256756
std          0.053967
min         29.097150
25%         29.218140
50%         29.281450
75%         29.292820
max         29.326070
Name: Lat, dtype: float64
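
`astype(float)` raises an error on the first value that cannot be parsed. When the source file might contain stray text, `pd.to_numeric` with `errors="coerce"` converts what it can and leaves NaN elsewhere, so the problem rows can be inspected. A sketch with made-up values, two of them deliberately invalid:

```python
import pandas as pd

coords = pd.DataFrame({"Lon": ["-95.08369", "bad", "-95.07529"],
                       "Lat": ["29.11267", "29.11505", "n/a"]})

# Unparseable strings become NaN instead of raising an error
numeric = coords.apply(pd.to_numeric, errors="coerce")

# Original rows where either coordinate failed to convert
bad_rows = coords[numeric.isna().any(axis=1)]
```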

In [None]:
# Build point geometry from Lon/Lat; the CRS is set to NAD83 (EPSG:4269)
gdf = gpd.GeoDataFrame(
    building_inventory,
    geometry=gpd.points_from_xy(building_inventory.Lon, building_inventory.Lat),
    crs="EPSG:4269")

### Check the Coordinate Reference System

In [None]:
type(gdf.crs)

pyproj.crs.crs.CRS

In [None]:
gdf.crs

<Geographic 2D CRS: EPSG:4269>
Name: NAD83
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: North America - onshore and offshore: Canada - Alberta; British Columbia; Manitoba; New Brunswick; Newfoundland and Labrador; Northwest Territories; Nova Scotia; Nunavut; Ontario; Prince Edward Island; Quebec; Saskatchewan; Yukon. Puerto Rico. United States (USA) - Alabama; Alaska; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Hawaii; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming. US Virgin Islands.  British Virgin Island
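
The variable list earlier in this notebook calls for geometry in EPSG 4326, while the GeoDataFrame here is in EPSG:4269. If a WGS 84 version is required, GeoPandas can reproject with `to_crs`. This is a sketch on a single made-up row; whether the source coordinates are truly NAD83 is an assumption carried over from the cell above.

```python
import geopandas as gpd

gdf_nad83 = gpd.GeoDataFrame(
    {"strctid": ["XREF0121-0001-0004-000"]},
    geometry=gpd.points_from_xy([-95.08369], [29.11267]),
    crs="EPSG:4269")

# Reproject from NAD83 to WGS 84; the original frame is left unchanged
gdf_wgs84 = gdf_nad83.to_crs("EPSG:4326")
```

For this region the two systems differ by very little, so the reprojected coordinates stay close to the originals.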

In [None]:
gdf.head()

Unnamed: 0,strctid,Lon,Lat,archtype,parid,struct_typ,year_built,no_stories,a_stories,b_stories,...,efacility,dwell_unit,str_typ2,occ_typ2,appr_land,appr_tot,types,failure,fun,geometry
0,XREF0121-0001-0004-000,-95.08369,29.11267,,,W1,,,,,...,,,,,,,,,,POINT (-95.08369 29.11267)
1,XREF0121-0003-0000-000,-95.0914,29.11505,,,W1,,,,,...,,,,,,,,,,POINT (-95.09140 29.11505)
2,XREF0121-0022-0000-000,-95.07529,29.11732,,,W1,,,,,...,,,,,,,,,,POINT (-95.07529 29.11732)
3,XREF0121-0023-0000-000,-95.0755,29.11719,,,W1,,,,,...,,,,,,,,,,POINT (-95.07550 29.11719)
4,XREF0121-0024-0000-000,-95.06448,29.12661,,,W1,,,,,...,,,,,,,,,,POINT (-95.06448 29.12661)


### Save as Shapefile

In [None]:
gdf.to_file("NCSAFiles/IN-CORE_Galveston_BuildingInventory_2020-11-30.shp")
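
Shapefile attribute tables use the dBASE format, which limits field names to 10 characters, so longer column names are typically truncated on write. A quick check before saving avoids surprises. In the sketch below the list is a shortened stand-in for the inventory columns, and `G_elevation_m` is a hypothetical over-long name added only to show the check firing.

```python
# Shortened stand-in for the inventory columns, plus one hypothetical long name
columns = ["strctid", "Lon", "Lat", "struct_typ", "year_built",
           "occ_detail", "dwell_unit", "G_elevation_m"]

MAX_DBF_FIELD_LEN = 10  # dBASE field-name limit used by shapefiles

too_long = [c for c in columns if len(c) > MAX_DBF_FIELD_LEN]
```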

## Save additional Columns as CSV

In [None]:
bldg_invtry_updated_gdf.head()

Unnamed: 0,Lon,Lat,xref,age_group,LHSM,G_elev,geometry,strctid
0,-95.08369,29.11267,0121-0001-0004-000,4,14,2.411895,,XREF0121-0001-0004-000
1,-95.0914,29.11505,0121-0003-0000-000,4,16,1.903827,,XREF0121-0003-0000-000
2,-95.07529,29.11732,0121-0022-0000-000,4,16,1.251554,,XREF0121-0022-0000-000
3,-95.0755,29.11719,0121-0023-0000-000,4,16,1.251554,,XREF0121-0023-0000-000
4,-95.06448,29.12661,0121-0024-0000-000,2,13,2.178023,,XREF0121-0024-0000-000


In [None]:
bldg_invtry_updated_gdf[['strctid','age_group','LHSM','G_elev']].to_csv("NCSAFiles/IN-CORE_Galveston_BuildingInventory_part2_2020-11-30.csv")
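
When the CSV is read back, strctid is the key for joining these columns to the shapefile attributes. The sketch below round-trips two made-up rows through an in-memory buffer instead of a file. It passes `index=False`, which is an assumption about the desired layout: the cell above keeps the default and therefore also writes the row index as an unnamed first column.

```python
import io
import pandas as pd

part2 = pd.DataFrame({"strctid": ["XREF0121-0001-0004-000",
                                  "XREF0121-0024-0000-000"],
                      "age_group": ["4", "2"],
                      "LHSM": ["14", "13"],
                      "G_elev": ["2.411895", "2.178023"]})

# Write without the row index, then read back keeping strctid as text
buffer = io.StringIO()
part2.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={"strctid": str})
```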