# Walkable Accessibility Score (WAS)

### Date: July 25, 2024

### Compute a Walkable Accessibility Score (WAS) at the block group scale using InfoUSA POI data

This notebook creates a Walkable Accessibility Score (WAS) by computing the distances between businesses (points) and the centroids of block groups (points). The goal is to show, through an example, how to compute an access metric, and to make it accessible enough for practitioners and scholars to adapt for their own purposes. The businesses can easily be replaced with other data of interest, such as schools, parks, or any other destinations. Likewise, the polygons (in this case, block groups) can be replaced with other geographies, such as tracts, blocks, or any similar geography you are interested in.

In this example, we use business data from InfoUSA and the geometries of the block groups from [IPUMS NHGIS](https://data2.nhgis.org/).

### 1. Load the required libraries

In [1]:
# Load libraries
from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd
import geopandas as gpd

### 2. Load data

Load a table that stores latitude and longitude as columns. The coordinates can describe points or the centroids of polygons.
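With about 11 million rows, reading only the columns the workflow needs can reduce memory use and load time. The sketch below is a minimal illustration, assuming the column names shown in the `df.head(10)` output; the two in-memory rows are copied from that output and stand in for the InfoUSA file. Setting `low_memory=False` makes pandas infer one dtype per column over the whole file, which avoids its mixed-type warning at the cost of more memory while parsing.

```python
import io

import pandas as pd

# Two rows copied from the InfoUSA sample output, used as a stand-in for the real file
raw = io.StringIO(
    "Company,State,Primary NAICS Code,Latitude,Longitude,Census Tract\n"
    "ALWAYS BLOOMING BALLOONS,MA,45322002.0,42.07347,-72.60428,813203.0\n"
    "AXLER'S BICYCLE CORNER,MA,45111006.0,42.08732,-72.64032,813207.0\n"
)

# Only the columns used later: state filter, NAICS filter, and coordinates
usecols = ['Company', 'State', 'Primary NAICS Code', 'Latitude', 'Longitude']

# low_memory=False: infer each column's dtype from the whole file, not per chunk
sample = pd.read_csv(raw, usecols=usecols, low_memory=False)
print(sample.dtypes)
```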

In [2]:
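# Load InfoUSA business data (the file name below refers to 1997; swap in the year you need). Other data can be used
# Takes ~2 min to run
df = pd.read_csv('../../../Downloads/1997_Business_Academic_QCQ.txt', sep=",", encoding='latin-1')

# Similarly, if your data are in a regular .csv file, you could read them as:
# df = pd.read_csv('your_file.csv')  # hypothetical file name
df.head(10)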



Unnamed: 0,Company,Address Line 1,City,State,ZipCode,Zip4,County Code,Area Code,IDCode,Location Employee Size Code,...,Population Code,Census Tract,Census Block,Latitude,Longitude,Match Code,CBSA Code,CBSA Level,CSA Code,FIPS Code
0,BOB'S AUTO REPAIR,1688 MAIN ST,AGAWAM,MA,1001,2577.0,13.0,413,,A,...,1.0,813205.0,4.0,42.03614,-72.61752,P,44140.0,2.0,521.0,25013
1,RIVER STREET AUTO CLINIC INC,27 RIVER,AGAWAM,MA,1001,,13.0,413,,A,...,1.0,813207.0,5.0,42.09897,-72.63442,P,44140.0,2.0,521.0,25013
2,ALWAYS BLOOMING BALLOONS,3 PLANTATION DR,AGAWAM,MA,1001,3231.0,13.0,413,,A,...,1.0,813203.0,4.0,42.07347,-72.60428,P,44140.0,2.0,521.0,25013
3,VICTOR'S HAIRSTYLING,332 WALNUT STREET EXT,AGAWAM,MA,1001,1524.0,13.0,413,,A,...,1.0,813207.0,4.0,42.088669,-72.629398,P,44140.0,2.0,521.0,25013
4,AXLER'S BICYCLE CORNER,313 SPRINGFIELD ST,AGAWAM,MA,1001,1511.0,13.0,413,,A,...,1.0,813207.0,4.0,42.08732,-72.64032,4,44140.0,2.0,521.0,25013
5,RACK N CUE PRO SHOP,80 RAMAH CIR,AGAWAM,MA,1001,,13.0,413,,A,...,1.0,813207.0,4.0,42.08493,-72.63194,P,44140.0,2.0,521.0,25013
6,,1744 MAIN ST,AGAWAM,MA,1001,2513.0,13.0,413,,A,...,1.0,813205.0,4.0,42.035431,-72.617565,P,44140.0,2.0,521.0,25013
7,MC GUIRE PECK & CO,630 SILVER ST,AGAWAM,MA,1001,2987.0,13.0,413,,A,...,1.0,813205.0,5.0,42.0557,-72.65081,4,44140.0,2.0,521.0,25013
8,AFFORDABLE WEDDING & ANNVRSRY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,1.0,813207.0,4.0,42.089474,-72.63168,0,44140.0,2.0,521.0,25013
9,AGAWAM ADVERTISING AGENCY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,1.0,813207.0,4.0,42.089474,-72.63168,0,44140.0,2.0,521.0,25013


### 3. Know your data!

#### Check how large your data are and what information they contain.

In [3]:
"Your data contains " + str(len(df)) + " rows."

'Your data contains 11263921 rows.'

The table contains the following information:

In [4]:
sorted(df.columns)

['ABI',
 'Address Line 1',
 'Address Type Indicator',
 'Archive Version Year',
 'Area Code',
 'Business Status Code',
 'CBSA Code',
 'CBSA Level',
 'CSA Code',
 'Census Block',
 'Census Tract',
 'City',
 'Company',
 'Company Holding Status',
 'County Code',
 'Employee Size (5) - Location',
 'FIPS Code',
 'IDCode',
 'Industry Specific First Byte',
 'Latitude',
 'Location Employee Size Code',
 'Location Sales Volume Code',
 'Longitude',
 'Match Code',
 'NAICS8 Descriptions',
 'Office Size Code',
 'Parent Actual Employee Size',
 'Parent Actual Sales Volume',
 'Parent Employee Size Code',
 'Parent Number',
 'Parent Sales Volume Code',
 'Population Code',
 'Primary NAICS Code',
 'Primary SIC Code',
 'SIC Code',
 'SIC Code 1',
 'SIC Code 2',
 'SIC Code 3',
 'SIC Code 4',
 'SIC6_Descriptions',
 'SIC6_Descriptions (SIC)',
 'SIC6_Descriptions (SIC1)',
 'SIC6_Descriptions(SIC2)',
 'SIC6_Descriptions(SIC3)',
 'SIC6_Descriptions(SIC4)',
 'Sales Volume (9) - Location',
 'Site Number',
 'State',
 'S

### 4. Clean data of interest

#### 4.1. Filter data

In [5]:
# Amenities: groceries, restaurants, coffee shops, banks, parks, schools, bookstores, entertainment, and general shopping establishments
# Schools (https://nces.ed.gov/programs/edge/geographic/schoollocations) and parks (centroids: https://www.arcgis.com/home/item.html?id=f092c20803a047cba81fbf1e30eff0b5)
# come from other sources (see Section 4.3)

# Convert the NAICS code to a string and create 2-, 4-, and 6-digit prefix columns
# so that the categories of interest are easier to filter
df['NAICS'] = df['Primary NAICS Code'].astype(str)
df['NAICS2'] = df.NAICS.str[:2]
df['NAICS4'] = df.NAICS.str[:4]
df['NAICS6'] = df.NAICS.str[:6]
df.NAICS4.value_counts()

NAICS4
6211    538447
5411    516143
7225    449267
8131    340021
8121    327799
         ...  
9271        29
1131        23
1132        17
1122        15
1124         5
Name: count, Length: 312, dtype: int64
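Because `Primary NAICS Code` is read as a float, `astype(str)` produces strings such as `'45322002.0'` (compare the `NAICS` column in the `filtered.head(3)` output below). Slicing the first 2, 4, or 6 characters still works for 8-digit codes, but in pandas 2.x missing codes become the string `'nan'`. A slightly more defensive sketch, assuming all codes are whole numbers, converts through pandas' nullable `Int64` and `string` dtypes so that missing codes stay missing.

```python
import numpy as np
import pandas as pd

codes = pd.Series([45322002.0, 45111006.0, np.nan])  # float codes, one missing

# Approach used above: the float is rendered with a trailing '.0'
naive = codes.astype(str)

# Defensive sketch: float -> nullable integer -> nullable string, so NaN stays <NA>
clean = codes.astype('Int64').astype('string')
naics2 = clean.str[:2]
naics4 = clean.str[:4]
naics6 = clean.str[:6]
print(pd.DataFrame({'naive': naive, 'NAICS2': naics2, 'NAICS4': naics4, 'NAICS6': naics6}))
```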

In [6]:
# Specific amenity NAICS codes: all of sector 72 (accommodation and food services),
# the 4-digit retail groups below, and 5221 (depository credit intermediation, i.e., banks)
naics4_codes = ['4421', '4431', '4451', '4453', '4461', '4481', '4482', '4483',
                '4511', '4523', '4531', '4532', '4539', '5221']
naics6_codes = ['311811', '451211']  # retail bakeries, book stores

# Filter
filtered = df.loc[(df['NAICS2'] == '72') | df['NAICS4'].isin(naics4_codes) | df['NAICS6'].isin(naics6_codes)]

# Remove Puerto Rico, Alaska, Hawaii, and the US Virgin Islands: they are not connected to the contiguous US,
# so they would distort the distance calculations, and the projection used below (ESRI:102003) covers only the contiguous US
filtered = filtered[~filtered['State'].isin(['PR', 'AK', 'HI', 'VI'])]
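The NAICS and state filters can be checked on a small table before running them on millions of rows. A minimal sketch, assuming string NAICS codes as created above; `select_amenities` is a hypothetical helper and the toy rows are made up for illustration.

```python
import pandas as pd

NAICS4_CODES = ['4421', '4431', '4451', '4453', '4461', '4481', '4482', '4483',
                '4511', '4523', '4531', '4532', '4539', '5221']
NAICS6_CODES = ['311811', '451211']
EXCLUDED_STATES = ['PR', 'AK', 'HI', 'VI']


def select_amenities(frame):
    """Keep amenity NAICS codes located in the contiguous US (hypothetical helper)."""
    naics = frame['NAICS']
    is_amenity = ((naics.str[:2] == '72')
                  | naics.str[:4].isin(NAICS4_CODES)
                  | naics.str[:6].isin(NAICS6_CODES))
    in_conus = ~frame['State'].isin(EXCLUDED_STATES)
    return frame[is_amenity & in_conus]


# Hypothetical rows: gift store (kept), auto repair (dropped), restaurant in HI (dropped),
# grocery store (kept), book store (kept)
toy = pd.DataFrame({
    'NAICS': ['45322002', '81111101', '72251101', '44511003', '45121101'],
    'State': ['MA', 'MA', 'HI', 'TX', 'CA'],
})
print(select_amenities(toy))
```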

#### Check your data: how large is the filtered data, and what does it look like?

In [7]:
"Your filtered data contains " + str(len(filtered)) + " rows."

'Your filtered data contains 1812062 rows.'

In [8]:
filtered.head(3)

Unnamed: 0,Company,Address Line 1,City,State,ZipCode,Zip4,County Code,Area Code,IDCode,Location Employee Size Code,...,Longitude,Match Code,CBSA Code,CBSA Level,CSA Code,FIPS Code,NAICS,NAICS2,NAICS4,NAICS6
2,ALWAYS BLOOMING BALLOONS,3 PLANTATION DR,AGAWAM,MA,1001,3231.0,13.0,413,,A,...,-72.60428,P,44140.0,2.0,521.0,25013,45322002.0,45,4532,453220
4,AXLER'S BICYCLE CORNER,313 SPRINGFIELD ST,AGAWAM,MA,1001,1511.0,13.0,413,,A,...,-72.64032,4,44140.0,2.0,521.0,25013,45111006.0,45,4511,451110
8,AFFORDABLE WEDDING & ANNVRSRY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,-72.63168,0,44140.0,2.0,521.0,25013,45399870.0,45,4539,453998


In [9]:
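# Drop rows with malformed coordinates (e.g., the string '-000.000-76') and make sure
# latitude and longitude are numeric before creating point geometries
filtered = filtered[(filtered.Longitude != '-000.000-76') & (filtered.Latitude != '-000.000-76')].copy()
filtered['Longitude'] = pd.to_numeric(filtered['Longitude'], errors='coerce')
filtered['Latitude'] = pd.to_numeric(filtered['Latitude'], errors='coerce')
filtered = filtered.dropna(subset=['Longitude', 'Latitude'])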

#### 4.2. Bring in the spatial!

In [10]:
# Create a geodataframe from coordinates (latitude and longitude)
gdf = gpd.GeoDataFrame(
    filtered,
    geometry=gpd.points_from_xy(filtered.Longitude, filtered.Latitude),
    crs='epsg:4326')  # EPSG:4326 = WGS84 longitude/latitude in degrees

In [11]:
# Note that a geometry column is added at the end of the table
gdf.head(3)

Unnamed: 0,Company,Address Line 1,City,State,ZipCode,Zip4,County Code,Area Code,IDCode,Location Employee Size Code,...,Match Code,CBSA Code,CBSA Level,CSA Code,FIPS Code,NAICS,NAICS2,NAICS4,NAICS6,geometry
2,ALWAYS BLOOMING BALLOONS,3 PLANTATION DR,AGAWAM,MA,1001,3231.0,13.0,413,,A,...,P,44140.0,2.0,521.0,25013,45322002.0,45,4532,453220,POINT (-72.60428 42.07347)
4,AXLER'S BICYCLE CORNER,313 SPRINGFIELD ST,AGAWAM,MA,1001,1511.0,13.0,413,,A,...,4,44140.0,2.0,521.0,25013,45111006.0,45,4511,451110,POINT (-72.64032 42.08732)
8,AFFORDABLE WEDDING & ANNVRSRY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,0,44140.0,2.0,521.0,25013,45399870.0,45,4539,453998,POINT (-72.63168 42.08947)


In [12]:
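# Reproject to a projected Coordinate Reference System (CRS) measured in metres,
# ESRI:102003 (USA Contiguous Albers Equal Area Conic), so that Euclidean distances are in metres
# Check for different projections here: https://epsg.io/
gdf = gdf.to_crs('esri:102003')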
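Euclidean distances only make sense in a CRS with linear units. In EPSG:4326 a degree is not a fixed length, as this haversine sketch shows (spherical Earth with a mean radius of 6,371 km, an approximation; `haversine_m` is a hypothetical helper): near 42°N, one degree of longitude is roughly 83 km while one degree of latitude is roughly 111 km. After reprojecting to ESRI:102003, whose units are metres, the BallTree distances in Section 6 can be read as metres. Albers is an equal-area projection, so distances are approximately, not exactly, preserved, which is usually acceptable over walking ranges.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude vs. one degree of latitude near Agawam, MA (~42 N)
one_deg_lon = haversine_m(42.0, -72.0, 42.0, -71.0)
one_deg_lat = haversine_m(42.0, -72.0, 43.0, -72.0)
print(round(one_deg_lon), round(one_deg_lat))
```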

In [13]:
# Check that the CRS actually changed
gdf.crs

<Projected CRS: ESRI:102003>
Name: USA_Contiguous_Albers_Equal_Area_Conic
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: United States (USA) - CONUS onshore - Alabama; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming.
- bounds: (-124.79, 24.41, -66.91, 49.38)
Coordinate Operation:
- name: USA_Contiguous_Albers_Equal_Area_Conic
- method: Albers Equal Area
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

In [14]:
# Keep only rows whose geometry is not empty
gdf = gdf[~gdf.is_empty]

In [15]:
"The data contains " + str(len(gdf)) + " rows."

'The data contains 1811016 rows.'
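As a further check, coordinates that fall clearly outside the study area can be flagged. A sketch, assuming a simple bounding-box test with the area-of-use bounds printed for ESRI:102003 above (`outside_conus` is a hypothetical helper and the toy rows are made up). A box is coarse: points in nearby parts of Canada or Mexico would still pass.

```python
import pandas as pd

# (lon_min, lat_min, lon_max, lat_max) from the ESRI:102003 area of use
CONUS_BOUNDS = (-124.79, 24.41, -66.91, 49.38)


def outside_conus(frame, bounds=CONUS_BOUNDS):
    """Return rows whose Latitude/Longitude fall outside the CONUS bounding box."""
    lon_min, lat_min, lon_max, lat_max = bounds
    inside = (frame['Longitude'].between(lon_min, lon_max)
              & frame['Latitude'].between(lat_min, lat_max))
    return frame[~inside]


# Hypothetical rows: a valid point, a (0, 0) placeholder, and a point in Hawaii
toy = pd.DataFrame({'Latitude': [42.07347, 0.0, 21.3],
                    'Longitude': [-72.60428, 0.0, -157.8]})
print(outside_conus(toy))
```

Because `gdf` keeps the original `Latitude` and `Longitude` columns, `outside_conus(gdf)` could be applied to it directly.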

#### 4.3. Add more data: schools and parks

In [None]:
# Add 2011 GreatSchools school data (other sources can be used)
sch = gpd.read_file('GreatSchools_2011_us48.shp')
sch = sch.to_crs('esri:102003')  # same CRS as the business points
# 2021 Esri parks data (centroids)
prk = gpd.read_file('Centroids_for_USA_Parks_2021_Buffer2.shp')
prk = prk.to_crs('esri:102003')

In [None]:
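# Combine businesses (gdf), schools, and parks into one amenity layer (all in ESRI:102003)
lst = [gdf, sch, prk]
am = pd.concat(lst, ignore_index=True, axis=0)
am["ID"] = am.index  # unique amenity ID after renumbering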
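When stacking several layers, `ignore_index=True` matters: each source keeps its own index, so without renumbering the labels collide and the index cannot serve as an amenity ID. A minimal sketch with hypothetical stand-in tables; the `source` column is an illustrative addition, not part of the original workflow. For GeoDataFrames, all layers should also share a CRS, which is why the school and park layers are reprojected above.

```python
import pandas as pd

# Hypothetical stand-ins for the business, school, and park tables
biz = pd.DataFrame({'name': ['bakery', 'bank']}, index=[2, 4])
schools = pd.DataFrame({'name': ['elementary']}, index=[0])
parks = pd.DataFrame({'name': ['park']}, index=[0])

# Without ignore_index the original labels are kept, so two rows are labelled 0
collided = pd.concat([biz, schools, parks], axis=0)

# ignore_index=True renumbers rows 0..n-1, so the index can serve as a unique ID
amenities = pd.concat([biz, schools, parks], ignore_index=True, axis=0)
amenities['ID'] = amenities.index
# Optional: record where each amenity came from
amenities['source'] = ['business'] * len(biz) + ['school'] * len(schools) + ['park'] * len(parks)
print(amenities)
```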

In [16]:

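# Note: for now only the business points are used as amenities. To include schools and parks
# (Section 4.3), use am[['geometry']] instead of gdf[['geometry']].
am_id = gdf[['geometry']]
am_id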

Unnamed: 0,geometry
2,POINT (1903292.607 747815.533)
4,POINT (1900043.044 748596.863)
8,POINT (1900672.943 749003.317)
16,POINT (1900885.474 748973.976)
26,POINT (1902633.436 746023.805)
...,...
11226983,POINT (-1609892.81 1175691.993)
11226994,POINT (-1608627.525 1175461.963)
11226995,POINT (-1612397.437 1173531.478)
11226996,POINT (-1609259.248 1175625.435)


### 5. Load the geography!

#### 5.1. In this case, we load block groups

In [18]:
# Block group file used in this case: one spatial definition of demand units for all time periods
#s_v = gpd.read_file('BG_2011_2015ADI_us48.shp') # Load geography (often a shapefile).

s_v = gpd.read_file('WAS_USA/data/US_blck_grp_2015.shp') # Load geography (often a shapefile).


In [20]:
# Check the data
s_v.head(2)

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,GEOID,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,INTPTLAT,INTPTLON,GISJOIN,Shape_Leng,Shape_Area,geometry
0,6,1,400100,1,60014001001,Block Group 1,G5030,S,6894340.0,0.0,37.8676275,-122.231946,G06000104001001,14302.720874,6894336.0,"POLYGON ((-2255602.272 353149.335, -2255597.39..."
1,6,1,400200,1,60014002001,Block Group 1,G5030,S,288960.0,0.0,37.8497418,-122.2488605,G06000104002001,2970.286365,288961.4,"POLYGON ((-2258184.246 353217.527, -2258186.81..."


In [23]:
#Size of the dataset
len(s_v)

219768

In [21]:
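# Set the Coordinate Reference System. set_crs only labels the data; it does not reproject.
# The coordinates above appear to be in ESRI:102003 already; use to_crs instead if your file uses another CRS.
s_v = s_v.set_crs('esri:102003', allow_override=True)
s_v.rename(columns={'GEOID': 'ID'}, inplace=True) # Rename the ID column for convenience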

In [28]:
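# Represent each block group by its centroid, keeping the ID column
# (assigning s_v["geometry"].centroid directly returns a GeoSeries and drops 'ID', which is needed later)
s_v = s_v[['ID', 'geometry']].copy()
s_v['geometry'] = s_v.geometry.centroid
s_v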

0          POINT (-2256868.242 354675.748)
1           POINT (-2258832.974 353148.92)
2          POINT (-2259050.925 352843.123)
3          POINT (-2258992.688 352523.532)
4          POINT (-2259688.712 351991.729)
                        ...               
219763    POINT (-1908234.387 1497325.964)
219764    POINT (-1898568.202 1489014.042)
219765    POINT (-1898338.775 1493691.189)
219766    POINT (-1916369.259 1477086.152)
219767    POINT (-1862164.966 1511347.082)
Length: 219768, dtype: geometry
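GeoPandas' `centroid` returns the area-weighted geometric centroid of each polygon. The sketch below computes the same quantity with the shoelace formula for a simple polygon (`polygon_centroid` is a hypothetical helper) and shows that for a concave shape the centroid can fall outside the polygon. If a point guaranteed to lie inside each block group matters for your geography, `GeoSeries.representative_point()` is an alternative.

```python
import numpy as np


def polygon_centroid(xy):
    """Area-weighted centroid of a simple polygon given as an (n, 2) array of vertices."""
    x, y = xy[:, 0], xy[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    cross = x * y_next - x_next * y      # per-edge cross products
    area = cross.sum() / 2.0             # signed area (shoelace formula)
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return cx, cy


# An L-shaped (concave) polygon: its centroid lies in the notch, outside the shape
l_shape = np.array([[0, 0], [4, 0], [4, 1], [1, 1], [1, 4], [0, 4]], dtype=float)
print(polygon_centroid(l_shape))
```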

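#### 5.2. Create subsets of the data to *avoid* computing irrelevant distances.

In this case, we split the continental US block groups into five subsets and process each subset separately, which keeps the intermediate tables at a manageable size. Irrelevant distances, such as those between a business in California and a block group in New York, are avoided by the nearest-neighbor search in Section 6, which keeps only the *k* closest amenities to each block group.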

In [29]:
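# Split the centroids into five consecutive chunks by row position.
# The cutoffs appear to be fifths of a 215,833-row file, so with 219,768 rows the last chunk is larger (47,102 rows).
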
s_v1 = s_v.iloc[0:43167]
s_v1 = s_v1.reset_index(drop=True)

s_v2 = s_v.iloc[43167:86333]
s_v2 = s_v2.reset_index(drop=True)

s_v3 = s_v.iloc[86333:129500]
s_v3 = s_v3.reset_index(drop=True)

s_v4 = s_v.iloc[129500:172666]
s_v4 = s_v4.reset_index(drop=True)

s_v5 = s_v.iloc[172666:len(s_v)] # the last chunk runs to the end of s_v
s_v5 = s_v5.reset_index(drop=True)
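The cutoffs can be derived instead of typed by hand. A sketch, assuming near-equal chunks are the goal (`chunk_bounds` is a hypothetical helper). As a check, it reproduces the hard-coded cutoffs above when given 215,833 rows.

```python
import numpy as np


def chunk_bounds(n_rows, n_chunks):
    """(start, stop) row positions that split n_rows into n_chunks near-equal slices."""
    edges = np.linspace(0, n_rows, n_chunks + 1).round().astype(int)
    return [(int(a), int(b)) for a, b in zip(edges[:-1], edges[1:])]


bounds = chunk_bounds(219768, 5)
print(bounds)
# With the centroid GeoDataFrame s_v, the chunks could then be built as:
# chunks = [s_v.iloc[start:stop].reset_index(drop=True) for start, stop in bounds]
```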

### 6. We have the data ready, let's create the access score!

#### 6.1. Find number of nearest k POI points to each block group

In [30]:
# This cell defines a function for estimating nearest neighbors from point to point.

def get_nearest_neighbors(gdf1, gdf2, k_neighbors=2):
    '''Find the k nearest neighbors in gdf2 for every point in gdf1.

    Modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html

    Parameters
    ----------
    gdf1 : geopandas.GeoDataFrame
        Points to search from (here, block group centroids). Its index should be 0..n-1.
    gdf2 : geopandas.GeoDataFrame
        Points to be searched (here, amenities).
    k_neighbors : int, optional
        Number of nearest neighbors. The default is 2.

    Returns
    -------
    gdf_final : DataFrame
        gdf1 plus the index, geometry, and distance of each of the k neighbors from gdf2
        (columns suffixed _1 ... _k). Distances are in CRS units (metres for ESRI:102003).
    '''
    src_points = [(x, y) for x, y in zip(gdf1.geometry.x, gdf1.geometry.y)]
    candidates = [(x, y) for x, y in zip(gdf2.geometry.x, gdf2.geometry.y)]

    # Create a tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')

    # Find the closest points and their distances (each row is sorted from nearest to farthest)
    distances, indices = tree.query(src_points, k=k_neighbors)

    # Transpose so that row k holds the k-th nearest neighbor of every source point
    distances = distances.transpose()
    indices = indices.transpose()

    closest_gdfs = []
    for k in np.arange(k_neighbors):
        gdf_new = gdf2.iloc[indices[k]].reset_index()  # the 'index' column keeps gdf2's original index
        gdf_new['distance'] = distances[k]
        gdf_new = gdf_new.add_suffix(f'_{k+1}')
        closest_gdfs.append(gdf_new)

    # Concatenate side by side; rows align by position because all indices are 0..n-1
    closest_gdfs.insert(0, gdf1)
    gdf_final = pd.concat(closest_gdfs, axis=1)

    return gdf_final
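On a small sample, BallTree results can be spot-checked against a brute-force search that computes every pairwise distance. A sketch under that assumption (`brute_force_knn` is a hypothetical helper; memory grows with the number of source points times the number of candidates, so it is only practical for samples). With scikit-learn installed, `BallTree(cands).query(src, k=2)` should return the same distances; indices may differ when several amenities share the same coordinates, as in the `closest_am1.head(2)` output below.

```python
import numpy as np


def brute_force_knn(src, candidates, k):
    """Exact k nearest neighbours (Euclidean) by computing all pairwise distances."""
    diffs = src[:, None, :] - candidates[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))               # shape (n_src, n_candidates)
    idx = np.argsort(dists, axis=1, kind='stable')[:, :k]   # k closest per source point
    return np.take_along_axis(dists, idx, axis=1), idx


src = np.array([[0.0, 0.0], [10.0, 10.0]])
cands = np.array([[1.0, 0.0], [0.0, 3.0], [9.0, 10.0], [20.0, 20.0]])
d, i = brute_force_knn(src, cands, k=2)
print(d)
print(i)
```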

In [31]:
# Find the k = 150 closest amenities to each block group centroid and their Euclidean distances
# (one call per subset of block groups)
closest_am1 = get_nearest_neighbors(s_v1, am_id, k_neighbors=150)
closest_am2 = get_nearest_neighbors(s_v2, am_id, k_neighbors=150)
closest_am3 = get_nearest_neighbors(s_v3, am_id, k_neighbors=150)
closest_am4 = get_nearest_neighbors(s_v4, am_id, k_neighbors=150)
closest_am5 = get_nearest_neighbors(s_v5, am_id, k_neighbors=150)

In [34]:
# Take a look at one table of the results:
closest_am1.head(2)

Unnamed: 0,0,index_1,geometry_1,distance_1,index_2,geometry_2,distance_2,index_3,geometry_3,distance_3,...,index_148,geometry_148,distance_148,index_149,geometry_149,distance_149,index_150,geometry_150,distance_150,ID2
0,POINT (-2256868.242 354675.748),10494070,POINT (-2257495.198 355774.071),1264.668954,10494068,POINT (-2257495.198 355774.071),1264.668954,10494087,POINT (-2257495.198 355774.071),1264.668954,...,10488966,POINT (-2259066.974 355401.818),2315.513241,10488668,POINT (-2259085.291 355344.569),2315.734962,10488567,POINT (-2259166.639 354959.307),2315.822878,0
1,POINT (-2258832.974 353148.92),10484384,POINT (-2258799.952 353071.899),83.801807,10484276,POINT (-2259087.328 353205.434),260.556732,10484386,POINT (-2259106.787 353144.78),273.844669,...,10489635,POINT (-2258909.537 354150.844),1004.845322,10489734,POINT (-2258909.537 354150.844),1004.845322,10489803,POINT (-2258909.331 354151.252),1005.236012,1


In [32]:
# Reshape from wide to long: one row per (block group, neighbor) pair
# Whole US subsets
closest_am1["ID2"] = closest_am1.index
closest_l1 = pd.wide_to_long(closest_am1, ["distance_","index_","geometry_"], i="ID2", j="neighbor")



ValueError: Cannot mask with non-boolean array containing NA / NaN values

In [35]:
closest_am2["ID2"] = closest_am2.index
closest_l2 = pd.wide_to_long(closest_am2, ["distance_","index_","geometry_"], i="ID2", j="neighbor")



ValueError: Cannot mask with non-boolean array containing NA / NaN values

In [None]:
closest_am3["ID2"] = closest_am3.index
closest_l3 = pd.wide_to_long(closest_am3, ["distance_","index_","geometry_"], i="ID2", j="neighbor")

In [None]:
closest_am4["ID2"] = closest_am4.index
closest_l4 = pd.wide_to_long(closest_am4, ["distance_","index_","geometry_"], i="ID2", j="neighbor")

In [None]:
closest_am5["ID2"] = closest_am5.index
closest_l5 = pd.wide_to_long(closest_am5, ["distance_","index_","geometry_"], i="ID2", j="neighbor")
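The `ValueError` above appears to come from the column labelled `0` in `closest_am1` (visible in its `head(2)` output): that column holds the unnamed centroid GeoSeries, and `pd.wide_to_long` matches stub names with string methods that yield missing values for non-string labels. A minimal sketch on a hypothetical table, assuming the same column naming as `get_nearest_neighbors`, shows the reshape working once every label is a string. In the notebook, keeping the centroids in a GeoDataFrame with an `ID` column, or running `closest_am1.columns = closest_am1.columns.astype(str)`, should avoid the error.

```python
import pandas as pd

# Hypothetical wide table for two block groups and k = 2 neighbours
wide = pd.DataFrame({
    'ID': ['bg_a', 'bg_b'],
    'distance_1': [100.0, 50.0], 'distance_2': [250.0, 80.0],
    'index_1': [7, 3], 'index_2': [9, 4],
})

# Make every column label a string so wide_to_long can match the stub names
wide.columns = wide.columns.astype(str)
wide['ID2'] = wide.index

# One row per (ID2, neighbor) pair; the numeric suffixes become the 'neighbor' level
long = pd.wide_to_long(wide, stubnames=['distance_', 'index_'], i='ID2', j='neighbor')
print(long.sort_index())
```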

In [None]:
# Rename to 'euclidean', 'origin', 'dest'
# Whole US subsets
closest_l1['origin'] = closest_l1['ID']
closest_l1['dest'] = closest_l1['index_']
closest_l1['euclidean'] = closest_l1['distance_']
closest_l1 = closest_l1.reset_index(level=("neighbor",))
# sort_values returns a new frame, avoiding in-place edits on a column selection
cost1 = closest_l1[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])

In [None]:
closest_l2['origin'] = closest_l2['ID']
closest_l2['dest'] = closest_l2['index_']
closest_l2['euclidean'] = closest_l2['distance_']
closest_l2= closest_l2.reset_index(level=("neighbor",))
cost2 = closest_l2[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])

In [None]:
closest_l3['origin'] = closest_l3['ID']
closest_l3['dest'] = closest_l3['index_']
closest_l3['euclidean'] = closest_l3['distance_']
closest_l3= closest_l3.reset_index(level=("neighbor",))
cost3 = closest_l3[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])

In [None]:
closest_l4['origin'] = closest_l4['ID']
closest_l4['dest'] = closest_l4['index_']
closest_l4['euclidean'] = closest_l4['distance_']
closest_l4= closest_l4.reset_index(level=("neighbor",))
cost4 = closest_l4[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])

In [None]:
closest_l5['origin'] = closest_l5['ID']
closest_l5['dest'] = closest_l5['index_']
closest_l5['euclidean'] = closest_l5['distance_']
closest_l5= closest_l5.reset_index(level=("neighbor",))
cost5 = closest_l5[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])
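The five cells above repeat the same steps, so they could be wrapped in one function. A sketch, assuming each input is a `wide_to_long` result with an `(ID2, neighbor)` index and `ID`, `index_`, and `distance_` columns; `to_cost_table` is a hypothetical helper, and the toy table is made up.

```python
import pandas as pd


def to_cost_table(long_df):
    """Origin/destination cost table from one wide_to_long result (hypothetical helper)."""
    out = long_df.reset_index(level='neighbor')
    out = out.rename(columns={'ID': 'origin', 'index_': 'dest', 'distance_': 'euclidean'})
    return out[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])


# Hypothetical long table: two block groups, two neighbours each
toy_long = pd.DataFrame(
    {'ID': ['b', 'b', 'a', 'a'], 'distance_': [5.0, 2.0, 9.0, 1.0], 'index_': [10, 11, 12, 13]},
    index=pd.MultiIndex.from_tuples([(0, 1), (0, 2), (1, 1), (1, 2)], names=['ID2', 'neighbor']),
)
toy_cost = to_cost_table(toy_long)
print(toy_cost)
```

In the notebook this would read `cost1, cost2, cost3, cost4, cost5 = [to_cost_table(l) for l in (closest_l1, closest_l2, closest_l3, closest_l4, closest_l5)]`.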

#### 6.2. Calculate accessibility measure

In [None]:
# Reference: https://doi.org/10.1177/0265813516641685
# Convert distance (metres) into walking time in seconds at 5 km/h
cost1['time'] = (cost1.euclidean*3600)/5000
cost2['time'] = (cost2.euclidean*3600)/5000
cost3['time'] = (cost3.euclidean*3600)/5000
cost4['time'] = (cost4.euclidean*3600)/5000
cost5['time'] = (cost5.euclidean*3600)/5000

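# Choose the 'upper' parameter (keep one line uncommented; the first option is active)
upper = 800
# upper = 1600
# upper = 2400

# Choose the decay rate (keep one line uncommented; the first option is active)
decay = .005
# decay = .008
# decay = .01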
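The conversion above divides straight-line distance by a walking speed of 5 km/h (5000 m per 3600 s). A small worked sketch (`walk_time_s` is a hypothetical helper). Because the distances are Euclidean rather than along the street network, these times are lower bounds on actual walking times.

```python
WALK_SPEED_M_PER_S = 5000 / 3600  # 5 km/h expressed in metres per second


def walk_time_s(distance_m):
    """Walking time in seconds for a straight-line distance in metres at 5 km/h."""
    return distance_m / WALK_SPEED_M_PER_S


for d in (400, 800, 1600):
    print(d, 'm ->', round(walk_time_s(d)), 's')
```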

In [None]:
cost1['LogitT_5'] = 1-(1/(np.e**((upper/180)-decay*cost1.time)+1))
cost2['LogitT_5'] = 1-(1/(np.e**((upper/180)-decay*cost2.time)+1))
cost3['LogitT_5'] = 1-(1/(np.e**((upper/180)-decay*cost3.time)+1))
cost4['LogitT_5'] = 1-(1/(np.e**((upper/180)-decay*cost4.time)+1))
cost5['LogitT_5'] = 1-(1/(np.e**((upper/180)-decay*cost5.time)+1))
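The weight in `LogitT_5` is a logistic (sigmoid) function of walking time $t$ in seconds. Writing $u$ for `upper` and $\beta$ for `decay`, it simplifies as follows:

$$
w(t) = 1 - \frac{1}{e^{\,u/180 - \beta t} + 1}
     = \frac{e^{\,u/180 - \beta t}}{e^{\,u/180 - \beta t} + 1}
     = \frac{1}{1 + e^{\,\beta t - u/180}}
$$

So $w(t) = 1/2$ at $t^{*} = u/(180\,\beta)$, $w$ is close to 1 for very short walks, and it decreases toward 0 as $t$ grows; larger $\beta$ makes the drop around $t^{*}$ steeper.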

In [None]:
# Optional: inspect the distribution of the weights
# import matplotlib.pyplot as plt
# plt.hist(cost1.LogitT_5, bins=50)

In [None]:
# Sum the distance weights by block group (origin) ID
cost_sum1 = cost1.groupby("origin").sum()
cost_sum1['ID'] = cost_sum1.index
cost_sum2 = cost2.groupby("origin").sum()
cost_sum2['ID'] = cost_sum2.index
cost_sum3 = cost3.groupby("origin").sum()
cost_sum3['ID'] = cost_sum3.index
cost_sum4 = cost4.groupby("origin").sum()
cost_sum4['ID'] = cost_sum4.index
cost_sum5 = cost5.groupby("origin").sum()
cost_sum5['ID'] = cost_sum5.index
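The score for each block group is the sum of the logistic weights of its *k* nearest amenities. A minimal end-to-end sketch on a hypothetical cost table, using the same weight formula and the first parameter options listed above (`logistic_weight` is a hypothetical helper). Selecting only `LogitT_5` before summing is a design choice here: `groupby(...).sum()` on the full table also sums columns such as `dest` and `neighbor`, which have no meaning as totals.

```python
import numpy as np
import pandas as pd


def logistic_weight(time_s, upper, decay):
    """Same weight as LogitT_5: 1 - 1/(exp(upper/180 - decay*time) + 1)."""
    return 1 - 1 / (np.exp(upper / 180 - decay * time_s) + 1)


# Hypothetical cost table: two block groups, three nearest amenities each (times in seconds)
cost = pd.DataFrame({
    'origin': ['bg_a'] * 3 + ['bg_b'] * 3,
    'time': [60.0, 300.0, 900.0, 600.0, 1200.0, 1800.0],
})
cost['LogitT_5'] = logistic_weight(cost['time'], upper=800, decay=0.005)

# Closer amenities contribute more, so bg_a should score higher than bg_b
scores = cost.groupby('origin')['LogitT_5'].sum()
print(scores)
```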

In [None]:
cost_merge1 = s_v1.merge(cost_sum1, how='inner', on='ID')
cost_merge2 = s_v2.merge(cost_sum2, how='inner', on='ID')
cost_merge3 = s_v3.merge(cost_sum3, how='inner', on='ID')
cost_merge4 = s_v4.merge(cost_sum4, how='inner', on='ID')
cost_merge5 = s_v5.merge(cost_sum5, how='inner', on='ID')

In [None]:
# Export for a given year (uncomment to write the shapefiles)
# cost_merge1.to_file('us_walkability_access_score_2019_1.shp')
# cost_merge2.to_file('us_walkability_access_score_2019_2.shp')
# cost_merge3.to_file('us_walkability_access_score_2019_3.shp')
# cost_merge4.to_file('us_walkability_access_score_2019_4.shp')
# cost_merge5.to_file('us_walkability_access_score_2019_5.shp')

### 7. Add the penalty scores