# Walkable Accessibility Score (WAS)

### Date: September 4, 2024

### Compute a Walkable Accessibility Score (WAS) at the block group scale using InfoUSA POI data

This notebook creates a Walkable Accessibility Score (WAS) by computing the distance between businesses (points) and the centroids of block groups (points). The goal is to show, through an example, how to compute an access metric, and to make the method accessible enough for practitioners and scholars to adapt to their own purposes. The businesses can easily be swapped for other data of interest, such as schools or parks. Likewise, the polygons (in this case, block groups) can be replaced with other geographies, such as tracts or blocks.

In this example, we use business data from InfoUSA and the geometries of the block groups from [IPUMS NHGIS](https://data2.nhgis.org/).

### 1. Load libraries needed

In [1]:
# Load libraries
from sklearn.neighbors import BallTree
import numpy as np
import pandas as pd
import geopandas as gpd

### 2. Load data

Load data that contains latitude and longitude as columns of the table. The locations can be points or the centroids of polygons.

In [2]:
pwd

'/Users/irenefarah/Documents/GitHub/Walkable-Accessibility-Score/src'

In [3]:
# Load InfoUSA data (the file name below points to the 1997 extract) - other data can be used
# Takes ~2 min to run
df = pd.read_csv('../data/1997_Business_Academic_QCQ.txt', sep=",", encoding='latin-1')

# Any other delimited text file (e.g., a .csv) can be read the same way.
df.head(10)



Unnamed: 0,Company,Address Line 1,City,State,ZipCode,Zip4,County Code,Area Code,IDCode,Location Employee Size Code,...,Population Code,Census Tract,Census Block,Latitude,Longitude,Match Code,CBSA Code,CBSA Level,CSA Code,FIPS Code
0,BOB'S AUTO REPAIR,1688 MAIN ST,AGAWAM,MA,1001,2577.0,13.0,413,,A,...,1.0,813205.0,4.0,42.03614,-72.61752,P,44140.0,2.0,521.0,25013
1,RIVER STREET AUTO CLINIC INC,27 RIVER,AGAWAM,MA,1001,,13.0,413,,A,...,1.0,813207.0,5.0,42.09897,-72.63442,P,44140.0,2.0,521.0,25013
2,ALWAYS BLOOMING BALLOONS,3 PLANTATION DR,AGAWAM,MA,1001,3231.0,13.0,413,,A,...,1.0,813203.0,4.0,42.07347,-72.60428,P,44140.0,2.0,521.0,25013
3,VICTOR'S HAIRSTYLING,332 WALNUT STREET EXT,AGAWAM,MA,1001,1524.0,13.0,413,,A,...,1.0,813207.0,4.0,42.088669,-72.629398,P,44140.0,2.0,521.0,25013
4,AXLER'S BICYCLE CORNER,313 SPRINGFIELD ST,AGAWAM,MA,1001,1511.0,13.0,413,,A,...,1.0,813207.0,4.0,42.08732,-72.64032,4,44140.0,2.0,521.0,25013
5,RACK N CUE PRO SHOP,80 RAMAH CIR,AGAWAM,MA,1001,,13.0,413,,A,...,1.0,813207.0,4.0,42.08493,-72.63194,P,44140.0,2.0,521.0,25013
6,,1744 MAIN ST,AGAWAM,MA,1001,2513.0,13.0,413,,A,...,1.0,813205.0,4.0,42.035431,-72.617565,P,44140.0,2.0,521.0,25013
7,MC GUIRE PECK & CO,630 SILVER ST,AGAWAM,MA,1001,2987.0,13.0,413,,A,...,1.0,813205.0,5.0,42.0557,-72.65081,4,44140.0,2.0,521.0,25013
8,AFFORDABLE WEDDING & ANNVRSRY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,1.0,813207.0,4.0,42.089474,-72.63168,0,44140.0,2.0,521.0,25013
9,AGAWAM ADVERTISING AGENCY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,1.0,813207.0,4.0,42.089474,-72.63168,0,44140.0,2.0,521.0,25013


### 3. Know your data!

#### Check how large your data is and what information it contains.

In [4]:
"Your data contains " + str(len(df)) + " rows."

'Your data contains 11263921 rows.'

The table contains the following columns:

In [5]:
sorted(df.columns)

['ABI',
 'Address Line 1',
 'Address Type Indicator',
 'Archive Version Year',
 'Area Code',
 'Business Status Code',
 'CBSA Code',
 'CBSA Level',
 'CSA Code',
 'Census Block',
 'Census Tract',
 'City',
 'Company',
 'Company Holding Status',
 'County Code',
 'Employee Size (5) - Location',
 'FIPS Code',
 'IDCode',
 'Industry Specific First Byte',
 'Latitude',
 'Location Employee Size Code',
 'Location Sales Volume Code',
 'Longitude',
 'Match Code',
 'NAICS8 Descriptions',
 'Office Size Code',
 'Parent Actual Employee Size',
 'Parent Actual Sales Volume',
 'Parent Employee Size Code',
 'Parent Number',
 'Parent Sales Volume Code',
 'Population Code',
 'Primary NAICS Code',
 'Primary SIC Code',
 'SIC Code',
 'SIC Code 1',
 'SIC Code 2',
 'SIC Code 3',
 'SIC Code 4',
 'SIC6_Descriptions',
 'SIC6_Descriptions (SIC)',
 'SIC6_Descriptions (SIC1)',
 'SIC6_Descriptions(SIC2)',
 'SIC6_Descriptions(SIC3)',
 'SIC6_Descriptions(SIC4)',
 'Sales Volume (9) - Location',
 'Site Number',
 'State',
 'S

### 4. Clean data of interest

#### 4.1. Filter data

In [6]:
# Amenities: groceries, restaurants, coffee shops, banks, parks, schools, bookstores, entertainment, and general shopping establishments 
#schools (https://nces.ed.gov/programs/edge/geographic/schoollocations) and parks (centroids - https://www.arcgis.com/home/item.html?id=f092c20803a047cba81fbf1e30eff0b5)

# Convert the NAICS code to a string and create shorter prefixes (2, 4, and 6 digits)
# so that it is easier to filter the categories of interest.
df['NAICS'] = df['Primary NAICS Code'].astype(str)
df['NAICS2'] = df.NAICS.str[:2]
df['NAICS4'] = df.NAICS.str[:4]
df['NAICS6'] = df.NAICS.str[:6]
df.NAICS4.value_counts()

NAICS4
6211    538447
5411    516143
7225    449267
8131    340021
8121    327799
         ...  
9271        29
1131        23
1132        17
1122        15
1124         5
Name: count, Length: 312, dtype: int64
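
The prefix columns work because slicing a string keeps only its leading characters. The sketch below illustrates this on two made-up codes (the values and the `codes` name are assumptions for illustration, not rows taken from the InfoUSA file). Because the column is stored as a float, the text form carries a trailing `.0`, which does not affect the leading digits. Missing codes would need separate handling.

```python
import pandas as pd

# Hypothetical NAICS codes stored as floats, as in the table preview
codes = pd.Series([45322002.0, 72251105.0], name="Primary NAICS Code")

as_text = codes.astype(str)   # e.g. '45322002.0'
naics2 = as_text.str[:2]      # sector prefix (first 2 characters)
naics4 = as_text.str[:4]      # industry group prefix (first 4 characters)
naics6 = as_text.str[:6]      # detailed industry prefix (first 6 characters)

print(pd.DataFrame({"NAICS2": naics2, "NAICS4": naics4, "NAICS6": naics6}))
```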

In [7]:
# Filter by specific amenity NAICS codes
naics4_codes = ['4421', '4431', '4451', '4453', '4461', '4481', '4482', '4483',
                '4511', '4523', '4531', '4532', '4539', '5221']
naics6_codes = ['311811', '451211']

filtered = df.loc[(df['NAICS2'] == '72') |
                  df['NAICS4'].isin(naics4_codes) |
                  df['NAICS6'].isin(naics6_codes)]

# Remove Puerto Rico, Alaska, Hawaii, and the US Virgin Islands: they lie outside the contiguous US, and including them would distort the distance analysis
filtered = filtered[~filtered['State'].isin(['PR', 'AK', 'HI', 'VI'])]

#### Check your data: how large is the filtered data, and what does it look like?

In [8]:
"Your filtered data contains " + str(len(filtered)) + " rows."

'Your filtered data contains 1812062 rows.'

In [9]:
filtered.head(3)

Unnamed: 0,Company,Address Line 1,City,State,ZipCode,Zip4,County Code,Area Code,IDCode,Location Employee Size Code,...,Longitude,Match Code,CBSA Code,CBSA Level,CSA Code,FIPS Code,NAICS,NAICS2,NAICS4,NAICS6
2,ALWAYS BLOOMING BALLOONS,3 PLANTATION DR,AGAWAM,MA,1001,3231.0,13.0,413,,A,...,-72.60428,P,44140.0,2.0,521.0,25013,45322002.0,45,4532,453220
4,AXLER'S BICYCLE CORNER,313 SPRINGFIELD ST,AGAWAM,MA,1001,1511.0,13.0,413,,A,...,-72.64032,4,44140.0,2.0,521.0,25013,45111006.0,45,4511,451110
8,AFFORDABLE WEDDING & ANNVRSRY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,-72.63168,0,44140.0,2.0,521.0,25013,45399870.0,45,4539,453998


In [10]:
# Drop rows whose coordinates hold the malformed value '-000.000-76'.
# Note: this only catches that exact string; any other invalid coordinate would pass through.
filtered = filtered[filtered.Longitude != '-000.000-76']
filtered = filtered[filtered.Latitude != '-000.000-76']

In [11]:
"Your filtered data contains " + str(len(filtered)) + " rows."

'Your filtered data contains 1812062 rows.'
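
The row count is unchanged, so the exact-string comparison removed nothing here. A broader check is sketched below as one possible alternative, not the notebook's method: it parses the coordinates as numbers and keeps only rows inside the valid latitude and longitude ranges. The `toy` table and the `valid_coordinates` helper are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical toy table; column names mirror the InfoUSA file
toy = pd.DataFrame({
    "Company": ["A", "B", "C", "D"],
    "Latitude": ["42.03614", "-000.000-76", "95.0", "42.08732"],
    "Longitude": ["-72.61752", "-72.63442", "-72.60428", "-72.64032"],
})

def valid_coordinates(table, lat_col="Latitude", lon_col="Longitude"):
    """Return rows whose coordinates parse as numbers within valid ranges."""
    lat = pd.to_numeric(table[lat_col], errors="coerce")  # unparseable -> NaN
    lon = pd.to_numeric(table[lon_col], errors="coerce")
    ok = lat.between(-90, 90) & lon.between(-180, 180)    # NaN compares as False
    return table.loc[ok].assign(**{lat_col: lat[ok], lon_col: lon[ok]})

clean = valid_coordinates(toy)
print(clean)
```

In this toy table, row B fails to parse and row C has a latitude above 90, so only rows A and D are kept.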

#### 4.2 Bring in the spatial!

In [12]:
# Create a geodataframe from coordinates (latitude and longitude)
gdf = gpd.GeoDataFrame(
    filtered,
    geometry=gpd.points_from_xy(filtered.Longitude, filtered.Latitude),
    crs='epsg:4326') # EPSG:4326 is WGS 84, a geographic CRS in degrees of latitude/longitude

In [13]:
# Note that a geometry column is added at the end of the table
gdf.head(3)

Unnamed: 0,Company,Address Line 1,City,State,ZipCode,Zip4,County Code,Area Code,IDCode,Location Employee Size Code,...,Match Code,CBSA Code,CBSA Level,CSA Code,FIPS Code,NAICS,NAICS2,NAICS4,NAICS6,geometry
2,ALWAYS BLOOMING BALLOONS,3 PLANTATION DR,AGAWAM,MA,1001,3231.0,13.0,413,,A,...,P,44140.0,2.0,521.0,25013,45322002.0,45,4532,453220,POINT (-72.60428 42.07347)
4,AXLER'S BICYCLE CORNER,313 SPRINGFIELD ST,AGAWAM,MA,1001,1511.0,13.0,413,,A,...,4,44140.0,2.0,521.0,25013,45111006.0,45,4511,451110,POINT (-72.64032 42.08732)
8,AFFORDABLE WEDDING & ANNVRSRY,65 SPRINGFIELD ST,AGAWAM,MA,1001,1505.0,13.0,413,,A,...,0,44140.0,2.0,521.0,25013,45399870.0,45,4539,453998,POINT (-72.63168 42.08947)


In [14]:
# Reproject to a projected Coordinate Reference System (CRS) measured in metres,
# so that Euclidean distances between points are meaningful
# Check for different projections here: https://epsg.io/
gdf = gdf.to_crs('esri:102003')

In [15]:
# Check that the CRS actually changed
gdf.crs

<Projected CRS: ESRI:102003>
Name: USA_Contiguous_Albers_Equal_Area_Conic
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: United States (USA) - CONUS onshore - Alabama; Arizona; Arkansas; California; Colorado; Connecticut; Delaware; Florida; Georgia; Idaho; Illinois; Indiana; Iowa; Kansas; Kentucky; Louisiana; Maine; Maryland; Massachusetts; Michigan; Minnesota; Mississippi; Missouri; Montana; Nebraska; Nevada; New Hampshire; New Jersey; New Mexico; New York; North Carolina; North Dakota; Ohio; Oklahoma; Oregon; Pennsylvania; Rhode Island; South Carolina; South Dakota; Tennessee; Texas; Utah; Vermont; Virginia; Washington; West Virginia; Wisconsin; Wyoming.
- bounds: (-124.79, 24.41, -66.91, 49.38)
Coordinate Operation:
- name: USA_Contiguous_Albers_Equal_Area_Conic
- method: Albers Equal Area
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
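
Reprojecting matters because a degree is not a fixed length on the ground. The sketch below illustrates this with the haversine formula, used here only as an approximation on a sphere (the radius value and the example coordinates are assumptions). It compares the length of one degree of longitude at a southern and a northern latitude within the contiguous US.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres on a sphere (an approximation)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# One degree of longitude at latitude 25 and at latitude 47
south = haversine_m(25.0, -80.0, 25.0, -79.0)
north = haversine_m(47.0, -122.0, 47.0, -121.0)
print(f"1 degree of longitude: {south / 1000:.1f} km at 25N, {north / 1000:.1f} km at 47N")
```

Treating degrees as if they were planar units would therefore stretch or shrink distances depending on latitude, which a metre-based projection avoids.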

In [16]:
# Keep only the rows whose geometry is not empty
gdf = gdf[~gdf.is_empty]

In [17]:
"The data contains " + str(len(gdf)) + " rows."

'The data contains 1811016 rows.'

#### 4.3 Add more data: schools and parks

In [18]:
# Add 2011 GreatSchools school data (can use other sources)
sch = gpd.read_file('../data/GreatSchools_2011_us48/GreatSchools_2011_us48.shp') 
sch = sch.to_crs('esri:102003')
#2021 ESRI parks data (centroids)
prk = gpd.read_file('../data/Centroids_for_USA_Parks_2021_Buffer2/Centroids_for_USA_Parks_2021_Buffer2.shp') 
prk = prk.to_crs('esri:102003')

In [19]:
lst=[gdf,sch,prk]
am=pd.concat(lst, ignore_index=True, axis=0)
am["ID"] = am.index
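
When stacking several sources, it can help to record where each row came from. The sketch below shows one way to do that with the `keys` argument of `pd.concat`. The three small tables and the `source` column are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-ins for the business, school, and park tables
toy_biz = pd.DataFrame({"name": ["cafe", "bank"], "x": [0.0, 10.0], "y": [0.0, 5.0]})
toy_sch = pd.DataFrame({"name": ["school"], "x": [3.0], "y": [4.0]})
toy_prk = pd.DataFrame({"name": ["park"], "x": [7.0], "y": [1.0]})

# keys= adds an outer index level naming the source of each row
stacked = pd.concat([toy_biz, toy_sch, toy_prk],
                    keys=["business", "school", "park"],
                    names=["source", "row"])

# Move the source label into a column and assign a fresh running ID
amenities = stacked.reset_index(level="source").reset_index(drop=True)
amenities["ID"] = amenities.index
print(amenities)
```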

In [20]:
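# TODO (Irene): change this later.
# Keep only the geometry column of the candidate points.
# Note: this uses gdf (businesses only); the combined table am, which also holds schools and parks, is not used here.
am_id = gdf[['geometry']]
am_id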

Unnamed: 0,geometry
2,POINT (1903292.607 747815.533)
4,POINT (1900043.044 748596.863)
8,POINT (1900672.943 749003.317)
16,POINT (1900885.474 748973.976)
26,POINT (1902633.436 746023.805)
...,...
11226983,POINT (-1609892.81 1175691.993)
11226994,POINT (-1608627.525 1175461.963)
11226995,POINT (-1612397.437 1173531.478)
11226996,POINT (-1609259.248 1175625.435)


### 5. Load the geography!

#### 5.1. In this case, we load block groups

In [21]:
# Load geography (oftentimes as shapefile).
# Block group file we're using in this case - one spatial definition of demand units for all time periods
s_v = gpd.read_file('../data/nhgis0022_shape/nhgis0022_shapefile_tl2015_us_blck_grp_2015/US_blck_grp_2015.shp')


In [22]:
# Check the data
s_v.head(2)

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,GEOID,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,INTPTLAT,INTPTLON,GISJOIN,Shape_Leng,Shape_Area,geometry
0,6,1,400100,1,60014001001,Block Group 1,G5030,S,6894340.0,0.0,37.8676275,-122.231946,G06000104001001,14302.720874,6894336.0,"POLYGON ((-2255602.272 353149.335, -2255597.39..."
1,6,1,400200,1,60014002001,Block Group 1,G5030,S,288960.0,0.0,37.8497418,-122.2488605,G06000104002001,2970.286365,288961.4,"POLYGON ((-2258184.246 353217.527, -2258186.81..."


In [23]:
#Size of the dataset
len(s_v)

219768

In [24]:
# Set the Coordinate Reference System (CRS). Note that set_crs only labels the existing coordinates; it does not reproject them.
s_v = s_v.set_crs('esri:102003', allow_override=True) # Set the Coordinate Reference System
s_v.rename(columns={'GEOID': 'ID'}, inplace=True) # Rename the columns for convenience

In [25]:
# Extract the centroids of the polygons.
# Replace the column "geometry" with the centroids of geography.
# This will change the geometry from "polygon" to "point" geometry.
s_v['geometry'] = s_v.centroid

In [26]:
# Check that the geometry is indeed in point form
s_v[['geometry']].head(3)

Unnamed: 0,geometry
0,POINT (-2256868.242 354675.748)
1,POINT (-2258832.974 353148.92)
2,POINT (-2259050.925 352843.123)
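
For intuition about what a centroid is, the sketch below computes the area-weighted centroid of a simple polygon with the shoelace formula. It is a plain-Python illustration under the assumption of a simple (non-self-intersecting) polygon without holes; `polygon_centroid` is a hypothetical helper, not part of geopandas. Note that for concave shapes a centroid can fall outside the polygon.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)

# An L-shaped polygon made of three unit squares
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
centroid_x, centroid_y = polygon_centroid(l_shape)
print(centroid_x, centroid_y)
```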


#### 5.2 Create subsets of data to *avoid* computing irrelevant distances.

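In this case, we split the contiguous US block groups into smaller subsets so that each batch of distance computations stays manageable. We only need distances to nearby amenities: there is no reason to estimate the distance between a business in California and a block group in New York.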

In [27]:
# We split s_v into smaller datasets to make the processing more efficient.
num_splits = 5
chunk_size = len(s_v) // num_splits

# Create the smaller dataframes
sv = []

for i in range(num_splits):
    start_idx = i * chunk_size
    if i == num_splits - 1:
        # Ensure the last dataframe includes any remaining rows
        end_idx = len(s_v)
    else:
        end_idx = (i + 1) * chunk_size
    
    chunk = s_v.iloc[start_idx:end_idx].reset_index(drop=True)
    sv.append(chunk)

# Access the smaller dataframes as sv[0], sv[1], ..., sv[4]
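
An alternative way to build the chunks is sketched below with `np.array_split`, applied to row positions. It is a minimal example on a hypothetical 11-row table. Unlike the loop above, which gives all leftover rows to the last chunk, this spreads them across the first chunks.

```python
import numpy as np
import pandas as pd

# Hypothetical table standing in for the block group centroids
toy = pd.DataFrame({"ID": range(11)})
num_splits = 5

# Split the row positions into nearly equal groups, then slice by position
parts = np.array_split(np.arange(len(toy)), num_splits)
chunks = [toy.iloc[idx].reset_index(drop=True) for idx in parts]

print([len(c) for c in chunks])
```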

In [28]:
sv[0].head(3)

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,ID,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,INTPTLAT,INTPTLON,GISJOIN,Shape_Leng,Shape_Area,geometry
0,6,1,400100,1,60014001001,Block Group 1,G5030,S,6894340.0,0.0,37.8676275,-122.231946,G06000104001001,14302.720874,6894336.0,POINT (-2256868.242 354675.748)
1,6,1,400200,1,60014002001,Block Group 1,G5030,S,288960.0,0.0,37.8497418,-122.2488605,G06000104002001,2970.286365,288961.4,POINT (-2258832.974 353148.92)
2,6,1,400200,2,60014002002,Block Group 2,G5030,S,298490.0,0.0,37.8465865,-122.2503095,G06000104002002,3162.343955,298488.7,POINT (-2259050.925 352843.123)


### 6. We have the data ready, let's create the access score!

#### 6.1. Find the k nearest POI points to each block group

In [29]:
# This cell creates a function for estimating nearest neighbors from point to point.
def get_nearest_neighbors(gdf1, gdf2, k_neighbors=2):
    '''Find k nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
    Parameters
    ----------
    gdf1 : geopandas.GeoDataFrame
        Geometries to search from. Expected to have a default 0..n-1 index,
        because the results are joined to it by row position.
    gdf2 : geopandas.GeoDataFrame
        Geometries to be searched.
    k_neighbors : int, optional
        Number of nearest neighbors. The default is 2.
    Returns
    -------
    gdf_final : geopandas.GeoDataFrame
        gdf1 with distance, index and all other columns from gdf2.'''

    src_points = [(x,y) for x,y in zip(gdf1.geometry.x , gdf1.geometry.y)]
    candidates =  [(x,y) for x,y in zip(gdf2.geometry.x , gdf2.geometry.y)]

    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')

    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)

    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()

    closest_gdfs = []
    for k in np.arange(k_neighbors):
        gdf_new = gdf2.iloc[indices[k]].reset_index()
        gdf_new['distance'] =  distances[k]
        gdf_new = gdf_new.add_suffix(f'_{k+1}')
        closest_gdfs.append(gdf_new)
    
    closest_gdfs.insert(0,gdf1)    
    gdf_final = pd.concat(closest_gdfs,axis=1)

    return gdf_final
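
To see what a k-nearest-neighbor query returns, the sketch below computes the same kind of result by brute force with NumPy. It is a swapped-in technique for small inputs only: it compares every source with every candidate, which a tree avoids. The `brute_force_knn` helper and the toy coordinates are assumptions for illustration.

```python
import numpy as np

def brute_force_knn(sources, candidates, k):
    """Return (distances, indices) of the k nearest candidates per source."""
    sources = np.asarray(sources, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    diff = sources[:, None, :] - candidates[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # shape (n_sources, n_candidates)
    order = np.argsort(dist, axis=1)[:, :k]   # k closest candidates per row
    return np.take_along_axis(dist, order, axis=1), order

# Two hypothetical centroids and four hypothetical amenities (projected metres)
centroids = [(0.0, 0.0), (10.0, 0.0)]
amenity_points = [(1.0, 0.0), (0.0, 3.0), (9.0, 0.0), (10.0, 4.0)]

distances, indices = brute_force_knn(centroids, amenity_points, k=2)
print(distances)
print(indices)
```

Both arrays have one row per source point and one column per neighbor, ordered from nearest to farthest. The function above transposes arrays of this shape so that it can loop over neighbors.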

In [30]:
# Find the k closest amenities for each block group, along with their Euclidean distances
# One call per subset of the US block groups
closest_am0 = get_nearest_neighbors(sv[0], am_id, k_neighbors=150)
closest_am1 = get_nearest_neighbors(sv[1], am_id, k_neighbors=150)
closest_am2 = get_nearest_neighbors(sv[2], am_id, k_neighbors=150)
closest_am3 = get_nearest_neighbors(sv[3], am_id, k_neighbors=150)
closest_am4 = get_nearest_neighbors(sv[4], am_id, k_neighbors=150)

In [31]:
# Take a look at one table of the results:
closest_am0.head(3)

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,ID,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,...,distance_147,index_148,geometry_148,distance_148,index_149,geometry_149,distance_149,index_150,geometry_150,distance_150
0,6,1,400100,1,60014001001,Block Group 1,G5030,S,6894340.0,0.0,...,2314.808669,10488966,POINT (-2259066.974 355401.818),2315.513241,10488668,POINT (-2259085.291 355344.569),2315.734962,10488567,POINT (-2259166.639 354959.307),2315.822878
1,6,1,400200,1,60014002001,Block Group 1,G5030,S,288960.0,0.0,...,1004.184118,10489635,POINT (-2258909.537 354150.844),1004.845322,10489734,POINT (-2258909.537 354150.844),1004.845322,10489803,POINT (-2258909.331 354151.252),1005.236012
2,6,1,400200,2,60014002002,Block Group 2,G5030,S,298490.0,0.0,...,995.224112,10476608,POINT (-2259814.242 353506.137),1011.059167,10476476,POINT (-2259696.273 353634.617),1021.242829,10484252,POINT (-2259374.485 351872.226),1023.392104


In [32]:
def clean_dataframe(df):
    # Create the ID2 column, a unique row identifier required by wide_to_long
    df["ID2"] = df.index

    # Reshape the dataframe from wide to long format: one row per (block group, neighbor) pair
    long_df = pd.wide_to_long(df, stubnames=["distance_", "index_", "geometry_"], i="ID2", j="neighbor")

    # Copy the relevant columns under clearer names
    long_df['origin'] = long_df['ID']
    long_df['dest'] = long_df['index_']
    long_df['euclidean'] = long_df['distance_']

    # Reset index so that the neighbor rank becomes a regular column
    long_df = long_df.reset_index(level="neighbor")

    # Keep the necessary columns and sort by origin and euclidean distance.
    # sort_values returns a new dataframe here, which avoids the
    # SettingWithCopyWarning raised when sorting a slice in place.
    cost_df = long_df[['euclidean', 'origin', 'dest', 'neighbor']].sort_values(by=['origin', 'euclidean'])

    return cost_df
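
The reshape step can be hard to picture on a table with 150 neighbors, so the sketch below applies `pd.wide_to_long` to a toy table with two neighbors. The block group IDs and values are made up for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per block group, one column set per neighbor
wide = pd.DataFrame({
    "ID": ["bg_a", "bg_b"],
    "index_1": [7, 3], "distance_1": [120.0, 80.0],
    "index_2": [9, 5], "distance_2": [300.0, 95.0],
})
wide["ID2"] = wide.index

# Columns named <stub><number> are stacked; the number becomes the 'neighbor' level
long_table = pd.wide_to_long(wide, stubnames=["distance_", "index_"], i="ID2", j="neighbor")
long_table = long_table.reset_index(level="neighbor").sort_values(["ID", "neighbor"])
print(long_table)
```

Each block group now appears once per neighbor, with the neighbor rank taken from the numeric suffix of the original column names.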


In [33]:
# Run function for each smaller dataframe
cost0 = clean_dataframe(closest_am0)
cost1 = clean_dataframe(closest_am1)
cost2 = clean_dataframe(closest_am2)
cost3 = clean_dataframe(closest_am3)
cost4 = clean_dataframe(closest_am4)


#### 6.2. Calculate accessibility measure

In [34]:
def access_measure(df_cost, df_sv, upper, decay):
    # Convert Euclidean distance (metres) into walking time (seconds) at 5 km/h
    # https://journals-sagepub-com.may.idm.oclc.org/doi/10.1177/0265813516641685
    df_cost['time'] = (df_cost['euclidean'] * 3600) / 5000
    
    # Calculate LogitT_5, a logistic weight between 0 and 1 that decreases as walking time grows
    df_cost['LogitT_5'] = 1 - (1 / (np.exp((upper / 180) - decay * df_cost['time']) + 1))
    
    # Sum by block group (origin) ID; the summed LogitT_5 weights give the access measure
    cost_sum = df_cost.groupby("origin").sum()
    cost_sum['ID'] = cost_sum.index
    
    # Merge with the corresponding smaller sv original dataframe
    cost_merge = df_sv.merge(cost_sum, how='inner', on='ID')
    
    return cost_merge
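
To get a feel for the weighting, the sketch below evaluates the same formula for a few distances, using the `upper=800` and `decay=.005` values from this notebook. The `logit_weight` helper and the sample distances are assumptions introduced for illustration.

```python
import numpy as np

def logit_weight(distance_m, upper=800, decay=0.005, speed_m_per_h=5000):
    """Weight in (0, 1) that falls as walking time (seconds) grows."""
    time_s = np.asarray(distance_m, dtype=float) * 3600 / speed_m_per_h
    return 1 - 1 / (np.exp(upper / 180 - decay * time_s) + 1)

sample_distances = [0, 400, 800, 1600, 3200]  # metres
weights = logit_weight(sample_distances)

for d, w in zip(sample_distances, weights):
    print(f"{d:>5} m -> weight {w:.3f}")
```

Nearby amenities receive a weight close to 1 and distant ones a weight close to 0, so the sum over the 150 nearest amenities rewards block groups with many amenities within a short walk.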

In [35]:
# choose 'upper' parameter (for testing)
# upper = 800
# upper = 1600
# upper = 2400

# choose decay rate
# decay = .005
# decay = .008
# decay = .01

result0 = access_measure(cost0, sv[0], upper=800, decay=.005)
result1 = access_measure(cost1, sv[1], upper=800, decay=.005)
result2 = access_measure(cost2, sv[2], upper=800, decay=.005)
result3 = access_measure(cost3, sv[3], upper=800, decay=.005)
result4 = access_measure(cost4, sv[4], upper=800, decay=.005)
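
One way to compare the candidate parameters is to ask at what distance an amenity counts for half. The formula gives a weight of exactly 0.5 when the exponent is zero, that is, when the walking time equals `(upper / 180) / decay` seconds. The sketch below converts that time back to metres for each combination listed above; `half_weight_distance` is a hypothetical helper.

```python
from itertools import product

uppers = [800, 1600, 2400]
decays = [0.005, 0.008, 0.01]

def half_weight_distance(upper, decay, speed_m_per_h=5000):
    """Distance in metres at which the logistic weight equals 0.5."""
    time_s = (upper / 180) / decay        # the exponent is zero at this walking time
    return time_s * speed_m_per_h / 3600  # seconds back to metres at 5 km/h

half = {(u, d): half_weight_distance(u, d) for u, d in product(uppers, decays)}
for (u, d), metres in half.items():
    print(f"upper={u}, decay={d}: weight 0.5 at about {metres:.0f} m")
```

A larger `upper` pushes the half-weight distance outward, while a larger `decay` pulls it inward.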

In [36]:
result0.head(3)

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,ID,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,...,INTPTLON,GISJOIN,Shape_Leng,Shape_Area,geometry,euclidean,dest,neighbor,time,LogitT_5
0,6,1,400100,1,60014001001,Block Group 1,G5030,S,6894340.0,0.0,...,-122.231946,G06000104001001,14302.720874,6894336.0,POINT (-2256868.242 354675.748),314099.971161,1573450298,11325,226151.979236,10.921857
1,6,1,400200,1,60014002001,Block Group 1,G5030,S,288960.0,0.0,...,-122.2488605,G06000104002001,2970.286365,288961.4,POINT (-2258832.974 353148.92),100801.29457,1572955376,11325,72576.93209,127.793726
2,6,1,400200,2,60014002002,Block Group 2,G5030,S,298490.0,0.0,...,-122.2503095,G06000104002002,3162.343955,298488.7,POINT (-2259050.925 352843.123),82252.722098,1572574466,11325,59221.959911,133.310892


In [37]:
dataframes = [result0, result1, result2, result3, result4]
combined_df = pd.concat(dataframes, ignore_index=True)

In [38]:
len(combined_df)

219768
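
The combined table has the same number of rows as the block group file loaded in step 5, so the inner merge dropped no block groups. The sketch below shows how such checks could be automated on toy data. The `check_recombined` helper and the toy tables are assumptions for illustration.

```python
import pandas as pd

def check_recombined(parts, combined, id_col="ID"):
    """Return basic consistency checks after concatenating chunks."""
    return {
        "row_count_matches": len(combined) == sum(len(p) for p in parts),
        "ids_unique": bool(combined[id_col].is_unique),
    }

# Hypothetical per-chunk results
part_a = pd.DataFrame({"ID": [1, 2], "LogitT_5": [0.4, 1.2]})
part_b = pd.DataFrame({"ID": [3], "LogitT_5": [2.5]})
toy_combined = pd.concat([part_a, part_b], ignore_index=True)

checks = check_recombined([part_a, part_b], toy_combined)
print(checks)
```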

In [39]:
combined_df.head(3)

Unnamed: 0,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,ID,NAMELSAD,MTFCC,FUNCSTAT,ALAND,AWATER,...,INTPTLON,GISJOIN,Shape_Leng,Shape_Area,geometry,euclidean,dest,neighbor,time,LogitT_5
0,6,1,400100,1,60014001001,Block Group 1,G5030,S,6894340.0,0.0,...,-122.231946,G06000104001001,14302.720874,6894336.0,POINT (-2256868.242 354675.748),314099.971161,1573450298,11325,226151.979236,10.921857
1,6,1,400200,1,60014002001,Block Group 1,G5030,S,288960.0,0.0,...,-122.2488605,G06000104002001,2970.286365,288961.4,POINT (-2258832.974 353148.92),100801.29457,1572955376,11325,72576.93209,127.793726
2,6,1,400200,2,60014002002,Block Group 2,G5030,S,298490.0,0.0,...,-122.2503095,G06000104002002,3162.343955,298488.7,POINT (-2259050.925 352843.123),82252.722098,1572574466,11325,59221.959911,133.310892
