In this second lab, we'll run spatial statistics on the data we mapped in Lab 1. We'll also add more demographic data and transform some of the Lab 1 layers.

First, let's summarize what we did in Lab 1:
1. Set up the Python environment and project workspace, importing the necessary libraries and defining the output geodatabase and map object.
2. Add JSON files for Library Locations and Park Locations data.
3. Convert School Locations and Youth Services Locations from XY table coordinates to points.
4. Select by attribute to narrow down the relevant Youth Services Locations using the 'factype' field from the Facilities Database.

After cleaning up some of the data from Lab 1, we'll import census tract boundaries and income information for tracts in New York City to analyze the relationship between income and school proximity to third places.

Open your project from Lab 1 and make sure the workspace and output geodatabase are set up correctly:

In [9]:
import arcpy
import os
import pandas
import geopandas

# Set these to whatever folder structure you are using for this project. I keep a
# separate folder for each lab, but you can do them all in one project.
arcpy.env.workspace = r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis"
default_folder = r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis\Data"
output_gdb = r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis\Lab2_NYTeens_Analysis.gdb"

aprx = arcpy.mp.ArcGISProject('CURRENT')  # the project currently open in ArcGIS Pro
arcpy.env.overwriteOutput = True  # allows geoprocessing tools to overwrite existing outputs
map = aprx.listMaps("NYTeensAnalysis")[0]  # note: this name shadows Python's built-in map()
crs = map.spatialReference  # the current map's coordinate system
arcpy.env.outputCoordinateSystem = crs  # write tool outputs in the map's coordinate system
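Windows paths in these cells are written as raw strings (the `r` prefix). In an ordinary Python string, a backslash starts an escape sequence, so a segment such as `\tl_2023_36_tract.shp` silently turns into a TAB character. The short sketch below shows this with hypothetical paths. It also shows how output paths can be built from a geodatabase variable instead of being retyped. `ntpath` is the module that `os.path` refers to on Windows, so it behaves the same way here on any operating system.

```python
import ntpath  # os.path is ntpath on Windows; importing it directly runs anywhere

# Hypothetical path: in an ordinary string, "\t" is an escape for TAB.
plain = "C:\tl_2023_36_tract.shp"
raw = r"C:\tl_2023_36_tract.shp"

print(repr(plain))  # the backslash and "t" have become a single tab character
print(repr(raw))    # the backslash is kept as-is

# Building a full output path from a (hypothetical) geodatabase path.
gdb = r"G:\Project\Project.gdb"
merge_path = ntpath.join(gdb, "Third_Places_Merge")
print(merge_path)
```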

The current map is quite busy with all the different point layers. Start by turning off the Parks layer (uncheck its box in the Contents pane), since we won't use it in the spatial analysis.

Next, we'll merge the Youth Services Locations and Library Locations layers into a single Third_Places_Merge layer.

In [None]:
arcpy.management.Merge(
    inputs="YouthServicesLocations_Feature;'Public Library Locations'",
    output=r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis\Lab2_NYTeens_Analysis.gdb\Third_Places_Merge",
    field_mappings=None,
    add_source="NO_SOURCE_INFO",
    field_match_mode="AUTOMATIC"
)

Turn off the separate Youth Services Locations and Library Locations layers, so that only School Locations and Third_Places_Merge are visible.

Now we'll select the third places within 0.5 miles of a school, since those are the ones we're interested in. Select Layer By Location measures straight-line distance, so this only approximates walking distance; it is not a route along streets. Open the Geoprocessing pane and search for the Select Layer By Location tool. Set Input Features to Third_Places_Merge, Relationship to Within a distance, Selecting Features to School Locations, and Search Distance to 0.5 Miles.

In [None]:
arcpy.management.SelectLayerByLocation(
    in_layer="Third_Places_Merge",
    overlap_type="WITHIN_A_DISTANCE",
    select_features="School Locations",
    search_distance="0.5 Miles",
    selection_type="NEW_SELECTION",
    invert_spatial_relationship="NOT_INVERT"
)
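The sketch below approximates what the Within a distance relationship computes. It keeps every candidate point whose straight-line distance to any selecting point is at most the search radius. It is a brute-force illustration on made-up projected coordinates in meters, not how ArcGIS implements the tool. It also works out how far the international mile (1,609.344 m) and the US survey mile (5,280 US survey feet of 1200/3937 m each) differ over half a mile. The difference is about 1.6 mm, so choosing either unit has no practical effect here.

```python
import math

INTERNATIONAL_MILE_M = 1609.344          # exact by definition
US_SURVEY_MILE_M = 5280 * 1200 / 3937    # 5,280 US survey feet of 1200/3937 m each

radius_m = 0.5 * INTERNATIONAL_MILE_M
difference_mm = 0.5 * (US_SURVEY_MILE_M - INTERNATIONAL_MILE_M) * 1000
print(f"0.5 mi = {radius_m:.1f} m; survey vs. international differ by {difference_mm:.2f} mm")


def select_within_distance(candidates, selectors, radius):
    """Return names of candidate points within `radius` (straight line) of any selector.

    Coordinates are assumed to be in a projected system whose units match `radius`.
    """
    selected = []
    for name, (x, y) in candidates.items():
        if any(math.hypot(x - sx, y - sy) <= radius for sx, sy in selectors):
            selected.append(name)
    return selected


# Made-up coordinates in meters, for illustration only.
schools = [(0.0, 0.0), (5000.0, 0.0)]
third_places = {
    "library_a": (300.0, 400.0),      # 500 m from the first school
    "youth_center_b": (2000.0, 0.0),  # 2,000 m and 3,000 m from the two schools
    "library_c": (5600.0, 500.0),     # about 781 m from the second school
}
selected = select_within_distance(third_places, schools, radius_m)
print(selected)
```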

Create a new layer from this selection (right-click Third_Places_Merge in the Contents pane and choose Selection > Make Layer From Selected Features) and name it Third Places Within Walking Distance. If necessary, change its symbology so you can tell it apart from the original Third_Places_Merge layer.
Turn off the original Third_Places_Merge layer.

Import the TIGER/Line census tract shapefile for New York State, then use a definition query to limit it to the five counties that make up New York City.

In [13]:
map.addDataFromPath(r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis\tl_2023_36_tract.shp")

<arcpy._mp.Layer at 0x1a7688309d0>

Right-click the tl_2023_36_tract layer in the Contents pane and go to Properties > Definition Query. Build a query where COUNTYFP is equal to 005, 047, 061, 081, or 085 (the Bronx, Kings, New York, Queens, and Richmond counties), then apply it.
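The same filter can be written as a single SQL `IN` clause, which is easier to check than a chain of ORs. COUNTYFP is a text field in the TIGER/Line files, so the codes are quoted to keep their leading zeros. The helper `county_where_clause` and the sample tract records below are hypothetical and are shown only for illustration. The printed clause could be pasted into the definition query's SQL view. Assigning it to an arcpy `Layer.definitionQuery` property is another option, but check that approach against your ArcGIS Pro version before relying on it.

```python
# County FIPS codes for the five boroughs (New York State is FIPS 36).
NYC_COUNTY_FIPS = {
    "005": "Bronx",
    "047": "Kings (Brooklyn)",
    "061": "New York (Manhattan)",
    "081": "Queens",
    "085": "Richmond (Staten Island)",
}


def county_where_clause(field, codes):
    """Build a SQL WHERE clause such as COUNTYFP IN ('005', '047')."""
    quoted = ", ".join(f"'{code}'" for code in sorted(codes))
    return f"{field} IN ({quoted})"


where = county_where_clause("COUNTYFP", NYC_COUNTY_FIPS)
print(where)

# Made-up tract records to show what the filter keeps.
tracts = [
    {"GEOID": "36005000100", "COUNTYFP": "005"},
    {"GEOID": "36059000100", "COUNTYFP": "059"},  # Nassau County, outside NYC
    {"GEOID": "36047000100", "COUNTYFP": "047"},
]
nyc_tracts = [t for t in tracts if t["COUNTYFP"] in NYC_COUNTY_FIPS]
print(len(nyc_tracts))
```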

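Next, add the ACS income table to the map manually (Map > Add Data > Browse, then add the ACSST5Y2023.S1903-Data.csv file). Join its S1903_C03_001E field (the median income estimate) to the census tract shapefile by matching the tracts' GEOIDFQ field to the table's GEO_ID field.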

In [21]:
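arcpy.management.JoinField(
    in_data="tl_2023_36_tract",               # census tract layer (NYC counties only)
    in_field="GEOIDFQ",                       # fully qualified tract GEOID in TIGER/Line
    join_table="ACSST5Y2023.S1903-Data.csv",  # ACS S1903 table added to the map
    join_field="GEO_ID",                      # matching ID field in the ACS table
    fields="S1903_C03_001E",                  # median income estimate
    fm_option="NOT_USE_FM",
    field_mapping=None,
    index_join_fields="NO_INDEXES"
)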
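ACS estimate columns do not always hold plain numbers. Census downloads can use annotation values such as `-` for "no estimate", or top-coded values such as `250,000+`. If a column contains these, ArcGIS may read the whole column as text, and the symbology and statistics steps later in this lab need a numeric field. The cleaning function below is a hypothetical sketch that assumes those annotation formats. It also shows why the join works: TIGER/Line `GEOIDFQ` and ACS `GEO_ID` values share the same `1400000US` prefix followed by the 11-digit tract GEOID. All IDs and values are made up, and `median_income` is a hypothetical field name.

```python
def parse_acs_estimate(value):
    """Convert an ACS estimate string to a float, or None when there is no estimate.

    Assumed annotations: "", "-", "N", or "(X)" mean no estimate, and a trailing
    "+" or "-" marks a top- or bottom-coded value (kept at the coded limit).
    """
    text = value.strip()
    if text in {"", "-", "N", "(X)"}:
        return None
    text = text.rstrip("+-").replace(",", "")
    return float(text)


# Made-up rows shaped like the ACS table and the tract attribute table.
acs_rows = [
    {"GEO_ID": "1400000US36061000100", "S1903_C03_001E": "250,000+"},
    {"GEO_ID": "1400000US36061000201", "S1903_C03_001E": "-"},
    {"GEO_ID": "1400000US36005000200", "S1903_C03_001E": "61234"},
]
tracts = [
    {"GEOIDFQ": "1400000US36061000100"},
    {"GEOIDFQ": "1400000US36005000200"},
    {"GEOIDFQ": "1400000US36081000300"},  # no ACS row: stays None, like an unmatched join
]

# Look up each tract's income by its fully qualified GEOID.
income_by_geoid = {row["GEO_ID"]: parse_acs_estimate(row["S1903_C03_001E"]) for row in acs_rows}
for tract in tracts:
    tract["median_income"] = income_by_geoid.get(tract["GEOIDFQ"])
print([t["median_income"] for t in tracts])
```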

Next, we'll count the third places in each census tract and then symbolize the tracts as a bivariate choropleth of median income and third-place count.

Use the Aggregate Points tool to count the third places within walking distance in each census tract.

In [None]:
arcpy.gapro.AggregatePoints(
    point_layer="Third Places Within Walking Distance",
    out_feature_class=r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis\Lab2_NYTeens_Analysis.gdb\ThirdPlaces_AggregatePoints2",
    polygon_or_bin="POLYGON",
    polygon_layer="tl_2023_36_tract",
    bin_type="",
    bin_size=None,
    time_step_interval=None,
    time_step_repeat=None,
    time_step_reference=None,
    summary_fields=None,
    bin_resolution=None
)
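Conceptually, aggregating points to polygons means running a point-in-polygon test and counting the points that fall inside each tract. The sketch below uses the even-odd ray-casting test on two made-up square "tracts". It is only an illustration, not the GeoAnalytics implementation, and it ignores edge cases such as points that lie exactly on a shared boundary. Every polygon starts at zero, so a tract with no third places gets a count of 0 rather than being left out. If the Aggregate Points output only includes tracts that contain at least one point, this is the behavior you would want to restore after joining the counts back.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_points(points, polygons):
    """Count points per polygon; polygons with no points keep a count of 0."""
    counts = {pid: 0 for pid in polygons}
    for x, y in points:
        for pid, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                counts[pid] += 1
                break  # assumes polygons do not overlap
    return counts


# Two made-up unit-square "tracts" side by side.
tracts = {
    "tract_1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "tract_2": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
third_places = [(0.2, 0.5), (0.8, 0.3), (5.0, 5.0)]  # the last point is outside both
print(aggregate_points(third_places, tracts))
```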

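Join the aggregated third-place counts back to the census tract shapefile so that we can map third places and median income together in a bivariate choropleth. Any tract with no match in the aggregated output will get a null COUNT from this join. If that happens, set those values to 0 (for example, with Calculate Field) before the analysis, because these tracts have no third places; the data is not missing.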

In [24]:
arcpy.management.JoinField(
    in_data="tl_2023_36_tract",
    in_field="GEOIDFQ",
    join_table="ThirdPlaces_AggregatePoints2",
    join_field="GEOIDFQ",
    fields="COUNT",
    fm_option="NOT_USE_FM",
    field_mapping=None,
    index_join_fields="NO_INDEXES"
)

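To create the bivariate choropleth, open the Symbology pane for the tl_2023_36_tract layer. Change Primary symbology from Single Symbol to Bivariate Colors, then set Field 1 to COUNT and Field 2 to the joined income variable. Shapefile field names are limited to 10 characters, so S1903_C03_001E may appear under a shortened name such as S1903.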
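A bivariate choropleth places each tract in one cell of a grid (for example, 3x3) based on its class for each of the two variables. The sketch below assigns classes using tertile breaks computed with `statistics.quantiles`. It labels cells "A1" (low count, low income) through "C3" (high count, high income). The break method, the labels, and the `income` field name are assumptions for illustration, and ArcGIS's own classification may differ. The tract values are made up.

```python
import statistics


def tertile_class(value, breaks):
    """Return 1, 2 or 3 depending on which tertile `value` falls in."""
    low, high = breaks
    if value <= low:
        return 1
    if value <= high:
        return 2
    return 3


def bivariate_classes(records, field1, field2):
    """Label each record with a 3x3 class: letter from field1, digit from field2."""
    breaks1 = statistics.quantiles([r[field1] for r in records], n=3)
    breaks2 = statistics.quantiles([r[field2] for r in records], n=3)
    labels = {}
    for r in records:
        row = "ABC"[tertile_class(r[field1], breaks1) - 1]
        col = tertile_class(r[field2], breaks2)
        labels[r["id"]] = f"{row}{col}"
    return labels


# Made-up tracts: third-place COUNT and a hypothetical income field.
tracts = [
    {"id": "t1", "COUNT": 0, "income": 200000},
    {"id": "t2", "COUNT": 5, "income": 30000},
    {"id": "t3", "COUNT": 2, "income": 80000},
    {"id": "t4", "COUNT": 1, "income": 60000},
    {"id": "t5", "COUNT": 0, "income": 45000},
    {"id": "t6", "COUNT": 3, "income": 120000},
]
labels = bivariate_classes(tracts, "COUNT", "income")
print(labels)
```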

Now we'll run the Bivariate Spatial Association (Lee's L) tool to test whether median income and the number of free third places in each census tract have a statistically significant spatial association.

In [26]:
arcpy.stats.BivariateSpatialAssociation(
    in_features="tl_2023_36_tract",
    analysis_field1="S1903",   # joined income field; use the name shown in the attribute table
    analysis_field2="COUNT",   # third places per tract from Aggregate Points
    out_features=r"G:\My Drive\INFO612_Advanced_GIS\Lab2_NYTeens_Analysis\Lab2_NYTeens_Analysis.gdb\tl_2023_36_tract_BivariateSpatialAssociation",
    neighborhood_type="CONTIGUITY_EDGES_CORNERS",  # tracts sharing an edge or corner are neighbors
    distance_band=None,
    num_neighbors=None,
    weights_matrix_file=None,
    local_weighting_scheme="UNWEIGHTED",
    kernel_bandwidth=None,
    num_permutations=499       # random permutations used to assess significance
)

The Lee's L summary suggests an essentially negligible negative association between income and the number of third places. The Global Lee's L is -0.0425, the raw Pearson correlation is -0.0757, and the neighborhood averages correlation is -0.1151; together they indicate at most a very slight negative relationship. The spatial smoothing scalar is also much higher for income than for the third-place counts. This means income is far more spatially autocorrelated (clustered) across tracts than the locations of these third places are.
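To see what each number in the summary measures, the quantities can be reproduced in miniature. The sketch below computes global Lee's L with binary contiguity weights that are row-standardized, so each neighborhood value is a plain average. Several choices here are assumptions: the focal feature is included in its own neighborhood, the dataset is tiny and made up, and the significance test shuffles one variable only (one of several possible null models). The numbers will therefore not match ArcGIS output exactly. In this sketch, the "neighborhood averages correlation" is taken to be the Pearson correlation of the smoothed values. The spatial smoothing scalar (SSS) of a variable is Lee's L of that variable with itself, and the pseudo p-value is the share of shuffles whose |L| is at least the observed |L|.

```python
import math
import random


def smooth(values, neighbors, include_self=True):
    """Neighborhood averages using row-standardized binary contiguity weights."""
    out = []
    for i, nbrs in enumerate(neighbors):
        hood = list(nbrs) + ([i] if include_self else [])
        out.append(sum(values[j] for j in hood) / len(hood))
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lees_l(x, y, neighbors):
    """Global Lee's L; with row-standardized weights the n / sum_i (sum_j w_ij)^2 factor is 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = [v - mx for v in smooth(x, neighbors)]  # smoothed deviations from the mean
    sy = [v - my for v in smooth(y, neighbors)]
    num = sum(a * b for a, b in zip(sx, sy))
    den = math.sqrt(sum((v - mx) ** 2 for v in x) * sum((v - my) ** 2 for v in y))
    return num / den


def sss(x, neighbors):
    """Spatial smoothing scalar: Lee's L of a variable with itself."""
    return lees_l(x, x, neighbors)


def pseudo_p_value(x, y, neighbors, permutations=499, seed=0):
    """Share of shuffles of y with |L| >= observed |L|, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = abs(lees_l(x, y, neighbors))
    y_perm = list(y)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(y_perm)
        if abs(lees_l(x, y_perm, neighbors)) >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)


# Made-up example: five tracts in a row, each touching the next.
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
income = [30, 40, 55, 70, 90]  # thousands of dollars, made up
count = [3, 2, 2, 1, 0]        # third places per tract, made up

print("Pearson r:", round(pearson(income, count), 3))
print("Neighborhood averages r:", round(pearson(smooth(income, neighbors), smooth(count, neighbors)), 3))
print("Lee's L:", round(lees_l(income, count, neighbors), 3))
print("SSS income, count:", round(sss(income, neighbors), 3), round(sss(count, neighbors), 3))
print("pseudo p:", pseudo_p_value(income, count, neighbors))
```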