## Spatial analysis in Python

![image-3.png](attachment:image-3.png)

## 1. Data Collection 

### 1.1 Download Data

[Crime Incident Report 2022](https://data.boston.gov/dataset/crime-incident-reports-august-2015-to-date-source-new-system)

[Census Tract 2020 Boston](https://data.boston.gov/dataset/2020-census-tracts-in-boston)

Some census tracts were removed because they have a population of zero.

In [5]:
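import os
import arcpy  # ArcGIS Pro geoprocessing tools (used from section 1.4 on)
import pandas as pd
from arcgis import GIS
from arcgis.features import GeoAccessor, GeoSeriesAccessor  # registers the .spatial accessor on DataFrames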

### 1.2 Load Data

`os.getcwd()` returns the current working directory, and `os.path.join()` builds the full path to the CSV file inside it.

In [2]:
paths = os.path.join(os.getcwd(),  "CrimeIncident.csv")
paths

'F:\\Clark_Universiy\\Clark_Teaching\\Git_Repo\\ssj-302\\docs\\Lectures\\Week08_spatailAnalysis\\Week08_Student_SpatialAnalysis\\CrimeIncident.csv'

In [3]:
csv_file = pd.read_csv(paths)  # read the file located in the previous cell
csv_file.head()



| Unnamed: 0 | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 222076257 | 619 | | LARCENY ALL OTHERS | D4 | 167.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | HARRISON AVE | 42.339542 | -71.069409 | (42.33954198983014, -71.06940876967543) |
| 1 | 222053099 | 2670 | | HARASSMENT/ CRIMINAL HARASSMENT | A7 | | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | BENNINGTON ST | 42.377246 | -71.032597 | (42.37724638479816, -71.0325970804128) |
| 2 | 222039411 | 3201 | | PROPERTY - LOST/ MISSING | D14 | 778.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 3 | 222011090 | 3201 | | PROPERTY - LOST/ MISSING | B3 | 465.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| 4 | 222062685 | 3201 | | PROPERTY - LOST/ MISSING | B3 | 465.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |


The `dtype` parameter of `read_csv()` sets the data type for either the whole dataset or individual columns.
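Below is a minimal sketch of the two ways to pass `dtype`. It runs on a hypothetical two-row inline sample rather than on the real file. The leading zeros in `OFFENSE_CODE` are invented to show what reading a column as text preserves.

```python
import io
import pandas as pd

# Hypothetical sample standing in for CrimeIncident.csv
# (OFFENSE_CODE written with leading zeros for illustration only)
CSV_TEXT = (
    "INCIDENT_NUMBER,OFFENSE_CODE,Lat,Long\n"
    "222076257,00619,42.339542,-71.069409\n"
    "222053099,02670,42.377246,-71.032597\n"
)

# dtype for the whole dataset: every column is read as text
all_text = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)

# dtype per column: identifiers and codes as text, other columns inferred
per_column = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"INCIDENT_NUMBER": str, "OFFENSE_CODE": str},
)

print(all_text.dtypes)
print(per_column.dtypes)
```

Reading codes as text keeps values like `00619` intact. Unlisted columns such as `Lat` are still inferred as floats, which the later XY conversion needs.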

In [4]:
csv_file = pd.read_csv(paths, dtype =) 
print(len(csv_file.index))
csv_file.head(5)

73852


| Unnamed: 0 | INCIDENT_NUMBER | OFFENSE_CODE | OFFENSE_CODE_GROUP | OFFENSE_DESCRIPTION | DISTRICT | REPORTING_AREA | SHOOTING | OCCURRED_ON_DATE | YEAR | MONTH | DAY_OF_WEEK | HOUR | UCR_PART | STREET | Lat | Long | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 222076257 | 619 | | LARCENY ALL OTHERS | D4 | 167.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | HARRISON AVE | 42.339542 | -71.069409 | (42.33954198983014, -71.06940876967543) |
| 1 | 222053099 | 2670 | | HARASSMENT/ CRIMINAL HARASSMENT | A7 | | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | BENNINGTON ST | 42.377246 | -71.032597 | (42.37724638479816, -71.0325970804128) |
| 2 | 222039411 | 3201 | | PROPERTY - LOST/ MISSING | D14 | 778.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | WASHINGTON ST | 42.349056 | -71.150498 | (42.34905600030506, -71.15049849975023) |
| 3 | 222011090 | 3201 | | PROPERTY - LOST/ MISSING | B3 | 465.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |
| 4 | 222062685 | 3201 | | PROPERTY - LOST/ MISSING | B3 | 465.0 | 0 | 2022-01-01 00:00:00 | 2022 | 1 | Saturday | 0 | | BLUE HILL AVE | 42.284826 | -71.091374 | (42.28482576580488, -71.09137368938802) |


### 1.3 Remove NA

Remove rows that contain NA values in specific columns.

Specify those columns with the `subset` parameter of `dropna()`, e.g. `dropna(subset=[...])`.
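In the sample output above, `OFFENSE_CODE_GROUP` and `UCR_PART` are blank. If a column like that is NA in every row, a plain `dropna()` removes every row, which is why `subset` matters. Below is a small sketch with hypothetical rows; the column names mirror the dataset, but the values are invented.

```python
import pandas as pd

# Hypothetical rows: OFFENSE_CODE_GROUP is empty everywhere,
# and two rows are each missing one coordinate
df = pd.DataFrame({
    "INCIDENT_NUMBER": ["A1", "A2", "A3"],
    "OFFENSE_CODE_GROUP": [None, None, None],
    "Lat": [42.34, None, 42.28],
    "Long": [-71.07, -71.03, None],
})

# No subset: a row is dropped if ANY column is NA, so every row goes
no_subset = df.dropna()

# With subset: only NA values in the listed columns cause a row to be dropped
with_subset = df.dropna(subset=["Lat", "Long"])

print(len(no_subset), len(with_subset))
```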

In [5]:
full_crime = csv_file.dropna()
print(len(full_crime.index))

70044


### 1.4 Save the csv to a temporary file

In [6]:
incidents = r"Incidents.csv"  # Temporary file path for cleaned CSV
full_crime.to_csv(incidents, index=False)
arcpy.management.MakeTableView(incidents, "Incidents")  # table view named "Incidents", used by XYTableToPoint below
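The `Unnamed: 0` column in the loaded data is what pandas produces when a CSV was previously written together with its row index. Presumably that is how CrimeIncident.csv acquired it, though this is an assumption. The sketch below uses a hypothetical one-column frame and in-memory buffers to show why `index=False` is passed to `to_csv()`.

```python
import io
import pandas as pd

df = pd.DataFrame({"INCIDENT_NUMBER": ["222076257", "222053099"]})

# Writing WITH the index adds an unnamed first column to the file...
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reread = pd.read_csv(with_index)  # ...which comes back as "Unnamed: 0"

# Writing with index=False keeps only the data columns
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
clean = pd.read_csv(without_index)

print(list(reread.columns), list(clean.columns))
```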

### 1.5 Convert data from csv to shapefile

In [7]:
output_folder = os.getcwd()  # Folder to save the shapefile
crime_shp = os.path.join(output_folder, "CrimeInct.shp") 

Create a [SpatialReference object](https://pro.arcgis.com/en/pro-app/latest/arcpy/classes/spatialreference.htm) from a coordinate system's factory code (also called its authority code or WKID).

In [8]:
spatial_ref = arcpy.SpatialReference(4326)  # WKID 4326 = WGS 84 (assumed to be the system of the Lat/Long columns)

In [9]:
incidents = r"Incidents"
arcpy.management.XYTableToPoint(incidents, crime_shp, x_field="Long", y_field="Lat", coordinate_system=spatial_ref)  # x = longitude, y = latitude
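The `Location` column stores each point as `(latitude, longitude)`, while XY tools take x as longitude and y as latitude. The x field should therefore be the `Long` column and the y field the `Lat` column. The hypothetical helper `location_to_xy` below makes the swap explicit for one `Location` string.

```python
def location_to_xy(location):
    """Parse a Location string such as "(42.33954198983014, -71.06940876967543)".

    The string is ordered (latitude, longitude); XY tools expect
    x = longitude and y = latitude, so the pair is swapped.
    """
    lat_text, long_text = location.strip().strip("()").split(",")
    lat, lon = float(lat_text), float(long_text)
    return lon, lat  # (x, y)


x, y = location_to_xy("(42.33954198983014, -71.06940876967543)")
print(x, y)
```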

## 2. Exploratory Spatial Data Analysis

### 2.1 Spatial join

[Spatial join](https://pro.arcgis.com/en/pro-app/latest/tool-reference/analysis/spatial-join.htm): Joins attributes from one feature to another based on the spatial relationship. The target features and the joined attributes from the join features are written to the output feature class.

![image.png](attachment:image.png)
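To make a one-to-one join with a `CONTAINS` match concrete, here is a pure-Python sketch that counts how many points fall inside each polygon. That count is roughly what the `Join_Count` field records. The sketch uses the ray-casting point-in-polygon test on planar coordinates with hypothetical square tracts. It is not how ArcGIS implements Spatial Join, and it does not handle points lying exactly on a boundary carefully.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: a point is inside if a ray cast to the right crosses
    the polygon's edges an odd number of times."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_count(polygons, points, keep_common=True):
    """Count the points each polygon contains (like Join_Count).
    With keep_common=True, polygons with no points are dropped,
    mirroring the KEEP_COMMON join type."""
    counts = {
        name: sum(point_in_polygon(px, py, ring) for px, py in points)
        for name, ring in polygons.items()
    }
    if keep_common:
        counts = {name: c for name, c in counts.items() if c > 0}
    return counts


# Hypothetical tracts (square rings) and incident points in planar coordinates
tracts = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(5, 5), (6, 5), (6, 6), (5, 6)],
}
incident_points = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (3, 3)]

print(join_count(tracts, incident_points))                     # {'A': 2, 'B': 1}
print(join_count(tracts, incident_points, keep_common=False))  # {'A': 2, 'B': 1, 'C': 0}
```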

In [11]:
inct_shp = r"CrimeInct.shp"
ct_shp = r"2020_Boston_CT.shp"
output_shp = r"CT_inct_total.shp"

In [12]:
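arcpy.analysis.SpatialJoin(
    target_features=ct_shp,             # census tract polygons receive the counts
    join_features=inct_shp,             # crime incident points
    out_feature_class=output_shp,
    join_operation="JOIN_ONE_TO_ONE",   # one output row per tract, with a Join_Count field
    join_type="KEEP_COMMON",            # keep only tracts that matched at least one incident
    match_option="CONTAINS")            # an incident matches the tract polygon that contains it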

## 3. Spatial Pattern

### 3.1 Visualize Crime Incident counts with SEDF 

#### Q1: How many incidents are there in each census tract in Boston?

In [13]:
polygon_shapefile = r"CT_inct_total.shp"
crime_total = pd.DataFrame.spatial.from_featureclass(polygon_shapefile)

In [16]:
extent = crime_total.spatial.full_extent
extent

[Spatial Plot](https://developers.arcgis.com/python/latest/guide/visualizing-data-with-the-spatially-enabled-dataframe/)

[cmap in matplotlib](https://matplotlib.org/stable/users/explain/colors/colormaps.html)
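`esriClassifyNaturalBreaks` is a Jenks-style classification that groups similar values together. The sketch below is a stand-in based on assumptions, not ArcGIS's implementation. It finds breaks by exact dynamic programming (Fisher's method), minimizing the total within-class sum of squared deviations. On real data its breaks may therefore differ slightly from the map's.

```python
def natural_breaks(values, k):
    """Return the upper bound of each of k classes that minimize the total
    within-class sum of squared deviations (exact, O(k * n^2))."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice in O(1)
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations of data[i:j] from its mean."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best score for splitting data[:j] into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]  # where the last class begins
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back from the full data set to recover each class's upper bound
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = start[c][j]
    return breaks[::-1]


# Hypothetical incident counts per tract, split into 3 classes
print(natural_breaks([31, 1, 12, 2, 30, 3, 11, 10], 3))  # [3, 12, 31]
```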

In [17]:
map_widget = crime_total.spatial.plot(renderer_type='c',  # 'c' stands for class breaks renderer
                                      method='esriClassifyNaturalBreaks',  # Classification method (natural breaks)
                                      class_count=10,  # Number of classes
                                      col='Join_Count',  # Column used for symbology (incidents per tract)
                                      cmap='YlOrRd',  # Color map (adjust as needed)
                                      figsize=[10, 10],  # Size of the map
                                      alpha=0.7,  # Transparency level
                                      title="Total Incidents by Polygon",  # Map title
                                      legend=True,  # Show the legend
                                      extent=extent)  # Zoom to the full extent of the data
map_widget


### 3.2  Visualize Crime Incident density with SEDF 

#### Q2: What is the density of crime incidents within each census tract (points per square kilometer)?

#### 3.2.1 Calculate Geometry Attributes (Area)

[Calculate Geometry Attributes (Data Management)](https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/calculate-geometry-attributes.htm)
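`AREA_GEODESIC` measures area on the ellipsoid. As a simpler illustration of the units involved, the sketch below applies the planar shoelace formula to a polygon whose coordinates are assumed to be in meters in a projected coordinate system. It then converts square meters to square kilometers (1 km² = 1,000,000 m²). Its result can differ slightly from a geodesic area.

```python
def planar_area_km2(ring):
    """Shoelace formula for a simple polygon whose (x, y) vertices are in
    meters in a projected coordinate system; returns square kilometers."""
    twice_area = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0 / 1_000_000  # m² -> km²


# Hypothetical 2 km x 2 km square tract (coordinates in meters)
square = [(0, 0), (2000, 0), (2000, 2000), (0, 2000)]
print(planar_area_km2(square))  # 4.0
```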

In [2]:
input_table = r"CT_inct_total.shp"
arcpy.management.AddField(input_table, field_name="Area", field_type="DOUBLE")

arcpy.management.CalculateGeometryAttributes(
    in_features=input_table,
    geometry_property=[["Area", "AREA_GEODESIC"]],  # write the geodesic area into the Area field
    area_unit="SQUARE_KILOMETERS"
)

#### 3.2.2 Calculate Field

[Calculate Field](https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/calculate-field.htm)
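In Calculate Field with `PYTHON3` expressions, fields are referenced as `!FieldName!`. A density expression therefore has the form `"!Join_Count! / !Area!"` and is evaluated once per row. Below is the same per-row arithmetic in plain Python on hypothetical tracts, with an added guard (not part of the ArcGIS expression) for a zero area.

```python
def density(join_count, area_km2):
    """Incidents per square kilometer; None when the area is zero."""
    if area_km2 == 0:
        return None
    return join_count / area_km2


# Hypothetical tracts: Join_Count and Area (km²) as they would appear in the table
tract_rows = [
    {"Join_Count": 120, "Area": 0.8},
    {"Join_Count": 45, "Area": 1.5},
]
for row in tract_rows:
    row["Density"] = density(row["Join_Count"], row["Area"])  # same as !Join_Count! / !Area!

print(tract_rows)
```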

In [3]:
arcpy.management.AddField(input_table, field_name='Density', field_type='DOUBLE')

arcpy.management.CalculateField(
    in_table=input_table,
    field="Density",
    expression="!Join_Count! / !Area!",  # incidents per square kilometer
    expression_type="PYTHON3"
)

In [6]:
crime_sedf = pd.DataFrame.spatial.from_featureclass(input_table)

map_widget = crime_sedf.spatial.plot(renderer_type='c',  # 'c' stands for class breaks renderer
                                     method='esriClassifyNaturalBreaks',  # Classification method (natural breaks)
                                     class_count=10,  # Number of classes
                                     col='Density',  # Column used for symbology
                                     cmap='YlOrRd',  # Color map (adjust as needed)
                                     figsize=[10, 10],  # Size of the map
                                     alpha=0.7,  # Transparency level
                                     title="Crime Incidents per Square Kilometer")  # Map title
map_widget


### 3.3  Proportion analysis of Crime Incidents

#### Q3: How are crime incidents distributed when we account for the underlying population?

#### 3.3.1 Convert population count from string to numeric

In [8]:
# Add the numeric field first; otherwise CalculateField creates it as a text field by default

arcpy.management.AddField(input_table, field_name="pop_num", field_type="DOUBLE")

arcpy.management.CalculateField(
    in_table=input_table,
    field="pop_num",
    expression="float(!pop!)",  # Convert text to float
    expression_type="PYTHON3"
)

#### 3.3.2 Calculate the sum of population

In [9]:
total_pop = 0

# Use SearchCursor to iterate through the feature class and sum the values of the numeric field
with arcpy.da.SearchCursor(input_table, ['pop_num']) as cursor:
    for row in cursor:
        total_pop += row[0] 
total_pop

675522.0

In [10]:
arcpy.management.AddField(input_table, 'pop_prop', 'DOUBLE')

arcpy.management.CalculateField(
    in_table = input_table,
    field = "pop_prop",
    expression=f"!pop_num! / {total_pop}",  # tract population / total population
    expression_type="PYTHON3"
)
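The population proportion is each tract's `pop_num` divided by `total_pop`. A Calculate Field expression is evaluated per row, so the total generally has to be embedded in the expression string as a literal, for example with an f-string. Below is the arithmetic on hypothetical tract populations, together with the expression string it would produce.

```python
# Hypothetical pop_num values for three tracts
pop_num = {"A": 4000.0, "B": 2500.0, "C": 3500.0}

total_pop = sum(pop_num.values())  # same total the SearchCursor loop accumulates
pop_prop = {tract: pop / total_pop for tract, pop in pop_num.items()}

# The string that would be passed to CalculateField for the same calculation
expression = f"!pop_num! / {total_pop}"

print(total_pop)   # 10000.0
print(expression)  # !pop_num! / 10000.0
print(pop_prop)
```

The proportions sum to 1. A tract with zero population would get a proportion of 0, which would make any later division by `pop_prop` undefined.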

# Lab 6: Analyzing Spatial Disparities of Crime Incidents

#### 1. Calculate the proportion of crime incidents for each census tract

- Complete Q1 using CT_inct_total
- Add field 'crime_prop'
- Calculate the total number of crime incidents and assign the value to the variable 'total_crime'
- Calculate field 'crime_prop' = Join_Count (the number of crime incidents in each census tract) / total number of crime incidents

#### 2. Calculate the ratio of the crime incident proportion to the population proportion
- Complete Q2 using CT_inct_total
- Add field 'ratio'
- Calculate field 'ratio' = crime_prop / pop_prop

#### 3. Visualize the Spatial Disparities
- Complete Q3 using CT_inct_total
- Visualize the spatial pattern of 'ratio' in CT_inct_total

#### 4. Conclusion
Based on the spatial distribution, what patterns do you observe?

What does the value mean? For example, Census Tract A has a ratio of 150, while Census Tract B has a ratio of 0.6.

# Final Project idea due Week 8 (midnight Oct 20)
## Submit a one-page introduction to your final project idea.

# Content for next week: Geopandas