# Analysis of the data using the coordinates

In this notebook, we make use of the spatial information in our dataset. In particular, we would like to:

- Find the location names from the coordinates and use them for further analysis.
- Plot the crime data with Folium, including heatmaps.
- Run a k-means analysis.


In [3]:
# Load the data
import geopandas as gpd
crime_data = gpd.read_file("../dataset/ALL 2023 AND UNTIL JAN29 2024.geojson")
# The CRS is the coordinate reference system. The file is read with a CRS of EPSG:4326, but this is wrong:
# the coordinates are actually in EPSG:3857. Override the CRS label here without transforming the values.
crime_data = crime_data.set_crs(epsg=3857, allow_override=True)
# Reproject to EPSG:4326 (longitude, latitude). The geocoder only accepts (long, lat).
crime_data = crime_data.to_crs(epsg=4326)
crime_data.head()

Unnamed: 0,CREATE_TIME_INCIDENT,LOCATION_TEXT,BEAT,REPORT_NUMBER,LEGEND2,DISPO_TEXT,OBJECTID,geometry
0,2024-01-03 07:33:12,DUNKIN DONUTS,SW1,,505,COMMUNITY POLICING,1,POINT (-84.34536 30.45588)
1,2024-01-03 11:52:40,,SE1,,505,COMMUNITY POLICING,2,POINT (-84.25513 30.33586)
2,2024-01-03 11:53:16,ST PHILLIPS AME CHURCH,NE1,,505,COMMUNITY POLICING,3,POINT (-84.17594 30.53916)
3,2024-01-03 15:06:07,,SW3,,145,THEFT - GRAND,4,POINT (-84.30023 30.37559)
4,2024-01-03 15:15:23,CANOPY OAKS ELEMENTARY SCHOOL,NW1,,505,COMMUNITY POLICING,5,POINT (-84.34680 30.51014)
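
To make the CRS step above more concrete, here is a minimal sketch of what the reprojection from EPSG:3857 to EPSG:4326 does to a single point. It uses the standard spherical Web Mercator formulas in plain Python rather than geopandas, and it assumes (as the comment in the cell suggests) that the raw file stores Web Mercator metres. The function names are illustrative only.

```python
import math

# Sphere radius used by EPSG:3857 (equal to the WGS 84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0

def web_mercator_to_lonlat(x, y):
    """Convert EPSG:3857 metres to EPSG:4326 (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat

def lonlat_to_web_mercator(lon, lat):
    """Inverse of the function above: degrees to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y

# Round trip for the first point shown in the output above.
x, y = lonlat_to_web_mercator(-84.34536, 30.45588)
print(x, y)                          # projected values are in metres, not degrees
print(web_mercator_to_lonlat(x, y))  # back to (longitude, latitude)
```

If the projected values were left labelled as EPSG:4326, they would be read as degrees and fall far outside the valid longitude and latitude range, which is why the label has to be corrected before calling `to_crs`.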


# Coordinates -> Location.


Most of the entries have no information about the location (`None`), so we would like to recover the location from the coordinates. To do this, we use `gpd.tools.reverse_geocode`. A geocoder is software that converts an address into coordinates; reverse geocoding goes the other way, from coordinates to an address. We will use the `Photon` geocoder, which is based on OpenStreetMap.

In [53]:
gpd.tools.reverse_geocode(crime_data['geometry'].iloc[0:100])

Unnamed: 0,geometry,address
0,POINT (-84.34492 30.45600),"Blountstown Highway, 32310, Florida, United St..."
1,POINT (-84.25564 30.33666),"J. Lewis Hall Senior, Woodville Park and Recre..."
2,POINT (-84.17590 30.53915),"Saint Phillip AME Church, Centerville Road, 32..."
3,POINT (-84.29970 30.37542),"Wright Drive, 32305, Florida, United States"
4,POINT (-84.34653 30.51011),"Canopy Oaks Elementary School, 3250, Point Vie..."
...,...,...
95,POINT (-84.33637 30.43665),"Don Price Way, 32304, Tallahassee, Florida, Un..."
96,POINT (-84.33637 30.43665),"Don Price Way, 32304, Tallahassee, Florida, Un..."
97,POINT (-84.29938 30.36048),"Button Willow Lane, 32305, Florida, United States"
98,POINT (-84.34394 30.51410),"Bentwood Way, 32303, Florida, United States"


## ToDo:
- Do this for the entire dataset and combine the result with the original dataframe. This will most likely take more than a day because our dataset is very large, so I recommend using RCC.
- Save the new data. To be safe, do not overwrite the old file.
- Plot a new bar chart for these addresses.
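
One possible way to organise the full run is sketched below. It looks up each distinct coordinate only once (rows 95 and 96 above share the same point, so repeats do occur) and processes the unique points in chunks. The helper `reverse_geocode_unique` and its `geocode_fn` argument are hypothetical names introduced for illustration: `geocode_fn` stands in for any callable that takes a list of `(lon, lat)` tuples and returns one address per tuple.

```python
def reverse_geocode_unique(coords, geocode_fn, chunk_size=100):
    """Reverse-geocode each distinct (lon, lat) once and map results back to every row.

    coords:     list of (lon, lat) tuples, one per row of the dataset
    geocode_fn: hypothetical callable, list of (lon, lat) -> list of addresses
    """
    # dict.fromkeys keeps first-seen order and drops repeated coordinates.
    unique = list(dict.fromkeys(coords))
    cache = {}
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        addresses = geocode_fn(chunk)
        cache.update(zip(chunk, addresses))
        # A long job could write `cache` to disk here so an interrupted run can resume.
    # One address per original row, in the original row order.
    return [cache[c] for c in coords]
```

In practice `geocode_fn` could wrap `gpd.tools.reverse_geocode` on a chunk of points and return its `address` column. Because the result keeps the original row order, it can be attached to `crime_data` as a new column by position. Note that the geometry returned by the geocoder in the output above differs slightly from the input point, so joining on geometry would probably not work.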

# Plot the crime data using Folium

Here, we use `Folium` to plot the crime data on an interactive map and to draw a heatmap. To view the result, open the `heatmap.html` file in a web browser.


In [58]:
import folium
from folium.plugins import HeatMap

m = folium.Map([30.45588, -84.34536], zoom_start=11)
# HeatMap expects [latitude, longitude] pairs, i.e. (y, x).
heat_data = crime_data.get_coordinates()[['y', 'x']].values.tolist()
HeatMap(heat_data, radius=20, blur=15, max_zoom=5).add_to(m)

m.save('heatmap.html')

# ToDo
Can the `HeatMapWithTime` plugin be used for our case, for example to show a day-by-day timelapse? See https://python-visualization.github.io/folium/latest/user_guide/plugins/heatmap_with_time.html.
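
As a starting point for this question, `HeatMapWithTime` takes a list of frames, each frame being a list of `[lat, lon]` points, plus an optional list of labels for the time slider. The sketch below groups records by calendar day into that shape. It assumes timestamps are strings in the `CREATE_TIME_INCIDENT` format shown earlier (if geopandas has already parsed them into datetimes, the parsing line would need adjusting), and `frames_by_day` is an illustrative name.

```python
from collections import defaultdict
from datetime import datetime

def frames_by_day(records):
    """Group (timestamp, lat, lon) records into one list of [lat, lon] points per day.

    Returns (frames, labels), where labels are ISO dates in ascending order
    and frames[i] holds the points for labels[i].
    """
    by_day = defaultdict(list)
    for timestamp, lat, lon in records:
        # Assumed format, e.g. "2024-01-03 07:33:12".
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").date().isoformat()
        by_day[day].append([lat, lon])
    labels = sorted(by_day)  # ISO dates sort chronologically as strings
    frames = [by_day[day] for day in labels]
    return frames, labels
```

The result could then be passed to the plugin as `HeatMapWithTime(frames, index=labels).add_to(m)`. Days without any incident are simply absent from `labels` in this sketch; whether they should appear as empty frames is a design choice.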

# k-means Clustering

We would like to do something similar to the k-means clustering shown in https://developers.arcgis.com/python/samples/crime-analysis-and-clustering-using-geoanalytics-and-pyspark/#Find-the-optimal-number-of-clusters.
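
The linked sample is not reproduced here. As a rough idea of what the analysis involves, the sketch below is a plain-Python version of Lloyd's k-means algorithm, together with an "elbow" scan over the number of clusters, which is one common way to choose k. The names `kmeans` and `inertia_by_k`, the fixed seed, and the iteration cap are all illustrative assumptions; a real run on the full dataset would more likely use a library implementation.

```python
import random

def _sq_dist(p, q):
    # Squared Euclidean distance between two (x, y) points.
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def _assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of (x, y) tuples; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the input points
    for _ in range(n_iter):
        labels = _assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((sum(p[0] for p in members) / len(members),
                                      sum(p[1] for p in members) / len(members)))
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # no movement: converged
            break
        centroids = new_centroids
    labels = _assign(points, centroids)
    # Inertia: sum of squared distances from each point to its assigned centroid.
    inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia

def inertia_by_k(points, max_k):
    """Inertia for k = 1..max_k; the 'elbow' is where the decrease levels off."""
    return {k: kmeans(points, k)[2] for k in range(1, max_k + 1)}

# The five points from the first output above, as (longitude, latitude).
sample = [(-84.34536, 30.45588), (-84.25513, 30.33586), (-84.17594, 30.53916),
          (-84.30023, 30.37559), (-84.34680, 30.51014)]
print(inertia_by_k(sample, max_k=3))
```

This sketch measures distance directly in degrees of longitude and latitude, which distorts real distances somewhat; clustering on projected coordinates in metres may be preferable.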