<a href="https://colab.research.google.com/github/rasheibani/Trajectory-Analysis/blob/main/SPARC_day_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analysis and Visualization of Intersections and Close Points in GPX Trajectories

## Introduction
In this notebook, we will analyze and visualize points where trajectories from multiple GPX files come close to each other. This process involves the following steps:
1. **Mounting Google Drive**: Mount Google Drive to access the folder containing the GPX files.
2. **Loading GPX Files**: Read multiple GPX files from a specified directory into individual DataFrames.
3. **Converting to GeoDataFrames**: Convert these DataFrames into GeoDataFrames for spatial analysis.
4. **Finding Close Points**: Identify points where any two trajectories come within a specified distance of each other.
5. **Generating a Heatmap**: Create a heatmap to visualize these close points directly within the notebook.



In [None]:
!pip install ezgpx geopandas



### Mounting Google Drive
We begin by mounting Google Drive to access the directory containing the GPX files. This allows us to read the GPX files directly from the cloud storage.

Next, we list all the GPX files in the specified folder and load each one into a separate DataFrame using the `ezgpx` library. This library simplifies the process of reading GPX files and converting them into pandas DataFrames.


In [None]:
import os
import ezgpx
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Define the path to the folder containing GPX files
gpx_folder_path = '/content/drive/MyDrive/GPX/MultipleGPX/'

# List all GPX files in the folder
gpx_files = [os.path.join(gpx_folder_path, file) for file in os.listdir(gpx_folder_path) if file.endswith('.gpx')]

# Load each GPX file into a DataFrame
gpx_dataframes = []
for file in gpx_files:
    gpx = ezgpx.GPX(file)
    df = gpx.to_dataframe()
    gpx_dataframes.append(df)

print(f"Loaded {len(gpx_dataframes)} GPX files into dataframes.")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Loaded 9 GPX files into dataframes.
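
To make the loading step less opaque, here is a minimal standard-library sketch of the kind of data a GPX track holds. It is not how `ezgpx` is implemented. The sample XML, the `parse_trackpoints` helper, and the dictionary keys are illustrative assumptions, chosen to match the `lat`, `lon`, `ele`, and `time` columns used later in this notebook.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 namespace; track points live in <trk>/<trkseg>/<trkpt> elements
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Tiny made-up GPX document, used only for illustration
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="-37.8283" lon="144.9496"><ele>8.8</ele><time>2024-02-24T20:08:26Z</time></trkpt>
    <trkpt lat="-37.8284" lon="144.9495"><ele>8.9</ele><time>2024-02-24T20:08:27Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def parse_trackpoints(gpx_text):
    """Return one dict per track point (hypothetical helper, not part of ezgpx)."""
    root = ET.fromstring(gpx_text)
    rows = []
    for pt in root.findall(".//gpx:trkpt", GPX_NS):
        ele = pt.findtext("gpx:ele", namespaces=GPX_NS)
        rows.append({
            "lat": float(pt.get("lat")),
            "lon": float(pt.get("lon")),
            "ele": float(ele) if ele is not None else None,
            "time": pt.findtext("gpx:time", namespaces=GPX_NS),
        })
    return rows


rows = parse_trackpoints(SAMPLE_GPX)
print(rows[0])
```

A list of dictionaries like this can be passed to `pd.DataFrame(rows)` to obtain a table with one row per track point.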


## Introduction to GeoPandas and GeoDataFrames

### What is GeoPandas?

[GeoPandas](https://geopandas.org/) is an open-source project that makes working with geospatial data in Python easier. It extends the datatypes used by pandas to allow spatial operations on geometric types. GeoPandas enables you to read, write, and manipulate geographic data in a way that is both efficient and straightforward.

### What is a GeoDataFrame?

A GeoDataFrame is a tabular data structure that contains a column called `geometry`, which stores geometric information (such as points, lines, and polygons). This allows for the integration of spatial data with the traditional capabilities of pandas DataFrames, enabling powerful spatial analysis and visualization.

### Key Features of GeoPandas

- **Spatial Operations**: Perform operations like overlays, spatial joins, and geoprocessing.
- **CRS Management**: Easily handle coordinate reference systems (CRS) and reproject geometries.
- **Visualization**: Plot spatial data using built-in methods that integrate with Matplotlib.
- **Integration**: Seamlessly work with other geospatial libraries such as Shapely, Fiona, and Pyproj.

### Example Usage

In this notebook, we will use GeoPandas to convert our DataFrames containing GPX data into GeoDataFrames. This will allow us to perform spatial operations necessary for finding close points between trajectories.

---



In [None]:
# Convert DataFrames to GeoDataFrames (GPX coordinates are WGS 84 longitude/latitude)
geo_dataframes = []
for df in gpx_dataframes:
    gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat), crs="EPSG:4326")
    geo_dataframes.append(gdf)


### Finding Close Points with Spatial Join

To identify points where trajectories from different GPX files come close to each other, we use spatial operations provided by GeoPandas. One of the key operations we use is the `sjoin` (spatial join) function combined with the `intersects` predicate.

#### What is `sjoin`?

The `sjoin` function in GeoPandas performs a spatial join between two GeoDataFrames: it combines the attributes of rows whose geometries satisfy a given spatial relationship (the predicate). This is useful for tasks like finding overlaps or containment. Combined with a buffer, it can also find points within a certain distance of each other. Nearest-neighbor joins are handled by the separate `sjoin_nearest` function.

**Parameters of `sjoin`**:
- `left_df`: The first (left) GeoDataFrame.
- `right_df`: The second (right) GeoDataFrame.
- `how`: Specifies the type of join. Common options are `'left'`, `'right'`, and `'inner'`.
- `predicate`: Defines the spatial relationship used for the join. Examples include `'intersects'`, `'contains'`, and `'within'`.

For more information, refer to the [GeoPandas spatial join documentation](https://geopandas.org/en/stable/docs/reference/api/geopandas.sjoin.html).

#### Understanding the `intersects` Predicate

The `intersects` predicate checks whether geometries in one GeoDataFrame intersect geometries in another, that is, whether they share at least one point (touching boundaries count). Two distinct points never intersect, so on its own the predicate cannot detect nearby points. It becomes a proximity test once one set of points is buffered into small polygons.
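
As a rough plain-Python sketch of that idea (not GeoPandas code; the helper name `within_distance` is hypothetical), a point intersects a circular buffer of radius `r` around another point exactly when the two points are at most `r` apart. GeoPandas approximates the circle with a polygon, so results very near the boundary may differ slightly. The coordinates are taken from the results shown later in this notebook, written as `(lon, lat)`.

```python
import math


def within_distance(p, q, r):
    """True if planar points p and q, given as (x, y) tuples, are at most r apart."""
    return math.dist(p, q) <= r


a = (144.949573, -37.828298)
b = (144.949564, -37.828391)  # roughly 0.000093 degrees from a
c = (144.940705, -37.787374)  # several hundredths of a degree from a

near = within_distance(a, b, 0.0001)
far = within_distance(a, c, 0.0001)
print(near, far)
```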

####  Function to Find Close Points

Here, we define a function `find_close_points` that takes two GeoDataFrames and a maximum distance. The function creates a buffer of that radius around each point in the first GeoDataFrame and then uses `sjoin` with the `intersects` predicate to find the points of the second GeoDataFrame that fall inside a buffer. Because GPX coordinates are in degrees of latitude and longitude, the maximum distance is also in degrees.
![Overlay Operations](https://geopandas.org/en/stable/_images/overlay_operations.png)


In [None]:
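# Function to find close points
def find_close_points(gdf1, gdf2, max_distance=0.0001):
    """Pair each point of gdf1 with the points of gdf2 lying within max_distance.

    max_distance is in the units of the geometries (degrees for GPX lat/lon); adjust as needed.
    """
    # Replace each point of gdf1 by a buffer of radius max_distance (gdf1 itself is not modified)
    buffered = gdf1.set_geometry(gdf1.geometry.buffer(max_distance))
    # Keep only the (buffer, point) pairs that intersect
    return gpd.sjoin(buffered, gdf2, how='inner', predicate='intersects')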
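
Because the distance threshold is expressed in degrees, its size on the ground depends on latitude. The sketch below uses a spherical-Earth approximation to convert a radius in degrees to approximate metres north-south and east-west. The radius constant and the helper `degrees_to_metres` are assumptions made for illustration, and the latitude is roughly that of the tracks in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def degrees_to_metres(delta_deg, latitude_deg):
    """Approximate ground size of delta_deg degrees at the given latitude.

    Returns (north_south_m, east_west_m).
    """
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = delta_deg * metres_per_degree
    # Meridians converge toward the poles, so a degree of longitude shrinks with cos(latitude)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


ns, ew = degrees_to_metres(0.0001, -37.83)
print(f"0.0001 degrees is about {ns:.1f} m north-south and {ew:.1f} m east-west")
```

A degree-based buffer is therefore not a circle on the ground away from the equator. Reprojecting to a metric CRS before buffering avoids this distortion.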

### Finding Close Points Between All Trajectories

To identify close points between all pairs of trajectories, we iterate over the list of GeoDataFrames. For each pair of trajectories, we use the `find_close_points` function to find points where they come within a specified distance of each other. If close points are found, they are added to a list for further analysis.


In [None]:
# Find all close points between all pairs of trajectories
all_close_points = []
for i in range(len(geo_dataframes)):
    for j in range(i + 1, len(geo_dataframes)):
        close_points = find_close_points(geo_dataframes[i], geo_dataframes[j])
        if not close_points.empty:
            all_close_points.append(close_points)

# Concatenate all close points into a single DataFrame
if all_close_points:
    all_close_points_df = pd.concat(all_close_points, ignore_index=True)
    print(f"Found {len(all_close_points_df)} close points between trajectories.")
else:
    all_close_points_df = pd.DataFrame()
    print("No close points found between any pairs of trajectories.")

Found 151742 close points between trajectories.
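
The nested loops above visit each unordered pair of trajectories exactly once. As a small sketch of the same iteration pattern, `itertools.combinations` from the standard library yields the identical `(i, j)` index pairs. The variable names below are illustrative.

```python
from itertools import combinations

n_trajectories = 9  # number of GPX files loaded in this notebook

# Index pairs produced by the nested loops: j always starts at i + 1
nested = [(i, j) for i in range(n_trajectories) for j in range(i + 1, n_trajectories)]

# The same pairs, generated by itertools.combinations
pairs = list(combinations(range(n_trajectories), 2))

print(len(pairs))
print(pairs == nested)
```

In the notebook, this would let the loop be written as `for gdf_a, gdf_b in combinations(geo_dataframes, 2)`.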


### Understanding the `all_close_points_df` DataFrame

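After finding the points where trajectories come close, the results are stored in a DataFrame called `all_close_points_df`. Each row describes one pair of close points from two different trajectories, so a single point appears in several rows when it lies near several points of another trajectory. The reported count is therefore a number of pairs, not of distinct locations.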

#### Columns in `all_close_points_df`

The DataFrame has the following columns:

- **lat_left**: Latitude of the point from the first trajectory (left geometry).
- **lon_left**: Longitude of the point from the first trajectory (left geometry).
- **ele_left**: Elevation of the point from the first trajectory (left geometry).
- **time_left**: Timestamp of the point from the first trajectory (left geometry).
- **geometry**: Geometric representation of the buffered point from the first trajectory.
- **index_right**: Index of the corresponding close point in the second trajectory (right geometry).
- **lat_right**: Latitude of the point from the second trajectory (right geometry).
- **lon_right**: Longitude of the point from the second trajectory (right geometry).
- **ele_right**: Elevation of the point from the second trajectory (right geometry).
- **time_right**: Timestamp of the point from the second trajectory (right geometry).

#### Example Data

Here is an illustrative row showing the structure of `all_close_points_df` (the values are placeholders, not taken from the dataset):

| lat_left  | lon_left  | ele_left | time_left             | geometry                           | index_right | lat_right | lon_right | ele_right | time_right            |
|-----------|-----------|----------|-----------------------|------------------------------------|-------------|-----------|-----------|-----------|-----------------------|
| 37.7749   | -122.4194 | 15.2     | 2023-06-01T12:34:56Z  | POLYGON ((...))                    | 10          | 37.7750   | -122.4195 | 15.4      | 2023-06-01T12:35:00Z  |

#### Explanation

- **Left Geometry Columns**: These columns (`lat_left`, `lon_left`, `ele_left`, `time_left`) provide details about the point from the first trajectory that is close to a point in another trajectory.
- **Geometry**: The `geometry` column contains the buffered geometric representation of the point from the first trajectory. This buffer is used to determine the proximity to points in the second trajectory.
- **Right Geometry Columns**: These columns (`index_right`, `lat_right`, `lon_right`, `ele_right`, `time_right`) provide details about the corresponding close point in the second trajectory.

#### Usage

This DataFrame allows us to analyze and visualize the locations where trajectories come close to each other.


In [None]:
all_close_points_df

| | lat_left | lon_left | ele_left | time_left | geometry | index_right | lat_right | lon_right | ele_right | time_right |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -37.828298 | 144.949573 | 8.8 | 2024-02-24 20:08:26+00:00 | POLYGON ((144.94967 -37.82830, 144.94967 -37.8... | 14621 | -37.828391 | 144.949564 | 8.8 | 2023-12-17 03:30:19+00:00 |
| 1 | -37.828371 | 144.949565 | 8.8 | 2024-02-24 20:08:43+00:00 | POLYGON ((144.94967 -37.82837, 144.94966 -37.8... | 14621 | -37.828391 | 144.949564 | 8.8 | 2023-12-17 03:30:19+00:00 |
| 2 | -37.828384 | 144.949552 | 8.8 | 2024-02-24 20:08:44+00:00 | POLYGON ((144.94965 -37.82838, 144.94965 -37.8... | 14621 | -37.828391 | 144.949564 | 8.8 | 2023-12-17 03:30:19+00:00 |
| 3 | -37.828400 | 144.949550 | 8.8 | 2024-02-24 20:08:45+00:00 | POLYGON ((144.94965 -37.82840, 144.94965 -37.8... | 14621 | -37.828391 | 144.949564 | 8.8 | 2023-12-17 03:30:19+00:00 |
| 4 | -37.828408 | 144.949539 | 8.8 | 2024-02-24 20:08:46+00:00 | POLYGON ((144.94964 -37.82841, 144.94964 -37.8... | 14621 | -37.828391 | 144.949564 | 8.8 | 2023-12-17 03:30:19+00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 151737 | -37.787370 | 144.940662 | 14.0 | 2023-09-17 02:31:15+00:00 | POLYGON ((144.94076 -37.78737, 144.94076 -37.7... | 445 | -37.787374 | 144.940705 | 22.2 | 2023-08-18 23:27:52+00:00 |
| 151738 | -37.787420 | 144.940706 | 14.2 | 2023-09-17 02:31:16+00:00 | POLYGON ((144.94081 -37.78742, 144.94081 -37.7... | 445 | -37.787374 | 144.940705 | 22.2 | 2023-08-18 23:27:52+00:00 |
| 151739 | -37.787465 | 144.940739 | 14.2 | 2023-09-17 02:31:17+00:00 | POLYGON ((144.94084 -37.78746, 144.94084 -37.7... | 445 | -37.787374 | 144.940705 | 22.2 | 2023-08-18 23:27:52+00:00 |
| 151740 | -37.787370 | 144.940662 | 14.0 | 2023-09-17 02:31:15+00:00 | POLYGON ((144.94076 -37.78737, 144.94076 -37.7... | 447 | -37.787433 | 144.940591 | 21.9 | 2023-08-18 23:27:54+00:00 |
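
To check a row in metres rather than degrees, one option is the haversine great-circle distance. The sketch below is plain Python; the `haversine_m` helper and the spherical-Earth radius are assumptions made for illustration. It measures the separation between the left and right points of row 0 in the output above.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Left and right coordinates of row 0
d = haversine_m(-37.828298, 144.949573, -37.828391, 144.949564)
print(f"Row 0 points are about {d:.1f} m apart")
```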


### Visualizing Close Points with a Heatmap

#### Introduction to Folium and Heatmaps

As mentioned in Day 2, [**Folium**](https://python-visualization.github.io/folium/) is a powerful Python library that allows you to create interactive maps quickly. It builds on the JavaScript library Leaflet and integrates smoothly with Python data manipulation libraries like pandas and GeoPandas.

One of the visualization tools provided by Folium is the **HeatMap**. A heatmap is a graphical representation of data where individual values are represented as colors. In the context of spatial data, a heatmap is useful for showing the intensity or density of points on a map.

#### Purpose of Using a Heatmap

In this notebook, we will use a heatmap to visualize the intensity of points where trajectories from multiple GPX files come close to each other. This will help us identify areas with high concentrations of close points, which can be critical for understanding patterns in trajectory data.

#### Implementation

We will follow these steps to create and display the heatmap:

1. **Prepare Data**: Extract the latitude and longitude of each close point.
2. **Create Base Map**: Initialize a Folium map centered on the mean coordinates of the close points.
3. **Add HeatMap Layer**: Use the `HeatMap` class from Folium to add the heatmap layer to the map.
4. **Display Map**: Render the map directly within the Jupyter notebook.


In [None]:
import folium
from folium.plugins import HeatMap
from IPython.display import display


# Build the map only if there are close points to show
if not all_close_points_df.empty:
    # Create a base map centered on the mean coordinates of the close points
    center = [all_close_points_df['lat_left'].mean(), all_close_points_df['lon_left'].mean()]
    m = folium.Map(location=center, zoom_start=11, width=600, height=400)

    # Prepare data for the heatmap: one [lat, lon] pair per row, without a row-by-row loop
    heat_data = all_close_points_df[['lat_left', 'lon_left']].values.tolist()

    # Add heatmap to the map
    HeatMap(heat_data, radius=10).add_to(m)

    # Add GPX tracks to the map
    for df in gpx_dataframes:
        coordinates = df[['lat', 'lon']].values.tolist()
        folium.PolyLine(locations=coordinates, color='blue', weight=2.5, opacity=1).add_to(m)

    # Save the map as an HTML file and display it in the notebook
    m.save('close_points_heatmap.html')
    display(m)
else:
    print("No close points to display on the heatmap.")

