# GPX data analysis (multiple files)

This notebook contains Python test code to analyze and visualize GPX data from multiple files. GPX (GPS Exchange Format) is an XML-based file format for GPS tracks and is widely used by many software programs and web services for GPS data analysis and visualization.
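To make the file format concrete, here is a minimal sketch that reads a hand-written GPX 1.1 snippet with the standard library's `xml.etree.ElementTree` instead of gpxpy. The snippet, the `read_points` helper and the coordinate values are illustrative assumptions, not part of the data used in this notebook. It shows the track, segment and point hierarchy that the loading code further down iterates over.

```python
import xml.etree.ElementTree as ET

# Hand-written, minimal GPX 1.1 document (illustrative values only).
GPX_TEXT = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Example run</name>
    <trkseg>
      <trkpt lat="37.790538" lon="-122.393670">
        <ele>3.6</ele>
        <time>2016-05-15T15:07:53Z</time>
      </trkpt>
      <trkpt lat="37.790533" lon="-122.393673">
        <ele>3.6</ele>
        <time>2016-05-15T15:07:54Z</time>
      </trkpt>
    </trkseg>
  </trk>
</gpx>
"""

# GPX 1.1 elements live in this XML namespace.
NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def read_points(gpx_text):
    """Return (track name, lat, lon, elevation, time string) for every track point."""
    root = ET.fromstring(gpx_text)
    rows = []
    for trk in root.findall("gpx:trk", NS):              # tracks
        name = trk.findtext("gpx:name", default="", namespaces=NS)
        for seg in trk.findall("gpx:trkseg", NS):        # segments
            for pt in seg.findall("gpx:trkpt", NS):      # points
                rows.append((
                    name,
                    float(pt.get("lat")),
                    float(pt.get("lon")),
                    float(pt.findtext("gpx:ele", default="nan", namespaces=NS)),
                    pt.findtext("gpx:time", default="", namespaces=NS),
                ))
    return rows


for row in read_points(GPX_TEXT):
    print(row)
```

In practice gpxpy takes care of this parsing (including converting the time strings to `datetime` objects), so the sketch is only meant to show what a GPX file contains.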

Data manipulation is done using [gpxpy](https://github.com/tkrajina/gpxpy), a GPX parser for Python, and the [NumPy](http://www.numpy.org/) and [Pandas](https://pandas.pydata.org/) data analysis packages for Python. Visualization is done using [gmplot](https://github.com/vgm64/gmplot), a Python library to plot GPS data on Google Maps.

## Prerequisites:

In [1]:
import datetime
import glob # module to find all pathnames matching a specified pattern
import os

import gpxpy
import numpy  as np
import pandas as pd
import gmplot

%load_ext watermark
%watermark -a "Author: gmalim" 
print("")
%watermark -u -n
print("")
%watermark -v -p numpy,pandas,gpxpy,gmplot
print("")
%watermark -m

Author: gmalim

last updated: Fri Jun 22 2018

CPython 3.6.5
IPython 6.4.0

numpy 1.14.5
pandas 0.23.1
gpxpy n
gmplot n

compiler   : GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)
system     : Darwin
release    : 15.6.0
machine    : x86_64
processor  : i386
CPU cores  : 2
interpreter: 64bit


## Load data:

Define a function that loads all GPX files matching a pattern:

In [2]:
def load_run_data(gpx_path, filter="*"):
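    """
    Parse every GPX file in gpx_path whose name matches filter + ".gpx".

    Loops over files, tracks, segments and points, and returns the list of
    points ([time, latitude, longitude, elevation, speed]) together with
    the total track length (meters) and total track duration (seconds).
    """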
    
    gpx_files = glob.glob(os.path.join(gpx_path, filter + ".gpx"))
    
    gpx_points = []
    
    total_track_length   = 0
    total_track_duration = 0
    
    for file_idx, gpx_file in enumerate(gpx_files):
    
        print("--> Processing file", file_idx)
        
        # Use a context manager so the file is closed after parsing
        with open(gpx_file, 'r') as f:
            gpx = gpxpy.parse(f)
        
        for track_idx, track in enumerate(gpx.tracks):
            
            #print("---> Processing track", track_idx)
            
            track_length   = track.length_3d()
            track_duration = track.get_duration()

            total_track_length   += track_length
            total_track_duration += track_duration
            
            for seg_idx, segment in enumerate(track.segments):
                
                #print("----> Processing segment", seg_idx)
                
                segment_length = segment.length_3d()
                
                for point_idx, point in enumerate(segment.points):
                    
                    #print("----> Processing point", point_idx)
                    
                    gpx_points.append([point.time,
                                       point.latitude, 
                                       point.longitude, 
                                       point.elevation, 
                                       segment.get_speed(point_idx)])
                    
    return gpx_points, total_track_length, total_track_duration
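The track lengths accumulated above are distances summed between consecutive points. As a rough illustration of that idea, the sketch below sums great-circle (haversine) distances over a list of `(latitude, longitude)` pairs. It is an assumption-laden simplification: `haversine_m`, `path_length_m` and the Earth radius constant are names introduced here, elevation is ignored (unlike `length_3d()`), and the formula gpxpy uses internally may differ.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumed constant)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(points):
    """Sum the distances between consecutive (lat, lon) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


# Two points about one second apart at walking pace (illustrative values)
demo_points = [(37.790538, -122.393670), (37.790533, -122.393673)]
print("{:.2f} m".format(path_length_m(demo_points)))
```

For closely spaced GPS points the difference between a flat (2D) and an elevation-aware (3D) length is usually small, but it can add up on hilly routes.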

Load the GPX files and collect the points in a dataframe:

In [3]:
gpx_points, total_track_length, total_track_duration = load_run_data(gpx_path='./data/', filter="*")

print('')
print('Total distance as summed between points in track = {:6.2f} miles'.format(total_track_length*0.000621371))
print('Total distance as summed between points in track = {:6.2f} km'   .format(total_track_length*0.001))
print('Total time     as summed between points in track = {:6.2f} hours'.format(total_track_duration/3600))

column_list = ['Point_Time', 
               'Point_Latitude',
               'Point_Longitude', 
               'Point_Elevation', 
               'Point_Speed']

df = pd.DataFrame(gpx_points, columns=column_list)

--> Processing file 0
--> Processing file 1
--> Processing file 2
--> Processing file 3
--> Processing file 4
--> Processing file 5
--> Processing file 6

Total distance as summed between points in track =  72.20 miles
Total distance as summed between points in track = 116.20 km
Total time     as summed between points in track =  13.34 hours
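The conversion factors used above imply that lengths are in meters and durations in seconds. Under that assumption, the totals can be turned into a mean speed and a running pace. The helper names below are hypothetical and the inputs are simply the rounded totals printed above.

```python
def mean_speed_mps(distance_m, duration_s):
    """Mean speed in meters per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return distance_m / duration_s


def pace_min_per_km(speed_mps):
    """Pace as (minutes, seconds) per kilometer, rounded to the nearest second."""
    seconds_per_km = 1000.0 / speed_mps
    return divmod(round(seconds_per_km), 60)


distance_m = 116.20 * 1000   # total distance printed above, in meters
duration_s = 13.34 * 3600    # total time printed above, in seconds

speed = mean_speed_mps(distance_m, duration_s)
minutes, seconds = pace_min_per_km(speed)
print("{:.2f} m/s, {:d}:{:02d} min/km".format(speed, minutes, seconds))
```

The same conversion could be applied to the per-point speeds, assuming they are also expressed in meters per second.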


Optional: Save dataframe using pickle:

In [4]:
df.to_pickle("alldata.pkl")

print(df.shape)
print(df.head())

(46651, 5)
           Point_Time  Point_Latitude  Point_Longitude  Point_Elevation  \
0 2016-05-15 15:07:53       37.790538      -122.393670              3.6   
1 2016-05-15 15:07:54       37.790533      -122.393673              3.6   
2 2016-05-15 15:07:55       37.790526      -122.393684              3.6   
3 2016-05-15 15:07:56       37.790527      -122.393673              3.6   
4 2016-05-15 15:07:57       37.790519      -122.393669              3.8   

   Point_Speed  
0     0.659532  
1     0.951101  
2     1.100640  
3     0.952015  
4     1.332152  


Optional: Load dataframe from pickled file:

In [5]:
df = pd.read_pickle("alldata.pkl")

print(df.shape)
print(df.head())

(46651, 5)
           Point_Time  Point_Latitude  Point_Longitude  Point_Elevation  \
0 2016-05-15 15:07:53       37.790538      -122.393670              3.6   
1 2016-05-15 15:07:54       37.790533      -122.393673              3.6   
2 2016-05-15 15:07:55       37.790526      -122.393684              3.6   
3 2016-05-15 15:07:56       37.790527      -122.393673              3.6   
4 2016-05-15 15:07:57       37.790519      -122.393669              3.8   

   Point_Speed  
0     0.659532  
1     0.951101  
2     1.100640  
3     0.952015  
4     1.332152  


Optional: Select data according to time:

In [6]:
df['Point_Time'] = pd.to_datetime(df['Point_Time'])

start_date = pd.Timestamp(2015,1,1)   
end_date   = pd.Timestamp(2018,1,1)   

mask = ((df['Point_Time'] > start_date) & (df['Point_Time'] <= end_date))

df = df.loc[mask]

print(df.shape)
print(df.head())

(46651, 5)
           Point_Time  Point_Latitude  Point_Longitude  Point_Elevation  \
0 2016-05-15 15:07:53       37.790538      -122.393670              3.6   
1 2016-05-15 15:07:54       37.790533      -122.393673              3.6   
2 2016-05-15 15:07:55       37.790526      -122.393684              3.6   
3 2016-05-15 15:07:56       37.790527      -122.393673              3.6   
4 2016-05-15 15:07:57       37.790519      -122.393669              3.8   

   Point_Speed  
0     0.659532  
1     0.951101  
2     1.100640  
3     0.952015  
4     1.332152  
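The mask above keeps points with `start_date < Point_Time <= end_date`, so a point exactly at `start_date` is dropped while one exactly at `end_date` is kept. The small sketch below demonstrates this boundary behavior on a made-up dataframe (`demo` and its timestamps are illustrative, not taken from the GPX files).

```python
import pandas as pd

# Four made-up timestamps: on the lower bound, inside, on the upper bound, just past it
times = pd.to_datetime([
    "2015-01-01 00:00:00",
    "2016-05-15 15:07:53",
    "2018-01-01 00:00:00",
    "2018-01-01 00:00:01",
])
demo = pd.DataFrame({"Point_Time": times})

start_date = pd.Timestamp(2015, 1, 1)
end_date = pd.Timestamp(2018, 1, 1)

# Same comparison as above: exclusive lower bound, inclusive upper bound
mask = (demo["Point_Time"] > start_date) & (demo["Point_Time"] <= end_date)
selected = demo.loc[mask]

print(selected)
```

Note that this comparison assumes timezone-naive timestamps on both sides. Some gpxpy versions return timezone-aware datetimes, in which case the bounds would need a matching timezone.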


## Visualization:

Create heatmap using **gmplot**:

In [7]:
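# Center the map on the mean point position; the 0.01 degree offset shifts the center slightly north
map_center_lat = df['Point_Latitude'].mean() + 0.01
map_center_lon = df['Point_Longitude'].mean()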

print("map_center_lat = {:8.3f} degrees".format(map_center_lat))
print("map_center_lon = {:8.3f} degrees".format(map_center_lon))

gmap = gmplot.GoogleMapPlotter(map_center_lat, map_center_lon, 13) # lat & lon of map center and map zoom level

gmap.heatmap(df['Point_Latitude'], df['Point_Longitude'], maxIntensity=100)

gmap.draw("gmplot_heatmap.html")

map_center_lat =   37.796 degrees
map_center_lon = -122.439 degrees
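The mean of all points is pulled toward areas that were visited often, which may be one reason a manual offset is needed to frame the map. A possible alternative, sketched here as an assumption rather than as what this notebook does, is to center on the midpoint of the bounding box of all points. `bounding_box_center` is a hypothetical helper, and it does not handle tracks that cross the antimeridian.

```python
def bounding_box_center(lats, lons):
    """Return the (lat, lon) midpoint of the bounding box around all points."""
    lats, lons = list(lats), list(lons)
    if not lats or len(lats) != len(lons):
        raise ValueError("lats and lons must be non-empty and of equal length")
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


# Illustrative points: three recorded at the same place, one far away
demo_lats = [0.0, 0.0, 0.0, 10.0]
demo_lons = [0.0, 0.0, 0.0, 20.0]

print(bounding_box_center(demo_lats, demo_lons))
```

With the dataframe above, the call would be `bounding_box_center(df['Point_Latitude'], df['Point_Longitude'])`, and the result could be passed to `gmplot.GoogleMapPlotter` in place of the mean-based center.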


In [8]:
%%HTML
<iframe width="100%" height="700" src="gmplot_heatmap.html"></iframe>