# Creating new GPX files with unified structure/names

For this project we have amassed quite a few *gpx* files from a wide variety of sources, and it is imperative that we unify their structure (and naming schema) to better store and classify them for future use.

To achieve this unification we will be using the library **Gpxpy**, the very same one that we used earlier to parse all those *gpx* files and store their content into dataframes.

# Defining the scope of our function

This function must meet the following criteria:

1. Substitute the route's name with one that matches our dataset.
2. Fill in the route's creator with our URL address.
3. Delete superfluous data such as waypoints and description.

In [23]:
#Importing the necessary libraries.

import pandas as pd
import gpxpy
import gpxpy.gpx
import time
import pathlib
from pathlib import Path
import os
from gpx_converter import Converter

## Testing parameters

We will begin by exploring a parsed *gpx* file, finding out where the relevant values are stored and how to change them.

In [2]:
#Parsing a gpx file for testing purposes.

gpx_file = open('Algorta.gpx', 'r', encoding='utf-8') #We might find some encoding errors, Gpxpy is quite picky.

gpx = gpxpy.parse(gpx_file)
track = gpx.tracks[0]
segment = track.segments[0]
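
Since encoding errors are a real possibility with files from many sources, one way to make the reading step more tolerant is to try a short list of encodings in turn. The sketch below uses only the standard library; the helper name `read_gpx_text` and the fallback encodings (`cp1252`, `latin-1`) are assumptions for illustration, not something our files are known to require.

```python
from pathlib import Path

def read_gpx_text(path, encodings=('utf-8-sig', 'cp1252', 'latin-1')):
    """Return the file's text, decoded with the first encoding that works.

    'utf-8-sig' reads plain UTF-8 and also drops a leading byte order mark.
    The other two encodings are assumed fallbacks for legacy exports.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
    raise ValueError(f'Could not decode {path} with any of {encodings}')
```

The returned text could then be handed to `gpxpy.parse`, which accepts a string as well as an open file. Note that `latin-1` never fails to decode, so with the default list the final `ValueError` only triggers when a shorter list is passed in.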

In [6]:
#Exploring the route name.

track.name

'Algorta'

In [7]:
#Description.

track.description

'Paseo Puerto Viejo a rotonda de Leioa tres veces y luego por bidegorri a Sope y volver a punto de partida.'

In [8]:
#Creator.

gpx.creator

'Wikiloc - http://www.wikiloc.com'

In [9]:
#And finally, its waypoints. This route seems to have none.

gpx.waypoints

[]

Now that we have located the information that we need to change, it's time to actually change it.

In [10]:
#Changing the values.

track.name = 'test_name'
track.description = 'test_description'
gpx.creator = 'https://www.on2wheels.es/'
gpx.waypoints = []

In [11]:
#To test this process we will save this modified gpx as a new file and re-parse it.

with open("output.gpx", "w", encoding="utf-8") as f:
    f.write(gpx.to_xml())

In [12]:
#Now it's time to parse this generated gpx file and check for the values again.

gpx_file = open('output.gpx', 'r', encoding='utf-8')

gpx = gpxpy.parse(gpx_file)
track = gpx.tracks[0]
segment = track.segments[0]

In [14]:
#Checking the values.

print(track.name)
print(track.description)
print(gpx.creator)
print(gpx.waypoints)

test_name
test_description
https://www.on2wheels.es/
[]


It's a success! Now we can proceed to the next step: defining a function that performs this operation automatically and assigns a predetermined file name (and route name).

## Creating the function

While our *gpx* library is huge, we just need to change the names of a few routes for the moment: those contained in the following dataframe.

In [17]:
routes = pd.read_csv('routes_1607_819.csv')

In [18]:
routes.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,municipality,mountain_passes_ids,municipalities_ids
0,923,"ANGLIRU, CIRCULAR DESDE LA PLAZA, TEVERGA",,,"[-6.101982,43.158859]","[-5.939921,43.235847]",67,124,3476,101,1566,,[0],
1,5611,"Pola de Lena, Cobertoria, Gamoniteiro, Tenebre...",,,"[-5.8297,43.155729]","[-5.929957,43.288199]",51,118,4234,102,1700,,"[0, 1, 84, 131]",
2,5490,PEÑA ESCRITA (POR ALMUÑECAR),,,"[-3.743127,36.734975]","[-3.762692,36.818439]",42,45,1481,6,1191,,[2],
3,881,Ancares-Pandozarco,,,"[-7.157974,42.852246]","[-6.844199,42.889535]",55,130,2861,289,1651,,"[3, 182, 1109]",
4,5618,POLA DE LENA - PUERTO DE PAJARES - CUITU NEGRU...,,,"[-5.806177,43.128166]","[-5.829091,43.083221]",42,121,2917,344,1824,,"[4, 51, 69, 438]",


Since the values in the *name* column match the route names stored inside our parsed *gpx* files (but NOT the file names!), our function will have to parse the *gpx* files one at a time, check each route's name against the *name* column and, if there is a match, rename the route to the corresponding *ID*.
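
The name check itself can be sketched as a dictionary lookup built once, instead of a scan over every row for every file. The example below uses the standard `csv` module on a small inline sample so that it runs on its own; `SAMPLE_CSV`, `build_name_index` and `new_name_for` are hypothetical names, and the sample only mirrors the *ID* and *name* columns of three rows shown above.

```python
import csv
import io

# Inline sample with only the ID and name columns (a stand-in for the real CSV).
SAMPLE_CSV = '''ID,name
923,"ANGLIRU, CIRCULAR DESDE LA PLAZA, TEVERGA"
5490,PEÑA ESCRITA (POR ALMUÑECAR)
881,Ancares-Pandozarco
'''

def build_name_index(csv_text):
    """Map each route name to its ID (kept as a string)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row['name']: row['ID'] for row in reader}

def new_name_for(track_name, index):
    """Return the ID to use as the new route name, or None without an exact match."""
    return index.get(track_name)

name_to_id = build_name_index(SAMPLE_CSV)
print(new_name_for('Ancares-Pandozarco', name_to_id))
print(new_name_for('A route we do not have', name_to_id))
```

With the dataframe already loaded, a comparable mapping could be built with `dict(zip(routes['name'], routes['ID'].astype(str)))`. If two rows share a name, a dictionary keeps only the last one, which differs from a row-by-row scan.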

In [47]:
#Creating a function that meets our criteria.

def parser(file):
    try:
        with open(file, 'r', encoding='utf-8') as gpx_file: #Opening our file (it is closed automatically).
            gpx = gpxpy.parse(gpx_file) #Parsing it.
        track = gpx.tracks[0]
        name = track.name #Storing the route's name as a variable.

        for i in range(len(routes)): #Checking for the name in our dataframe.
            if routes['name'].iloc[i] == name:
                track.name = str(routes['ID'].iloc[i]) #Changing the name for its ID.
                track.description = 'Route number ' + track.name + '.' #Changing description.
                gpx.creator = 'https://www.on2wheels.es/' #Adding a creator.
                gpx.waypoints = [] #Deleting all waypoints.
                with open(track.name + '.gpx', 'w', encoding='utf-8') as f:
                    f.write(gpx.to_xml()) #Saving our route with the new name.
    except Exception: #Skipping any file that cannot be read or parsed.
        pass

The function is ready but we need a way to make it walk through all gpx files in a directory. We can easily achieve this by using **Pathlib**. We will also be using **Time** to benchmark our function.
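
As a minimal sketch of the directory walk, the snippet below lists only the files ending in `.gpx` inside a folder, using a temporary directory so that it runs anywhere. The helper name `list_gpx_files` and the file names are invented for the example; restricting the pattern to `*.gpx` is an assumption that the folder may also hold other files.

```python
import tempfile
from pathlib import Path

def list_gpx_files(directory):
    """Return the .gpx files directly inside `directory`, sorted by path."""
    return sorted(p for p in Path(directory).glob('*.gpx') if p.is_file())

# Demonstration on a throwaway folder with two gpx files and one stray text file.
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ('b.gpx', 'a.gpx', 'notes.txt'):
        (Path(tmp) / file_name).write_text('', encoding='utf-8')
    found = [p.name for p in list_gpx_files(tmp)]

print(found)
```

On case-sensitive file systems the pattern `*.gpx` will not pick up files named `.GPX`, so a folder with mixed extensions may need a second pattern.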

In [48]:
#Creating the final function.

def gpx_cleaner():
    start = time.time() #Starting our timer.

    directory = 'gpx' #The folder containing the gpx files.

    files = Path(directory).glob('*') #Using all files in the folder as input.
    for file in files:
        parser(file) #Applying the previous function to every file.

    stop = time.time() #Stopping our timer.
    duration = (stop - start) / 60
    
    print('Minutes:', duration) #Printing the elapsed minutes.
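
If the elapsed time is needed as a value rather than as printed text, the benchmark can be sketched as a small wrapper that returns it. The name `time_call` is hypothetical, and `time.perf_counter` is swapped in here for `time.time` because it is a monotonic clock intended for measuring intervals.

```python
import time

def time_call(func, *args, **kwargs):
    """Run `func` and return a tuple (result, elapsed_minutes)."""
    start = time.perf_counter()  # Monotonic clock, unaffected by system clock changes.
    result = func(*args, **kwargs)
    elapsed_minutes = (time.perf_counter() - start) / 60
    return result, elapsed_minutes

# Timing an arbitrary call; any function could take the place of sum().
result, minutes = time_call(sum, range(1_000_000))
print(f'Minutes: {minutes:.4f}')
```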

## Function testing

In [50]:
# Testing: 100 files.

gpx_cleaner()

Minutes: 0.7209641893704732


Five new *gpx* files have been successfully created, so the function works as expected. Since it took about 0.72 minutes to clean 100 files (roughly 0.43 seconds per file), our full folder will take about 3 hours.
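
The estimate is a plain linear extrapolation from the benchmark, sketched below. The folder size of 25,000 files is not a measured figure: it is simply the size that "about 3 hours" implies at this rate, and it assumes the time per file stays constant.

```python
def estimate_minutes(sample_minutes, sample_files, total_files):
    """Linearly scale a benchmark on `sample_files` files to `total_files` files."""
    return sample_minutes / sample_files * total_files

sample_minutes = 0.72  # Measured above for 100 files.
sample_files = 100
seconds_per_file = sample_minutes * 60 / sample_files

# Hypothetical folder size, chosen because it matches the three-hour estimate.
total_files = 25_000
estimated_hours = estimate_minutes(sample_minutes, sample_files, total_files) / 60

print(f'{seconds_per_file:.2f} seconds per file, about {estimated_hours:.1f} hours in total')
```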

# Using our final function

Now it's simply a matter of placing all our *gpx* files inside the designated folder and running our function.