# Creating new GPX files with unified structure/names

For this project we have amassed quite a few *gpx* files from a wide variety of sources, and it is imperative that we unify their structure (and naming schema) to better store and classify them for future use.

To achieve this unification we will be using the library **Gpxpy**, the very same one that we used earlier to parse all those *gpx* files and store their content into dataframes.

# Defining the scope of our function

This function must meet the following criteria:

1. Substitute the route's name with one that matches our dataset.
2. Fill in the route's creator with our URL address.
3. Delete superfluous data such as waypoints and description.

In [6]:
#Importing the necessary libraries.

import pandas as pd
import gpxpy
import gpxpy.gpx
import time
import pathlib
from pathlib import Path
import os
from gpx_converter import Converter

## Testing parameters

We will begin by exploring a parsed *gpx* file, finding out where the relevant values are stored and how to change them.

In [14]:
#Parsing a gpx file for testing purposes.

gpx_file = open('Algorta.gpx', 'r', encoding='utf-8') #We might find some encoding errors, Gpxpy is quite picky.

gpx = gpxpy.parse(gpx_file)
track = gpx.tracks[0]
segment = track.segments[0]

In [15]:
#Exploring the route name.

track.name

'Algorta'

In [16]:
#Description.

track.description

'Paseo Puerto Viejo a rotonda de Leioa tres veces y luego por bidegorri a Sope y volver a punto de partida.'

In [17]:
#Comments.

track.comment

'Paseo Puerto Viejo a rotonda de Leioa tres veces y luego por bidegorri a Sope y volver a punto de partida.'

In [18]:
#Creator.

gpx.creator

'Wikiloc - http://www.wikiloc.com'

In [19]:
#And finally, its waypoints. This route seems to have none.

gpx.waypoints

[]

Now that we have located the information that we need to change, it's time to try and actually change it.

In [20]:
#Changing the values.

track.name = 'test_name'
track.description = 'test_description'
track.comment  = ''
gpx.creator = 'https://www.on2wheels.es/'
gpx.waypoints = [] #An empty list (not an empty string) keeps the attribute's type consistent.

In [21]:
#To test this process we will save this modified gpx as a new file and re-parse it.

with open("output.gpx", "w", encoding='utf-8') as f: #Writing as UTF-8 so it matches the encoding we re-parse with.
    f.write( gpx.to_xml())

In [22]:
#Now it's time to parse this generated gpx file and check for the values again.

gpx_file = open('output.gpx', 'r', encoding='utf-8')

gpx = gpxpy.parse(gpx_file)
track = gpx.tracks[0]
segment = track.segments[0]

In [24]:
#Checking the values.

print(track.name)
print(track.description)
print(track.comment)
print(gpx.creator)
print(gpx.waypoints)

test_name
test_description
None
https://www.on2wheels.es/
[]


It's a success! Now we can proceed to the next step: defining a function that performs this operation automatically and assigns a predetermined file name (and route name).
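As an independent cross-check that does not rely on **Gpxpy**, the same values can be read straight from the XML with the standard library. This is only a minimal sketch: it assumes the file uses the GPX 1.1 namespace, `summarize_gpx` is a hypothetical helper name, and `SAMPLE` is a made-up document shaped like our `output.gpx`, not real route data.

```python
import xml.etree.ElementTree as ET

# Assumption: the file declares the GPX 1.1 namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def summarize_gpx(xml_text):
    """Return creator, track name, description and waypoint count from GPX text."""
    root = ET.fromstring(xml_text)
    track = root.find("gpx:trk", GPX_NS)  # First track only, like gpx.tracks[0].
    return {
        "creator": root.get("creator"),
        "name": track.findtext("gpx:name", default=None, namespaces=GPX_NS),
        "description": track.findtext("gpx:desc", default=None, namespaces=GPX_NS),
        "waypoints": len(root.findall("gpx:wpt", GPX_NS)),
    }

# Hypothetical sample shaped like the cleaned file.
SAMPLE = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="https://www.on2wheels.es/">
  <trk>
    <name>test_name</name>
    <desc>test_description</desc>
    <trkseg>
      <trkpt lat="43.35" lon="-3.01"><ele>10.0</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""

print(summarize_gpx(SAMPLE))
```

For a real file, the text could be loaded with `Path('output.gpx').read_text(encoding='utf-8')` and passed to the same helper.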

## Creating the function

While our *gpx* library is huge, we just need to change the names of a few routes for the moment. To be more precise, the routes contained in the following dataframe.

In [7]:
routes = pd.read_csv('routes_2807_476.csv')

In [8]:
routes.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,mountain_passes_ids,municipalities_ids,alt,gpx_link,difficulty_score,old_name
0,1117,Eulate y Opakua por Valle de Yerri.,,,"[-2.10989,42.77131]","[-2.3135,42.7945]",58,49,1097,516,1027,"[596, 776]",[4888],"[625.032, 623.092, 623.011, 622.007, 618.058, ...",,3,Artaza - Puerto de Opakua - Parque Natural Urbasa
1,3338,San Pelaio por Gernika-Lumo y Bakio.,,,"[-2.683899,43.304251]","[-2.799765,43.431816]",56,58,943,2,312,[722],"[7544, 7510]","[17.314, 16.954, 15.309, 15.108, 15.494, 15.81...",,3,Gernika - Bermeo - San Juan de Gaztelugatxe - ...
2,2447,Sagüera De Luna por Soto y Amío.,,,"[-5.740471,42.783061]","[-5.841873,42.777686]",53,122,1640,926,1251,[907],[3769],"[1075.452, 1075.453, 1075.608, 1075.608, 1076....",,5,Comarca de LUNA=Ruta 1 de 2
3,8404,Coll De Jouet por Berga y Sant Llorenç de Moru...,,,"[1.854248,42.108257]","[1.561878,42.0344]",51,90,1749,577,1268,[808],"[887, 3994]","[794.018, 793.098, 791.017, 788.008, 787.042, ...",,5,BERGA-AVIÁ-S.LLORENS DE MORUNY-BERGA
4,2449,Curueña y Andarraso por Soto y Amío.,,,"[-5.940483,42.7426]","[-6.031129,42.809684]",51,89,1912,1007,1412,"[332, 513, 1025]",[3769],"[1008.685, 1008.689, 1008.817, 1008.808, 1008....",,5,"Comarca de OMAÑA,Ruta 2 de 3"


Since the values in the *old_name* column match the track names stored inside our parsed *gpx* files (but NOT the file names!), our function will have to parse the *gpx* files one at a time, check each track name against that column and, if there's a positive match, rename the route to the corresponding *ID*.
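The name check can also be done with a dictionary built once, instead of scanning the whole dataframe for every file. The sketch below is an alternative, not what `parser` does: `build_id_lookup` is a hypothetical helper, and it assumes the rows arrive as plain dictionaries (in the notebook they could come from `routes.to_dict('records')`). The two sample rows are copied from `routes.head()`.

```python
def build_id_lookup(records):
    """Map each old route name to its ID as a string, ready to use as a file name."""
    lookup = {}
    for row in records:
        # setdefault keeps the first ID seen if an old name appears more than once.
        lookup.setdefault(row["old_name"], str(row["ID"]))
    return lookup

# Two rows taken from routes.head(); only the columns we need.
records = [
    {"ID": 1117, "old_name": "Artaza - Puerto de Opakua - Parque Natural Urbasa"},
    {"ID": 2447, "old_name": "Comarca de LUNA=Ruta 1 de 2"},
]

lookup = build_id_lookup(records)
print(lookup.get("Comarca de LUNA=Ruta 1 de 2"))  # 2447
print(lookup.get("Algorta"))  # None, since this name is not in the sample rows.
```

With this approach each file costs one dictionary lookup, which may matter when the folder holds many files and the dataframe has many rows.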

In [13]:
#Creating a function that meets our criteria.

def parser(file):
    """
    Input: gpx file.
    
    Output: new gpx file with its internal values (name, description, comment, creator, waypoints) modified or deleted.
    
    """
    try: 
        gpx_file = open(file, 'r', encoding='utf-8') #Opening our file.
        gpx = gpxpy.parse(gpx_file) #Parsing it.
        track = gpx.tracks[0]
        segment = track.segments[0]
        name = track.name #Storing the route's name as a variable.

        for i in range(len(routes)): #Checking for the name in our dataframe.
            if routes['old_name'].iloc[i] == name:
                track.name = str(routes['ID'].iloc[i]) #Changing the name for its ID.
                track.description = routes['name'].iloc[i] + '.' #Changing description to the new name.
                track.comment = '' #Changing comment.
                gpx.creator = 'https://www.on2wheels.es/' #Adding a creator.
                gpx.waypoints = [] #Deleting all waypoints.
                with open(track.name + '.gpx', "w", encoding='utf-8') as f:
                    f.write( gpx.to_xml()) #Saving our route with the new name.
            else:
                pass
    except:
        try:
            gpx_file = open(file, 'r') #Trying to parse without encoding as a backup.
            gpx = gpxpy.parse(gpx_file) #Parsing it.
            track = gpx.tracks[0]
            segment = track.segments[0]
            name = track.name #Storing the route's name as a variable.

            for i in range(len(routes)): #Checking for the name in our dataframe.
                if routes['old_name'].iloc[i] == name: #Matching against the old name, as in the first attempt.
                    track.name = str(routes['ID'].iloc[i]) #Changing the name for its ID.
                    track.description = 'Route number ' + track.name + '.' #Changing description.
                    track.comment = '' #Changing comment.
                    gpx.creator = 'https://www.on2wheels.es/' #Adding a creator.
                    gpx.waypoints = [] #Deleting all waypoints.
                    with open(track.name + '.gpx', "w", encoding='utf-8') as f:
                        f.write( gpx.to_xml()) #Saving our route with the new name.
                else:
                    pass
        except:
            pass
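The encoding fallback in `parser` could also be isolated in a small helper that tries a list of encodings in order, which avoids duplicating the whole body inside nested `try` blocks. This is a sketch under assumptions: `read_gpx_text` is a hypothetical name, the candidate encodings (`utf-8-sig`, then `latin-1`) are our guess at what the problematic files use, and the snippet below is made-up test data. As far as we know `gpxpy.parse` accepts XML text as well as file objects, so the returned string could be passed to it directly, but that is worth confirming against the library's documentation.

```python
import tempfile
from pathlib import Path

def read_gpx_text(path, encodings=("utf-8-sig", "latin-1")):
    """Return the file's text, decoded with the first encoding that succeeds."""
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
    raise ValueError(f"Could not decode {path} with any of {encodings}")

# Demo on two throwaway files holding the same made-up GPX snippet.
snippet = '<gpx creator="test"><trk><name>Sagüera</name></trk></gpx>'
with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "utf8.gpx"
    latin_file = Path(tmp) / "latin.gpx"
    utf8_file.write_bytes(snippet.encode("utf-8"))
    latin_file.write_bytes(snippet.encode("latin-1"))
    from_utf8 = read_gpx_text(utf8_file)
    from_latin = read_gpx_text(latin_file)

print(from_utf8 == from_latin == snippet)  # True
```

Catching only `UnicodeDecodeError` also means that other problems, such as a file with no tracks, are no longer silently swallowed by a bare `except`.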

The function is ready, but we still need a way to make it walk through all the *gpx* files in a directory. We can easily achieve this by using **Pathlib**. We will also be using **Time** to benchmark our function.
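One possible variation on this folder walk is sketched here: it restricts the glob pattern to `*.gpx` so stray files are skipped, and it times the run with `time.perf_counter`. The names `process_folder` and `handler` are hypothetical, and the demo runs on a throwaway folder rather than on our real `gpx` directory.

```python
import tempfile
import time
from pathlib import Path

def process_folder(directory, handler):
    """Apply handler to every .gpx file in directory; return (file count, seconds)."""
    start = time.perf_counter()
    count = 0
    for path in sorted(Path(directory).glob("*.gpx")):  # Sorted for a stable order.
        handler(path)
        count += 1
    return count, time.perf_counter() - start

# Demo: two gpx files and one unrelated file in a temporary folder.
seen = []
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.gpx", "b.gpx", "notes.txt"):
        (Path(tmp) / name).write_text("<gpx/>", encoding="utf-8")
    count, seconds = process_folder(tmp, lambda p: seen.append(p.name))

print(count, seen)  # 2 ['a.gpx', 'b.gpx']
```

In the notebook, the call would look like `process_folder('gpx', parser)`, and returning the count alongside the duration makes it easy to work out a per-file average.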

In [14]:
#Creating the final function.

def gpx_cleaner():
    """
    Input: none, but all target gpx files must be in a folder named 'gpx'.
    
    Output: new gpx files created as per 'parser' function.
    
    """
    start = time.time() #Starting our timer.

    directory = 'gpx' #The folder containing the gpx files.

    files = Path(directory).glob('*') #Using all files in the folder as input.
    for file in files:
        parser(file) #Applying the previous function to every file.

    stop = time.time() #Stopping our timer.
    duration = (stop - start) / 60
    
    print('Minutes:', duration) #Printing the elapsed minutes.

## Function testing

In [15]:
# Testing: 50 files.

gpx_cleaner()

Minutes: 170.92400506734847


5 new *gpx* files have been successfully created! The function works as expected. Since it took about 0.34 minutes to clean 100 files, our full folder will take about 3 hours.

# Using our final function

Now it's simply a matter of placing all our *gpx* files inside the designated folder and running our function.

In [34]:
gpx_cleaner()

Minutes: 255.2301216204961


**<div align="right">Ironhack DA PT 2021</div>**
    
**<div align="right">Xavier Esteban</div>**