# Creating a pipeline for new ports

Our routes database is quite extensive, but in the future we will probable be adding new ports. It's very probable that those new ports already have routes that climb them in our database, so it would be in our best interest to combine all necessary data manipulation and parsing steps in a single notebook (or function) that can be re-run on demand.

## Cleaning our original routes dataframe for testing

Before we begin creating the pipeline we will clean and manipulate our original routes dataframe so that it meets our requirements.

In [1]:
import pandas as pd
import haversine as hs
import time
import math
import folium

In [2]:
routes = pd.read_csv('master_1407.csv')

In [3]:
routes.head()

Unnamed: 0,ID,nombre,ccaa,provincia,coords,alt,start,midpoint,distancia,desnivel,min_alt,max_alt,municipios,puertos,trailrank,url
0,0,01-Madrid - Motilla del Palancar,,,"[(40.39467, -3.67912), (40.39546, -3.67998), (...","[592.065, 597.068, 596.014, 597.008, 598.067, ...","(40.39467, -3.67912)","(40.09315, -2.891046)",229,1884,544,976,,,27,https://es.wikiloc.com/rutas-ciclismo/01-madri...
1,1,01-MAY-16 ALMÁCERA-BÉTERA-OLOCAU-GÁTOVA-ALTO D...,,,"[(39.510125, -0.355943), (39.510517, -0.35574)...","[-79.616, -79.676, -79.613, -79.208, -79.662, ...","(39.510125, -0.355943)","(39.809736, -0.515215)",117,1292,0,729,,,21,https://es.wikiloc.com/rutas-ciclismo/01-may-1...
2,2,"02-AGO-15 Coll de La Gallina, Port de Beixalís...",,,"[(42.511074, 1.549479), (42.511086, 1.549457),...","[1054.713, 1059.043, 1064.307, 1064.808, 1069....","(42.511074, 1.549479)","(42.532589, 1.561706)",93,2850,912,2082,,,62,https://es.wikiloc.com/rutas-ciclismo/02-ago-1...
3,3,02-Motilla del Palancar - Valencia,,,"[(39.561199, -1.906015), (39.561199, -1.906015...","[665.256, 665.259, 665.214, 665.208, 665.036, ...","(39.561199, -1.906015)","(39.374283, -1.012429)",167,1001,0,734,,,38,https://es.wikiloc.com/rutas-ciclismo/02-motil...
4,4,05-ABR-15 Les Tres Cales,,,"[(40.913227, 0.804593), (40.913242, 0.804572),...","[63.634, 63.155, 59.71, 59.307, 56.462, 54.985...","(40.913227, 0.804593)","(40.905964, 0.740497)",27,416,25,191,,,27,https://es.wikiloc.com/rutas-ciclismo/05-abr-1...


In [4]:
#Renaming the columns.

routes.rename(columns = {'nombre': 'name', 'provincia': 'province', 'distancia': 'distance', 'desnivel': 'gradient', 'municipios': 'municipalities_ids', 'puertos': 'mountain_passes_ids'}, inplace = True)

In [5]:
routes.head(1)

Unnamed: 0,ID,name,ccaa,province,coords,alt,start,midpoint,distance,gradient,min_alt,max_alt,municipalities_ids,mountain_passes_ids,trailrank,url
0,0,01-Madrid - Motilla del Palancar,,,"[(40.39467, -3.67912), (40.39546, -3.67998), (...","[592.065, 597.068, 596.014, 597.008, 598.067, ...","(40.39467, -3.67912)","(40.09315, -2.891046)",229,1884,544,976,,,27,https://es.wikiloc.com/rutas-ciclismo/01-madri...


In [6]:
#Creating a new column for the gpx file url.

routes['gpx_link'] = None

In [7]:
#Re-ordering the columns.

routes = routes[['ID', 'name', 'ccaa', 'province', 'start', 'midpoint', 'trailrank', 'distance', 'gradient', 'min_alt', 'max_alt', 'mountain_passes_ids', 'municipalities_ids', 'coords', 'alt','gpx_link']]

In [8]:
#Deleting extremely short, long or high routes.

routes = routes[routes['distance'] < 230]
routes = routes[routes['distance'] > 30]
routes = routes[routes['gradient'] < 4700]

In [9]:
#Resetting the index.

routes = routes.reset_index(drop=True)

In [10]:
routes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9497 entries, 0 to 9496
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   9497 non-null   int64  
 1   name                 9497 non-null   object 
 2   ccaa                 0 non-null      float64
 3   province             0 non-null      float64
 4   start                9497 non-null   object 
 5   midpoint             9497 non-null   object 
 6   trailrank            9497 non-null   int64  
 7   distance             9497 non-null   int64  
 8   gradient             9497 non-null   int64  
 9   min_alt              9497 non-null   int64  
 10  max_alt              9497 non-null   int64  
 11  mountain_passes_ids  0 non-null      float64
 12  municipalities_ids   0 non-null      float64
 13  coords               9497 non-null   object 
 14  alt                  9497 non-null   object 
 15  gpx_link             0 non-null      o

## Deleting non-circular routes

We only want circular routes, so we will create a new column with the last coordinate of the route and calculate its distance from the start point. Routes where that distance exceeds 2Km will be deleted.

In [11]:
#Creating column to hold finish coordinates.

routes['finish'] = None

In [12]:
#Extracting the finish coordinates as the last tuple in the 'coords' list.

for i in range(len(routes)):
    routes['finish'].iloc[i] = eval(routes['coords'].iloc[i])[-1]

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [13]:
#Creating a dummy column.

routes['is_circular'] = None

In [14]:
#Populating it with 'yes' if the start and finish are less than 2Km apart. Otherwise it's a 'no'.

start = time.time()

for i in range(len(routes)):
    if hs.haversine(eval(routes['start'].iloc[i]), routes['finish'].iloc[i]) <= 2:
        routes['is_circular'].iloc[i] = 'yes'
    else:
        routes['is_circular'].iloc[i] = 'no'
        
stop = time.time() 
duration = (stop - start) / 60
print('Minutes:', duration)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


Minutes: 0.07371410131454467


In [15]:
#Deleting non-circular routes and the useless columns:

routes = routes[routes['is_circular'] == 'yes']
routes.drop(['is_circular', 'finish'], axis=1, inplace=True)

In [16]:
#Reindexing.

routes = routes.reset_index(drop=True)

In [17]:
#We're down to 8651 routes.

routes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8651 entries, 0 to 8650
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   8651 non-null   int64  
 1   name                 8651 non-null   object 
 2   ccaa                 0 non-null      float64
 3   province             0 non-null      float64
 4   start                8651 non-null   object 
 5   midpoint             8651 non-null   object 
 6   trailrank            8651 non-null   int64  
 7   distance             8651 non-null   int64  
 8   gradient             8651 non-null   int64  
 9   min_alt              8651 non-null   int64  
 10  max_alt              8651 non-null   int64  
 11  mountain_passes_ids  0 non-null      float64
 12  municipalities_ids   0 non-null      float64
 13  coords               8651 non-null   object 
 14  alt                  8651 non-null   object 
 15  gpx_link             0 non-null      o

# Extracting which ports pass through each route

Now that we've cleaned our routes dataframe it's time to search for which ports are climbed in every route.

In [18]:
#Loading our ports dataset.

ports = pd.read_csv('puertos.csv')

In [27]:
#This function checks if two points are less than 80Km apart.

def isnear(a, b):
    if hs.haversine(eval(routes['midpoint'].iloc[a]), eval(ports['peak_coords'].iloc[b])) < 80:
                    return 'Yes'
    else:
                    return 'No'

In [34]:
#This function returns a dataframe of route ID and ports.

start = time.time()

dict_list = []

for i in range(len(routes)):
    lista_puertos = []
    for p in range(len(ports)):
        if isnear(i, p) == 'Yes':
            new_c = eval(routes['coords'].iloc[i])
            for n in new_c[0::30]:
                if hs.haversine(n, eval(ports['peak_coords'].iloc[p])) < 0.3:
                    if ports['ID'].iloc[p] not in lista_puertos:
                        lista_puertos.append(ports['ID'].iloc[p])
                    else:
                        pass
                else:
                    pass
    new = {'ruta': routes['ID'].iloc[i], 'puertos': lista_puertos}
    dict_list.append(new)  
    
test = pd.DataFrame(dict_list)

stop = time.time() 
duration = (stop - start) / 60
print('Minutes:', duration)

Minutes: 585.5805875380834


In [40]:
#Let's check our results. All routes with geolocated ports have them in a list.

test.head(5)

Unnamed: 0,ruta,puertos
0,1,[]
1,2,[]
2,6,[]
3,9,[]
4,11,"[378, 394]"


In [37]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8651 entries, 0 to 8650
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   ruta     8651 non-null   int64 
 1   puertos  8651 non-null   object
dtypes: int64(1), object(1)
memory usage: 135.3+ KB


In [41]:
#Since both dataframes share the same index we can simply populate the mountain_passes_ids with a list.

l = test['puertos'].tolist() #Casting the column containing port IDs as a list.
routes['mountain_passes_ids'] = l #Using the list to populate the column.

In [43]:
#Let's save this dataframe.

routes.to_csv('routes_2807_all.csv', index=False)

## Selecting the best route for each port

Now it's time to keep just the best route for each and every port.

Since our ports are ordered by importance, we can order our routes from best to worst (based on trailrank) and iterate through all port IDs, keeping just the first route with that port.

In [202]:
#Creating a list of all port IDs.

port_list = ports['ID'].tolist()

In [203]:
len(port_list)

1107

In [204]:
#Ordering our routes by trailrank.

routes = routes.sort_values('trailrank', ascending=False)

In [205]:
dict_list = []
spent_routes = []

for i in port_list:
    port_ID = i
    route = 'None'
    for n in range(len(routes)):
        for p in routes['mountain_passes_ids'].iloc[n]:
            if p == i:
                if routes['ID'].iloc[n] not in spent_routes:
                    route = routes['ID'].iloc[n]
                    spent_routes.append(routes['ID'].iloc[n])
                else:
                    pass
            else:
                pass
    port_dict = {'ID': port_ID, 'route': route}
    dict_list.append(port_dict)

In [206]:
#Creating a dataframe with our list of dictionaries.

port_df = pd.DataFrame(dict_list)

In [207]:
#The first column is the port ID, while the second one is the ID of the best route.

port_df.head()

Unnamed: 0,ID,route
0,0,
1,1,
2,2,
3,3,
4,4,


It's time to create a final dataframe containing only the best route for each port, making use of this little dataframe.

In [208]:
#Creating a list for all the best routes IDs.

id_list = port_df['route'].to_list()

In [209]:
#Creating a new dataframe using those filtered values.

df_final = routes[routes['ID'].isin(id_list)]

In [210]:
#Adding route score (re-using functions from notebook 6).

df_final['difficulty_score'] = None

def score(gradient, distance):
    if gradient > 4000:
        return 10
    else: #Checking for the first conditions.
        return math.ceil((gradient - 100)*(1.5/975) + (distance - 30)*(1/75) + 1) #We're rounding the score to the next integer.
    
for i in range(len(df_final)):
    df_final['difficulty_score'].iloc[i] = score(df_final['gradient'].iloc[i], df_final['distance'].iloc[i])

In [211]:
#Inspecting the dataframe.

df_final.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,mountain_passes_ids,municipalities_ids,coords,alt,gpx_link,difficulty_score


# Matching routes with towns

Now that all routes have a column with the ports that can be climbed through them, it's time to do the same with nearby towns.
Most of these steps are being re-used from the first notebook.

In [84]:
#Importing towns.

towns = pd.read_csv('towns_1707.csv')

In [85]:
#Renaming the columns.

towns.rename(columns = {'provincia': 'province', 'municipio': 'municipality', 'poblacion': 'municipality_inhabitants', 'superficie': 'geographic_area', 'altitud': 'alt', 'num_rutas': 'routes_number', 'rutas': 'routes_ids'}, inplace = True)

In [86]:
#Adding a column to store all mountain passes that can be accessed from each town.

towns['mountain_passes_ids'] = None

In [87]:
#Reordering the columns.

towns = towns[['ID', 'municipality', 'ccaa', 'province', 'municipality_inhabitants', 'geographic_area', 'radius', 'routes_number', 'routes_ids', 'mountain_passes_ids', 'coords']]

In [88]:
#Adding the CCAA is easily done by using a simple loop.

for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Álava', 'Bizkaia', 'Gipuzkoa']:
        towns['ccaa'].iloc[i] = 'País Vasco'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Almería', 'Cádiz', 'Córdoba', 'Granada', 'Huelva', 'Jaén', 'Málaga', 'Sevilla']:
        towns['ccaa'].iloc[i] = 'Andalucía'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Huesca', 'Teruel', 'Zaragoza']:
        towns['ccaa'].iloc[i] = 'Aragón'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Asturias']:
        towns['ccaa'].iloc[i] = 'Asturias'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Illes Balears']:
        towns['ccaa'].iloc[i] = 'Illes Balears'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Las Palmas', 'Santa Cruz de Tenerife']:
        towns['ccaa'].iloc[i] = 'Canarias'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Cantabria']:
        towns['ccaa'].iloc[i] = 'Cantabria'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Albacete', 'Ciudad Real', 'Cuenca', 'Guadalajara', 'Toledo']:
        towns['ccaa'].iloc[i] = 'Castilla-La Mancha'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Barcelona', 'Girona', 'Lleida', 'Tarragona']:
        towns['ccaa'].iloc[i] = 'Cataluña'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Badajoz', 'Cáceres']:
        towns['ccaa'].iloc[i] = 'Extremadura'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['A Coruña', 'Luga', 'Ourense', 'Pontevedra']:
        towns['ccaa'].iloc[i] = 'Galicia'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Alicante', 'Castellón', 'Valencia']:
        towns['ccaa'].iloc[i] = 'Comunitat Valenciana'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Madrid']:
        towns['ccaa'].iloc[i] = 'Comunidad de Madrid'
    else:
        pass
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Murcia']:
        towns['ccaa'].iloc[i] = 'Región de Murcia'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Navarra']:
        towns['ccaa'].iloc[i] = 'Navarra'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['La Rioja']:
        towns['ccaa'].iloc[i] = 'La Rioja'
    else:
        pass

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [96]:
#Deleting the columns that we must re-populate.

towns[['routes_number', 'routes_ids']] = None

In [97]:
towns.head()

Unnamed: 0,ID,municipality,ccaa,province,municipality_inhabitants,geographic_area,radius,routes_number,routes_ids,mountain_passes_ids,coords
0,7609,Arratzu,País Vasco,Bizkaia,413,10.0585,1.789333,,,,"[-2.64115701,43.30524234]"
1,7608,Zierbena,País Vasco,Bizkaia,1520,12.1542,1.966925,,,,"[-3.0901373,43.34554558]"
2,7607,Alonsotegi,País Vasco,Bizkaia,2879,20.2176,2.536818,,,,"[-2.98785093,43.24467414]"
3,7603,Murueta,País Vasco,Bizkaia,319,5.6299,1.338674,,,,"[-2.68171532,43.35143893]"
4,7602,Kortezubi,País Vasco,Bizkaia,442,11.861,1.943056,,,,"[-2.65598385,43.34124315]"


In [123]:
#Adding a column for MongoDB coords.

towns['coords_MDB'] = towns['coords']

In [126]:
#Converting the coordinates back to the normal format in our 'coords' column.

for i in range(len(towns)):
    towns['coords'].iloc[i] = '(' + str(eval(towns['coords'].iloc[i])[1]) + ',' + str(eval(towns['coords'].iloc[i])[0]) + ')'

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [99]:
#Finally, let's change the name of our dataframe so that we don't run into any conflicts with variable names.

routes = df_final

We can finally match routes with towns.

In [128]:
#This function takes two dataframes (towns and routes) and looks for towns 1.5Km or less from any point of our routes.

def nearby_routes(routes, towns):
    dict_list = []
    for g in range(len(routes)):
        new_c = eval(routes['coords'].iloc[g])
        route_list = []
        for i in new_c[0::60]:
            for n in range(len(towns)):
                try:
                    if hs.haversine((i), eval(towns['coords'].iloc[n])) - towns['radius'].iloc[n] < 1.5:
                        if towns['ID'].iloc[n] not in route_list:
                            route_list.append(towns['ID'].iloc[n])
                        else:
                            pass
                except:
                    pass
        dict_routes = {'route': routes['ID'].iloc[g], 'town': route_list}
        dict_list.append(dict_routes)
    return dict_list

In [131]:
#Using the function on our dataframes. This block of code performs all necessary transformations on it.

start = time.time() #Starting a timer.

test1 = pd.DataFrame(nearby_routes(routes, towns))

df_exploded = test1.explode('town')

town_list = towns['ID'].tolist()

#Using a simple loop to generate a dictionary of nearby routes for every town, and adding that dictionary to a list.

dict_list = []

for i in town_list:
    try: 
        lista_rutas = []
        for n in range(len(df_exploded)):
            if df_exploded['town'].iloc[n] == i:
                lista_rutas.append(df_exploded['route'].iloc[n])
            else:
                pass
        dict_ruta = {'municipio': i, 'rutas': lista_rutas, 'numero_rutas': len(lista_rutas)}
        dict_list.append(dict_ruta)
    except:
        pass
    
#Creating a dataframe out of the dict list.

df_dict = pd.DataFrame.from_dict(dict_list)


final_list  = df_dict['rutas'].tolist()
num_routes = df_dict['numero_rutas'].tolist()
towns['routes_ids'] = final_list
towns['routes_number'] = num_routes


stop = time.time() 
duration = (stop - start) / 60
print('Minutes:', duration)

Minutes: 27.250480163097382


In [133]:
towns.head()

Unnamed: 0,ID,municipality,ccaa,province,municipality_inhabitants,geographic_area,radius,routes_number,routes_ids,mountain_passes_ids,coords,rutas,num_rutas,coords_MDB
0,7609,Arratzu,País Vasco,Bizkaia,413,10.0585,1.789333,4,"[9025, 1522, 10215, 9764]",,"(43.30524234,-2.64115701)",[],0,"[-2.64115701,43.30524234]"
1,7608,Zierbena,País Vasco,Bizkaia,1520,12.1542,1.966925,11,"[2221, 9022, 9369, 3908, 6672, 1615, 1277, 752...",,"(43.34554558,-3.0901373)",[],0,"[-3.0901373,43.34554558]"
2,7607,Alonsotegi,País Vasco,Bizkaia,2879,20.2176,2.536818,19,"[9022, 6244, 5692, 3290, 6938, 9960, 1615, 127...",,"(43.24467414,-2.98785093)",[],0,"[-2.98785093,43.24467414]"
3,7603,Murueta,País Vasco,Bizkaia,319,5.6299,1.338674,5,"[9025, 1522, 10215, 10216, 9764]",,"(43.35143893,-2.68171532)",[],0,"[-2.68171532,43.35143893]"
4,7602,Kortezubi,País Vasco,Bizkaia,442,11.861,1.943056,6,"[3338, 9025, 1522, 10215, 10216, 9764]",,"(43.34124315,-2.65598385)",[],0,"[-2.65598385,43.34124315]"


## Filtering our towns

We only want to keep towns with 3 or more routes.

In [134]:
towns = towns[towns['routes_number'] >= 3]

In [135]:
towns.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 486 entries, 0 to 726
Data columns (total 14 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        486 non-null    int64  
 1   municipality              486 non-null    object 
 2   ccaa                      443 non-null    object 
 3   province                  486 non-null    object 
 4   municipality_inhabitants  486 non-null    int64  
 5   geographic_area           486 non-null    float64
 6   radius                    486 non-null    float64
 7   routes_number             486 non-null    int64  
 8   routes_ids                486 non-null    object 
 9   mountain_passes_ids       0 non-null      object 
 10  coords                    486 non-null    object 
 11  rutas                     486 non-null    object 
 12  num_rutas                 486 non-null    int64  
 13  coords_MDB                486 non-null    object 
dtypes: float64

Now let's delete towns that share the same routes and keep the biggest one.

In [137]:
#Creating a loop that evaluates each route_id string as a list and orders it.

for i in range(len(towns)):
    ordered = towns['routes_ids'].iloc[i]
    ordered.sort()#Evaluating the string as a list and sorting it.
    towns['routes_ids'].iloc[i] = str(ordered) #Replacing the non-ordered list with the new one (as a string).

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [138]:
#Now we can simply call drop_duplicates on that column to get rid of duplicate routes, but before that we will order our
#dataframe by town size so that we actually keep the bigger town.

towns = towns.sort_values('municipality_inhabitants', ascending=False)
towns = towns.drop_duplicates('routes_ids', keep='first')

## Keeping the biggest town for each zone 

Right now we have about 445 destinations with 3 or more routes that climb unique mountain passes, but thanks to Spain's wonderfully uneven orography many of those destinations are clustered very densely.

This issue might be solved by establishing a filter or our making that discards secondary locations if they are less than a given distance from another, bigger destination.

First of all let's visually inspect the clusters using **Folium**.

In [142]:
map = folium.Map(location=[40.4167, -3.70325], zoom_start=6, tiles="OpenStreetMap") #Initiating our map.

for i in range(len(towns)):
    folium.Marker(eval(towns['coords'].iloc[i]), tooltip=towns['municipality'].iloc[i]).add_to(map)
    
map #Displaying the map.

In [143]:
#Let's use a function from the notebook 4 that keeps the biggest town in a given radius.

def spacer(municipalities, distance):
    df = municipalities.copy() #Making a copy of our dataframe.

    for i in range(len(df)): #Iterating through all towns.
        try:
            for n in range(len(df)): #Iterating through every permutation between towns to check their distance.
                try:
                        if hs.haversine(eval(df['coords'].iloc[i]), eval(df['coords'].iloc[n])) < distance:
                            if df['municipality'].iloc[i] != df['municipality'].iloc[n]: #Checking for itself.
                                df = df.drop([n]) #Dropping the smaller town.
                                df = df.reset_index(drop=True) #Re-indexing.
                except:
                    pass
        except:
            pass

    map = folium.Map(location=[40.4167, -3.70325], zoom_start=6, tiles="OpenStreetMap") #Initiating our map.

    for i in range(len(df)):
        folium.Marker(eval(df['coords'].iloc[i]), tooltip=df['municipality'].iloc[i]).add_to(map) #Adding each town.
        
    return df

In [145]:
#Applying the function.

spaced_towns = spacer(towns, 15)

In [146]:
#We're down to 155 entries, let's visualize them on a map.

spaced_towns.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 155 entries, 0 to 154
Data columns (total 14 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        155 non-null    int64  
 1   municipality              155 non-null    object 
 2   ccaa                      131 non-null    object 
 3   province                  155 non-null    object 
 4   municipality_inhabitants  155 non-null    int64  
 5   geographic_area           155 non-null    float64
 6   radius                    155 non-null    float64
 7   routes_number             155 non-null    int64  
 8   routes_ids                155 non-null    object 
 9   mountain_passes_ids       0 non-null      object 
 10  coords                    155 non-null    object 
 11  rutas                     155 non-null    object 
 12  num_rutas                 155 non-null    int64  
 13  coords_MDB                155 non-null    object 
dtypes: float64

In [148]:
map = folium.Map(location=[40.4167, -3.70325], zoom_start=6, tiles="OpenStreetMap") #Initiating our map.

for i in range(len(spaced_towns)):
    folium.Marker(eval(spaced_towns['coords'].iloc[i]), tooltip=spaced_towns['municipality'].iloc[i]).add_to(map)
    
map 

In [157]:
#Re-assigning name.

towns = spaced_towns

In [158]:
towns.head()

Unnamed: 0,ID,municipality,ccaa,province,municipality_inhabitants,geographic_area,radius,routes_number,routes_ids,mountain_passes_ids,coords,rutas,num_rutas,coords_MDB
0,884,Barcelona,Cataluña,Barcelona,1664182,100.7644,5.663411,3,"[1247, 6787, 10190]",,"(41.38424664,2.17634927)",[],0,"[2.17634927,41.38424664]"
1,7257,València,Comunitat Valenciana,Valencia,800215,139.2687,6.658115,3,"[486, 1460, 9283]",,"(39.47534441,-0.37565717)",[],0,"[-0.37565717,39.47534441]"
2,4613,Murcia,Región de Murcia,Murcia,459403,885.1149,16.785117,6,"[691, 1691, 6089, 6099, 6769, 8083]",,"(37.98436361,-1.1285408)",[],0,"[-1.1285408,37.98436361]"
3,151,Alicante,Comunitat Valenciana,Alicante,337482,201.265845,8.004046,9,"[301, 484, 678, 1239, 1974, 9425, 9503, 9550, ...",,"(38.34548705,-0.4831832)",[],0,"[-0.4831832,38.34548705]"
4,2076,Córdoba,Andalucía,Córdoba,326039,1254.9326,19.986408,3,"[2620, 2642, 7498]",,"(37.87954225,-4.78032455)",[],0,"[-4.78032455,37.87954225]"


# Adding ports to our towns dataframe

The last step for our *towns* dataframe will be adding the ports that can be climbed from each town.<br>

While it isn't the most elegants of solutions, we will be using a simple loop to iterate through all towns, routes and ports to append the result to the corresponding row in the towns dataframe.

In [159]:
#Running our loop.

for i in range(len(towns)): #Iterating through each town.
    list_ports = [] #This list holds ports.
    list_routes = eval(towns['routes_ids'].iloc[i]) #Using eval to iterate through each port of a route as a list.
    for n in list_routes:
        port = routes[routes['ID'] == n]['mountain_passes_ids'].iloc[0] #Grabbing the port.
        list_ports.append(port) #Appending the list of ports to the town.
    towns['mountain_passes_ids'].iloc[i] = list_ports #Assinging the ports.

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-d

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
  arr_value = np.array(value)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-d

Now we have the ports as a list of lists, with duplicated ports. Let's fix that using the unlimited power of loops.

In [160]:
for i in range(len(towns)): #Iterating through teach town.
    final_list = []
    for n in towns['mountain_passes_ids'].iloc[i]:
        for p in n:
            if p not in final_list: #Not appending duplicates.
                final_list.append(p) #Appending the port.
    towns['mountain_passes_ids'].iloc[i] = final_list #Assigning the port list to the column.

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [165]:
#Dropping two redundant columns.

towns.drop(['rutas', 'num_rutas'], axis=1, inplace=True)

In [166]:
#Checking the results.

towns.head()

Unnamed: 0,ID,municipality,ccaa,province,municipality_inhabitants,geographic_area,radius,routes_number,routes_ids,mountain_passes_ids,coords,coords_MDB
0,884,Barcelona,Cataluña,Barcelona,1664182,100.7644,5.663411,3,"[1247, 6787, 10190]","[388, 315, 323, 822]","(41.38424664,2.17634927)","[2.17634927,41.38424664]"
1,7257,València,Comunitat Valenciana,Valencia,800215,139.2687,6.658115,3,"[486, 1460, 9283]","[238, 533, 906]","(39.47534441,-0.37565717)","[-0.37565717,39.47534441]"
2,4613,Murcia,Región de Murcia,Murcia,459403,885.1149,16.785117,6,"[691, 1691, 6089, 6099, 6769, 8083]","[11, 787, 1055, 978, 1001, 1012, 1103]","(37.98436361,-1.1285408)","[-1.1285408,37.98436361]"
3,151,Alicante,Comunitat Valenciana,Alicante,337482,201.265845,8.004046,9,"[301, 484, 678, 1239, 1974, 9425, 9503, 9550, ...","[644, 956, 534, 918, 429, 601, 626, 643, 802, ...","(38.34548705,-0.4831832)","[-0.4831832,38.34548705]"
4,2076,Córdoba,Andalucía,Córdoba,326039,1254.9326,19.986408,3,"[2620, 2642, 7498]","[944, 925, 964]","(37.87954225,-4.78032455)","[-4.78032455,37.87954225]"


In [162]:
towns.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 155 entries, 0 to 154
Data columns (total 14 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        155 non-null    int64  
 1   municipality              155 non-null    object 
 2   ccaa                      131 non-null    object 
 3   province                  155 non-null    object 
 4   municipality_inhabitants  155 non-null    int64  
 5   geographic_area           155 non-null    float64
 6   radius                    155 non-null    float64
 7   routes_number             155 non-null    int64  
 8   routes_ids                155 non-null    object 
 9   mountain_passes_ids       155 non-null    object 
 10  coords                    155 non-null    object 
 11  rutas                     155 non-null    object 
 12  num_rutas                 155 non-null    int64  
 13  coords_MDB                155 non-null    object 
dtypes: float64

In [167]:
#Exporting the results.

towns.to_csv('towns_2807_155.csv', index=False)

# Finishing our routes dataframe

Before the dataframe is ready for **MongoDB** integration we must solve some issues:

- Coordinates must be inverted (lat/long -> long/lat).
- The route must be assigned a custom name.


Thankfully we did all those things in the past, so it's simply a matter of re-using our previously defined functions. 

## Adding cycling destinations near a route

Our first step will be finding which towns host a route. This can be achieved by using our *towns* dataframe.

In [169]:
#We will be using nested loops to add each town ID to the routes dataframe.

for i in range(len(routes)):
    town_list = []
    for n in range(len(towns)):
        for p in eval(towns['routes_ids'].iloc[n]):
            if p == routes['ID'].iloc[i]:
                town_list.append(towns['ID'].iloc[n])
    routes['municipalities_ids'].iloc[i] = str(town_list)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [172]:
#We can see that many routed don't have a municipality, that's because they got wiped out during our filtering. 

routes.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,mountain_passes_ids,municipalities_ids,coords,alt,gpx_link,difficulty_score
4623,5727,Prado Llano por las Sabinas - Sierra nevada - ...,,,"(37.182527, -3.598462)","(37.09466, -3.39311)",66,74,1799,672,2373,[427],[],"[(37.182527, -3.598462), (37.18257, -3.598902)...","[700.077, 699.061, 699.027, 696.006, 694.094, ...",,5
908,1117,Artaza - Puerto de Opakua - Parque Natural Urbasa,,,"(42.77131, -2.10989)","(42.7945, -2.3135)",58,49,1097,516,1027,"[596, 776]",[4888],"[(42.77131, -2.10989), (42.7712, -2.10991), (4...","[625.032, 623.092, 623.011, 622.007, 618.058, ...",,3
2688,3338,Gernika - Bermeo - San Juan de Gaztelugatxe - ...,,,"(43.304251, -2.683899)","(43.431816, -2.799765)",56,58,943,2,312,[722],"[7544, 7510]","[(43.304251, -2.683899), (43.304237, -2.683159...","[17.314, 16.954, 15.309, 15.108, 15.494, 15.81...",,3
1972,2447,Comarca de LUNA=Ruta 1 de 2,,,"(42.783061, -5.740471)","(42.777686, -5.841873)",53,122,1640,926,1251,[907],[3769],"[(42.783061, -5.740471), (42.782985, -5.740429...","[1075.452, 1075.453, 1075.608, 1075.608, 1076....",,5
6763,8404,BERGA-AVIÁ-S.LLORENS DE MORUNY-BERGA,,,"(42.108257,1.854248)","(42.0344, 1.561878)",51,90,1749,577,1268,[808],"[887, 3994]","[(42.108257, 1.854248), (42.108178, 1.854208),...","[794.018, 793.098, 791.017, 788.008, 787.042, ...",,5


In [174]:
#It's simply a matter of filtering them out.

routes = routes[routes['municipalities_ids'] != '[]']

## Creating a custom name

As in notebook 2, we will be using two separate functions and combine them together.

In [185]:
#Defining our first function:

def name_creator(routes, towns, ports):
    """
    Input : dataframes of routes, towns and ports.
    
    Output: dataframe with the custom names containing both ports and towns.
    
    """
    routes = routes.copy()
    for i in range(len(routes)): #Iterating through each route.
        list_ports = routes['mountain_passes_ids'].iloc[i] #Generating a list of ports.
        list_towns = eval(routes['municipalities_ids'].iloc[i]) #The same procedure for the towns.
        routes['name'].iloc[i] = composer(list_ports, list_towns, towns, ports) #Assigning the name returned by the second function.
        
    return routes

In [186]:
#Defining the second one:

def composer(list_ports, list_towns, towns, ports):
    """
    Input : two lists of port and town IDs, towns and ports dataframes.
    
    Output: custom name containing those ports and towns.
    
    """
    if len(list_ports) == 1 and len(list_towns) == 1: #First case, 1 port 1 town.
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + '.'
    elif len(list_ports) == 1 and len(list_towns) == 2: 
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + ' y ' + towns[towns['ID'] == list_towns[1]]['municipality'].iloc[0] + '.'
    elif len(list_ports) == 1 and len(list_towns) > 2: 
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + ' y ' + towns[towns['ID'] == list_towns[1]]['municipality'].iloc[0] + '.'
    elif len(list_ports) == 2 and len(list_towns) == 1: #Second case, 2 ports 1 town.
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' y ' + ports[ports['ID'] == list_ports[1]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + '.'
    elif len(list_ports) > 2 and len(list_towns) == 1: #Second case, 2 ports 1 town.
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' y ' + ports[ports['ID'] == list_ports[1]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + '.'
    elif len(list_ports) > 2 and len(list_towns) > 2: #Third case, 2 ports 2 towns (etc).
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' y ' + ports[ports['ID'] == list_ports[1]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + ' y ' + towns[towns['ID'] == list_towns[1]]['municipality'].iloc[0] + '.'
    elif len(list_ports) > 2 and len(list_towns) == 2: #Third case, 2 ports 2 towns (etc).
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' y ' + ports[ports['ID'] == list_ports[1]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + ' y ' + towns[towns['ID'] == list_towns[1]]['municipality'].iloc[0] + '.'
    elif len(list_ports) == 2 and len(list_towns) > 2: #Third case, 2 ports 2 towns (etc).
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' y ' + ports[ports['ID'] == list_ports[1]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + ' y ' + towns[towns['ID'] == list_towns[1]]['municipality'].iloc[0] + '.'
    elif len(list_ports) == 2 and len(list_towns) == 2: #Third case, 2 ports 2 towns (etc).
        return ports[ports['ID'] == list_ports[0]]['name'].iloc[0] + ' y ' + ports[ports['ID'] == list_ports[1]]['name'].iloc[0] + ' por ' + towns[towns['ID'] == list_towns[0]]['municipality'].iloc[0] + ' y ' + towns[towns['ID'] == list_towns[1]]['municipality'].iloc[0] + '.'

In [187]:
#Running our functions:

test = name_creator(routes, towns, ports)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_

In [190]:
test.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,mountain_passes_ids,municipalities_ids,coords,alt,gpx_link,difficulty_score
908,1117,Eulate y Opakua por Valle de Yerri.,,,"(42.77131, -2.10989)","(42.7945, -2.3135)",58,49,1097,516,1027,"[596, 776]",[4888],"[(42.77131, -2.10989), (42.7712, -2.10991), (4...","[625.032, 623.092, 623.011, 622.007, 618.058, ...",,3
2688,3338,San Pelaio por Gernika-Lumo y Bakio.,,,"(43.304251, -2.683899)","(43.431816, -2.799765)",56,58,943,2,312,[722],"[7544, 7510]","[(43.304251, -2.683899), (43.304237, -2.683159...","[17.314, 16.954, 15.309, 15.108, 15.494, 15.81...",,3
1972,2447,Sagüera De Luna por Soto y Amío.,,,"(42.783061, -5.740471)","(42.777686, -5.841873)",53,122,1640,926,1251,[907],[3769],"[(42.783061, -5.740471), (42.782985, -5.740429...","[1075.452, 1075.453, 1075.608, 1075.608, 1076....",,5
6763,8404,Coll De Jouet por Berga y Sant Llorenç de Moru...,,,"(42.108257,1.854248)","(42.0344, 1.561878)",51,90,1749,577,1268,[808],"[887, 3994]","[(42.108257, 1.854248), (42.108178, 1.854208),...","[794.018, 793.098, 791.017, 788.008, 787.042, ...",,5
1974,2449,Curueña y Andarraso por Soto y Amío.,,,"(42.7426, -5.940483)","(42.809684, -6.031129)",51,89,1912,1007,1412,"[332, 513, 1025]",[3769],"[(42.7426, -5.940483), (42.742549, -5.940416),...","[1008.685, 1008.689, 1008.817, 1008.808, 1008....",,5


In [191]:
#Assigning the new dataframe to the routes variable.

routes = test

## Inverting coordinates for MongoDB.

In [192]:
#Defining our function.

def route_converter(df):
    
    """
    Input : dataframe of routes.
    
    Output: dataframe with switched coordinates (lat/long -> long/lat) and substitutes all '(' with '['.
    
    """
    
    for i in range(len(df)):
        df['start'].iloc[i] = '[' + str(eval(df['start'].iloc[i])[1]) + ',' + str(eval(df['start'].iloc[i])[0]) + ']'
        df['midpoint'].iloc[i] = '[' + str(eval(df['midpoint'].iloc[i])[1]) + ',' + str(eval(df['midpoint'].iloc[i])[0]) + ']'
        
    return df

In [193]:
#Converting coords.

routes = route_converter(routes)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [195]:
#Since MongoDB cannot use the coords column we will be deleting it.

routes = routes.drop('coords', axis=1)

In [196]:
routes.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,mountain_passes_ids,municipalities_ids,alt,gpx_link,difficulty_score
908,1117,Eulate y Opakua por Valle de Yerri.,,,"[-2.10989,42.77131]","[-2.3135,42.7945]",58,49,1097,516,1027,"[596, 776]",[4888],"[625.032, 623.092, 623.011, 622.007, 618.058, ...",,3
2688,3338,San Pelaio por Gernika-Lumo y Bakio.,,,"[-2.683899,43.304251]","[-2.799765,43.431816]",56,58,943,2,312,[722],"[7544, 7510]","[17.314, 16.954, 15.309, 15.108, 15.494, 15.81...",,3
1972,2447,Sagüera De Luna por Soto y Amío.,,,"[-5.740471,42.783061]","[-5.841873,42.777686]",53,122,1640,926,1251,[907],[3769],"[1075.452, 1075.453, 1075.608, 1075.608, 1076....",,5
6763,8404,Coll De Jouet por Berga y Sant Llorenç de Moru...,,,"[1.854248,42.108257]","[1.561878,42.0344]",51,90,1749,577,1268,[808],"[887, 3994]","[794.018, 793.098, 791.017, 788.008, 787.042, ...",,5
1974,2449,Curueña y Andarraso por Soto y Amío.,,,"[-5.940483,42.7426]","[-6.031129,42.809684]",51,89,1912,1007,1412,"[332, 513, 1025]",[3769],"[1008.685, 1008.689, 1008.817, 1008.808, 1008....",,5


In [197]:
routes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 476 entries, 908 to 7926
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   476 non-null    int64  
 1   name                 476 non-null    object 
 2   ccaa                 0 non-null      float64
 3   province             0 non-null      float64
 4   start                476 non-null    object 
 5   midpoint             476 non-null    object 
 6   trailrank            476 non-null    int64  
 7   distance             476 non-null    int64  
 8   gradient             476 non-null    int64  
 9   min_alt              476 non-null    int64  
 10  max_alt              476 non-null    int64  
 11  mountain_passes_ids  476 non-null    object 
 12  municipalities_ids   476 non-null    object 
 13  alt                  476 non-null    object 
 14  gpx_link             0 non-null      object 
 15  difficulty_score     476 non-null    

In [6]:
#Finally saving the dataframe.

routes_new.to_csv('routes_2807_476.csv', index=False)

NameError: name 'routes_new' is not defined