# Final dataframe creation

After all  transformations have been made we want to have just 3 dataframes that contain all useful information:

**1. Routes:** the best cycling routes to climb each mountain pass.

**2. Towns:** all towns with 3 or more routes that pass near them.

**3. Ports:** all geolocated mountain ports.

## Routes: general data cleaning

In [152]:
import pandas as pd
import time
import haversine as hs

In [157]:
routes = pd.read_csv('routes_1607_819.csv')

In [158]:
routes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 819 entries, 0 to 818
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   819 non-null    int64  
 1   name                 819 non-null    object 
 2   ccaa                 0 non-null      float64
 3   province             0 non-null      float64
 4   start                819 non-null    object 
 5   midpoint             819 non-null    object 
 6   trailrank            819 non-null    int64  
 7   distance             819 non-null    int64  
 8   gradient             819 non-null    int64  
 9   min_alt              819 non-null    int64  
 10  max_alt              819 non-null    int64  
 11  municipality         0 non-null      float64
 12  mountain_passes_ids  819 non-null    object 
 13  municipalities_ids   0 non-null      float64
 14  coords               819 non-null    object 
 15  alt                  819 non-null    obj

In [159]:
#Adding _id and municipalities_ids columns:

routes['municipality'] = None

In [160]:
routes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 819 entries, 0 to 818
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   819 non-null    int64  
 1   name                 819 non-null    object 
 2   ccaa                 0 non-null      float64
 3   province             0 non-null      float64
 4   start                819 non-null    object 
 5   midpoint             819 non-null    object 
 6   trailrank            819 non-null    int64  
 7   distance             819 non-null    int64  
 8   gradient             819 non-null    int64  
 9   min_alt              819 non-null    int64  
 10  max_alt              819 non-null    int64  
 11  municipality         0 non-null      object 
 12  mountain_passes_ids  819 non-null    object 
 13  municipalities_ids   0 non-null      float64
 14  coords               819 non-null    object 
 15  alt                  819 non-null    obj

In [161]:
#Renaming the columns.

routes.rename(columns = {'nombre': 'name', 'provincia': 'province', 'distancia': 'distance', 'desnivel': 'gradient', 'municipios': 'municipalities_ids', 'puertos': 'mountain_passes_ids'}, inplace = True)

In [162]:
#Re-ordering the columns.

routes = routes[['ID', 'name', 'ccaa', 'province', 'start', 'midpoint', 'trailrank', 'distance', 'gradient', 'min_alt', 'max_alt', 'municipality', 'mountain_passes_ids', 'municipalities_ids', 'coords', 'alt']]

In [163]:
#Dropping coords and alt since they can't be loaded into MongoDB.

routes.drop(['coords', 'alt'], axis=1, inplace=True)

In [164]:
routes.head()

Unnamed: 0,ID,name,ccaa,province,start,midpoint,trailrank,distance,gradient,min_alt,max_alt,municipality,mountain_passes_ids,municipalities_ids
0,923,"ANGLIRU, CIRCULAR DESDE LA PLAZA, TEVERGA",,,"(43.158859, -6.101982)","(43.235847, -5.939921)",67,124,3476,101,1566,,[0],
1,5611,"Pola de Lena, Cobertoria, Gamoniteiro, Tenebre...",,,"(43.155729, -5.8297)","(43.288199, -5.929957)",51,118,4234,102,1700,,"[0, 1, 84, 131]",
2,5490,PEÑA ESCRITA (POR ALMUÑECAR),,,"(36.734975, -3.743127)","(36.818439, -3.762692)",42,45,1481,6,1191,,[2],
3,881,Ancares-Pandozarco,,,"(42.852246, -7.157974)","(42.889535, -6.844199)",55,130,2861,289,1651,,"[3, 182, 1109]",
4,5618,POLA DE LENA - PUERTO DE PAJARES - CUITU NEGRU...,,,"(43.128166, -5.806177)","(43.083221, -5.829091)",42,121,2917,344,1824,,"[4, 51, 69, 438]",


## Routes: converting coordinates for MongoDB

**MongoDB** uses a different formatting for latitude/longitude, so we will have to modify our dataset accordingly.

In [166]:
#Defining our function.

def route_converter(df):
    
    """
    Input : dataframe of routes.
    
    Output: dataframe with switched coordinates (lat/long -> long/lat) and substitutes all '(' with '['.
    
    """
    
    for i in range(len(df)):
        df['start'].iloc[i] = '[' + str(eval(df['start'].iloc[i])[1]) + ',' + str(eval(df['start'].iloc[i])[0]) + ']'
        df['midpoint'].iloc[i] = '[' + str(eval(df['midpoint'].iloc[i])[1]) + ',' + str(eval(df['midpoint'].iloc[i])[0]) + ']'
        
    return df

In [167]:
#Converting coords.

routes = route_converter(routes)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [169]:
#Let's save this dataframe.

routes.to_csv('routes_1807_819i.csv', index=False)

In [67]:
#We also need this dataframe with its coordinates inverted to use it on MongoDB. For this purpose we will be using a function:

def converter(df):
    for i in range(len(df)):
        df['start'].iloc[i] = '[' + str(eval(df['start'].iloc[i])[1]) + ',' + str(eval(df['start'].iloc[i])[0]) + ']'
        df['midpoint'].iloc[i] = '[' + str(eval(df['midpoint'].iloc[i])[1]) + ',' + str(eval(df['midpoint'].iloc[i])[0]) + ']'


    for i in range(len(df)):
        newc = eval(df['coords'].iloc[i])
        c_list = []
        for n in newc:
            c_list.append([n[1],n[0]])
        df['coords'].iloc[i] = c_list
        
    return df

In [68]:
routes = converter(routes)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [71]:
#Finally, exporting our dataframe ready to use on MongoDB.

routes.to_csv('routes_1607_819_DB.csv', index=False)

## Routes: matching routes with nearby towns

Now that we have all 819 routes it's time to find which towns are near each one. Once this is done we can finally find which towns have 3 or more routes near them, thus making them ideal cycling destinations.

In [46]:
#Let's import our municipalities dataframe.

towns = pd.read_csv('compartir/municipios y rutas.csv')

In [48]:
#This function takes two dataframes (towns and routes) and looks for towns 1.5Km or less from any point of our routes.

def nearby_routes(routes, towns):
    dict_list = []
    for g in range(len(routes)):
        new_c = eval(routes['coords'].iloc[g])
        route_list = []
        for i in new_c[0::60]:
            for n in range(len(towns)):
                try:
                    if hs.haversine((i), eval(towns['coords'].iloc[n])) - towns['radius'].iloc[n] < 1.5:
                        if towns['ID'].iloc[n] not in route_list:
                            route_list.append(towns['ID'].iloc[n])
                        else:
                            pass
                except:
                    pass
        dict_routes = {'route': routes['ID'].iloc[g], 'town': route_list}
        dict_list.append(dict_routes)
    return dict_list

In [50]:
#Using the function on our dataframes. This block of code performs all necessary transformations on it.

start = time.time()

test1 = pd.DataFrame(nearby_routes(routes, towns))

df_exploded = test1.explode('town')

town_list = towns['ID'].tolist()

#Using a simple loop to generate a dictionary of nearby routes for every town, and adding that dictionary to a list.

dict_list = []

for i in town_list:
    try: 
        lista_rutas = []
        for n in range(len(df_exploded)):
            if df_exploded['town'].iloc[n] == i:
                lista_rutas.append(df_exploded['route'].iloc[n])
            else:
                pass
        dict_ruta = {'municipio': i, 'rutas': lista_rutas, 'numero_rutas': len(lista_rutas)}
        dict_list.append(dict_ruta)
    except:
        pass
    
#Creating a dataframe out of the dict list.

df_dict = pd.DataFrame.from_dict(dict_list)


final_list  = df_dict['rutas'].tolist()
num_routes = df_dict['numero_rutas'].tolist()
towns['rutas'] = final_list
towns['num_rutas'] = num_routes


stop = time.time() 
duration = (stop - start) / 60
print('Minutes:', duration)

towns.to_csv('towns_1707.csv', index=False)

towns.head(50)

Minutes: 431.01491024096805


Unnamed: 0,ID,municipio,provincia,poblacion,superficie,altitud,radius,coords,num_rutas,rutas
0,0,Alegría-Dulantzi,Álava,2935,19.945872,568.0,2.519713,"(42.83981158,-2.51243731)",3,"[3305, 1026, 8755]"
1,1,Amurrio,Álava,10264,96.4953,219.0,5.542142,"(43.05427776,-3.00007326)",15,"[5904, 1525, 3251, 9685, 877, 4981, 5599, 4489..."
2,2,Aramaio,Álava,1442,73.2584,333.0,4.828956,"(43.05119653,-2.56540037)",18,"[7133, 3147, 3744, 311, 334, 1026, 93, 3307, 1..."
3,3,Artziniega,Álava,1800,27.2873,210.0,2.947168,"(43.12084358,-3.12791718)",11,"[3251, 9685, 5491, 6940, 7995, 4489, 1280, 624..."
4,4,Armiñón,Álava,223,12.9727,467.0,2.032075,"(42.72326199,-2.87183475)",1,[7522]
5,5,Arratzua-Ubarrundia,Álava,984,57.5598,528.0,4.280398,"(42.89116171,-2.63878816)",19,"[5752, 3147, 1854, 311, 3305, 8057, 1026, 93, ..."
6,6,Asparrena,Álava,1623,65.3364,602.0,4.560392,"(42.88968768,-2.31670754)",4,"[765, 9557, 8502, 8047]"
7,7,Ayala,Álava,2968,140.7415,306.0,6.693228,"(43.07768283,-3.04306746)",20,"[5752, 3290, 5904, 3251, 9685, 877, 4981, 5599..."
8,8,Baños de Ebro,Álava,294,9.5039,421.0,1.739304,"(42.52925602,-2.67863583)",1,[2513]
9,9,Barrundia,Álava,881,97.4234,575.0,5.56873,"(42.91537538,-2.49407452)",6,"[3305, 9557, 8502, 3200, 3411, 8047]"


# Towns: general data cleaning

Cleaning and renaming columns on our *towns* dataset. We will also need to convert it for its use with **MongoDB**, thankfully we have a function for that.

In [117]:
towns = pd.read_csv('towns_1707.csv')

In [118]:
towns.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8156 entries, 0 to 8155
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   ID          8156 non-null   int64  
 1   municipio   8156 non-null   object 
 2   provincia   8156 non-null   object 
 3   poblacion   8156 non-null   int64  
 4   superficie  8156 non-null   float64
 5   altitud     8156 non-null   float64
 6   radius      8156 non-null   float64
 7   coords      8156 non-null   object 
 8   num_rutas   8156 non-null   int64  
 9   rutas       8156 non-null   object 
dtypes: float64(3), int64(3), object(4)
memory usage: 637.3+ KB


In [119]:
#Renaming the columns.

towns.rename(columns = {'provincia': 'province', 'municipio': 'municipality', 'poblacion': 'municipality_inhabitants', 'superficie': 'geographic_area', 'altitud': 'alt', 'num_rutas': 'routes_number', 'rutas': 'routes_ids'}, inplace = True)

In [120]:
#Adding a column to store all mountain passes that can be accessed from each town.

towns['mountain_passes_ids'] = None

In [121]:
#Adding the ccaa column, it will be populated in a few minutes.

towns['ccaa'] = None

In [122]:
#Reordering the columns.

towns = towns[['ID', 'municipality', 'ccaa', 'province', 'municipality_inhabitants', 'geographic_area', 'radius', 'routes_number', 'routes_ids', 'mountain_passes_ids', 'coords']]

In [123]:
towns.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8156 entries, 0 to 8155
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        8156 non-null   int64  
 1   municipality              8156 non-null   object 
 2   ccaa                      0 non-null      object 
 3   province                  8156 non-null   object 
 4   municipality_inhabitants  8156 non-null   int64  
 5   geographic_area           8156 non-null   float64
 6   radius                    8156 non-null   float64
 7   routes_number             8156 non-null   int64  
 8   routes_ids                8156 non-null   object 
 9   mountain_passes_ids       0 non-null      object 
 10  coords                    8156 non-null   object 
dtypes: float64(2), int64(3), object(6)
memory usage: 701.0+ KB


## Towns: adding 'ccaa' to each town

My first approach was to use **Geopy** to obtain the **CCAA** of every town, but the process encountered many errors and the library wasn't fully reliable for this purpose. I instead chose to use a simple series of loops to assign the correct **CCAA** to each province. This process, while labor-intensive at first, allows me to re-use this loop on any given dataframe simply by changing some column names.

In [124]:
#Adding the CCAA is easily done by using a simple loop.

for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Álava', 'Bizkaia', 'Gipuzkoa']:
        towns['ccaa'].iloc[i] = 'País Vasco'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Almería', 'Cádiz', 'Córdoba', 'Granada', 'Huelva', 'Jaén', 'Málaga', 'Sevilla']:
        towns['ccaa'].iloc[i] = 'Andalucía'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Huesca', 'Teruel', 'Zaragoza']:
        towns['ccaa'].iloc[i] = 'Aragón'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Asturias']:
        towns['ccaa'].iloc[i] = 'Asturias'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Illes Balears']:
        towns['ccaa'].iloc[i] = 'Illes Balears'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Las Palmas', 'Santa Cruz de Tenerife']:
        towns['ccaa'].iloc[i] = 'Canarias'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Cantabria']:
        towns['ccaa'].iloc[i] = 'Cantabria'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Albacete', 'Ciudad Real', 'Cuenca', 'Guadalajara', 'Toledo']:
        towns['ccaa'].iloc[i] = 'Castilla-La Mancha'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Barcelona', 'Girona', 'Lleida', 'Tarragona']:
        towns['ccaa'].iloc[i] = 'Cataluña'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Badajoz', 'Cáceres']:
        towns['ccaa'].iloc[i] = 'Extremadura'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['A Coruña', 'Luga', 'Ourense', 'Pontevedra']:
        towns['ccaa'].iloc[i] = 'Galicia'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Alicante', 'Castellón', 'Valencia']:
        towns['ccaa'].iloc[i] = 'Comunitat Valenciana'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Madrid']:
        towns['ccaa'].iloc[i] = 'Comunidad de Madrid'
    else:
        pass
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Murcia']:
        towns['ccaa'].iloc[i] = 'Región de Murcia'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['Navarra']:
        towns['ccaa'].iloc[i] = 'Navarra'
    else:
        pass
    
for i in range(len(towns)):
    if towns['province'].iloc[i] in ['La Rioja']:
        towns['ccaa'].iloc[i] = 'La Rioja'
    else:
        pass

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [125]:
#Checking the values. Everything seems to be in order.

towns['ccaa'].value_counts()

Cataluña                948
Castilla-La Mancha      922
Andalucía               788
Aragón                  734
Comunitat Valenciana    544
Extremadura             388
Navarra                 273
País Vasco              251
Galicia                 249
Comunidad de Madrid     180
La Rioja                175
Cantabria               103
Canarias                 89
Asturias                 79
Illes Balears            67
Región de Murcia         45
Name: ccaa, dtype: int64

## Towns: keeping towns with 3 or more routes

In [126]:
# Finally, let's filter out towns with less than 3 routes.

towns = towns[towns['routes_number'] >= 3]

In [127]:
towns.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1386 entries, 0 to 7778
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        1386 non-null   int64  
 1   municipality              1386 non-null   object 
 2   ccaa                      1203 non-null   object 
 3   province                  1386 non-null   object 
 4   municipality_inhabitants  1386 non-null   int64  
 5   geographic_area           1386 non-null   float64
 6   radius                    1386 non-null   float64
 7   routes_number             1386 non-null   int64  
 8   routes_ids                1386 non-null   object 
 9   mountain_passes_ids       0 non-null      object 
 10  coords                    1386 non-null   object 
dtypes: float64(2), int64(3), object(6)
memory usage: 129.9+ KB


## Towns:  filtering out towns that share the same routes

In the event of several towns sharing the same routes we want to keep the bigger one, since it will have the most services and accomodations. To achieve this we will have to search for duplicates on the *routes_ids* column, but since they are stored in non-sorted lists we will have to order them first.

In [128]:
#Creating a loop that evaluates each route_id string as a list and orders it.

for i in range(len(towns)):
    ordered = eval(towns['routes_ids'].iloc[i])
    ordered.sort()#Evaluating the string as a list and sorting it.
    towns['routes_ids'].iloc[i] = str(ordered) #Replacing the non-ordered list with the new one (as a string).

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [129]:
#All route ids have been neatly ordered.

towns.head(1)

Unnamed: 0,ID,municipality,ccaa,province,municipality_inhabitants,geographic_area,radius,routes_number,routes_ids,mountain_passes_ids,coords
0,0,Alegría-Dulantzi,País Vasco,Álava,2935,19.945872,2.519713,3,"[1026, 3305, 8755]",,"(42.83981158,-2.51243731)"


In [130]:
#Now we can simply call drop_duplicates on that column to get rid of duplicate routes, but before that we will order our
#dataframe by town size so that we actually keep the bigger town.

towns = towns.sort_values('municipality_inhabitants', ascending=False)
towns = towns.drop_duplicates('routes_ids', keep='first')

In [131]:
#We're finally left with 841 towns.

towns.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 841 entries, 884 to 4171
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        841 non-null    int64  
 1   municipality              841 non-null    object 
 2   ccaa                      732 non-null    object 
 3   province                  841 non-null    object 
 4   municipality_inhabitants  841 non-null    int64  
 5   geographic_area           841 non-null    float64
 6   radius                    841 non-null    float64
 7   routes_number             841 non-null    int64  
 8   routes_ids                841 non-null    object 
 9   mountain_passes_ids       0 non-null      object 
 10  coords                    841 non-null    object 
dtypes: float64(2), int64(3), object(6)
memory usage: 78.8+ KB


In [133]:
#Let's save this dataset before re-arranging its coordinates. The final 'n' marks it as non-inverted.

towns.to_csv('towns_1807_841n.csv', index=False)

## Towns: converting coordinates

To use our datasets in conjunction with **MongoDB** some preparations must be made, mainly substituting all **( )** with **[ ]** and inverting all coordinates (lat/long to long/lat). To make the process easier I've defined a simple function that takes care of it.

In [134]:
#Defining our converter function.

def town_converter(df):
    
    """
    Input : dataframe of towns.
    
    Output: dataframe with switched coordinates (lat/long -> long/lat) and substitutes all '(' with '['.
    
    """
    
    for i in range(len(df)):
        df['coords'].iloc[i] = '[' + str(eval(df['coords'].iloc[i])[1]) + ',' + str(eval(df['coords'].iloc[i])[0]) + ']'
        
    return df

In [135]:
#Converting our dataframe.

converted = town_converter(towns)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [136]:
converted.head()

Unnamed: 0,ID,municipality,ccaa,province,municipality_inhabitants,geographic_area,radius,routes_number,routes_ids,mountain_passes_ids,coords
884,884,Barcelona,Cataluña,Barcelona,1664182,100.7644,5.663411,4,"[1292, 1732, 6228, 8149]",,"[2.17634927,41.38424664]"
7257,7257,València,Comunitat Valenciana,Valencia,800215,139.2687,6.658115,9,"[528, 1469, 1472, 2478, 5225, 7040, 7231, 7734...",,"[-0.37565717,39.47534441]"
4547,4547,Málaga,Andalucía,Málaga,578460,395.7069,11.223062,10,"[2933, 4541, 4546, 5035, 5997, 5998, 8379, 841...",,"[-4.41997511,36.72034267]"
4613,4613,Murcia,Región de Murcia,Murcia,459403,885.1149,16.785117,7,"[691, 1691, 6089, 6769, 7049, 8099, 8196]",,"[-1.1285408,37.98436361]"
7518,7518,Bilbao,País Vasco,Bizkaia,350184,41.3426,3.627634,42,"[32, 181, 182, 1070, 1280, 1282, 1504, 1572, 1...",,"[-2.92390606,43.25721957]"


In [138]:
#Exporting our dataframe. The final 'i' marks it as an inverted dataframe, ready for MongoDB integration.

converted.to_csv('towns_1807_841i.csv', index=False)

# Ports: general data cleaning

Since this dataframe has had a lot of work behind it, it's already quite tidy and just needs a few tweaks. Column names will be translated into english.

In [139]:
ports = pd.read_csv('compartir/puertos.csv')

In [141]:
ports.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1123 entries, 0 to 1122
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   ID           1123 non-null   int64  
 1   puerto       1123 non-null   object 
 2   provincia    1123 non-null   object 
 3   pueblo       1123 non-null   object 
 4   altitud      1123 non-null   int64  
 5   desnivel     1123 non-null   int64  
 6   distancia    1123 non-null   float64
 7   pendiente    1123 non-null   float64
 8   coeficiente  1123 non-null   int64  
 9   url          1123 non-null   object 
 10  coords       1123 non-null   object 
dtypes: float64(2), int64(4), object(5)
memory usage: 96.6+ KB


In [148]:
#Renaming the columns and adding a new one.

ports['photo'] = None
ports = ports.rename(columns = {'puerto': 'name', 'provincia': 'province', 'pueblo': 'municipality', 'altitud': 'altitude', 'desnivel': 'gradient', 'distancia': 'distance', 'pendiente': 'mountain_slope', 'coeficiente': 'technical_difficulty', 'coords': 'peak_coords'})

In [144]:
#Saving our dataframe:

ports.to_csv('compartir/puertos.csv', index=False)

In [149]:
#Before we save it for use on MongoDB we must perform the customary cleanup.

for i in range(len(ports)):
    ports['peak_coords'].iloc[i] = '[' + str(eval(ports['peak_coords'].iloc[i])[1]) + ',' + str(eval(ports['peak_coords'].iloc[i])[0]) + ']'

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_block(indexer, value, name)


In [151]:
#Exporting our inverted dataframe.

ports.to_csv('compartir/puertos_i.csv', index=False)

# Conclusion and notes

Now our 3 dataframes are clean and ready for integration. Some additional features will be added at a later date, but the general format and layout will probably remain unchanged. 

This processing and use of functions will need to be structured into a pipeline for future data intake.