# Filtering the destinations

Right now we have 841 destinations with 3 or more routes that climb unique mountain passes, but thanks to Spain's wonderfully uneven orography, many of those destinations are very densely clustered.

We can address this with a filter of our own making that discards secondary locations lying less than a given distance from another, bigger destination.
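Deciding whether two destinations are closer than a given distance requires the great-circle distance between their coordinates. The notebook relies on the `haversine` package (imported as `hs`) for this; the sketch below spells out the haversine formula in plain Python so the check is transparent. It assumes a spherical Earth with a mean radius of 6371.0088 km, and the helper name `haversine_km` is ours, for illustration only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)

def haversine_km(p1, p2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = radians(p1[0]), radians(p1[1])
    lat2, lon2 = radians(p2[0]), radians(p2[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine of the central angle between the two points.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

madrid = (40.4167, -3.70325)
barcelona = (41.38424664, 2.17634927)
print(round(haversine_km(madrid, barcelona)))  # Madrid-Barcelona distance in km
```

Two towns would then count as "too close" when `haversine_km(a, b) < distance`, which is the comparison the filter below performs.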

First of all let's visually inspect the clusters using **Folium**.

# Mapping the destinations

In [1]:
```python
import folium
import pandas as pd
import haversine as hs
```

In [2]:
```python
#Importing our dataframe, which contains the 841 destinations.

municipalities = pd.read_csv('towns_1807_841n.csv')
```
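The `coords` column comes back from the CSV as text such as `"(41.38424664,2.17634927)"`, which is why the notebook calls `eval` on it. A safer option is the standard library's `ast.literal_eval`, which only accepts Python literals (tuples, numbers, strings and so on) and refuses arbitrary expressions. A minimal sketch, with a hypothetical helper name `parse_coords`:

```python
import ast

def parse_coords(text):
    """Turn a string like '(41.38424664,2.17634927)' into a (lat, lon) tuple of floats."""
    lat, lon = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    return float(lat), float(lon)

print(parse_coords("(41.38424664,2.17634927)"))  # (41.38424664, 2.17634927)
```

Applied to the whole column, this could look like `municipalities['coords'].apply(parse_coords)`.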

In [3]:
```python
map = folium.Map(location=[40.4167, -3.70325], zoom_start=6, tiles="OpenStreetMap") #Initiating our map, centred on Madrid.

for i in range(len(municipalities)): #Adding a marker for each town.
    folium.Marker(eval(municipalities['coords'].iloc[i]), tooltip=municipalities['municipality'].iloc[i]).add_to(map)

map #Displaying the map.
```

In [12]:
```python
#Exporting our map.

map.save(outfile="841 destinations map.html")
```

The destinations are heavily clustered, with a very high density in the north. Let's try to space them out a bit.

# Spacing the destinations

To achieve this, we want a function that:

- Takes our standard-format *towns* dataframe as input.
- Checks every destination's distance from the others and, within the given distance, keeps only the most populous one.
- Displays the result on a map.
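The requirements above boil down to a greedy pass: visit towns from most to least populous and keep a town only if it lies at least `distance` away from every town already kept. A minimal sketch on plain tuples, assuming a hypothetical `(name, inhabitants, coords)` layout and a hypothetical function name `space_out`; the distance function is passed in, so the notebook's `hs.haversine` could be plugged in for real coordinates:

```python
import math

def space_out(towns, distance, dist):
    """Greedy spacing: keep the most populous town in each neighbourhood.

    towns: iterable of (name, inhabitants, coords) tuples (hypothetical layout).
    distance: minimum separation between kept towns, in the units of dist.
    dist: function returning the distance between two coords values.
    """
    kept = []
    # Biggest towns first, so a smaller neighbour can never displace a bigger one.
    for town in sorted(towns, key=lambda t: t[1], reverse=True):
        if all(dist(town[2], other[2]) >= distance for other in kept):
            kept.append(town)
    return kept

# Toy example on a flat plane (math.dist), distances read as km.
towns = [
    ("Small", 900, (0.0, 5.0)),
    ("Big", 50_000, (0.0, 0.0)),
    ("Far", 2_000, (0.0, 40.0)),
]
print([t[0] for t in space_out(towns, 15, math.dist)])  # ['Big', 'Far']
```

Because the sort happens inside the function, this version does not depend on the input already being ordered by population, and it never removes rows from the data it is iterating over.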

In [4]:
```python
#Creating our function. It assumes the dataframe is sorted by population in descending
#order (as ours is), so the town at the later position is normally the smaller one.

def spacer(municipalities, distance):
    df = municipalities.copy().reset_index(drop=True) #Copying our dataframe so index labels match positions.

    for i in range(len(df)): #Iterating through all towns.
        for n in range(len(df)): #Checking the distance from town i to every other town.
            try:
                if n != i and hs.haversine(eval(df['coords'].iloc[i]), eval(df['coords'].iloc[n])) < distance:
                    df = df.drop([n]) #Dropping town n (normally the smaller one, given the sort order).
                    df = df.reset_index(drop=True) #Re-indexing.
            except IndexError: #Towns are dropped as we go, so i or n can run past the end of df.
                pass

    m = folium.Map(location=[40.4167, -3.70325], zoom_start=6, tiles="OpenStreetMap") #Initiating our map.

    for i in range(len(df)):
        folium.Marker(eval(df['coords'].iloc[i]), tooltip=df['municipality'].iloc[i]).add_to(m) #Adding each town.

    return df #Returning the filtered dataframe (the map m is built but not returned).
```
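One caveat about the loop above: it drops rows from `df` while walking through positions. After `df.drop([n])` and `reset_index`, the next town slides into position `n`, which the loop has already passed, so that town is not compared against town `i` in that pass. When a later iteration reaches the survivor, it can then drop a bigger town instead. The sketch below mimics the loop's logic on a plain Python list, using hypothetical towns A, B and C on a 1-D line, ordered biggest first and all within 15 km of each other:

```python
def drop_while_iterating(towns, distance):
    """Mimic the spacer loop: towns is a list of (name, position_km), biggest first."""
    towns = list(towns)
    for i in range(len(towns)):  # outer bound fixed once, as in the original loop
        for n in range(len(towns)):  # inner bound re-read for every i
            try:
                if i != n and abs(towns[i][1] - towns[n][1]) < distance:
                    del towns[n]  # same effect as df.drop([n]) followed by reset_index
            except IndexError:  # i or n is past the end of the shrunken list
                pass
    return [name for name, _ in towns]

print(drop_while_iterating([("A", 0), ("B", 5), ("C", 10)], 15))  # ['C']
```

Here the smallest town C survives while the biggest, A, is dropped, whereas the biggest-first rule would keep only A. On dense clusters the notebook's output can therefore depend on iteration order rather than on population alone.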

In [5]:
```python
#Let's run our function with a distance of 15 km.

spacer(municipalities, 15)
```

| | ID | municipality | ccaa | province | municipality_inhabitants | geographic_area | radius | routes_number | routes_ids | mountain_passes_ids | coords |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 884 | Barcelona | Cataluña | Barcelona | 1664182 | 100.764400 | 5.663411 | 4 | [1292, 1732, 6228, 8149] | | (41.38424664,2.17634927) |
| 1 | 7257 | València | Comunitat Valenciana | Valencia | 800215 | 139.268700 | 6.658115 | 9 | [528, 1469, 1472, 2478, 5225, 7040, 7231, 7734... | | (39.47534441,-0.37565717) |
| 2 | 4547 | Málaga | Andalucía | Málaga | 578460 | 395.706900 | 11.223062 | 10 | [2933, 4541, 4546, 5035, 5997, 5998, 8379, 841... | | (36.72034267,-4.41997511) |
| 3 | 4613 | Murcia | Región de Murcia | Murcia | 459403 | 885.114900 | 16.785117 | 7 | [691, 1691, 6089, 6769, 7049, 8099, 8196] | | (37.98436361,-1.1285408) |
| 4 | 151 | Alicante | Comunitat Valenciana | Alicante | 337482 | 201.265845 | 8.004046 | 9 | [621, 629, 647, 680, 1469, 3189, 8145, 9500, 9... | | (38.34548705,-0.4831832) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 247 | 1357 | Neila | | Burgos | 153 | 68.589516 | 4.672544 | 3 | [4032, 4034, 8418] | | (42.06026327,-2.99647104) |
| 248 | 6299 | Montejo de Tiermes | | Soria | 151 | 167.269427 | 7.296807 | 4 | [7331, 8364, 8450, 8677] | | (41.36854561,-3.19973878) |
| 249 | 2914 | Cantalojas | Castilla-La Mancha | Guadalajara | 146 | 157.101383 | 7.071549 | 3 | [7331, 8364, 8677] | | (41.23578603,-3.24439345) |
| 250 | 1839 | Benafigos | Comunitat Valenciana | Castellón | 138 | 35.599592 | 3.366255 | 3 | [527, 528, 8492] | | (40.27646952,-0.20921861) |

We can see that the towns are much more spaced out now, and we went from 841 destinations to 252. However, some areas still have many towns close together, and many of these towns are so small that they are unlikely to offer good services.
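To check how crowded the remaining towns still are, one option is to look for the closest remaining pair: if spacing worked as intended, no pair should be closer than the chosen distance. A brute-force sketch (hypothetical helper `closest_pair`, O(n²) distance computations, which is fine for a few hundred towns), again taking the distance function as a parameter:

```python
import math
from itertools import combinations

def closest_pair(points, dist):
    """Return (distance, name_a, name_b) for the closest pair in a {name: coords} dict."""
    return min(
        (dist(points[a], points[b]), a, b)
        for a, b in combinations(points, 2)
    )

toy = {"A": (0.0, 0.0), "B": (0.0, 20.0), "C": (3.0, 24.0)}
print(closest_pair(toy, math.dist))  # (5.0, 'B', 'C')
```

With the notebook's data, `points` could be built from the `municipality` and parsed `coords` columns (assuming names are unique) and `hs.haversine` used as `dist`; a result below 15 would flag a pair that the spacing missed.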

Before we settle on a final list of destinations, let's see what happens if we discard all small towns (fewer than 1,000 inhabitants).

In [7]:
```python
#Creating a new dataframe that meets the criteria.

df_big = municipalities[municipalities['municipality_inhabitants'] >= 1000]
```

In [8]:
```python
#We're down to 523 destinations.

df_big.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 523 entries, 0 to 522
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   ID                        523 non-null    int64  
 1   municipality              523 non-null    object 
 2   ccaa                      479 non-null    object 
 3   province                  523 non-null    object 
 4   municipality_inhabitants  523 non-null    int64  
 5   geographic_area           523 non-null    float64
 6   radius                    523 non-null    float64
 7   routes_number             523 non-null    int64  
 8   routes_ids                523 non-null    object 
 9   mountain_passes_ids       0 non-null      float64
 10  coords                    523 non-null    object 
dtypes: float64(3), int64(3), object(5)
memory usage: 49.0+ KB
```

In [9]:
```python
#Let's run the function again with our new dataframe.

spacer(df_big, 15)
```

| | ID | municipality | ccaa | province | municipality_inhabitants | geographic_area | radius | routes_number | routes_ids | mountain_passes_ids | coords |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 884 | Barcelona | Cataluña | Barcelona | 1664182 | 100.764400 | 5.663411 | 4 | [1292, 1732, 6228, 8149] | | (41.38424664,2.17634927) |
| 1 | 7257 | València | Comunitat Valenciana | Valencia | 800215 | 139.268700 | 6.658115 | 9 | [528, 1469, 1472, 2478, 5225, 7040, 7231, 7734... | | (39.47534441,-0.37565717) |
| 2 | 4547 | Málaga | Andalucía | Málaga | 578460 | 395.706900 | 11.223062 | 10 | [2933, 4541, 4546, 5035, 5997, 5998, 8379, 841... | | (36.72034267,-4.41997511) |
| 3 | 4613 | Murcia | Región de Murcia | Murcia | 459403 | 885.114900 | 16.785117 | 7 | [691, 1691, 6089, 6769, 7049, 8099, 8196] | | (37.98436361,-1.1285408) |
| 4 | 151 | Alicante | Comunitat Valenciana | Alicante | 337482 | 201.265845 | 8.004046 | 9 | [621, 629, 647, 680, 1469, 3189, 8145, 9500, 9... | | (38.34548705,-0.4831832) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 196 | 7008 | Ademuz | Comunitat Valenciana | Valencia | 1037 | 100.313800 | 5.650734 | 3 | [4195, 7139, 7661] | | (40.06120013,-1.28669047) |
| 197 | 12 | Campezo | País Vasco | Álava | 1034 | 85.360875 | 5.212595 | 5 | [1026, 1116, 2515, 8394, 8755] | | (42.66971209,-2.35207693) |
| 198 | 3867 | La Vall de Boí | Cataluña | Lleida | 1019 | 220.829200 | 8.384030 | 3 | [1313, 1315, 7304] | | (42.50428276,0.80227472) |
| 199 | 6572 | Albarracín | Aragón | Teruel | 1006 | 452.701500 | 12.004125 | 3 | [496, 8497, 8723] | | (40.40668217,-1.44449794) |

We're now down to 201 towns, which is more reasonable. For now, we will stick with this list of destinations.

In [133]:
```python
#Storing the filtered dataframe. For this we slightly modified our function so that it
#returns the dataframe instead of the Folium map.

new_df = spacer(df_big, 15)
```

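Before we save this dataframe, we will create a new column that stores the coordinates in inverted order for use with **MongoDB**: MongoDB's GeoJSON points list longitude before latitude, whereas our `coords` column stores (latitude, longitude).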
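If the goal is to run MongoDB geospatial queries (for example against a `2dsphere` index), the coordinates usually need to live inside a GeoJSON `Point` object rather than a bare bracketed string. A minimal sketch of that conversion, assuming the `(lat,lon)` text format of our `coords` column; the helper name `to_geojson_point` is hypothetical:

```python
import ast
import json

def to_geojson_point(coords_text):
    """Convert '(lat,lon)' text into a GeoJSON Point with [lon, lat] order."""
    lat, lon = ast.literal_eval(coords_text)
    return {"type": "Point", "coordinates": [float(lon), float(lat)]}

point = to_geojson_point("(41.38424664,2.17634927)")
print(json.dumps(point))  # {"type": "Point", "coordinates": [2.17634927, 41.38424664]}
```

In the notebook this could populate a column with `new_df['coords'].apply(to_geojson_point)`; note that writing the dataframe to CSV would still store these objects as text.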

In [137]:
```python
#Creating the new column.

new_df['coords_MDB'] = None
```

In [143]:
```python
#This loop takes the original coordinates, flips them and uses the result to populate the new column. The
#parentheses are also replaced with square brackets (as MongoDB expects). Assigning through .loc, rather than
#chaining new_df['coords_MDB'].iloc[i] = ..., avoids pandas' "setting on a copy of a slice" warning.

for i in range(len(new_df)):
    lat, lon = eval(new_df['coords'].iloc[i])
    new_df.loc[i, 'coords_MDB'] = '[' + str(lon) + ',' + str(lat) + ']'
```

In [145]:
```python
new_df.head(1)
```

| | ID | municipality | ccaa | province | municipality_inhabitants | geographic_area | radius | routes_number | routes_ids | mountain_passes_ids | coords | coords_MDB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 884 | Barcelona | Cataluña | Barcelona | 1664182 | 100.7644 | 5.663411 | 4 | [1292, 1732, 6228, 8149] | | (41.38424664,2.17634927) | [2.17634927,41.38424664] |


In [146]:
```python
#We can finally save our dataframe.

new_df.to_csv('towns_1907_201.csv', index=False)
```