# Around the World in 80 Days

This project uses a dataset of the world's major cities, with their geographic coordinates and other information, to compute the minimum time needed to travel between two cities, the best route, the most tourist-oriented route, and other features.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from geopy import distance as geopy
import sklearn.metrics.pairwise as geo
import csv
import time

In [2]:
data_file = './data/worldcities_ascii.csv'
with open(data_file, 'r') as f:
    reader = csv.reader(f)
    all_lines = []
    for row in reader:
        all_lines.append(row)

## Preprocessing

I preprocess the data by removing stray commas, correcting a few names, and adding the fields I need.

>TODO: Add fields such as sights to visit, average price, etc.

In [3]:
newLines = []
for line in all_lines:
    newLine = []
    for field in line:
        field = field.replace("Korea, South", "South Korea")
        field = field.replace("Korea, North", "North Korea")
        field = field.replace("Gambia, The", "The Gambia")
        field = field.replace("Micronesia, Federated States Of", "Federated States Of Micronesia")
        field = field.replace("Bahamas, The", "The Bahamas")
        field = field.replace("Saint Helena, Ascension, And Tristan Da Cunha", "Saint Helena, Ascension and Tristan da Cunha")
        field = field.replace("Islamorada, Village of Islands", "Village of Islands Islamorada")
        field = field.replace("\n", "")
        newLine.append(field)
    newLines.append(newLine)

# Export it in case it is useful in other projects
pd.DataFrame(newLines[1:], columns=newLines[0]).to_csv("./data/worldcities_preprocessed.csv", index=False)
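
The chain of `replace` calls above could also be driven by a lookup table, which keeps the list of corrections in one place. The following is only a minimal sketch: `RENAMES`, `clean_field` and `clean_rows` are hypothetical names that do not appear in the notebook, and only a few of the renames are reproduced.

```python
# Hypothetical helper: a subset of the renames above, driven by a lookup table.
RENAMES = {
    "Korea, South": "South Korea",
    "Korea, North": "North Korea",
    "Gambia, The": "The Gambia",
    "Bahamas, The": "The Bahamas",
}


def clean_field(field: str, renames: dict = RENAMES) -> str:
    """Apply every substring rename, then strip embedded newlines."""
    for old, new in renames.items():
        field = field.replace(old, new)
    return field.replace("\n", "")


def clean_rows(rows):
    """Return a cleaned copy of a list of CSV rows (lists of strings)."""
    return [[clean_field(field) for field in row] for row in rows]


# Small made-up sample, not taken from the dataset.
sample = [["city", "country"], ["Seoul", "Korea, South"], ["Banjul\n", "Gambia, The"]]
cleaned = clean_rows(sample)
print(cleaned)
```

Adding a new correction then only requires a new dictionary entry rather than another line in the loop.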

In [4]:
# There are too many cities to work with at once, so start with a random sample of 1000
lineIndexes = np.random.choice(range(1, len(newLines)), size=1000, replace=False)
df = pd.DataFrame([newLines[i] for i in lineIndexes], columns=newLines[0])
df

Unnamed: 0,city,lat,lng,country,iso3,population,id
0,Washington,41.2982,-91.6928,United States,USA,6594,1840010255
1,Buyende,1.1517,33.155,Uganda,UGA,,1800567800
2,Batocina,44.15,21.0833,Serbia,SRB,,1688299986
3,Rajin,42.3444,130.3844,North Korea,PRK,196954,1408449973
4,Zhuhai,22.2769,113.5678,China,CHN,1562000,1156722242
...,...,...,...,...,...,...,...
995,Vaslui,46.6383,27.7292,Romania,ROU,55407,1642644428
996,Five Forks,34.8069,-82.2271,United States,USA,18004,1840013491
997,Apaxco de Ocampo,19.9733,-99.17,Mexico,MEX,13836,1484505240
998,Quickborn,53.7333,9.8972,Germany,DEU,21296,1276737039


In [5]:
class CityNode:
    
    def __init__(self, cityID: int, cityName: str, lat: float, lng: float, population: int, countryISO3: str):
        self.cityID = cityID
        self.cityName = cityName
        self.lat = lat
        self.lng = lng
        self.coordinates = np.array([lat, lng])
        self.population = population
        self.countryISO3 = countryISO3
        self.neighbour1 = None
        self.neighbour2 = None
        self.neighbour3 = None
    
    def updateNeighbour(self, cityIndex: int, position: int):
        if position == 1:
            self.neighbour1 = cityIndex
        elif position == 2:
            self.neighbour2 = cityIndex
        elif position == 3:
            self.neighbour3 = cityIndex
    
    def isEligible(self, city):
        # A node can accept another neighbour while its third slot is still empty.
        # `city` is not used yet.
        return self.neighbour3 is None
            
    def __repr__(self):
        return f"{self.cityName} at {self.lat}, {self.lng}"

In [6]:
graph = [CityNode(int(c.id), c.city, float(c.lat), float(c.lng), c.population, c.iso3) for _, c in df.iterrows()]

In [7]:
def updateMin(min1, min2, min3, dist: float, index: int):
    """Insert (index, dist) into the running three-nearest list, sorted by distance."""
    candidate = {"index": index, "dist": dist}
    # Keep the filled slots, add the candidate, and re-rank by distance.
    # Empty slots stay None until enough distinct cities have been seen,
    # so the same city is never stored in more than one slot.
    ranked = [m for m in (min1, min2, min3) if m is not None] + [candidate]
    ranked.sort(key=lambda m: m["dist"])
    min1, min2, min3 = (ranked + [None, None])[:3]
    return (min1, min2, min3)
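
The running three-nearest bookkeeping can also be expressed with the standard library's `heapq.nsmallest`, which selects the smallest items from any iterable. This is a sketch under the assumption that the distances are available as `(index, dist)` pairs; `three_nearest` is a hypothetical name, not part of the notebook.

```python
import heapq


def three_nearest(distances):
    """Return up to three (index, dist) pairs with the smallest distances.

    `distances` is any iterable of (index, dist) pairs.
    """
    return heapq.nsmallest(3, distances, key=lambda pair: pair[1])


# Made-up distances for illustration only.
pairs = [(0, 12.5), (1, 3.0), (2, 40.1), (3, 7.2), (4, 3.0)]
nearest = three_nearest(pairs)
print(nearest)
```

If fewer than three pairs are supplied, the function simply returns all of them, so the caller has to handle short results.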

In [9]:
start = time.time()
for i in range(len(graph)):
    min1 = None
    min2 = None
    min3 = None
    for j in range(len(graph)):
        if i == j:
            continue
        #dist = ((graph[i].lat-graph[j].lat)**2+(graph[i].lng-graph[j].lng)**2)**(1/2)
        dist = geopy.distance(graph[i].coordinates, graph[j].coordinates).km
        min1, min2, min3 = updateMin(min1, min2, min3, dist, j)
    graph[i].updateNeighbour(min1["index"], 1)
    graph[i].updateNeighbour(min2["index"], 2)
    graph[i].updateNeighbour(min3["index"], 3)
    if i % 100 == 0:
        print(f"Execution: {i}\t\tTime elapsed (s): {time.time() - start}")

Execution: 0		Time elapsed (s): 0.17949724197387695
Execution: 100		Time elapsed (s): 17.778458833694458
Execution: 200		Time elapsed (s): 35.8293080329895
Execution: 300		Time elapsed (s): 54.00816082954407
Execution: 400		Time elapsed (s): 72.45257639884949


KeyboardInterrupt: 
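
The log above shows roughly 18 seconds per 100 cities, because every pair triggers a separate Python-level geodesic computation. One possible way around this is to compute all pairwise distances at once with NumPy. Note that the sketch below swaps the geodesic distance for the haversine formula, which treats the Earth as a sphere and therefore gives slightly different values. `haversine_matrix`, `three_nearest_indices` and the sample coordinates are assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_matrix(lat_deg, lng_deg):
    """Pairwise great-circle distances in km for arrays of lat/lng in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lng = np.radians(np.asarray(lng_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlng = lng[:, None] - lng[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlng / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def three_nearest_indices(lat_deg, lng_deg):
    """For every city, the indices of its three nearest other cities, closest first."""
    dist = haversine_matrix(lat_deg, lng_deg)
    np.fill_diagonal(dist, np.inf)  # a city is not its own neighbour
    return np.argsort(dist, axis=1)[:, :3]


# Illustrative coordinates: Rome, Milan, Paris, New York, Tokyo.
lats = [41.9028, 45.4642, 48.8566, 40.7128, 35.6762]
lngs = [12.4964, 9.1900, 2.3522, -74.0060, 139.6503]
neighbours = three_nearest_indices(lats, lngs)
print(neighbours[0])  # neighbours of Rome
```

The distance matrix has one entry per pair of cities, so this is comfortable for a sample of 1000 but may need too much memory for the full dataset.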

## How to proceed

1. Decide the size of the squares used to partition the world
2. Assign each city to a square
3. For each city in each square, find the 3 nearest cities by searching the city's own square and the 8 squares surrounding it
4. If these 9 squares do not contain at least 4 cities, also search the 16 squares of the next outer ring
5. Choose the best graph representation and compute the edge weights
6. Choose a graph traversal algorithm
7. Implement methods for various kinds of queries
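
The first four steps of this plan can be sketched as follows. Everything here is an assumption made for illustration: the 5-degree cell size, the helper names, and the use of the haversine formula in place of the geodesic distance. The search is also approximate, since a city just outside the searched block can be closer than one found inside it, and cells measured in degrees get narrower towards the poles.

```python
import math
from collections import defaultdict

CELL_DEG = 5.0  # assumed cell size in degrees; tune for the data density


def cell_of(lat, lng, cell_deg=CELL_DEG):
    """Map a coordinate to integer (row, col) grid indices; columns wrap at 180 degrees."""
    n_cols = math.ceil(360.0 / cell_deg)
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lng + 180.0) / cell_deg) % n_cols
    return (row, col)


def build_grid(cities, cell_deg=CELL_DEG):
    """cities: list of (lat, lng) tuples. Returns {cell: [city indices]}."""
    grid = defaultdict(list)
    for idx, (lat, lng) in enumerate(cities):
        grid[cell_of(lat, lng, cell_deg)].append(idx)
    return grid


def candidates(grid, cell, ring, cell_deg=CELL_DEG):
    """Indices of all cities within `ring` cells of `cell` (ring=1 is the 3x3 block)."""
    n_cols = math.ceil(360.0 / cell_deg)
    row, col = cell
    found = []
    for dr in range(-ring, ring + 1):
        for dc in range(-ring, ring + 1):
            # Columns wrap around the antimeridian; rows are not wrapped at the poles.
            found.extend(grid.get((row + dr, (col + dc) % n_cols), []))
    return found


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(math.sqrt(min(1.0, h)))


def three_nearest(cities, grid, i, cell_deg=CELL_DEG):
    """Search the 3x3 block first, then the 5x5 block if it holds fewer than 3 other cities."""
    cell = cell_of(cities[i][0], cities[i][1], cell_deg)
    found = []
    for ring in (1, 2):
        found = [j for j in candidates(grid, cell, ring, cell_deg) if j != i]
        if len(found) >= 3:
            break
    found.sort(key=lambda j: haversine_km(cities[i], cities[j]))
    return found[:3]


# Illustrative coordinates: Rome, Milan, Naples, Florence, Tokyo.
cities = [(41.9028, 12.4964), (45.4642, 9.1900), (40.8518, 14.2681),
          (43.7696, 11.2558), (35.6762, 139.6503)]
grid = build_grid(cities)
print(three_nearest(cities, grid, 0))  # neighbours of Rome
```

A city with no other cities within two rings, such as Tokyo in this sample, gets an empty result, so a real implementation would need a fallback, for example widening the search further.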