# Visualization of the problem

The travelling salesman problem (TSP) is one of the best-known problems in computer science. As Wikipedia puts it, the problem asks: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?"

It is an NP-hard problem (its decision version is NP-complete), so no method is known that solves it efficiently. Unlike other problems, such as solving linear equations, where a formula returns the solution, there is nothing like that here. This is where heuristics and metaheuristics come in: search algorithms, genetic algorithms, etc. Most of these algorithms do not return the optimal solution, only a good one.

A first idea would be brute force. The nature of the problem suggests exploring all possible paths and keeping the shortest one. This is infeasible because the computational cost is $\mathcal{O}(n!)$: with $n = 1000$ cities, for example, we would have to examine on the order of $1000! \approx 4 \times 10^{2567}$ paths. So we have to discard this approach and look for more selective search algorithms.
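
To get a feel for how large $n!$ is without building the huge integer, its base-10 exponent can be estimated from the log-gamma function, since $\ln n! = \ln \Gamma(n+1)$. The following is a minimal sketch using only the standard library; the helper name `log10_factorial` is made up for this illustration.

```python
import math

def log10_factorial(n):
    """Return log10(n!) using the identity ln(n!) = lgamma(n + 1)."""
    return math.lgamma(n + 1) / math.log(10)

# Base-10 exponent of the number of orderings of 1000 cities
exponent = log10_factorial(1000)
print(f"1000! is about 10^{exponent:.1f}")
```

The exponent comes out between 2567 and 2568, which is far beyond anything that could be enumerated.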

This **Kaggle challenge** proposes a variant of the TSP with an extra constraint: "every 10th step is 10% more lengthy unless coming from a prime CityId". A reasonable way to tackle the challenge is to apply heuristics that have worked well on the TSP while keeping the prime constraint in mind.
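
The penalty rule can be made concrete with a tiny helper. This is only a sketch of one reading of the rule, the same one used by the distance function in this kernel: steps are numbered from 1, and the 10th, 20th, ... step costs 10% more unless the city it starts from has a prime id. The name `step_cost` and the sample numbers are made up for illustration.

```python
def step_cost(distance, step_number, origin_is_prime):
    """Cost of a single step under the prime rule (steps numbered from 1)."""
    is_tenth_step = step_number % 10 == 0
    if is_tenth_step and not origin_is_prime:
        return distance * 1.1  # 10% penalty
    return distance

print(step_cost(100.0, 9, False))   # not a tenth step: no penalty
print(step_cost(100.0, 10, False))  # tenth step from a non-prime city: penalised
print(step_cost(100.0, 10, True))   # tenth step from a prime city: no penalty
```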

**In this kernel**, I introduce the challenge by showing how to load the data and build some visualizations.

In [None]:
import numpy as np
import pandas as pd

Every path has to start at the North Pole and visit every city exactly once so that no present is missed. Once Santa has been to every city, he has to return to the North Pole, where he will wait for next Christmas. He is a magic prime Santa, so if every tenth step starts from a prime city he will be very happy and travel faster. How can we help Santa find the most efficient route?
There are 197768 cities (not counting the North Pole), so, as we said before, brute force is not a reasonable option.

In [None]:
df_cities = pd.read_csv('../input/cities.csv')
df_cities.tail()

**New column:** We will create a column in the dataset indicating whether each city is prime or not.

In [None]:
from sympy import isprime
df_cities.loc[:,'prime'] = df_cities.loc[:,'CityId'].apply(isprime)
df_cities.head()
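
Calling `isprime` once per row works, but when the ids are consecutive integers starting at 0, all the flags can also be computed in one pass with a sieve of Eratosthenes. The following is a sketch of that alternative, not what this kernel uses; `prime_flags` is a hypothetical helper name.

```python
def prime_flags(n):
    """Return a list where flags[i] is True if i is prime, for 0 <= i <= n."""
    flags = [True] * (n + 1)
    flags[0:2] = [False] * min(2, n + 1)  # 0 and 1 are not prime
    i = 2
    while i * i <= n:
        if flags[i]:
            # Cross out every multiple of i, starting at i * i
            for multiple in range(i * i, n + 1, i):
                flags[multiple] = False
        i += 1
    return flags

flags = prime_flags(30)
print([i for i, is_p in enumerate(flags) if is_p])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

With the real data, something like `df_cities['prime'] = prime_flags(df_cities['CityId'].max())` should give the same column, assuming `CityId` runs 0, 1, 2, ... without gaps and in row order.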

## Map of the cities

Surprise: when we **visualize** the cities on the map, we see that they form a reindeer pattern.
Which world is this? Christmasland?
The red dots are the prime cities and the yellow dot is the North Pole. We can see that the prime cities are spread all over the map, so a prime city should usually be within reach when a $10$th step comes up. We therefore expect this constraint to be decisive for good solutions, and we should not ignore it.

In [None]:
import matplotlib.pyplot as plt

is_prime = df_cities['prime'].astype(bool)
primes = df_cities[is_prime]
non_primes = df_cities[~is_prime]
north_pole = df_cities[df_cities['CityId'] == 0]

plt.figure(figsize=(15, 10))
plt.scatter(non_primes.X, non_primes.Y, s=0.1)
plt.scatter(north_pole.X, north_pole.Y, s=200, color='yellow')
plt.scatter(primes.X, primes.Y, s=0.5, color='red')
plt.grid(False)
plt.show()

## How many prime cities?

How many prime cities are there? As one would expect, there are many more non-prime cities than prime ones. Unfortunately, there are also fewer prime cities than tenth steps, so some tenth steps cannot start from a prime city. It is therefore important to use the prime cities on as many tenth steps as possible.

In [None]:
n_primes = df_cities['prime'].sum()  # True counts as 1
print('Tenth steps that cannot start from a prime city (approx.):', (len(df_cities.index) / 10) - n_primes)
print('Ratio of all cities to prime cities:', len(df_cities.index) / n_primes)

In [None]:
plt.title('Number of primes')
df_cities['prime'].value_counts().plot(kind='bar')

### Distance function

**Bonus**: *total_distance* computes the total distance of a given path; this is the objective function that we want to minimize. How long is the trip if we visit the cities in ID order?

In [None]:
# Compute the value of the objective function (total distance)
def pair_distance(x, y):
    """Euclidean distance between the cities with ids x and y."""
    dx = df_cities.X[x] - df_cities.X[y]
    dy = df_cities.Y[x] - df_cities.Y[y]
    return np.sqrt(dx ** 2 + dy ** 2)

def total_distance(path):
    """Total length of a path, including the 10% penalty on tenth steps."""
    total = 0.0
    for step in range(len(path) - 1):
        d = pair_distance(path[step], path[step + 1])
        # Every 10th step (counting from 1) costs 10% more
        # unless it starts from a prime city
        if (step + 1) % 10 == 0 and not df_cities.prime[path[step]]:
            d *= 1.1
        total += d
    return total
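
Evaluating the objective one step at a time is slow for almost 200,000 cities, and a search heuristic will call it many times. The following is a sketch of a vectorized variant under the same reading of the rule; the name `total_distance_vectorized`, its argument layout, and the toy data are assumptions made for this illustration and are not part of the kernel.

```python
import numpy as np

def total_distance_vectorized(path, xy, is_prime):
    """Total length of `path` under the prime rule.

    path     : 1-D integer array of city ids, including the return to the start
    xy       : array of shape (n_cities, 2) with coordinates, indexed by city id
    is_prime : boolean array of length n_cities, indexed by city id
    """
    path = np.asarray(path)
    # Euclidean length of every step path[k] -> path[k + 1]
    deltas = xy[path[1:]] - xy[path[:-1]]
    lengths = np.sqrt((deltas ** 2).sum(axis=1))
    # Steps are numbered from 1; every 10th one is penalised
    # unless its origin city is prime
    step_numbers = np.arange(1, len(path))
    penalised = (step_numbers % 10 == 0) & ~is_prime[path[:-1]]
    return float(np.sum(lengths * np.where(penalised, 1.1, 1.0)))

# Toy data: 12 cities on a line, one unit apart, visited in id order
xy = np.column_stack([np.arange(12.0), np.zeros(12)])
is_prime = np.array([False, False, True, True, False, True,
                     False, True, False, False, False, True])
path = np.append(np.arange(12), 0)
print(total_distance_vectorized(path, xy, is_prime))
```

In the toy example the 10th step starts from city 9, which is not prime, so that unit step counts as 1.1 and the total is about 22.1 instead of 22. With the real data one might pass `df_cities[['X', 'Y']].values` and `df_cities['prime'].values.astype(bool)`, assuming the rows are ordered by `CityId`.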

In [None]:
# Baseline path that follows the ids in order; any solution we come up with has to beat it
path = df_cities['CityId'].values
path = np.append(path, 0)  # return to the North Pole
total_distance(path)