**Hamiltonian Cycles**

In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a Hamiltonian path that is a cycle. Determining whether such paths and cycles exist in graphs is the Hamiltonian path problem, which is NP-complete.
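
To make the definitions concrete, here is a minimal sketch that checks whether a given vertex sequence is a Hamiltonian path or cycle. The function names and the toy graph are illustrative assumptions, not part of this notebook's data; the graph is a dict mapping each vertex to its neighbours.

```python
def is_hamiltonian_path(graph, path):
    """True if `path` visits every vertex of `graph` exactly once along existing edges."""
    if not path or len(path) != len(graph) or set(path) != set(graph):
        return False
    return all(b in graph[a] for a, b in zip(path, path[1:]))


def is_hamiltonian_cycle(graph, path):
    """True if `path` is a Hamiltonian path whose last vertex links back to the first."""
    return is_hamiltonian_path(graph, path) and path[0] in graph[path[-1]]


# Toy undirected graph (illustrative): a square 0-1-2-3 with an extra vertex 4 attached to 3.
toy_graph = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {3}}

print(is_hamiltonian_path(toy_graph, [0, 1, 2, 3, 4]))   # True: every vertex once, all edges exist
print(is_hamiltonian_cycle(toy_graph, [0, 1, 2, 3, 4]))  # False: vertex 4 has no edge back to 0
```

Checking a candidate sequence is cheap; the hard part, and the reason the problem is NP-complete, is finding one.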

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import euclidean_distances

In [None]:
cities = pd.read_csv('../input/cities.csv')
cities.head()

We will try to find Hamiltonian paths and the least distance associated with those paths. As an exercise we take just 5 cities; the submission can use all the cities. To find Hamiltonian cycles we need the graph in the format {0: [10, 23, ...], 1: [34, 45, ...], ...}, which maps each city to the cities reachable from it.
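
As a rough illustration of that graph format, the sketch below builds it for four made-up points using only the standard library. `toy_coords` and `nearest_first_graph` are hypothetical names, and the coordinates do not come from `cities.csv`.

```python
import math

# Hypothetical coordinates keyed by city id (not taken from cities.csv).
toy_coords = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 1.0)}


def nearest_first_graph(coords):
    """Map each city id to the ids of all other cities, nearest first."""
    graph = {}
    for city, xy in coords.items():
        others = [other for other in coords if other != city]
        graph[city] = sorted(others, key=lambda other: math.dist(xy, coords[other]))
    return graph


toy_graph = nearest_first_graph(toy_coords)
print(toy_graph)
# {0: [3, 1, 2], 1: [0, 3, 2], 2: [1, 3, 0], 3: [0, 1, 2]}
```

Because every city lists every other city, the resulting graph is complete; the nearest-first ordering only affects the order in which paths are explored.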

In [None]:
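dict_path = {}
# Despite the "50" in these names, only the first 5 cities are used here.
cities_50_algo = cities['CityId'].unique()[:5]
cities_50_algo_df = cities[cities['CityId'].isin(cities_50_algo)]
for city in cities_50_algo:
    here = cities_50_algo_df[cities_50_algo_df['CityId'] == city]
    others = cities_50_algo_df[cities_50_algo_df['CityId'] != city]
    # Rank the other cities by distance from this one, nearest first.
    order = np.argsort(euclidean_distances(here[['X', 'Y']].values, others[['X', 'Y']].values))[0]
    # Map the argsort positions back to actual CityIds.
    dict_path[city] = others['CityId'].values[order].tolist()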

Utility functions to find Hamiltonian paths and cycles.

In [None]:

graph = dict_path

def find_all_paths(graph, start, end, path=None):
    # http://www.python.org/doc/essays/graphs/
    # Return every simple path from start to end as a list of nodes.
    path = (path or []) + [start]
    if start == end:
        return [path]
    if start not in graph:
        return []
    paths = []
    for node in graph[start]:
        if node not in path:
            paths.extend(find_all_paths(graph, node, end, path))
    return paths

def find_paths(graph):
    # Collect every Hamiltonian path: a simple path that covers all nodes.
    paths = []
    for startnode in graph:
        for endnode in graph:
            for path in find_all_paths(graph, startnode, endnode):
                if len(path) == len(graph):
                    paths.append(path)
    return paths

def find_cycle(graph):
    # Collect Hamiltonian cycles: Hamiltonian paths whose last node links back to the first.
    cycles = []
    for startnode in graph:
        for endnode in graph:
            for path in find_all_paths(graph, startnode, endnode):
                if len(path) == len(graph) and path[0] in graph[path[-1]]:
                    # Close the cycle by returning to the starting node.
                    path.append(path[0])
                    cycles.append(path)
    return cycles
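
Since `find_cycle` loops over every start node, each cycle is reported once per rotation. One possible alternative, sketched here under the assumption that a single fixed start city is enough, is a generator-based depth-first search. `hamiltonian_paths` and `hamiltonian_cycles_from` are hypothetical helpers, and the toy graph is illustrative.

```python
def hamiltonian_paths(graph, path):
    """Yield every extension of `path` that visits all vertices of `graph` exactly once."""
    if len(path) == len(graph):
        yield list(path)
        return
    for node in graph.get(path[-1], ()):
        # Skip neighbours that are not vertices of the graph or were already visited.
        if node in graph and node not in path:
            yield from hamiltonian_paths(graph, path + [node])


def hamiltonian_cycles_from(graph, start):
    """Yield Hamiltonian cycles as closed vertex lists beginning and ending at `start`."""
    for path in hamiltonian_paths(graph, [start]):
        if start in graph[path[-1]]:
            yield path + [start]


# Toy complete graph on 4 vertices (illustrative data).
toy_graph = {v: [u for u in range(4) if u != v] for v in range(4)}
toy_cycles = list(hamiltonian_cycles_from(toy_graph, 0))
print(len(toy_cycles))  # 6, i.e. 3! orderings of the remaining vertices
```

The generator yields cycles lazily, so a caller can stop early or keep only the best one instead of holding every cycle in memory.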


In [None]:
cycles_ = find_paths(dict_path)

In [None]:
print(len(cycles_))

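A look at the results: `cycles_` holds the Hamiltonian paths returned by `find_paths`. Their number grows factorially with the number of cities; for just 10 cities there are already 9! = 362880 orderings from a fixed starting city. Next we define a few functions to compute the distances between cities. The objective is to find the total distance covered by each path.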
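
The factorial growth is easy to check numerically. This small sketch assumes a complete graph and a fixed starting city; `tours_from_fixed_start` is a hypothetical helper name.

```python
import math


def tours_from_fixed_start(n_cities):
    """Orderings of the remaining cities once the first city is fixed (n_cities >= 1)."""
    return math.factorial(n_cities - 1)


for n in (5, 10, 15):
    print(n, tours_from_fixed_start(n))
# 5 24
# 10 362880
# 15 87178291200
```

Exhaustive enumeration is therefore only practical for a handful of cities.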

In [None]:
import itertools

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

def euclidean_dist(a, b):
    # Straight-line distance between two coordinate arrays.
    return np.linalg.norm(a - b)

def dist_cal(list_cities):
    # Sum the leg lengths between consecutive cities in the given order.
    coords = cities_50_algo_df.set_index('CityId')[['X', 'Y']]
    list_ = [euclidean_dist(coords.loc[a].values, coords.loc[b].values)
             for a, b in pairwise(list_cities)]
    return np.sum(list_)

In [None]:
dist_cycles = []
for j in cycles_:
    dist_cycles.append((j, dist_cal(j)))

In [None]:
dist_cycles
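
Since the stated goal is the least distance, the shortest entry can be picked with `min`. The sketch below uses made-up coordinates and candidate orderings, but the same `(path, distance)` tuple layout as `dist_cycles`; `toy_coords`, `toy_paths` and `path_length` are illustrative names.

```python
import math

# Hypothetical coordinates and candidate orderings (illustrative only).
toy_coords = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0), 3: (0.0, 4.0)}
toy_paths = [[0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 3, 2]]


def path_length(path, coords):
    """Sum of straight-line distances between consecutive cities in `path`."""
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))


# Same (path, distance) tuple layout as dist_cycles.
toy_dist = [(path, path_length(path, toy_coords)) for path in toy_paths]
best_path, best_length = min(toy_dist, key=lambda item: item[1])
print(best_path, best_length)  # [0, 1, 2, 3] 10.0
```

Applied to the notebook's own list, the equivalent call would be `min(dist_cycles, key=lambda item: item[1])`.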

**Observations**

1. We have enumerated the Hamiltonian paths through five cities, together with the distance of each.
2. The complexity grows factorially as cities are added (e.g. for 10 cities there would be 362880 different cycles).
3. We can apply the penalty (1.1 × distance covered) to steps that do not involve prime cities.
4. We can use pyconcorde, but the twist in the problem (the 1.1 × distance penalty) forces us to think differently.
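
The penalty in observation 3 could be folded into the distance calculation along the following lines. This is only a sketch under an assumed rule that the notebook does not spell out: every 10th step costs 1.1 times its length unless it departs from a city whose id is prime. The names `is_prime` and `penalized_length`, the parameters `every` and `factor`, and the toy data are all hypothetical.

```python
import math


def is_prime(n):
    """Trial-division primality test; adequate for small city ids."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def penalized_length(path, coords, every=10, factor=1.1):
    """Path length where each `every`-th step is scaled by `factor`
    unless it departs from a prime-numbered city (assumed rule)."""
    total = 0.0
    for step, (a, b) in enumerate(zip(path, path[1:]), start=1):
        leg = math.dist(coords[a], coords[b])
        if step % every == 0 and not is_prime(a):
            leg *= factor
        total += leg
    return total


# Hypothetical data: cities spaced 1 unit apart on a line, so each step has length 1.
toy_coords = {i: (float(i), 0.0) for i in range(12)}
toy_path = list(range(12))
# Step 10 departs from city 9, which is not prime, so that one step is penalized.
print(penalized_length(toy_path, toy_coords))
```

Because the penalty depends on the position of a step within the path, the cost of an edge is no longer fixed, which is why a plain TSP solver does not optimize this objective directly.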
