**Problem Statement:**

Determine the optimal path for Rudolph and his team to deliver toys on Christmas Eve.

**What we know:**
* Every "prime" city (a city whose CityId is a prime number) leaves carrots for the reindeer; the carrots provide energy to keep a better pace, so every 10th step on the path should start from a prime city.
* Paths must start and end at the North Pole (CityId = 0)
* You must visit every city exactly once
* The distance between two cities is the 2D Euclidean distance, except...
* Every 10th step (stepNumber % 10 == 0) is 10% longer unless it comes from a prime city
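
The penalty rule in the last two bullets can be sketched for a single step as follows. The function name `step_length` and its arguments are illustrative assumptions, not part of the competition materials.

```python
import math

def step_length(x1, y1, x2, y2, step_number, origin_is_prime):
    """Length of one step from (x1, y1) to (x2, y2) under the penalty rule."""
    base = math.hypot(x2 - x1, y2 - y1)  # plain 2D Euclidean distance
    # Every 10th step costs 10% more unless it starts from a prime city.
    if step_number % 10 == 0 and not origin_is_prime:
        return base * 1.1
    return base
```

Only the origin city of the 10th step matters here; the destination city does not affect the penalty.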

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from numpy.linalg import norm
from scipy.spatial import distance
import sklearn

santa = pd.read_csv("../input/cities.csv")
CityID = santa.iloc[:, 0]
XY = santa.iloc[:, 1:]
path = santa.iloc[0]

In [None]:
# Input data files are available in the "../input/" directory.
# List them to confirm that cities.csv is present.

import os
print(os.listdir("../input"))

# Any results you write to the current directory are saved as output.


df = pd.read_csv("../input/cities.csv")

**Take a look at the data:**
* 197,769 cities in the data frame

In [None]:
santa.head()

In [None]:
santa.shape

**Prime Cities:**

Prime city locations will have carrots for the reindeer, so we need to identify them. The best route leaves from one of these locations on every 10th step; otherwise that step incurs a 10% penalty, since the reindeer will be slower without the energy they require.

Here is a function to identify the prime cities:


In [None]:
def is_prime(num):
    # Return 1 if num is prime, else 0 (trial division up to sqrt(num)).
    if num < 2:
        return 0
    for i in range(2, int(np.sqrt(num)) + 1):
        if num % i == 0:
            return 0
    return 1

In [None]:
prime_cities = santa['CityId'].apply(is_prime)
santa['Prime'] = prime_cities

In [None]:
santa.head()
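
Applying trial division row by row can be slow for this many cities. As a hedged alternative, a Sieve of Eratosthenes flags every ID in one pass. This is a sketch rather than the notebook's method; `prime_flags` is a hypothetical helper, and it assumes the CityId values run from 0 to N-1 in row order.

```python
import numpy as np

def prime_flags(n):
    """Return an int array of length n where flags[i] == 1 iff i is prime."""
    flags = np.ones(n, dtype=np.int64)
    flags[:2] = 0  # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if flags[i]:
            flags[i * i::i] = 0  # cross out the multiples of i
    return flags
```

Under the assumption above, `santa['Prime'] = prime_flags(len(santa))` would fill the same column as the `apply` call.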

**Distance Calculation:**

Path length is based on Euclidean distance. Let's calculate the total distance when visiting the cities in the order they are listed, taking into account the prime city penalty.

This uses the distance function provided in this awesome kernel by Seshadri: https://www.kaggle.com/seshadrikolluri/understanding-the-problem-and-some-sample-paths

In [None]:
def total_distance(dfcity, path):
    # Sum the step lengths along the path, adding 10% to every 10th step
    # that does not start from a prime city.
    prev_city = path[0]
    total = 0
    step_num = 1
    for city_num in path[1:]:
        step = np.sqrt((dfcity.X[city_num] - dfcity.X[prev_city]) ** 2 +
                       (dfcity.Y[city_num] - dfcity.Y[prev_city]) ** 2)
        penalized = step_num % 10 == 0 and not dfcity.Prime[prev_city]
        total += step * (1.1 if penalized else 1.0)
        prev_city = city_num
        step_num += 1
    return total

# Baseline: visit the cities in CityId order, then return to the North Pole.
# Series.append was removed in pandas 2.0, so build the list directly.
no_path = list(santa.CityId) + [0]
print('Total distance visiting cities in CityId order is ' + "{:,}".format(total_distance(santa, no_path)))
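
The loop above indexes into the data frame once per step, which gets slow when many candidate paths are scored. A vectorized NumPy version of the same calculation might look like the sketch below. The name `total_distance_vectorized` and its array arguments are assumptions for illustration, and it again assumes that row `i` of the arrays belongs to CityId `i`.

```python
import numpy as np

def total_distance_vectorized(xy, is_prime, path):
    """Score a path from an (N, 2) coordinate array and a 0/1 prime flag array."""
    path = np.asarray(path)
    coords = xy[path]
    # Euclidean length of each step between consecutive cities on the path.
    steps = np.sqrt(((coords[1:] - coords[:-1]) ** 2).sum(axis=1))
    step_num = np.arange(1, len(path))
    # Penalize every 10th step whose origin city is not prime.
    penalized = (step_num % 10 == 0) & (is_prime[path[:-1]] == 0)
    return float((steps * np.where(penalized, 1.1, 1.0)).sum())
```

With the columns used in this notebook, a call could look like `total_distance_vectorized(santa[['X', 'Y']].to_numpy(), santa['Prime'].to_numpy(), no_path)`.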

**Path sorted by coordinates (X, then Y)**

In [None]:
sorted_cities = list(santa.iloc[1:,].sort_values(['X','Y'])['CityId'])
sorted_cities = [0] + sorted_cities + [0]
print('Total distance with the sorted city path is '+ "{:,}".format(total_distance(santa,sorted_cities)))

**Submit sorted path for now**

In [None]:
submission = pd.DataFrame(sorted_cities)
submission.columns = ['Path']
submission.head()
submission.to_csv('sample_submission.csv', index=False)
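
Before submitting, it can help to confirm that the path obeys the rules listed at the top: start and end at the North Pole, and visit every city exactly once. The check below is a minimal sketch; `check_path` is a hypothetical helper, and it assumes city IDs run from 0 to `n_cities - 1`.

```python
def check_path(path, n_cities):
    """Return True if path starts and ends at city 0 and visits every city once."""
    if len(path) != n_cities + 1:
        return False
    if path[0] != 0 or path[-1] != 0:
        return False
    # Ignoring the final return to city 0, each city must appear exactly once.
    return sorted(path[:-1]) == list(range(n_cities))
```

For the path built above, this would be called as `check_path(sorted_cities, len(santa))`.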
