# Reindeer Rebellion: Traveling Santa 2018
### A Scoring change to avert a Reindeer strike
The Rudolph-proposed scoring method, while well-intentioned, gives too little weight to the Reindeer getting their prime-city-carrots, and so the Reindeer are proposing the following alternate scoring (with some talk of going on strike if it is not approved):

After receiving carrots (at the North Pole and then at any Prime City regardless of step number) the Reindeer are happy and efficient for up to 10 steps; after that the Reindeer impose a penalty starting at 5% and increasing by 5% with each step until the Reindeer visit a prime city.
[as suggested by **Ole Kröger** at https://www.kaggle.com/c/traveling-santa-2018-prime-paths/discussion/72389 ]

For example, if the Reindeer leave the North Pole and encounter no Prime Cities they will take 10 steps arriving in the 10th city and then charge a 5% penalty to go to the 11th city, a 10% penalty to continue to the 12th city, a 15% penalty to the 13th, etc. Arriving at the 15th city (having charged a 25% penalty), if that city is prime, then they will make the next 10 steps without penalty, arriving at city 25.
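
As a rough sketch of this rule, the helper below lists the distance multiplier for each step. The name `reindeer_multipliers` is hypothetical and not part of the kernel. It follows the counter logic of `get_score(..., RR=True)`, where arriving at a prime city resets the count before any penalty is applied, so in this sketch the step that arrives at a prime city is itself charged no penalty.

```python
def reindeer_multipliers(prime_arrivals, grace=10, rate=0.05):
    """Distance multiplier for each step under the Reindeer rule.

    prime_arrivals[k] is True when step k+1 arrives at a prime city.
    `grace` and `rate` are illustrative parameter names (10 steps, 5%).
    """
    multipliers = []
    steps_since_carrot = 0
    for arrives_at_prime in prime_arrivals:
        steps_since_carrot += 1
        if arrives_at_prime:
            steps_since_carrot = 0  # carrots received, counter resets
        over = max(0, steps_since_carrot - grace)
        multipliers.append(1.0 + rate * over)
    return multipliers


# 14 steps without a prime city, then a prime city on step 15
steps = [False] * 14 + [True] + [False] * 10
for step, m in enumerate(reindeer_multipliers(steps), start=1):
    if m > 1.0:
        print(f"step {step}: multiplier {m:.2f}")
```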

### Why the Reindeer want the scoring changed

The new scoring is implemented below by: `get_score(path, coords, prime_flags, RR=False)` <br>
The `RR=False` gives the usual Rudolph scoring, and `RR=True` uses the Reindeer-proposed scoring. <br>
Taking the path generated by the basic greedy algorithm (ignoring primeness) and scoring it with these gives: <br>
* Rudolph Score: 1812602. ;    Carrots: 1786 ;    Length: 1796336.   Penalty/Length ~ 0.91 % <br>
The overall penalty is a very small fraction of the length even though the Reindeer are getting very few carrots and will have to go large stretches with no carrots.
* Reindeer Score: 2140304. ;  Carrots: 17802 ;  Length: 1796336.   Penalty/Length ~ 19.1 % <br>
Here the Reindeer get all their carrots, but there are still long stretches without carrots, as the 19% penalty indicates. Optimizing this score will put emphasis on giving the Reindeer their carrots uniformly throughout the trip.
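
The Penalty/Length figures quoted here are the excess of the score over the raw path length, expressed as a percentage. A minimal sketch of that arithmetic (the helper name `penalty_percent` is an assumption, not a function from the kernel):

```python
def penalty_percent(score, length):
    """Percentage by which the penalized score exceeds the raw path length."""
    return 100.0 * (score - length) / length


# Scores and length quoted above for the basic greedy path
rudolph = penalty_percent(1812602, 1796336)
reindeer = penalty_percent(2140304, 1796336)
print(f"Rudolph: {rudolph:.2f} %   Reindeer: {reindeer:.2f} %")
```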



This kernel was forked from Theo Viel's 'Greedy Reindeer', the beginning of which is left largely intact, with deletions (`##dd`) and modifications (`#dd`) marked in comments. The later "Now to the Algorithm" and "Results" sections have been extensively borrowed from and made more compact.<br>

### Notes/Diary:  (some more details in version notes at end)<br>

* 30 Nov 2018: Forked Viel's kernel, added intro, ran and tested submission(v1).<br>
Compactified the code(v4). Implemented RR option for scoring (v5) and compared scores (v6).  <br>
* 1 Dec 2018: (v7) Include distance adjustments for primes depending on stepNumber (only slightly different from Viel's penalization implementation). <br>
* 1-2 Dec 2018: Try to improve the TSP (even slightly) by encouraging an overall right-to-left "sweep". Do this by reducing `dist_array()` distances that are more to the right of city i... Not very promising and wish I hadn't committed versions 8 to 10 ;-) Commit version 11 with the best (slight) improvement this scheme made, and take a time-out to learn more about TSP.<br>


# - - - - - - - Greedy Reindeer - A Starter code... - Theo Viel - - - - - - -
### Using a greedy algorithm to solve the problem

Prime cities are taken into account

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from numpy.linalg import norm
from collections import Counter
from time import time
from matplotlib import collections  as mc

sns.set_style('whitegrid')

## Loading Data

In [None]:
df = pd.read_csv("../input/cities.csv")  #dd for running on kaggle
##df = pd.read_csv("./input/cities.csv")  #dd my local location

In [None]:
##dd df.head()

In [None]:
##dd plt.figure(figsize=(15, 10))
##dd plt.scatter(df.X, df.Y, s=1)
##dd plt.scatter(df.iloc[0: 1, 1], df.iloc[0: 1, 2], s=10, c="red")
##dd plt.grid(False)
##dd plt.show()

In [None]:
nb_cities = max(df.CityId)
print("Number of cities to visit : ", nb_cities)

In [None]:
##dd df.tail()

## Getting Prime Cities

In [None]:
#dd: change primes to prime_flags
def sieve_eratosthenes(n):
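    # Flags for 0..n; 0 and 1 are not prime
    prime_flags = [False, False] + [True] * (n - 1)
    p = 2
    while p * p <= n:
        if prime_flags[p]:
            # Cross out every multiple of p; smaller multiples were handled by smaller primes
            for i in range(p * p, n + 1, p):
                prime_flags[i] = False
        p += 1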
    return prime_flags

In [None]:
prime_flags = np.array(sieve_eratosthenes(nb_cities)).astype(int)
df['Prime'] = prime_flags

In [None]:
#dd This is what Santa wants, but it is only applied to stepNumber % 10 == 0 cities ?!
#dd  "Not acceptable!" - R'deer
#dd This variable is only used in dist_matrix() below if penalize=True
penalization = 0.1 * (1 - prime_flags) + 1

In [None]:
df.head()

In [None]:
#dd Shows 179967 non-primes and 17802 primes
##dd plt.figure(figsize=(15, 10))
##dd sns.countplot(df.Prime)
##dd plt.title("Prime repartition : " + str(Counter(df.Prime)))
##dd plt.show()

Almost a tenth of the cities are prime, which is good because we want to visit a prime city every 10 cities.
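
For a quick cross-check of that fraction, here is a vectorized sieve sketch. `sieve_numpy` is a hypothetical helper, offered as an alternative to `sieve_eratosthenes` rather than as the kernel's code. The counts 17802 and 179967 are the ones reported in the comment of the plotting cell above.

```python
from math import isqrt

import numpy as np


def sieve_numpy(n):
    """Boolean array of length n + 1; flags[k] is True when k is prime."""
    flags = np.ones(n + 1, dtype=bool)
    flags[:2] = False  # 0 and 1 are not prime
    for p in range(2, isqrt(n) + 1):
        if flags[p]:
            flags[p * p::p] = False  # cross out multiples of p
    return flags


# Small sanity check of the sieve
print(np.flatnonzero(sieve_numpy(30)))

# Share of prime CityIds, using the counts reported above
primes, non_primes = 17802, 179967
prime_share = primes / (primes + non_primes)
print(f"{100 * prime_share:.1f} % of cities are prime")
```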

In [None]:
#dd Change colors to green, red, and black(NP)
plt.figure(figsize=(15, 10))
plt.scatter(df[df['Prime'] == 0].X, df[df['Prime'] == 0].Y, s=1, alpha=0.3, c='green')
plt.scatter(df[df['Prime'] == 1].X, df[df['Prime'] == 1].Y, s=1, alpha=0.7, c='red')
plt.scatter(df.iloc[0: 1, 1], df.iloc[0: 1, 2], s=10, c="black")
plt.grid(False)
plt.title('Visualisation of cities')
plt.show()

Prime cities are spread fairly evenly across the map, which is a good thing as well.

## Now to the Algorithm
The contents of this section have been compacted and used below.
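
For orientation, the core greedy idea (always step to the nearest unvisited city) can be sketched on a toy set of points. This is a simplified illustration that ignores primes; `greedy_path` and its boolean `visited` mask are assumptions of this sketch, not the kernel's implementation.

```python
import numpy as np


def greedy_path(coords, start=0):
    """Nearest-neighbour tour over coords (shape 2 x N), returning to start."""
    n = coords.shape[1]
    visited = np.zeros(n, dtype=bool)
    visited[start] = True
    path = [start]
    current = start
    for _ in range(n - 1):
        # Distance from the current city to every city
        d = np.linalg.norm(coords - coords[:, [current]], axis=0)
        d[visited] = np.inf  # never revisit a city
        current = int(np.argmin(d))
        visited[current] = True
        path.append(current)
    path.append(start)  # close the tour
    return path


# Five toy cities on a line (x positions 0, 5, 1, 2, 10)
toy = np.array([[0.0, 5.0, 1.0, 2.0, 10.0],
                [0.0, 0.0, 0.0, 0.0, 0.0]])
print(greedy_path(toy))
```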

## Results
The contents of this section have been compacted and used below.

# - - - - - - - End of Forked Kernel - - - - - - -

## More compact implementation of the path creation

In [None]:
# Put all the city coordinates in an np array
coordinates = np.array([df.X, df.Y])

In [None]:
# Various routines

# Assign a distance measure from city i to all others
def dist_array(coords_in, i, RightLeft=False):
    # Coordinates of city i as a 2x1 column, taken from coords_in rather than the global df
    begin = coords_in[:, i][:, np.newaxis]
    # if RightLeft then scale/reduce x,y coords/distances
    # that are more than some distance to the right of city i.
    # This encourages not leaving cities far behind (on the right)
    # and once on the right the path will have a general trend to the left.
    if RightLeft:
        # scale the X,Y values to be smaller if to the right of city i
        coords_mod = coords_in.copy()
        # Different values tried: 500.0, 700.0, 900.0, 600.0, 400.0, 250.0, 160.0, 40.0, 16.0, 20.0, 100.0
        x_width = 600.0  # 600 is best so far
        bound_right = begin[0] + x_width
        x_far_right = 1.0*((coords_in[0] - bound_right) > 0.0)
        coords_mod[0] = bound_right + (coords_mod[0]-bound_right)*(1.0-0.75*x_far_right)
        coords_mod[1] = begin[1] + (coords_mod[1]-begin[1])*(1.0-0.50*x_far_right)
        mat = coords_mod - begin
    else:
        mat =  coords_in - begin
    return np.linalg.norm(mat, ord=2, axis=0)

# return the index of the nearest available city
def get_next_city(dist, avail):
    return avail[np.argmin(dist[avail])]

def plot_path(path, coordinates):
    # Plot tour
    lines = [[coordinates[: ,path[i-1]], coordinates[:, path[i]]] for i in range(1, len(path))]
    lc = mc.LineCollection(lines, linewidths=2)
    fig, ax = plt.subplots(figsize=(20,20))
    ax.set_aspect('equal')
    plt.grid(False)
    ax.add_collection(lc)
    ax.autoscale()
    # add the North Pole location
    plt.scatter(coordinates[0][0], coordinates[1][0], s=150, c="red", marker="*", linewidth=3)
    # and first cities on the path
    plt.scatter(coordinates[0][path[1:10]], coordinates[1][path[1:10]], s=15, c="black")
    plt.show()
    
# Calculate the Score, Carrots, Length (RR=True to select the Reindeer Rebellion scoring)
def get_score(path, coords, prime_flags, RR=False):
    # RR=True calculates the Reindeer preferred scoring
    score = 0
    carrots = 0 
    length = 0
    steps_since_carrot = 0
    for i in range(1, len(path)):
        begin = path[i-1]
        end = path[i]
        distance = np.linalg.norm(coords[:, end] - coords[:, begin], ord=2)
        length += distance
        # Choose the scoring method:
        if not RR:
            # Usual scoring, is this one of the 10th-city steps?
            if i % 10 == 0:
                # if the starting city is prime then a carrot and no penalties
                if prime_flags[begin]:
                    carrots += 1
                # if not prime, no carrot and a penalty
                else:
                    distance *= 1.1
            score += distance
        else:
            # RR scoring
            steps_since_carrot += 1
            if prime_flags[end]:
                # got carrots here!
                carrots += 1
                steps_since_carrot = 0
            # any penalty?
            if steps_since_carrot > 10:
                distance *= (1.0 + 0.05*(steps_since_carrot - 10))
            score += distance
    return score, carrots, length
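
As a cross-check on the `RR=False` branch, the Rudolph score can also be computed without a Python loop. This is a sketch under the same rule as in `get_score` (every 10th step costs 10% more unless it starts from a prime city). `rudolph_score_vectorized` is a hypothetical name, and `coords` is assumed to have the same 2 x N layout as `coordinates`.

```python
import numpy as np


def rudolph_score_vectorized(path, coords, prime_flags):
    """Return (score, length) for a path under the usual Rudolph scoring."""
    path = np.asarray(path)
    is_prime = np.asarray(prime_flags).astype(bool)
    # Length of each step path[i-1] -> path[i]
    step_lengths = np.linalg.norm(coords[:, path[1:]] - coords[:, path[:-1]], axis=0)
    step_numbers = np.arange(1, len(path))
    # Every 10th step is penalized when its starting city is not prime
    penalized = (step_numbers % 10 == 0) & ~is_prime[path[:-1]]
    score = float(np.sum(step_lengths * np.where(penalized, 1.1, 1.0)))
    return score, float(step_lengths.sum())


# Toy check: 12 cities on a line, visited in order (step 10 starts at city 9, not prime)
toy_coords = np.array([np.arange(12.0), np.zeros(12)])
toy_primes = np.array([0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1])
toy_score, toy_length = rudolph_score_vectorized(list(range(12)), toy_coords, toy_primes)
print(toy_score, toy_length)
```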

In [None]:
# Initialize the left_cities

# All cities:
Nth = 1;  city_start = 1

# Can use only every Nth city for quicker testing
##Nth = 37; city_start = 1
# primes: 1-0.09, 1+0.03
#    Rudolph Score: 242742  Penalty frac:  0.79 %  Carrots: 57   Length: 240818 .
#   Reindeer Score: 286664  Penalty frac: 19.03 %  Carrots: 492   Length: 240818 .
# No prime considerations
#    Rudolph Score: 243210 Penalty frac:  0.88 %  Carrots: 61  
#   Reindeer Score: 287413 Penalty frac: 19.22 %  Carrots: 492 (all of them)

# All odd cities - about twice as many carrots !
##Nth = 50;  city_start = 1
# All even cities - no carrots :(
##Nth = 50;  city_start = 4

# Select the cities desired
left_cities = np.array(df.CityId)[city_start: :Nth]

# Or put in a known set of cities to test scoring
# "The carrot run"
##left_cities = np.array([2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67])
# "No carrots"
##left_cities = 2*np.array(list(range(1,20)))

print("Number of cities besides the NP: ", len(left_cities), "  Total primes:",
      sum(prime_flags[left_cities]))

In [None]:
# Initialize the path, etc.
path = [0]
current_city = 0
stepNumber = 1
t0 = time()

if len(left_cities) < 15000:
    show_every = 1000
else:
    show_every = 10000

# For Rudolph scoring:
# factor to reduce prime distance to account for prime's no-penalty advantage
prime_reduce = (1.0 - 0.09*prime_flags)
# factor to increase prime distance when a prime doesn't matter (save them for when it matters)
prime_increase = (1.0 + 0.03*prime_flags)

# And loop through the cities
while left_cities.size > 0:
    if stepNumber % show_every == 0: # We print the progress of the algorithm
        print(f"Time elapsed : {time() - t0} - Number of cities left : {left_cities.size}")
    # Compute the distance matrix
    ##distances = dist_array(coordinates, current_city, RightLeft=False) # same as Viel's
    distances = dist_array(coordinates, current_city, RightLeft=True) # modified distances
    # Encourage a prime every 10th city (%10==9) by reducing prime's distances
    if stepNumber % 10 == 9:
        distances = distances * prime_reduce  # reduce distance for primes
    else:
        distances = distances * prime_increase  # increase distance for primes
    # Get the closest city and go to it
    current_city = get_next_city(distances, left_cities)
    # Update the list of not visited cities
    left_cities = np.setdiff1d(left_cities, np.array([current_city]))
    # Append the city to the path
    path.append(current_city)
    # Add one step
    stepNumber += 1
    
# Add the North Pole and we're done
path.append(0)
print(f"Loop lasted {(time() - t0) // 60} minutes ")
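
To see what the two prime factors do in isolation, here is a small sketch of the step-dependent weighting applied inside the loop. The function name `weight_for_primes` and the toy numbers are assumptions for illustration; the 0.09 and 0.03 factors are the ones used above.

```python
import numpy as np


def weight_for_primes(distances, prime_flags, step_number, reduce=0.09, increase=0.03):
    """Scale distances to prime cities depending on the step number."""
    flags = np.asarray(prime_flags, dtype=float)
    if step_number % 10 == 9:
        # A prime chosen now becomes the start of a 10th step: make primes look closer
        return distances * (1.0 - reduce * flags)
    # Otherwise make primes look slightly farther, saving them for later
    return distances * (1.0 + increase * flags)


toy_dist = np.array([10.0, 10.5, 12.0])
toy_flags = np.array([0, 1, 0])  # only the middle city is prime
pick_at_9 = int(np.argmin(weight_for_primes(toy_dist, toy_flags, step_number=9)))
pick_at_4 = int(np.argmin(weight_for_primes(toy_dist, toy_flags, step_number=4)))
print(pick_at_9, pick_at_4)
```

In this toy case the slightly farther prime city wins on step 9 and loses on step 4.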

In [None]:
# Show the path
plot_path(path, coordinates)

In [None]:
# Show the Rudolph Score results
score, carrots, length = get_score(path, coordinates, prime_flags, RR=False)
# and without going back to the NP
score_noNP, dummy1, dummy2 = get_score(path[:-1], coordinates, prime_flags, RR=False)

print("Rudolph Score:", int(score), "   Carrots:", carrots, "   Length:", int(length), ".\n" +
      " Penalty frac:", int(10000*(score-length)/length)/100,
      "%   Final step to NP has distance ", int(score - score_noNP))


In [None]:
# Show the *** Reindeer Rebellion *** Score results
score, carrots, length = get_score(path, coordinates, prime_flags, RR=True)
# and without going back to the NP
score_noNP, dummy1, dummy2 = get_score(path[:-1], coordinates, prime_flags, RR=True)

print("Reindeer Score:", int(score), "  Carrots:", int(carrots), "  Length:", int(length), ".\n" +
      " Penalty frac:", int(10000*(score-length)/length)/100,
      "%   Final step to NP has distance ", int(score - score_noNP))

In [None]:
# Output the path to a file that we can submit
if True:
    submission = pd.DataFrame({"Path": path})
    submission.to_csv("submission.csv", index=False)
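
Before submitting, it may be worth checking that the path is a complete tour. The sketch below is one way to do that when all cities are used (`Nth = 1`); `is_valid_tour` is a hypothetical helper, and `n_total` is assumed to be the number of cities including the North Pole.

```python
import numpy as np


def is_valid_tour(path, n_total):
    """True if path starts and ends at city 0 and visits every CityId exactly once."""
    path = np.asarray(path)
    if len(path) != n_total + 1:
        return False
    if path[0] != 0 or path[-1] != 0:
        return False
    # Dropping the final return to 0, every CityId must appear exactly once
    return bool(np.array_equal(np.sort(path[:-1]), np.arange(n_total)))


print(is_valid_tour([0, 2, 1, 3, 0], 4))  # a complete toy tour
print(is_valid_tour([0, 2, 2, 3, 0], 4))  # city 1 missing, city 2 repeated
```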

### Summary of versions and outputs: <br>

* Recommended Version:
* **version 7**: include distance adjustments to primes based on `stepNumber % 10 == 9`. This is similar to (and a very slight improvement over) Viel's result using penalization. <br>
Rudolph Score: 1811546 ;    Carrots: 2751 ;    Length: 1796014.   Penalty/Length ~ 0.86 % <br>
Reindeer Score: 2125349. ;  Carrots: 17802 ;  Length: 1796014.   Penalty/Length ~ 18.3 % <br>
<br>
* Other versions:
* version 1 <br>
Solution scored  1811964. <br>
Final step to NP has distance  4259 <br>
* version 4 (same as Viel's but without any penalization in path selection; the score increased by 700ish): <br>
Solution scored  1812602. with a length of  1796336 . <br>
Final step to NP has distance  2189. <br>
* version 5,6: same path solution; comparing the scoring of the Rudolph and Reindeer methods: <br>
Rudolph Score: 1812602. ;    Carrots: 1786 ;    Length: 1796336.   Penalty/Length ~ 0.9 % <br>
Reindeer Score: 2140304. ;  Carrots: 17802 ;  Length: 1796336.   Penalty/Length ~ 19.1 % <br>
<br>
* versions 8-11: `dist_array()` is modified so that points that are more than `x_width` units to the right of city i have their x,y values (and hence distances) scaled w.r.t. city i by 0.25x, 0.50y to encourage not leaving points far to the right. This is supposed to cause the path to get to the right and then generally wander to the left where the NP is. The best `x_width` so far (600) is shown in bold. <br>

| x_width | Rudolph Score | Carrots | Length | Penalty/Length |
|---|---|---|---|---|
| +++ | 1811546 | 2751 | 1796014 | 0.86 % |
| 900 | 1811543 | 2751 | 1796012 | 0.86 % |
| 700 | 1810758 | 2760 | 1795395 | 0.85 % |
| **600** | **1810034** | **2749** | **1794670** | **0.85 %** |
| 500 | 1812581 | 2755 | 1797040 | 0.86 % |
| 400 | 1810903 | 2761 | 1795567 | 0.85 % |
| 250 | 1811318 | 2749 | 1795979 | 0.85 % |
| 160 | 1813114 | 2741 | 1797953 | 0.84 % |
| 100 | 1811804 | 2674 | 1796233 | 0.86 % |
| 40 | 1816430 | 2688 | 1800851 | 0.86 % |
| 20 | 1815703 | 2757 | 1800129 | 0.86 % |
| 16 | 1817859 | 2676 | 1802083 | 0.87 % |

| x_width | Reindeer Score | Carrots | Length | Penalty/Length |
|---|---|---|---|---|
| +++ | 2125349 | 17802 | 1796014 | 18.33 % |
| 900 | 2125370 | 17802 | 1796012 | 18.33 % |
| 700 | 2128203 | 17802 | 1795395 | 18.53 % |
| **600** | **2125915** | **17802** | **1794670** | **18.45 %** |
| 500 | 2126214 | 17802 | 1797040 | 18.31 % |
| 400 | 2123749 | 17802 | 1795567 | 18.27 % |
| 250 | 2123493 | 17802 | 1795979 | 18.23 % |
| 160 | 2122414 | 17802 | 1797953 | 18.04 % |
| 100 | 2119948 | 17802 | 1796233 | 18.02 % |
| 40 | 2141576 | 17802 | 1800851 | 18.92 % |
| 20 | 2133879 | 17802 | 1800129 | 18.54 % |
| 16 | 2132361 | 17802 | 1802083 | 18.32 % |
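
The right-side squeeze that `dist_array()` applies with `RightLeft=True` can be sketched on two toy points. This is a simplified restatement with hypothetical names (`squeezed_distances`, `x_scale`, `y_scale`), assuming the same 0.25 and 0.50 scale factors and an `x_width` of 600 as in the code above.

```python
import numpy as np


def squeezed_distances(coords, origin, x_width=600.0, x_scale=0.25, y_scale=0.50):
    """Distances from origin after shrinking points far to its right."""
    coords = np.asarray(coords, dtype=float)
    origin = np.asarray(origin, dtype=float).reshape(2, 1)
    bound_right = origin[0] + x_width
    far_right = coords[0] > bound_right
    out = coords.copy()
    # Shrink the x excess beyond the bound, and the y offset from the origin
    out[0] = np.where(far_right, bound_right + (coords[0] - bound_right) * x_scale, coords[0])
    out[1] = np.where(far_right, origin[1] + (coords[1] - origin[1]) * y_scale, coords[1])
    return np.linalg.norm(out - origin, axis=0)


# One point inside the band (x = 300) and one far to the right (x = 1000)
pts = np.array([[300.0, 1000.0],
                [400.0, 400.0]])
d = squeezed_distances(pts, origin=[0.0, 0.0])
true_d = np.linalg.norm(pts, axis=0)
print(d, true_d)
```

The near point keeps its true distance, while the far-right point looks closer than it really is, which is what nudges the greedy step toward cities on the right before they are left behind.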
