A fundamental task for a Kore agent is calculating the kore harvested along various routes.
What is the fastest way to do these calculations?
We will explore a few options. We assume that route information can be precomputed once ahead of time: we have a fixed set of routes to examine, and we can build their data structures before our first agent() call.

* Edited (v2):  Added np.einsum thanks to @robga
* Edited (v3):  Don't be alarmed at the possibly weird shape of the graph below.  The timing depends on what else the processor cores are doing, and I have seen some strange things sometimes.  When I run it to save the notebook, the shape changes, so I don't know how to comment on the shape that you will get when you see it.
* Edited (v5):  Added some more cases thanks to @robga and @qihuaz

We will compare the following methods.  Note that certain ones (mat_einsum and mat_vector) do 1000 routes at a time.
1. **mat_mult** Store the kore levels in a 21x21 matrix and the route in another 21x21 matrix, where the entries on the route are the fraction of the kore that gets harvested (the same for the entire route, e.g. 0.10 for an 8-ship fleet) and all other entries are 0.  The harvest is the element-wise product of the two matrices, summed.  Note that we could have this value increase by 2% each step along the route at no additional calculation cost.
2. **mat_lookup** Store the kore levels in a matrix and the route as a list of positions, then use a list comprehension to sum up the kore harvest along the route.
3. **mat_index** Use advanced indexing to do the matrix lookup without a list comprehension.
4. **dict_lookup** Store the kore levels as a dict mapping position->kore and the route as a list of positions.
5. **dict_incr** Same as dict_lookup, but with a weight that increases 2% per step to represent the growth in kore.
6. **mat_einsum** Like mat_mult, but use np.einsum to process 1000 routes at once.
7. **mat_vector** Similar to mat_einsum, but use array broadcasting to multiply the 21x21 kore field element-wise by the 21x21x1000 routes and then sum along the first two dimensions.
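
The remark in item 1 about folding the 2% growth into the route matrix can be sketched as follows. This is only an illustration under stated assumptions, not part of the benchmark: `build_route_matrix`, `frac`, and `growth` are hypothetical names, a route is assumed to be a list of (x, y) cells, and growth is modeled as a simple compounding weight per step.

```python
import numpy as np

def build_route_matrix(route, frac=0.10, growth=1.02, size=21):
    """Return a size x size matrix of harvest weights along `route` (hypothetical helper)."""
    idx = np.array(route)                              # shape (len(route), 2)
    weights = frac * growth ** np.arange(len(route))   # frac, frac*1.02, frac*1.02**2, ...
    mat = np.zeros((size, size), dtype=np.float64)
    # np.add.at accumulates, so a cell visited twice gets both weights
    np.add.at(mat, (idx[:, 0], idx[:, 1]), weights)
    return mat

rng = np.random.default_rng(0)
kore = rng.random((21, 21)) * 100
route = [(3, 4), (3, 5), (3, 6), (4, 6), (3, 6)]   # (3, 6) is visited twice

route_mat = build_route_matrix(route)               # precomputed once
harvest_matrix = (kore * route_mat).sum()           # per-call cost is the same as mat_mult

# Reference value computed step by step
harvest_loop = sum(0.10 * 1.02 ** i * kore[x, y] for i, (x, y) in enumerate(route))
```

Because the weights live in the precomputed matrix, the per-call work is identical with or without growth; only the precompute step changes.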

A few more assumptions:
* About 200 of the 441 cells have positive kore
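
One caveat on this assumption: the compare function further down draws 200 random positions with replacement, so some cells are hit more than once and the number of distinct cells with kore is noticeably below 200 (roughly 160 on average). The sketch below computes that expectation and shows one way to get exactly 200 distinct cells instead. The names `SIZE`, `N_DRAWS`, and `kore_cells` are made up for this example.

```python
import random

SIZE = 21       # board is SIZE x SIZE
N_DRAWS = 200   # number of random positions drawn

# Expected number of distinct cells when drawing with replacement
n_cells = SIZE * SIZE
expected_distinct = n_cells * (1 - (1 - 1 / n_cells) ** N_DRAWS)
print(f'expected distinct cells: {expected_distinct:.1f}')

# Alternative: sample without replacement to get exactly N_DRAWS distinct cells
random.seed(0)
all_cells = [(x, y) for x in range(SIZE) for y in range(SIZE)]
kore_cells = random.sample(all_cells, N_DRAWS)
kore = {cell: random.random() * 100 for cell in kore_cells}
```

For the timing comparison the difference is unlikely to matter much, since every method does the same amount of work per route regardless of how many cells are non-zero.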

In [None]:
import numpy as np # linear algebra
import pprint
import sys
import random
import timeit
import matplotlib.pyplot as plt


In [None]:
#Just to check data sizes:
m16=np.zeros((21,21),dtype=np.float16)
m32=np.zeros((21,21),dtype=np.float32)
m64=np.zeros((21,21),dtype=np.float64)
amt=4.5
print(f'm16 {m16.dtype}')
print(f'm32 {m32.dtype}')
print(f'm64 {m64.dtype}')
print(np.finfo(np.float64))
print(f'float {sys.float_info}')


It looks like an np.float64 is the same as a Python 'float': both are 64-bit IEEE 754 doubles.
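
Rather than comparing the printed output by eye, the same check can be done programmatically. This is a small sketch; the variable names are only for illustration.

```python
import sys
import numpy as np

info = np.finfo(np.float64)

same_max = info.max == sys.float_info.max                # largest finite value
same_eps = info.eps == sys.float_info.epsilon            # machine epsilon
same_digits = info.nmant + 1 == sys.float_info.mant_dig  # 52 stored bits + 1 implicit
is_subclass = issubclass(np.float64, float)              # np.float64 derives from float

print(same_max, same_eps, same_digits, is_subclass)
```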

Here is our compare function that tries all the methods.  It also runs mat_mult with 3 different matrix datatypes: 16, 32 and 64 bit entries.

In [None]:
def compare_float(N):
  #N is the length of a route
  kore=dict()
  #assume 200 squares have positive kore
  mat_kore16 = np.zeros((21,21), dtype=np.float16)
  mat_kore32 = np.zeros((21,21), dtype=np.float32)
  mat_kore64 = np.zeros((21,21), dtype=np.float64)
  mat_kore64_dim3 = mat_kore64.reshape(21,21,1)   # Need to reshape so broadcasting will work below
  for i in range(200):
    #put random kore in a spot (positions can repeat, so fewer than 200 distinct cells end up filled)
    x,y=random.randint(0,20),random.randint(0,20)
    amt=random.random()*100
    mat_kore16[(x,y)]=amt
    mat_kore32[(x,y)]=amt
    mat_kore64[(x,y)]=amt
    kore[(x,y)]=amt
  #Set the routes
  mat_route16 = np.zeros((21,21), dtype=np.float16)      
  mat_route32 = np.zeros((21,21), dtype=np.float32)
  mat_route64 = np.zeros((21,21), dtype=np.float64)
  big_route=np.zeros((21,21,1000),dtype=np.float64)
  route=[]
  for i in range(N):
    #The route, use a float
    x,y=random.randint(0,20),random.randint(0,20)
    mat_route16[x,y]=.30
    mat_route32[x,y]=.30
    mat_route64[x,y]=.30    
    big_route[x,y,:]=0.30   #same route copied into all 1000 slots
    route.append((x,y))
  #make a list of P values increasing at 2% per step  
  plist=[.10*(1.02)**i for i in range(N)]  
  #Make the route into an Nx2 integer array for the mat_index method
  route_mat=np.array(route,dtype=np.int32)
  #How long does it take to compute the kore along the route of length N
  def comp_mat_mult16():
    #element-wise multiply of precomputed route and kore matrices, then sum
    return (mat_kore16 * mat_route16).sum()
  def comp_mat_mult32():
    #element-wise multiply of precomputed route and kore matrices, then sum
    return (mat_kore32 * mat_route32).sum()
  def comp_mat_mult64():
    #element-wise multiply of precomputed route and kore matrices, then sum
    return (mat_kore64 * mat_route64).sum()
  def comp_mat_lookup():
    p=0.10
    #use a matrix to look up the kore and sum
    return p*sum([ mat_kore32[x,y] for x,y in route ])
  def comp_mat_index():
    #use advanced indexing to do the lookup, then sum
    p=0.10
    return p*mat_kore64[route_mat[:,0],route_mat[:,1]].sum()
  def comp_dict_lookup():
    #Use a dict to look up the kore and sum
    p=0.10
    return p*sum([ kore.get((x,y),0) for x,y in route ])
  def comp_dict_incr():
    #Use a dict to look up the kore and sum, but have weights increase at 2% per step
    return sum([ p*kore.get((x,y),0) for p,(x,y) in zip(plist,route) ])
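  def comp_mat_einsum():
    #sum over board axes i,j for each of the 1000 routes k; returns 1000 harvests
    return np.einsum('ij,ijk->k',mat_kore64,big_route)
  def comp_mat_vector():
    #broadcast the 21x21x1 kore field against 21x21x1000 routes, then sum over the board axes
    return np.sum(mat_kore64_dim3 * big_route, axis=(0,1))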
        
        
  num=10000
  results=[]
  results.append(timeit.timeit(comp_mat_mult16,  number=num))
  results.append(timeit.timeit(comp_mat_mult32,  number=num))
  results.append(timeit.timeit(comp_mat_mult64,  number=num))
  results.append(timeit.timeit(comp_mat_lookup,  number=num))
  results.append(timeit.timeit(comp_mat_index,  number=num))
  results.append(timeit.timeit(comp_dict_lookup, number=num))
  results.append(timeit.timeit(comp_dict_incr, number=num))
  results.append(timeit.timeit(comp_mat_einsum, number=num)/1000)   # because we are doing 1000 routes at once
  results.append(timeit.timeit(comp_mat_vector, number=num)/1000)   # because we are doing 1000 routes at once

  #convert to microseconds per iteration
  results=[1e6*x/num for x in results]
  labels=['mat mult 16','mat mult 32','mat mult 64', 'mat lookup','mat index','dict lookup','dict incr','mat einsum','mat vector']
  return results, labels

In [None]:
#Let's look at the numbers before plotting them:
res,labels=compare_float(10)
res

Remember the small numbers at the end of the list (mat einsum and mat vector).  Remember that each is 1/1000 of the time it takes to do 1000 routes in one call.

In [None]:

vals=[]
results=[]
for N in range(4,32,2):
  #N=int(N)
  print(f'N={N}')
  res,labels=compare_float(N)
  results.append(res)
  vals.append(N)
data=np.array(results)
for i, label in enumerate(labels):
  plt.plot(vals,data[:,i],label=label,marker='.')
#plt.ylim(0,8e-6)
plt.title('time per route calculation vs route length')
plt.ylabel('microseconds per route')
plt.xlabel('length of route in steps')
plt.legend()
plt.show()

RESULTS:
EDIT:  Clearly, if you can precompute the routes and group them in large batches, einsum is the way to go.
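
As a rough sketch of what batching could look like in an agent, the example below stacks a list of precomputed routes into one 21x21xK array, evaluates them all with a single einsum call, and picks the best one. `stack_routes` and `frac` are hypothetical names, the routes here are random placeholders, and a fixed harvest fraction is assumed.

```python
import numpy as np

def stack_routes(routes, frac=0.10, size=21):
    """Build a (size, size, K) array holding one weight matrix per route (hypothetical helper)."""
    batch = np.zeros((size, size, len(routes)), dtype=np.float64)
    xs, ys, ks = [], [], []
    for k, route in enumerate(routes):
        for x, y in route:
            xs.append(x)
            ys.append(y)
            ks.append(k)
    # Accumulate so that a cell visited twice on a route is counted twice
    np.add.at(batch, (np.array(xs), np.array(ys), np.array(ks)), frac)
    return batch

rng = np.random.default_rng(1)
kore = rng.random((21, 21)) * 100

# 1000 placeholder routes of 10 steps each
routes = [[(int(x), int(y)) for x, y in rng.integers(0, 21, size=(10, 2))]
          for _ in range(1000)]

batch = stack_routes(routes)                      # done once, before the first agent() call
harvests = np.einsum('ij,ijk->k', kore, batch)    # one call per turn for all routes
best = int(np.argmax(harvests))
print(f'best route index {best}, harvest {harvests[best]:.1f}')
```

The stacking step is slow Python, but it only runs once; the per-turn cost is the single einsum call.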

For single routes at a time:
For shorter routes, the dict lookup is fastest, but it carries more overhead if you want to account for the 2% growth in kore (dict_incr).  For a longer route, say a 5x5 box (around 20 steps), the 2% growth results in roughly a 50% increase of kore towards the end of the route, so it might be important depending on how accurate you want your results.
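
The size of the growth effect can be checked with a few lines of arithmetic. This sketch assumes simple compounding at 2% per step and equal kore in every cell; the variable names are only for illustration.

```python
growth = 1.02
steps = 20

# Weight of the last cell relative to the first (about 1.49 for 20 steps)
final_factor = growth ** steps

# Average weight over the whole route relative to ignoring growth
avg_factor = sum(growth ** i for i in range(steps)) / steps

print(f'final step factor: {final_factor:.3f}')
print(f'average factor over the route: {avg_factor:.3f}')
```

Under these assumptions, ignoring growth underestimates the total harvest of a 20-step route by about a fifth, even though the last cell alone is off by nearly half.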

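The surprising result here is that float16 math is quite slow.  This is likely because most CPUs have no native float16 arithmetic, so the values have to be converted to a wider type and back for each operation.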
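
To look at the float16 effect in isolation, here is a minimal timing sketch that runs only the element-wise multiply and sum for each dtype. The data is random placeholder data, and the numbers will vary by machine and load, so no expected output is shown.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
kore64 = rng.random((21, 21)) * 100
route64 = (rng.random((21, 21)) < 0.05) * 0.10   # about 5% of cells on the route

num = 2000
timings = {}
for dtype in (np.float16, np.float32, np.float64):
    kore = kore64.astype(dtype)
    route = route64.astype(dtype)
    seconds = timeit.timeit(lambda: (kore * route).sum(), number=num)
    timings[np.dtype(dtype).name] = 1e6 * seconds / num   # microseconds per call

for name, usec in timings.items():
    print(f'{name}: {usec:.2f} microseconds per call')
```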