# Measuring performance of Halite 4 Runners

Currently, the performance of the online runners is much lower than one usually expects from an online competition. This notebook presents measurements taken locally, in a Kaggle notebook and on the online runners to get a feeling for what to expect.

Local evaluations are very consistent across runs. Notebook and online evaluations show very unpredictable spikes in performance, and an 80% safety margin (capping loops at 4s) is definitely not enough to prevent an agent from erroring out. Especially when doing heavy work in Python, one should not use more than 10% of the maximum available time.
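In practice, this advice amounts to checking a deadline inside long loops. A minimal sketch, assuming the work can be split into independent items; `run_with_budget` and `slow_square` are hypothetical names, and the 0.6s figure assumes the 6s per-step limit that the timeout test below runs into:

```python
import time

def run_with_budget(items, process, budget_s):
    """Apply `process` to items until `budget_s` seconds of wall time have passed.

    Hypothetical helper: the deadline is checked before each item, so a
    single slow item can still overshoot the budget.
    """
    deadline = time.perf_counter() + budget_s
    results = []
    for item in items:
        if time.perf_counter() >= deadline:
            break  # stop early instead of risking a timeout
        results.append(process(item))
    return results

def slow_square(x):
    time.sleep(0.01)  # stand-in for expensive per-item work
    return x * x

# 10% of an assumed 6s step limit would be budget_s=0.6; 0.1s keeps the demo short.
done = run_with_budget(range(1000), slow_square, budget_s=0.1)
print(len(done), "of 1000 items processed before the deadline")
```

Given the spikes described above, the budget has to absorb a slowdown of several times, not just a few percent.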

In [None]:
import matplotlib.pyplot as plt

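# Wall time (s) of 420e3 * 350 appends per device. Codeforces only ran 42e3 * 350,
# so its time is scaled by 10; the online value (> 70s) is a lower bound.
wall_times = [13.70, 18.60, 2.7921 * 10, 42, 7 * 10]
plt.figure(tight_layout=True, figsize=(12,4))
for i, seconds in enumerate(wall_times):  # "seconds" avoids shadowing the time module
    plt.bar(i, 420e3 * 350 / seconds, .5)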
plt.gca().set_xticks(range(5))
plt.gca().set_xticklabels(["workstation", "laptop", "codeforces\n(extrapolated)", "notebook", "online\n(extrapolated, optimistic)"])
plt.title("append(random()) on different hardware")
plt.ylabel("iterations/s")
plt.grid()
plt.show()

# Setup

In [None]:
import kaggle_environments

# patch renderer: https://github.com/Kaggle/kaggle-environments/pull/55
old_html = kaggle_environments.envs.halite.halite.html_renderer()
patched_html = old_html.replace(".action;", ".action || {};")
def evaluate(agent):
    """Run one episode with 4 copies of `agent`, render it and return the environment."""
    environment = kaggle_environments.make("halite")
    environment.html_renderer = lambda: patched_html
    environment.run([agent] * 4)
    environment.render(mode="ipython")
    return environment



To be able to report performance, we need some way to export the step timings from a running episode. Unfortunately, this information is not readily accessible right now: replays are the only information we get after an episode has been evaluated. Therefore, we have to encode the measurements taken during the evaluation as actions. A simple way to do that is to use the first 10 steps of the episode to convert the initial ship into a shipyard and spawn 9 ships, which move east to individual locations on the board. We also make sure that every agent errors out after step 350 (at step 350 + its player index), so that it won't play ranked games.


In [None]:
def filter_actions(actions):
    return { id: action for id, action in actions.items() if action is not None }

def run_base(act):
    def run(observation):
        step = observation["step"]
        player = observation["player"]
        players = observation["players"]
        my_halite, my_shipyards, my_ships = players[player]
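        if step == 0:
            # Turn the initial ship into a shipyard.
            return { id: "CONVERT" for id in my_ships }
        if step < 10:
            # Spawn one ship per step and move all ships east, so each ends up on its own column.
            return {
                **{ id: "SPAWN" for id in my_shipyards },
                **{ id: "EAST" for id in my_ships },
            }
        if step < 350 + player:
            # The 9 ships are now free to encode measurements.
            return filter_actions(act(step, player, my_ships))
        # Deliberately error out so the agent never plays ranked games.
        raise Exception(42)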
    return run

def idle(step, player, my_ships):
    return {}

idle_steps = evaluate(run_base(idle)).steps


Now we have 4 agents with 9 ships each that have nothing to do between steps 10 and 350. We can use this interval to issue actions that export information about the evaluation. For example, we can let the ships dance in binary: a ship on an even row (y-position) encodes 0 and a ship on an odd row encodes 1. A ship on the wrong parity moves one row NORTH from an even row or SOUTH from an odd row, so each ship toggles between two adjacent rows. The bit position of a ship can be derived from its spawn step, which is part of the ship's id: `"{spawn_step}-{step_local_id}"`.
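The encoding can be checked without running an episode. A sketch that only models the row parities, assuming one player's ships have the hypothetical ids `2-1` to `10-1` (spawn steps 2 to 10, which is what `bit()` below implies for bits 0 to 8); `bit_of`, `target_parities` and `read_value` are illustrative names:

```python
def bit_of(ship_id):
    # Same rule as bit() below: the spawn step prefix minus 2.
    return int(ship_id.split("-")[0]) - 2

def target_parities(value, ship_ids):
    """Row parity (0 = even row, 1 = odd row) each ship should sit on to show `value`."""
    return {ship_id: (value >> bit_of(ship_id)) & 1 for ship_id in ship_ids}

def read_value(parities):
    """Reassemble the 9-bit value from the observed row parities."""
    return sum(parity << bit_of(ship_id) for ship_id, parity in parities.items())

# Hypothetical ids for one player's nine ships, spawn steps 2 to 10.
ship_ids = ["%d-1" % spawn_step for spawn_step in range(2, 11)]
print(read_value(target_parities(300, ship_ids)))  # 300
print(read_value(target_parities(600, ship_ids)))  # 88 == 600 % 512
print(read_value(target_parities(-5, ship_ids)))   # 507 == -5 % 512
```

Values outside [0, 511], including negative ones, come back modulo 512, which is the wraparound the decoded graphs show.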

In [None]:
def bit(id):
    return int(id.split("-")[0]) - 2

def display(position, value):
    y = position // 21
    target = y & 1
    if target == value:
        return None
    elif target == 0:
        return "NORTH"
    elif target == 1:
        return "SOUTH"

def run_binary(measure):
    def run(step, player, my_ships):
        quantized = int(measure(step, player))
        return { id: display(my_ships[id][0], (quantized >> bit(id)) & 1) for id in my_ships }
    return run_base(run)

import math
def dance(step, player):
    return step * (player + 1) + math.sin(step * math.pi / 10) * 10

dance_steps = evaluate(run_binary(dance)).steps

All 9 ships of each player can now move independently between two rows, so we can encode 9 bits per player and step. All that is left is to decode this information and to render some debug graphs.

In [None]:
def value(position):
    y = position // 21
    target = y & 1
    return target

players = 4
start_step = 10
player_labels = lambda: plt.legend(["1", "2", "3", "4"])

def decode(steps):
    counts = [[None] * players] * start_step
    for step in steps[start_step + 1:]:
        count = []
        for player in step[0]["observation"]["players"]:
            ships = player[2]
            if len(ships) == 0:
                z = None
            else:
                z = 0
                for id in ships:
                    z += value(ships[id][0]) << bit(id)
            count.append(z)
        counts.append(count)
    return counts

counts = decode(dance_steps)
figure = plt.figure(tight_layout=True, figsize=(12, 4))
for player in range(4): # encoded counts
    steps = range(50, 301, 50)
    plt.scatter(steps, [dance(step, player) for step in steps])
plt.plot(counts) # decoded counts
player_labels()
plt.xlabel("step")
plt.ylabel("value")
plt.grid()
plt.show()

print("encoded", dance(255, 0), "decoded", counts[255][0])

The diagram shows the encoded (dots) and decoded (lines) values for each player at each step. Like the Halite board, our encoding scheme wraps around: the decoded value jumps whenever the encoded value crosses a multiple of 512, the 9-bit boundary. As long as the value changes by less than 256 between consecutive steps, the true values can be recovered (up to an offset that is a multiple of 512) by reading each step-to-step change as a signed number in [-256, 255] and accumulating these changes.
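A minimal pure-Python sketch of the unwrapping that `get_total` performs, applied to made-up readings; `wrapped_delta` and `unwrap` are illustrative names, and the sketch assumes consecutive values differ by less than 256:

```python
MODULUS = 512  # 9 bits per player and step

def wrapped_delta(previous, current, modulus=MODULUS):
    """Change between two wrapped readings, read as a signed number in [-256, 255]."""
    half = modulus // 2
    return (current - previous + half) % modulus - half

def unwrap(readings, modulus=MODULUS):
    """Accumulate signed changes, starting from the first reading."""
    total = [readings[0]]
    for previous, current in zip(readings, readings[1:]):
        total.append(total[-1] + wrapped_delta(previous, current, modulus))
    return total

true_values = [500, 510, 520, 530, 300, 290]
readings = [v % MODULUS for v in true_values]
print(readings)          # [500, 510, 8, 18, 300, 290]
print(unwrap(readings))  # [500, 510, 520, 530, 300, 290]
```

A jump of 256 or more between two steps would be misread as a change of the opposite sign.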

In [None]:
def get_total(counts):
    unrolled_counts = list(counts[:start_step + 1])
    for last_count, count in zip(counts[start_step:-1], counts[start_step + 1:]):
        unrolled = []
        for total, y, z in zip(unrolled_counts[-1], last_count, count):
            if z is None or total is None:
                unrolled.append(None)
            else:
                delta = (z - y + 256) % 512 - 256
                unrolled.append(total + delta)
        unrolled_counts.append(unrolled)
    return unrolled_counts

total = get_total(counts)
figure = plt.figure(tight_layout=True, figsize=(12, 4))
for player in range(4): # encoded counts
    steps = range(50, 301, 50)
    plt.scatter(steps, [dance(step, player) for step in steps])
plt.plot(total) # decoded counts
player_labels()
plt.xlabel("step")
plt.ylabel("value")
plt.grid()
plt.show()

for i in [10, 11, 12, 255]:
    print(i, "encoded", dance(i, 3), "decoded", total[i][3])
for i in range(20):
    print(i, counts[i], total[i])

Another way to look at the data is to decode the change between steps instead of accumulating it. We use this to turn encoded timestamps into intervals. The following code does not exploit the fact that timestamps are monotonic. To exploit it, remove the `+ 256` and `- 256`, which maps each change to [0, 511] instead of [-256, 255].
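The two decodings differ once a value grows by more than 255 units in one step. A sketch with made-up numbers (300 units correspond to 30s at the 1/10s timestamp resolution used in test 1); `signed_delta` and `monotonic_delta` are illustrative names:

```python
MODULUS = 512

def signed_delta(previous, current, modulus=MODULUS):
    # Range [-256, 255]: handles values that may go up or down.
    half = modulus // 2
    return (current - previous + half) % modulus - half

def monotonic_delta(previous, current, modulus=MODULUS):
    # Range [0, 511]: valid only if the value never decreases.
    return (current - previous) % modulus

previous = 100
current = (previous + 300) % MODULUS    # the encoded timestamp advanced by 300 units
print(signed_delta(previous, current))     # -212, misread as going backwards
print(monotonic_delta(previous, current))  # 300
```

Dropping the sign doubles the largest representable forward step, at the cost of no longer detecting decreases.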

In [None]:
def get_deltas(counts):
    deltas = [[None] * players] * (start_step + 1)
    for last_count, count in zip(counts[start_step:-1], counts[start_step + 1:]):
        unrolled = []
        for y, z in zip(last_count, count):
            if z is None:
                unrolled.append(None)
            else:
                delta = (z - y + 256) % 512 - 256
                unrolled.append(delta)
        deltas.append(unrolled)
    return deltas

deltas = get_deltas(counts)
figure = plt.figure(tight_layout=True, figsize=(12, 4))
for player in range(4): # encoded counts
    steps = range(50, 301, 50)
    plt.scatter(steps, [dance(step, player) - dance(step - 1, player) for step in steps])
plt.plot(deltas) # decoded counts
player_labels()
plt.xlabel("step")
plt.ylabel("change")
plt.grid()
plt.show()

print(len(counts), len(total), len(deltas))
for i in [11, 12, 256]:
    print("encoded", int(dance(i, 3)) - int(dance(i-1, 3)), "decoded", deltas[i][3])
for i in range(20):
    print(i, counts[i], total[i], deltas[i])

# Tests Performed



1. Verify that timeouts are consistent with the local clock. The timestamps are discretized to 1/10s, so we have a resolution of 1/10s and can represent changes of ±25.6s without overflow. Every agent records the absolute time at the beginning of its step and then sleeps for .15s, or for step/50 s on every tenth step. The timeout is therefore tested in increments of .2s, and the sleep at step 300 is exactly 6s.

In [None]:
import time

def test_timeout(step, player):
    now = time.perf_counter()
    if (step % 10) == 0:
        print(player, step, step / 50)
        time.sleep(step / 50)
    else:
        time.sleep(.15)
    return now * 10
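The resolution and range quoted for test 1 follow from decoding each 9-bit change as a signed integer in [-256, 255] and multiplying it by the resolution. A small sketch of that arithmetic; `delta_range` is a hypothetical helper, not part of the notebook:

```python
def delta_range(bits, resolution_s):
    """Smallest and largest per-step change (in seconds) that survives decoding.

    Mirrors `(z - y + 256) % 512 - 256` for bits=9: changes are read as
    signed integers in [-2**(bits-1), 2**(bits-1) - 1] times the resolution.
    """
    half = 2 ** (bits - 1)
    return -half * resolution_s, (half - 1) * resolution_s

print(delta_range(9, 1 / 10))   # timestamps (test 1): roughly -25.6s to 25.5s
print(delta_range(9, 1 / 100))  # loop durations (test 2): roughly -2.56s to 2.55s
```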

2. Perform heavy work and measure the loop duration as players 2 + 3; measure start and end as players 1 + 4. The loop duration is encoded with a resolution of 1/100s, so we can represent a change in loop duration of ±2.56s per step without overflow. The work consists of a loop that draws a random number and appends it to a list. The number of iterations increases linearly at a rate of 420e3 iterations per step. At step 350, this means that the list would contain 147e6 items, taking up 5.7 GB of memory.

In [None]:
import random

def test_iterations(iterations_per_step, players):
    def test(step, player):
        now = time.perf_counter()
        if player not in players:
            print(player, step, "%.5f" % now)
            return now * 10
        else:
            z = []
            for i in range(iterations_per_step * step):
                z.append(random.random())
            then = time.perf_counter()
            interval = then - now
            print(player, step, "%.5f" % now, "%.5f" % interval)
            return interval * 100
    return test

test_performance = test_iterations(420000, [1, 2])

3. Perform heavy work and measure its duration as player 2; measure start and end as players 1 + 4 (player 3 only records its start time).

In [None]:
test_performance_once = test_iterations(420000, [1])

4. Same as test 2, but with 10x less work.

In [None]:
test_performance_light = test_iterations(42000, [1, 2])

# Hardware Specs

- Workstation:
    - 16 GB DDR4
    - i9 ten-core, 3.7 GHz
    - (12.34s, 12.35s) for the heavy loop at step 350 (without `del z`)
    - (13.69s, 13.70s) (with `del z`)
- Old Gaming Laptop:
    - 32 GB DDR3
    - i7 quad-core, 2.4 GHz
    - (17.38s, 17.39s)
    - (18.58s, 18.60s)
- Kaggle Notebook:
    - 16 GB
    - 36-42s
- Online Runner:
    - more than 70s (estimated from the graphs in the next cells)
- Codeforces (only using 42000 iterations per step):
    - (2.652s, 2.651s)
    - (2.761s, 2.792s)

In [None]:
import time
import random

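def final_loop():
    """Time the heaviest work of test 2 (step 350) in one go.

    Returns (CPU seconds from process_time, wall seconds from perf_counter).
    """
    a = time.process_time()
    at = time.perf_counter()
    z = []
    for i in range(420000 * 350):
        z.append(random.random())
    del z  # include freeing the list in the measurement
    b = time.process_time()
    bt = time.perf_counter()
    return b - a, bt - at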

#final_loop()
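`final_loop` reports two numbers: CPU time of the process (`time.process_time`) and wall-clock time (`time.perf_counter`). A minimal sketch of why the pair is informative, using a hypothetical helper `cpu_and_wall`: time spent waiting, whether sleeping or descheduled by a busy host, advances only the wall clock. Reading a large wall-minus-CPU gap on a shared runner as contention is an interpretation, not something the runs in this notebook measured.

```python
import time

def cpu_and_wall(fn):
    """Run fn once; return (CPU seconds of this process, wall-clock seconds)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    fn()
    return time.process_time() - cpu_start, time.perf_counter() - wall_start

# Sleeping advances the wall clock but uses almost no CPU time.
cpu_sleep, wall_sleep = cpu_and_wall(lambda: time.sleep(0.2))
# Busy work advances both; on an otherwise idle machine they are close.
cpu_busy, wall_busy = cpu_and_wall(lambda: sum(i * i for i in range(10**6)))
print("sleep: cpu %.3fs, wall %.3fs" % (cpu_sleep, wall_sleep))
print("busy:  cpu %.3fs, wall %.3fs" % (cpu_busy, wall_busy))
```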

# Results

In [None]:
import json

def fix_scale(data, scales):
    return [[None if x is None else x * scale for x, scale in zip(values, scales)] for values in data]

def load_result(file, scales):
    with open(file, "r") as f:
        data = json.load(f)
    steps = data["steps"]
    counts = decode(steps)
    total = fix_scale(get_total(counts), scales)
    deltas = fix_scale(get_deltas(counts), scales)
    return total, deltas

result_labels = ["workstation", "laptop", "notebook", "online"]
results_timeout = [load_result(file, [1/10, 1/10, 1/10, 1/10]) for file in [
    "/kaggle/input/halite-4-hardware-performance/local_test_timeout.json",
    "/kaggle/input/halite-4-hardware-performance/laptop_test_timeout.json",
    "/kaggle/input/halite-4-hardware-performance/notebook_test_timeout.json",
    "/kaggle/input/halite-4-hardware-performance/1577246.json",
]]
results_performance = [load_result(file, [1/10, 1/100, 1/100, 1/10]) for file in [
    "/kaggle/input/halite-4-hardware-performance/local_test_performance.json",
    "/kaggle/input/halite-4-hardware-performance/laptop_test_performance.json",
    "/kaggle/input/halite-4-hardware-performance/notebook_test_performance.json",
    "/kaggle/input/halite-4-hardware-performance/1577603.json",
]]
results_performance_once = [load_result(file, [1/10, 1/100, 1/10, 1/10]) for file in [
    "/kaggle/input/halite-4-hardware-performance/local_test_performance_once.json",
    "/kaggle/input/halite-4-hardware-performance/laptop_test_performance_once.json",
    "/kaggle/input/halite-4-hardware-performance/notebook_test_performance_once.json",
    "/kaggle/input/halite-4-hardware-performance/1578314.json",
]]
results_performance_light = [load_result(file, [1/10, 1/100, 1/100, 1/10]) for file in [
    "/kaggle/input/halite-4-hardware-performance/local_test_performance_light.json",
    "/kaggle/input/halite-4-hardware-performance/laptop_test_performance_light.json",
    "/kaggle/input/halite-4-hardware-performance/notebook_test_performance_light.json",
    "/kaggle/input/halite-4-hardware-performance/1581528.json",
]]

# 1. Timeout

In [None]:
for title, (total, deltas) in zip(result_labels, results_timeout):
    plt.figure(tight_layout=True, figsize=(12, 4))
    plt.title(title)
    plt.grid()
    plt.plot(deltas)
    player_labels()
    plt.axhline(23.2, color="black")
plt.show()


The graphs all show the expected behaviour. Every agent sleeps for .15s per step, so each agent measures an interval of `4*.15s` = .6s between its steps, and on every 10th step each agent sleeps longer. All agents error out on step 300, where every agent tries to sleep for the maximum timeout of 6s. On step 290, all agents sleep for 5.8s, which accumulates to `4*5.8s` = 23.2s, the peak value of player 1. This means that all agents are evaluated in sequence. Players 2, 3 and 4 show lower peak values because, unlike player 1, they see the long waits split over 2 timesteps: the waits of the players evaluated before them on step 290 fall into their previous interval.
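The split peaks can be reproduced with a toy model. A sketch, assuming the four agents run strictly one after another and that nothing but their sleeps takes time; `sleep_seconds` and `expected_intervals` are hypothetical helpers mirroring `test_timeout`. `expected_intervals(n)[s][p]` is the time between player `p + 1` entering step `s` and entering step `s + 1`:

```python
def sleep_seconds(step):
    # Test 1 schedule: step/50 s on every tenth step, .15s otherwise.
    return step / 50 if step % 10 == 0 else 0.15

def expected_intervals(num_steps, players=4):
    """Per-player time between consecutive step starts under strictly sequential evaluation."""
    clock = 0.0
    starts = []
    for step in range(num_steps):
        row = []
        for _ in range(players):
            row.append(clock)  # this player enters its step now
            clock += sleep_seconds(step)
        starts.append(row)
    return [[nxt - cur for cur, nxt in zip(starts[s], starts[s + 1])]
            for s in range(num_steps - 1)]

intervals = expected_intervals(300)
print([round(x, 2) for x in intervals[289]])  # [0.6, 6.25, 11.9, 17.55]
print([round(x, 2) for x in intervals[290]])  # [23.2, 17.55, 11.9, 6.25]
```

In this model, player 1 sees the full 23.2s in one interval, while players 2 to 4 see the long waits split across two intervals, consistent with the description above.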

# 2. Performance

In [None]:
def pick(data, players):
    return [[x if player in players else None for player, x in enumerate(values)] for values in data]

def inverse_pick(data, players):
    return [[x if player not in players else None for player, x in enumerate(values)] for values in data]

def show_performance(results, players):
    for title, (total, deltas) in zip(result_labels, results):
        plt.figure(tight_layout=True, figsize=(24, 4))
        plt.subplot(121)
        plt.title(title)
        plt.grid()
        plt.plot([None if values[0] is None or values[3] is None else values[3] - values[0] for values in total])
        plt.legend(["start(4)-start(1)"])
        plt.subplot(122)
        plt.title(title)
        plt.grid()
        plt.plot(pick(total, players))
        player_labels()
    plt.show()

show_performance(results_performance, [1, 2])

The left diagram shows the time between the first player entering its `run` and the last player entering its `run`. The right diagram shows the measured duration of the work loops. The workstation and laptop runs show a smooth, linearly increasing time. The notebook and especially the online runner, however, perform very poorly, with big spikes.

# 3. Performance Single Agent

In [None]:
show_performance(results_performance_once, [1])

# 4. Performance Light

In [None]:
show_performance(results_performance_light, [1, 2])

# Performance Histogram

The following diagrams show the iterations per second for each device. Black lines mark the average and red lines the minimum iterations per second. The minimum does not include the step on which the agent failed, since that step's measurement never reaches the replay.

In [None]:
def show_iterations_per_second(results, players, ops):
    results_speeds = []
    for total, delta in results:
        speeds = []
        for step, values in enumerate(total):
            for player in players:
                interval = values[player]
                if interval is not None:
                    speeds.append(ops * step / interval)
        results_speeds.append(speeds)
    plt.figure(tight_layout=True, figsize=(12, 4))
    plt.xlim(0, 1.5e7)
    for speeds in results_speeds:
        plt.hist(speeds, histtype="step", bins=10, weights=[1/len(speeds)]*len(speeds))
    plt.legend(result_labels)
    plt.xlabel("iterations/s")
    for speeds in results_speeds:
        plt.axvline(sum(speeds) / len(speeds), color="black")
        plt.axvline(min(speeds), color="red")
    plt.grid()
    plt.show()
    print(result_labels)
    print([sum(x)/len(x) for x in results_speeds])
    print([min(x) for x in results_speeds])
    return results_speeds

ips_performance = show_iterations_per_second(results_performance, [1, 2], 420e3)

In [None]:
ips_performance_light = show_iterations_per_second(results_performance_light, [1, 2], 42e3)

# Evaluation in Notebook

In [None]:
run_notebook = False

In [None]:
if run_notebook:
    result = evaluate(run_binary(test_timeout)).render(mode="json")
    with open("/kaggle/working/notebook_test_timeout.json", "w") as f:
        f.write(result)

In [None]:
if run_notebook:
    result = evaluate(run_binary(test_performance)).render(mode="json")
    with open("/kaggle/working/notebook_test_performance.json", "w") as f:
        f.write(result)

In [None]:
if run_notebook:
    result = evaluate(run_binary(test_performance_once)).render(mode="json")
    with open("/kaggle/working/notebook_test_performance_once.json", "w") as f:
        f.write(result)

In [None]:
if run_notebook:
    result = evaluate(run_binary(test_performance_light)).render(mode="json")
    with open("/kaggle/working/notebook_test_performance_light.json", "w") as f:
        f.write(result)