This notebook attempts to identify random bots on the leaderboard by applying a block frequency test to the game outcomes.

Please check and comment! Any other methods to try?

The tentative conclusion is that there may be one or two random bots in the top 100, but otherwise the top-100 bots look legitimate.
Just below position 100, however, a lot of randomness seems to be going on.

Note that an algorithmic bot can also appear random when it faces a random bot or a strong bot. Also, only one particular test is performed here.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

# Load and clean data

In [None]:
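D = pd.read_csv("/kaggle/input/rps-competition-games-info/rps.csv")

# Drop self-play games; copy so the column assignments below act on an independent frame
D = D.query("Player1 != Player2").copy()
D["round"] = D["Rounds"].str.extract("round: ([0-9]+)")
D.dropna(subset=["Movement1", "Movement2"], inplace=True)  # some moves are NA for unknown reasons
# Outcome from Player1's perspective: (Movement1 - Movement2) mod 3 -> 0 tie, 1 win, 2 loss
# (assuming the usual encoding 0 = rock, 1 = paper, 2 = scissors)
D["outcome"] = D["Movement1"].sub(D["Movement2"]).mod(3).map({0: 0, 1: 1, 2: -1})
D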

In [None]:
# for simplicity we take the highest score for each player to determine the ranking
player_scores = pd.concat([D.groupby("Player1")["P1UpdateScore"].max(), D.groupby("Player2")["P2UpdateScore"].max()], axis=1).max(axis=1)
player_scores

# Test for randomness

In [None]:
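from scipy.stats import chi2

def block_freq(seq, block_size=30, round_var=2/3):
    """Block frequency test on a sequence of round outcomes (-1, 0, 1).

    round_var is the variance of a single outcome for a uniformly random player.
    """
    seq = list(seq)
    # Non-overlapping blocks; trailing rounds that do not fill a block are dropped
    blocks = [seq[i*block_size:(i+1)*block_size] for i in range(len(seq)//block_size)]
    block_sums = list(map(np.sum, blocks))
    n = len(blocks)
    # Unbiased sample variance (ddof=1), so that var*(n-1)/sigma^2 is approximately chi-squared with n-1 df
    block_sum_var = np.var(block_sums, ddof=1)
    pval = chi2.cdf(block_sum_var*(n-1)/(round_var*block_size), df=n-1)
    if pval > 0.5:
        pval = 1 - pval  # fold to the smaller tail: result lies in [0, 0.5], half a two-sided p-value
    return pval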
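
To see what the test statistic reacts to, here is a dependency-free sketch that computes only the dispersion ratio (sample variance of the block sums divided by its expected value `round_var * block_size`), not the p-value. The helper `dispersion_ratio` and the three example sequences are illustrative assumptions, not part of the analysis. For a random player the ratio should be near 1; streaky play pushes it far above 1, and overly regular play pushes it towards 0. Both extremes end up with a small folded p-value.

```python
import random
from statistics import variance

def dispersion_ratio(seq, block_size=30, round_var=2/3):
    """Sample variance of the block sums relative to its expectation for a random player."""
    n_blocks = len(seq) // block_size
    block_sums = [sum(seq[i*block_size:(i+1)*block_size]) for i in range(n_blocks)]
    return variance(block_sums) / (round_var * block_size)

rng = random.Random(0)
random_seq = [rng.choice([-1, 0, 1]) for _ in range(990)]  # uniformly random outcomes
streaky_seq = ([1] * 30 + [-1] * 30) * 16                  # alternating winning and losing blocks
regular_seq = [1, 0, -1] * 330                             # every block sums to exactly 0

print(f"random:  {dispersion_ratio(random_seq):.2f}")
print(f"streaky: {dispersion_ratio(streaky_seq):.2f}")
print(f"regular: {dispersion_ratio(regular_seq):.2f}")
```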

# Validate test on real random numbers

In [None]:
from cytoolz import partition

# Test on genuinely random sequences; the p-values should be roughly uniform
# (on [0, 0.5], because block_freq folds them to the smaller tail)
pvals = [block_freq(np.random.randint(-1, 2, size=1000)) for _ in range(10000)]

num_games=30
mean_min_pval = np.mean([np.min(random_player_pvals) for random_player_pvals in partition(num_games, pvals)])
print(f"Average minimum p-value for a random player with {num_games} games: {mean_min_pval:.3g}")

plt.hist(pvals, bins="doane");
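
As a rough cross-check of the simulated average, assume the folded p-values of a random player are independent and uniform on [0, 0.5]. This is an idealisation, since the chi-squared approximation is not exact. The minimum of k independent uniform values on [0, a] has expectation a/(k+1), so for k = 30 games the baseline would be about 0.5/31 ≈ 0.016. The sketch below compares this with a simulation using only the standard library; `expected_min_uniform` is a hypothetical helper.

```python
import random

def expected_min_uniform(k, upper=0.5):
    """Expected minimum of k independent uniform draws on [0, upper]."""
    return upper / (k + 1)

num_games = 30
trials = 20000
rng = random.Random(0)

# Average, over many simulated players, of the smallest of num_games uniform p-values
simulated = sum(
    min(rng.uniform(0, 0.5) for _ in range(num_games)) for _ in range(trials)
) / trials

print(f"analytic:  {expected_min_uniform(num_games):.4f}")
print(f"simulated: {simulated:.4f}")
```

A player whose minimum p-value sits well above this baseline never produced a clearly non-random game in the sample.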



# Calculate minimum p-values on all game outcomes for each player

In [None]:
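# Calculate the minimum block frequency p-value over all games of each player.
# "outcome" is from Player1's perspective; for Player2 only the sign flips,
# which leaves the variance of the block sums (and hence the p-value) unchanged.
player_min_pval = pd.concat(
    [
        D.groupby([col, "Game"])["outcome"].apply(block_freq).groupby(col).min()
        for col in ["Player1", "Player2"]
    ],
    axis=1,
).min(axis=1)
player_min_pval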

# Plot randomness score

In [None]:
import warnings
warnings.filterwarnings("ignore")

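# Order players by ascending score (so the leader ends up at the top of the barh plot)
# and prefix each name with its leaderboard rank
ranked = player_min_pval.loc[player_scores.sort_values().index]
ranks = reversed(range(1, len(ranked) + 1))
player_randomness = ranked.rename(index={name: f"{i}. {name}" for i, name in zip(ranks, ranked.index)})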
player_randomness.plot.barh(figsize=(5, 0.2 * len(player_min_pval)))
plt.xlim(0, 0.2)
plt.axvline(mean_min_pval);

# If a player has won at least one game convincingly, its minimum p-value should be very small.
# Players with high minimum p-values may be random.
# Above some threshold, though, the exact size of the p-value carries no further meaning.
# But remember that algorithmic bots may seem random, too, when they face a tough opponent.

In [None]:
# Same data as a vertical bar chart, for an overview
player_randomness.iloc[::-1].plot.bar(figsize=(30,5));