[@yalikesifulei](https://www.kaggle.com/yalikesifulei) made a nice Notebook ["Bot statistics with Selenium & Beautiful Soup"](https://www.kaggle.com/yalikesifulei/bot-statistics-with-selenium-beautiful-soup) which shows various plots about Kaggle Simulation submissions. 

He said "Meta Kaggle dataset is not used because of extremely slow data loading." 

Kaggle datasets are not actually attached until you start using them, and because of Kaggle's caching system it can take minutes to attach the 16GB compressed dataset. Loading the 7GB episode file into pandas then takes a couple more minutes.

However, in my opinion this is neater than using a web scraper, and once the data is loaded the charts for a submission are produced in milliseconds. So, in case you want to use it, here it is.

Note that if you fork this notebook you will need to re-attach the Meta Kaggle dataset, as it changes daily. Click **+ Add Data** in the top right, add the Meta Kaggle dataset, and you are good to go.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
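# 'seaborn-whitegrid' was renamed 'seaborn-v0_8-whitegrid' in Matplotlib 3.6; use whichever is available
style = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'
plt.style.use(style)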
META = "../input/meta-kaggle/"

## Load Meta Kaggle datasets
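
The cells below read the large CSV files in chunks. As a minimal sketch of that pattern, the example below filters each chunk before concatenating, so only the rows of interest are kept in memory. It runs on a tiny made-up CSV; `load_filtered`, `csv_text` and `toy_episodes` are hypothetical names used only for illustration and are not part of the original notebook.

```python
import io

import pandas as pd

# Toy stand-in for Episodes.csv; the values are made up for illustration.
csv_text = "Id,CompetitionId\n1,100\n2,30067\n3,30067\n4,200\n5,30067\n"


def load_filtered(source, competition_id, chunksize=2):
    """Read a CSV in chunks and keep only the rows of one competition."""
    kept = []
    for chunk in pd.read_csv(source, usecols=["Id", "CompetitionId"], chunksize=chunksize):
        # Filter each chunk right away so unwanted rows are never accumulated.
        kept.append(chunk[chunk["CompetitionId"] == competition_id])
    return pd.concat(kept, ignore_index=True)


toy_episodes = load_filtered(io.StringIO(csv_text), 30067)
print(toy_episodes["Id"].tolist())  # [2, 3, 5]
```

Collecting the filtered chunks in a list and calling `pd.concat` once avoids re-copying the growing frame on every iteration.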

In [None]:
%%time
# most of this time is taken by kaggle attaching the dataset
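chunks = []
for i, chunk in enumerate(pd.read_csv(META + "Episodes.csv", usecols=['Id','CompetitionId'], chunksize=10_000)):
    # hard-coded offset: chunks 0..260 are read but not kept
    if i>260: chunks.append(chunk)
# concatenate once instead of growing the frame inside the loop
episodes_df = pd.concat(chunks, ignore_index=True)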
episodes_df["CompetitionId"] = episodes_df["CompetitionId"].astype("category")
episodes_df = episodes_df[episodes_df.CompetitionId == 30067]

episodes_df.info(verbose=False, memory_usage="deep")

In [None]:
%%time
chunks = []
for i, chunk in enumerate(pd.read_csv(META + "EpisodeAgents.csv", usecols=['EpisodeId','Reward','SubmissionId','InitialScore','UpdatedScore'], chunksize=1_000_000)):
    # hard-coded offset: chunks 0..66 are read but not kept
    if i>66: chunks.append(chunk)
epagents_df = pd.concat(chunks, ignore_index=True)

# keep only the agents of the episodes selected above
epagents_df = epagents_df[epagents_df.EpisodeId.isin(episodes_df.Id)]
epagents_df = epagents_df.fillna(0)
epagents_df = epagents_df.sort_values(by=['EpisodeId'], ascending=True)

epagents_df.info(verbose=False, memory_usage="deep")

In [None]:
def getStats(SUB_ID):
    # all rows for this submission; epagents_df is already sorted by EpisodeId
    sub_rows = epagents_df[epagents_df['SubmissionId']==SUB_ID]

    scores = sub_rows.UpdatedScore.tolist()
    scores_delta = np.diff(scores)

    # the submission's first two episodes are skipped when comparing rewards
    eps = sorted(sub_rows['EpisodeId'].values)[2:]
    in_eps = epagents_df['EpisodeId'].isin(eps)
    us = epagents_df[in_eps & (epagents_df['SubmissionId']==SUB_ID)].sort_values('EpisodeId').Reward.values
    them = epagents_df[in_eps & (epagents_df['SubmissionId']!=SUB_ID)].sort_values('EpisodeId').Reward.values

    # 1 = win, 0 = loss, 0.5 = tie (tied rewards are printed for inspection)
    outcomes = []
    for u,t in zip(us,them):
        if u>t: outcomes.append(1)
        elif u<t: outcomes.append(0)
        elif u==t:
            outcomes.append(0.5)
            print(u,t)

    return np.array(scores), np.array(outcomes), np.array(scores_delta)


The Meta Kaggle data is now loaded. From this point on, the charts take only milliseconds to run.
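
For reference, `getStats` encodes each match as 1 (win), 0 (loss) or 0.5 (tie) by comparing the two rewards. The same encoding can be sketched in vectorized form on made-up reward values; `encode_outcomes` and `toy` are hypothetical names for illustration only, and the sketch assumes the two reward arrays are already aligned match by match.

```python
import numpy as np


def encode_outcomes(us, them):
    """Map paired rewards to 1 (win), 0 (loss) or 0.5 (tie)."""
    us = np.asarray(us, dtype=float)
    them = np.asarray(them, dtype=float)
    # np.sign yields +1, 0 or -1; shifting and halving gives 1, 0.5 or 0.
    return (np.sign(us - them) + 1) / 2


toy = encode_outcomes([10, 3, 5], [4, 7, 5])
print(toy.tolist())  # [1.0, 0.0, 0.5]
```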

## Setting up

The rest of the notebook is 100% the same as @yalikesifulei's notebook plots.

`SUB_ID` is the number at the end of a link of the form https://www.kaggle.com/c/lux-ai-2021/leaderboard?dialog=episodes-submission-23032370. It can also be seen on the submission's page:

![SUB_ID](https://i.imgur.com/vniyMkL.png)
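
If you have such a link at hand, the trailing number can also be extracted programmatically. This is only a minimal sketch: `parse_sub_id` is a hypothetical helper that is not part of the original notebook, and it assumes the link ends in `episodes-submission-<number>`.

```python
import re


def parse_sub_id(url):
    """Return the trailing submission number of an episodes-dialog link."""
    match = re.search(r"episodes-submission-(\d+)$", url)
    if match is None:
        raise ValueError(f"no submission id found in {url!r}")
    return int(match.group(1))


link = "https://www.kaggle.com/c/lux-ai-2021/leaderboard?dialog=episodes-submission-23032370"
print(parse_sub_id(link))  # 23032370
```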

In [None]:
SUB_ID = 23032370
scores, outcomes, scores_delta = getStats(SUB_ID)

## Score growth plot

In [None]:
plt.figure(figsize=(15, 8))
plt.plot(scores, label='scores')
plt.hlines(np.mean(scores), 0, len(scores), color='tab:orange', label=f'mean score {np.mean(scores):.2f}')
plt.hlines(np.median(scores), 0, len(scores), color='tab:olive', label=f'median score {np.median(scores):.0f}')

plt.scatter(np.argmax(scores), np.max(scores), color='tab:green', label=f'top score {np.max(scores)}')
plt.legend()
plt.show()

## Score changes (delta) plot

In [None]:
plt.figure(figsize=(15, 8))
plt.plot(scores_delta)

plt.scatter(np.argwhere(scores_delta > 0), scores_delta[scores_delta > 0], c='tab:green', label='Positive')
plt.scatter(np.argwhere(scores_delta < 0), scores_delta[scores_delta < 0], c='tab:red', label='Negative')

plt.hlines(0, 0, len(scores_delta), color='black', linestyles='--')
plt.title('score delta')
plt.legend()
plt.show()

## Win/Loss/Tie plot by match

In [None]:
plt.figure(figsize=(15, 8))
plt.plot(outcomes, c='lightgray', linestyle='--')

plt.scatter(np.argwhere(outcomes == 1), outcomes[outcomes == 1], c='tab:green', label='Win')
plt.scatter(np.argwhere(outcomes == 0), outcomes[outcomes == 0], c='tab:red', label='Loss')
plt.scatter(np.argwhere(outcomes == 0.5), outcomes[outcomes == 0.5], c='tab:blue', label='Tie')

plt.hlines(np.mean(outcomes), 0, len(outcomes), color='tab:orange', label='win rate')
plt.legend()
plt.title(f'win rate = {np.mean(outcomes):.3f}')
plt.show()

## Win rate change by match

In [None]:
plt.figure(figsize=(15, 8))

plt.plot(range(1, len(outcomes)+1), [sum(outcomes[:n])/n for n in range(1, len(outcomes)+1)], label='win rate')
plt.hlines(np.mean(outcomes), 1, len(outcomes), color='tab:orange', label='current win rate')
plt.title('win rate change')
plt.legend()
plt.show()
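
The running win rate is recomputed with a fresh `sum` for every match, which is fine at this scale. For longer histories, a cumulative sum gives the same values in one pass. A minimal sketch on made-up outcomes (`toy_outcomes` and `running` are illustrative names, not part of the original notebook):

```python
import numpy as np

# Made-up outcomes: win, loss, tie, win.
toy_outcomes = np.array([1, 0, 0.5, 1])

# Running mean: cumulative sum divided by the number of matches so far.
running = np.cumsum(toy_outcomes) / np.arange(1, len(toy_outcomes) + 1)
print(running.tolist())  # [1.0, 0.5, 0.5, 0.625]
```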