## 2 CHEATER PERFORMANCE ANALYSIS

To establish a baseline for cheating detection, we compare the performance of cheaters and non-cheaters who played the game between March 1 and March 3. We assume that the 651 cheaters who were banned during this period were cheating throughout it, and then compare the two groups on two performance measures: the average kill ratio and the average time difference between kills.

The relevant figures are in 'paper-general-stats-visualization.ipynb'.

In [1]:
import pandas as pd
import scipy.stats
import analyze_cheaters

In [2]:
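# Kill records used for the cheater analysis.
td = spark.read.parquet("s3://social-research-cheating/cheater-analysis/data_for_cheater_analysis.parquet")
td.createOrReplaceTempView("td")  # replaces the deprecated registerTempTable

# One row per player, including ban information.
players = spark.read.parquet("s3://social-research-cheating/players.parquet")
players.createOrReplaceTempView("players")

# Players banned on or before March 3 are treated as cheaters.
cheaters = spark.sql("SELECT * FROM players WHERE ban_date <= '2019-03-03'")
cheaters.createOrReplaceTempView("cheaters")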

### 2.1 THE AVERAGE KILL RATIO OF CHEATERS

For each player, we first compute the kill ratio for each day and then average these daily values to obtain the player's overall average kill ratio.
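
As a rough illustration of this two-step averaging, the sketch below works in plain Python on a handful of made-up records, so it runs without a Spark session. It is not the implementation of `analyze_cheaters.get_avg_kill_ratio`, whose internals are not shown here. In particular, defining the kill ratio as kills / (kills + deaths) is an assumption, and the record layout and the name `avg_kill_ratio_per_player` are hypothetical.

```python
from collections import defaultdict

def avg_kill_ratio_per_player(kills, deaths):
    """kills and deaths are iterables of (player_id, date) pairs.

    Assumption: daily kill ratio = kills / (kills + deaths).
    """
    # Count kills and deaths per (player, day).
    k = defaultdict(int)
    d = defaultdict(int)
    for player, day in kills:
        k[(player, day)] += 1
    for player, day in deaths:
        d[(player, day)] += 1

    # Step 1: one ratio per player per day.
    daily = defaultdict(list)
    for key in set(k) | set(d):
        player, _ = key
        daily[player].append(k[key] / (k[key] + d[key]))

    # Step 2: average the daily ratios for each player.
    return {p: sum(r) / len(r) for p, r in daily.items()}

# Made-up records: player "a" is active on two days, player "b" on one.
kills = [("a", "03-01"), ("a", "03-01"), ("a", "03-01"), ("a", "03-02"), ("b", "03-01")]
deaths = [("a", "03-01"), ("a", "03-02"), ("b", "03-01"), ("b", "03-01"), ("b", "03-01")]
ratios = avg_kill_ratio_per_player(kills, deaths)
```

Under this assumption, player "a" has daily ratios of 0.75 and 0.5, which average to 0.625. That differs from the pooled ratio 4/6, so averaging per day first weights each day equally regardless of how many kills occurred on it.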

In [3]:
# Records in which a cheater is the killer (src).
kills = spark.sql("SELECT mid, src, time, m_date FROM td t JOIN cheaters c ON t.src = c.id")
kills.createOrReplaceTempView("kills")

# Records in which a cheater is the victim (dst).
deaths = spark.sql("SELECT mid, dst, time, m_date FROM td t JOIN cheaters c ON t.dst = c.id")
deaths.createOrReplaceTempView("deaths")

cheater_kill_ratio = analyze_cheaters.get_avg_kill_ratio(kills, deaths)

print(cheater_kill_ratio['avg_kill_ratio'].mean())
print(cheater_kill_ratio['avg_kill_ratio'].median())
print(len(cheater_kill_ratio['avg_kill_ratio']))
print(cheater_kill_ratio.head(10))

0.7661
0.8235
651


### 2.2 THE AVERAGE TIME DIFFERENCE BETWEEN KILLS OF CHEATERS

Only players with at least two kills can be evaluated by this measure, because a time difference requires two consecutive kills.
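
The measure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code behind `analyze_cheaters.get_avg_time_diff_between_kills`: it assumes that gaps are taken between consecutive kills within the same match, that `time` is a number of seconds, and that all of a player's gaps are averaged together. The function name and record layout are hypothetical.

```python
from collections import defaultdict

def avg_time_between_kills(kills):
    """kills is an iterable of (match_id, player_id, time_in_seconds).

    Assumption: gaps are measured between consecutive kills in the same
    match, then averaged over all of a player's gaps.
    """
    # Collect kill times per (player, match).
    times = defaultdict(list)
    for match, player, t in kills:
        times[(player, match)].append(t)

    # Differences between consecutive kills, pooled per player.
    gaps = defaultdict(list)
    for (player, _), ts in times.items():
        ts.sort()
        gaps[player].extend(b - a for a, b in zip(ts, ts[1:]))

    # Players with no gap (fewer than two kills in every match) are dropped.
    return {p: sum(g) / len(g) for p, g in gaps.items() if g}

# Made-up records: "a" has kills in two matches, "b" has a single kill.
kills = [
    ("m1", "a", 10.0), ("m1", "a", 70.0), ("m1", "a", 100.0),
    ("m2", "a", 30.0), ("m2", "a", 150.0),
    ("m1", "b", 40.0),
]
deltas = avg_time_between_kills(kills)
```

Player "b" has a single kill and therefore no entry in `deltas`, which mirrors why fewer players are evaluated by this measure than by the kill ratio.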

In [9]:
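# `cheater_kills` is not defined in the cells shown, so rebuild it with the query from 2.1.
cheater_kills = spark.sql("SELECT mid, src, time, m_date FROM td t JOIN cheaters c ON t.src = c.id")
cheater_kill_interval = analyze_cheaters.get_avg_time_diff_between_kills(cheater_kills)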

print(cheater_kill_interval['delta'].mean())
print(cheater_kill_interval['delta'].median())
print(len(cheater_kill_interval['delta']))
print(cheater_kill_interval.head(10))

139.6698
123.9302
629


### 2.3 THE AVERAGE KILL RATIO OF NON-CHEATERS

In [6]:
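# Records in which a player not flagged as a cheater is the killer (src).
kills = spark.sql("""SELECT mid, src, time, m_date FROM td t JOIN players p ON t.src = p.id
                     WHERE cheating_flag = 0""")
kills.createOrReplaceTempView("kills")

# Records in which a player not flagged as a cheater is the victim (dst).
deaths = spark.sql("""SELECT mid, dst, time, m_date FROM td t JOIN players p ON t.dst = p.id
                      WHERE cheating_flag = 0""")
deaths.createOrReplaceTempView("deaths")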

non_cheater_kill_ratio = analyze_cheaters.get_avg_kill_ratio(kills, deaths)

print(non_cheater_kill_ratio['avg_kill_ratio'].mean())
print(non_cheater_kill_ratio['avg_kill_ratio'].median())
print(len(non_cheater_kill_ratio['avg_kill_ratio']))
print(non_cheater_kill_ratio.head(10))

0.4045
0.4437
854153


### 2.4 THE AVERAGE TIME DIFFERENCE BETWEEN KILLS OF NON-CHEATERS

In [11]:
non_cheater_kill_interval = analyze_cheaters.get_avg_time_diff_between_kills(kills)

print(non_cheater_kill_interval['delta'].mean())
print(non_cheater_kill_interval['delta'].median())
print(len(non_cheater_kill_interval['delta']))
print(non_cheater_kill_interval.head(10))

194.109
172.6348
623678


### 2.5 COMPARING TWO GROUPS

In [13]:
print(scipy.stats.ttest_ind(cheater_kill_ratio['avg_kill_ratio'], non_cheater_kill_ratio['avg_kill_ratio'], 
                            equal_var=False))

print(scipy.stats.ttest_ind(cheater_kill_interval['delta'], non_cheater_kill_interval['delta'], equal_var=False))

Ttest_indResult(statistic=48.64290196560924, pvalue=5.2129993985436896e-219)
Ttest_indResult(statistic=-18.235341545750604, pvalue=5.033833786064087e-60)
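
Passing `equal_var=False` to `scipy.stats.ttest_ind` requests Welch's t-test, which does not assume equal variances in the two groups. The sketch below spells out the statistic and the Welch-Satterthwaite degrees of freedom on made-up numbers, using only the standard library. `welch_t` is a hypothetical helper, not part of SciPy, and it omits the p-value.

```python
import math
import statistics

def welch_t(x, y):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(x), len(y)
    # Squared standard errors of the two means (sample variance, n - 1 denominator).
    se1 = statistics.variance(x) / n1
    se2 = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Made-up samples of unequal size, for illustration only.
x = [0.9, 0.8, 0.7, 0.8]
y = [0.4, 0.5, 0.3, 0.4, 0.4]
t, df = welch_t(x, y)
```

A positive statistic means the first sample has the larger mean. This matches the signs in the output above: cheaters have a higher average kill ratio and a shorter average time between kills than non-cheaters.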
