05-observation-based-mechanism.ipynb
======================

**Things to do**
* Test code.
* Find the number of unique cheaters who harmed other players severely.
* Make code modular.

**Sample match ID for testing**
* 000213be-6b3b-438a-8d20-c1b57b01a174 (no cheater)
* 07a471f7-4776-460d-b896-1306b98b6d19 (one cheater)
* 15e457b1-0940-47ca-a730-de0dfd1ccd77 (two cheaters)

## Load packages and read tables.

In [1]:
from pyspark.sql.functions import col, lit, when
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from pyspark.sql.types import StructType, StructField, LongType
import pubg_analysis as pubg

In [2]:
# Read a table that contains edges.
td = spark.read.parquet("s3://social-research-cheating/obs_data.parquet")
td.createOrReplaceTempView("td")

# Read a table that contains player data (queried in SQL under the view name "nodes").
players = spark.read.parquet("s3://social-research-cheating/nodes.parquet")
players.createOrReplaceTempView("nodes")

In [3]:
# Show the first few rows of each dataset.
td.show(5)
players.show(5)

+--------------------+--------------------+------+--------+--------------------+------+--------+--------------------+----------+
|                 mid|                 src|src_bd|src_flag|                 dst|dst_bd|dst_flag|                time|    m_date|
+--------------------+--------------------+------+--------+--------------------+------+--------+--------------------+----------+
|000ec388-a3d7-422...|account.c84796ba7...|    NA|       0|account.5028c61d0...|    NA|       0|2019-03-03 16:14:...|2019-03-03|
|000ec388-a3d7-422...|account.cba82f51c...|    NA|       0|account.b347ea78f...|    NA|       0|2019-03-03 16:33:...|2019-03-03|
|000ec388-a3d7-422...|account.8f011f0fd...|    NA|       0|account.e2f45b350...|    NA|       0|2019-03-03 16:16:...|2019-03-03|
|000ec388-a3d7-422...|account.73324618c...|    NA|       0|account.7911e1c3d...|    NA|       0|2019-03-03 16:12:...|2019-03-03|
|000ec388-a3d7-422...|account.ba7ff30f8...|    NA|       0|account.4e93215ae...|    NA|       0|2

## 1. Count the number of motifs on the empirical network.

In [3]:
# First, assume that victims are severely harmed if they were killed after getting into the top 30 percent.
res_tab = pubg.add_level_of_harm(td, 30)
res_tab.createOrReplaceTempView("new_td")
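`pubg.add_level_of_harm` is not shown in this notebook, so the following is only a minimal sketch of one plausible reading of the 30 percent rule. It assumes that a victim's finishing rank is the number of players still alive when they died, and that "top 30 percent" means a rank of at most 30 percent of the starting players, rounded up. `severe_harm_flags` is a hypothetical helper, not part of `pubg_analysis`, and it ignores deaths that are not kills.

```python
import math


def severe_harm_flags(kill_order, n_players, top_pct=30):
    """Flag each victim as severely harmed (1) or not (0).

    kill_order: victim IDs in the order they died, earliest first.
    n_players:  number of players who started the match.
    top_pct:    a victim counts as severely harmed if their finishing rank
                falls within the top `top_pct` percent (assumed definition).
    """
    # Ranks 1..cutoff form the top `top_pct` percent; rounding up is an assumption.
    cutoff = math.ceil(n_players * top_pct / 100)
    flags = {}
    for i, victim in enumerate(kill_order):
        # The first victim finishes last (rank n_players), the next n_players - 1, and so on.
        rank = n_players - i
        flags[victim] = 1 if rank <= cutoff else 0
    return flags


# Ten players, nine victims, one survivor: only the last two victims
# (ranks 3 and 2) fall inside the top 30 percent.
flags = severe_harm_flags(list("abcdefghi"), n_players=10, top_pct=30)
print(flags)
```

Changing `top_pct` reproduces the idea of re-running the cell above with a different threshold.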

In [4]:
records = pubg.add_more_info(res_tab, players)
records.createOrReplaceTempView("records")

In [5]:
# Get a summary table of the empirical network.
obs_sum_tab = pubg.get_obs_summary_tab(records)
# obs_sum_tab.show()

In [6]:
# Store the summary table in the S3 bucket for later use.
obs_sum_tab.write.parquet("s3://social-research-cheating/summary-tables/emp-net/sev_obs/obs_tab_30.parquet")

The plot below shows the distribution of the number of times each player observed cheating before their transition happened. In this case, we keep duplicate cheater-observer pairs, because some players observed the same cheater more than once.
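A minimal pure-Python sketch of the counting rule described above, assuming each observation is an `(observer, cheater, time)` tuple and that each observer's transition time is known. `count_observations_before_transition` is a hypothetical helper; passing `unique_pairs=True` shows the alternative in which repeated sightings of the same cheater are collapsed into one.

```python
from collections import Counter


def count_observations_before_transition(observations, transition_time, unique_pairs=False):
    """Count how often each observer saw cheating before their own transition.

    observations:    iterable of (observer, cheater, time) tuples.
    transition_time: dict mapping observer -> time of their transition;
                     observers without an entry are ignored.
    unique_pairs:    if False (the setting described above), repeated
                     sightings of the same cheater each add one to the count.
    """
    counts = Counter()
    seen = set()
    for observer, cheater, t in observations:
        start = transition_time.get(observer)
        if start is None or t >= start:
            continue  # not an observer of interest, or seen after the transition
        if unique_pairs:
            if (observer, cheater) in seen:
                continue
            seen.add((observer, cheater))
        counts[observer] += 1
    return counts


obs = [("u1", "c1", 1), ("u1", "c1", 2), ("u1", "c2", 5), ("u2", "c1", 3)]
transitions = {"u1": 4, "u2": 2}
print(count_observations_before_transition(obs, transitions))
print(count_observations_before_transition(obs, transitions, unique_pairs=True))
```

Here `u1` saw `c1` twice before transitioning, so the duplicate-keeping count is 2 while the unique-pair count is 1.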

## 2. Reuse the mapping table in the S3 bucket to create randomised networks.

In [3]:
# td = spark.read.parquet("s3://social-research-cheating/obs_data.parquet")
# td.createOrReplaceTempView("td")

# Read the mapping table.
map_tab = spark.read.parquet("s3://social-research-cheating/mapping-tables/map_tab_3.parquet")
map_tab.createOrReplaceTempView("map_tab")
map_tab.show(5)

+--------------------+--------------------+---------+--------+--------------------+---------+--------+
|            match_id|            original|orig_flag|orig_tid|          randomised|rand_flag|rand_tid|
+--------------------+--------------------+---------+--------+--------------------+---------+--------+
|07c3165b-19ca-412...|account.993a6791a...|        0|       9|account.92eb6c857...|        0|       9|
|07c3165b-19ca-412...|account.fd4c79da8...|        0|      26|account.cf37ca928...|        0|      26|
|07c3165b-19ca-412...|account.a52b830b5...|        0|      10|account.1b4381bd7...|        0|      10|
|07c3165b-19ca-412...|account.b33f85180...|        0|      18|account.f04d86c9c...|        0|      18|
|07c3165b-19ca-412...|account.4eb9ec6a8...|        0|       7|account.35b34bd44...|        0|       7|
+--------------------+--------------------+---------+--------+--------------------+---------+--------+
only showing top 5 rows
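The mapping table is loaded here rather than built, so how `map_tab_3` was produced is not shown. As an assumption, the sketch below builds a table of the same `(match_id, original, randomised)` shape with a plain within-match shuffle; `build_mapping` is hypothetical. In the five rows above, `orig_tid` equals `rand_tid` and `orig_flag` equals `rand_flag`, so the real procedure may impose constraints that this sketch does not reproduce.

```python
import random


def build_mapping(players_by_match, seed=None):
    """Build (match_id, original, randomised) rows by shuffling players within each match.

    players_by_match: dict match_id -> list of distinct player IDs in that match.
    Each match's mapping is a permutation, so every player appears exactly
    once as `original` and once as `randomised`.
    """
    rng = random.Random(seed)
    rows = []
    for match_id, players in players_by_match.items():
        shuffled = list(players)
        rng.shuffle(shuffled)
        rows.extend((match_id, orig, rand) for orig, rand in zip(players, shuffled))
    return rows


rows = build_mapping({"m1": ["a", "b", "c"], "m2": ["d", "e"]}, seed=42)
for row in rows:
    print(row)
```

Keeping the shuffle inside each match preserves match sizes, so only who-interacts-with-whom within a match is randomised.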



In [4]:
# Get randomised gameplay logs.
# Step 1: replace each source player with their randomised counterpart from the same match.
temp_rand_logs = spark.sql("""SELECT mid, src, randomised AS new_src, dst, time, m_date
                              FROM td t JOIN map_tab m ON t.src = m.original AND t.mid = m.match_id""")
temp_rand_logs.createOrReplaceTempView("temp_rand_logs")
# Step 2: replace each destination player in the same way.
randomised_logs = spark.sql("""SELECT mid, new_src AS src, randomised AS dst, time, m_date
                               FROM temp_rand_logs t JOIN map_tab m
                               ON t.dst = m.original AND t.mid = m.match_id""")

# randomised_logs.show(5)
randomised_logs.createOrReplaceTempView("randomised_logs")
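The two joins above amount to a per-match lookup applied to each endpoint of an edge. The following plain-Python sketch performs the same relabelling, assuming the mapping fits in a dict keyed by `(match ID, original player ID)`; `randomise_logs` is a hypothetical helper for illustration, not a replacement for the Spark joins on the full data.

```python
def randomise_logs(logs, mapping):
    """Relabel both endpoints of every edge with their randomised counterparts.

    logs:    iterable of dicts with keys mid, src, dst, time, m_date.
    mapping: dict (match_id, original_id) -> randomised_id.
    As with the inner joins above, an edge is dropped when either endpoint
    has no mapping entry for its match. Endpoint attributes (ban dates,
    flags) are not carried over, because they belong to the original players.
    """
    out = []
    for row in logs:
        new_src = mapping.get((row["mid"], row["src"]))
        new_dst = mapping.get((row["mid"], row["dst"]))
        if new_src is None or new_dst is None:
            continue
        out.append({"mid": row["mid"], "src": new_src, "dst": new_dst,
                    "time": row["time"], "m_date": row["m_date"]})
    return out


mapping = {("m1", "a"): "b", ("m1", "b"): "a"}
logs = [
    {"mid": "m1", "src": "a", "dst": "b", "time": 1, "m_date": "2019-03-03"},
    {"mid": "m1", "src": "a", "dst": "z", "time": 2, "m_date": "2019-03-03"},  # "z" has no mapping
]
print(randomise_logs(logs, mapping))
```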

## 3. Count the number of motifs on the randomised network.

In [5]:
# Add the ban dates and cheating flags of both endpoints.
add_flags = spark.sql("""SELECT mid, src, ban_date AS src_bd, cheating_flag AS src_flag,
                         dst, time, m_date
                         FROM randomised_logs r JOIN nodes n ON r.src = n.id""")
add_flags.createOrReplaceTempView("add_flags")

randomised_logs = spark.sql("""SELECT mid, src, src_bd, src_flag,
                               dst, ban_date AS dst_bd, cheating_flag AS dst_flag, time, m_date
                               FROM add_flags r JOIN nodes n ON r.dst = n.id""")
# Re-register as "td"; this replaces the view of the empirical edges created earlier.
randomised_logs.createOrReplaceTempView("td")
# randomised_logs.show(5)
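Flags have to be re-attached after relabelling, because a randomised player's ban date and cheating flag come from the node table rather than from the original edge. Below is a plain-Python sketch of the two joins, assuming the node table fits in a dict of `(ban_date, cheating_flag)` keyed by player ID; `add_endpoint_flags` and the toy inputs are hypothetical.

```python
def add_endpoint_flags(logs, nodes):
    """Attach ban date and cheating flag to both endpoints of each edge.

    logs:  iterable of dicts with keys mid, src, dst, time, m_date.
    nodes: dict player_id -> (ban_date, cheating_flag).
    Like the inner joins on `nodes`, an edge is dropped when src or dst is
    missing from the node table.
    """
    out = []
    for row in logs:
        src_info = nodes.get(row["src"])
        dst_info = nodes.get(row["dst"])
        if src_info is None or dst_info is None:
            continue
        out.append({**row,
                    "src_bd": src_info[0], "src_flag": src_info[1],
                    "dst_bd": dst_info[0], "dst_flag": dst_info[1]})
    return out


nodes = {"a": ("NA", 0), "b": ("2019-03-10", 1)}  # toy node table
logs = [{"mid": "m1", "src": "b", "dst": "a", "time": 1, "m_date": "2019-03-03"}]
print(add_endpoint_flags(logs, nodes))
```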

In [6]:
rand_logs = pubg.add_level_of_harm(randomised_logs, 30)
rand_logs.createOrReplaceTempView("new_td")
# rand_logs.show(5)

In [7]:
records = pubg.add_more_info(rand_logs, players)
records.createOrReplaceTempView("records")

In [8]:
# Get a summary table of the randomised network.
obs_sum_tab = pubg.get_obs_summary_tab(records)
# obs_sum_tab.show()

In [9]:
# Store the summary table in the S3 bucket for later use.
obs_sum_tab.write.parquet("s3://social-research-cheating/summary-tables/rand-net/sev_obs/obs_tab_30_3.parquet")