# ADA Project

Now that we understand the dataset, we are going to try to answer our research questions in this notebook.

In [1]:
import pandas as pd
import time
from collections import deque
import numpy as np
pd.options.mode.chained_assignment = None

In [2]:
# read the dataframes
all_games = pd.read_pickle("data/games.pkl")
all_orders = pd.read_pickle("data/orders.pkl")
all_players = pd.read_pickle("data/players.pkl")
all_turns = pd.read_pickle("data/turns.pkl")
all_units = pd.read_pickle("data/units.pkl")

# remove duplicates
all_units = all_units.drop_duplicates()

In [3]:
countries = ['A', 'E', 'F', 'G', 'I', 'R', 'T']
pairs = [x+y for x in countries for y in countries if y > x]

# What is the effect of betrayal on winning?

We look at the binary outcome **winning** as a function of the treatment, which can be:
- betrayer: a player who betrayed another one
- betrayed: a player who ended up betrayed by another
- neutral: a player who was not involved in a broken friendship
- (?) best_friend: player engaged


In order to answer this question, let's define a few functions that we will use in the later analysis.

In [4]:
def get_betrayers_and_betrayed(friendships):
    """Given the friendships dataframe as defined in our analysis, return all the players who
    committed betrayals and all the players who ended up betrayed."""
    # keep only the pairs that have at least one non-zero entry
    cols = [col for col in friendships.columns if np.count_nonzero(friendships[col] != 0)]
    betrayers = []
    betrayeds = []
    for c in cols:
        column = friendships[c]
        values = column[column != 0].values
        # a last non-zero entry that is a country letter marks a betrayal by that country
        if isinstance(values[-1], str):
            betrayer = values[-1]
            betrayers.append(betrayer)
            # the other member of the pair is the betrayed player
            members = list(c)
            members.remove(betrayer)
            betrayeds.append(members[0])

    return betrayers, betrayeds
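
To make the input format concrete, here is a small sketch on toy data. It assumes the encoding that the function seems to rely on: each column is a two-letter pair, `0` means no friendship, a non-zero number means a friendship in place, and a country letter marks the season in which that country betrayed its friend. The numeric value `1` and the toy frame are illustrative assumptions, not taken from the real dataset, and the function is repeated in condensed form only so that the sketch runs on its own.

```python
import numpy as np
import pandas as pd

def get_betrayers_and_betrayed(friendships):
    """Condensed restatement of the notebook function, so the sketch is self-contained."""
    betrayers, betrayeds = [], []
    for pair in friendships.columns:
        column = friendships[pair]
        values = column[column != 0].values
        # only pairs whose last non-zero entry is a country letter ended in a betrayal
        if len(values) and isinstance(values[-1], str):
            betrayer = values[-1]
            members = list(pair)
            members.remove(betrayer)
            betrayers.append(betrayer)
            betrayeds.append(members[0])
    return betrayers, betrayeds

# Toy data (assumed encoding, not real games): one row per season.
years = np.arange(1901, 1903, 0.5)
toy = pd.DataFrame(
    {
        "AE": [1, 1, "E", 0],  # E betrays A in the third season
        "FG": [0, 1, 1, 1],    # friendship that is never broken
        "IR": [0, 0, 0, 0],    # never friends
    },
    index=years,
)

betrayers, betrayeds = get_betrayers_and_betrayed(toy)
print("betrayers:", betrayers, "betrayed:", betrayeds)
```

Note that recovering the betrayed player with `list(pair)` only works because every country is identified by a single letter.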

In [5]:
def get_neutrals(betrayers, betrayeds):
    """Given the betrayers and betrayed players of a game, return the list of players
    who were not involved in a broken friendship."""
    neutrals = countries.copy()
    for b in betrayers: 
        if b in neutrals: neutrals.remove(b)
    for b in betrayeds: 
        if b in neutrals: neutrals.remove(b)
    return neutrals

In [6]:
def get_winners(game_id):
    winner = all_players.query("game_id == @game_id & won == 1")
    return winner.country.values

In [7]:
def get_losers(winners):
    # every country that is not among the winners is a loser
    losers = countries.copy()
    for w in winners: losers.remove(w)
    return losers

In [8]:
# load the data to analyse
games_id = np.load("data/subset2/games_id.npy")
all_friendships = np.load("data/subset2/friendships.npy", allow_pickle=True)
verbose = False

data = np.zeros(shape = (3,2))
treatments = ["betrayer", "betrayed", "neutral"]
outcomes = ["winner", "loser"]
stats = pd.DataFrame(data, index = treatments, columns = outcomes )

N = len(games_id)
for i, game_id in enumerate(games_id):
    
    # reconstruct the obtained data
    data = all_friendships[i]
    years = np.arange(1901, 1901 + data.shape[0] * 0.5, 0.5)
    friendships = pd.DataFrame(data = all_friendships[i], columns = pairs, index = years)
    
    # get labels
    winners = get_winners(game_id)
    losers = get_losers(winners)
    betrayers, betrayeds = get_betrayers_and_betrayed(friendships)
    neutrals = get_neutrals(betrayers, betrayeds)
    
    # statistics 
    for winner in winners: 
        if winner in betrayers: stats.loc["betrayer", "winner"] += 1
        if winner in betrayeds: stats.loc["betrayed", "winner"] += 1
        if winner in neutrals: stats.loc["neutral", "winner"] += 1
    for loser in losers: 
        if loser in betrayers: stats.loc["betrayer", "loser"] += 1
        if loser in betrayeds: stats.loc["betrayed", "loser"] += 1
        if loser in neutrals: stats.loc["neutral", "loser"] += 1
            
    if verbose:
        print("\nGame",i)
        print("Winners: ", winners, " and Losers", losers)
        print("Betrayers: ", betrayers, " and Betrayed", betrayeds)
        print("Neutrals: ", neutrals)

win_ratio = stats.winner / (stats.loser + stats.winner)
stats["win_ratio"] = win_ratio
stats

treatment,winner,loser,win_ratio
betrayer,74.0,105.0,0.413408
betrayed,39.0,140.0,0.217877
neutral,438.0,2706.0,0.139313


What can we see here?
- Among players involved in a broken friendship (either *betrayer* or *betrayed*), the odds favor the betrayer: betrayers win about 41% of the time, roughly twice as often as betrayed players (about 22%).
- Neutral players are the large majority (3144 of the 3502 counted entries), yet their win ratio (about 14%) is the lowest of the three groups, below even that of betrayed players.

These results lead us to believe that **betrayals strongly influence the outcome of the game**.
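
As a rough check that these differences are not just noise, one option is a chi-square test of independence on the counts in the table. The sketch below is not part of the original analysis. It uses plain Python and relies on the fact that, with 2 degrees of freedom, the chi-square survival function is exactly `exp(-x / 2)`. It should be read as indicative only, since the counting loop lets a player appear in more than one treatment group.

```python
import math

# Counts copied from the table above: (winner, loser) per treatment group.
counts = {
    "betrayer": (74, 105),
    "betrayed": (39, 140),
    "neutral": (438, 2706),
}

total = sum(w + l for w, l in counts.values())
col_totals = [sum(row[k] for row in counts.values()) for k in (0, 1)]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = 0.0
for row in counts.values():
    row_total = sum(row)
    for k in (0, 1):
        expected = row_total * col_totals[k] / total
        chi2 += (row[k] - expected) ** 2 / expected

# A 3 x 2 table has (3 - 1) * (2 - 1) = 2 degrees of freedom,
# for which the survival function has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

A very small p-value here only says that win ratio and treatment group are associated. It does not by itself show that betraying causes winning, which is what the paired experiment is meant to address.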

What can we do next?

- Select 250 games with and without betrayals (all of them with friendships), match them on game properties, and look at what differs for the players engaged in a friendship once a betrayal has happened.

- Quantify the aggressiveness of players towards others and, using the same dataset as before, see what happens to a player who was betrayed.

# Paired experiment

Paired study over 500 friendships: 250 that did not end in betrayal, and 250 that did. For each friendship, we have
- the game_id
- the players involved: betrayer / betrayed
- the winners / the losers

and covariates we can match on (*using which outcome? Maybe 'one of the two players is a winner'*)
- length of the friendship
- length of the game
- ? average score of the 2 players when they become friends
- ? average score of the 2 players when they stop being friends

Then we can assign each player a status, 'betrayer' or 'betrayed'.

Finally, we could look at the following statistics
- probability that one of the two players wins
- average aggressiveness of one player towards the other

## Study design

To run this study, we must build a new dataset. In the notation below, 's' means 'start of friendship', 'e' means 'end of friendship' and 'f' means 'final' (end of the game).

- desired columns: "game_ids", "has_betrayed", "betrayer", "betrayed", "winners", "friendship_length", "game_length", "score_s", "score_e", "score_f", "agressivity_s", "agressivity_e"
- each row is one game ...

Method
1. Construct the dataset
2. Propensity score matching, using 'has_betrayed' as the treatment indicator (the target of the propensity model)
3. Observing effect over the 2 groups: "control" and "treated"
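
Step 2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the study's implementation. The data is synthetic, the two covariates only stand in for `friendship_length` and `game_length`, the propensity model is a plain logistic regression fitted by gradient descent, and the matching is a greedy 1:1 nearest-neighbour match without replacement. The helper names `fit_propensity` and `greedy_match` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the planned dataset (NOT real game data).
n = 200
X = np.column_stack([
    rng.integers(2, 12, size=n),   # stand-in for friendship_length
    rng.integers(10, 30, size=n),  # stand-in for game_length
]).astype(float)
treated = rng.random(n) < 0.5      # stand-in for has_betrayed

def fit_propensity(X, t, lr=0.1, steps=2000):
    """Logistic regression by gradient descent; returns P(treated | X) per row."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardise covariates
    Z = np.column_stack([np.ones(len(Z)), Z])   # add an intercept column
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z @ w))
        w -= lr * Z.T @ (p - t) / len(t)        # gradient of the mean log-loss
    return 1.0 / (1.0 + np.exp(-Z @ w))

def greedy_match(scores, t):
    """Pair each treated row with the closest unused control row (1:1, no replacement)."""
    controls = list(np.flatnonzero(~t))
    pairs = []
    for i in np.flatnonzero(t):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        pairs.append((int(i), int(j)))
        controls.remove(j)
    return pairs

scores = fit_propensity(X, treated.astype(float))
matches = greedy_match(scores, treated)
print(len(matches), "matched pairs")
```

Step 3 would then compare the outcome of interest (for instance, whether one of the two players won) between the treated and control rows of the matched pairs.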

# Potential plots
