## Reading the data

In this notebook, we show how to read the dataset. 

Our dataset can be found [here](https://data.world/maxstrange/diplomacyboardgame).

During our analysis, we realised that the 'units' table has a duplication problem: for several games, all units were replicated, so we remove those duplicates. As a precaution, the same can be done for orders and turns; those two lines are left commented out in the cell below.

In [1]:
import pandas as pd
import time
from collections import deque
import numpy as np
pd.options.mode.chained_assignment = None

In [2]:
# read the dataframes
all_games = pd.read_pickle("data/games.pkl")
all_orders = pd.read_pickle("data/orders.pkl")
all_players = pd.read_pickle("data/players.pkl")
all_turns = pd.read_pickle("data/turns.pkl")
all_units = pd.read_pickle("data/units.pkl")

# remove duplicates
all_units = all_units.drop_duplicates()
#all_orders = all_orders.drop_duplicates()
#all_turns = all_turns.drop_duplicates()

In [3]:
all_games.head(3)

Unnamed: 0,id,num_turns,num_players
0,37317,166,7
1,37604,51,7
2,39337,101,7


## Detecting betrayal

What I want to do here is **to detect betrayals within a game**, using the same definitions as in the paper we have studied. Let's recall a few things, and we will explore the dataset based on those definitions.

### What are game actions ? 

Each player has **units** (one for each city the player controls), and those are moved using **orders**. There are 2 kinds of orders of interest here: 
- **support** order: two units join forces and become stronger. One player can support another.
- **movement** order: move a unit somewhere. If it meets another player's unit, a **battle** takes place.

### How to define relationships ? 

Let's follow the definitions given by the paper.

**Act of friendship**: when a player supports another.

**Act of hostility**: when a player invades another, or when a player supports an invasion of the other player's territory.

**Friendship**: a relationship between two players spanning over 3 seasons and containing at least 2 **consecutive and reciprocated** acts of friendship.

**Betrayal** / **Broken friendship**: When, after being in a friendship, two players engage in at least 2 acts of hostility. 

### Additional information required for data-processing

It is important to understand the [rules of the game](https://www.playdiplomacy.com/help.php?sub_page=Game_Rules).

Here is a list of points we want to raise before starting the programming, obtained from looking at the rules.
- Each **year** is broken down into 2 **seasons**: {'Spring', 'Fall'}.
- Each **season** is itself divided into several phases, called **turns** (therefore, a year is made of at least 2 turns and at most 5):
    - **orders**: each player submits an order to each of its units (**hold**, **move**, **support** or **convoy**)
    - **retreats**: a phase that happens when some units (called **dislodged units**) need to retreat. If they can't, they are destroyed
    - **builds**: only happens after the *fall retreat*. Players gain control of the supply centers (SCS) they are occupying.
- Geographically, the game is divided into **provinces**.
- Some provinces are called **supply centers** (SCS), and to win, a **player** must control 18 supply centers.
- Each **unit** belongs to a **player** and there can be **only 1 unit** in a province at a time; however, **units** can join forces with a **support order**.
- There are 2 types of **units**:  {'F' or 'A'} for {Fleet, Army}
- Each **player** is characterized by its country, encoded by a letter: {E,F,I,G,A,T,R} standing for {England, France, Italy, Germany, Austria, Turkey, Russia}

We also give a clarification of the rows of 'all_orders' (i.e. the orders themselves) because we will be using them quite a lot, and they can be hard to understand. 
- orders are defined by a **game_id**, a **unit_id** and **turn_number** (which makes sense, considering all the above points). 
- each order has a field **location** which is the province of origin of the unit
- depending on the **unit_order**, here is the description of the fields

| unit_order | location                 | target                            | target_dest     |
| ---------- | ------------------------ | --------------------------------- | --------------- |
| MOVE       | initial loc. of the unit | loc. to move to                   | null            |
| HOLD       | initial loc. of the unit | null                              | null            |
| CONVOY     |                          | initial loc.                      | end goal loc.   |
| SUPPORT    |                          | loc. of unit to be supported      | its target loc. |
| BUILD      | ""                       | encoded string like 'army Berlin' |                 |
| RETREAT    | initial loc. of the unit | target loc                        |                 |
| DESTROY    | initial loc. of the unit |                                   |                 |
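
Since a BUILD row packs the unit type and the province into a single `target` string, a small helper can split it. This is only a sketch: it assumes the string always has the form `'<type> <province>'` with the type first, as in the 'army Berlin' example, and the helper name `parse_build_target` is ours, not part of the dataset.

```python
def parse_build_target(target):
    """Split an encoded BUILD target such as 'army Berlin' into (unit_type, province).

    Assumption: the first word is the unit type and everything after it
    is the province name (which may itself contain spaces).
    """
    unit_type, _, province = target.partition(" ")
    return unit_type, province


build_example = parse_build_target("army Berlin")
multi_word_example = parse_build_target("army St. Petersburg")
print(build_example, multi_word_example)
```

Using `partition` rather than `split` keeps multi-word province names such as 'St. Petersburg' intact.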

### What the map looks like ! 

<img src="img/map.png" width="900">

# Discovering the Dataset

Now that all of this is well-defined, let's see what we can achieve in the code. As it can be quite hard to see how to do this, let's break this down and look at one game.

In [4]:
# extract one game
game = all_games.query("id == 76749")
game_id = game.iloc[0,0]
game

Unnamed: 0,id,num_turns,num_players
7868,76749,61,7


In [5]:
# for this game, extract turns, orders and units
turns = all_turns.query("game_id == {}".format(game_id))
orders = all_orders.query("game_id == {}".format(game_id))
units = all_units.query("game_id == {}".format(game_id))
orders.head()

Unnamed: 0,game_id,unit_id,unit_order,location,target,target_dest,success,reason,turn_num
8607971,76749,0,MOVE,Edinburgh,North Sea,,1,,1
8607975,76749,1,MOVE,Liverpool,Wales,,1,,1
8607980,76749,2,MOVE,London,English Channel,,1,,1
8607983,76749,3,MOVE,Marseilles,Spain,,1,,1
8607992,76749,4,MOVE,Paris,Gascony,,1,,1


## Can we find **acts of friendships** ?

An act of friendship starts with a support. However, that is not enough: a player could support one of their own units, which is not a friendship. So we must also look at the **most recent earlier order** asking to **MOVE** a unit to the support's **target** province. That order gives us a 'unit_id' (the unit that followed the order), and therefore the country that owns the supported unit.

In [6]:
# first we must look at the supports that happened in this game.
supports = orders.unit_order == "SUPPORT"
orders_w_supports = orders[supports]
orders_w_supports.sample(3)

Unnamed: 0,game_id,unit_id,unit_order,location,target,target_dest,success,reason,turn_num
8608957,76749,44,SUPPORT,Berlin,Denmark,Kiel,0,Supported unit has failed,33
8608688,76749,22,SUPPORT,Norway,Norwegian Sea,Barents Sea,1,,18
8608234,76749,14,SUPPORT,Serbia,Bulgaria,Rumania,0,Dislodged by A Bulgaria - Serbia,6


In [7]:
# we want to find the countries of the supported units
# let's take one and see what we can do
support_order = orders_w_supports.iloc[-4]
# support_order = orders_w_supports.head(3).tail(1)
support_order

game_id                                  76749
unit_id                                     57
unit_order                             SUPPORT
location                                Warsaw
target                                  Moscow
target_dest                             Moscow
success                                      0
reason         Dislodged by A Ukraine - Warsaw
turn_num                                    58
Name: 8609321, dtype: object

In [8]:
# Example: the support order targets the unit located in `support_order.target`
# we look for the last earlier MOVE order whose destination was that province
target = support_order.target#.values[0]
turn_number = support_order.turn_num#.values[0]
move_order = orders.query("unit_order == 'MOVE' & target == '{}' & turn_num < {}".format(target, turn_number)).tail(1)
move_order

Unnamed: 0,game_id,unit_id,unit_order,location,target,target_dest,success,reason,turn_num
8609293,76749,49,MOVE,St. Petersburg,Moscow,,0,Bounced,56


In [9]:
unit_id = move_order.unit_id.values[0]
move_unit = units.query("unit_id == {}".format(unit_id))
move_unit

Unnamed: 0,game_id,country,type,start_turn,end_turn,unit_id
748309,76749,R,A,25,59,49


We see that Russia was the country that last moved an army there before the support happened. Hence, Russia is the supported country.

In [10]:
# Let's look at the country who did the support
unit_id = support_order.unit_id# .values[0]
support_unit = units.query("unit_id == {}".format(unit_id))
support_unit

Unnamed: 0,game_id,country,type,start_turn,end_turn,unit_id
748317,76749,R,A,35,59,57


As it turns out, this is **not** an act of friendship: the supporting unit in Warsaw (unit 57) also belongs to Russia, so Russia is simply supporting one of its own units. An act of friendship requires the supporting country and the supported country to be different.

## A little data pre-processing won't hurt us

At this point we realize it would be nice to have some extra information about orders. This can be seen as a pre-processing step, since it will be computed only once. 

The first thing we want to know is **which country passed which order**.

As we will see later on, we will also need **the year and the season in which each order was passed**, in order to locate friendships with finer precision than the year alone. Here is how we can encode this: 
- the spring of a year is written as the year, for instance 'Spring 1904' = 1904
- the fall of the same year is written as the year + 0.5, for instance 'Fall 1904' = 1904.5
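
The encoding above can be sketched as a pair of small helpers. The function names `encode_season` and `decode_season` are ours, and the sketch assumes that only the labels 'Spring' and 'Fall' occur, as in the list above.

```python
def encode_season(year, season):
    """Map (1904, 'Spring') to 1904.0 and (1904, 'Fall') to 1904.5."""
    return year + 0.5 * (season == "Fall")


def decode_season(encoded):
    """Inverse of encode_season, e.g. 1904.5 becomes 'Fall 1904'."""
    year = int(encoded)
    season = "Fall" if encoded - year >= 0.5 else "Spring"
    return "{} {}".format(season, year)


spring = encode_season(1904, "Spring")
fall = encode_season(1904, "Fall")
print(spring, fall, decode_season(fall))
```

Because the encoded values are ordered chronologically, they can be compared directly (`<`, `>=`) and used as a sorted index.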

Also, there is sometimes a problem with the data. The game with id == 89268, for instance, has duplicated rows in 'units'. A possible way to solve this is to run `units.drop_duplicates()`. It is not the only possible problem with the 'units' table: sometimes there are duplicates that are almost, but not exactly, the same (the end turn is slightly different). The solution to this problem is simply to expect an error and, if one is caught, to skip the game.
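
As an alternative to skipping a game, near-duplicates could be collapsed by deduplicating on the identifying columns only. This is a sketch on toy data, not what the notebook does: it assumes a unit is uniquely identified by (`game_id`, `unit_id`) and that keeping the last row of each group is acceptable.

```python
import pandas as pd

# toy 'units' table: rows 0 and 1 are exact duplicates,
# row 2 is a near-duplicate that differs only by its end_turn
units_toy = pd.DataFrame({
    "game_id":    [1, 1, 1, 1],
    "country":    ["R", "R", "R", "A"],
    "type":       ["A", "A", "A", "F"],
    "start_turn": [25, 25, 25, 1],
    "end_turn":   [59, 59, 60, 12],
    "unit_id":    [49, 49, 49, 3],
})

# exact duplicates only: the near-duplicate survives
exact = units_toy.drop_duplicates()
# one row per (game_id, unit_id): the near-duplicate is collapsed as well
near = units_toy.drop_duplicates(subset=["game_id", "unit_id"], keep="last")

print(len(units_toy), len(exact), len(near))
```

Which row to keep (`keep="first"` or `keep="last"`) is a judgment call, since the data does not say which end turn is the correct one.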

In [11]:
try: 
    # 1. compute the country who emitted each order 
    countries = orders.unit_id.apply(lambda x: units.query("unit_id == {}".format(x)).country.item())
    # compute the 'encoded year' at which each order was passed
    turns["season_encoded"] = turns["year"] + 0.5 * (turns["season"] == "Fall")
    years = orders.turn_num.apply(lambda x: turns.query("turn_num == {}".format(x)).season_encoded.item())
    # update the orders
    orders["country"] = countries
    orders["year"] = years
except ValueError:
    print("Data is corrupted and execution was aborted")
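
The same enrichment can also be sketched with `merge` instead of one `query` per row, which tends to be faster on large games. The toy frames below are made up for illustration and only carry the columns needed. With `validate="many_to_one"`, pandas raises a `MergeError` when a `unit_id` appears more than once in the units table, which plays the same role as the `ValueError` raised by `.item()` above.

```python
import pandas as pd

orders_toy = pd.DataFrame({
    "unit_id":    [0, 1, 0],
    "turn_num":   [1, 1, 2],
    "unit_order": ["MOVE", "HOLD", "MOVE"],
})
units_toy = pd.DataFrame({"unit_id": [0, 1], "country": ["E", "F"]})
turns_toy = pd.DataFrame({
    "turn_num": [1, 2],
    "year":     [1901, 1901],
    "season":   ["Spring", "Fall"],
})
turns_toy["season_encoded"] = turns_toy["year"] + 0.5 * (turns_toy["season"] == "Fall")

# attach the country of the unit and the encoded season of the turn to each order
enriched = (
    orders_toy
    .merge(units_toy[["unit_id", "country"]], on="unit_id", how="left", validate="many_to_one")
    .merge(turns_toy[["turn_num", "season_encoded"]], on="turn_num", how="left", validate="many_to_one")
    .rename(columns={"season_encoded": "year"})
)
print(enriched)

# a duplicated unit is detected instead of silently multiplying the orders
dup_units = pd.concat([units_toy, units_toy.iloc[[0]]], ignore_index=True)
try:
    orders_toy.merge(dup_units, on="unit_id", validate="many_to_one")
    duplicates_found = False
except pd.errors.MergeError:
    duplicates_found = True
print("duplicates found:", duplicates_found)
```

Note that `merge` returns a new frame with a fresh index, so the original order ids would need to be kept in a column if they matter later.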

Finally, a last step we will need is more **information about the 'target country'**. When an order is passed, there is often another country located where the order's target is. This information is really important to us. Let's find it here.

In [12]:
def get_target_country(order, orders):
    """
    Given a MOVE or SUPPORT order, return the country that controls the targeted province.
    The function looks at the last successful order towards the targeted province.
    If none is found, it returns the string "None".
    If the order is not MOVE or SUPPORT, it returns an empty string.
    """
    if order.unit_order not in ["MOVE", "SUPPORT"]: return ""
    # get the last move orders to this province
    q = "(unit_order in ('MOVE' ,'RETREAT', 'HOLD')) & success == 1 & target == '{}' & turn_num < {}".format(order.target, order.turn_num)
    last_move_orders = orders.query(q)
    if len(last_move_orders):
        # extract the order
        last_move_order = last_move_orders.iloc[-1]
        return last_move_order.country
    else:
        # this happens when there never was a successful move towards this province
        return "None"

In [13]:
# get target country
targets = orders.apply(lambda o: get_target_country(o, orders), axis = 1)
orders["target_country"] = targets
orders.sample(5)

Unnamed: 0,game_id,unit_id,unit_order,location,target,target_dest,success,reason,turn_num,country,year,target_country
8609324,76749,43,HOLD,Gulf of Bothnia,,,1,,58,R,1912.5,
8608932,76749,50,SUPPORT,North Sea,Norway,Norway,0,Support cut by F English Channel - North Sea,33,E,1907.5,E
8609110,76749,64,BUILD,,army Ankara,,1,,45,T,1909.5,
8609271,76749,4,MOVE,Holland,Ruhr,,0,Bounced,56,F,1912.0,F
8608652,76749,10,MOVE,Kiel,Berlin,,0,Bounced,16,G,1904.0,G


## Acts of **friendships** and Acts of **hostility**

Now that we have a good dataframe, let's start the processing.

### Finding **acts of friendship**

This function is just a rewriting of all the code which was presented above.

Note that we don't add the query criteria 'success == 1' because it's not the result which defines the intention of the act. You can plan to attack someone and fail, you will still have performed a hostile action toward the other player.

In [14]:
def is_order_act_of_friendship(support_order):
    # only interested in SUPPORT orders
    if support_order.unit_order != "SUPPORT": return False
    if support_order.target_country == "None": return False
    return support_order.country != support_order.target_country

In [15]:
# so we may add this to the orders dataframe
acts_of_friendships = orders.apply(is_order_act_of_friendship, axis = 1)
acts_of_friendships.value_counts()

False    822
True      45
dtype: int64

Good news: it's working. The boolean Series `acts_of_friendships` tells, for each order, whether it is an act of friendship; in this game, 45 of the 867 orders are.

### Finding **acts of hostility**

The code is almost the same; only the logic is slightly tweaked. 

We are looking at all **orders** whose **unit_order** is **MOVE** (*this is how an attack starts*) and whose **target** is a province holding a unit of another player. Again, the tricky part is to see whether another player's unit is located in this province. We must look at all **previous orders** with the same **target** made by a unit from **another country**.

In [16]:
def is_move_act_of_hostility(move_order):
    if move_order.unit_order != "MOVE": return False
    if move_order.target_country == "None": return False
    return move_order.country != move_order.target_country

In [17]:
acts_of_hostility = orders.apply(is_move_act_of_hostility, axis = 1)
acts_of_hostility.value_counts()

False    685
True     182
dtype: int64

## Who is friends with whom

Now that we have classified our orders, let's find friends within our game. Coming back to our definition:

> Friendship is a relationship between two players spanning over 3 seasons containing at least 2 consecutive and reciprocated acts of friendship.

The difficulty comes from identifying the country that received the support. This is what the `target_country` column computed earlier gives us, for both the acts of friendship and the acts of hostility. In both cases, it's the **target province** which matters.

In [18]:
friendly_orders = orders[acts_of_friendships]
hostile_orders = orders[acts_of_hostility]
friendly_orders.head()

Unnamed: 0,game_id,unit_id,unit_order,location,target,target_dest,success,reason,turn_num,country,year,target_country
8608087,76749,9,SUPPORT,Ruhr,Wales,Belgium,1,,3,G,1901.5,E
8608234,76749,14,SUPPORT,Serbia,Bulgaria,Rumania,0,Dislodged by A Bulgaria - Serbia,6,A,1902.0,T
8608255,76749,18,SUPPORT,Rumania,Bulgaria,Serbia,1,,6,R,1902.0,T
8608456,76749,29,SUPPORT,Greece,Ionian Sea,Ionian Sea,1,,11,T,1903.0,I
8608482,76749,31,SUPPORT,Budapest,Serbia,Trieste,0,Supported order does not correspond,11,R,1903.0,T


For each order in the above set, assuming the 'supporting country' is X and the 'supported country' is Y, we must check whether there exist, among the earlier orders: 
- at least 1 previous order with 'X supports Y'
- at least 2 previous orders with 'Y supports X'

In addition, at least 3 seasons (1.5 years) must separate the first and the last of these orders. Those acts of friendship are then part of a **friendship**.
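
A minimal sketch of this check on toy data may help. The helper name `is_friendship` is ours; its default thresholds (2 supports from each side, a span of 1.5 years) mirror those used by `find_frienships` later in this notebook.

```python
import pandas as pd


def is_friendship(friendly_orders, x, y, min_each=2, min_span=1.5):
    """Return True if x and y exchanged enough supports over a long enough period."""
    involved = friendly_orders[
        ((friendly_orders.country == x) & (friendly_orders.target_country == y))
        | ((friendly_orders.country == y) & (friendly_orders.target_country == x))
    ]
    if involved.empty:
        return False
    n_x = (involved.country == x).sum()   # supports given by x
    n_y = len(involved) - n_x             # supports given by y
    span = involved.year.max() - involved.year.min()
    return bool(n_x >= min_each and n_y >= min_each and span >= min_span)


toy = pd.DataFrame({
    "country":        ["F", "R", "F", "R"],
    "target_country": ["R", "F", "R", "F"],
    "year":           [1901.0, 1901.5, 1902.0, 1902.5],
})

full_history = is_friendship(toy, "F", "R")            # 2 supports each over 1.5 years
partial_history = is_friendship(toy.iloc[:3], "F", "R")  # R has supported only once
print(full_history, partial_history)
```

With the fourth order removed, Russia has supported France only once, so the pair does not yet qualify as friends.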

Let's create a table which will recap all friendships and potential betrayals, and then let's fill it with the data already extracted. 

In [19]:
def get_countries_code(x,y):
    """Given two countries, it returns another string being their letter sorted alphabetically.
    This strings is the 'encoded friendship name'."""
    return ''.join(sorted(x + y))

In [20]:
def analyse_friendships(df):
    """Given a friendship dataframe, extract insights from it."""
    # get columns where something happened (use the `df` argument, not the global frame)
    cols = [col for col in df.columns if np.count_nonzero(df[col] != 0)]
    if len(cols):
        print("- friendships:", cols)
        for c in cols: 
            tmp = df.loc[:,c]
            pos = tmp[tmp != 0].values
            # print(pos)
            if type(pos[-1]) is str: 
                print("    * length of friendship ",c," is ", len(pos) - 1)
                print("        ->",c," ends betrayed by ",pos[-1],"") 
            else: 
                print("    * length of friendship: ", len(pos))

    else:
        print("- No friendship was found")

In [21]:
countries = ['A', 'E', 'F', 'G', 'I', 'R', 'T']
pairs = [x+y for x in countries for y in countries if y > x]
years = np.arange(1901, max(friendly_orders.year.max(), hostile_orders.year.max()) + 0.5, 0.5)
# create a dataframe with one column per possible friendship and one row per season
friendships = pd.DataFrame(columns=pairs, index = years, dtype = np.int8).fillna(0)

In [22]:
def find_frienships(friendships, friendly_orders, length_year = 1.5):
    """Fill the 'friendships' dataframe by adding '1's to signify that a pair of players engaged in a 
    reciprocal friendship.
    
    Careful, the function doesn't return anything; instead it modifies the 'friendships' dataframe in place.
    
    Parameters
    friendships (pd.DataFrame): df with rows as 'encoded seasons' and columns as 'encoded friendship names'.
    friendly_orders (pd.DataFrame): frame of all orders that are an 'act of friendship' (AOF)
    length_year (float): minimum amount of time (in years) that must pass between the first AOF and the last one.
    
    Discussion
    For each order with an AOF, it will query all other ones that 
    - happened before
    - involved exactly the same 2 countries
    Then it makes sure that among all those orders
    - each player originated at least 2 of them
    - the time spent between the first one and the last one was at least `length_year` 
    Those orders are the ones defining a friendship.
    """
    # for each friendly order, find those that define a friendship
    for i, o in friendly_orders.iterrows():
        x, y, year = o.country, o.target_country,  o.year
        # skipping this direction doesn't lose friendships because we only look at reciprocal ones
        if y > x: continue
        # query the other friendly orders from the past (including this one!)
        q = "year <= @year & ((country == @x & target_country == @y) | (country == @y & target_country == @x) )"
        query = friendly_orders.query(q)
        # extract number of helps from 2 countries
        n_x = np.count_nonzero(query.country == x)
        n_y = len(query) - n_x
        y_min = query.year.min()
        y_max = query.year.max()
        if n_x >= 2 and n_y >= 2 and (y_max - y_min) >= length_year:
            # Those guys are friends :=) 
            code = get_countries_code(x,y)
            # fill the dataframe with the solution that was found
            friendships.loc[query.year.min():o.year, code] = 1

In [23]:
find_frienships(friendships, friendly_orders)
analyse_friendships(friendships)

- friendships: ['FR', 'RT']
    * length of friendship:  5
    * length of friendship:  6


Let's look at France and Russia, and at Russia and Turkey: what happened to their friendships afterwards ? 

We must investigate the 'hostile orders' and see if a betrayal happened. 

In [24]:
def find_betrayals(friendships, hostile_orders, N_hostile_min = 2):
    """Fill the 'friendships' dataframe by removing the '1's once a friendship is broken and, if a 
    betrayal is detected, write the betrayer's country letter into the frame. 
    
    Careful, the function doesn't return anything; instead it modifies the 'friendships' dataframe in place.
    
    Parameters
    friendships (pd.DataFrame): df with rows as 'encoded seasons' and columns as 'encoded friendship names'.
    hostile_orders (pd.DataFrame): frame of all orders that are an 'act of hostility'
    N_hostile_min (int): minimum number of hostile actions to consider this event as a betrayal
    
    Discussion
    For each order, if the players were engaged in a friendship, it will first destroy the friendship and 
    then verify that the event is considered a betrayal according to our definition.
    """
    for i, o in hostile_orders.iterrows():
        x = o.country
        y = o.target_country
        year = o.year
        code = get_countries_code(x,y)
        if friendships.loc[year, code]:
            # 1. it breaks the friendship
            friendships.loc[year+0.5:, code] = 0 
            # 2. did a betrayal happen ? 
            # count the hostile actions between the two players from this season onwards (this one included)
            q = "year >= @year & ((country == @x & target_country == @y) | (country == @y & target_country == @x) )"
            query = hostile_orders.query(q)
            if len(query) >= N_hostile_min: 
                # print("   - betrayal happened for ", code)
                friendships.loc[year, code] = x 

find_betrayals(friendships, hostile_orders)
analyse_friendships(friendships)

- friendships: ['FR', 'RT']
    * length of friendship  FR  is  2
        -> FR  ends betrayed by  F 
    * length of friendship  RT  is  2
        -> RT  ends betrayed by  T 


We see here what happened to both friendships: each one ended with a betrayal, France turning on Russia and Turkey turning on Russia. Note that a single hostile action would only have ended the friendship without counting as a betrayal, because our definition asks for at least 2 hostile actions. 
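
The threshold of 2 hostile actions can be illustrated on toy data. The sketch below is not the notebook's function: the helper name `count_hostile_acts` and the toy orders are made up, and it only reproduces the counting step (hostile acts between the two players, in either direction, from a given season onwards).

```python
import pandas as pd


def count_hostile_acts(hostile_orders, x, y, since):
    """Count hostile orders between x and y (either direction) from `since` onwards."""
    pair = (
        ((hostile_orders.country == x) & (hostile_orders.target_country == y))
        | ((hostile_orders.country == y) & (hostile_orders.target_country == x))
    )
    return int((pair & (hostile_orders.year >= since)).sum())


toy = pd.DataFrame({
    "country":        ["F", "F", "T"],
    "target_country": ["R", "R", "R"],
    "year":           [1905.0, 1905.5, 1906.0],
})

n_fr = count_hostile_acts(toy, "F", "R", since=1905.0)
n_rt = count_hostile_acts(toy, "T", "R", since=1906.0)
print("FR betrayal:", n_fr >= 2, "| RT betrayal:", n_rt >= 2)
```

In this toy example, France attacks Russia twice, which qualifies as a betrayal, while Turkey's single attack would only break the friendship.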

# Scaling to several games

Until now, we have been looking at only one game, and we must start to scale the analysis to a broader range. Let's see what we can do. The first thing to do is to make the process a little more functional by defining functions for the different logical steps.

In [25]:
def get_dataframes(game_id):
    orders = all_orders.query("game_id == {}".format(game_id))
    turns = all_turns.query("game_id == {}".format(game_id))
    units = all_units.query("game_id == {}".format(game_id))
    return orders, turns, units

In [26]:
def get_empty_friendships(max_year):
    countries = ['A', 'E', 'F', 'G', 'I', 'R', 'T']
    pairs = [x+y for x in countries for y in countries if y > x]
    years = np.arange(1901, max_year + 0.5, 0.5)
    # create a dataframe with one column per possible friendship and one row per season
    return pd.DataFrame(columns=pairs, index = years, dtype = np.int8).fillna(0)

In [27]:
def process_orders(orders, turns, units):
    try:
        # 1. compute the country who emitted each order 
        countries = orders.unit_id.apply(lambda x: units[["unit_id", "country"]].query("unit_id == {}".format(x)).country.item())
        orders["country"] = countries
        # 2. compute the 'encoded year' at which each order was passed
        turns["season_encoded"] = turns["year"] + 0.5 * (turns["season"] == "Fall")
        years = orders.turn_num.apply(lambda x: turns.query("turn_num == {}".format(x)).season_encoded.item())
        orders["year"] = years
        # 3. get target countries
        targets = orders.apply(lambda o: get_target_country(o, orders), axis = 1)
        orders["target_country"] = targets
        return True
    except ValueError:
        print("Data is corrupted and execution was aborted")
        return False

In [28]:
def analyse_game(game):
    # get dataframes and process them
    orders, turns, units = get_dataframes(game_id = game.id.item())
    has_succeeded = process_orders(orders, turns, units)
    if not has_succeeded: return None
    
    # find acts of friendships / hostility
    acts_of_friendships = orders.apply(is_order_act_of_friendship, axis = 1)
    acts_of_hostility = orders.apply(is_move_act_of_hostility, axis = 1)
    friendly_orders = orders[acts_of_friendships]
    hostile_orders = orders[acts_of_hostility]
    print("- {} acts of friendship and {} acts of hostility over {} orders ".format(len(friendly_orders), len(hostile_orders), len(orders)))
    
    # construct friendship matrix
    # both frames are needed: one to find friendships, the other to find betrayals
    if len(friendly_orders) and len(hostile_orders):
        friendships = get_empty_friendships(max_year = max(friendly_orders.year.max(), hostile_orders.year.max()))
        find_frienships(friendships, friendly_orders)
        find_betrayals(friendships, hostile_orders)
        return friendships
    else: 
        return None

Now that we have all our functions, we can verify that we obtain the same results as before.

In [29]:
game = all_games.query("id == 76749")
friendships = analyse_game(game)
if friendships is not None:
    analyse_friendships(friendships)
else:
    print("No acts of friendships or no acts of hostility")

- 45 acts of friendship and 182 acts of hostility over 867 orders 
- friendships: ['FR', 'RT']
    * length of friendship  FR  is  2
        -> FR  ends betrayed by  F 
    * length of friendship  RT  is  2
        -> RT  ends betrayed by  T 


We may now try to use those results over more games ! The first thing one observes is that, according to our definition, not all games contain friendships, and most of the friendships end, at some point, in a betrayal. This makes sense given the 'spirit of the game'. Here is a quote from the creator of the game; no wonder we observe a lot of betrayals.

> "Luck plays no part in Diplomacy. Cunning and cleverness honesty and perfectly-timed betrayal are the tools needed to outwit your fellow players. The most skillful negotiator will climb to victory over the backs of both enemies and friends. Who do you trust?"

In [None]:
# let's run this function many times
games_sample = all_games.query("num_turns >= 60").sample(10)
for i, game in games_sample.iterrows(): 
    print("\n*** Game {} ***".format(game.id.item()))
    friendships = analyse_game(game)
    if friendships is not None:
        analyse_friendships(friendships)


*** Game 78236 ***


# What is the effect of betrayal on winning ? 

## Do winners tend to betray ? 

The first thing we need is to know when each player 'dies' and who is (are) the winner(s). 
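
One possible starting point, sketched here on toy data, is to use the `end_turn` column of the units table. The sketch assumes that a country is eliminated when the last of its units disappears before the final turn of the game; this is our assumption about how to read `end_turn`, and it does not identify the winner, which would require counting supply centers.

```python
import pandas as pd

# toy 'units' table for a single game (values are made up)
units_toy = pd.DataFrame({
    "game_id":    [1, 1, 1, 1],
    "country":    ["A", "A", "R", "T"],
    "start_turn": [1, 10, 1, 1],
    "end_turn":   [12, 30, 61, 61],
})

# last turn at which each country still had a unit on the board
last_turn = units_toy.groupby("country").end_turn.max()
final_turn = units_toy.end_turn.max()

# countries whose last unit vanished before the end are considered eliminated
eliminated = last_turn[last_turn < final_turn]
survivors = last_turn[last_turn == final_turn].index.tolist()

print("eliminated:", eliminated.to_dict())
print("survivors:", survivors)
```

Here Austria loses its last unit at turn 30, while Russia and Turkey are still on the board when the game ends.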