## Reading the data

In this notebook, we show how to read the dataset. 

Our dataset can be found [here](https://data.world/maxstrange/diplomacyboardgame).

In [None]:
import pandas as pd
import time
from collections import deque
import numpy as np
pd.options.mode.chained_assignment = None

In [None]:
# read the dataframes
all_games = pd.read_pickle("data/games.pkl")
all_orders = pd.read_pickle("data/orders.pkl")
all_players = pd.read_pickle("data/players.pkl")
all_turns = pd.read_pickle("data/turns.pkl")
all_units = pd.read_pickle("data/units.pkl")

In [None]:
all_games.head(3)

## Detecting betrayal

Our goal here is **to detect betrayals within a game**, using the same definitions as in the paper we studied. Let's first recall a few definitions; we will then explore the dataset based on them.

### What are game actions?

Each player has **units** (one per supply center they control), which are moved using **orders**. There are 2 main kinds of orders:
- **support** order: two units join forces and become stronger. One player can support another.
- **move** order: move a unit somewhere. If it meets another player's unit, there is a **battle**.

### How to define relationships?

Let's follow the definitions given by the paper.

**Act of friendship**: when a player supports another.

**Act of hostility**: when a player invades another, or supports an invasion of the other player's territory.

**Friendship**: a relationship between two players spanning over 3 seasons and containing at least 2 **consecutive and reciprocated** acts of friendship.

**Betrayal** / **Broken friendship**: When, after being in a friendship, two players engage in at least 2 acts of hostility. 

### Additional information required for data-processing

It is important to understand the [rules of the game](https://www.playdiplomacy.com/help.php?sub_page=Game_Rules).

Here is a list of points, drawn from the rules, that we want to raise before starting to program.
- Each **year** is broken down into 2 **seasons**: {'Spring', 'Fall'}.
- Each **season** is itself divided into several phases, called **turns** (therefore, a year is made of at least 2 turns and at most 5):
    - **orders**: each player submits orders for all of its units (**hold**, **move**, **support** or **convoy**)
    - **retreats**: a phase that happens when some units (called **dislodged units**) need to retreat. If they can't, they are destroyed
    - **builds**: only happens after the *Fall retreats*. Players gain control of the SCS they are occupying.
- Geographically, the game is divided into **provinces**.
- Some provinces are called **supply centers** (SCS); to win, a **player** must control 18 supply centers.
- Each **unit** belongs to a **player**, and there can be **only 1 unit** in a province at a time; however, **units** can join forces with a **support order**.
- There are 2 types of **units**: {'F', 'A'} for {Fleet, Army}.
- Each **player** is characterized by its country, encoded by a letter: {E,F,I,G,A,T,R} standing for {England, France, Italy, Germany, Austria, Turkey, Russia}.
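
As a quick sanity check of the "2 to 5 turns per year" count, here is a minimal sketch that encodes the phases listed above. It assumes (our reading, not something stated by the dataset) that only the orders phase is mandatory in each season, while retreats and builds are skipped when there is nothing to retreat or build. `PHASES` and `turns_per_year` are hypothetical helpers.

```python
# Phases of each season as (phase name, mandatory?) pairs.
# Assumption: retreats and builds only happen when needed, so only "orders" is mandatory.
PHASES = {
    "Spring": [("orders", True), ("retreats", False)],
    "Fall": [("orders", True), ("retreats", False), ("builds", False)],
}

def turns_per_year(phases=PHASES):
    """Return (minimum, maximum) number of turns in one game year."""
    min_turns = sum(mandatory for season in phases.values() for _, mandatory in season)
    max_turns = sum(len(season) for season in phases.values())
    return min_turns, max_turns

print(turns_per_year())  # (2, 5)
```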

We also clarify the rows of 'all_orders' (i.e. the actual orders), because we will use them a lot and they can be hard to understand.
- orders are identified by a **game_id**, a **unit_id** and a **turn_num** (which makes sense, considering all the points above).
- each order has a field **location**, which is the province of origin of the unit.
- depending on the **unit_order**, the other fields are described below:

| unit_order | location                 | target                            | target_dest     |
| ---------- | ------------------------ | --------------------------------- | --------------- |
| MOVE       | initial loc. of the unit | loc. to move to                   | null            |
| HOLD       | initial loc. of the unit | null                              | null            |
| CONVOY     |                          | initial loc.                      | end goal loc.   |
| SUPPORT    |                          | loc. of unit to be supported      | its target loc. |
| BUILD      | ""                       | encoded string like 'army Berlin' |                 |
| RETREAT    | initial loc. of the unit | target loc                        |                 |
| DESTROY    | initial loc. of the unit |                                   |                 |
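
To make the table concrete, here is a hypothetical helper, `describe_order`, that turns one row of 'all_orders' into a readable sentence. It is only a sketch that uses the columns from the table (`unit_order`, `location`, `target`, `target_dest`); for SUPPORT and CONVOY it ignores `location`, since the table leaves that cell unspecified. The toy rows are made up, loosely modeled on the Vienna/Bohemia example studied below.

```python
import pandas as pd

def describe_order(row):
    """Render one order row as a sentence, following the field table above."""
    kind = row["unit_order"]
    target, dest = row["target"], row["target_dest"]
    if kind == "MOVE":
        return f"unit in {row['location']} moves to {target}"
    if kind == "HOLD":
        return f"unit in {row['location']} holds"
    if kind == "SUPPORT":
        # target_dest is empty when the supported unit is not moving
        if pd.isna(dest):
            return f"supports the unit in {target}"
        return f"supports the unit in {target} moving to {dest}"
    if kind == "CONVOY":
        return f"convoys the unit in {target} to {dest}"
    if kind == "BUILD":
        return f"builds {target}"
    if kind == "RETREAT":
        return f"unit in {row['location']} retreats to {target}"
    if kind == "DESTROY":
        return f"unit in {row['location']} is destroyed"
    return kind

# Made-up toy rows (not taken from the dataset).
toy = pd.DataFrame({
    "unit_order": ["MOVE", "SUPPORT", "BUILD"],
    "location": ["Vienna", "Galicia", ""],
    "target": ["Bohemia", "Vienna", "army Berlin"],
    "target_dest": [None, "Bohemia", None],
})
for sentence in toy.apply(describe_order, axis=1):
    print(sentence)
```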

### What the map looks like

<img src="img/map.png" width="900">

# Discovering the Dataset

Now that everything is defined, let's see what we can achieve in code. Since this is not straightforward, let's break it down and start by looking at a single game.

In [None]:
# extract one game
game = all_games.head(1)
game_id = game.iloc[0,0]
game

In [None]:
# for this game, extract turns, orders and units
turns = all_turns.query("game_id == {}".format(game_id))
orders = all_orders.query("game_id == {}".format(game_id))
units = all_units.query("game_id == {}".format(game_id))
orders.head()

## Finding **acts of friendship**

An act of friendship is, first of all, a support. But that is not enough: a player can support their own units (and that's not friendship). So we must also find the **last previous MOVE order** whose target is the support's **target** province. That order gives us a 'unit_id' (the unit that carried it out), and therefore the country that issued it.
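
The two-step lookup just described (support target, then last earlier MOVE into it, then its `unit_id`, then its country) can be sketched on toy data with boolean masks. The frames, their values and the `supported_country` helper are made up for illustration; only the column names follow the notebook. Like the notebook's `.tail(1)`, it assumes the orders are sorted by `turn_num`.

```python
import pandas as pd

# Made-up toy data using the notebook's column names.
toy_orders = pd.DataFrame({
    "unit_id":    [1, 2, 2],
    "turn_num":   [1, 1, 2],
    "unit_order": ["MOVE", "MOVE", "SUPPORT"],
    "target":     ["Vienna", "Galicia", "Vienna"],
})
toy_units = pd.DataFrame({"unit_id": [1, 2], "country": ["R", "A"]})

def supported_country(support, orders, units):
    """Country of the unit whose last MOVE before the support's turn targeted the supported province."""
    earlier_moves = orders[
        (orders["unit_order"] == "MOVE")
        & (orders["target"] == support["target"])
        & (orders["turn_num"] < support["turn_num"])
    ]
    if earlier_moves.empty:
        return None
    mover_id = earlier_moves["unit_id"].iloc[-1]  # most recent move (orders sorted by turn)
    return units.loc[units["unit_id"] == mover_id, "country"].item()

support = toy_orders.iloc[2]
supported = supported_country(support, toy_orders, toy_units)
supporter = toy_units.loc[toy_units["unit_id"] == support["unit_id"], "country"].item()
print(supported, supporter, supported != supporter)  # R A True
```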

In [None]:
# first we must look at the supports that happened in this game.
supports = orders.unit_order == "SUPPORT"
orders_w_supports = orders[supports]
orders_w_supports.sample(3)

In [None]:
# we want to find the countries of the supported units
# let's take one and see what we can do
support_order = orders_w_supports.iloc[-4]
# support_order = orders_w_supports.head(3).tail(1)
support_order

In [None]:
# Example: there is a support for the unit in 'Vienna' moving to 'Bohemia'
# in one of the previous turns, some unit must have moved into 'Vienna': let's find the last such move
target = support_order.target
turn_number = support_order.turn_num
move_order = orders.query("unit_order == 'MOVE' & target == '{}' & turn_num < {}".format(target, turn_number)).tail(1)
move_order

In [None]:
unit_id = move_order.unit_id.values[0]
move_unit = units.query("unit_id == {}".format(unit_id))
move_unit

We see that Russia was the country that last moved a unit there before the support happened. Hence, Russia is the supported country.

In [None]:
# Let's look at the country who did the support
unit_id = support_order.unit_id
support_unit = units.query("unit_id == {}".format(unit_id))
support_unit

As it turns out, this **is** an act of friendship: Russia's move from Vienna to Bohemia was supported by the Austrian unit in Galicia. As the map shows, this is consistent with the geographical positions of these provinces.

## A little data pre-processing won't hurt us

This is when we realize it would be useful to have one more piece of information about each order: **which country issued it?** This can be seen as a pre-processing step, since it only needs to be computed once.

As we will see later on, we also need the year and the season in which each order was issued: this lets us find friendships more precisely than if we used the year alone. Here is how we encode it:
- the spring of a year is written as the year itself, for instance 'Spring 1904' = 1904
- the fall of the same year is written as the year + 0.5, for instance 'Fall 1904' = 1904.5
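
A minimal sketch of this encoding as a plain function (`encode_season` is a hypothetical name). It mirrors the rule used in the next cell, where any season other than 'Fall' keeps the plain year. One benefit: 3 consecutive seasons always span exactly 1.0 in encoded units, so time windows become simple comparisons.

```python
def encode_season(year, season):
    """Encode (year, season) as a float: 'Fall' adds half a year, any other season keeps the year."""
    return year + 0.5 * (season == "Fall")

print(encode_season(1904, "Spring"))  # 1904.0
print(encode_season(1904, "Fall"))    # 1904.5
# Spring 1904, Fall 1904 and Spring 1905 span exactly 1.0
print(encode_season(1905, "Spring") - encode_season(1904, "Spring"))  # 1.0
```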

In [None]:
# compute the country who emitted each order 
countries = orders.unit_id.apply(lambda x: units[["unit_id", "country"]].query("unit_id == {}".format(x)).country.item())
# compute the 'encoded year' at which each order was passed
turns["season_encoded"] = turns["year"] + 0.5 * (turns["season"] == "Fall")
years = orders.turn_num.apply(lambda x: turns.query("turn_num == {}".format(x)).season_encoded.item())
# update the orders
orders["country"] = countries
orders["year"] = years
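
The cell above runs one `.query()` per order, which can get slow on the full dataset. A possible alternative, sketched here on made-up toy frames that reuse the notebook's column names, is to build one lookup Series per key and use `Series.map`. It assumes `unit_id` is unique in the units table and `turn_num` is unique in the turns table, which should hold within a single game. Note that, unlike `.item()`, `map` silently yields NaN for a missing key instead of raising.

```python
import pandas as pd

# Made-up toy frames with the notebook's column names.
toy_units = pd.DataFrame({"unit_id": [10, 11], "country": ["A", "R"]})
toy_turns = pd.DataFrame({"turn_num": [1, 2], "year": [1901, 1901], "season": ["Spring", "Fall"]})
toy_orders = pd.DataFrame({"unit_id": [10, 11, 10], "turn_num": [1, 1, 2]})

# One lookup Series per key (assumes unique unit_id / turn_num within a game).
country_by_unit = toy_units.set_index("unit_id")["country"]
season_by_turn = (
    toy_turns.assign(season_encoded=toy_turns["year"] + 0.5 * (toy_turns["season"] == "Fall"))
    .set_index("turn_num")["season_encoded"]
)

# Vectorized equivalents of the two row-wise .apply(...query...) calls.
toy_orders["country"] = toy_orders["unit_id"].map(country_by_unit)
toy_orders["year"] = toy_orders["turn_num"].map(season_by_turn)
print(toy_orders)
```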

Finally, we need one more piece of information: the 'target country'. When an order is issued, the province it targets is often held by another country, and knowing which one is essential to classify the order. Let's find it here.

In [None]:
def get_target_country(order):
    """
    Given a MOVE or SUPPORT order, return the country that controls the targeted province.
    The controller is taken from 'the last successful move (or retreat) towards the targeted
    province' before the order's turn.
    If no such move is found, return the string "None".
    If the order is neither MOVE nor SUPPORT, return an empty string.
    Note: this reads the notebook-level `orders` DataFrame (the current game).
    """
    if order.unit_order not in ["MOVE", "SUPPORT"]: return ""
    # get the successful moves/retreats to this province before this turn
    q = "(unit_order == 'MOVE' | unit_order == 'RETREAT') & success == 1 & target == '{}' & turn_num < {}".format(order.target, order.turn_num)
    last_move_orders = orders.query(q)
    if len(last_move_orders):
        # keep the most recent one
        last_move_order = last_move_orders.iloc[-1]
        return last_move_order.country
    else:
        # there never was a successful move toward this province
        return "None"

In [None]:
# get target country
targets = orders.apply(get_target_country, axis = 1)
orders["target_country"] = targets
orders.sample(5)

## Acts of **friendships** and Acts of **hostility**

### Finding **acts of friendship**

This function is just a rewrite of the code presented above, using the precomputed 'target_country' column.

Note that we don't add the query criterion 'success == 1', because it is not the outcome that defines the intention of an act: if you plan to attack someone and fail, you have still performed a hostile action toward the other player.

In [None]:
def is_order_act_of_friendship(support_order):
    """True if the order is a SUPPORT toward a province held by another (known) country."""
    # only interested in SUPPORT orders
    if support_order.unit_order != "SUPPORT": return False
    # no known country holds the target province
    if support_order.target_country == "None": return False
    # supporting one's own units is not an act of friendship
    return support_order.country != support_order.target_country

In [None]:
# flag every order of this game that is an act of friendship (boolean Series aligned with orders)
acts_of_friendships = orders.apply(is_order_act_of_friendship, axis = 1)
acts_of_friendships.value_counts()

Good news: it works. 'acts_of_friendships' is a boolean Series, aligned with 'orders', that tells whether each order is an act of friendship.

### Finding **acts of hostility**

The code is almost the same; only the logic is slightly tweaked.

We look at all **orders** whose **unit_order** is **MOVE** (*this is how an attack starts*) and whose **target** is a province holding another player's unit. Again, the tricky part is knowing whether another player's unit is in that province: we must look at the **previous orders** with the same **target**, and check whether they were made by a unit from **another country**. This is exactly what the precomputed 'target_country' column gives us.
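
Since 'target_country' is precomputed, both classifications can also be written as vectorized boolean masks instead of a row-wise `apply`. Here is a sketch on a made-up toy frame (values are illustrative only), following the rules described in this section, including the string sentinel "None" for an unknown occupant:

```python
import pandas as pd

# Made-up toy orders with country and target_country already filled in
# ("None" = no known occupant, "" = order is neither MOVE nor SUPPORT).
toy = pd.DataFrame({
    "unit_order":     ["SUPPORT", "SUPPORT", "MOVE", "MOVE", "MOVE", "HOLD"],
    "country":        ["A", "A", "I", "I", "R", "R"],
    "target_country": ["R", "A", "A", "None", "R", ""],
})

# The target province is held by a known country other than the issuer.
other_known = (toy["target_country"] != "None") & (toy["country"] != toy["target_country"])
friendly = (toy["unit_order"] == "SUPPORT") & other_known  # act of friendship
hostile = (toy["unit_order"] == "MOVE") & other_known      # act of hostility
print(pd.DataFrame({"friendly": friendly, "hostile": hostile}))
```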

In [None]:
def is_move_act_of_hostility(move_order):
    """True if the order is a MOVE toward a province held by another (known) country."""
    # only interested in MOVE orders
    if move_order.unit_order != "MOVE": return False
    # no known country holds the target province
    if move_order.target_country == "None": return False
    # moving into one's own province is not an attack
    return move_order.country != move_order.target_country

In [None]:
acts_of_hostility = orders.apply(is_move_act_of_hostility, axis = 1)
acts_of_hostility.value_counts()

## Who is friends with whom

Now that we have classified our orders, let's find friends within our game. Coming back to our definition:

> Friendship is a relationship between two players spanning over 3 seasons and containing at least 2 consecutive and reciprocated acts of friendship.

The difficulty comes from identifying the country that received the support. The 'get_target_country' function defined above already solves this problem for both the acts of friendship and the acts of hostility: in both cases, it is the **target province** that matters.

In [None]:
friendly_orders = orders[acts_of_friendships]
hostile_orders = orders[acts_of_hostility]

friendly_orders.head()

For each order in the set above, with X the 'supporting country' and Y the 'supported country', we check whether the last 2 years contain:
- at least 1 previous order with 'X supports Y'
- at least 2 previous orders with 'Y supports X'
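
A minimal sketch of these two criteria for a single act of friendship, on made-up toy data. The helper name and the parameters `window`, `min_same` and `min_other` are ours. It uses the half-year encoding of 'year', a window of 2 years ending at the order's season, and excludes the order itself from the counts (since the criteria talk about *previous* orders).

```python
import pandas as pd

def is_friendship_order(order_idx, friendly, window=2.0, min_same=1, min_other=2):
    """Check the criteria above for the act of friendship X -> Y at index `order_idx`.

    friendly: acts of friendship with columns country, target_country, year,
    where year is encoded as in the notebook (Fall = year + 0.5).
    """
    o = friendly.loc[order_idx]
    x, y = o["country"], o["target_country"]
    # orders in the last `window` years, up to and including the order's season
    in_window = (friendly["year"] > o["year"] - window) & (friendly["year"] <= o["year"])
    # "previous" orders: exclude the order itself
    others = friendly[in_window & (friendly.index != order_idx)]
    same_side = ((others["country"] == x) & (others["target_country"] == y)).sum()
    other_side = ((others["country"] == y) & (others["target_country"] == x)).sum()
    return bool(same_side >= min_same and other_side >= min_other)

# Made-up toy data: Russia and Austria support each other, Italy supports Austria once.
friendly = pd.DataFrame({
    "country":        ["R", "A", "R", "A", "I"],
    "target_country": ["A", "R", "A", "R", "A"],
    "year":           [1901.0, 1901.5, 1902.0, 1902.5, 1902.5],
})
print(is_friendship_order(3, friendly))  # True
print(is_friendship_order(4, friendly))  # False
```

With these toy rows, the last Austria to Russia support qualifies (1 earlier A to R support, 2 R to A supports), whereas the isolated Italy to Austria support does not.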

Such acts of friendship are part of a **friendship**.

Let's create a table that summarizes all friendships and potential betrayals, and then fill it with the data extracted so far.

In [None]:
def get_countries_code(x,y):
    """Given two country letters, return them concatenated in alphabetical order (e.g. 'R', 'A' gives 'AR')."""
    return ''.join(sorted(x + y))

In [None]:
countries = ['A', 'E', 'F', 'G', 'I', 'R', 'T']
pairs = [x+y for x in countries for y in countries if y > x]
years = np.arange(1901, friendly_orders.year.max() + 0.5, 0.5)
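
The list comprehension above builds the 21 unordered pairs of the 7 countries (7 * 6 / 2), each written with its letters in alphabetical order, so that `get_countries_code` always maps a pair of countries onto one of these column names. A quick self-contained check with `itertools.combinations` (it redefines the same objects):

```python
from itertools import combinations

countries = ['A', 'E', 'F', 'G', 'I', 'R', 'T']
pairs = [x + y for x in countries for y in countries if y > x]

def get_countries_code(x, y):
    return ''.join(sorted(x + y))

# combinations() on a sorted list yields each unordered pair once, in the same order
assert pairs == [x + y for x, y in combinations(countries, 2)]
# every pair of distinct countries maps onto one of the column names
assert all(get_countries_code(x, y) in pairs for x in countries for y in countries if x != y)
print(len(pairs))  # 21
```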

In [None]:
# minimum number of acts of friendship (in either direction) to validate the friendship
N = 4
# create a DF with one column per pair of countries and one row per encoded season
friendships = pd.DataFrame(0, index=years, columns=pairs, dtype=np.int8)
for i, o in friendly_orders.iterrows():
    x = o.country
    y = o.target_country
    year = o.year
    # skipping one direction would not lose friendships: the query below looks at both directions
    # if y > x: continue
    # acts of friendship between x and y, in either direction, over the last 3 seasons
    q = "(year > @year - 1.5 & year <= @year) & ((country == @x & target_country == @y) | (country == @y & target_country == @x))"
    query = friendly_orders.query(q)
    if len(query) >= N:
        # those two are friends :)
        code = get_countries_code(x, y)
        print(len(query), "matches for", code, "in", o.year)
        # mark the 3-season window covered by the query
        friendships.loc[o.year - 1:o.year, code] = 1

cols = [col for col in friendships.columns if np.count_nonzero(friendships[col] != 0)]
friendships[cols]

In [None]:
# minimum number of supports required in each direction
N = 1
# object dtype: the betrayal cell below stores a country letter in this table
friendships = pd.DataFrame(0, index=years, columns=pairs, dtype=object)
for i, o in friendly_orders.iterrows():
    x = o.country
    y = o.target_country
    # handle each pair from one side only: both directions are checked below anyway
    if y > x: continue
    # criterion 1: at least N supports from y to x during the last 2 years
    query1 = friendly_orders.query("year > {} & year <= {} & country == '{}' & target_country == '{}'".format(o.year-2, o.year, y, x))
    if len(query1) >= N:
        # criterion 2: at least N supports from x to y in the same window (this count includes o itself)
        query2 = friendly_orders.query("year > {} & year <= {} & country == '{}' & target_country == '{}'".format(o.year-2, o.year, x, y))
        if len(query2) >= N:
            # those two are friends :)
            code = get_countries_code(x, y)
            friendships.loc[o.year-N:o.year, code] = 1
cols = [col for col in friendships.columns if np.count_nonzero(friendships[col] != 0)]
friendships[cols]

In this game, Russia and Austria are friends, and this seems to hold from the beginning until the end. What about Italy and Austria? What happened to their friendship after 1905?

We must also examine the hostile orders to see whether a betrayal happened.
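
Per the definition, a betrayal requires at least 2 acts of hostility once the two players are friends. Here is a hedged sketch of that check for one pair, on made-up toy data. The `find_betrayal` helper and its `since`/`min_acts` parameters are hypothetical, and the sketch assumes the season from which the pair is friends is already known (e.g. read from the table above). It returns the season of the first hostile act and the country that committed it.

```python
import pandas as pd

def find_betrayal(hostile, x, y, since, min_acts=2):
    """Return (season, betrayer) for the first hostile act between x and y at or after
    `since`, provided the pair exchanges at least `min_acts` such acts; otherwise None.

    hostile: acts of hostility with columns country, target_country, year (encoded seasons).
    """
    between = (
        ((hostile["country"] == x) & (hostile["target_country"] == y))
        | ((hostile["country"] == y) & (hostile["target_country"] == x))
    )
    acts = hostile[between & (hostile["year"] >= since)].sort_values("year")
    if len(acts) < min_acts:
        return None
    first = acts.iloc[0]
    return float(first["year"]), first["country"]

# Made-up toy data: two Italian attacks on Austria, one unrelated Russian move.
hostile = pd.DataFrame({
    "country":        ["I", "R", "I"],
    "target_country": ["A", "T", "A"],
    "year":           [1905.0, 1903.0, 1905.5],
})
print(find_betrayal(hostile, "A", "I", since=1902.0))              # (1905.0, 'I')
print(find_betrayal(hostile, "A", "I", since=1902.0, min_acts=3))  # None
```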

In [None]:
for i, o in hostile_orders.iterrows():
    x = o.country
    y = o.target_country
    code = get_countries_code(x, y)
    # guard: hostile orders may fall in seasons not covered by the friendships table
    if o.year in friendships.index and friendships.loc[o.year, code]:
        # x attacks y while they are friends: a betrayal may have happened!
        # the definition requires at least 2 acts of hostility, so count them
        # in each direction from this season on (printed for inspection, not enforced)
        query1 = hostile_orders.query("year >= {} & country == '{}' & target_country == '{}' ".format(o.year, x, y))
        query2 = hostile_orders.query("year >= {} & country == '{}' & target_country == '{}' ".format(o.year, y, x))
        # record the betrayer in this season, and end the friendship afterwards
        friendships.loc[o.year, code] = x
        friendships.loc[o.year+0.5:, code] = 0
        print(x, "attacks", y, "in", o.year)
        print("hostile acts from", x, "to", y, "since then:", len(query1))
        print("hostile acts from", y, "to", x, "since then:", len(query2))

# keep only the columns where there is something to look at
cols = [col for col in friendships.columns if np.count_nonzero(friendships[col] != 0)]
friendships[cols]


We can see here what happened between Austria and Italy: in Spring 1905, Italy betrayed Austria, which is why their friendship ended.