<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Starting-State" data-toc-modified-id="Starting-State-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Starting State</a></span></li><li><span><a href="#Middle-State-(1)" data-toc-modified-id="Middle-State-(1)-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Middle State (1)</a></span></li><li><span><a href="#Middle-State-(2)" data-toc-modified-id="Middle-State-(2)-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Middle State (2)</a></span></li><li><span><a href="#Final-State" data-toc-modified-id="Final-State-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Final State</a></span></li><li><span><a href="#Other-tests" data-toc-modified-id="Other-tests-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Other tests</a></span></li></ul></div>

In [1]:
import json
import math

import numpy as np
import pandas as pd


# Records analysed below: 10 (start of the game; most actions are indistinguishable), 36 and 44 (middle of the game), 52 (terminal state)

In [2]:
class State:
    def __init__(self, features_dict):
        self.features_dict = features_dict
        
    def distance(self, state):
        # Euclidean distance between the two feature vectors
        distance = 0
        for key, feature in self.features_dict.items():
            distance += (state.features_dict[key] - feature) ** 2

        return math.sqrt(distance)
    
    def feature_difference(self, state):
        # Signed change of every feature: this state minus the given (baseline) state
        difference_dict = {}
        for key, feature in self.features_dict.items():
            difference_dict[key] = feature - state.features_dict[key]

        return difference_dict

    def percentage_difference(self, state):
        # Change relative to the absolute value of the baseline state's feature;
        # 'infinite' when the baseline value is (almost) zero
        percentage_dict = {}
        for key, feature in self.features_dict.items():
            difference = feature - state.features_dict[key]
            if abs(state.features_dict[key]) < 0.0001:
                percentage_dict[key] = 'infinite'
            else:
                percentage_dict[key] = 100 * difference / abs(state.features_dict[key])
        return percentage_dict
        
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.features_dict == other.features_dict
        else:
            return False
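In both comparison methods, `self` is the current state and the argument is the baseline: `feature_difference` returns current minus baseline, and `percentage_difference` divides that change by the baseline's absolute value, falling back to `'infinite'` when the baseline is (almost) zero. The standalone sketch below reproduces that arithmetic on plain dictionaries so it can be checked without the game records. The free functions only mirror the class methods, and the feature values are hypothetical.

```python
import math


def feature_difference(current, baseline):
    """Signed change of every feature: current minus baseline."""
    return {key: value - baseline[key] for key, value in current.items()}


def percentage_difference(current, baseline, eps=1e-4):
    """Change relative to |baseline|, or 'infinite' when the baseline is (nearly) zero."""
    result = {}
    for key, value in current.items():
        base = baseline[key]
        if abs(base) < eps:
            result[key] = "infinite"
        else:
            result[key] = 100 * (value - base) / abs(base)
    return result


def distance(current, baseline):
    """Euclidean distance between two feature dictionaries with the same keys."""
    return math.sqrt(sum((current[key] - baseline[key]) ** 2 for key in current))


# Hypothetical feature snapshots (illustrative values, not taken from the records)
root = {"SCORE": 0.8, "NO_BOXES": 4.0, "THREE_BOXES": 0.0}
leaf = {"SCORE": 0.9, "NO_BOXES": 2.0, "THREE_BOXES": 1.0}

print(feature_difference(leaf, root))     # SCORE about 0.1, NO_BOXES -2.0, THREE_BOXES 1.0
print(percentage_difference(leaf, root))  # SCORE about 12.5, NO_BOXES -50.0, THREE_BOXES 'infinite'
print(distance(leaf, root))               # sqrt(0.01 + 4 + 1), about 2.238
```

A baseline of zero (`THREE_BOXES` here) is why some phrases in the explanations below report only a direction, such as "THREE_BOXES increase", without a percentage.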

In [3]:
def get_action_path(df, node_name=None, action_name=None, is_max=True, state_col='Game_Features'):
    # Walk the search tree from the root: first restrict the root's children to node_name or
    # action_name (if given), then repeatedly follow the child with the highest (is_max=True)
    # or lowest (is_max=False) Value until a leaf is reached.
    # Returns a list of (Name, Action_Name, State) tuples, starting with the root node.
    action_sequences = []
    root_node = df[df['Parent_Name'] == 'None'].iloc[0]
    action_sequences.append((root_node.Name, root_node.Action_Name, State(json.loads(root_node[state_col]))))
    if node_name:
        filter_df = df[(df['Parent_Name'] == root_node.Name) & (df['Name'] == node_name)]
    elif action_name:
        filter_df = df[(df['Parent_Name'] == root_node.Name) & (df['Action_Name'] == action_name)]
    else:
        filter_df = df[df['Parent_Name'] == root_node.Name]

    while not filter_df.empty:
        # Series.argmax/argmin return positions, hence iloc
        if is_max:
            current_row = filter_df.iloc[filter_df.Value.argmax()]
        else:
            current_row = filter_df.iloc[filter_df.Value.argmin()]
        action_sequences.append((current_row.Name, current_row.Action_Name, State(json.loads(current_row[state_col]))))
        filter_df = df[df['Parent_Name'] == current_row.Name]

    return action_sequences
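`get_action_path` fixes the first move and then descends greedily by `Value`. The sketch below reproduces that descent with the standard library only. It runs on a hypothetical list of row dictionaries that mimics the `Name`, `Parent_Name`, `Action_Name` and `Value` columns, and `greedy_path` is an illustrative name, not part of the notebook. Like `Series.argmax`, Python's `max`/`min` return the first candidate when values tie.

```python
def greedy_path(rows, first_action, is_max=True):
    """Return node names from the root, taking first_action and then the max/min-Value child each step."""
    # Index the rows by parent name so each node's children can be looked up directly
    children = {}
    for row in rows:
        children.setdefault(row["Parent_Name"], []).append(row)

    root = children["None"][0]
    path = [root["Name"]]
    # Restrict the root's children to the requested first action
    candidates = [r for r in children.get(root["Name"], []) if r["Action_Name"] == first_action]
    choose = max if is_max else min
    while candidates:
        node = choose(candidates, key=lambda r: r["Value"])
        path.append(node["Name"])
        candidates = children.get(node["Name"], [])
    return path


# Hypothetical search tree: Node_1 is the root, Node_2 and Node_3 are its children
rows = [
    {"Name": "Node_1", "Parent_Name": "None", "Action_Name": None, "Value": 0.4},
    {"Name": "Node_2", "Parent_Name": "Node_1", "Action_Name": "a", "Value": 1.2},
    {"Name": "Node_3", "Parent_Name": "Node_1", "Action_Name": "b", "Value": 0.2},
    {"Name": "Node_4", "Parent_Name": "Node_2", "Action_Name": "c", "Value": 0.9},
    {"Name": "Node_5", "Parent_Name": "Node_2", "Action_Name": "d", "Value": 1.5},
]

print(greedy_path(rows, "a"))                # ['Node_1', 'Node_2', 'Node_5']
print(greedy_path(rows, "a", is_max=False))  # ['Node_1', 'Node_2', 'Node_4']
```

The two calls correspond to the "best situation" and "worst situation" paths that the explanation functions compare for the same first action.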

In [4]:
def get_path_feature_difference(path, abs_tol=0.0001):
    # Compare the last state of the path with the first (root) state.
    # Features whose absolute change is not larger than abs_tol are treated as unchanged.
    first_state, last_state = path[0][-1], path[-1][-1]
    feature_difference = last_state.feature_difference(first_state)
    percentage_difference = last_state.percentage_difference(first_state)
    feature_differences = {}
    percentage_differences = {}

    for name, difference in feature_difference.items():
        if abs(difference) > abs_tol:
            feature_differences[name] = difference
            percentage_differences[name] = percentage_difference[name]

    return feature_differences, percentage_differences
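The tolerance matters because feature values are floats. Differences such as `SCORE 0.1499999999999999` in the Middle State (1) output carry subtraction noise, and an unchanged feature can still leave a tiny residue. Here is a minimal sketch of the same filter on a plain dictionary; the name `significant_changes` and the values are hypothetical.

```python
def significant_changes(differences, tol=1e-4):
    """Keep only features whose absolute change is larger than tol."""
    return {name: diff for name, diff in differences.items() if abs(diff) > tol}


# Hypothetical raw differences between the last and the first state of a path
raw = {
    "SCORE": 0.9 - 0.8,     # a real change, slightly off 0.1 because of float rounding
    "ORDINAL": 0.5 - 0.5,   # unchanged feature
    "ROUND_NOISE": 1e-12,   # numerical residue, treated as unchanged
    "TWO_BOXES": -2.0,
}
kept = significant_changes(raw)
print(kept)  # only SCORE and TWO_BOXES remain
```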

In [5]:
def generate_explanation_list(features_differences, percentage_differences,
                              features=None, percentage_differences_B=None):
    # Build one phrase per feature, e.g. "SCORE increase by 12.5%".
    # If percentage_differences_B (a second path) is given, the two percentages are averaged
    # and the phrase says "around"; an 'infinite' percentage only reports the direction.
    if features is None:
        features = features_differences.keys()

    feature_explanation = []

    for feature in features:
        feature_difference = features_differences[feature]
        percentage_difference = percentage_differences[feature]
        addition = ""
        if percentage_differences_B and percentage_difference != 'infinite':
            if percentage_differences_B[feature] == 'infinite':
                percentage_difference = 'infinite'
            else:
                percentage_difference = (percentage_difference + percentage_differences_B[feature]) / 2
                addition = 'around '

        direction = "increase" if feature_difference > 0 else "decrease"
        if percentage_difference != 'infinite':
            feature_explanation.append(f"{feature} {direction} by {addition}{abs(percentage_difference):.1f}%")
        else:
            feature_explanation.append(f"{feature} {direction}")

    return feature_explanation

def explanation_by_paths(df):
    # Get the root node name and action space
    root_node_name = df[df['Parent_Name'] == 'None'].iloc[0].Name
    action_space = df[df.Parent_Name == root_node_name].shape[0]
    print(f"There are {action_space} actions available for this state.")
    
    # Check the values of all possible actions; if they are all equal, the choice is effectively random
    if len(df[df.Parent_Name == root_node_name].Value.unique()) == 1:
        print("The value of every action is the same.")

    # Get the best action name
    best_action_name = df[df['Parent_Name'] == 'None'].Best_Action.values[0]
    
    # Get the path that will result in the highest reward and state difference between root node and last node
    best_action_paths = get_action_path(df, action_name=best_action_name)
    ba_feature_differences, ba_percentage_differences = get_path_feature_difference(best_action_paths)
    
    # Get the number of executed actions in the path
    actions_number = len(best_action_paths) - 1
    print(f"The explanation is based on the result of executing {actions_number} action(s) by the agent.")
    
    # Get the path that will result in the lowest reward and state difference between root node and last node
    worse_action_paths = get_action_path(df, action_name=best_action_name, is_max=False)
    worse_action_paths = worse_action_paths[:len(best_action_paths)]
    wa_feature_differences, wa_percentage_differences  = get_path_feature_difference(worse_action_paths)
    
    # Generate the explanation
    # Situation 1: No difference between the best value path and the worst value path
    if ba_feature_differences == wa_feature_differences:
        feature_explanation = generate_explanation_list(ba_feature_differences, ba_percentage_differences)
        print(f"After executing {best_action_name}, I expected {', '.join(feature_explanation[:-1])} and {feature_explanation[-1]} in the future state.")
        return
    
    # Situation 2: Difference between the best value path and the worst value path
    # Classify the feautres
    common_features = set()
    for feature in ba_feature_differences.keys() & wa_feature_differences.keys():
        ba_differences = ba_feature_differences[feature]
        wa_differences = wa_feature_differences[feature]
        if ba_differences == 'infinite' or wa_differences == 'infinite':
            if ba_differences == wa_differences:
                common_features.add(feature)
        elif (ba_differences > 0 and wa_differences > 0) or (ba_differences < 0 and wa_differences < 0):
            common_features.add(feature)
    ba_exlcude_features = ba_feature_differences.keys() - common_features
    wa_exlcude_features = wa_feature_differences.keys() - common_features
    
    feature_explanation = generate_explanation_list(ba_feature_differences, ba_percentage_differences, common_features, wa_percentage_differences)
    best_feature_explanation = generate_explanation_list(ba_feature_differences, ba_percentage_differences, ba_exlcude_features)
    worse_feature_explanation = generate_explanation_list(wa_feature_differences, wa_percentage_differences, wa_exlcude_features)
                   
    # Combine explanations based on different situations
    union_sentence = f"{', '.join(feature_explanation[:-1])} and {feature_explanation[-1]}" if len(feature_explanation) > 1 else feature_explanation[0]
    print(f"After executing {best_action_name}, I expect {union_sentence} in the future state.", end=" ")

    if len(best_feature_explanation):
        best_sentence = f"{', '.join(best_feature_explanation[:-1])} and {best_feature_explanation[-1]}" if len(best_feature_explanation) > 1 else best_feature_explanation[0]
        print(f"In the best situation, I also expect {best_sentence}.", end=" ")

    if len(worse_feature_explanation):
        worst_sentence = f"{', '.join(worse_feature_explanation[:-1])} and {worse_feature_explanation[-1]}" if len(worse_feature_explanation) > 1 else worse_feature_explanation[0]
        print(f"However, I may expect {worst_sentence} in the worst situation.")
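Each phrase that `generate_explanation_list` emits follows a small template, and the sentence-building code above joins phrases as "a, b and c". The sketch below isolates both steps. `describe` and `join_with_and` are hypothetical helper names, and reporting only the direction whenever either path's percentage is `'infinite'` is an assumption of this sketch.

```python
def describe(feature, difference, pct, pct_other=None):
    """Phrase one feature change; average two percentages (prefixed 'around') when pct_other is given."""
    direction = "increase" if difference > 0 else "decrease"
    if pct == "infinite" or pct_other == "infinite":
        # The baseline was ~0, so only the direction of the change is reported
        return f"{feature} {direction}"
    if pct_other is None:
        return f"{feature} {direction} by {abs(pct):.1f}%"
    return f"{feature} {direction} by around {abs((pct + pct_other) / 2):.1f}%"


def join_with_and(phrases):
    """Join phrases as 'a', 'a and b' or 'a, b and c'; return '' when there are none."""
    phrases = list(phrases)
    if not phrases:
        return ""
    if len(phrases) == 1:
        return phrases[0]
    return f"{', '.join(phrases[:-1])} and {phrases[-1]}"


# Hypothetical feature changes
phrases = [
    describe("SCORE", 0.1, 12.5),
    describe("NO_BOXES", -2.0, -50.0, -30.0),
    describe("THREE_BOXES", 1.0, "infinite"),
]
print(join_with_and(phrases))
# SCORE increase by 12.5%, NO_BOXES decrease by around 40.0% and THREE_BOXES increase
```

Factoring the join into one helper would also replace the repeated inline conditional expressions in both explanation functions.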

In [6]:
def counterfactual_explanation_by_paths(df, action_name=None):
    # Get the root node name and action space
    root_node_name = df[df['Parent_Name'] == 'None'].iloc[0].Name
    
    # Get children list
    children_df = df[df['Parent_Name'] == root_node_name]
    children_list = children_df.Action_Name.values
    if len(children_list) == 1:
        print("There is only one action left for this state. Unable to generate counterfactual explanation.")
        return
    
    print(f"There are {len(children_list)} actions available for this state.")
    
    # Check the values of all possible actions; if they are all equal, the choice is effectively random
    if len(df[df.Parent_Name == root_node_name].Value.unique()) == 1:
        print("The value of every action is the same.")

    # Get the best action name
    best_action_name = df[df['Parent_Name'] == 'None'].Best_Action.values[0]

    # Get the action to compare against
    if action_name:
        # If the provided action_name cannot be used, print the reason and return
        if action_name not in children_list:
            print(f"The selected action {action_name} is not available for this state.")
            return
        if action_name == best_action_name:
            print(f"The selected action {action_name} is already the best action; select a different action to compare.")
            return
    else:
        # Use the second-highest-valued action if action_name is not provided
        orders = children_df.Value.argsort().values[::-1]
        action_name = children_df.iloc[orders[1]].Action_Name
        # Exception case: the best action sits in second position (e.g. when all values tie),
        # so the top-ranked action is the one to compare against
        if action_name == best_action_name:
            action_name = children_df.iloc[orders[0]].Action_Name
                
    print(f"Counterfactual explanation based on {best_action_name} and {action_name}")
    
    # Get the best value path of the best action
    best_action_paths = get_action_path(df, action_name=best_action_name)

    # Get the best value path of the selected action (or the second-highest-valued action)
    comparison_action_paths = get_action_path(df, action_name=action_name)

    # Truncate both paths to the length of the shorter one
    path_len = min(len(best_action_paths), len(comparison_action_paths))
    best_action_paths = best_action_paths[:path_len]
    comparison_action_paths = comparison_action_paths[:path_len]

    ba_feature_differences, ba_percentage_differences = get_path_feature_difference(best_action_paths)
    ca_feature_differences, ca_percentage_differences = get_path_feature_difference(comparison_action_paths)

    # Report the number of executed actions in the (truncated) paths
    print(f"The explanation is based on the result of executing {path_len - 1} action(s) by the agent.")

    # Generate the explanation
    # Situation 1: no difference between the two paths
    if ba_feature_differences == ca_feature_differences:
        feature_explanation = generate_explanation_list(ba_feature_differences, ba_percentage_differences)
        if not feature_explanation:
            print(f"After executing {best_action_name} or {action_name}, I do not expect any feature to change.")
            return
        sentence = f"{', '.join(feature_explanation[:-1])} and {feature_explanation[-1]}" if len(feature_explanation) > 1 else feature_explanation[0]
        print(f"After executing {best_action_name} or {action_name}, I expect {sentence} in the future state.")
        return

    # Situation 2: difference between the two paths
    # Classify the features: a feature is common when both paths change it in the same direction
    common_features = {
        feature for feature in ba_feature_differences.keys() & ca_feature_differences.keys()
        if ba_feature_differences[feature] * ca_feature_differences[feature] > 0
    }
    ba_exclude_features = ba_feature_differences.keys() - common_features
    ca_exclude_features = ca_feature_differences.keys() - common_features

    if common_features:
        feature_explanation = generate_explanation_list(ba_feature_differences, ba_percentage_differences, common_features, ca_percentage_differences)
        union_sentence = f"{', '.join(feature_explanation[:-1])} and {feature_explanation[-1]}" if len(feature_explanation) > 1 else feature_explanation[0]
        print(f"After executing {best_action_name} or {action_name}, I expect {union_sentence} in the future state.", end=" ")

    if ba_exclude_features:
        best_feature_explanation = generate_explanation_list(ba_feature_differences, ba_percentage_differences, ba_exclude_features)
        best_sentence = f"{', '.join(best_feature_explanation[:-1])} and {best_feature_explanation[-1]}" if len(best_feature_explanation) > 1 else best_feature_explanation[0]
        print(f"By executing {best_action_name}, I also expect {best_sentence}.", end=" ")

    if ca_exclude_features:
        comparison_feature_explanation = generate_explanation_list(ca_feature_differences, ca_percentage_differences, ca_exclude_features)
        comparison_sentence = f"{', '.join(comparison_feature_explanation[:-1])} and {comparison_feature_explanation[-1]}" if len(comparison_feature_explanation) > 1 else comparison_feature_explanation[0]
        print(f"By executing {action_name}, I will expect {comparison_sentence}.")
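Both explanation functions split the changed features the same way. A feature is "common" when both paths move it in the same direction, and every other changed feature belongs to whichever path changed it. Because near-zero differences are already filtered out, a positive product of the two differences is a sufficient same-direction test. Here is a standalone sketch with hypothetical differences; `classify` is an illustrative name.

```python
def classify(first, second):
    """Split changed features into (common, first_only, second_only) sets."""
    # Same direction on both paths <=> the product of the two non-zero differences is positive
    common = {f for f in first.keys() & second.keys() if first[f] * second[f] > 0}
    return common, set(first) - common, set(second) - common


# Hypothetical path differences (illustrative values only)
best = {"SCORE": 0.1, "TWO_BOXES": -1.0, "THREE_BOXES": -1.0}
other = {"SCORE": 0.2, "TWO_BOXES": -3.0, "THREE_BOXES": 2.0, "NO_BOXES": -1.0}

common, best_only, other_only = classify(best, other)
print(sorted(common))      # ['SCORE', 'TWO_BOXES']
print(sorted(best_only))   # ['THREE_BOXES']
print(sorted(other_only))  # ['NO_BOXES', 'THREE_BOXES']
```

A feature that moves in opposite directions, like `THREE_BOXES` here, lands in both "only" sets. This is why the same feature can appear in both the best-case and the worst-case sentence, as THREE_BOXES does in the Middle State (1) explanation.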

## Starting State

In [7]:
df = pd.read_csv("DotsAndBoxes/record_10.csv", sep='\t')
best_action_name = df[df['Parent_Name'] == 'None'].Best_Action.values[0]
df.head()

Unnamed: 0,Depth,Name,Value,Visits,Parent_Name,Game_State,Game_Features,Game_State_Heuristic,Action_Name,Best_Action
0,0,Node_1,0.0,263,,"{""Edge_Owner_6061"":-1,""Edge_Owner_6263"":-1,""Ed...","{""SCORE"":0.05,""SCORE_ADV"":0.1,""ORDINAL"":0.5,""O...",1.0,,"(0,1) -> (0,2)"
1,1,Node_2,0.0,4,Node_1,"{""Edge_Owner_6061"":-1,""Edge_Owner_6263"":-1,""Ed...","{""SCORE"":0.05,""SCORE_ADV"":0.1,""ORDINAL"":0.5,""O...",1.0,"(0,1) -> (0,2)","(7,2) -> (7,3)"
2,1,Node_3,0.0,4,Node_1,"{""Edge_Owner_6061"":-1,""Edge_Owner_6263"":-1,""Ed...","{""SCORE"":0.05,""SCORE_ADV"":0.1,""ORDINAL"":0.5,""O...",1.0,"(0,3) -> (0,4)","(1,2) -> (2,2)"
3,1,Node_4,0.0,4,Node_1,"{""Edge_Owner_6061"":-1,""Edge_Owner_6263"":-1,""Ed...","{""SCORE"":0.05,""SCORE_ADV"":0.1,""ORDINAL"":0.5,""O...",1.0,"(0,0) -> (1,0)","(3,4) -> (4,4)"
4,1,Node_5,0.0,4,Node_1,"{""Edge_Owner_6061"":-1,""Edge_Owner_6263"":-1,""Ed...","{""SCORE"":0.05,""SCORE_ADV"":0.1,""ORDINAL"":0.5,""O...",1.0,"(0,2) -> (1,2)","(2,0) -> (2,1)"


In [8]:
best_action_paths = get_action_path(df, action_name='(0,3) -> (0,4)')
best_action_path_differences, _ = get_path_feature_difference(best_action_paths)
print(f"Number of executed actions: {len(best_action_paths) - 1}")
for feature, difference in best_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 2
NO_BOXES -4.0
ONE_BOXES 3.0
TWO_BOXES 1.0


In [9]:
worse_action_paths = get_action_path(df, action_name=best_action_name, is_max=False)
worse_action_path_differences, _ = get_path_feature_difference(worse_action_paths)
print(f"Number of executed actions: {len(worse_action_paths) - 1}")
for feature, difference in worse_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 2
NO_BOXES -3.0
TWO_BOXES 3.0


In [10]:
explanation_by_paths(df)

There are 65 actions available for this state.
The value of every action is the same.
The explanation is based on the result of executing 2 action(s) by the agent.
After executing (0,1) -> (0,2), I expect NO_BOXES decrease by 21.4% and TWO_BOXES increase by 60.0% in the future state.


In [11]:
children = df[df['Parent_Name'] == 'Node_1'].Action_Name.values
children

array(['(0,1) -> (0,2)', '(0,3) -> (0,4)', '(0,0) -> (1,0)',
       '(0,2) -> (1,2)', '(0,4) -> (1,4)', '(1,2) -> (1,3)',
       '(1,4) -> (1,5)', '(1,1) -> (2,1)', '(1,3) -> (2,3)',
       '(1,5) -> (2,5)', '(2,2) -> (3,2)', '(3,0) -> (3,1)',
       '(3,2) -> (3,3)', '(3,4) -> (3,5)', '(3,3) -> (4,3)',
       '(3,5) -> (4,5)', '(4,1) -> (4,2)', '(4,0) -> (5,0)',
       '(4,2) -> (5,2)', '(4,4) -> (5,4)', '(5,0) -> (5,1)',
       '(5,2) -> (5,3)', '(5,4) -> (5,5)', '(5,1) -> (6,1)',
       '(5,3) -> (6,3)', '(5,5) -> (6,5)', '(6,1) -> (6,2)',
       '(6,3) -> (6,4)', '(6,0) -> (7,0)', '(6,2) -> (7,2)',
       '(7,0) -> (7,1)', '(7,2) -> (7,3)', '(7,4) -> (7,5)',
       '(0,1) -> (1,1)', '(0,3) -> (1,3)', '(1,1) -> (1,2)',
       '(1,3) -> (1,4)', '(1,0) -> (2,0)', '(1,2) -> (2,2)',
       '(2,0) -> (2,1)', '(2,2) -> (2,3)', '(2,4) -> (2,5)',
       '(2,1) -> (3,1)', '(2,5) -> (3,5)', '(3,1) -> (3,2)',
       '(3,0) -> (4,0)', '(3,2) -> (4,2)', '(3,4) -> (4,4)',
       '(4,4) -> (4,5)',

In [12]:
counterfactual_explanation_by_paths(df)

There are 65 actions available for this state.
The value of every action is the same.
Counterfactual explanation based on (0,1) -> (0,2) and (7,2) -> (7,3)
The explanation is based on the result of executing 2 action(s) by the agent.
After executing (0,1) -> (0,2) or (7,2) -> (7,3), I expect NO_BOXES decrease by around 17.9% and TWO_BOXES increase by around 60.0% in the future state. By executing (7,2) -> (7,3), I will expect ONE_BOXES decrease by 13.3% and THREE_BOXES increase.
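When no `action_name` is passed, the comparison action is the highest-valued child that is not the best action. At the start of the game every child has the same value, so the choice depends only on how ties are ordered. The notebook uses `Series.argsort`, whose default `kind='quicksort'` is not guaranteed to be stable. The sketch below makes the rule explicit with a stable sort over hypothetical `(action, value)` pairs; `pick_comparison` is an illustrative name.

```python
def pick_comparison(children, best_action):
    """Return the highest-valued action other than best_action, or None if there is none."""
    # sorted() is stable even with reverse=True, so tied actions keep their original order
    ranked = sorted(children, key=lambda child: child[1], reverse=True)
    for action, _value in ranked:
        if action != best_action:
            return action
    return None


# Hypothetical children where every action has the same value
tied = [("a", 0.0), ("b", 0.0), ("c", 0.0)]
print(pick_comparison(tied, "a"))  # b
print(pick_comparison(tied, "b"))  # a

# Hypothetical children with distinct values
ranked_children = [("a", 1.2), ("b", 1.15), ("c", 0.2)]
print(pick_comparison(ranked_children, "a"))  # b
```

Scanning the ranked list and skipping the best action covers both the tie case and the case where the agent's best action is not the top-valued one.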


In [13]:
counterfactual_explanation_by_paths(df, '(4,3) -> (5,3)')

There are 65 actions available for this state.
The value of every action is the same.
Counterfactual explanation based on (0,1) -> (0,2) and (4,3) -> (5,3)
The explanation is based on the result of executing 2 action(s) by the agent.
After executing (0,1) -> (0,2) or (4,3) -> (5,3), I expect NO_BOXES decrease by around 17.9% and TWO_BOXES increase by around 70.0% in the future state. By executing (4,3) -> (5,3), I will expect ONE_BOXES decrease by 20.0% and THREE_BOXES increase.


## Middle State (1)

In [14]:
df = pd.read_csv("DotsAndBoxes/record_36.csv", sep='\t')
best_action_name = df[df['Parent_Name'] == 'None'].Best_Action.values[0]
df.head()

Unnamed: 0,Depth,Name,Value,Visits,Parent_Name,Game_State,Game_Features,Game_State_Heuristic,Action_Name,Best_Action
0,0,Node_1,0.404,564,,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":0.8,""SCORE_ADV"":1.6,""ORDINAL"":0.5,""OU...",16.0,,"(2,0) -> (2,1)"
1,1,Node_2,0.0,14,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":1,""Edge...","{""SCORE"":0.8,""SCORE_ADV"":1.6,""ORDINAL"":0.5,""OU...",16.0,"(1,0) -> (2,0)","(7,3) -> (7,4)"
2,1,Node_3,1.2,76,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":0.85,""SCORE_ADV"":1.7,""ORDINAL"":0.5,""O...",17.0,"(2,0) -> (2,1)","(3,2) -> (4,2)"
3,1,Node_4,1.15,71,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":0.85,""SCORE_ADV"":1.7,""ORDINAL"":0.5,""O...",17.0,"(3,2) -> (4,2)","(2,0) -> (2,1)"
4,1,Node_5,0.222,18,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":0.8,""SCORE_ADV"":1.6,""ORDINAL"":0.5,""OU...",16.0,"(3,5) -> (4,5)","(2,0) -> (2,1)"


In [15]:
best_action_paths = get_action_path(df, action_name=best_action_name)
best_action_path_differences, _  = get_path_feature_difference(best_action_paths)
print(f"Number of executed actions: {len(best_action_paths) - 1}")
for feature, difference in best_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 3
SCORE 0.1499999999999999
SCORE_ADV 0.2999999999999998
TWO_BOXES -2.0
THREE_BOXES -1.0
OWNED_FILLED_BOXES 3.0


In [16]:
worse_action_paths = get_action_path(df, action_name=best_action_name, is_max=False)
worse_action_path_differences, _  = get_path_feature_difference(worse_action_paths)
print(f"Number of executed actions: {len(worse_action_paths) - 1}")
for feature, difference in worse_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 3
SCORE 0.04999999999999993
NO_BOXES -2.0
TWO_BOXES -4.0
THREE_BOXES 4.0
OPPONENTS_FILLED_BOXES 1.0
OWNED_FILLED_BOXES 1.0


In [17]:
explanation_by_paths(df)

There are 28 actions available for this state.
The explanation is based on the result of executing 3 action(s) by the agent.
After executing (2,0) -> (2,1), I expect SCORE increase by around 12.5%, TWO_BOXES decrease by around 33.3% and OWNED_FILLED_BOXES increase by around 12.5% in the future state. In the best situation, I also expect SCORE_ADV increase by 18.7% and THREE_BOXES decrease by 50.0%. However, I may expect NO_BOXES decrease by 66.7%, THREE_BOXES increase by 200.0% and OPPONENTS_FILLED_BOXES increase in the worst situation.


In [18]:
counterfactual_explanation_by_paths(df)

There are 28 actions available for this state.
Counterfactual explanation based on (2,0) -> (2,1) and (3,2) -> (4,2)
The explanation is based on the result of executing 3 action(s) by the agent.
After executing (2,0) -> (2,1) or (3,2) -> (4,2), I expect SCORE_ADV increase by around 15.6%, SCORE increase by around 15.6%, OWNED_FILLED_BOXES increase by around 15.6% and TWO_BOXES decrease by around 33.3% in the future state. By executing (2,0) -> (2,1), I also expect THREE_BOXES decrease by 50.0%. By executing (3,2) -> (4,2), I will expect NO_BOXES decrease by 33.3%, ONE_BOXES increase by 20.0% and THREE_BOXES increase by 100.0%.


## Middle State (2)

In [19]:
df = pd.read_csv("DotsAndBoxes/record_44.csv", sep='\t')
best_action_name = df[df['Parent_Name'] == 'None'].Best_Action.values[0]
df.head()

Unnamed: 0,Depth,Name,Value,Visits,Parent_Name,Game_State,Game_Features,Game_State_Heuristic,Action_Name,Best_Action
0,0,Node_1,0.16,570,,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.15,""SCORE_ADV"":2.3,""ORDINAL"":0.5,""O...",23.0,,"(3,5) -> (4,5)"
1,1,Node_2,0.0741,27,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.15,""SCORE_ADV"":2.3,""ORDINAL"":0.5,""O...",23.0,"(5,2) -> (6,2)","(6,1) -> (6,2)"
2,1,Node_3,0.257,35,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.15,""SCORE_ADV"":2.3,""ORDINAL"":0.5,""O...",23.0,"(5,4) -> (6,4)","(5,2) -> (6,2)"
3,1,Node_4,0.0741,27,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.15,""SCORE_ADV"":2.3,""ORDINAL"":0.5,""O...",23.0,"(5,5) -> (6,5)","(5,4) -> (6,4)"
4,1,Node_5,0.0,24,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":0,""Edge...","{""SCORE"":1.15,""SCORE_ADV"":2.3,""ORDINAL"":0.5,""O...",23.0,"(6,2) -> (6,3)","(5,2) -> (6,2)"


In [20]:
best_action_paths = get_action_path(df, action_name=best_action_name)
best_action_path_differences, _  = get_path_feature_difference(best_action_paths)
print(f"Number of executed actions: {len(best_action_paths) - 1}")
for feature, difference in best_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 3
SCORE 0.10000000000000009
SCORE_ADV 0.20000000000000018
ONE_BOXES -1.0
TWO_BOXES -2.0
THREE_BOXES 1.0
OWNED_FILLED_BOXES 2.0


In [21]:
worse_action_paths = get_action_path(df, action_name=best_action_name, is_max=False)
worse_action_path_differences, _  = get_path_feature_difference(worse_action_paths)
print(f"Number of executed actions: {len(worse_action_paths) - 1}")
for feature, difference in worse_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 3
SCORE_ADV -0.19999999999999973
NO_BOXES -2.0
ONE_BOXES -1.0
TWO_BOXES -1.0
THREE_BOXES 2.0
OPPONENTS_FILLED_BOXES 2.0


In [22]:
explanation_by_paths(df)

There are 19 actions available for this state.
The explanation is based on the result of executing 3 action(s) by the agent.
After executing (3,5) -> (4,5), I expect TWO_BOXES decrease by around 21.4%, ONE_BOXES decrease by around 33.3% and THREE_BOXES increase in the future state. In the best situation, I also expect SCORE_ADV increase by 8.7%, SCORE increase by 8.7% and OWNED_FILLED_BOXES increase by 8.7%. However, I may expect SCORE_ADV decrease by 8.7%, NO_BOXES decrease by 100.0% and OPPONENTS_FILLED_BOXES increase in the worst situation.


In [23]:
counterfactual_explanation_by_paths(df)

There are 19 actions available for this state.
Counterfactual explanation based on (3,5) -> (4,5) and (6,1) -> (6,2)
The explanation is based on the result of executing 3 action(s) by the agent.
After executing (3,5) -> (4,5) or (6,1) -> (6,2), I expect SCORE_ADV increase by around 6.5%, SCORE increase by around 6.5%, ONE_BOXES decrease by around 50.0%, TWO_BOXES decrease by around 28.6%, THREE_BOXES increase and OWNED_FILLED_BOXES increase by around 6.5% in the future state. By executing (6,1) -> (6,2), I will expect NO_BOXES decrease by 50.0%.


## Final State

In [24]:
df = pd.read_csv("DotsAndBoxes/record_52.csv", sep='\t')
best_action_name = df[df['Parent_Name'] == 'None'].Best_Action.values[0]
df.head()

Unnamed: 0,Depth,Name,Value,Visits,Parent_Name,Game_State,Game_Features,Game_State_Heuristic,Action_Name,Best_Action
0,0,Node_1,12.0,505,,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.35,""SCORE_ADV"":2.7,""ORDINAL"":0.5,""O...",27.0,,"(5,4) -> (5,5)"
1,1,Node_2,9.09,34,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":1,""Edge...","{""SCORE"":1.35,""SCORE_ADV"":2.1,""ORDINAL"":0.5,""O...",27.0,"(5,4) -> (6,4)","(5,4) -> (5,5)"
2,1,Node_3,12.1,80,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":0,""Edge...","{""SCORE"":1.4,""SCORE_ADV"":2.8,""ORDINAL"":0.5,""OU...",28.0,"(6,2) -> (6,3)","(5,4) -> (5,5)"
3,1,Node_4,10.3,47,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.4,""SCORE_ADV"":2.8,""ORDINAL"":0.5,""OU...",28.0,"(6,3) -> (7,3)","(5,4) -> (5,5)"
4,1,Node_5,10.7,51,Node_1,"{""Edge_Owner_6061"":0,""Edge_Owner_6263"":-1,""Edg...","{""SCORE"":1.4,""SCORE_ADV"":2.8,""ORDINAL"":0.5,""OU...",28.0,"(7,4) -> (7,5)","(5,4) -> (5,5)"


In [25]:
best_action_paths = get_action_path(df, action_name=best_action_name)
best_action_path_differences, _ = get_path_feature_difference(best_action_paths)
print(f"Number of executed actions: {len(best_action_paths) - 1}")
for feature, difference in best_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 1
SCORE_ADV -0.8000000000000003
OUR_TURN -1.0
HAS_WON 1.0
FINAL_ORD 0.5
TWO_BOXES -4.0
THREE_BOXES -4.0
OPPONENTS_FILLED_BOXES 8.0


In [26]:
worse_action_paths = get_action_path(df, action_name=best_action_name, is_max=False)
worse_action_path_differences, _ = get_path_feature_difference(worse_action_paths)
print(f"Number of executed actions: {len(worse_action_paths) - 1}")
for feature, difference in worse_action_path_differences.items():
    print(feature, difference)

Number of executed actions: 1
SCORE_ADV -0.8000000000000003
OUR_TURN -1.0
HAS_WON 1.0
FINAL_ORD 0.5
TWO_BOXES -4.0
THREE_BOXES -4.0
OPPONENTS_FILLED_BOXES 8.0


In [27]:
explanation_by_paths(df)

There are 7 actions available for this state.
The explanation is based on the result of executing 1 action(s) by the agent.
After executing (5,4) -> (5,5), I expect SCORE_ADV decrease by 29.6%, OUR_TURN decrease by 100.0%, HAS_WON increase, FINAL_ORD increase, TWO_BOXES decrease by 100.0%, THREE_BOXES decrease by 100.0% and OPPONENTS_FILLED_BOXES increase in the future state.


In [28]:
counterfactual_explanation_by_paths(df)

There are 7 actions available for this state.
Counterfactual explanation based on (5,4) -> (5,5) and (5,3) -> (5,4)
The explanation is based on the result of executing 1 action(s) by the agent.
After executing (5,4) -> (5,5) or (5,3) -> (5,4), I expect TWO_BOXES decrease by around 62.5% in the future state. By executing (5,4) -> (5,5), I also expect SCORE_ADV decrease by 29.6%, THREE_BOXES decrease by 100.0%, OPPONENTS_FILLED_BOXES increase, OUR_TURN decrease by 100.0%, HAS_WON increase and FINAL_ORD increase. By executing (5,3) -> (5,4), I will expect SCORE_ADV increase by 3.7%, SCORE increase by 3.7% and OWNED_FILLED_BOXES increase by 3.7%.


## Other tests

In [29]:
df = pd.read_csv("DotsAndBoxes/record_6.csv", sep='\t')
explanation_by_paths(df)

There are 73 actions available for this state.
The value of every action is the same.
The explanation is based on the result of executing 2 action(s) by the agent.
After executing (4,1) -> (4,2), I expect NO_BOXES decrease by 20.0%, ONE_BOXES increase by 42.9% and TWO_BOXES increase by 100.0% in the future state.


In [30]:
df = pd.read_csv("DotsAndBoxes/record_20.csv", sep='\t')
explanation_by_paths(df)

There are 50 actions available for this state.
The explanation is based on the result of executing 2 action(s) by the agent.
After executing (0,3) -> (0,4), I expect ONE_BOXES decrease by around 10.7% and THREE_BOXES increase in the future state. In the best situation, I also expect SCORE_ADV increase by 16.7%, SCORE increase by 16.7%, OWNED_FILLED_BOXES increase by 16.7% and TWO_BOXES decrease by 22.2%. However, I may expect NO_BOXES decrease by 16.7% and TWO_BOXES increase by 11.1% in the worst situation.


In [31]:
df = pd.read_csv("DotsAndBoxes/record_33.csv", sep='\t')
explanation_by_paths(df)

There are 32 actions available for this state.
The explanation is based on the result of executing 2 action(s) by the agent.
After executing (2,1) -> (3,1), I expect TWO_BOXES decrease by around 16.7% and THREE_BOXES increase in the future state. In the best situation, I also expect SCORE_ADV increase by 14.3%, SCORE increase by 14.3% and OWNED_FILLED_BOXES increase by 14.3%. However, I may expect NO_BOXES decrease by 25.0% and ONE_BOXES decrease by 20.0% in the worst situation.


In [32]:
df = pd.read_csv("DotsAndBoxes/record_46.csv", sep='\t')
explanation_by_paths(df)

There are 16 actions available for this state.
The explanation is based on the result of executing 3 action(s) by the agent.
After executing (4,4) -> (4,5), I expect SCORE_ADV increase by around 6.3%, SCORE increase by around 6.3%, NO_BOXES decrease by around 100.0% and OWNED_FILLED_BOXES increase by around 6.2% in the future state. In the best situation, I also expect ONE_BOXES increase by 50.0% and THREE_BOXES decrease by 50.0%. However, I may expect ONE_BOXES decrease by 50.0% and THREE_BOXES increase by 100.0% in the worst situation.


In [33]:
df = pd.read_csv("DotsAndBoxes/record_50.csv", sep='\t')
explanation_by_paths(df)

There are 11 actions available for this state.
The explanation is based on the result of executing 3 action(s) by the agent.
After executing (6,4) -> (6,5), I expect SCORE increase by around 5.6%, ONE_BOXES decrease by around 100.0%, TWO_BOXES decrease by around 75.0%, NO_BOXES decrease by around 100.0% and OWNED_FILLED_BOXES increase by around 5.6% in the future state. In the best situation, I also expect SCORE_ADV decrease by 14.8%, OPPONENTS_FILLED_BOXES increase, THREE_BOXES decrease by 100.0%, HAS_WON increase and FINAL_ORD increase. However, I may expect SCORE_ADV increase by 3.7% and THREE_BOXES increase by 150.0% in the worst situation.


In [34]:
df = pd.read_csv("MCTS_test_0.csv", sep='\t')
explanation_by_paths(df)

There are 3 actions available for this state.
The explanation is based on the result of executing 5 action(s) by the agent.
After executing Baron - compare the cards with player 1, I expect SCORE increase, GUARD_DISCARD decrease by around 83.3%, ROUND increase, BARON_DISCARD decrease by around 100.0%, DRAW_DECK increase by around 70.0%, PRINCE decrease by around 100.0% and BARON decrease by around 100.0% in the future state. In the best situation, I also expect SCORE_ADV increase, PRINCESS increase, CARDS increase by 88.9% and COUNTESS_DISCARD increase. However, I may expect HANDMAID_DISCARD increase, CARDS decrease by 76.4% and GUARD increase in the worst situation.


In [35]:
df = pd.read_csv("MCTS_test_1.csv", sep='\t')
explanation_by_paths(df)

There are 2 actions available for this state.
The explanation is based on the result of executing 6 action(s) by the agent.
After executing Priest - see the cards of player 1, I expect HANDMAID_DISCARD decrease by around 100.0%, GUARD_DISCARD decrease by around 66.7%, PRINCESS decrease by around 100.0%, PRIEST decrease by around 100.0%, ROUND increase by around 200.0%, CARDS decrease by around 36.7%, DRAW_DECK increase by around 500.0% and BARON increase in the future state. In the best situation, I also expect SCORE_ADV increase by 300.0%, SCORE increase, PRIEST_DISCARD decrease by 100.0%, HANDMAID increase, ORDINAL decrease by 50.0%, PRINCE_DISCARD decrease by 100.0% and KING_DISCARD decrease by 100.0%. However, I may expect SCORE_ADV decrease by 100.0% and PRINCESS_DISCARD increase in the worst situation.


In [36]:
df = pd.read_csv("Connect4/record_0.csv", sep='\t')
explanation_by_paths(df)

There are 8 actions available for this state.
The value of every action is the same.
The explanation is based on the result of executing 2 action(s) by the agent.
After executing SetGridValueAction{gridBoard=5, x=0, y=7, value=o}, I expect ROUND increase, Two_Token increase and Opponent_Two_Token increase in the future state.


In [37]:
df = pd.read_csv("Connect4/record_3.csv", sep='\t')
explanation_by_paths(df)

There are 8 actions available for this state.
The explanation is based on the result of executing 3 action(s) by the agent.
After executing SetGridValueAction{gridBoard=5, x=4, y=7, value=o}, I expect ROUND increase by around 83.3%, Opponent_Three_Token increase, Opponent_Two_Token increase by around 50.0% and Opponent_One_Token increase in the future state. In the best situation, I also expect SCORE_ADV increase, SCORE increase, One_Token decrease by 100.0%, HAS_WON increase, FINAL_ORD increase and Four_Token increase. However, I may expect One_Token increase by 100.0% and Two_Token increase by 100.0% in the worst situation.


In [38]:
df = pd.read_csv("Connect4/record_4.csv", sep='\t')
explanation_by_paths(df)

There are 8 actions available for this state.
The explanation is based on the result of executing 2 action(s) by the agent.
After executing SetGridValueAction{gridBoard=5, x=6, y=7, value=o}, I expect ROUND increase by around 37.5% and Two_Token decrease by around 50.0% in the future state. In the best situation, I also expect SCORE_ADV increase, SCORE increase, Opponent_Two_Token decrease by 20.0%, Opponent_Three_Token increase, HAS_WON increase, FINAL_ORD increase and Four_Token increase. However, I may expect Opponent_One_Token increase, One_Token increase and Three_Token increase in the worst situation.
