# Introduction
This workbook is intended to provide an example of how to process and display the logs from a single session of the Empowerment experiment.  It's not exhaustive documentation(!).

Original author: Chris Bennett (christopher.bennett@bristol.ac.uk)

Last update: 14/06/2022

## Files

The simulation logs are stored in a directory named after the session id which generated them.  The format of the session id is "yyyymmddThhmmss" e.g. "20220604T151812".  This is the date and time (in ISO 8601 format) at which the data was collected.
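
Because the session id encodes the collection time, it can be converted back into a `datetime` when needed. A minimal sketch, assuming the id always follows the format above (the function name `parse_session_id` is hypothetical and not part of empowered_herding):

```python
from datetime import datetime

def parse_session_id(session_id: str) -> datetime:
    """Convert a session id such as '20220604T151812' into a datetime."""
    # %Y%m%dT%H%M%S matches the compact yyyymmddThhmmss form
    return datetime.strptime(session_id, "%Y%m%dT%H%M%S")

collected_at = parse_session_id("20220604T151812")
print(collected_at)
```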

The following files should be present in the directory:

 - **user_details**: stores the participant's responses to questions about themselves e.g. whether English is their first language, date of birth etc.
 - **post_test_responses**: stores the participant's responses to the questions asked after each trial e.g. "the sliders".
 - **config_fam_X_simlog**: the simulation data (sim_data) from the familiarisation trials.  
     - X is numbered 1-4
 - **config_exp_X_Y_Z**: the simulation data (sim_data) from the experimental trials.  
     - X is numbered 1-12
     - Y is either 
         - absent  (the simulation did not show the state of the dog's empowerment to the participant during the trial)
         - "empshown" (empowerment information was shown)
     - Z is either
         - absent (empowerment was calculated using the "vanilla" method where all states in which the dog changes the state of the flock are counted)
         - "taskweighted" (empowerment was calculated using a method which only counts states in which the dog effected a change in the flock that moved the flock closer to the goal)
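
The naming scheme above can be decoded programmatically, which is handy when grouping trials by condition. This is only a sketch: the function `parse_config_name` and the keys of the dictionary it returns are hypothetical, and the pattern assumes the names follow the scheme listed above, optionally followed by `_simlog` and `.pkl`.

```python
import re

# Pattern for names such as "config_exp_1_empshown_taskweighted_simlog.pkl"
_NAME_RE = re.compile(
    r"^config_(?P<kind>fam|exp)_(?P<number>\d+)"
    r"(?P<empshown>_empshown)?"
    r"(?P<taskweighted>_taskweighted)?"
    r"(?:_simlog)?(?:\.pkl)?$"
)

def parse_config_name(name: str) -> dict:
    """Split a config or log file name into its trial type, number and conditions."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised config name: {name}")
    return {
        "kind": match["kind"],                               # "fam" or "exp"
        "number": int(match["number"]),                      # X in the scheme above
        "empowerment_shown": match["empshown"] is not None,  # Y present?
        "task_weighted": match["taskweighted"] is not None,  # Z present?
    }

print(parse_config_name("config_exp_1_taskweighted_simlog.pkl"))
```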

# Setting up the environment

You will need to edit the paths to match where the simulation logs and the empowered_herding program are located.

Set session_id to the unique identifier of the session you want to analyse.

In [17]:
import os

import numpy as np
import pickle
import glob
import pandas as pd
user_home_dir = os.path.expanduser('~')

#change the directory to where the empowered herding model is
# (this is an absolute path, so it does not need to be joined to user_home_dir)
os.chdir('C:\\Users\\zi18494\\OneDrive - University of Bristol\\00_Simulation\\Github\\empowered_herding')

#change this to the name of the session to be analysed
session_id = "20220614T164059"

#change this to match where the logs are stored
#base_path = os.path.join(user_home_dir, "OneDrive - University of Bristol\\00_Simulation\\Github\\empowered_herding\\logs")
base_path = os.path.join(user_home_dir, "OneDrive - University of Bristol\\Empowerment Results")

path = os.path.join(base_path, session_id)

#change this to match the name of a single log file of the form config_XXXXXX_simlog.pkl 
log_file_name = "config_exp_1_taskweighted_simlog.pkl"

import model.SimLog as log
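
Before running the post-processing it can be worth checking that the session directory contains the files the later cells open. The sketch below is an optional extra rather than part of the original workflow; the helper name `find_missing_files` is hypothetical, and the file names are the ones loaded later in this workbook.

```python
import glob
import os

def find_missing_files(session_dir: str) -> list:
    """Return the names of expected files that are absent from session_dir."""
    expected = ["user_details.pkl", "post_test_responses.pkl"]
    missing = [name for name in expected
               if not os.path.isfile(os.path.join(session_dir, name))]
    # at least one simulation log should also be present
    if not glob.glob(os.path.join(session_dir, "*_simlog.pkl")):
        missing.append("*_simlog.pkl")
    return missing
```

Calling `find_missing_files(path)` after the setup cell returns an empty list when everything needed is in place.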

# Run the post-processing

Create some helper functions

In [12]:
def create_table_of_empowerment(dog_logs, times : list): 
    dog_ids = list(dog_logs.keys())
    #create a table with 2 dimensions (rows are time steps, columns are dogs)
    # and initialise it with nan
    empowerment_ts = np.full((len(times), len(dog_ids)), np.nan)

    #copy the empowerment of each dog from dog_logs into the table
    for col_idx, dog_id in enumerate(dog_ids):
        dog_state = dog_logs[dog_id].state
        #the recorded time steps are used directly as row indices
        row_idx = dog_state['time']
        empowerment_ts[row_idx, col_idx] = dog_state['empowerment']
    return empowerment_ts
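
To illustrate the layout of the table this helper builds, here is a toy example. It is only a sketch: the two dog logs are hypothetical stand-ins made with `SimpleNamespace`, and it assumes that each log's `state` holds a list of time steps and a matching list of empowerment values.

```python
from types import SimpleNamespace
import numpy as np

# Hypothetical dog logs: dog 0 exists for all three time steps,
# dog 1 only appears from time step 1 onwards
toy_dog_logs = {
    0: SimpleNamespace(state={"time": [0, 1, 2], "empowerment": [5.0, 6.0, 7.0]}),
    1: SimpleNamespace(state={"time": [1, 2], "empowerment": [2.0, 4.0]}),
}
times = [0, 1, 2]

# Same layout as the helper: rows are time steps, columns are dogs, nan = dog absent
table = np.full((len(times), len(toy_dog_logs)), np.nan)
for col, dog in enumerate(toy_dog_logs.values()):
    table[dog.state["time"], col] = dog.state["empowerment"]

# nanmean ignores the nan cells, so absent dogs do not drag the mean down
mean_per_step = np.nanmean(table, axis=1)
print(table)
print(mean_per_step)
```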

Load and display the information about the participant

In [13]:
#Load the user responses
with open(os.path.join(path,"user_details.pkl"), "rb") as input_file:
    user_details = pickle.load(input_file)
#and convert to a pandas dataframe
user_details_simple = user_details[1]
user_details_simple['english'] = user_details_simple['english'][0][0]
user_details_simple['colour'] = user_details_simple['colour'][0][0]
user_details_simple['vision'] = user_details_simple['vision'][0][0]
user_details_df =  pd.DataFrame(user_details_simple, index=[0])

In [14]:
print("These are the user's details")
print("----------------------------")
print(f"Session ID was {user_details[0]}")
print(f"User details were {user_details[1]}")
print("\n")
print("and as a dataframe...")
user_details_df

These are the user's details
----------------------------
Session ID was 20220614T164059
User details were {'english': 'Yes', 'colour': 'No', 'birth': 'bn', 'sex': 'hgj', 'vision': 'No'}


and as a dataframe...


| | english | colour | birth | sex | vision |
|---|---|---|---|---|---|
| 0 | Yes | No | bn | hgj | No |


Now run the main processing script, which calculates the metrics and collates the participant's responses to each trial into a single pandas data frame with one row per trial.

In [15]:
# load the test responses by the user
with open(os.path.join(path,"post_test_responses.pkl"), "rb") as input_file:
    test_responses = pickle.load(input_file)
# extract the parameterised configuration names
# IMPORTANT: these names describe each trial that the participant undertook 
#            and are recorded independently of the simulation logs.
config_names = test_responses[1]

#convert all the user responses into a pandas dataframe table
user_responses_df = pd.DataFrame.from_dict(test_responses[2], orient='index')
user_responses_df.attrs = {'session_id': test_responses[0], 'test_order' :  config_names}

#find the filenames of all the log files recorded
sim_file_names = glob.glob(os.path.join(path,'*_simlog.pkl'))

#create some arrays to hold the results
n_time_steps = []
n_dogs_per_tick = []
n_user_interactions = []
trial_duration = []
distance_2_goal_integral = []
trial_start_time = []
trial_end_time = []

#loop over each parameterised configuration name (these should match the log files in the directory)
for i_cfg, cfg_name in enumerate(config_names):
    #create a list of all the simulation log files whose file name matches the parameterised configuration name
    idx = [i for i, s in enumerate(sim_file_names) if (cfg_name + '_simlog') in s]
    
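    #should only return one match, if more than one then skip the file
    if len(idx)>1:
        print(f'ERROR: found more than one match for {cfg_name} in directory {path}')
        #record a placeholder in every results list so that they all stay the same length
        n_time_steps.append(-1)
        trial_duration.append(-1)
        n_user_interactions.append(-1)
        distance_2_goal_integral.append(-1)
        trial_start_time.append(pd.NaT)
        trial_end_time.append(pd.NaT)
    #if a file can't be found which matches the tested config name then skip the file
    elif not idx:
        print(f'ERROR: cant find a match for {cfg_name} in directory {path}')
        #record a placeholder in every results list so that they all stay the same length
        n_time_steps.append(-1)
        trial_duration.append(-1)
        n_user_interactions.append(-1)
        distance_2_goal_integral.append(-1)
        trial_start_time.append(pd.NaT)
        trial_end_time.append(pd.NaT)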
    else:
        #if we've made it to here then idx will be a list of one element, change it into an integer to use as an index
        idx = idx[0]
        with open(sim_file_names[idx], "rb") as input_file:
             sim_data = pickle.load(input_file)

        #calculate the number of dogs present on each time step
        n_ts = len(sim_data['world_at_t'].keys())
        n_time_steps.append(n_ts)
        n_dogs_per_tick = np.zeros(n_ts)
        for t in sim_data['world_at_t'].keys():
            n_dogs_per_tick[t] = np.sum(sim_data['world_at_t'][t]['ids']<100)
            
        #extract the empowerment of each dog as a time series and arrange in a table
        # columns are the dogs, rows are the time steps, cell values are empowerment
        # nan means the dog wasn't present at the corresponding time step
        dogs_empowerment_ts = create_table_of_empowerment(sim_data['dog_logs'], list(sim_data['world_at_t'].keys()) )
        dog_empowerment_mean = np.nanmean(dogs_empowerment_ts, axis = 1)
                                                                       
        #calculate the time integral of the distance of the flock CoM from the centre of the square
        # The goal position is the same for all configs.  If it wasn't then it could be read by loading the file matching cfg_name
        # in experiment_config_files
        goal_position = [48,48]
        dt = 1
        integral = 0
        for t in sim_data['world_at_t'].keys():
            #agent_positions = sim_data['world_at_t'][t]['positions']
            idx_sheep = sim_data['world_at_t'][t]['ids']>=100
            sheep_positions = sim_data['world_at_t'][t]['positions'][idx_sheep]
            com = np.mean(sheep_positions, axis=0)
            distance_2_goal = np.linalg.norm(np.array(goal_position) - com)
            integral = integral + distance_2_goal * dt
        
        distance_2_goal_integral.append(integral)

        #calculate the duration of the trial
        # basing the duration on the largest time step (commented out below) appears to
        # always give the same value as n_time_steps
        #    trial_duration.append(np.max(list(sim_data['world_at_t'].keys())))
        # so instead, base the duration on the real world time difference
        trial_duration.append(sim_data['meta_data']['end_time'] - sim_data['meta_data']['start_time'])
        
        #calculate number of user interactions (mouse clicks)
        n_user_interactions.append(len(sim_data['user_log'].events_at_t.keys()))   
        
        trial_start_time.append(sim_data['meta_data']['start_time'])
        trial_end_time.append(sim_data['meta_data']['end_time'])

#this isn't the most efficient but possibly the clearest approach
# create a dictionary of the stats and then turn it into a dataframe
metrics = {'n_time_steps' : n_time_steps, 'n_user_interactions' : n_user_interactions, 'trial_duration' : trial_duration, 
           'd2goal_intergral' : distance_2_goal_integral, 'start_time' : trial_start_time, 'end_time' : trial_end_time}       
metrics_df =  pd.DataFrame.from_dict(metrics)
metrics_df.index = config_names

#finally combine the user responses to the slider questions with the metrics from the simulation in a single dataframe
test_results_df = pd.concat([user_responses_df, metrics_df], axis=1)
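
The distance-to-goal metric computed in the loop above is a rectangle-rule sum: at each time step the distance between the goal and the centre of mass of the sheep is multiplied by `dt` and accumulated. The sketch below isolates that calculation on toy data. The function name `distance_to_goal_integral` and the toy world are hypothetical; the convention that ids of 100 and above are sheep is taken from the loop above.

```python
import numpy as np

def distance_to_goal_integral(world_at_t: dict, goal_position, dt: float = 1.0) -> float:
    """Sum over time steps of |goal - flock centre of mass| * dt."""
    goal = np.asarray(goal_position, dtype=float)
    total = 0.0
    for snapshot in world_at_t.values():
        ids = np.asarray(snapshot["ids"])
        positions = np.asarray(snapshot["positions"], dtype=float)
        sheep_positions = positions[ids >= 100]          # ids below 100 are dogs
        centre_of_mass = sheep_positions.mean(axis=0)
        total += np.linalg.norm(goal - centre_of_mass) * dt
    return total

# Toy world with one dog (id 0) and two sheep (ids 100 and 101)
toy_world = {
    0: {"ids": [0, 100, 101], "positions": [[10, 10], [44, 48], [46, 48]]},
    1: {"ids": [0, 100, 101], "positions": [[12, 12], [47, 48], [49, 48]]},
}
# step 0: centre of mass (45, 48) is 3 units from the goal; step 1: (48, 48) is 0 units away
print(distance_to_goal_integral(toy_world, [48, 48]))
```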

View the results...

In [16]:
print("These are the results")
print("---------------------------")
print(f"Session ID was {user_details[0]}")
print(f"User details were {user_details[1]}")
test_results_df

These are the results
---------------------------
Session ID was 20220614T164059
User details were {'english': 'Yes', 'colour': 'No', 'birth': 'bn', 'sex': 'hgj', 'vision': 'No'}


| | time | engaged | part_of_team | n_time_steps | n_user_interactions | trial_duration | d2goal_intergral | start_time | end_time |
|---|---|---|---|---|---|---|---|---|---|
| config_exp_2_taskweighted | 1.0 | 0 | 0 | 23 | 3 | 0 days 00:00:01.580386 | 873.423135 | 2022-06-14 16:41:09.994612 | 2022-06-14 16:41:11.574998 |
| config_exp_1_taskweighted | 2.0 | 1 | 1 | 28 | 4 | 0 days 00:00:01.840083 | 1062.267877 | 2022-06-14 16:41:26.677433 | 2022-06-14 16:41:28.517516 |
| config_exp_1_empshown_taskweighted | 3.0 | 2 | 2 | 31 | 4 | 0 days 00:00:02.075297 | 1173.850819 | 2022-06-14 16:41:39.504239 | 2022-06-14 16:41:41.579536 |
| config_exp_2_empshown_taskweighted | 4.0 | 3 | 3 | 30 | 3 | 0 days 00:00:02.038569 | 1136.521848 | 2022-06-14 16:41:50.205251 | 2022-06-14 16:41:52.243820 |
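
For further analysis it may help to express the trial duration in seconds and to flag the trials in which empowerment was shown. The following is a sketch only: `toy_results` copies two rows from the table above rather than loading real logs, and the new column names `duration_s` and `empowerment_shown` are hypothetical.

```python
import pandas as pd

# Two rows copied from the results table above
toy_results = pd.DataFrame(
    {
        "n_user_interactions": [3, 4],
        "trial_duration": [
            pd.Timedelta("0 days 00:00:01.580386"),
            pd.Timedelta("0 days 00:00:02.075297"),
        ],
    },
    index=["config_exp_2_taskweighted", "config_exp_1_empshown_taskweighted"],
)

# Timedelta column -> plain seconds as floats
toy_results["duration_s"] = toy_results["trial_duration"].dt.total_seconds()
# The condition is encoded in the config name stored in the index
toy_results["empowerment_shown"] = toy_results.index.str.contains("_empshown")
print(toy_results[["duration_s", "empowerment_shown"]])
```

The same two lines could be applied to `test_results_df`, provided every trial has a valid log so that `trial_duration` contains only time differences.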
