The **first version** of this code was written by:
<br> **Valentin J. Oettinger** - Mail: Oettinger@stud.uni-heidelberg.de 
<br> Feel free to contact me if questions arise.
<br> Further editors, please add your name and contact details below:
<br> **Further Editors:**
<br> **Julius Rominger** - Mail: J.rominger@stud.uni-heidelberg.de

The code is currently fully functional with a test Struct made by Susi Malheiros, which contains data of only two subjects. The file is provided with the distribution message of this code.
<br>If data created with this script is used in a publication, make sure to credit the appropriate packages.

In [1]:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import h5py
import xlsxwriter as xls
import scipy.io as scipy

from scipy.stats import chisquare
from itertools import islice

import time
date = time.strftime("%d%m%Y")


%matplotlib inline

# <u> Data Import 
### Change to desired directory.

This section only imports the data.
<br>The h5py package is used because it reads the file type that these .mat files correspond to. Note that h5py uses the Python dictionary syntax.
<br>Please also consult the respective documentation: https://docs.h5py.org/en/stable/
<br>
<br>**Future To-Do:**
- Introduce a context manager so that the files are closed after use and memory is saved while running the script.
- Import the Struct containing the data of all subjects.
- Change the input directory to your personal directory.

In [2]:
#Data import: Struct without the 'Data' directory and Struct of the young subjects
Struct_noData = h5py.File('struct_withEventLog_withoutData.mat', 'r')
Struct_youngData = h5py.File('Struct_YoungSubjects.mat', 'r')

In [3]:
#The commands below give a generic overview of the fields on each level of the Struct. They are meant for inspection only.
#BEWARE: They print every single entry of the Struct, so the output is very long.
#Remove the # in front of the desired command to run it.

#names of fields
#Struct_noData.visit(print)
#Struct_youngData.visit(print)

#names of fields + data type and shape of each field
#Struct_noData.visititems(lambda name,obj:print(name, obj))
#Struct_youngData.visititems(lambda name,obj:print(name, obj))

# <u>Data Overview and Tools:

In this section a few functions are defined that extract the entries of the Struct that actually contain data of interest. 
<br>Many entries in the Struct do not contain anything, so they are of no use to us. If unsure what is meant by that, simply call one of the above .visit functions.
<br>These functions are later used in the functions that compute the variables.
<br>
<br>**Future To-Do:**
- Find a way to correctly filter the Struct containing the data of all subjects. The old syntax below only allowed me to extract 9 of the 23 subjects. 

**Hint:** I tried to find a property that distinguishes the "Subject" entries from the empty entries in the Struct by looking at the type. However, this does not seem to work for the "full" Struct.

In [22]:
'''OLD VERSION
#Looping over the input Struct and extracing only entries that contain subject data.
#Creating new dictionary filled with those data.

sub_data = {}

def get_subject_data(InputStruct):
    for k, v in InputStruct['#refs#/'].items(): #Hint: I tried using the below 30 - 35 length condition. But that also didn't help.
        if type(v) is h5py._hl.group.Group:
            sub_data[k] = v
'''

In [23]:
'''OLD VERSION
#Applying the get_subject_data fct. to the desired input struct:
#get_subject_data(Struct_noData)
get_subject_data(Struct_youngData)
'''

In [31]:
#Looping over the input Struct and extracting only entries that contain subject data.
#Creating a new dictionary filled with those data, keyed by the position of the subject in 'Sub'.

sub_data = {}

def get_subject_data(InputStruct):
    for i, v in enumerate(InputStruct['Sub']):
        #Each row of 'Sub' holds a reference; indexing the file with it returns the subject entry.
        #sub_data.append(InputStruct[v[0]])
        sub_data[i] = InputStruct[v[0]]

In [32]:
#Applying the get_subject_data fct. to the desired input struct:
get_subject_data(Struct_youngData)

In [40]:
#Fct. that fills a nested dictionary with the data of the desired games only,
#for each subject. Games that are not requested are left out.
#Call with a list of the desired game names in str format as input.

game_data = {}

def get_game_data(game_labels):
    for subj, entries in sub_data.items():
        #Disabled filter: 'if len(entries) == 35' (33 games + 2 meta data). You might need to tweak this.
        game_data[subj] = {}
        for game, data in entries.items():
            if game in game_labels:
                game_data[subj][game] = data

In [41]:
#Fct. that fills a nested dictionary with the meta data of each subject only

subj_meta_data = {}
def get_subj_meta_data(subject_data):
    for subj, entries in subject_data.items():
        #Disabled filter: 'if len(entries) == 35' (33 games + 2 meta data). You might need to tweak this.
        subj_meta_data[subj] = {}
        for key, data in entries.items():
            if key == 'sub_char':
                subj_meta_data[subj]['sub_char'] = data

# <u>Computing Variables:

In this section the actual variables of interest are computed. Some are computed for all games, some only for specific games. The functions below all use the same type of input, namely a list that contains the names of the games (as str) for which the variables are to be computed. For example, if you would like to know the game duration for all games, call the respective function with "all_game_labels" as input.
<br> The functions all follow the same structure. They call the filtering functions from above, then index the data under certain conditions for each subject using nested loops. Then something is computed, and the result is stored in the dictionary 'variable_data', which is essential for creating the output files at the end of the script.
<br>
<br>The variable functions themselves work as of right now. However, depending on the input Struct they might need to be adjusted slightly.
<br>
<br>**Note:** The game-label lists need to be edited manually in case some games are missing. In addition, the functions only add to the 'variable_data' dictionary. This way it can be filled with further variables throughout the rest of the script.
<br>**Note:** The functions account for NaNs by checking for them and skipping them, so the calculation continues without them. The code does not yet indicate whether NaNs were found.
<br>
<br>**Future To-Do:**
- Add more functions computing other variables. Use the same structure to make sure they are added to the output files at the end.
- Use the inputs below or create new ones if needed.
- If the conditions used in a certain function apply to the entry selected by a game label, you can use that game label as input of that function. This means you could also compute the sway path of e.g. the SRT or FTBT games.
- Indicate whether NaNs were found in a certain data set, maybe by automatically labeling it accordingly.
- Use 'contains' to select snippets of the game name as the selection criterion. This way not every single game has to be entered manually, but rather "BT_", "FTBT1_", and so on.
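
The last To-Do item could be approached as sketched below. This is only an illustration: `select_game_labels` is a hypothetical helper that is not part of the notebook, and it matches by prefix rather than by substring, because a plain substring test for "BT" would also match the "FTBT" games.

```python
# Hypothetical helper (not part of the original notebook):
# pick game labels by prefix instead of listing every label by hand.
def select_game_labels(available_labels, prefixes):
    """Return the labels, in their original order, that start with any of the prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    return [label for label in available_labels if label.startswith(prefixes)]

# Small illustrative subset of the label names used in this notebook.
example_labels = ['BT_hi_001', 'BT_lo_001', 'FTBT1_hi_001', 'FTBT2_lo_001', 'SRT_mid_001']

bt_only = select_game_labels(example_labels, ['BT_'])
ftbt_only = select_game_labels(example_labels, ['FTBT'])
print(bt_only)
print(ftbt_only)
```

The resulting lists can be passed to the compute functions in the same way as the manually written label lists.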

In [42]:
#Defining input list for games to analyse.

all_game_labels = [
    'BT_hi_001', 'BT_lo_001', 'BT_mid_001',
    'FTBT1_hi_001', 'FTBT1_lo_001','FTBT1_mid_001',
    'FTBT2_hi_001', 'FTBT2_lo_001','FTBT2_mid_001',
    'FTBT3_hi_001', 'FTBT3_lo_001','FTBT3_mid_001',
    'FTBT4_hi_001', 'FTBT4_lo_001','FTBT4_mid_001',
    'SRT_hi_001', 'SRT_lo_001','SRT_mid_001',
    'CRT_hi_001', 'CRT_lo_001','CRT_mid_001',
    'TMT_A_hi_001', 'TMT_A_lo_001','TMT_A_mid_001',
    'TMT_B_hi_001', 'TMT_B_lo_001', 'TMT_B_mid_001',
    'SCFT_hi_001', 'SCFT_lo_001','SCFT_mid_001',
    'CCFT_hi_001', 'CCFT_lo_001','CCFT_mid_001'
                    ]

BT_game_labels = ['BT_hi_001', 'BT_lo_001', 'BT_mid_001']

FTBT_game_labels = ['FTBT1_hi_001', 'FTBT1_lo_001','FTBT1_mid_001',
    'FTBT2_hi_001', 'FTBT2_lo_001','FTBT2_mid_001',
    'FTBT3_hi_001', 'FTBT3_lo_001','FTBT3_mid_001',
    'FTBT4_hi_001', 'FTBT4_lo_001','FTBT4_mid_001']

SRT_game_labels = ['SRT_hi_001', 'SRT_lo_001','SRT_mid_001']

CRT_game_labels = ['CRT_hi_001', 'CRT_lo_001','CRT_mid_001']

TMT_game_labels = ['TMT_A_hi_001', 'TMT_A_lo_001','TMT_A_mid_001',
                   'TMT_B_hi_001', 'TMT_B_lo_001', 'TMT_B_mid_001']

SCFT_game_labels = ['SCFT_hi_001', 'SCFT_lo_001','SCFT_mid_001']

CCFT_game_labels = ['CCFT_hi_001', 'CCFT_lo_001','CCFT_mid_001']

In [43]:
#Initializing Dictionary that will hold all computed variable data.
get_game_data(all_game_labels)

variable_data = {}
for subj, data in game_data.items():
    variable_data[subj] = {}
    for k, v in data.items():
        variable_data[subj][k] = {}

In [48]:
get_subj_meta_data(sub_data)
#print(subj_meta_data)
#print(subj_meta_data[1]["sub_char"]["height"][0])

In [69]:
#Gamedata example
#print(game_data[13]["BT_hi_001"]["Game"]["cursorPos"][0])

## <u>General: 

### Game Duration:

In [70]:
#Defining a fct. that extracts the total game duration of any input game for each subject.
#Based on Susi's calculations

def compute_game_duration(game_labels): #unit: [ms]
    get_game_data(game_labels)
    for subj, games in game_data.items():
        for game, data in games.items():
            duration = data['con_char']['game_duration_in_ms'][0][0]
            variable_data[subj][game]['tot_game_duration_ms'] = duration

### Sway Path:

In [78]:
#Defining a fct. that computes the total sway path of any input game, for each subject.
#Based on cursor x/y position values.

def compute_sway_path(game_labels): #unit: [arbitrary length unit]
    get_game_data(game_labels)
    for subj, games in game_data.items():
        for game, data in games.items():
            path_temp = []
            x = data['Game']['cursorPos'][0]
            y = data['Game']['cursorPos'][1]
            for i in np.arange(0, len(x)): #Samples where x or y is NaN are skipped.
                if np.isnan(x[i]) or np.isnan(y[i]):
                    continue
                path_temp.append(np.sqrt(x[i]**2 + y[i]**2)) #distance of the sample from the origin (0, 0)
            tot_path = np.sum(path_temp) #total sway path of game
            mean_path = np.sum(path_temp)/len(path_temp) #mean sway path of single iteration
            stddev = np.nanstd(path_temp) #std.dev. from mean_path
            variable_data[subj][game]['tot_sway_path_game'] = tot_path
            variable_data[subj][game]['mean_sway_path_game'] = mean_path
            variable_data[subj][game]['stddev_sway_path_game'] = stddev
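
The function above sums the distance of every sample from the origin. If the sway path is instead meant as the length of the trajectory travelled by the cursor (an assumption, since the notebook does not state the intended definition), the sum would run over the distances between consecutive valid samples. A minimal sketch using plain lists, with the hypothetical name `path_length`:

```python
import math

def path_length(x, y):
    """Sum of the Euclidean distances between consecutive valid (non-NaN) samples."""
    # Keep only samples where both coordinates are valid numbers.
    points = [(xi, yi) for xi, yi in zip(x, y)
              if not (math.isnan(xi) or math.isnan(yi))]
    # Distance from each valid sample to the next valid one.
    steps = [math.hypot(x2 - x1, y2 - y1)
             for (x1, y1), (x2, y2) in zip(points, points[1:])]
    return sum(steps), steps

# The third sample contains a NaN and is dropped before the distances are computed.
total, steps = path_length([0.0, 3.0, float('nan'), 3.0], [0.0, 4.0, 1.0, 8.0])
print(total)  # 9.0
```

Note that dropping a NaN sample connects its two neighbours with a straight line, which may slightly underestimate the true path.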

In [79]:
#Computing the above for ALL games by simply calling the respective functions.
#The input list needs to contain all games for which the above variables are desired.
#Known issue: compute_sway_path raises an error for the first participant, game SCFT_lo,
#because there is no cursor position data (see the Matlab file). A try/except could handle such cases.

compute_game_duration(all_game_labels)
compute_sway_path(all_game_labels)

0 BT_hi_001
[ 0.25493221  0.2288385   0.2216406   0.23248711  0.24304079  0.23781882
  0.2189091   0.20063432  0.19353694  0.19586431  0.19900203  0.19832354
  0.19698627  0.19910135  0.20161737  0.19632483  0.18046058  0.1625054
  0.15475751  0.15995997  0.16677118  0.15993895  0.13552979  0.10594073
  0.08959529  0.09524806  0.11531662  0.13320773  0.1364315   0.12332509
  0.0997759   0.07332831  0.05246     0.04842117  0.07062091  0.11560654
  0.16320049  0.18947904  0.18605792  0.16544355  0.14734252  0.14157532
  0.14439047  0.14771079  0.14758489  0.14356226  0.13495379  0.12241403
  0.11185835  0.11164599  0.12356551  0.13877853  0.14492554  0.13669559
  0.11860107  0.09922011  0.08533631  0.08102529  0.08827969  0.1051189
  0.12369665  0.13341155  0.12819224  0.11127352  0.09296841  0.08381856
  0.0883828   0.10258724  0.11559539  0.11653268  0.10286899  0.08305148
  0.06950484  0.06823803  0.07520174  0.08163259  0.08145976  0.0742682
  0.06370767  0.05415651  0.04739935  0.04

ValueError: Field names only allowed for compound types
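
The error above fits the note that one game has no cursor position data. One way to keep the loop running is to wrap the per-game computation in a try/except and record what failed. The sketch below uses plain dictionaries as stand-ins for the h5py entries; `safe_compute` and `failed` are hypothetical names, and the list of caught exception types is an assumption based on the error shown.

```python
def safe_compute(game_data, compute_one):
    """Apply compute_one to every game; collect failures instead of stopping."""
    results, failed = {}, []
    for subj, games in game_data.items():
        results[subj] = {}
        for game, data in games.items():
            try:
                results[subj][game] = compute_one(data)
            except (KeyError, ValueError, IndexError) as err:
                # Remember which subject/game failed and why, then move on.
                failed.append((subj, game, repr(err)))
    return results, failed

# Stand-in data: the second game has no 'cursorPos' entry.
demo = {0: {'BT_hi_001': {'cursorPos': [[0.1, 0.2], [0.3, 0.4]]},
            'SCFT_lo_001': {}}}
results, failed = safe_compute(demo, lambda d: len(d['cursorPos'][0]))
print(results)
print(failed)
```

The `failed` list could later be written to the output files so that missing values are traceable.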

## <u>Balance Task - BT

### Sway Area:

In [13]:
#Defining a fct. that computes the sway area of the total game. 
#Based on 95% - confidence ellipse around the population mean (cursor position mean)
#For a detailed documentation see Schubert et al., 2014

#Eigenvalues
def compute_sway_area(game_labels): #unit: [arbitrary area unit]
    get_game_data(game_labels)
    for subj, game in game_data.items():
        for game, data in game_data[subj].items():
            x_temp = []
            y_temp = []
            path_temp = []
            x = data['Game']['cursorPos'][0]
            y = data['Game']['cursorPos'][1]
            for i in np.arange(0, len(x)): #Samples where x or y is NaN are skipped.
                if np.isnan(x[i]) or np.isnan(y[i]):
                    continue
                x_temp.append(x[i])
                y_temp.append(y[i])
                path_temp.append(np.sqrt(x[i]**2 + y[i]**2))
            cov_mat = np.cov(x_temp, y_temp) #Computing the covariance matrix of the cursor positions
            x_var = cov_mat[0,0]
            y_var = cov_mat[1,1]
            xy_cov = cov_mat[0,1]
            lam1 = 0.5 * (x_var + y_var + np.sqrt((x_var-y_var)**2 + 4*xy_cov**2)) #Eigenvalues of cursor
            lam2 = 0.5 * (x_var + y_var - np.sqrt((x_var-y_var)**2 + 4*xy_cov**2)) #pos. cov. matrix.
            x_chi2 = chisquare(np.abs(x_temp))
            y_chi2 = chisquare(np.abs(y_temp))
            CEA = np.pi * 1/len(path_temp) * (x_chi2[0]+y_chi2[0])/2 * np.sqrt(lam1*lam2) #95% Confidence Ellipse Area
            PEA = np.pi * (x_chi2[0]+y_chi2[0])/2 * np.sqrt(lam1*lam2) #95% Prediction Ellipse Area
            #CEA = np.pi * 1/len(x_temp) * 2.4478**2 * np.sqrt(lam1*lam2)
            
            variable_data[subj][game]['sway_area_game_cea95'] = CEA
            variable_data[subj][game]['sway_area_game_pea95'] = PEA
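
A remark on the scaling factor: `scipy.stats.chisquare` runs a goodness-of-fit test and returns a statistic computed from the data, whereas the commented-out line with `2.4478**2` corresponds to the 95% quantile of a chi-square distribution with two degrees of freedom (about 5.99). For two degrees of freedom this quantile has the closed form $-2\ln(1-p)$. The sketch below assumes that the quantile is what is intended and computes the prediction ellipse area $\pi \chi^2_{p,2} \sqrt{\lambda_1 \lambda_2}$ with plain Python. `prediction_ellipse_area` is a hypothetical name.

```python
import math

def prediction_ellipse_area(x, y, p=0.95):
    """Area of the p-prediction ellipse for paired, NaN-free samples x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sample (co)variances with the n-1 denominator, as np.cov uses by default.
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    # The product of the two eigenvalues equals the determinant of the covariance matrix.
    det = max(sxx * syy - sxy ** 2, 0.0)  # guard against tiny negative rounding errors
    chi2_quantile = -2.0 * math.log(1.0 - p)  # exact for 2 degrees of freedom
    return math.pi * chi2_quantile * math.sqrt(det)

area = prediction_ellipse_area([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
print(area)
```

Following the CEA line in the function above, the confidence ellipse of the mean would be this area divided by the number of valid samples.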

In [14]:
#Calling the above fct. for BT games only
compute_sway_area(BT_game_labels)
#variable_data

### Sway Velocity:

In [15]:
#Defining a function that computes the mean sway velocity of a BT game.
#It uses the total sway path computed by "compute_sway_path"
#as well as the duration of the respective game, provided by "compute_game_duration".

def compute_sway_velocity(game_labels): #unit: [length/sec]
    get_game_data(game_labels)
    compute_sway_path(game_labels)
    compute_game_duration(game_labels)
    for subj, game in game_data.items():
        for game, data in game_data[subj].items():
            duration_s = variable_data[subj][game]['tot_game_duration_ms'] / 1000 #game duration in sec.
            sway_vel = variable_data[subj][game]['tot_sway_path_game'] / duration_s
            variable_data[subj][game]['mean_sway_velocity_game'] = sway_vel

In [16]:
#Calling the above fct. for BT games only. Check new entries in 'variable_data' by removing the #
compute_sway_velocity(BT_game_labels)
#variable_data

## <u>Follow-The-Ball Task - FTBT

### DistanceToBall:

In [17]:
#Defining a function which computes different properties of the distance between the cursor and the ball.

def compute_dist_to_ball(game_labels):
    get_game_data(game_labels)
    for subj in game_data.keys():
        for game, data in game_data[subj].items():
            single_dist = data['Game']['distancesToGoal'][0]
            temp_dist = []
            for i in np.arange(0, len(single_dist)): #NaN samples are skipped.
                if not np.isnan(single_dist[i]):
                    temp_dist.append(single_dist[i])
            tot_dist = np.sum(temp_dist) #total distance for entire game
            mean_dist = np.sum(temp_dist)/len(temp_dist) #corresponding mean distance
            min_dist = np.min(temp_dist) #minimum distance over entire game
            max_dist = np.max(temp_dist) #maximum distance over entire game
            stddev = np.nanstd(temp_dist) #std.dev. from mean ball dist.
            variable_data[subj][game]['total_distance_to_ball_game'] = tot_dist
            variable_data[subj][game]['mean_distance_to_ball_game'] = mean_dist
            variable_data[subj][game]['min_distance_to_ball_game'] = min_dist
            variable_data[subj][game]['max_distance_to_ball_game'] = max_dist
            variable_data[subj][game]['stddev_distance_to_ball_game'] = stddev
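
The functions in this notebook skip NaNs silently. A small helper could return the number of dropped samples together with the valid values, so that the count can be stored as an additional variable. This is a sketch with plain lists. `drop_nans` and the key `n_nan_distance_to_ball_game` are hypothetical names.

```python
import math

def drop_nans(values):
    """Return (valid_values, n_dropped) so callers can report the skipped samples."""
    valid = [v for v in values if not math.isnan(v)]
    return valid, len(values) - len(valid)

valid, n_nan = drop_nans([0.5, float('nan'), 0.2, float('nan')])
summary = {
    'mean_distance_to_ball_game': sum(valid) / len(valid),
    'n_nan_distance_to_ball_game': n_nan,  # hypothetical extra output column
}
print(summary)
```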

In [18]:
#Calling the above fct.
compute_dist_to_ball(FTBT_game_labels)
#variable_data

## <u>Simple Reaction Time Task - SRT

## <u>Choice Reaction Time Task - CRT

## <u>Trail Making Task A/B - TMT A/B

## <u>Simple Cognitive Flexibility Task - SCFT

## <u>Complex Cognitive Flexibility Task - CCFT

# <u>Output Files

In this last section two output files are created, with the current date appended to the file name.
<br> One is a Matlab Struct and one is an Excel sheet. Both make use of the previously created 'variable_data' dictionary. **Do not change** this unless needed. It is set up so that the 'variable_data' dictionary only needs to contain all variables of interest for all subjects and to keep its current structure. The rest should work from there.
<br> To be honest, a lot of the following was created by trial and error. So in case you do change it, make sure to save the code below as a 'Raw'-type cell before applying changes. Also consult the documentation of the xlsxwriter package: https://xlsxwriter.readthedocs.io
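
The Excel output built further below uses one worksheet per game, one row per subject and one column per variable. The sketch below shows that mapping with plain lists only, which can help when checking the layout before writing a file. `rows_for_game` is a hypothetical helper, and the subject IDs and values are made up for illustration.

```python
def rows_for_game(variable_data, label):
    """Build a header and one row per subject for a single game label."""
    header, rows = ['Subject_ID'], []
    for subj, entry in variable_data.items():
        variables = entry.get(label, {})
        if len(header) == 1:
            header += list(variables)  # variable names from the first subject that has some
        rows.append([entry.get('SubjectID')] +
                    [variables.get(name) for name in header[1:]])
    return header, rows

# Made-up example following the structure of 'variable_data'.
demo = {
    0: {'SubjectID': 'S01',
        'BT_hi_001': {'tot_game_duration_ms': 30000.0, 'tot_sway_path_game': 12.5}},
    1: {'SubjectID': 'S02',
        'BT_hi_001': {'tot_game_duration_ms': 31000.0, 'tot_sway_path_game': 9.75}},
}
header, rows = rows_for_game(demo, 'BT_hi_001')
print(header)
print(rows)
```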

## Matlab Struct

In [19]:
#Extracting and sorting meta data of each subject.
get_subj_meta_data(sub_data)
SubjectID = []
for subj in subj_meta_data.keys():
    temp=[]
    for i in np.arange(0,3):
        temp.append(chr(subj_meta_data[subj]['sub_char']['subjectID'][i][0]))
    SubjectID.append(''.join(temp))
Age = [subj_meta_data[subj]['sub_char']['age'][0] for subj in subj_meta_data.keys()]
Gender = [chr(subj_meta_data[subj]['sub_char']['gender'][0][0]) for subj in subj_meta_data.keys()]
Weight = [subj_meta_data[subj]['sub_char']['weight'][0] for subj in subj_meta_data.keys()]
Height = [subj_meta_data[subj]['sub_char']['height'][0] for subj in subj_meta_data.keys()]
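
The cell above reads exactly three characters of each subject ID. Assuming the IDs are stored as a column of character codes, as the indexing `[i][0]` suggests, a small helper could decode IDs of any length. `decode_matlab_chars` is a hypothetical name, and the example codes are made up.

```python
def decode_matlab_chars(codes):
    """Join a column of character codes (one code per row) into a string."""
    return ''.join(chr(int(row[0])) for row in codes)

# Made-up example: the codes 83, 48, 55 correspond to 'S', '0', '7'.
subject_id = decode_matlab_chars([[83], [48], [55]])
print(subject_id)  # S07
```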

In [20]:
#Adding the meta data of each subject to the respective variable data.
#The lists are in the same subject order as 'variable_data'.

for subj, sid, age, gender, weight, height in zip(variable_data, SubjectID, Age, Gender, Weight, Height):
    variable_data[subj]['SubjectID'] = sid
    variable_data[subj]['Age'] = age
    variable_data[subj]['Gender'] = gender
    variable_data[subj]['Weight'] = weight
    variable_data[subj]['Height'] = height

In [None]:
#Saving the data as a Matlab Struct
#Note: the folder 'output_files' must already exist.

#One entry per subject, in the order of 'variable_data'.
StructVars = np.array(list(variable_data.values()))

scipy.savemat('output_files/StructVars_prelim_' + date + '.mat', {'StructVars_prelim':StructVars})

## Excel Sheet

In [None]:
#Supplementary game label list:
all_game_labels = BT_game_labels + FTBT_game_labels + SRT_game_labels + CRT_game_labels + TMT_game_labels + SCFT_game_labels+ CCFT_game_labels

In [None]:
workbook = xls.Workbook('output_files/Exergames_vars_' + date + '.xlsx')
workbook.set_properties({
    'title': 'ExerGame Variables',
    'company': 'Universität Heidelberg - HCMR',
    'author': 'Valentin J. Oettinger',
    'manager': 'Dr. Lizeth Sloot, Dr. Christian Werner',
    'comments': 'Created with Python'})

cell_head_format = workbook.add_format({'bold':True})
cell_head_format.set_text_wrap()
cell_format = workbook.add_format()
cell_format.set_num_format('0.00')
cell_format.set_shrink()

 
for label in all_game_labels: #Adding a worksheet for each game
    worksheet = workbook.add_worksheet(label)
    worksheet.write(0, 0, 'Subject_ID', cell_head_format)

    row = 1
    for subj, entry in variable_data.items(): #Filling the first column with Subject IDs
        worksheet.write(row, 0, entry['SubjectID'])
        row += 1

    for subj, entry in variable_data.items(): #Filling the first row with the respective variable names
        col = 1
        variables = entry.get(label)
        if isinstance(variables, dict):
            for name in variables:
                worksheet.write(0, col, name, cell_head_format)
                col += 1
        break #The names are taken from the first subject only

    row = 1
    for subj, entry in variable_data.items(): #Filling the respective cells with the corresponding data
        col = 1
        variables = entry.get(label)
        if isinstance(variables, dict):
            for name, data in variables.items():
                worksheet.write(row, col, data, cell_format)
                col += 1
        row += 1 #One row per subject, matching the Subject ID column

workbook.close()