# Location Learning Task Analysis

## Overview

Trial types:
- trial_recall: had the participant seen the picture before --> correct_recall: whether the answer was correct
- trial_position: where the participant saw the picture --> correct_position: whether the answer was correct

# 1. step: Prerequisites

## 1.1 Filename conversion steps

1. Download the latest dataset from the HCCL drive (HCCL/A_Studies/TerKepEsz/LongTRK/Data/Wave_1/Raw_Online_Tests/CognitionRun)
2. Unzip the file
3. Rename files: LLT_Online_(Participant_ID).csv

When renaming the files, please note:
- The csv files contain the Participant_ID as participantID.
- Pay attention to 0 (zero) and O (letter) in the participantID. The letter O is correct; it means old.
- Check the participantID in the longTerKepEsz_Participants file (location: HCCL/A_Studies/TerKepEsz/LongTRK/Data).
- Delete files that do not contain a participantID (the participant did not finish the task).
- If there is more than one file from the same participant, keep only the earliest one and remove the rest.
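The zero versus letter O check in the list above can be partly automated. The following is a minimal sketch, assuming that IDs follow the shape seen in the file names of this notebook (digits, `O` or `Y`, digits, then `M` or `F`). The pattern and the helper name `check_participant_id` are illustrative assumptions, not part of the study protocol.

```python
import re

# Assumed ID shape, inferred from file names such as 585O1174F and 566Y1136M:
# digits, an age-group letter (O = old, Y = young), digits, then M or F.
ID_PATTERN = re.compile(r"[0-9]+[OY][0-9]+[MF]")

def check_participant_id(participant_id):
    """Return True if the ID matches the assumed shape, False otherwise."""
    return ID_PATTERN.fullmatch(participant_id.strip()) is not None

# An ID typed with a zero instead of the letter O has no age-group letter
# left, so it fails the check and can be looked up by hand.
for candidate in ["585O1174F", "58501174F", "39-297"]:
    status = "ok" if check_participant_id(candidate) else "check manually"
    print(candidate, status)
```

IDs that fail the check still need to be compared with the longTerKepEsz_Participants file; the pattern only narrows down which ones to look at.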

Next step: write a script that automates the steps above.
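One part of such a script, choosing a single file per participant, could look like the sketch below. It assumes that a numeric suffix such as `_01` or `_02` reflects the order in which a participant's files were produced; that should be verified against the download timestamps. The names `NAME_PATTERN` and `keep_first_per_participant` are hypothetical.

```python
import re

# Assumption: the optional _NN suffix orders the files of one participant in time.
# The pattern also tolerates a doubled dot before the extension.
NAME_PATTERN = re.compile(r"LLT_Online_(?P<pid>[^_.]+)(?:_(?P<run>[0-9]+))?\.+csv")

def keep_first_per_participant(filenames):
    """Map each participant ID to the file with the lowest run number."""
    first = {}
    for name in filenames:
        match = NAME_PATTERN.fullmatch(name)
        if match is None:
            continue  # unexpected name: handle manually
        pid = match.group("pid")
        run = int(match.group("run") or 0)
        if pid not in first or run < first[pid][0]:
            first[pid] = (run, name)
    return {pid: name for pid, (run, name) in first.items()}

files = [
    "LLT_Online_585O1174F_02.csv",
    "LLT_Online_585O1174F_01.csv",
    "LLT_Online_597O1198F.csv",
]
kept = keep_first_per_participant(files)
for pid, name in kept.items():
    print(name, "->", f"LLT_Online_{pid}.csv")
```

Incomplete files should be removed before this step; otherwise the earliest file of a participant may be an unfinished attempt.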

## 1.2 Import libraries

In [11]:
import pandas as pd
import numpy as np
import os
import re

## 1.3 Set constants

Set the location directory of csv files.

In [12]:
dir="old-hcccl-location-learning-task-archive_2022_04_08/"

# 2. step: Data Capture

## 2.1 Reading csv files list from the directory

In [13]:
dir_list = list(filter(lambda name: ".csv" in name, os.listdir(dir)))

print(dir_list)
print(len(dir_list))

['LLT_Online_585O1174F_01.csv', 'LLT_Online_5541112F.csv', 'LLT_Online_585O1174F_02.csv', 'LLT_Online_597O1198F.csv', 'LLT_Online_625O1254F.csv', 'LLT_Online_489O982M.csv', 'LLT_Online_658O1320F.csv', 'LLT_Online_539O1082M..csv', 'LLT_Online_637O1278F.csv', 'LLT_Online_509O122F.csv', 'LLT_Online_01O6F.csv', 'LLT_Online_611O1226F_02.csv', 'LLT_Online_592O1188F.csv', 'LLT_Online_35O73M.csv', 'LLT_Online_611O1226F_01.csv', 'LLT_Online_528O1060F_01.csv', 'LLT_Online_614O1232F_02.csv', 'LLT_Online_528O1060F_02.csv', 'LLT_Online_535O1074F.csv', 'LLT_Online_614O1232F_01.csv', 'LLT_Online_606O1216F_01.csv', 'LLT_Online_638O1280F.csv', 'LLT_Online_521O1046M_01.csv', 'LLT_Online_526O1056M.csv', 'LLT_Online_606O1216F_02.csv', 'LLT_Online_6321268M.csv', 'LLT_Online_39-297.csv', 'LLT_Online_521O1046M_02.csv', 'LLT_Online_606O1216F_03.csv', 'LLT_Online_657O1318M.csv', 'LLT_Online_490O984M.csv', 'LLT_Online_566Y1136M_03.csv', 'LLT_Online_566Y1136M_02.csv', 'LLT_Online_667O1338F.csv', 'LLT_Online_5871

## 2.2 Data cleaning functions

Only the test trials are needed; practice steps should be removed.

In [14]:
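def dataframe_filter(df):
    # Trial images are stored under List<number>/
    regex = r"List[0-9]*/"

    # Treat missing image names as "0" so they never match the pattern
    images = df["image"].fillna("0")

    # Return a copy so later column assignments do not touch the raw data
    return df[images.str.match(regex)].copy()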
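To illustrate what the filter keeps, here is a small sketch on made-up rows. The image paths and column values are hypothetical; only the `List<number>/` prefix is taken from the function above. It uses the `na=False` argument of `str.match` as an alternative to filling missing values first.

```python
import pandas as pd

# Hypothetical rows: the real image paths may differ from these.
raw = pd.DataFrame({
    "image": ["Practice/cat.png", None, "List1/apple.png", "List2/chair.png"],
    "correct_recall": [True, None, True, False],
})

# str.match anchors the pattern at the start of each string; na=False
# treats missing values as non-matching, so no fillna step is needed.
is_trial = raw["image"].str.match(r"List[0-9]*/", na=False)
trials = raw[is_trial].copy()

print(trials)
```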

Counts the rows in which the given column equals the requested value (True or False).

In [15]:
def result(df, column, true_false):
    # Number of rows where `column` equals `true_false`
    return df.loc[df[column] == true_false, column].shape[0]
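The count is only meaningful when the column holds real booleans. A small sketch with made-up answers shows why: text values such as `'true'` never compare equal to `True`, so the count comes out as zero. The helper name `count_equal` is hypothetical.

```python
import pandas as pd

# Made-up answers, once as booleans and once as the text a CSV export may contain.
as_bool = pd.Series([True, False, True], name="correct_recall")
as_text = pd.Series(["true", "false", "true"], name="correct_recall")

def count_equal(series, value):
    """Count the entries of series that compare equal to value."""
    return int((series == value).sum())

print(count_equal(as_bool, True))
print(count_equal(as_text, True))
```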

Verify that the participant completed the entire task. A file counts as complete if it contains a stimulus column.

In [16]:
def file_valid(filename):
    path = os.path.join(dir, filename)
    # Only the header row is needed to see which columns exist
    columns = pd.read_csv(path, nrows=0).columns

    if "stimulus" in columns:
        return True

    print(filename + " – Participant did not finish the task")
    return False
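The completeness check only depends on the header row. The sketch below shows the same test on in-memory CSV text, using `nrows=0` so that pandas parses the header only. The sample contents and the helper name `has_column` are made up for illustration.

```python
import io
import pandas as pd

def has_column(csv_source, column):
    """Check for a column by parsing only the header row."""
    header = pd.read_csv(csv_source, nrows=0)
    return column in header.columns

# Hypothetical file contents: one finished run, one aborted run.
finished = io.StringIO("participantID,stimulus,image\n585O1174F,a,List1/x.png\n")
aborted = io.StringIO("participantID,image\n585O1174F,Practice/y.png\n")

print(has_column(finished, "stimulus"))
print(has_column(aborted, "stimulus"))
```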

If the relevant column contains the strings 'true' and 'false' instead of booleans, it is converted to booleans with map.

In [17]:
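def file_clean(filename, column):
    path = os.path.join(dir, filename)
    dataframe_raw = pd.read_csv(path)

    # Work on a copy so the assignment below does not raise SettingWithCopyWarning
    dataframe = dataframe_filter(dataframe_raw).copy()

    # No True found: the column was presumably read as 'true'/'false' strings.
    # Going through str keeps an all-False boolean column intact
    # (a plain map would turn it into NaN).
    if result(dataframe, column, True) == 0:
        dataframe[column] = dataframe[column].astype(str).str.lower().map({'true': True, 'false': False})

    return dataframe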
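One edge case of the conversion deserves a worked example: a participant whose answers are all `False`. Mapping a boolean column through a dictionary with string keys turns every value into NaN, because no key matches. The sketch below uses made-up data and a hypothetical helper `to_bool` that normalises both representations.

```python
import pandas as pd

def to_bool(series):
    """Normalise booleans or 'true'/'false' strings to booleans."""
    return series.astype(str).str.lower().map({"true": True, "false": False})

all_false = pd.Series([False, False, False])   # already boolean
from_text = pd.Series(["true", "false", "true"])  # read as text

# String keys never match boolean values, so every entry becomes NaN.
naive = all_false.map({"true": True, "false": False})
print(naive.isna().all())

print(to_bool(all_false).tolist())
print(to_bool(from_text).tolist())
```

With the naive mapping, such a participant would be counted with zero `True` and zero `False` answers.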


# 3. step: General Statistics

Summary statistics (number of elements, mean, standard deviation, minimum, maximum) of the per-file counts.

In [20]:
def statistic(dir_list, column, true_false):
    count_elements = []
    index = ["Number of Elements", "Mean", "Std", "Min", "Max"]
    error = 0

    for filename in dir_list:
        print(filename)
        if file_valid(filename):
            dataframe = file_clean(filename, column)
            count_elements.append(result(dataframe, column, true_false))
        else:
            error = error + 1

    # np.std defaults to the population standard deviation (ddof=0)
    values = [
        len(count_elements),
        np.mean(count_elements),
        np.std(count_elements),
        np.min(count_elements),
        np.max(count_elements),
    ]

    print("Number of unvalid files: " + str(error))

    return pd.Series(values, index)
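The Std value depends on which definition of the standard deviation is used. `np.std` divides by N by default (`ddof=0`); passing `ddof=1` divides by N - 1, which is the usual choice when the files are treated as a sample of participants. A short comparison on made-up counts:

```python
import numpy as np

# Hypothetical per-participant counts, not taken from the dataset.
counts = [30, 29, 27, 24, 30]

population_std = np.std(counts)       # ddof=0: divide by N
sample_std = np.std(counts, ddof=1)   # ddof=1: divide by N - 1

print(population_std, sample_std)
```

The sample version is always the larger of the two, and the gap grows as the number of files shrinks.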
        

## 3.1 Participants who correctly remembered having seen the picture

In [21]:
statistic(dir_list,'correct_recall',True)

LLT_Online_585O1174F_01.csv


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  if __name__ == "__main__":


LLT_Online_5541112F.csv
LLT_Online_585O1174F_02.csv
LLT_Online_597O1198F.csv
LLT_Online_625O1254F.csv
LLT_Online_625O1254F.csv – Participant did not finish the task
LLT_Online_489O982M.csv
LLT_Online_658O1320F.csv
LLT_Online_539O1082M..csv
LLT_Online_637O1278F.csv
LLT_Online_509O122F.csv
LLT_Online_01O6F.csv
LLT_Online_611O1226F_02.csv
LLT_Online_611O1226F_02.csv – Participant did not finish the task
LLT_Online_592O1188F.csv
LLT_Online_35O73M.csv
LLT_Online_611O1226F_01.csv
LLT_Online_528O1060F_01.csv
LLT_Online_614O1232F_02.csv
LLT_Online_614O1232F_02.csv – Participant did not finish the task
LLT_Online_528O1060F_02.csv
LLT_Online_535O1074F.csv
LLT_Online_614O1232F_01.csv
LLT_Online_606O1216F_01.csv
LLT_Online_638O1280F.csv
LLT_Online_521O1046M_01.csv
LLT_Online_526O1056M.csv
LLT_Online_606O1216F_02.csv
LLT_Online_6321268M.csv
LLT_Online_39-297.csv
LLT_Online_521O1046M_02.csv
LLT_Online_521O1046M_02.csv – Participant did not finish the task
LLT_Online_606O1216F_03.csv
LLT_Online_657O1

Number of Elements    30.000000
Mean                  26.366667
Std                    5.281940
Min                    0.000000
Max                   30.000000
dtype: float64

## Old participants who correctly remembered having seen the picture

In [23]:
r = re.compile("LLT_Online_[0-9]*O[0-9]*.")
dir_list_old = list(filter(r.match, dir_list))

#print(dir_list_old)
print(statistic(dir_list_old,'correct_recall',True))

LLT_Online_585O1174F_01.csv
LLT_Online_585O1174F_02.csv
LLT_Online_597O1198F.csv
LLT_Online_625O1254F.csv
LLT_Online_625O1254F.csv – Participant did not finish the task
LLT_Online_489O982M.csv
LLT_Online_658O1320F.csv
LLT_Online_539O1082M..csv
LLT_Online_637O1278F.csv
LLT_Online_509O122F.csv
LLT_Online_01O6F.csv
LLT_Online_611O1226F_02.csv
LLT_Online_611O1226F_02.csv – Participant did not finish the task
LLT_Online_592O1188F.csv
LLT_Online_35O73M.csv
LLT_Online_611O1226F_01.csv
LLT_Online_528O1060F_01.csv
LLT_Online_614O1232F_02.csv
LLT_Online_614O1232F_02.csv – Participant did not finish the task
LLT_Online_528O1060F_02.csv
LLT_Online_535O1074F.csv
LLT_Online_614O1232F_01.csv
LLT_Online_606O1216F_01.csv
LLT_Online_638O1280F.csv



LLT_Online_521O1046M_01.csv
LLT_Online_526O1056M.csv
LLT_Online_606O1216F_02.csv
LLT_Online_521O1046M_02.csv
LLT_Online_521O1046M_02.csv – Participant did not finish the task
LLT_Online_606O1216F_03.csv
LLT_Online_657O1318M.csv
LLT_Online_490O984M.csv
LLT_Online_667O1338F.csv
Number of unvalid files: 4
Number of Elements    25.000000
Mean                  25.960000
Std                    5.688444
Min                    0.000000
Max                   30.000000
dtype: float64


## Young participants who correctly remembered having seen the picture

In [24]:
r = re.compile("LLT_Online_[0-9]*Y[0-9]*.")
dir_list_young = list(filter(r.match, dir_list))

#print(dir_list_young)
print(statistic(dir_list_young,'correct_recall',True))

LLT_Online_566Y1136M_03.csv
LLT_Online_566Y1136M_03.csv – Participant did not finish the task
LLT_Online_566Y1136M_02.csv
LLT_Online_566Y1136M_01.csv
LLT_Online_566Y1136M_01.csv – Participant did not finish the task
Number of unvalid files: 2
Number of Elements     1.0
Mean                  29.0
Std                    0.0
Min                   29.0
Max                   29.0
dtype: float64
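Both age-group cells select files by a regular expression on the file name, so a file whose ID has no `O` or `Y` marker ends up in neither group. This is consistent with the outputs above, where the overall run counts 30 files while the two groups count 25 and 1. A minimal sketch that lists such files, using a hypothetical helper `age_group` and a few names from the directory listing:

```python
import re

files = [
    "LLT_Online_585O1174F_01.csv",
    "LLT_Online_566Y1136M_02.csv",
    "LLT_Online_5541112F.csv",
    "LLT_Online_39-297.csv",
]

def age_group(filename):
    """Return 'old', 'young', or None when the name carries no O/Y marker."""
    match = re.match(r"LLT_Online_[0-9]+([OY])[0-9]+", filename)
    if match is None:
        return None
    return {"O": "old", "Y": "young"}[match.group(1)]

# Files that would be skipped by both group-level statistics.
ungrouped = [name for name in files if age_group(name) is None]
print(ungrouped)
```

Files listed here are candidates for the manual participantID check described in section 1.1.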


