# Data Science in Psychology & Neuroscience (DSPN)

## Lecture 10. Data Wrangling (part 3)

### Date: September 24, 2020

### To-Dos From Last Class:

* Download Assignment #3 start kit data
    * Just one difficulty-level... Sorry!
    
### Today:

* Wrangle imitation inhibition task data
    * Iterate through the participant files to load the first-level data
    * Concatenate all together to create one data frame to rule them all
    * Merge with questionnaire data
    * Write to trial-level allsubjects csv
    * Compute summary measures
    * Save to summary allsubjects csv
* Introduce Assignment #3

### Homework

* Assignment #3
    * Now due on 10/1

## Automatic Imitation Experiment

<img src="img/ait_task.png" width="700">

* 20 trials per condition (100 total responses for each participant)
    * Average across cued response finger
* Condition mapping:
    1. Baseline 
    2. Effector congruent 
    3. Effector incongruent
    4. Movement congruent
    5. Movement incongruent
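
One way to carry the condition mapping above into code is a plain dictionary applied with `Series.map`. This is only a sketch: the `condition` column name and the toy rows are assumptions, not the actual task file layout.

```python
import pandas as pd

# Condition codes and labels taken from the mapping listed above
condition_labels = {
    1: 'baseline',
    2: 'effector_congruent',
    3: 'effector_incongruent',
    4: 'movement_congruent',
    5: 'movement_incongruent',
}

# Hypothetical trial table; 'condition' is an assumed column name
trials = pd.DataFrame({'condition': [1, 3, 5, 2, 4]})

# map() looks up each code in the dictionary and returns the matching label
trials['condition_label'] = trials['condition'].map(condition_labels)
print(trials)
```

Codes that are missing from the dictionary come back as `NaN`, which makes unexpected condition values easy to spot.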

# Importing Packages

In [1]:
# Packages that are key for interacting with the OS and matching filename patterns
import os
from glob import glob # only need the glob function from the glob module

# Packages that are key for data wrangling
import numpy as np
import pandas as pd

# to do some uber simple visualizations
import matplotlib.pyplot as plt
import seaborn as sns

# Setting up the filepaths

In [3]:
# get current working directory
script_dir = os.getcwd()
base_dir = os.path.dirname(os.getcwd())

# Go one level above the current working directory and build the data paths from there
first_dir = os.path.join(base_dir,'misc_exercises/imitation_inhibition_paradigm/data/first') #misc_exercises/ for git
P_file_pattern = 'P*.txt'
second_dir = os.path.join(base_dir,'misc_exercises/imitation_inhibition_paradigm/data/second')
questionnaire_file = os.path.join(second_dir,'ait_questionnaires.csv')

# Using glob to find all participant data files
all_files = glob(os.path.join(first_dir,P_file_pattern))
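
To see what a pattern like `P*.txt` actually matches, here is a minimal sketch that builds a throwaway directory and globs it. The filenames are made up for illustration and are not the real participant files. `glob` returns paths in arbitrary order, so sorting is a reasonable habit when order matters.

```python
import os
import tempfile
from glob import glob

# Build a temporary directory with hypothetical filenames (not the real data)
with tempfile.TemporaryDirectory() as tmp_dir:
    for fname in ['P1.txt', 'P2.txt', 'P10.txt', 'notes.txt', 'P3.csv']:
        open(os.path.join(tmp_dir, fname), 'w').close()

    # 'P*.txt' matches names that start with 'P' and end with '.txt'
    matched = sorted(glob(os.path.join(tmp_dir, 'P*.txt')))
    matched_names = [os.path.basename(path) for path in matched]

print(matched_names)
```

Note that `sorted` orders strings character by character, so `P10.txt` lands before `P2.txt`. That matters if you ever rely on file order to number participants.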

# Loading in a test subject to make sense of things

In [35]:
# Reading in the data
# on_bad_lines='skip' needs pandas >= 1.3; older versions used error_bad_lines=False instead
sample_df = pd.read_csv(all_files[0], on_bad_lines='skip', skiprows=5, sep='\t')
print('How many rows in initial loaded data frame:',len(sample_df)) # What things might cause this to not == 100?

# Filtering the data down to just the experimental block rows
sample_df = sample_df[sample_df['Name.1']=="AI_Block"]
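sample_df.loc[:, 'Finger':'Repeated'] # label-based slice: all rows, columns 'Finger' through 'Repeated' (inclusive)
sample_df.loc[50:300, :] # label-based slice: rows with index labels 50 through 300 (inclusive), all columns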

# Filtering the df down to just the key release responses
sample_df_releases = sample_df[sample_df['Released']=='Released'] 

# How many key release responses do we have?
# What things might cause this to not == 100? For now, just worry about double responses.
# For this task, error rates are so low that missed-response trials are not really important.
print('How many rows in key release filtered data frame:',len(sample_df_releases))
# print(sample_df_releases)

# Identifying double responses
sample_df_releases['shift'] = sample_df_releases['Name.2'].shift(-1) # creating a new column ('shift') holding the trial name from the next row. This assignment triggers a "SettingWithCopyWarning".
# print(sample_df_releases[['Name.2','shift']]) # checking that it worked, show them shift(-1) 
sample_df_releases['double_response'] = np.where(sample_df_releases['shift']==sample_df_releases['Name.2'], 1, 0) # using a numpy where conditional to identify double responses

# Filtering out double-response trials
sample_df_releases_nodouble = sample_df_releases[sample_df_releases['double_response']==0] 
print('How many rows in no-double-response filtered data frame:',len(sample_df_releases_nodouble)) # Seeing if we have the right # of rows now

How many rows in initial loaded data frame: 521
How many rows in key release filtered data frame: 101
How many rows in no-double-response filtered data frame: 100


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
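
The warning above appears because `sample_df_releases` was created by filtering `sample_df`, so pandas cannot tell whether the new columns are meant for the original frame or for a copy. A common way around it is an explicit `.copy()` right after filtering. Here is a minimal sketch of the same double-response logic on toy data. The column names `Name.2` and `Released` mirror the cell above, but the trial names and rows are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the task log; trial names and rows are made up
toy_df = pd.DataFrame({
    'Name.2':   ['trial_1', 'trial_1', 'trial_2', 'trial_2', 'trial_2', 'trial_3', 'trial_3'],
    'Released': ['Pressed', 'Released', 'Pressed', 'Released', 'Released', 'Pressed', 'Released'],
})

# .copy() makes the filtered frame independent of toy_df, so adding columns is unambiguous
releases = toy_df[toy_df['Released'] == 'Released'].copy()

# Compare each row's trial name with the trial name on the next row
releases['shift'] = releases['Name.2'].shift(-1)
releases['double_response'] = np.where(releases['shift'] == releases['Name.2'], 1, 0)

# Keep only the rows that were not flagged
releases_nodouble = releases[releases['double_response'] == 0]
print(len(releases), len(releases_nodouble))
```

Because the comparison looks at the next row, the flag lands on the earlier release within a trial, so this approach keeps the last release for each trial.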


# Quick excursion into what the data frame contains...

In [33]:

# to be continued...
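
Until this cell is filled in, here is a sketch of a few calls that are commonly used to get a first look at a data frame. The toy frame and its column names (`pid`, `condition`, `RT`) are assumptions for illustration, not the actual columns of the task files.

```python
import pandas as pd

# Hypothetical trial-level data; column names are assumptions
toy_df = pd.DataFrame({
    'pid': ['P1', 'P1', 'P2', 'P2'],
    'condition': [1, 2, 1, 2],
    'RT': [412.0, 455.5, 390.2, 431.8],
})

print(toy_df.shape)                  # (number of rows, number of columns)
print(toy_df.columns.tolist())       # column names as a plain list
print(toy_df.dtypes)                 # data type of each column
print(toy_df.head(2))                # first two rows
print(toy_df['pid'].value_counts())  # how many rows each participant contributes
print(toy_df['RT'].describe())       # count, mean, std, min, quartiles, max
```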

# Iterating through to get all first level data, concatenating into a single data frame

In [32]:
# Setting up a pid counter to count iterations, also an empty list to collect each participant's data frame
pid_counter = 1
dfs_list = [] # empty list that will hold one pandas data frame per participant

# to be continued...
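
The loop itself is left for class, but the general load-then-concatenate pattern can be sketched as follows. Everything here is an assumption for illustration: the two tiny files are written to a temporary directory, the `trial`/`RT` columns and the `file_number`/`source_file` tags are hypothetical, and `enumerate` stands in for manually incrementing `pid_counter`.

```python
import os
import tempfile
from glob import glob

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two small hypothetical tab-separated participant files
    for name, rts in [('P1', [400, 420]), ('P2', [380, 410, 395])]:
        toy = pd.DataFrame({'trial': range(1, len(rts) + 1), 'RT': rts})
        toy.to_csv(os.path.join(tmp_dir, name + '.txt'), sep='\t', index=False)

    dfs_list = []
    file_paths = sorted(glob(os.path.join(tmp_dir, 'P*.txt')))
    for pid_counter, file_path in enumerate(file_paths, start=1):
        df = pd.read_csv(file_path, sep='\t')
        df['file_number'] = pid_counter                  # which iteration this file came from
        df['source_file'] = os.path.basename(file_path)  # keep the filename for traceability
        dfs_list.append(df)

# ignore_index=True gives the stacked frame a fresh 0..n-1 index
allsubs_df = pd.concat(dfs_list, ignore_index=True)
print(allsubs_df)
```

Collecting frames in a list and calling `pd.concat` once at the end is usually faster than growing a data frame inside the loop.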

# Loading in the questionnaire data and merging it with behavioral data

In [31]:
# renaming pid column in data frame -- to match the questionnaire pid column
allsubs_df = allsubs_df.rename(columns={"Name": "pid"})

# to be continued...
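
The merge step can be sketched with `pd.merge` on the shared `pid` column. The toy frames below are made up, the `score` column is hypothetical, and using `how='left'` with `indicator=True` is one reasonable choice rather than necessarily what the finished notebook does.

```python
import pandas as pd

# Hypothetical trial-level data, with 'Name' renamed to 'pid' as in the cell above
allsubs_toy = pd.DataFrame({
    'Name': ['P1', 'P1', 'P2', 'P2', 'P3'],
    'RT': [400, 420, 380, 410, 450],
}).rename(columns={'Name': 'pid'})

# Hypothetical questionnaire data: one row per participant, 'score' is made up
questionnaire_toy = pd.DataFrame({
    'pid': ['P1', 'P2'],
    'score': [12, 30],
})

# how='left' keeps every trial row; indicator=True adds a '_merge' column
# showing whether each row found a match in the questionnaire data
merged = pd.merge(allsubs_toy, questionnaire_toy, on='pid', how='left', indicator=True)
print(merged)
print(merged['_merge'].value_counts())
```

Checking `_merge` for `left_only` rows is a quick way to catch participants with behavioral data but no questionnaire entry (here, the made-up `P3`).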

# Writing the observation-level data to a big csv
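
A minimal sketch of this step, assuming a toy trial-level frame and a hypothetical output filename. It writes to a temporary directory so nothing in the real data folders is touched, then reads the file back as a sanity check.

```python
import os
import tempfile

import pandas as pd

# Hypothetical trial-level data; column names are assumptions
trial_df = pd.DataFrame({
    'pid': ['P1', 'P1', 'P2'],
    'condition': [1, 2, 1],
    'RT': [400.0, 420.5, 380.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'allsubjects_trials.csv')  # hypothetical filename
    # index=False leaves the row index out of the file
    trial_df.to_csv(out_path, index=False)
    # Read the file back to confirm the round trip
    reloaded = pd.read_csv(out_path)

print(reloaded)
```

Without `index=False`, the row index is written as an extra unnamed column, which then shows up as `Unnamed: 0` when the file is read back.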

# Computing subject-level summary measures (mean RT by pid)
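
One way to get a mean RT per participant is `groupby` followed by `mean`. This is a sketch on made-up numbers, and the `pid`, `condition`, and `RT` column names are assumptions.

```python
import pandas as pd

# Hypothetical trial-level data
trial_df = pd.DataFrame({
    'pid': ['P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2'],
    'condition': [1, 1, 2, 2, 1, 1, 2, 2],
    'RT': [400.0, 420.0, 450.0, 470.0, 380.0, 400.0, 410.0, 430.0],
})

# as_index=False keeps 'pid' as a regular column instead of the index
mean_rt_by_pid = trial_df.groupby('pid', as_index=False)['RT'].mean()
mean_rt_by_pid = mean_rt_by_pid.rename(columns={'RT': 'mean_RT'})
print(mean_rt_by_pid)
```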

# Computing subject-level summary measures (mean RT by pid by condition)
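
Grouping by two columns gives one mean per participant per condition. The sketch below, again on made-up data with assumed column names, also reshapes the result to a wide layout with one row per participant, which is often a convenient shape for a summary file.

```python
import pandas as pd

# Hypothetical trial-level data
trial_df = pd.DataFrame({
    'pid': ['P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P2'],
    'condition': [1, 1, 2, 2, 1, 1, 2, 2],
    'RT': [400.0, 420.0, 450.0, 470.0, 380.0, 400.0, 410.0, 430.0],
})

# Long format: one row per pid x condition combination
mean_rt_long = trial_df.groupby(['pid', 'condition'], as_index=False)['RT'].mean()

# Wide format: one row per pid, one column per condition
mean_rt_wide = mean_rt_long.pivot(index='pid', columns='condition', values='RT')

print(mean_rt_long)
print(mean_rt_wide)
```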

# Writing the subject-level RT data to a CSV