# Data Science in Psychology & Neuroscience (DSPN)

## Lecture 12. Data Wrangling (part 2)

### Date: October 4, 2022

### To-Dos From Last Class:

* Download "imitation inhibition" task data from <a href="https://github.com/hogeveen-lab/DSPN_Fall2022_Git/tree/master/misc_exercises/imitation_inhibition_paradigm">GitHub</a>
* Download <a href="https://github.com/hogeveen-lab/DSPN_Fall2022_Git/tree/master/assignment_starters/assign3_starter">Assignment #3 starter kit</a>

### Today:

* Debriefing the leaky integrate-and-fire neuron assignment
* Wrangle some real data

### Homework
* Download <a href="https://github.com/hogeveen-lab/DSPN_Fall2022_Git/tree/master/assignment_starters/assign3_starter">Assignment #3 starter kit</a>


# Changing gears: Picking up where we left off on Data Wrangling...

<img src="img/imit_inhib_fileorg.png" width=500>

## Breaking into 8 code chunks
## 1. Import packages

In [1]:
### Part 1 --> Importing data wrangling packages I often use
import os
from glob import glob # only need the glob() function from the glob module
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None # silence SettingWithCopyWarning when adding columns to filtered data frames

## 2. Setting paths to the first level data

In [2]:
### Part 2 --> setting paths to the first level data

# get current working directory
base_dir = os.getcwd()
# option: base_dir = 'PATH/TO/YOUR/BASEDIR'
# Build the path to the first level data inside the current working directory
first_dir = os.path.join(base_dir,'data/first')
# option: first_dir = base_dir + '/PATHTO/data/first'
P_file_pattern = 'P*.txt'
second_dir = os.path.join(base_dir,'data/second')
# option: second_dir = base_dir + '/PATHTO/data/second'
questionnaire_file = os.path.join(second_dir,'ait_questionnaires.csv')
# option: questionnaire_file = second_dir + '/ait_questionnaires.csv'

# Using glob to find all participant data files
all_files = glob(os.path.join(first_dir,P_file_pattern))
# print(all_files)
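
To see what the `glob` pattern is doing, here is a minimal sketch that builds a throwaway folder and searches it. The file names (`P01.txt`, `P02.txt`, `notes.md`) and the `demo_` variables are made up for illustration and are not part of the course data. Note that `glob` does not promise any particular order, so the sketch sorts the result.

```python
import os
import tempfile
from glob import glob

# Build a throwaway directory with hypothetical file names so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_first_dir = os.path.join(tmp_dir, 'data', 'first')
    os.makedirs(demo_first_dir)
    for name in ['P02.txt', 'P01.txt', 'notes.md']:
        open(os.path.join(demo_first_dir, name), 'w').close()

    # 'P*.txt' matches any file that starts with P and ends in .txt
    # glob returns matches in arbitrary order, so sort for a reproducible list
    demo_files = sorted(glob(os.path.join(demo_first_dir, 'P*.txt')))
    demo_names = [os.path.basename(f) for f in demo_files]

print(demo_names)
```

Sorting matters later on because `all_files[0]` is only guaranteed to be the same participant from run to run if the list order is fixed.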

## 3. Load a test subject to make sense of things

In [5]:
# Reading in the data
sample_df = pd.read_csv(all_files[0], skiprows=5, sep='\t')
print('How many rows in initial loaded data frame:',len(sample_df)) # What things might cause this to not == 100?


# Filtering the data down to just the experimental block rows
sample_df = sample_df[sample_df['Name.1']=="AI_Block"]

# Filtering the dataframe down to RELEASES
sample_df_releases = sample_df[sample_df['Released']=='Released']

# How many key release responses do we have?
print('How many rows in key release filtered data frame:',len(sample_df_releases))

# Identifying double responses: first, shift 'Name.2' down one row so each row also holds the previous row's value
sample_df_releases['shift'] = sample_df_releases['Name.2'].shift(1)
# display(sample_df_releases[['Name.2','shift']])

# Flag a row as a double response (1) when its 'Name.2' value matches the previous row's value
sample_df_releases['double_response'] = np.where(sample_df_releases['shift']==sample_df_releases['Name.2'],1,0)
double_resp_df = sample_df_releases[sample_df_releases['double_response']==1]
# display(double_resp_df[['Name.2','shift','double_response']])

# Filtering out double response trials
sample_df_releases_nodouble = sample_df_releases[sample_df_releases['double_response']==0] 
print('How many rows in no-double-response filtered data frame:',len(sample_df_releases_nodouble)) # Seeing if we have the right # of rows now

How many rows in initial loaded data frame: 521
How many rows in key release filtered data frame: 101
How many rows in no-double-response filtered data frame: 100
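
The shift-and-compare trick is easier to follow on a tiny example. This is a sketch on made-up data: the `toy` data frame and its `trial_N` labels are hypothetical, and only the column name `Name.2` is borrowed from the real files.

```python
import numpy as np
import pandas as pd

# Hypothetical labels; 'trial_3' appears on two consecutive rows (a double response)
toy = pd.DataFrame({'Name.2': ['trial_1', 'trial_2', 'trial_3', 'trial_3', 'trial_4']})

# shift(1) moves every value down one row, so each row sees the previous row's label
# (the first row has no previous row, so it gets a missing value)
toy['shift'] = toy['Name.2'].shift(1)

# A row is a double response when its label equals the previous row's label
toy['double_response'] = np.where(toy['shift'] == toy['Name.2'], 1, 0)

# Keep only the rows that are not double responses
toy_nodouble = toy[toy['double_response'] == 0]

print(toy)
print('Rows after filtering:', len(toy_nodouble))
```

Because the comparison flags the second of two matching rows, the filter keeps the first response on each trial and drops the repeat.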


In [14]:
# Uncomment / comment lines 1, 2, and 3 to learn a bit more about what a column is within a pandas data frame.

# display(sample_df_releases) # 1. data frame
# display(sample_df_releases['Finger']) # 2. Series w/in the data frame
# display(sample_df_releases['Finger'].values) # 3. pull the values out of the pandas Series, returns a numpy array!

# Note: each data frame column is a Series,
# which is just a special / pandas-y way of storing an array along with its row index. Numpy likes to keep things in plain array form like the below...
# np.array([1, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1,
# #        2, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1,
# #        1, 1, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2,
# #        2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1,
# #        2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 1, 2, 2])
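
The same three-step comparison can be run on a toy data frame. This is a sketch with made-up values; `toy_df` and the four `Finger` values are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({'Finger': [1, 2, 1, 2]})

finger_series = toy_df['Finger']        # one column -> pandas Series (values + row index)
finger_array = toy_df['Finger'].values  # strip the index -> plain numpy array

print(type(toy_df).__name__)
print(type(finger_series).__name__)
print(type(finger_array).__name__)

# The Series still knows which row each value came from; the array does not
print(finger_series.index.tolist())
```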


## 4. Iterate through to load the first level data
###    * Concatenate all together to create one data frame to rule them all
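
One way this step might look, as a rough sketch rather than the code we will write in class: loop over the files, load each one, and stack the results with `pd.concat`. The helper `load_one_subject`, the `subject` column, and the tiny stand-in files are all assumptions made for illustration; the real files need the filtering from chunk 3 as well.

```python
import os
import tempfile
from glob import glob
import pandas as pd

def load_one_subject(path):
    """Hypothetical helper: read one tab-separated file and tag its rows with the file name."""
    df = pd.read_csv(path, skiprows=5, sep='\t')
    df['subject'] = os.path.basename(path).replace('.txt', '')
    return df

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write two tiny stand-in files: 5 header lines to skip, then a tab-separated table
    for subj in ['P01', 'P02']:
        with open(os.path.join(tmp_dir, subj + '.txt'), 'w') as f:
            f.write('header line\n' * 5)
            f.write('Name.2\tFinger\n')
            f.write('trial_1\t1\n')
            f.write('trial_2\t2\n')

    demo_files = sorted(glob(os.path.join(tmp_dir, 'P*.txt')))

    # Load every file, then stack the data frames on top of each other
    all_subjects_df = pd.concat([load_one_subject(f) for f in demo_files],
                                ignore_index=True)

print(all_subjects_df)
```

Tagging each row with its source file before concatenating is what lets you tell participants apart once everything lives in one data frame.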

## 5. Merge with questionnaire data
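
A minimal sketch of what a merge does, on made-up data. The column names (`subject`, `rt`, `questionnaire_score`) are hypothetical; the actual key column in `ait_questionnaires.csv` may be named differently.

```python
import pandas as pd

# Trial-level data: several rows per participant (hypothetical values)
trials = pd.DataFrame({'subject': ['P01', 'P01', 'P02'],
                       'rt': [450, 510, 480]})

# Questionnaire data: one row per participant (hypothetical values)
questionnaires = pd.DataFrame({'subject': ['P01', 'P02'],
                               'questionnaire_score': [12, 30]})

# how='left' keeps every trial row and copies the matching questionnaire
# score onto each of that participant's trials
merged = pd.merge(trials, questionnaires, on='subject', how='left')

print(merged)
```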

## 6. Write to trial-level allsubjects csv
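
Writing out is a single `to_csv` call. The sketch below uses a made-up data frame and a hypothetical file name (`allsubjects_trial_level.csv`) inside a temporary folder, then reads the file back to check that nothing was lost.

```python
import os
import tempfile
import pandas as pd

# Hypothetical merged trial-level data
demo_df = pd.DataFrame({'subject': ['P01', 'P02'], 'rt': [450, 480]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'allsubjects_trial_level.csv')

    # index=False stops pandas from writing the row index as an extra column
    demo_df.to_csv(out_path, index=False)

    # Read it back in to confirm the round trip
    reloaded = pd.read_csv(out_path)

print(reloaded)
```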

#### Pick up next class...
7. Compute summary measures
8. Save to summary allsubjects csv