# Analyzing microstimulation data  <br>
# Refer to the proper code for matlab reading it isnt mine

## Introduction
This notebook file can be used as a guideline for analyzing data concerning the MTL-dependent microstimulation detection task as described in Doron et al. (2020). It is meant to show students or newly interested the step-by-step process from raw data to beautiful graphs. Within this, script the Mouse_Data dataclass and functions from the helpers.py will be explained and used for data-analysis. The analysis will be performed on the data files that are contained within the sibling-folder: 'data'. 
<br><br>
### Importing functions
We will start by importing the necessary functions. In order to prevent import or modulenotfound errors it is adviced to mimic the file structure from GitHub. This means that this: 'analysis.ipynb' file should be in the same folder as 'Mouse_Data.py' & 'helpers.py'.

In [1]:
# First we will have to import the dataclass we're using to hold our data, as well as, the helper functions that we'll need.
# from Mouse_Data import Mouse_Data
from helpers import *

# We will also import some additional functions from commonly used packages for convenience and plotting.
import os
import matplotlib.pyplot as plt

# And suppress an unnessecary error?
pd.options.mode.chained_assignment = None  # default='warn'

### TODO: CHANGE STRUCTURE TO MLAB Importing your data
Next we'll import the data we will be analyzing. The example data can be found in the GitHub repository witin the 'data' folder. It is advisable to first do a test run with the example data and make sure you understand the process, before proceding with your own data.
The code requires the path to the data folder which in the case of the example data should look something like this: <code>C:Users/username/coding_projects/microstimulation/data</code><br><br>
The content of the datafolder is structured in the following way:<br>
- ID0
    - session0
    - session1
- ID1
    - session0<br>
    
    
Within this datafolder there are subfolders *ID0, ID1* that refer individual animals and their sessions *session0, session1*. Within the folder of each session there should be .txt files that contain the extracted SPIKE2 data. After you have provided the path and checked if it contains the correct data we will load in the data using the Mouse_Data dataclass. 


In [3]:
# Select the path to your data files, replace '\' with '/' if you copied your path. 
# Also make sure it ends with '/' so that it's recognized as a folder.
root = 'E:/mStim_data/PSI/' #TODO this is for now, change to GitHub example path|
datafiles = os.listdir(root)
print(f'The path you provided contains the following files: \n{datafiles}')
print('If you correctly configured your pathing structure this should show the animal ID numbers.')

The path you provided contains the following files: 
['SNA-123598', 'SNA-123599', 'SNA-123601', 'SNA-123602', 'SNA-123995', 'SNA-123996']
If you correctly configured your pathing structure this should show the animal ID numbers.


In [5]:
# TODO REMOCVE
''' Mouse_Data.py

    Contains the Mouse_Data class that is used for analysing the SPIKE2 data .txt files.
    @Mik Schutte
'''
import numpy as np
import pandas as pd
import os, re, datetime, scipy.io
from scipy.io import matlab

def load_mat(filename):
    ''' This function should be called instead of direct scipy.io.loadmat
        as it cures the problem of not properly recovering python dictionaries
        from mat files. It calls the function check keys to cure all entries
        which are still mat-objects
    '''

    def _check_vars(d):
        ''' Checks if entries in dictionary are mat-objects. If yes
            todict is called to change them to nested dictionaries
        '''
        for key in d:
            if isinstance(d[key], matlab.mio5_params.mat_struct):
                d[key] = _todict(d[key])
            elif isinstance(d[key], np.ndarray):
                d[key] = _toarray(d[key])
        return d
    
    def _todict(matobj):
        ''' A recursive function which constructs from matobjects nested dictionaries
        '''
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, matlab.mio5_params.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _toarray(elem)
            else:
                d[strg] = elem
        return d

    def _toarray(ndarray):
        ''' A recursive function which constructs ndarray from cellarrays
            (which are loaded as numpy ndarrays), recursing into the elements
            if they contain matobjects.
        '''
        if ndarray.dtype != 'float64':
            elem_list = []
            for sub_elem in ndarray:
                if isinstance(sub_elem, matlab.mio5_params.mat_struct):
                    elem_list.append(_todict(sub_elem))
                elif isinstance(sub_elem, np.ndarray):
                    elem_list.append(_toarray(sub_elem))
                else:
                    elem_list.append(sub_elem)
            return np.array(elem_list, dtype='object')
        else:
            return ndarray

    data = scipy.io.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_vars(data)

def format_data(checked_data):
    ''' Formats the checked data into a pandas DataFrame
    '''
    # Check for mat objects
    df = pd.DataFrame(columns=['trialType', 'trialStart', 'trialEnd', 'stim_t', 'response_t', 'success', 'licks'])
    df['trialType'] = checked_data['SessionData']['TrialTypes']
    df['trialStart'] = checked_data['SessionData']['TrialStartTimestamp']
    df['trialEnd'] = checked_data['SessionData']['TrialEndTimestamp']

    stim_t = [trial['States']['Stimulus'][0] for trial in checked_data['SessionData']['RawEvents']['Trial']]
    df['stim_t'] = checked_data['SessionData']['TrialStartTimestamp'] + stim_t

    response_t = [np.diff(trial['States']['WaitForLick'])[0] for trial in checked_data['SessionData']['RawEvents']['Trial']]
    df['response_t'] = response_t

    success = [np.isnan(trial['States']['Reward'][0]) for trial in checked_data['SessionData']['RawEvents']['Trial']]
    df['success'] = np.invert(success)

    # Licks
    for i in range(len(checked_data['SessionData']['RawEvents']['Trial'])):
        licks = np.array([])
        if 'Port1In' in checked_data['SessionData']['RawEvents']['Trial'][i]['Events'].keys():
            licks = np.append(licks, checked_data['SessionData']['RawEvents']['Trial'][i]['Events']['Port1In'])
        if 'Port1Out' in checked_data['SessionData']['RawEvents']['Trial'][i]['Events'].keys():
            licks = np.append(licks, checked_data['SessionData']['RawEvents']['Trial'][i]['Events']['Port1Out'])
        df['licks'].iloc[i] = sorted(licks) # Apperently this is setting with a copy, but I failed to remove this error. The outcome is correct
    return df

def concat_session(session_data, session):
    ''' docstring
    '''
    # So we can already detect duplicates and they are temporally organized so if a duplicate is read it will always be succesive.
    


    
    pass

class Mouse_Data:
    ''' Class designed for housing all data for an individual mouse
        
        INPUT:
            path_to_data(str): path to the mouse folder you want to extract the data from
            
        OUTPUT:
            Mouse_Data(Class): Dataclass with attributes like id, sessions, all_data and concatenated data
    '''    

    def __init__(self, path_to_data): 
        # From path_to_data get path and files in the raw-folder of that path
        self.path = path_to_data 
        self.files = os.listdir(self.path)
        self.id = self.files[0].split('/')[-1].split('_')[0]
        self.get_behaviour()
        self.sessions = [str(key) for key in self.session_data.keys()]
        self.compile_data()

    def get_behaviour(self):
        ''' Creates self.session_data a dictionary with keys being session_dates and values being a pd.Dataframe 
        '''
        self.session_data = {}
        for file in self.files:
            print(file)
            if 'meta' not in file:
                rawData = load_mat(self.path + file)
                session = rawData['__header__'].decode()
                session = re.split('Mon |Tue |Wed |Thu |Fri |Sat |Sun ', session)[-1] 
                session = str(datetime.datetime.strptime(session, '%b %d %X %Y')).split()[0] # It's possible to recover time by not slicing this string or [-1]
                
                # Check if a similar session is already in the dictionary
                if session in self.session_data.keys():
                    print(session)
                    print(f'WARNING: There is already data loaded for the session on {session}.\nPlease check validity.')
                    
                    # TODO Now lets get a concatenating function in here
                    # sesh0 = self.session_data[session]
                    # sesh1 = format_data(rawData)

                    # # Things to concat 'trialStart', 'trialEnd', 'stim_t', 'response_t'
                    # lastrow = sesh0.iloc[-1] 
                    # sesh1['trialStart'] += lastrow['trialStart']#

                self.session_data[session] = format_data(rawData)
    
    def compile_data(self):
        ''' Creates one big pd.DataFrame of all stimuli over all sessions'''
        df_full = pd.DataFrame()
        for session in self.sessions:
            df_full = pd.concat([df_full, self.session_data[session]])
        self.full_data = df_full   

# Load the datafiles into Python using Mouse_Data
# We need an aditional path string to get to the BPOD data
BPOD_path = '/microstim/Session Data/'
mouse_list = []
for ID in datafiles:
    path_to_data = root+ID+BPOD_path
    mouse = Mouse_Data(path_to_data)
    mouse_list.append(mouse)
print(mouse_list)

meta.txt
SNA-123598_microstim_20230803_114747.mat
SNA-123598_microstim_20230803_122241.mat
2023-08-03
Please check validity.
SNA-123598_microstim_20230804_102701.mat
SNA-123598_microstim_20230805_101842.mat
SNA-123598_microstim_20230806_153250.mat
SNA-123598_microstim_20230807_102754.mat
SNA-123598_microstim_20230808_104836.mat
meta.txt
SNA-123599_microstim_20230803_130542.mat
SNA-123599_microstim_20230804_130514.mat
SNA-123599_microstim_20230805_114941.mat
SNA-123599_microstim_20230805_115057.mat
2023-08-05
Please check validity.
SNA-123599_microstim_20230806_165302.mat
SNA-123599_microstim_20230807_121013.mat
meta.txt
SNA-123601_microstim_20230807_151515.mat
SNA-123601_microstim_20230808_143850.mat
SNA-123601_microstim_20230809_102201.mat
SNA-123601_microstim_20230810_102553.mat
SNA-123601_microstim_20230811_125327.mat


MatReadError: Mat file appears to be empty

In [14]:
df0 = pd.DataFrame({'trialStart': [0.001, 0.002], 'response_t': [0.45, 0.6]})
df1 = pd.DataFrame({'trialStart': [0.001], 'response_t': [0.4]})
display(df1)
df1['trialStart'] += df0.iloc[-1]['trialStart']
df1['response_t'] += df0.iloc[-1]['response_t'] # todo not change this

display(df1)

df2 = pd.concat([df0, df1])
display(df2)

Unnamed: 0,trialStart,response_t
0,0.001,0.4


Unnamed: 0,trialStart,response_t
0,0.003,1.0


Unnamed: 0,trialStart,response_t
0,0.001,0.45
1,0.002,0.6
0,0.003,1.0
