# Preprocessing Script for EPSC Traces #

## Description ## 

This is a pilot preprocessing script for EPSC traces. It still needs validation.

## Data Dictionary: ## 

- **Processed_EPSCs**: Data from "Only Single EPSCs Sheet 1". This data has already been preprocessed manually.

- **Unprocessed_EPSCs**: Data from "171023_010.txt" that has not been processed. After processing, it will be compared with Processed_EPSCs. This file in particular is used because full traces can be visualized. The first column holds the time points and the second column holds the current recordings.



Notes from Juan:
The criteria used for decay are, as discussed, decay tau < 0.8 milliseconds and end level of fitting < 10 pA.
The criterion I used for rise time was 20-80% < 0.2 milliseconds. Again, this is based only on operational criteria.

A couple more points:
- We can change the 20-80% to 10-90% at any time.
- To detect the onset of the EPSC, the criterion I used was 5x the standard deviation of the baseline.
- Decay fitting started 5 sample points after the peak.
- EPSCs in the spreadsheet are not further noise filtered after acquisition (lowpass 6 kHz at hardware).
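
The rise-time and decay criteria in these notes could be computed roughly as follows. This is only a sketch under several assumptions that the notes do not confirm: the EPSC is positive-going (as the `idxmax` calls later in this notebook suggest), the sampling interval is 0.02 ms (from "50 timepoints = 1ms"), and the decay is a single exponential plus a constant "end level". The names `rise_time_ms`, `fit_decay`, and `DT_MS` are hypothetical, and the grid search over tau is a stand-in for whatever fitting routine was actually used.

```python
import numpy as np

DT_MS = 0.02  # assumed sampling interval: 50 timepoints per ms


def rise_time_ms(trace, peak_index, baseline=0.0, lo=0.2, hi=0.8, dt_ms=DT_MS):
    """lo-hi rise time (ms) of a positive-going EPSC, using linear interpolation."""
    rising = np.asarray(trace[:peak_index + 1], dtype=float) - baseline
    amplitude = rising[-1]
    if amplitude <= 0:
        return np.nan  # no positive peak to measure

    def crossing_time(fraction):
        target = fraction * amplitude
        below = np.nonzero(rising < target)[0]
        if below.size == 0:
            return np.nan  # trace never sits below this level before the peak
        i = below[-1]  # last sample below the level; sample i + 1 is at or above it
        step = (target - rising[i]) / (rising[i + 1] - rising[i])
        return (i + step) * dt_ms

    return crossing_time(hi) - crossing_time(lo)


def fit_decay(trace, peak_index, start_offset=5, dt_ms=DT_MS, taus_ms=None):
    """Fit A * exp(-t / tau) + C to the decay, starting start_offset samples after the peak.

    Returns (tau_ms, A, C). For each candidate tau the model is linear in A and C,
    so those are solved by least squares and the tau with the smallest error wins.
    """
    y = np.asarray(trace[peak_index + start_offset:], dtype=float)
    t = np.arange(y.size) * dt_ms
    if taus_ms is None:
        taus_ms = np.arange(0.05, 5.0, 0.01)  # assumed search range and resolution
    best = None
    for tau in taus_ms:
        design = np.column_stack([np.exp(-t / tau), np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(tau), float(coef[0]), float(coef[1]))
    _, tau, amp, end_level = best
    return tau, amp, end_level
```

Under the criteria in the notes, a trace would pass if `rise_time_ms(...) < 0.2`, the fitted `tau < 0.8`, and the fitted end level is below 10 pA. Passing `lo=0.1, hi=0.9` switches the rise-time measure to 10-90%.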

## Initialization ##

In [1]:
#Ensure all libraries are present
import numpy as np 
import xlrd 
import matplotlib.pyplot as plt

In [2]:
#Helper functions for reading in data 

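def read_excel(excel_path, sheet_no = 0):
    # Open the workbook and pick the requested sheet (0-based index)
    book = xlrd.open_workbook(excel_path)
    sheet = book.sheet_by_index(sheet_no)
    # Collect the cell values row by row; numpy is imported as np above
    return np.array([sheet.row_values(i) for i in range(sheet.nrows)])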

def read_txt(filename):
    data = np.loadtxt(filename, delimiter="\t")
    return data


In [3]:
# #Specify the file name 
# unprocessed_filename = "unprocessed_data.txt"
# unprocessed_epscs = read_txt(unprocessed_filename)
# #Reshape the data so that every trace and its timepoints are a subarray
# num_traces = 10
# unprocessed_epscs = unprocessed_epscs.reshape(num_traces, -1, 2) #Shape is (10, 499968, 2) for prototype
# #Access a single trace's current (without time data) via unprocessed_epscs[trace_index][:, 1]

## Visualize Traces ##

In [4]:
# fig, axs = plt.subplots(2, 5, figsize=(30, 20))
# axs = axs.flatten()
# for i, trace in enumerate(unprocessed_epscs):
#     timepoints = trace[:,0]
#     current = trace[:,1]
#     axs[i].plot(timepoints, current)


## Preprocessing Checkpoints ##

Email raw text: 
minimum peak amplitude (20pA), maximum 10-90% or 20-80% rise time (0.2 - 0.3 ms), maximum decay tau (0.8 ms), and return to within 10pA of baseline within 10ms after peak.  The baseline (mean and s.d.) is calculated over a 1ms period beginning 2ms before the peak, and the beginning of the EPSC is when it exceeds baseline by 5 standard deviations
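
The baseline and onset rules from the email could be sketched as below. This assumes a positive-going EPSC, 50 timepoints per ms, and the sample standard deviation (`ddof=1`). The email does not specify any of these, and `baseline_stats` and `find_onset` are hypothetical helper names.

```python
import numpy as np

SAMPLES_PER_MS = 50  # assumed from "50 timepoints = 1ms"


def baseline_stats(trace, peak_index, samples_per_ms=SAMPLES_PER_MS):
    """Mean and s.d. over a 1 ms window that begins 2 ms before the peak."""
    start = peak_index - 2 * samples_per_ms
    stop = start + samples_per_ms
    if start < 0:
        raise ValueError("trace does not contain 2 ms of data before the peak")
    window = np.asarray(trace[start:stop], dtype=float)
    return window.mean(), window.std(ddof=1)


def find_onset(trace, peak_index, n_sd=5, samples_per_ms=SAMPLES_PER_MS):
    """Index of the first sample exceeding baseline mean + n_sd * s.d., or None."""
    mean, sd = baseline_stats(trace, peak_index, samples_per_ms)
    search_start = peak_index - samples_per_ms  # first sample after the baseline window
    segment = np.asarray(trace[search_start:peak_index + 1], dtype=float)
    above = np.nonzero(segment > mean + n_sd * sd)[0]
    if above.size == 0:
        return None  # the trace never exceeds the onset threshold before the peak
    return search_start + int(above[0])
```

The same baseline mean can be subtracted from the peak value to test the 20 pA minimum amplitude.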

### *Checkpoint 1: EPSC Peaks are Aligned* ###

The peak of each EPSC (its maximum value) must fall at the given index. Traces whose peak lies at any other index are dropped.

In [5]:
def check_peak_alignment(unprocessed_EPSCs, peak_index):
    # Iterate through columns
    processed_EPSCs = unprocessed_EPSCs.copy()
    count = 0
    for col in processed_EPSCs.columns:
        # Find the index of the maximum value in the column
        max_index = processed_EPSCs[col].idxmax()

        # Check if the index matches the specified peak_index
        if max_index != peak_index:
            # Drop the column if the index doesn't match
            processed_EPSCs.drop(col, axis=1, inplace=True)
            count +=1
    print(f"Dropped {count} unaligned EPSC traces!")
    return processed_EPSCs



        
    
    
    


### *Checkpoint 2: EPSC Peaks Return to Baseline Within 10ms* ###
The EPSC must return to within 10 pA of the baseline within 10 ms after the peak. This means that the function will check whether everything beyond 10 ms after the peak lies within 10 pA of the baseline.

- 600 timepoints = 12 ms
- 50 timepoints = 1 ms
- "Within 10 ms of peak" therefore means "within 500 timepoints of peak"

In [6]:
# def check_baseline_return(unprocessed_EPSCs,baseline_mean,peak_index):
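
The commented-out stub above could be filled in along these lines. This is a sketch, not a validated implementation. It assumes the same layout as the other checkpoints (one DataFrame column per trace, one row per timepoint) and reads the criterion as "every sample more than 10 ms after the peak lies within 10 pA of the baseline". The `tolerance_pa` and `window_ms` parameters are additions for illustration.

```python
import pandas as pd

SAMPLES_PER_MS = 50  # assumed from "50 timepoints = 1ms"


def check_baseline_return(unprocessed_EPSCs, baseline_mean, peak_index,
                          tolerance_pa=10.0, window_ms=10.0):
    """Keep only traces that stay within tolerance_pa of baseline after the window.

    baseline_mean may be a single number or a Series indexed by column name.
    """
    window = int(round(window_ms * SAMPLES_PER_MS))  # 10 ms -> 500 timepoints
    # Rows from 10 ms after the peak to the end of the trace (position-based)
    tail = unprocessed_EPSCs.iloc[peak_index + window:]
    deviation = (tail - baseline_mean).abs()
    # True for each column whose tail never leaves the tolerance band;
    # an empty tail (trace too short) counts as having returned
    returned = (deviation <= tolerance_pa).all(axis=0)
    processed_EPSCs = unprocessed_EPSCs.loc[:, returned]
    print(f"Dropped {int((~returned).sum())} EPSCs that did not return to baseline")
    return processed_EPSCs
```

Because this looks at the whole tail of the trace, it would also reject traces with a late second peak, so it overlaps with Checkpoint 3.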
    
    

### *Checkpoint 3: Eliminate Double Peaks* ###

In [7]:
def eliminate_double_peaks(unprocessed_EPSCs, current_threshold, peak_index, time_threshold):
    # Work on a copy so the caller's DataFrame is left untouched
    df = unprocessed_EPSCs.copy()
    count = 0
    for col in df.columns:
        # Late double peaks: rows from time_threshold timepoints after the peak onward
        after_peak_values = df.loc[peak_index + time_threshold:, col]
        # Early double peaks: rows from the start up to 30 timepoints before the peak
        before_peak_values = df.loc[0:peak_index - 30, col]
        # Drop the trace if any late value exceeds the threshold
        if (after_peak_values > current_threshold).any():
            df.drop(col, axis=1, inplace=True)
            count += 1
        # elif prevents dropping the same column twice, which would raise a KeyError
        elif (before_peak_values > 300).any():
            print("Found before peak issue")
            df.drop(col, axis=1, inplace=True)
            # Counted as well, so the summary reflects every dropped trace
            count += 1

    print(f"Dropped {count} EPSCs with double peaks")
    return df


    
    