<a id="fa2b009b-f715-4093-94a6-d32a2ac59061"></a>
# Week 8: Walk and Spike Analysis
<a href="#Overview">Return to overview</a>


Hello All, 

Today we are going to be analyzing whole-cell patch clamp electrophysiology data. 

A bit of background.. 

I'm currently examining the effects of chronic morphine exposure on the ability of neurons in the paraventricular nucleus of the thalamus (PVT) to fire action potentials. 
    
Experimentally, I'm probing this by holding cells in a current-clamp configuration and injecting depolarizing currents while measuring changes in the voltage. This allows us to visualize action potentials. (Insert example trace of an action potential, illustrating that once the voltage crosses 0 mV we count it as an action potential.) We will be analyzing the voltage response to a 100 pA current injection under 4 different contexts. 
    
Unfortunately (or perhaps fortunately for y'all), my initial experiments were not designed with streamlined coding in mind. So what I actually ended up doing was injecting varying current steps into a single cell, beginning at -200 pA and ending at +600 pA, increasing in 100 pA increments. 
    
So what is the goal for today?
    
    1. Write code to extract the relevant information from our data.
        Relevant data:
            a. number of spikes
            b. latency
            c. interspike interval 
            d. frequency of spikes 
    2. Once we have the framework in place, let's write code to streamline this process for us. That is, simply plug in a folder with all the data and have the code comb (WALK) through each folder for the relevant information. 
    
    Let's begin! :D 

<a id="5f458303-534b-4959-be76-ea31a22e686e"></a>
# Examine example data file
<a href="#Overview">Return to overview</a>


First things first: the packages you will need. 

We are going to be using:

 os 
 numpy as np
 pandas as pd
 from scipy.signal import find_peaks
 matplotlib.pyplot as plt

In [None]:
import os
import numpy as np
import pandas as pd
from scipy.signal import find_peaks
import matplotlib.pyplot as plt


In [None]:
Test = np.loadtxt('Data/Naive/Morphine/01232023_7.axgt', skiprows=1, delimiter='\t')

A little bit about the data. Again, it was not made with coding in mind (sorry). So let's get a handle on it. 

In [None]:
Test.shape

If we look at the shape, we see that it is (50000, 6). This is because my experiment is 5 seconds long, sampled at 10 kHz. A depolarizing (or hyperpolarizing) current step was given at 1 second (the 10,000th recorded data point) and lasted for 1 second. Experiments were recorded using the program Axograph. The data it spits out is organized as follows: the first column is always time. We really only care about the 100 pA depolarizing current step, which is stored in the **fifth** column of the data matrix.
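To make the bookkeeping concrete, here is a minimal sketch of how the numbers above translate into row counts and sample indices. The variable names and the `toy_time` array are made up for illustration, and it assumes the time column starts at 0 s.

```python
import numpy as np

# Values taken from the description above
sampling_rate_hz = 10_000   # 10 kHz
duration_s = 5              # each sweep lasts 5 s
step_onset_s = 1            # current step starts at 1 s
step_length_s = 1           # and lasts 1 s

n_samples = sampling_rate_hz * duration_s
onset_index = step_onset_s * sampling_rate_hz
offset_index = (step_onset_s + step_length_s) * sampling_rate_hz

# Hypothetical stand-in for the time column (assumes time starts at 0 s)
toy_time = np.arange(n_samples) / sampling_rate_hz

print(n_samples)                  # 50000 rows, matching Test.shape[0]
print(onset_index, offset_index)  # 10000 20000
print(toy_time[onset_index])      # 1.0
```

So the stimulation window should sit roughly between rows 10000 and 20000 of the real file, which is worth confirming against the actual time column.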

Write a script to pull out our time variable and the data during the 100 pA depolarizing current step.

<a id="3c061738-2f22-41f8-80bc-b5f5cd92e75e"></a>
## Exercise 1
<a href="#Overview">Return to overview</a>
Write a script to pull out our time variable and the data during the 100 pA depolarizing current step, and then plot it. 

In [None]:


# Answer 

#convert_to_mv = 10e10
# First column: time (in seconds)
time = Test[:, 0]
# Fifth column: voltage response to the 100 pA current step
depo_pA = Test[:, 4]

depo_pA.shape

Great! So now we have our two relevant columns from our data. Let's plot it. 

In [None]:

# Answer 
# Plot the data
plt.plot(time, depo_pA, label='Depo_pA')

# Add labels and title
plt.xlabel('Time')
plt.ylabel('Voltage')
plt.title('Plot of Time vs Depo_pA')
#plt.xlim(.9,2.1)
# Add legend
plt.legend()


Heck YEAH! I did that experiment! 

<a id="199baedf-e58f-49ae-951f-8292fcb7b927"></a>
## Spike number
<a href="#Overview">Return to overview</a>

So let's write some lines to get at our pertinent data, beginning with the number of action potentials. Keep in mind that once a cell's membrane potential (voltage) goes above 0, we can call that an action potential. 

The goal is to identify the number of spikes during the depolarization. 

In [None]:
# Set the threshold
threshold = 0

# Call find_peaks
peaks, _ = find_peaks(depo_pA, height=threshold)

print(peaks)

In [None]:

#Answer

# Set the threshold
threshold = 0

# Define the time range (between 1 and 2 seconds)
start_time = 1
end_time = 2

# Find the indices within the specified time range
time_indices = np.where((time >= start_time) & (time <= end_time))[0]

In [None]:

#Answer 

# Initialize a counter for the occurrences
num_spike_events = 0

# Iterate over the indices and count occurrences
in_spike = False
for idx in time_indices:
    if depo_pA[idx] > threshold and not in_spike:
        # Entering a spike event
        in_spike = True
        num_spike_events += 1
    elif depo_pA[idx] <= threshold and in_spike:
        # Exiting a spike event
        in_spike = False

# Print the result
print(f"Number of spike events between {start_time} and {end_time} seconds: {num_spike_events}")
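The loop above counts every time the trace rises through the threshold. As a cross-check, the same idea can be sketched without a loop by comparing each sample with the one before it. The function name and the toy trace here are made up for illustration.

```python
import numpy as np

def count_upward_crossings(voltage, threshold=0.0):
    """Count how many times the trace rises from <= threshold to > threshold."""
    above = np.asarray(voltage) > threshold
    if above.size == 0:
        return 0
    # True where a sample is above threshold and the previous one was not
    rising = above[1:] & ~above[:-1]
    # Also count a trace that already starts above threshold, as the loop does
    return int(rising.sum() + above[0])

# Toy trace with two excursions above 0
toy_trace = np.array([-60, -55, 10, 20, -40, -60, 5, -50, -65], dtype=float)
print(count_upward_crossings(toy_trace))  # 2
```

On the real data you would pass in only the samples from the stimulation window, e.g. `depo_pA[time_indices]`, and the result should match `num_spike_events`.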

For good measure, let's make sure that it is actually picking up each AP. 

In [None]:

# Plot the data with peaks
plt.plot(time, depo_pA, label='100 pA')
plt.plot(time[peaks], depo_pA[peaks], 'r.', label='Peaks')
plt.xlabel('Time')
plt.ylabel('Voltage')
plt.legend()
plt.show()
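One thing to watch for when checking the plot: with only `height` set, `find_peaks` reports every local maximum above the threshold, so a noisy spike can be marked twice. A minimal sketch of how the `distance` argument handles that, using a made-up toy trace (the value `distance=3` only makes sense for this toy; the right value for real recordings is an assumption you would need to check against your own traces):

```python
import numpy as np
from scipy.signal import find_peaks

# Toy trace: two "spikes", the first with a notch that creates two local maxima
toy_voltage = np.array([-60, -20, 15, 12, 18, -30, -60, -60, -10, 25, -40, -60],
                       dtype=float)

# Height only: every local maximum above 0 is reported
peaks_height_only, _ = find_peaks(toy_voltage, height=0)

# Height plus a minimum spacing: the smaller of two close peaks is dropped
peaks_with_distance, _ = find_peaks(toy_voltage, height=0, distance=3)

print(peaks_height_only)    # [2 4 9]
print(peaks_with_distance)  # [4 9]
```

If the red dots on your plot land exactly once per action potential, the plain `height=threshold` call is doing its job and nothing needs to change.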


<a id="c6decc8e-a352-42fc-9939-a11e2e0838c5"></a>
## Exercise 2: Zoom in to stimulation time
<a href="#Overview">Return to overview</a>

Plot the data zoomed in only over the 1-sec stimulation period

In [None]:

# Answer

# Plot the data with peaks
plt.plot(time, depo_pA, label='100 pA')
plt.plot(time[peaks], depo_pA[peaks], 'r.', label='Peaks')
plt.xlabel('Time')
plt.ylabel('Voltage')
plt.legend()
plt.xlim(1,1.2)

##**DRUM SOLO**## FUCK YEAH! 

<a id="aed7aa2c-de3a-43c1-9540-aa6326c7d81b"></a>
## Calculating the latency
<a href="#Overview">Return to overview</a>


Great! Now let's look at how long it took for the first spike to occur in relation to the start of the depolarizing step. 

In [None]:
# Find the samples inside the stimulation window where depo_pA exceeds the threshold
in_window = (time >= start_time) & (time <= end_time)
above_threshold_indices = np.where((depo_pA > threshold) & in_window)[0]

if above_threshold_indices.size > 0:
    # Get the first index where depo_pA exceeds the threshold
    first_peak_index = above_threshold_indices[0]

    # Calculate the latency from step onset to the first threshold crossing
    latency_to_first_peak = time[first_peak_index] - start_time

    print(f'Latency to the first peak above threshold: {latency_to_first_peak} seconds')
else:
    print('No peak above threshold in the specified time range.')

Gnarly. 

<a id="7722053d-e985-4855-bc35-e76168e775f9"></a>
## Inter Spike interval (ISI)
<a href="#Overview">Return to overview</a>


Another relevant piece of info is the time between spikes. We can easily calculate this with the ``np.diff`` function.

<a id="11921587-2bde-4b37-89ae-4b25dbfcb40c"></a>
## Exercise 3: Calculate the ISI
<a href="#Overview">Return to overview</a>


In [None]:
np.diff?

In [None]:
inter_spike_intervals = 

In [None]:

#Answer 

# Calculate inter-spike intervals
inter_spike_intervals = np.diff(time[peaks])
inter_spike_intervals

Now we want to look at the frequency, that is, the number of action potentials per unit time within the firing range. It's worth noting that while we are depolarizing for 1 second, these cells stop firing before the current step is over (sometimes...). This is thought to be secondary to sodium channel inactivation. Voltage-gated sodium channels open (activate) when the membrane is depolarized and close on repolarization (deactivate), but they also close on continuing depolarization by a process termed inactivation, which leaves the channel refractory, i.e., unable to open again for a period of time.

So to calculate the frequency we want to look at the number of events / the time from initiation to Na channel inactivation. 

<a id="301e6d3a-35f7-47a9-9dbc-17886ef368c7"></a>
## Exercise 4: Calculate frequency
<a href="#Overview">Return to overview</a>


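There are two ways that come to mind to calculate the time to Na inactivation: (1) finding the time of the last peak or (2) taking the sum of all the ISIs. Note that (2) measures from the first spike rather than from the start of the step, so the two differ by the latency to the first spike. Is there a 3rd way y'all can think of? 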

In [None]:
# Answer 

# Need at least one ISI (two spikes), otherwise we would divide by zero
if len(inter_spike_intervals) > 0:
    spike_frequency = num_spike_events / np.sum(inter_spike_intervals)
else:
    spike_frequency = np.nan
    
spike_frequency

In [None]:
# Answer 

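# Keep only supra-threshold samples that fall inside the stimulation window
idx_times = time[above_threshold_indices]
in_step = above_threshold_indices[(idx_times >= start_time) & (idx_times <= end_time)]

if in_step.size > 0:
    # Get the last index where depo_pA exceeds the threshold
    last_peak_index = in_step[-1]

    # Calculate the latency from step onset to the last supra-threshold sample
    latency_to_last_peak = time[last_peak_index] - start_time

    print(f'Latency to the last peak above threshold: {latency_to_last_peak} seconds')
else:
    print('No peak above threshold in the specified time range.')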

In [None]:
Frequency = num_spike_events/latency_to_last_peak 
Frequency
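The two answers above will not give exactly the same number, because they divide by different durations. Here is a small worked sketch with made-up spike times (all values and names are hypothetical) showing where the difference comes from:

```python
import numpy as np

# Hypothetical spike times in seconds; the step starts at 1 s
toy_spike_times = np.array([1.02, 1.07, 1.12, 1.22])
stim_onset = 1.0

isis = np.diff(toy_spike_times)      # intervals between consecutive spikes
span_first_to_last = isis.sum()      # same as last spike time minus first spike time
time_to_last = toy_spike_times[-1] - stim_onset  # measured from step onset

freq_from_isis = len(toy_spike_times) / span_first_to_last
freq_from_onset = len(toy_spike_times) / time_to_last

# The summed ISIs equal the first-to-last span (up to floating point error)
print(np.isclose(span_first_to_last, toy_spike_times[-1] - toy_spike_times[0]))
print(freq_from_isis, freq_from_onset)
```

The ISI-based estimate comes out higher because it leaves out the latency to the first spike. Whichever definition you pick, use the same one for every cell so the groups stay comparable.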

<a id="c3111e33-7cfa-4eea-b932-157d44f41158"></a>
# Iterate through file
<a href="#Overview">Return to overview</a>


First some info on how the data is organized.

Data exists as text files (.axgt) in the folder [Data]. This folder has two more folders representing two treatment groups [Naive, MTA]. Each of those has TWO MORE subfolders [Morphine, NLX]. So we want to record all the data and label each file with the folder/subfolder it came from.  

Let's take the example we've been looking at today and get its contextual information. 

In [None]:
example_file='Data/Naive/Morphine/01232023_7.axgt'

In [None]:
path_parts = example_file.split('/')
f = path_parts[3]
parent = path_parts[2]
grandparent = path_parts[1]
print(f"file: {f}")
print(f"condition (parent folder): {parent}")
print(f"animal group (grandparent folder): {grandparent}")
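Splitting on `'/'` works for this path, but it depends on knowing exactly how many folders deep the file is. As an alternative sketch, `pathlib` can pull out the same pieces by walking up from the file. `PurePosixPath` is used here only so the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath

example = PurePosixPath('Data/Naive/Morphine/01232023_7.axgt')

print(example.name)                # 01232023_7.axgt
print(example.parent.name)         # Morphine
print(example.parent.parent.name)  # Naive
print(example.suffix)              # .axgt
```

Because this counts folders from the file upward, it keeps working even if the `Data` folder is moved somewhere deeper on disk.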

Great! We've gotten all the components of what we want to do to our data. Now let's make it do that for all of our files. 

In [None]:
from pathlib import Path

df_files = pd.DataFrame()

startpath='Data'
counter = 0
for root, dirs, files in os.walk(startpath):
    for f in files:
            if f.endswith('.axgt'):
            full_file = Path(root, f)
            # print(full_file.parents[2])
            # path_parts = root.split('/')
            parent = full_file.parents[0]
            grandparent = full_file.parents[1]
            df_files.loc[counter,'filepath']=full_file
            df_files.loc[counter,'group']=grandparent.name
            df_files.loc[counter,'condition']=parent.name
            df_files.loc[counter,'file']=f
            counter += 1


df_files
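If you want to see what `os.walk` is doing without touching the real data, here is a self-contained sketch that builds a throwaway folder tree and walks it. The function name `list_recordings` and the file names are made up for illustration, and it collects plain dictionaries instead of filling a DataFrame row by row.

```python
import os
import tempfile
from pathlib import Path

def list_recordings(startpath, suffix='.axgt'):
    """Return one dict per matching file, labeled by its two enclosing folders."""
    rows = []
    for root, dirs, files in os.walk(startpath):
        for name in sorted(files):
            if name.endswith(suffix):
                full_file = Path(root, name)
                rows.append({
                    'filepath': full_file,
                    'group': full_file.parents[1].name,      # grandparent folder
                    'condition': full_file.parents[0].name,  # parent folder
                    'file': name,
                })
    return rows

# Build a throwaway folder tree with made-up file names to try it out
with tempfile.TemporaryDirectory() as tmp:
    for group, condition, name in [('Naive', 'Morphine', 'a.axgt'),
                                   ('MTA', 'NLX', 'b.axgt'),
                                   ('MTA', 'NLX', 'notes.txt')]:
        folder = Path(tmp, 'Data', group, condition)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).write_text('')
    found = list_recordings(Path(tmp, 'Data'))

labels = sorted((row['group'], row['condition'], row['file']) for row in found)
print(labels)  # [('MTA', 'NLX', 'b.axgt'), ('Naive', 'Morphine', 'a.axgt')]
```

Passing the list of dictionaries to `pd.DataFrame(rows)` would give a table with the same columns as `df_files`.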

<a id="9826140c-9759-448b-abaa-a2b202776640"></a>
## Exercise 5: Compute statistics for all files
<a href="#Overview">Return to overview</a>

Write code that iterates through each row of `df_files`, measures spike times and computes the spike statistics. Save the spike count, spike frequency and time to last spike in columns `spike_count`, `spike_frequency` and `time_to_last_spike`, respectively. 

In [None]:

# Answer

for i, r in df_files.iterrows():
    print("Processing:", i, r['filepath'])

    data = np.loadtxt(r['filepath'], skiprows=1, delimiter='\t')
    
    # Extract time and data for analysis
    time = data[:, 0]
    experiment_data = data[:, 4] 
    
    # Analysis results for: number of spikes/frequency/time to last spike  
    threshold = 0
    peaks, _ = find_peaks(experiment_data, height=threshold)

    # Convert peak indices to times so the intervals are in seconds, not samples
    inter_spike_intervals = np.diff(time[peaks])
    num_spike_events = len(peaks)
    
    # Need at least two spikes (one ISI) to avoid dividing by zero
    if num_spike_events > 1:
        spike_frequency = num_spike_events / np.sum(inter_spike_intervals)
    else:
        spike_frequency = np.nan
    
    # Time of the last spike relative to the start of the current step
    onset_time = 1
    time_to_last_spike = time[peaks[-1]] - onset_time if num_spike_events > 0 else np.nan

    df_files.loc[i,'spike_count']=num_spike_events
    df_files.loc[i,'spike_frequency']=spike_frequency
    df_files.loc[i,'time_to_last_spike']=time_to_last_spike


In [None]:
df_files.shape

Lastly, we can move our results to a .csv for export and use the fancy statistics software we already pay for to analyze the data. 

In [None]:
df_files.to_csv('results.csv', index=False)

ALTERNATIVELY: we need to tell it where we are getting our data from. 

In [None]:
# Specify the top-level folder path ('Data' folder)
top_folder_path = 'Data' 

So to start, let's define the columns of data we want to fill. 

In [None]:
def analyze_data(top_folder_path):
    # Initialize lists to store results
    file_names = []
    parent_folders = []  
    subfolders = []  
    num_spike_events_list = []
    inter_spike_intervals_list = []
    spike_frequency_list = []
    time_to_last_spike_list = []

Once we set the path for the code to follow, we just have to put the code we made above into a for loop and allow it to iterate through all our data. Additionally, we will want to make it spit all that data into a spreadsheet so I... I mean we can stop coding and move the data to another software we are already paying for, like Prism. 

In [None]:

# Answer 

import os
import numpy as np
from scipy.signal import find_peaks
import pandas as pd

def analyze_data(top_folder_path):
    # Initialize lists to store results
    file_names = []
    parent_folders = []  
    subfolders = []  
    num_spike_events_list = []
    inter_spike_intervals_list = []
    spike_frequency_list = []
    time_to_last_spike_list = []

    count = 0

    # Iterate through all subfolders within the top folder
    for root, dirs, _ in os.walk(top_folder_path):
        # Check if the current subfolder is a 'morphine' or 'NLX' folder
        if os.path.basename(root) in ['Morphine', 'NLX']:
            # Get the parent folder (Naive or MTA)
            parent_folder = os.path.basename(os.path.dirname(root))
            # Get the subfolder (morphine or NLX)
            subfolder = os.path.basename(root)

            # Iterate through all files in the current folder
            for file_name in os.listdir(root):
                if file_name.endswith('.axgt'):
                    # Load data from the file
                    file_path = os.path.join(root, file_name)
                    data = np.loadtxt(file_path, skiprows=1, delimiter='\t')

                    # Extract time and data for analysis
                    time = data[:, 0]
                    experiment_data = data[:, 4]  # Change the column index as needed

                    # Analysis results for: number of spikes/frequency/time to last spike  
                    threshold = 0
                    peaks, _ = find_peaks(experiment_data, height=threshold)

                    num_spike_events = len(peaks)
                    inter_spike_intervals = np.diff(time[peaks])

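                    # Need at least two spikes (one ISI) to avoid dividing by zero
                    if num_spike_events > 1:
                        spike_frequency = num_spike_events / np.sum(inter_spike_intervals)
                    else:
                        spike_frequency = np.nan

                    # Time of the last spike relative to the start of the current step
                    onset_time = 1
                    time_to_last_spike = time[peaks[-1]] - onset_time if num_spike_events > 0 else np.nan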

                    # Add results to lists
                    file_names.append(file_name)
                    parent_folders.append(parent_folder)  
                    subfolders.append(subfolder)  
                    num_spike_events_list.append(num_spike_events)
                    inter_spike_intervals_list.append(inter_spike_intervals)
                    spike_frequency_list.append(spike_frequency)
                    time_to_last_spike_list.append(time_to_last_spike)

                    count += 1

    # Create a spreadsheet (CSV file) with the results
    output_directory = "output"
    os.makedirs(output_directory, exist_ok=True)

    output_file_path = os.path.join(output_directory, 'analysis_results.csv')

    df = pd.DataFrame({
        'File Name': file_names,
        'Parent Folder': parent_folders,  
        'Subfolder': subfolders,  
        'Number of Spike Events': num_spike_events_list,
        'Inter-Spike Intervals': inter_spike_intervals_list,
        'Spike Frequency (Hz)': spike_frequency_list,
        'Time to Last Spike': time_to_last_spike_list
    })

    df.to_csv(output_file_path, index=False)

    print(f"Results saved to {output_file_path}")
    print(f"Number of times the loop ran: {count}")

# The top-level folder path ('Data' folder)
top_folder_path = 'Data'  
analyze_data(top_folder_path)



If we end up doing the rest of the analysis in Python, the next step will be going into the newly created data files and pulling out the relevant information.
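As a rough idea of what that next step could look like, here is a minimal sketch that summarizes spike counts per treatment group and condition. The numbers are made up purely for illustration. With the real results you would start from something like `pd.read_csv('results.csv')` instead of the toy table, and the column names assume the `df_files` layout from Exercise 5.

```python
import pandas as pd

# Made-up numbers purely for illustration; real values come from results.csv
toy_results = pd.DataFrame({
    'group': ['Naive', 'Naive', 'MTA', 'MTA'],
    'condition': ['Morphine', 'Morphine', 'NLX', 'NLX'],
    'spike_count': [4, 6, 10, 12],
})

# Mean spike count and number of cells for each group/condition pair
summary = (toy_results
           .groupby(['group', 'condition'])['spike_count']
           .agg(['mean', 'count'])
           .reset_index())
print(summary)
```

From a table like this it is a short step to plotting or to exporting the summary for statistics elsewhere.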