# Figure S24: notebook to generate the figure

Figure S24. CCA analysis to quantify the correlation between trials. Individual points represent the average value for each animal. Controls were obtained by taking random samples of neural activity from each animal. \
a) Average correlation along the first 5 components in CCA space for each pair of same-type trials (from the same start box to the same reward well). \
b-c) Decoder performance in CCA space and neural space when cross-validated on same-type trials. Chance levels are given by the classes of the decoder. \
d) Same analysis as in (a) but on symmetrical trials (trials that overlap when rotated by 180°). e-f) Same analysis as in (b) and (c) but on symmetrical trials. \
g) Distance between the positions of the animal for each pair of trajectories, normalized by their length. h) Correlation between the positions of the animal for each pair of trajectories. \
i-l) Percentage of common neurons that are active between each pair of trials.
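
As a rough illustration of what "correlation along the first 5 components in CCA space" means, the sketch below computes canonical correlations between two trials with plain NumPy (QR decomposition followed by an SVD). It is not the routine used for the figure, which lives in `utils` and is not shown here; the function name `canonical_correlations` and the toy arrays are assumptions made for the example.

```python
import numpy as np


def canonical_correlations(X, Y, n_components=5):
    """Canonical correlations between two (time x features) arrays.

    After centring, the singular values of Qx.T @ Qy (Qx, Qy from the
    QR decompositions of X and Y) are the canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip tiny numerical overshoots outside [0, 1]
    return np.clip(s[:n_components], 0.0, 1.0)


rng = np.random.default_rng(0)
trial_a = rng.normal(size=(200, 8))          # hypothetical trial: 200 time bins x 8 dimensions
trial_b = trial_a @ rng.normal(size=(8, 8))  # linear remix of the same activity
unrelated = rng.normal(size=(200, 8))        # independent activity, playing the role of a control

same = canonical_correlations(trial_a, trial_b).mean()
control = canonical_correlations(trial_a, unrelated).mean()
print("same-type:", same, "control:", control)
```

In this toy case the remixed trial spans the same subspace as the original, so its average canonical correlation is close to 1, while the independent sample scores lower.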

In [1]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import utils

# Preprocess the recorded data

## Extract trials for each experiment

A trial is the part of the session where the animal goes from the start box to the reward well. If this happens without detours or long pauses, it is classified as a correct trial. \
Each animal has a dataframe that contains all the recordings for that animal. The columns of these dataframes are:
- Session: (e.g. "A28") recording day.
- Stage: ("PRE", "SAM", "CHO") stage of the recording session. "PRE" is exploration at the beginning of the day, "SAM" is the training to the correct reward of the day, and "CHO" is the consolidation phase.
- Combo: (e.g. "WN") the starting and ending box of the day (West to North).
- Reward_well: number of the reward well of the day. 
- Movement_status: ("stationary" or "moving") moving is assigned if the speed is more than 3cm/s (?)
- Cap_x, Cap_y, Leftear_x, Leftear_y, Rightear_x, Rightear_y: animal position registration.
- C000, C001, ...: Cells ID registered across all sessions of the animal. 

In detail, each trial is found by:
1. Add the 'Location' column to the dataframe. It can be: 'arena', 'outside', 'W', 'E', 'N', 'S', '1', '2', '3', '4', '5', '6'. These are the areas in the arena, the start boxes and the reward wells. 'outside' is used when the animal is trying to climb the wall.
2. Add the 'Trial' column, which marks whether the animal is in a trial ('1') or not ('0'). A trial runs from the start box to the reward well.
3. Add the 'Correct_trial' column with either 'True' or 'False'. A correct trial is not longer than 5s, the animal never tries to climb the wall, and it doesn't stop for more than 1s in any other reward well.
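
The three criteria for a correct trial can be sketched as a small function. This is only one reading of the rules above, not the code in `utils.add_locations_and_trials_to_df`: the helper names, the sampling interval `dt`, and the choice to treat a "stop" as consecutive 'stationary' samples inside a non-rewarded well are all assumptions.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def is_correct_trial(locations, movement, rewarded_well, dt,
                     max_duration=5.0, max_stop=1.0):
    """Apply the three rules to one trial, sampled every `dt` seconds."""
    other_wells = {"1", "2", "3", "4", "5", "6"} - {str(rewarded_well)}
    too_long = len(locations) * dt > max_duration
    climbed = "outside" in locations
    stopped_in_other_well = [
        loc in other_wells and mov == "stationary"
        for loc, mov in zip(locations, movement)
    ]
    lingered = longest_run(stopped_in_other_well) * dt > max_stop
    return not (too_long or climbed or lingered)


dt = 0.1  # hypothetical sampling interval in seconds

# Direct run from the West box to well 6
direct = ["W"] * 5 + ["arena"] * 20 + ["6"] * 3
print(is_correct_trial(direct, ["moving"] * len(direct), 6, dt))

# Pause of 1.5 s at well 3 before reaching well 6
detour = ["arena"] * 5 + ["3"] * 15 + ["6"]
detour_mov = ["moving"] * 5 + ["stationary"] * 15 + ["moving"]
print(is_correct_trial(detour, detour_mov, 6, dt))
```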

The dataframes with the added columns ('Location', 'Trial', 'Correct_trial') are saved for every animal and phase in a file ending with '_events_with_trials.csv'.

In [None]:
for rat in utils.rats:
    for phase in utils.strategies[rat]:
        print("...starting to find trials in rat: ", rat, "phase: ", phase)
        # Output file for this rat and phase
        trials_path = (
            utils.root_dir
            + "data/"
            + rat
            + "_phase"
            + str(phase)
            + "_events_with_trials.csv"
        )
        if os.path.exists(trials_path):
            print("Trials already found in rat: " + rat + ": " + trials_path)
            continue
        # Load the dataframe with all the experiments combined
        df = utils.load_df_all_raw(rat, phase)
        # Add locations and trials to the df
        single_dfs = utils.add_locations_and_trials_to_df(df)
        # Stack all dfs and save them
        new_df = pd.concat(single_dfs)
        new_df.to_csv(trials_path, index=False)
        print("\tTrials found and saved in rat: " + rat + ": " + trials_path)
        # Stop after the first newly processed phase
        break
    # Only the first rat is processed
    break

...starting to find trials in rat:  H2226 phase:  1
Finished experiment  ('A28', 'CHO')
Finished experiment  ('A28', 'PRE')
Finished experiment  ('A28', 'SAM')
Finished experiment  ('A29', 'CHO')
Finished experiment  ('A29', 'PRE')
Finished experiment  ('A29', 'SAM')
Finished experiment  ('A30', 'CHO')
Finished experiment  ('A30', 'PRE')
Finished experiment  ('A30', 'SAM')
Finished experiment  ('A31', 'CHO')
Finished experiment  ('A31', 'PRE')
Finished experiment  ('A31', 'SAM')
Finished experiment  ('A32', 'CHO')
Finished experiment  ('A32', 'POS')
Finished experiment  ('A32', 'PRE')
Finished experiment  ('A32', 'SAM')
Finished experiment  ('A33', 'CHO')
Finished experiment  ('A33', 'PRE')
Finished experiment  ('A33', 'SAM')
Finished experiment  ('A34', 'CHO')
Finished experiment  ('A34', 'CTC')
Finished experiment  ('A34', 'POS')
Finished experiment  ('A34', 'PRE')
Finished experiment  ('A34', 'SAM')
	Trials found and saved in rat: H2226: /Users/elenafaillace/Library/CloudStorage/One

## Transform the events in firing rates
Convolve the events with a Gaussian kernel to obtain the probability distribution of the neuron firing. The standard deviation is set to 0.2s.
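
A minimal sketch of this smoothing step, assuming a binary event train sampled at a fixed rate. The function name `gaussian_smooth`, the sampling rate `fs` and the kernel truncation at 4 standard deviations are assumptions for illustration; the notebook itself delegates the conversion to `utils.save_df_all_data_firing_rates`.

```python
import numpy as np


def gaussian_smooth(events, sigma_s=0.2, fs=20.0):
    """Convolve an event train with a unit-area Gaussian kernel.

    sigma_s is the standard deviation in seconds, fs the sampling rate in Hz.
    """
    sigma = sigma_s * fs                    # standard deviation in samples
    half = int(np.ceil(4 * sigma))          # truncate the kernel at 4 sigma
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()                  # unit area keeps the event count
    return np.convolve(events, kernel, mode="same")


events = np.zeros(200)
events[[50, 120, 124]] = 1.0                # three hypothetical events
rate = gaussian_smooth(events)
print(rate.sum())                           # total event count is preserved away from the edges
```

Because the kernel is normalized, each event contributes a total weight of one to the smoothed trace, so nearby events add up to a higher local value.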


In [None]:
for rat in utils.rats:
    for phase in utils.strategies[rat]:
        print(
            "...starting to convert events to firing rates in rat: ",
            rat,
            "phase: ",
            phase,
        )
        utils.save_df_all_data_firing_rates(rat, phase)
        print(
            "\tFiring rates saved in rat: "
            + rat
            + ": "
            + utils.root_dir
            + "data/"
            + rat
            + "_phase"
            + str(phase)
            + "_data_with_firing_rates.csv"
        )
        break
    break

...starting to convert events to firing rates in rat:  H2226 phase:  1
Starting saving firing rates for rat H2226...
	Firing rates saved in rat: H2226: /Users/elenafaillace/Library/CloudStorage/OneDrive-ImperialCollegeLondon/arena2.0/paper_code_review/data/H2226_phase1_data_with_firing_rates.csv


## Save info summary with all the correct trials for each experiment
Create a dataframe where each row represents a trial, and reports: the start and end box + the rewarded well + the trial number + if the trial was correct or not.

The final dataframe is then a summary of all the trials, each flagged as correct or not, with the columns:
'Rat', 'Strategy', 'Session', 'Stage', 'Combo', 'Rewarded_well', 'Trial', 'Correct_trial'. \
Combo is the start and end box (e.g. NW).
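
The same one-row-per-trial summary can also be sketched with a pandas `groupby`, which avoids filtering the dataframe repeatedly inside nested loops. This is an alternative illustration on a toy table, not the cell used in this notebook; it assumes that 'Trial' holds the trial number and that 0 means "not in a trial".

```python
import pandas as pd

# Toy frame-by-frame table; Trial == 0 is assumed to mean "not in a trial"
df = pd.DataFrame({
    "Session": ["A28"] * 6,
    "Stage": ["CHO"] * 6,
    "Combo": ["WN", "WN", "WN", "SN", "SN", "SN"],
    "Rewarded_well": [6] * 6,
    "Trial": [0, 1, 1, 0, 2, 2],
    "Correct_trial": [False, True, True, False, False, False],
})

in_trial = df[df["Trial"] != 0]
summary = (
    in_trial.groupby(["Session", "Stage", "Trial"], sort=False)[
        ["Combo", "Rewarded_well", "Correct_trial"]
    ]
    .first()          # first value of each column within a trial
    .reset_index()
)
summary["Correct_trial"] = summary["Correct_trial"].astype(int)
print(summary)
```

The 'Rat' and 'Strategy' columns would then be added as constants for each animal before stacking the per-animal summaries.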

In [4]:
for phase in [1, 2]:
    # If the file is already saved, skip
    if os.path.exists(
        utils.root_dir + "data/phase" + str(phase) + "_all_trials_info.csv"
    ):
        print(
            "Trials already found in phase: "
            + str(phase)
            + ": "
            + utils.root_dir
            + "data/phase"
            + str(phase)
            + "_all_trials_info.csv"
        )
        continue
    columns = [
        "Rat",
        "Strategy",
        "Session",
        "Stage",
        "Combo",
        "Rewarded_well",
        "Trial",
        "Correct_trial",
    ]
    new_df = pd.DataFrame(columns=columns)

    for rat in utils.rats:
        print("...starting to find trials in rat: ", rat)
        strategy = utils.strategies[rat][phase]
        try:
            df = utils.load_df_with_trials(rat, phase)
            # Get all the sessions
            sessions = df["Session"].unique()
            for session in sessions:
                # Get all the stages
                stages = df[df["Session"] == session]["Stage"].unique()
                for stage in stages:
                    # Get all the trials
                    trials = df[(df["Session"] == session) & (df["Stage"] == stage)][
                        "Trial"
                    ].unique()[1:]
                    for trial in trials:
                        # Get the rewarded well
                        rewarded_well = df[
                            (df["Session"] == session)
                            & (df["Stage"] == stage)
                            & (df["Trial"] == trial)
                        ]["Rewarded_well"].unique()[0]
                        # Get the combo (start box + end box, e.g. "WN")
                        combo = df[
                            (df["Session"] == session)
                            & (df["Stage"] == stage)
                            & (df["Trial"] == trial)
                        ]["Combo"].unique()[0]
                        # Get the correctness of the trial
                        correct_trial = df[
                            (df["Session"] == session)
                            & (df["Stage"] == stage)
                            & (df["Trial"] == trial)
                        ]["Correct_trial"].unique()[0]

                        ### APPEND ROW TO THE DF ###
                        new_row = {
                            "Rat": rat,
                            "Strategy": strategy,
                            "Session": session,
                            "Stage": stage,
                            "Combo": combo,
                            "Rewarded_well": rewarded_well,
                            "Trial": int(trial),
                            "Correct_trial": int(correct_trial),
                        }
                        new_df = pd.concat(
                            [new_df, pd.DataFrame([new_row])], ignore_index=True
                        )
        except Exception:
            print("Could not find trials in rat: ", rat)

    path_to_save = (
        utils.root_dir + "data/phase" + str(phase) + "_all_trials_info.csv"
    )
    new_df.to_csv(path_to_save, index=False)
    display(new_df.head())
    print(
        "\tTrials found and saved in phase: "
        + str(phase)
        + ": "
        + utils.root_dir
        + "data/phase"
        + str(phase)
        + "_all_trials_info.csv"
    )
    break

...starting to find trials in rat:  H2226
...starting to find trials in rat:  H2225
...starting to find trials in rat:  H2230
...starting to find trials in rat:  H2234
...starting to find trials in rat:  H2241
...starting to find trials in rat:  H2222
...starting to find trials in rat:  H2224
...starting to find trials in rat:  H2231
...starting to find trials in rat:  H2235


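| | Rat | Strategy | Session | Stage | Combo | Rewarded_well | Trial | Correct_trial |
|---|---|---|---|---|---|---|---|---|
| 0 | H2226 | ALLO | A28 | CHO | WN | 6 | 1 | 1 |
| 1 | H2226 | ALLO | A28 | CHO | SN | 6 | 2 | 0 |
| 2 | H2226 | ALLO | A28 | CHO | WN | 6 | 3 | 0 |
| 3 | H2226 | ALLO | A28 | CHO | EN | 6 | 4 | 0 |
| 4 | H2226 | ALLO | A28 | PRE | WW | 6 | 1 | 0 |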


	Trials found and saved in phase: 1: /Users/elenafaillace/Library/CloudStorage/OneDrive-ImperialCollegeLondon/arena2.0/paper_code_review/data/phase1_all_trials_info.csv


## Extract symmetrical trials

For each trial's combo I only want to consider the path from the start box to the rewarded well, regardless of the end box. Therefore, I redefine 'combo' as the combination of start box + rewarded well only (e.g. N5 means starting from the North box and reaching rewarded well 5).

For each trial combo we define a symmetrical combo and a control combo. The symmetrical combo is mirrored along both the x and y axes (equivalent to a 180° rotation), so it requires the animal to perform the same movements (the two are identical from an egocentric perspective). The control combo is mirrored along one axis only, so the two trajectories differ from an egocentric perspective.
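
The symmetrical partner of a combo can be sketched as a simple lookup. The mapping below is inferred from the pairs printed by this notebook (W1 and E6, E1 and W6, W5 and E2, E5 and W2), which suggest that the start box is swapped with the opposite one and well k is paired with well 7 - k. The North/South swap and the pairing of wells 3 and 4 are assumptions that follow the same pattern, and `symmetrical_combo` is a hypothetical name, not a function from `utils`. The control combo is not sketched because its mapping is not spelled out here.

```python
# Start boxes swapped by a 180° rotation (N/S swap is an assumption)
OPPOSITE_BOX = {"W": "E", "E": "W", "N": "S", "S": "N"}


def symmetrical_combo(combo):
    """Return the combo obtained by rotating the arena by 180°.

    combo is start box + rewarded well, e.g. "W1".
    """
    box, well = combo[0], int(combo[1:])
    return OPPOSITE_BOX[box] + str(7 - well)


for combo in ["W1", "E1", "W5", "E5"]:
    print(combo, "->", symmetrical_combo(combo))
```

Applying the function twice returns the original combo, as expected for a 180° rotation.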

For each phase we save a file that ends with '_symmetrical_trials_data.pkl' that contains a dictionary with the structure: \
rat -> (combo, symm_combo) -> (trial combo, session, stage, trial number) -> (x, y, trial info row). Where 'x' and 'y' are the xy-coordinates along the trial.

An example is: \
H2226 -> (W1, E6) -> ('W1', 'A29', 'SAM', 2) -> (x, y, all_data)

This is used to easily access the correct trials only, together with their symmetrical and control counterparts.
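
To make the nesting concrete, the sketch below builds a toy dictionary with the same three levels and walks through it. The coordinates and the contents of the trial info entry are made up, and the pickle round trip only stands in for loading the real '_symmetrical_trials_data.pkl' file.

```python
import pickle

import numpy as np

# Toy stand-in with the same structure:
# rat -> (combo, symm_combo) -> (trial combo, session, stage, trial number) -> (x, y, info)
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 0.5, 5)
data = {
    "H2226": {
        ("W1", "E6"): {
            ("W1", "A29", "SAM", 2): (x, y, {"Correct_trial": 1}),
            ("E6", "A30", "CHO", 1): (1.0 - x, 0.5 - y, {"Correct_trial": 1}),
        }
    }
}

# Round trip through pickle, as would happen when saving and loading the file
restored = pickle.loads(pickle.dumps(data))

# Group the trials of each pair by the combo they belong to
by_combo = {}
for rat, pairs in restored.items():
    for (combo, symm_combo), trials in pairs.items():
        for (trial_combo, session, stage, trial_number), (tx, ty, info) in trials.items():
            by_combo.setdefault(trial_combo, []).append(
                (session, stage, trial_number, len(tx))
            )

print(by_combo)
```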

In [6]:
for phase in [1,2]:
    print('\nPhase: ', phase)
    utils.save_symmetrical_trials_data(phase=phase)


Phase:  1
Starting rat H2226...
...symmetrical pair of trials found: W1 and E6
...symmetrical pair of trials found: E1 and W6
...symmetrical pair of trials found: W5 and E2
Starting rat H2225...
...symmetrical pair of trials found: W5 and E2
...symmetrical pair of trials found: E5 and W2
Starting rat H2230...
...symmetrical pair of trials found: W1 and E6
...symmetrical pair of trials found: E1 and W6
Starting rat H2234...
...symmetrical pair of trials found: W1 and E6
Starting rat H2241...
...symmetrical pair of trials found: W1 and E6
...symmetrical pair of trials found: E1 and W6
...symmetrical pair of trials found: W5 and E2
Starting rat H2222...
Starting rat H2224...
...symmetrical pair of trials found: W1 and E6
Starting rat H2231...
...symmetrical pair of trials found: E1 and W6
Starting rat H2235...

Phase:  2
File already exists: /Users/elenafaillace/Library/CloudStorage/OneDrive-ImperialCollegeLondon/arena2.0/paper_code_review/data/phase2_symmetrical_trials_data.pkl
