# Unsupervised Alignment of Mouse Neural Representations

## Overview
- This notebook explains how to run Unsupervised Alignment of Neural Representations.
- All executable scripts are stored in `scripts`.

---

## Group Alignment
- Perform an unsupervised alignment of neural representations between two pseudo-mice, each of which is treated as a single mouse formed by aggregating data from multiple individual mice.
- Neural Representation: 
  - Neural activity in response to stimuli, represented as a vector with one dimension per neuron. As a feature of the representation, we use the distance matrix between stimulus representations (Representational Dissimilarity Matrix; RDM).
- Unsupervised Alignment: 
  - Identify correspondences between neural representations based solely on two RDMs. Gromov-Wasserstein Optimal Transport is used in the alignment algorithm.
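
To make the idea concrete, here is a minimal NumPy sketch of entropic Gromov-Wasserstein alignment between two RDMs. It assumes uniform weights over stimuli and a squared loss. The function name `entropic_gw`, the value of `eps`, and the iteration counts are illustrative choices and not the repository's implementation, which may rely on a dedicated optimal transport library.

```python
import numpy as np


def entropic_gw(C1, C2, eps=0.05, outer_iters=50, sinkhorn_iters=200):
    """Approximate an entropic GW transport plan between two distance matrices."""
    n, m = C1.shape[0], C2.shape[0]
    p = np.full(n, 1.0 / n)  # uniform mass on the stimuli of the first RDM
    q = np.full(m, 1.0 / m)  # uniform mass on the stimuli of the second RDM
    T = np.outer(p, q)       # start from the independent coupling
    # Part of the squared-loss cost that does not depend on T.
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    for _ in range(outer_iters):
        # Linearized GW cost for the current plan.
        cost = const - 2.0 * C1 @ T @ C2.T
        # Gibbs kernel; subtracting the minimum only rescales the kernel.
        K = np.exp(-(cost - cost.min()) / eps)
        u = np.ones(n)
        for _ in range(sinkhorn_iters):
            v = q / (K.T @ u)
            u = p / (K @ v)
        T = u[:, None] * K * v[None, :]
    return T


# Toy data: the second RDM is a shuffled copy of the first.
rng = np.random.default_rng(0)
points = rng.random((6, 2))
rdm_a = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
perm = rng.permutation(6)
rdm_b = rdm_a[np.ix_(perm, perm)]
plan = entropic_gw(rdm_a, rdm_b)
print(plan.shape, round(float(plan.sum()), 6))
```

Each row of `plan` describes how the mass of one stimulus in the first RDM is distributed over the stimuli of the second. GW optimization is non-convex, so a single run like this can end in a local optimum.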

### 1. Alignment ([`scripts/ta_paper_group_alignment.py`](../scripts/ta_paper_group_alignment.py))

#### **Overview**
This script performs Gromov-Wasserstein (GW) optimal transport between pseudo-mice with different numbers of individuals.  
This script calculates alignment metrics between neural representations across various brain areas of mice.

#### **Key Features**
- Performs unsupervised alignment between neural representations using Entropic Gromov-Wasserstein Optimal Transport.
- Supports 3 stimuli (natural_movie_one, natural_movie_three, natural_scenes).
- Processes data through a pipeline: 
  - Unit Normalization of Spike Counts → MeanTrials → MakeRDM(metric="cosine")
- Configurable via command-line arguments and setting files
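
The pipeline above can be sketched in NumPy as follows. This is an illustration under stated assumptions: the input is an array of shape `(n_trials, n_stimuli, n_units)`, and "Unit Normalization" is interpreted here as standardizing each unit (neuron) across all trials and stimuli. The function name `spike_counts_to_rdm` is hypothetical, and the actual preprocessing classes may behave differently.

```python
import numpy as np


def spike_counts_to_rdm(spike_counts):
    """Turn spike counts of shape (n_trials, n_stimuli, n_units) into a cosine RDM."""
    n_units = spike_counts.shape[-1]
    flat = spike_counts.reshape(-1, n_units).astype(float)

    # Step 1 (assumed meaning of unit normalization): z-score each unit.
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std[std == 0] = 1.0  # leave silent or constant units unscaled
    normalized = (spike_counts - mean) / std

    # Step 2: average over trials, giving one vector per stimulus.
    mean_response = normalized.mean(axis=0)

    # Step 3: cosine distance between every pair of stimulus vectors.
    norms = np.linalg.norm(mean_response, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit_vectors = mean_response / norms
    rdm = 1.0 - unit_vectors @ unit_vectors.T
    np.fill_diagonal(rdm, 0.0)
    return rdm


rng = np.random.default_rng(0)
counts = rng.poisson(lam=3.0, size=(10, 5, 20))  # synthetic spike counts
rdm = spike_counts_to_rdm(counts)
print(rdm.shape)
```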


#### **Command-Line Interface**
```bash
python ta_paper_group_alignment.py <setting_file.csv> [options]
```
Arguments  
- `setting_file`: Path to a CSV file containing experiment settings
- `--target_dir`: Directory to save results
- `--whole_data_dir`: Directory containing source data 
- `--session_split_dir`: Directory for session split files 
- `--config_dir`: Directory for config files 
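
For readers who want to mirror this interface in their own tooling, the arguments listed above could be declared with `argparse` roughly as follows. The argument names come from the list above. The `None` defaults and the example values are assumptions, since the script's real defaults are not documented here.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented arguments (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Group alignment (illustrative sketch)")
    parser.add_argument("setting_file", help="Path to a CSV file containing experiment settings")
    parser.add_argument("--target_dir", default=None, help="Directory to save results")
    parser.add_argument("--whole_data_dir", default=None, help="Directory containing source data")
    parser.add_argument("--session_split_dir", default=None, help="Directory for session split files")
    parser.add_argument("--config_dir", default=None, help="Directory for config files")
    return parser


# Hypothetical invocation; the file and directory names are placeholders.
args = build_parser().parse_args(["my_settings.csv", "--target_dir", "out"])
print(args.setting_file, args.target_dir)
```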


#### **File Contents**

setting_file(CSV)
- Refer to the example CSV file [`setting_files/dummy_group_alignment_setting.csv`](../setting_files/dummy_group_alignment_setting.csv)
- The setting file (CSV) should contain the following columns:
  - `exp_name`: Prefix for the GW alignment study (should include stimulus name, number of individuals, etc.)
  - `stimulus_name`: Stimulus name (natural_movie_one, natural_movie_three, natural_scenes)
  - `storage_name`: Name of the RDB to save results (e.g., 'takeda_abe_paper')
  - `session_split_name`: File name describing session split information for creating pseudo-mice. A file with the same name must be included in `session_split_dir`.
  - `config_name`: Pipeline setting file name for converting spike_counts to RDM. A file with the same name must be included in `config_dir`.
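
A setting file can be checked before launching a long run. The sketch below reads a CSV with the standard library and verifies that the columns listed above are present. The helper name `load_settings` and the example row values are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "exp_name",
    "stimulus_name",
    "storage_name",
    "session_split_name",
    "config_name",
]


def load_settings(handle):
    """Read setting rows from an open CSV handle and check the required columns."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"setting file is missing columns: {missing}")
    return list(reader)


# Placeholder content; real values depend on your experiment.
example = io.StringIO(
    "exp_name,stimulus_name,storage_name,session_split_name,config_name\n"
    "demo_nm1,natural_movie_one,demo_storage,demo_split,demo_config\n"
)
settings = load_settings(example)
print(settings[0]["exp_name"])
```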

session_split_file(CSV)
- File that describes how to split available mice into pseudo-mice.
  - In this file, each row represents an individual mouse, and each column corresponds to an experiment number for Unsupervised Alignment. Multiple experiments with different pseudo-mouse configurations are run so that the evaluation is not biased by a particular combination of mice.
- Refer to the csv files in [`session_split`](../session_split/)
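
As an illustration of how such a file could be turned into pseudo-mice, the sketch below groups mouse names by the label found in one experiment column. The cell contents are not specified in this document, so the layout used here is an assumption: the first column holds the mouse name, and each remaining cell holds a pseudo-mouse label such as `0` or `1`. The helper name and the example data are hypothetical, so check the real files in `session_split` before relying on this.

```python
import csv
import io
from collections import defaultdict


def pseudo_mice_for_experiment(handle, experiment):
    """Group mouse names by the label stored in the given experiment column."""
    reader = csv.DictReader(handle)
    name_column = reader.fieldnames[0]  # assumed: first column is the mouse name
    groups = defaultdict(list)
    for row in reader:
        groups[row[experiment]].append(row[name_column])
    return dict(groups)


# Hypothetical split file with four mice and two experiments ("0" and "1").
example = io.StringIO(
    "mouse,0,1\n"
    "mouse_a,0,1\n"
    "mouse_b,0,0\n"
    "mouse_c,1,0\n"
    "mouse_d,1,1\n"
)
groups = pseudo_mice_for_experiment(example, "0")
print(groups)
```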

config_file(YAML)
- Definition file for the pipeline that creates RDM
- Raw spike counts vectors are preprocessed into RDM in the order written from top to bottom in this file
  - `normalize_scaler`: Standardization settings for Spike Counts
  - `mean_trials`: Averaging of spike counts across trials
  - `make_rdm`: RDM creation settings
    - `metric`: Distance metric (e.g. `cosine`)

#### **Workflow**
1. Initialization:
    - Load experiment settings from the specified files
2. Processing Loop:
    - For each row in the experiment settings file:
      - Extract experiment parameters (exp_name, stimulus, storage_name)
      - Call [`group_alignment_experiment()`](../src/neurep_gwot_mouse/alignment/ta_paper_group.py) to perform the alignment
3. Alignment Process (via [`group_alignment_experiment`](../src/neurep_gwot_mouse/alignment/ta_paper_group.py)):
    - Loads neural spike data for each brain area and condition
    - Performs data preprocessing (normalization, averaging across trials)
    - Computes Representational Dissimilarity Matrices (RDMs) using specified distance metric
    - Applies Entropic Gromov-Wasserstein alignment between brain areas
    - Calculates evaluation metrics (Spearman correlation, top-1 matching rate)
    - Saves results to the specified directory
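
The two evaluation metrics named above can be sketched as follows. This is one plausible reading rather than the repository's code: the top-1 matching rate assumes both RDMs list the same stimuli in the same order, so the correct match lies on the diagonal of the transport plan, and the Spearman correlation is taken between the upper triangles of two RDMs. Ranks are computed with a double `argsort`, which breaks ties arbitrarily.

```python
import numpy as np


def top1_matching_rate(plan):
    """Fraction of rows whose largest transport mass falls on the diagonal."""
    best_match = np.argmax(plan, axis=1)
    return float(np.mean(best_match == np.arange(plan.shape[0])))


def spearman_rdm(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    upper = np.triu_indices_from(rdm_a, k=1)
    ranks_a = np.argsort(np.argsort(rdm_a[upper]))
    ranks_b = np.argsort(np.argsort(rdm_b[upper]))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1])


# Toy transport plan: only the first stimulus is matched to itself.
plan = np.array([
    [0.30, 0.02, 0.01],
    [0.02, 0.05, 0.26],
    [0.01, 0.26, 0.06],
])
rate = top1_matching_rate(plan)

# Toy RDM and a monotonic transform of it, which preserves the rank order.
rdm = np.array([
    [0.0, 0.2, 0.5],
    [0.2, 0.0, 0.9],
    [0.5, 0.9, 0.0],
])
rho = spearman_rdm(rdm, rdm ** 2)
print(rate, rho)
```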

#### **Output Structure**
```  
stimulus_name/
├── results/                   # Directory where GW alignment results are saved
│   ├── exp_name/              # Directory named as specified by exp_name
│   │   ├── condition_0/       # Results for each pseudo-mouse
│   │   │   ├── VISp_vs_VISp/  # Alignment results for each area pair
│   │   │   ├── VISp_vs_VISrl/
│   │   │   └── ...
│   │   └── ...
│   └── ...
└── img/
    ├── exp_name/             # Directory for figures (created separately)
    └── ...
```
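
When post-processing results, it can help to build these paths programmatically. The helper below is a sketch that follows the tree above. Its name, the assumption that `stimulus_name/` sits directly under the target directory, and the example values are all hypothetical.

```python
from pathlib import PurePosixPath


def pair_result_dir(target_dir, stimulus_name, exp_name, condition, area_a, area_b):
    """Return the directory holding results for one area pair of one condition."""
    return (
        PurePosixPath(target_dir)
        / stimulus_name
        / "results"
        / exp_name
        / f"condition_{condition}"
        / f"{area_a}_vs_{area_b}"
    )


path = pair_result_dir("data", "natural_movie_one", "demo_exp", 0, "VISp", "VISrl")
print(path)
```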

### 2. Evaluation ([`scripts/ta_paper_group_evaluation.py`](../scripts/ta_paper_group_evaluation.py))

#### **Overview**
This script evaluates the results of Gromov-Wasserstein (GW) optimal transport experiments performed between pseudo-mice.  
It processes the alignment results and generates visualizations and metrics to compare brain areas.

#### **Key Features**
- Aggregates alignment results across different brain areas and conditions
- Calculates evaluation metrics including Spearman correlation and top-1 matching rate
- Generates visualizations: heatmaps, dendrograms, and swarm plots

#### **Command-Line Interface**
```bash
python ta_paper_group_evaluation.py <stimulus_name> <exp_name> [options]
```
Arguments
- `stimulus_name`: Name of the stimulus used in the alignment (e.g., 'natural_movie_one')
- `exp_name`: Name of the experiment for which to evaluate results
- `--results_dir`: Directory containing the alignment results (optional)
- `--fig_dir`: Directory to save generated figures (optional)

#### **Workflow**
1. Initialization:
    - Parse command-line arguments for stimulus and experiment names
    - Set up input/output directories for results and figures
2. Processing Loop:
    - For each metric (Spearman correlation and top-1 matching rate):
      - Aggregate results across all conditions using `aggregate_calcuration()`
      - Save aggregated results as a NumPy array
      - Generate and save visualizations:
        - Swarm plots for same-area correlations
        - Heatmaps showing correlation between all brain area pairs
        - Dendrograms showing hierarchical clustering of brain areas
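
The aggregation step can be pictured with the following sketch. It is not the `aggregate_calcuration()` function itself. It assumes that each condition yields one square matrix of metric values indexed by brain area, and the function name `aggregate_conditions` is hypothetical. The mean over conditions corresponds to the heatmap values, and the per-condition diagonals correspond to the same-area points shown in the swarm plots.

```python
import numpy as np


def aggregate_conditions(per_condition):
    """Stack per-condition area-by-area matrices and derive plot inputs."""
    stacked = np.stack(per_condition)  # (n_conditions, n_areas, n_areas)
    heatmap_values = stacked.mean(axis=0)  # average over conditions
    same_area = np.diagonal(stacked, axis1=1, axis2=2)  # (n_conditions, n_areas)
    return stacked, heatmap_values, same_area


# Toy metric values for two conditions and two brain areas.
per_condition = [
    np.array([[0.8, 0.2], [0.3, 0.6]]),
    np.array([[0.6, 0.4], [0.1, 0.8]]),
]
stacked, heatmap_values, same_area = aggregate_conditions(per_condition)
print(heatmap_values)
```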


#### **Visualizations Generated**
- Swarm Plots (swarm_{metric_name}.svg):
  - Shows distribution of metric values for each brain area
  - Displays individual data points for all experimental conditions

- Heatmaps (heatmap_{metric_name}.svg):
  - Visualizes the average metric (correlation or top-1 matching rate) values between all brain area pairs

- Dendrograms (dendrogram_{metric_name}.svg):
  - Shows hierarchical clustering of brain areas based on the metric (correlation or top-1 matching rate)

---

## Individual Alignment
- Perform an unsupervised alignment of neural representations between two individual mice.

### 1. Alignment ([`scripts/ta_paper_ind_alignment.py`](../scripts/ta_paper_ind_alignment.py))

#### **Overview**
This script performs Gromov-Wasserstein (GW) optimal transport between individual mice.  
This script calculates alignment metrics between neural representations across various brain areas of individual mice.

#### **Key Features**
- Performs unsupervised alignment between neural representations of individual mice using Entropic Gromov-Wasserstein Optimal Transport.
- Supports 3 stimuli (natural_movie_one, natural_movie_three, natural_scenes).
- Processes data through a pipeline: 
  - Unit Normalization of Spike Counts → MeanTrials → MakeRDM(metric="cosine")
- Configurable via command-line arguments and setting files


#### **Command-Line Interface**
```bash
python ta_paper_ind_alignment.py <setting_file.csv> [options]
```
Arguments  
- `setting_file`: Path to a CSV file containing experiment settings
- `--target_dir`: Directory to save results
- `--whole_data_dir`: Directory containing source data 
- `--pairs_dict_path`: Path to the JSON file defining individual mouse pairs
- `--config_dir`: Directory for config files 


#### **File Contents**

setting_file(CSV)
- Refer to the example in [`setting_files/dummy_ind_alignment_setting.csv`](../setting_files/dummy_ind_alignment_setting.csv)
- The setting file (CSV) should contain the following columns:
  - `exp_name`: Prefix for the GW alignment study (should include stimulus name, etc.)
  - `stimulus_name`: Stimulus name (natural_movie_one, natural_movie_three, natural_scenes)
  - `storage_name`: Name of the RDB to save results (e.g., 'takeda_abe_paper')
  - `config_name`: Pipeline setting file name for converting spike_counts to RDM. A file with the same name must be included in `config_dir`.

pairs_dict_file(JSON)
- JSON file that defines the pairs of individual mice for alignment
- [`session_split/pairs_dict_ind.json`](../session_split/pairs_dict_ind.json)
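
The schema of this JSON file is not described here. Purely as an illustration, the sketch below assumes a mapping from a condition name to a two-element list of mouse names and validates that shape while loading. The keys, the mouse names, and the helper `load_pairs` are hypothetical, so inspect the real file before reusing this.

```python
import json


def load_pairs(text):
    """Parse a JSON string mapping condition names to pairs of mouse names."""
    raw = json.loads(text)
    pairs = {}
    for condition, members in raw.items():
        if len(members) != 2:
            raise ValueError(f"{condition} does not contain exactly two mice")
        pairs[condition] = tuple(members)
    return pairs


# Hypothetical content, not taken from pairs_dict_ind.json.
example = '{"condition_0": ["mouse_a", "mouse_b"], "condition_1": ["mouse_a", "mouse_c"]}'
pairs = load_pairs(example)
print(pairs["condition_0"])
```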

config_file(YAML)
- Definition file for the pipeline that creates RDM
- Raw spike counts vectors are preprocessed into RDM in the order written from top to bottom in this file
  - `normalize_scaler`: Standardization settings for Spike Counts
  - `mean_trials`: Averaging of spike counts across trials
  - `make_rdm`: RDM creation settings
    - `metric`: Distance metric (e.g. `cosine`)

#### **Workflow**
1. Initialization:
    - Load experiment settings from the specified files
2. Processing Loop:
    - For each row in the experiment settings file:
      - Extract experiment parameters (exp_name, stimulus, storage_name)
      - Call [`ind_alignment_experiment()`](../src/neurep_gwot_mouse/alignment/ta_paper_ind.py) to perform the alignment
3. Alignment Process (via [`ind_alignment_experiment`](../src/neurep_gwot_mouse/alignment/ta_paper_ind.py)):
    - Loads neural spike data for each brain area of individual mice
    - Performs data preprocessing (normalization, averaging across trials)
    - Computes Representational Dissimilarity Matrices (RDMs) using specified distance metric
    - Applies Entropic Gromov-Wasserstein alignment between brain areas of individual mice
    - Calculates evaluation metrics (Spearman correlation, top-1 matching rate)
    - Saves results to the specified directory

#### **Output Structure**
```  
stimulus_name/
├── results/                   # Directory where GW alignment results are saved
│   ├── exp_name/              # Directory named as specified by exp_name
│   │   ├── condition_0/       # Results for each individual mouse pair
│   │   │   ├── VISp_vs_VISp/  # Alignment results for each area pair
│   │   │   ├── VISp_vs_VISrl/
│   │   │   └── ...
│   │   └── ...
│   └── ...
└── img/
    ├── exp_name/             # Directory for figures (created separately)
    └── ...
```

### 2. Evaluation ([`scripts/ta_paper_ind_evaluation.py`](../scripts/ta_paper_ind_evaluation.py))

#### **Overview**
This script evaluates the results of Gromov-Wasserstein (GW) optimal transport experiments performed between individual mice.  
It processes the alignment results and generates visualizations and metrics to compare brain areas.

#### **Key Features**
- Aggregates alignment results across different brain areas and individual mouse pairs
- Calculates evaluation metrics including Spearman correlation and top-1 matching rate
- Generates visualizations: heatmaps, dendrograms, and swarm plots

#### **Command-Line Interface**
```bash
python ta_paper_ind_evaluation.py <stimulus_name> <exp_name> [options]
```
Arguments
- `stimulus_name`: Name of the stimulus used in the alignment (e.g., 'natural_movie_one')
- `exp_name`: Name of the experiment for which to evaluate results
- `--results_dir`: Directory containing the alignment results (optional)
- `--fig_dir`: Directory to save generated figures (optional)

#### **Workflow**
1. Initialization:
    - Parse command-line arguments for stimulus and experiment names
    - Set up input/output directories for results and figures
2. Processing Loop:
    - For each metric (Spearman correlation and top-1 matching rate):
      - Aggregate results across all conditions using `aggregate_calcuration()`
      - Save aggregated results as a NumPy array
      - Generate and save visualizations:
        - Swarm plots for same-area correlations
        - Heatmaps showing correlation between all brain area pairs
        - Dendrograms showing hierarchical clustering of brain areas


#### **Visualizations Generated**
- Swarm Plots (swarm_{metric_name}.svg):
  - Shows distribution of metric values for each brain area
  - Displays individual data points for all experimental conditions

- Heatmaps (heatmap_{metric_name}.svg):
  - Visualizes the average metric (correlation or top-1 matching rate) values between all brain area pairs

- Dendrograms (dendrogram_{metric_name}.svg):
  - Shows hierarchical clustering of brain areas based on the metric (correlation or top-1 matching rate)