# Retrieving Mouse Spike Count Data Using AllenSDK

## Overview
- We used the Allen Brain Observatory's [`Visual Coding - Neuropixels`](https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html) dataset.
- Neural activity during visual stimulus presentation to mice was recorded using Neuropixels electrodes.
- Allen Software Development Kit (`AllenSDK`; https://allensdk.readthedocs.io) was used for data acquisition and preprocessing.
  - Please refer to the [official quickstart](https://allensdk.readthedocs.io/en/latest/_static/examples/nb/ecephys_quickstart.html) for usage of `AllenSDK`.

### Data Characteristics
- Session ID (mouse ID)
  - 32 individuals
- Brain regions
  - VISp (Visual area)
  - VISrl (Visual area)
  - VISl (Visual area)
  - VISal (Visual area)
  - VISpm (Visual area)
  - VISam (Visual area)
  - LGd (Thalamus)
  - CA1 (Hippocampus)
- Stimulus
  - `natural_scenes`: 118 black and white images
  - `natural_movie_one`: 30 sec. video stimulus (part of a black and white film)
  - `natural_movie_three`: 120 sec. video stimulus (part of a black and white film)

## Setup

In [1]:
import xarray as xr
from neurep_gwot_mouse.allen_brain_toolbox import AllenDataLoader

## Data Loading Process

Set the path where data will be saved in `DATA_PATH`.  
After running this example, the directory structure under `DATA_PATH` will be as follows:

```
{DATA_PATH}/
├── natural_scenes/
│   └── {session_id}/
│       ├── {area}_spike_counts_da.nc
│       └── {area}_trial_info_df.pkl
│
├── natural_movie_one/
│   └── xxxframe/                           <- Spike counts aggregated per time window; xxx = number of windows (30 ÷ time_window)
│       └── {session_id}/                   <- Mouse ID
│           ├── {area}_spike_counts_da.nc   <- Spike counts for each brain area in xarray format (3D version of pd.DataFrame).
│           │                                  Open with `xr.open_dataarray({path})`
│           │                                  Dimensions: trial x label(frame) x unit_id(neuron)
│           └── {area}_trial_info_df.pkl    <- Information about trial start times, etc. Open with `joblib.load({path})`
│
└── natural_movie_three/
    └── xxxframe/                           <- Spike counts aggregated per time window; xxx = number of windows (120 ÷ time_window)
        └── {session_id}/
            ├── {area}_spike_counts_da.nc
            └── {area}_trial_info_df.pkl
```
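
The layout above can be captured in a small path helper. This is a minimal sketch and not part of `AllenDataLoader`: the function name `spike_counts_path` and its `n_frames` argument are hypothetical, and it assumes the `xxxframe` directory is named `{n_frames}frame`, as in the `90frame` directory that appears in the log output of this example.

```python
from pathlib import Path


def spike_counts_path(base_dir, stimulus_name, session_id, area, n_frames=None) -> Path:
    """Build the expected path of a saved spike-count file (hypothetical helper)."""
    root = Path(base_dir) / stimulus_name
    # Movie stimuli have an extra "{n_frames}frame" level; natural_scenes does not.
    if stimulus_name != "natural_scenes":
        if n_frames is None:
            raise ValueError("n_frames is required for movie stimuli")
        root = root / f"{n_frames}frame"
    return root / str(session_id) / f"{area}_spike_counts_da.nc"


path = spike_counts_path(
    "/home/share/allen_test", "natural_movie_one", 715093703, "VISp", n_frames=90
)
print(path)
```

The returned path can be passed directly to `xr.open_dataarray`.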

In [2]:
DATA_PATH = "/home/share/allen_test"

First, initialize the dataloader.  
Initialization takes approximately 2-5 minutes because it retrieves the cache from AllenSDK.

In [3]:
data_loader = AllenDataLoader(
    manifest_json_path=DATA_PATH + "/settings/manifest.json",
)

Load the data.  
Be aware that downloading each session requires approximately 2-4 GiB of storage space.

In [None]:
# To retrieve data for a single session
data_loader.get_spike_counts(
    stimulus_name="natural_movie_one",
    session_id=715093703,
    area="VISp",
    base_dir=DATA_PATH,
)

# To retrieve data for all sessions (parallel processing)
# data_loader.get_spike_counts_parallel(
#     stimulus_name="natural_movie_one",
#     area="VISp",
#     base_dir=DATA_PATH,
#     n_jobs=24
# )

Start processing stimulus natural_movie_one, session 715093703 for area VISp...




Downloading:   0%|          | 0.00/2.86G [00:00<?, ?B/s]

  warn("Ignoring cached namespace '%s' version %s because version %s is already loaded."


Using default time window: 0.3333333333333333
Save directory: /home/share/allen_test/natural_movie_one/90frame


## Data Structure and Validation

The output is stored as an xarray `DataArray`.  
`spike_counts_da` contains spike counts recorded during stimulus presentation periods.

### Data Dimensions
- dim1: `trial`
  - Number of stimulus presentations
- dim2: `label`
  - Stimulus identifier
    - `natural_scenes`: Image labels (1-118)
    - `natural_movie_one/three`: Frame numbers counted from the beginning
- dim3: `unit_id`
  - Neuropixels unit identification number
  - Approximately corresponds to individual neurons
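
To make the three dimensions concrete, the sketch below builds a small synthetic `DataArray` with the same dimension names and selects along them by name. The shape, the Poisson counts, and the unit IDs are made up for illustration; the coordinate values in the real files may differ.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: 10 trials x 90 frames x 5 units of Poisson spike counts.
rng = np.random.default_rng(0)
toy = xr.DataArray(
    rng.poisson(2.0, size=(10, 90, 5)),
    dims=("trial", "label", "unit_id"),
    coords={
        "trial": np.arange(10),
        "label": np.arange(90),
        "unit_id": [101, 102, 103, 104, 105],  # made-up unit IDs
    },
)

# Select by dimension name rather than by axis position.
one_unit = toy.sel(unit_id=103)      # remaining dims: trial x label
trial_mean = toy.mean(dim="trial")   # remaining dims: label x unit_id

print(one_unit.dims, trial_mean.dims)
```

Working with named dimensions avoids having to remember which axis holds trials, frames, or units.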

In [None]:
# Load a sample data file to examine its structure
# (this file exists only after `natural_scenes` has been retrieved for this session and area)
da = xr.open_dataarray(
    DATA_PATH + "/natural_scenes/715093703/VISp_spike_counts_da.nc"
)

In [None]:
# Display the xarray object metadata and structure
da

In [None]:
# View the actual spike count values
da.values

array([[[ 4,  6,  1, ...,  0,  0,  0],
        [ 1, 13,  0, ...,  0,  0,  0],
        [ 4,  4,  0, ...,  0,  1,  0],
        ...,
        [ 5,  2,  1, ...,  0,  0,  0],
        [10, 16,  1, ...,  0,  0,  2],
        [ 1,  0,  2, ...,  0,  1,  0]],

       [[ 1, 13,  0, ...,  0,  0,  0],
        [ 0,  1,  0, ...,  0,  1,  0],
        [ 8,  4,  2, ...,  0,  0,  0],
        ...,
        [ 3,  6,  1, ...,  0,  0,  0],
        [ 6,  1,  1, ...,  0,  0,  0],
        [ 1, 14,  1, ...,  0,  0,  0]],

       [[ 0,  9,  1, ...,  1,  0,  0],
        [ 5,  2,  0, ...,  0,  0,  0],
        [ 0,  7,  2, ...,  0,  1,  0],
        ...,
        [ 2,  1,  2, ...,  0,  0,  0],
        [ 1,  7,  1, ...,  0,  0,  0],
        [ 1,  2,  0, ...,  0,  0,  0]],

       ...,

       [[ 2, 10,  3, ...,  0,  0,  0],
        [ 1,  7,  0, ...,  0,  0,  0],
        [ 0,  8,  1, ...,  0,  2,  0],
        ...,
        [10, 14,  0, ...,  0,  1,  1],
        [ 6, 12,  0, ...,  1,  1,  0],
        [ 0, 17,  0, ...,  0,  2

## Next Steps

After loading the spike count data, you can proceed to the main analysis:

1. **Data Preprocessing**:
   - Normalize spike counts
   - Average across trials
   - Create Representational Dissimilarity Matrices (RDMs)

2. **Alignment Analysis**:
   - Group alignment between pseudo-mice
   - Individual alignment between mouse pairs

For details on these steps, see `02_execute_unsupervised_alignment.ipynb`.
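
As a rough illustration of the preprocessing steps listed above, the following sketch runs them on synthetic data with NumPy. The specific choices here (z-scoring each unit, correlation distance for the RDM) are assumptions made for this example; the referenced notebook may use different normalization or distance measures.

```python
import numpy as np

# Synthetic spike counts: trial x label x unit_id
rng = np.random.default_rng(0)
counts = rng.poisson(3.0, size=(10, 20, 8)).astype(float)

# 1. Normalize: z-score each unit over all trials and labels.
eps = 1e-12  # avoids division by zero for silent units
mu = counts.mean(axis=(0, 1), keepdims=True)
sigma = counts.std(axis=(0, 1), keepdims=True)
z = (counts - mu) / (sigma + eps)

# 2. Average across trials -> label x unit_id
mean_resp = z.mean(axis=0)

# 3. RDM: 1 - Pearson correlation between the population responses to each label.
rdm = 1.0 - np.corrcoef(mean_resp)
print(rdm.shape)  # (20, 20)
```

The resulting matrix is symmetric with zeros on the diagonal, one row and column per stimulus label.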