# Get To Know A Dataset: Interhemispheric Cortex Connectivity Microscopy Dataset

*Institut Pasteur, AWS Open Data Sponsorship Program*

## 1. What is this dataset?

This dataset contains high-resolution **ExA-SPIM** (Expansion-Assisted Selective Plane Illumination Microscopy) light-sheet imaging volumes of mouse brains, focused on mapping interhemispheric cortical connectivity. The data captures individual callosal projection neurons and their axonal arbors crossing between the two brain hemispheres at sub-micron effective resolution.

**Key characteristics:**
- **Modality:** ExA-SPIM light-sheet fluorescence microscopy
- **Species:** Mouse (*Mus musculus*)
- **Resolution:** ~300 nm lateral, ~800 nm axial (effective, post-expansion)
- **Format:** OME-Zarr (multiscale, cloud-optimized)
- **Focus:** Interhemispheric (callosal) cortical projection neurons

The dataset is produced by the [Institut Pasteur](https://www.pasteur.fr) and hosted through the [AWS Open Data Sponsorship Program](https://aws.amazon.com/opendata/open-data-sponsorship-program/).

## 2. How can I access the data?

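The data is stored in a public Amazon S3 bucket that allows anonymous reads, so no AWS account or credentials are required. Replace the placeholder `[BUCKET-NAME-TBD]` in the examples below with the actual bucket name.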

In [None]:
# List available data assets using boto3
import boto3
from botocore import UNSIGNED
from botocore.config import Config

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
bucket = '[BUCKET-NAME-TBD]'

# List top-level data assets
response = s3.list_objects_v2(Bucket=bucket, Delimiter='/')
for prefix in response.get('CommonPrefixes', []):
    print(prefix['Prefix'])
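
A single `list_objects_v2` call returns at most 1,000 entries, so the listing above can be incomplete for a large bucket. The sketch below follows pagination instead. The helper name `list_top_level_prefixes` is hypothetical, and it only assumes a client that exposes boto3's `get_paginator` interface.

```python
def list_top_level_prefixes(s3_client, bucket_name):
    """Return every top-level prefix in a bucket, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    prefixes = []
    for page in paginator.paginate(Bucket=bucket_name, Delimiter="/"):
        # 'CommonPrefixes' is absent from a page that contains no prefixes
        for entry in page.get("CommonPrefixes", []):
            prefixes.append(entry["Prefix"])
    return prefixes


# Usage with the anonymous client and bucket defined in the cell above:
# for prefix in list_top_level_prefixes(s3, bucket):
#     print(prefix)
```

Passing the client in as an argument keeps the helper independent of how credentials are configured, so the same function works with signed and unsigned clients.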

In [None]:
# Alternatively, using the AWS CLI (run in terminal):
# aws s3 ls --no-sign-request s3://[BUCKET-NAME-TBD]/

## 3. What does the data look like?

Each data asset is a directory following [AIND-style naming conventions](https://allenneuraldynamics.github.io/data.html):

```
exaspim_<subject-id>_<date>_<time>/
    exaspim/
        image.ome.zarr/       # Multiscale OME-Zarr volume
    data_description.json     # Dataset metadata
    subject.json              # Animal information
    procedures.json           # Experimental procedures
    acquisition.json          # Microscope settings
    processing.json           # Processing provenance
```
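
Asset names can be split into their parts when filtering a listing by subject or date. The sketch below assumes the date is written as `YYYY-MM-DD` and the time as `HH-MM-SS`. That matches the usual AIND convention but is not confirmed for this dataset. The helper `parse_asset_name` and the subject ID in the usage line are hypothetical.

```python
import re
from datetime import datetime

# Assumed pattern: exaspim_<subject-id>_<YYYY-MM-DD>_<HH-MM-SS>
ASSET_NAME_PATTERN = re.compile(
    r"^exaspim_(?P<subject_id>[^_]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})/?$"
)


def parse_asset_name(name):
    """Split an asset directory name into subject ID and acquisition datetime.

    Returns None when the name does not match the assumed pattern.
    """
    match = ASSET_NAME_PATTERN.match(name)
    if match is None:
        return None
    try:
        acquired = datetime.strptime(
            f"{match['date']} {match['time']}", "%Y-%m-%d %H-%M-%S"
        )
    except ValueError:
        # Digits matched the pattern but do not form a valid date or time
        return None
    return {"subject_id": match["subject_id"], "acquired": acquired}


# Hypothetical example name, with a trailing slash as returned by an S3 listing
print(parse_asset_name("exaspim_000000_2024-01-02_03-04-05/"))
```

Names that carry extra suffixes (for example derived or processed assets) do not match this pattern and return `None`, which makes them easy to separate from raw acquisitions.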

### Reading metadata

In [None]:
import json
import s3fs

fs = s3fs.S3FileSystem(anon=True)

# Read subject metadata for an example asset
example_asset = 'exaspim_[EXAMPLE-SUBJECT]_[DATE]_[TIME]'

with fs.open(f'{bucket}/{example_asset}/subject.json', 'r') as f:
    subject = json.load(f)
    print("=== Subject metadata ===")
    print(json.dumps(subject, indent=2))

In [None]:
# Read acquisition parameters
with fs.open(f'{bucket}/{example_asset}/acquisition.json', 'r') as f:
    acquisition = json.load(f)
    print("=== Acquisition parameters ===")
    print(json.dumps(acquisition, indent=2))
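
The metadata files are nested JSON, and a full `json.dumps` printout can be long. As a convenience, the sketch below flattens a nested structure into dotted key paths so the available fields can be scanned or searched. `flatten_metadata` is a hypothetical helper that makes no assumption about which fields the files contain.

```python
def flatten_metadata(node, prefix=""):
    """Flatten nested dicts and lists into {'a.b[0].c': value} pairs."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten_metadata(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten_metadata(value, f"{prefix}[{index}]"))
    else:
        # Leaf value (string, number, bool, or None)
        flat[prefix] = node
    return flat


# Usage with the dictionaries loaded above:
# for path, value in flatten_metadata(acquisition).items():
#     print(f"{path}: {value}")
```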

## 4. How can I load and visualize the imaging data?

### Load a low-resolution overview with zarr and dask

In [None]:
import zarr
import dask.array as da
import matplotlib.pyplot as plt
import numpy as np

store = s3fs.S3Map(
    root=f'{bucket}/{example_asset}/exaspim/image.ome.zarr/3',  # Low-res pyramid level
    s3=fs
)
z = zarr.open(store, mode='r')
data = da.from_zarr(z)

print(f"Shape (Z, Y, X): {data.shape}")
print(f"Dtype: {data.dtype}")
print(f"Chunk size: {data.chunks}")
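
The pyramid level `3` used above is a guess at a convenient overview size. The levels that actually exist, and the voxel spacing of each, are recorded in the `multiscales` attributes at the root of the OME-Zarr group. The sketch below reads them from an attribute dictionary. `describe_pyramid` is a hypothetical helper, and it assumes the metadata follows the OME-Zarr 0.4 layout, or the 0.5 layout where the same content sits under an `ome` key.

```python
def describe_pyramid(attrs):
    """Return (axis_names, {level_path: scale}) from OME-Zarr root attributes."""
    # OME-Zarr 0.5 nests the metadata under an "ome" key; 0.4 does not
    multiscale = attrs.get("ome", attrs)["multiscales"][0]
    axis_names = [axis["name"] for axis in multiscale.get("axes", [])]
    levels = {}
    for dataset in multiscale["datasets"]:
        scale = None
        for transform in dataset.get("coordinateTransformations", []):
            if transform.get("type") == "scale":
                scale = transform["scale"]
        levels[dataset["path"]] = scale
    return axis_names, levels


# Usage, opening the group root rather than a single level:
# root = zarr.open(
#     s3fs.S3Map(root=f'{bucket}/{example_asset}/exaspim/image.ome.zarr', s3=fs),
#     mode='r',
# )
# axes, levels = describe_pyramid(root.attrs.asdict())
# print(axes, levels)
```

The axis names also show whether the arrays are three-dimensional or carry additional time and channel axes, which matters for the indexing in the cells that follow.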

In [None]:
# Display a maximum intensity projection (MIP) of a 50-slice coronal subvolume
mid_z = data.shape[0] // 2
z_lo = max(mid_z - 25, 0)  # Clamp so a thin volume cannot produce a negative index
subvol = data[z_lo : mid_z + 25, :, :].compute()
mip = subvol.max(axis=0)

plt.figure(figsize=(14, 10))
plt.imshow(mip, cmap='gray', vmin=np.percentile(mip, 1), vmax=np.percentile(mip, 99.5))
plt.title('Maximum Intensity Projection: Coronal View')
plt.colorbar(label='Fluorescence intensity')
plt.axis('off')
plt.tight_layout()
plt.show()
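
The projection above assumes that `data` is a three-dimensional (Z, Y, X) array. OME-Zarr images are often stored with leading time and channel axes, and whether this dataset is stored that way is not stated here. If `data.shape` has more than three entries, a sketch like the following reduces the array to its spatial axes first. `spatial_volume` is a hypothetical helper, and the axis names are assumed to come from the image's OME-Zarr `multiscales` metadata.

```python
import numpy as np


def spatial_volume(array, axis_names, time_index=0, channel_index=0):
    """Index away 't' and 'c' axes so that only the spatial axes remain."""
    if len(axis_names) != array.ndim:
        raise ValueError("axis_names must have one entry per array dimension")
    index = []
    for name in axis_names:
        if name == "t":
            index.append(time_index)
        elif name == "c":
            index.append(channel_index)
        else:
            index.append(slice(None))  # Keep spatial axes whole
    return array[tuple(index)]


# Synthetic stand-in for a (t, c, z, y, x) image
example = np.zeros((1, 2, 4, 5, 6))
print(spatial_volume(example, ["t", "c", "z", "y", "x"]).shape)
```

Because the function only uses basic indexing, it behaves the same way on a lazy dask array, so nothing is downloaded until `.compute()` is called.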

### Interactive 3D visualization with napari

In [None]:
# Uncomment to launch napari viewer (requires local installation)
# import napari
# viewer = napari.Viewer()
# viewer.open(
#     f"s3://{bucket}/{example_asset}/exaspim/image.ome.zarr",
#     plugin="napari-ome-zarr"
# )
# napari.run()

## 5. A deeper look: exploring interhemispheric connectivity

### Visualizing callosal projections

The corpus callosum is the primary white matter tract connecting the two cortical hemispheres. In this dataset, labeled projection neurons send their axons through the corpus callosum to innervate the contralateral hemisphere.

In [None]:
# TODO: Load a subvolume centered on the corpus callosum
# Define region of interest coordinates based on atlas registration
# or manual annotation of the midline crossing point.

# cc_z_range = slice(z_start, z_end)
# cc_y_range = slice(y_start, y_end)
# cc_x_range = slice(x_start, x_end)
# cc_roi = data[cc_z_range, cc_y_range, cc_x_range].compute()

# plt.figure(figsize=(12, 8))
# plt.imshow(cc_roi.max(axis=0), cmap='magma')
# plt.title('Corpus Callosum Region: Axonal Crossing')
# plt.axis('off')
# plt.show()
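
Region-of-interest coordinates from an atlas or a manual annotation are usually given in physical units, while the array is indexed in voxels at a particular pyramid level. The sketch below converts a box given in micrometres into voxel slices. `physical_roi_to_slices` is a hypothetical helper. It assumes the voxel size of the chosen level is known (for example from that level's `scale` metadata) and expressed in micrometres. All numbers in the usage line are made up for illustration.

```python
def physical_roi_to_slices(center_um, size_um, voxel_size_um, shape):
    """Convert a box in micrometres to voxel slices clipped to the array shape.

    All arguments are sequences ordered like the array axes, e.g. (z, y, x).
    """
    slices = []
    for center, size, voxel, length in zip(center_um, size_um, voxel_size_um, shape):
        start = int(round((center - size / 2) / voxel))
        stop = int(round((center + size / 2) / voxel))
        # Clip to the valid index range so the slice never wraps around
        start = max(0, min(start, length))
        stop = max(start, min(stop, length))
        slices.append(slice(start, stop))
    return tuple(slices)


# Illustrative values only: a 20 x 40 x 60 um box on a 2 um isotropic grid
roi = physical_roi_to_slices(
    center_um=(100, 200, 300),
    size_um=(20, 40, 60),
    voxel_size_um=(2, 2, 2),
    shape=(1000, 1000, 1000),
)
print(roi)
# cc_roi = data[roi].compute()
```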

### Quantifying fluorescence across hemispheres

In [None]:
# TODO: Compute a fluorescence intensity profile across the midline
# to reveal the spatial distribution of callosal projections.

# profile = np.mean(mip, axis=0)  # Mean along dorsal-ventral axis
# x_coords = np.arange(len(profile))
# midline = len(profile) // 2  # Assumes the midline sits at the image center

# plt.figure(figsize=(10, 4))
# plt.plot(x_coords, profile, 'k-', linewidth=0.5)
# plt.axvline(x=midline, color='red', linestyle='--', alpha=0.7, label='Midline')
# plt.fill_between(x_coords[:midline], profile[:midline], alpha=0.3, color='blue', label='Left hemisphere')
# plt.fill_between(x_coords[midline:], profile[midline:], alpha=0.3, color='green', label='Right hemisphere')
# plt.xlabel('Medial-Lateral position (pixels)')
# plt.ylabel('Mean fluorescence intensity')
# plt.title('Interhemispheric Fluorescence Distribution')
# plt.legend()
# plt.tight_layout()
# plt.show()
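
The profile can also be reduced to a single number per brain. The sketch below computes the mean intensity on each side of the midline and a normalized asymmetry index, `(right - left) / (right + left)`. This is one possible summary, not a method described by the dataset authors. `hemisphere_asymmetry` is a hypothetical helper, and the default midline assumes the brain is centered in the image.

```python
import numpy as np


def hemisphere_asymmetry(profile, midline=None):
    """Return (left_mean, right_mean, asymmetry_index) for a 1D profile.

    For non-negative intensities the index lies in [-1, 1]; positive values
    mean more signal on the right-hand side of the profile.
    """
    profile = np.asarray(profile, dtype=float)
    if midline is None:
        midline = profile.size // 2  # Assumes the brain is centered in the image
    left = profile[:midline].mean()
    right = profile[midline:].mean()
    total = left + right
    index = (right - left) / total if total != 0 else 0.0
    return left, right, index


# Synthetic profile: the right half is three times brighter than the left
left_mean, right_mean, asymmetry = hemisphere_asymmetry([1, 1, 3, 3])
print(left_mean, right_mean, asymmetry)
```

With real data, which side of the array corresponds to which hemisphere depends on the image orientation, so the sign of the index should be checked against the acquisition metadata.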

## 6. Community Challenge 🧠

**Can you develop an automated method to detect, segment, and trace individual callosal axons crossing the corpus callosum in whole-brain ExA-SPIM data?**

Specifically:

1. **Detection:** Identify the location of labeled cell bodies in the cortex and their primary axons
2. **Tracing:** Follow individual axons through the corpus callosum to their contralateral targets
3. **Classification:** Characterize projection patterns: do axons from a given cortical area project to homotopic (mirror-image) or heterotopic (non-mirror-image) regions of the contralateral hemisphere?

This challenge bears on how cortical areas communicate across hemispheres. Solutions could build on deep learning approaches to neuron tracing, such as flood-filling networks or transformer-based segmentation.

We welcome solutions, analyses, and derived datasets. Please contact **florent.haiss@pasteur.fr** if you would like to discuss your approach.

## Requirements

```
pip install boto3 s3fs zarr dask numpy matplotlib napari napari-ome-zarr
```

## References

- [OME-Zarr specification (OME-NGFF)](https://ngff.openmicroscopy.org/)
- [ExA-SPIM: Expansion-assisted selective plane illumination microscopy](https://pmc.ncbi.nlm.nih.gov/articles/PMC12208669/)
- [Allen Institute for Neural Dynamics: Data Access](https://allenneuraldynamics.github.io/data.html)
- [aind-data-schema](https://github.com/AllenNeuralDynamics/aind-data-schema/)
- [Registry of Open Data on AWS](https://registry.opendata.aws/)