This tutorial shows how to access the ZAPBench datasets from Python.

Datasets are hosted on Google Cloud Storage in the `zapbench-release` bucket; see the [dataset README](http://zapbench-release.storage.googleapis.com/volumes/README.html) for acknowledgements and license (CC-BY). Datasets that may be especially relevant include:

- Functional activity volume (`gs://zapbench-release/volumes/20240930/raw`)
- Functional anatomy volume (`gs://zapbench-release/volumes/20240930/anatomy`)
- Aligned activity volume (`gs://zapbench-release/volumes/20240930/aligned`)
- Aligned and normalized activity volume (`gs://zapbench-release/volumes/20240930/df_over_f`)
- Annotations used for segmentation model training and eval (`gs://zapbench-release/volumes/20240930/annotations/...`)
- Segmentation used to extract traces (`gs://zapbench-release/volumes/20240930/segmentation`)
- Traces used for time-series forecasting (`gs://zapbench-release/volumes/20240930/traces`)

Datasets can also be browsed and downloaded directly using [gsutil](https://cloud.google.com/storage/docs/gsutil). The cells below access them from Python with TensorStore:

In [None]:
import matplotlib.pyplot as plt
import tensorstore as ts


# Example (commented out): create a handle to the remote raw activity volume.
# ds = ts.open({
#     'open': True,
#     # Datasets are generally stored in zarr v3 format ('zarr3').
#     # There are a few exceptions, where v2 is used ('zarr').
#     'driver': 'zarr3',
#     # Path of the dataset we want to load.
#     'kvstore': 'gs://zapbench-release/volumes/20240930/raw'
# }).result()

# # Display info about the dataset.
# print(ds.schema)

# # Fetch an xy-slice using the handle.
# z, t = 36, 0
# example_xy_slice = ds[:, :, z, t].read().result()

# # Plot slice.
# plt.figure(figsize=(6, 12))
# plt.imshow(example_xy_slice)
# plt.title(f'xy slice at {z=}, {t=}');
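
The TensorStore specs used in this tutorial differ only in the dataset path and, occasionally, the driver. As a minimal sketch, the spec dictionary could be built by a small helper. Note that `BASE_URL` and `make_spec` are hypothetical names introduced here for illustration and are not part of `zapbench` or `tensorstore`; the paths themselves come from the dataset list above.

```python
# Base location of the 20240930 release, as given in the dataset list.
BASE_URL = 'gs://zapbench-release/volumes/20240930'


def make_spec(name, driver='zarr3'):
  """Builds a TensorStore spec dict for one dataset (hypothetical helper)."""
  # Most datasets use zarr v3 ('zarr3'); a few exceptions use v2 ('zarr').
  if driver not in ('zarr3', 'zarr'):
    raise ValueError(f'Unsupported driver: {driver!r}')
  return {
      'open': True,
      'driver': driver,
      'kvstore': f'{BASE_URL}/{name}',
  }


spec = make_spec('traces')
print(spec['kvstore'])
```

The resulting dictionary can be passed to `ts.open(spec).result()` in the same way as the literal specs in the surrounding cells.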

In [None]:
# Create handle to the remote dataset.
ds_traces = ts.open({
    'open': True,
    'driver': 'zarr3',
    'kvstore': 'gs://zapbench-release/volumes/20240930/traces'
}).result()

ds_traces.schema

In [None]:
ds_traces

As described in [the manuscript](https://openreview.net/pdf?id=oCHsDpyawq), the experiment is subdivided into multiple conditions. Using `zapbench.data_utils`, we can get the per-condition bounds for indexing the trace matrix along its first (time) axis:

In [None]:
from zapbench import constants
from zapbench import data_utils

# Print the indexing bounds per condition.
# Note that we keep a minimal amount of "padding" between conditions.
for condition_id, condition_name in enumerate(constants.CONDITION_NAMES):
  inclusive_min, exclusive_max = data_utils.get_condition_bounds(condition_id)
  print(f'{condition_name} has bounds [{inclusive_min}, {exclusive_max}).')
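
To make the half-open convention `[inclusive_min, exclusive_max)` concrete, here is a small self-contained sketch on a toy matrix. The bounds and condition names below are made up for illustration and are not the actual ZAPBench values, which come from `data_utils.get_condition_bounds`; `slice_condition` is likewise a hypothetical helper.

```python
# Hypothetical bounds, for illustration only. The gap between the two
# intervals (timesteps 4 and 5) plays the role of "padding".
example_bounds = {
    'condition_a': (0, 4),
    'condition_b': (6, 10),
}

# Toy "trace matrix" with 12 timesteps (rows) and 2 neurons (columns).
traces = [[t, 10 * t] for t in range(12)]


def slice_condition(matrix, bounds):
  """Returns the rows in [inclusive_min, exclusive_max)."""
  inclusive_min, exclusive_max = bounds
  return matrix[inclusive_min:exclusive_max]


segment = slice_condition(traces, example_bounds['condition_b'])
print(len(segment))                   # 4
print(segment[0][0], segment[-1][0])  # 6 9
```

The number of timesteps in a condition is `exclusive_max - inclusive_min`, and the row at index `exclusive_max` is not included. Rows that fall in the padding between two conditions belong to neither slice.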

In [None]:
constants

Using these bounds, we can get traces for any given condition, e.g.:

In [None]:
condition_name = 'turning'

# Use the bounds to plot the traces of one of the conditions.
inclusive_min, exclusive_max = data_utils.get_condition_bounds(
    constants.CONDITION_NAMES.index(condition_name))
traces_condition = ds_traces[inclusive_min:exclusive_max, :].read().result()

# Plot traces.
fig = plt.figure(figsize=(12, 12))
plt.title(f'traces for {condition_name} condition')
im = plt.imshow(traces_condition.T, aspect="auto")
plt.xlabel('timestep')
plt.ylabel('neuron')
cbar = fig.colorbar(im)
cbar.set_label("normalized activity (df/f)")
plt.show()

# For training and testing, we will want to further adjust these bounds for
# splits, see `help(data_utils.adjust_condition_bounds_for_split)`.
# As this is covered in other notebooks, we will not do this here.