# Accessing the CA3 dataset with CAVE

This tutorial provides a high-level overview of how to access the CA3 dataset through CAVE. CAVE, the [connectome annotation versioning engine](https://doi.org/10.1038/s41592-024-02426-z), is a cloud-hosted service infrastructure for managing connectomics datasets that makes them broadly accessible. CAVE supports both proofreading of datasets and their analysis, even while proofreading is ongoing.

# CAVEclient and setup

The CAVEclient is a Python library that facilitates communication with a CAVE system. It can be installed with

`pip install caveclient`

To install CAVEclient, along with seaborn for plotting, when running this tutorial in a Colab notebook, run:

```python
!pip install caveclient
!pip install seaborn
```

Then import the libraries used in this tutorial:

```python
import caveclient
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
```

## CAVE account setup

Every user needs to create a CAVE account and download a user token to access CAVE's services programmatically. The CA3 data is publicly available, which means that a new account does not need any extra permissions to access it.

A Google account (or Google-enabled account) is required to create a CAVE account.

#### Start here if you do not have a CAVE account or are not sure whether you have one

Log in to CAVE to set up a new account. To do this, go to this [website](https://prod.flywire-daf.com/materialize/views/datastack/flywire_fafb_public).

#### Once you have an account: Set up your token

Create a new token by running the next cell. Then copy the token and paste it as the string in the cell after that. Run these two cells together to make sure that the correct token is stored on your machine. You can copy your token and store it on as many machines as you like. If you think your token has been compromised, reset it by rerunning the next cell.

```python
client = caveclient.CAVEclient()
# Prints a link to a page where a new token is generated and tries to open it in the browser
client.auth.setup_token(make_new=True)
```

### Set or save your token

Copy the token from the page that just opened and paste it here:

```python
my_token = "your token goes here"
```

If you are running this on your local machine or on a server, you can optionally save the token on that machine so that you do not need to paste it again in future sessions.

```python
# Saving may fail in some hosted environments; that is okay,
# the token can also be passed to the client directly (see below)
client.auth.save_token(token=my_token, overwrite=True)
```

## Initialize CAVEclient with a datastack

Datasets in CAVE are organized as datastacks. A datastack combines an EM dataset, a segmentation, and a set of annotations. The datastack for the public release is `zheng_ca3`. When you instantiate your client with this datastack, it loads all the information needed to access the dataset.

```python
datastack_name = "zheng_ca3"
client = caveclient.CAVEclient(datastack_name)

# Alternatively, pass the token directly to the client:
# client = caveclient.CAVEclient(datastack_name, auth_token=my_token)
```
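Hardcoding the token in a notebook cell makes it easy to leak when the notebook is shared. One option, sketched here under the assumption that you set an environment variable yourself, is to read the token from the environment. The variable name `CAVE_TOKEN` and the helper `token_from_env` are hypothetical choices for this example, not part of CAVEclient.

```python
import os
from typing import Optional


def token_from_env(var_name: str = "CAVE_TOKEN") -> Optional[str]:
    """Return a CAVE token read from an environment variable, or None if unset.

    CAVE_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(var_name, "").strip()
    return token or None


# Usage with a live client (requires network access):
# client = caveclient.CAVEclient("zheng_ca3", auth_token=token_from_env())
```

When the helper returns `None`, `auth_token` is left unset and CAVEclient falls back to a token saved earlier with `save_token`.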

## Materialization versions

Data in CAVE is timestamped and periodically versioned: each (materialization) version corresponds to a specific timestamp, and selected versions are made publicly available. The materialization service, available under `client.materialize`, handles annotation queries against the dataset.

Currently, the following versions are publicly available:

```python
client.materialize.get_versions()
```

```
[1, 195, 332, 333]
```

And these are their associated timestamps (all timestamps are in UTC):

```python
for version in client.materialize.get_versions():
    print(f"Version {version}: {client.materialize.get_timestamp(version)}")
```

```
Version 1: 2024-08-16 03:02:50.468339+00:00
Version 195: 2025-02-26 10:10:01.468822+00:00
Version 332: 2025-07-19 13:10:01.305123+00:00
Version 333: 2025-07-20 13:10:01.304323+00:00
```
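Since every version maps to one timestamp, a natural follow-up question is which public version was the newest at a given moment, for example when reproducing an analysis from a known date. The sketch below answers it in plain Python using the timestamps printed above; `version_at` is a hypothetical helper for illustration, not part of CAVEclient.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def version_at(when: datetime, version_timestamps: Dict[int, datetime]) -> Optional[int]:
    """Return the newest version whose timestamp is at or before `when`.

    Returns None if `when` predates every version. All datetimes must be
    timezone-aware, since comparing naive and aware datetimes raises TypeError.
    """
    eligible = [v for v, ts in version_timestamps.items() if ts <= when]
    if not eligible:
        return None
    return max(eligible, key=lambda v: version_timestamps[v])


# Public versions and their UTC timestamps, copied from the output above.
# With a live client, the same mapping could be built as:
# {v: client.materialize.get_timestamp(v) for v in client.materialize.get_versions()}
public_versions = {
    1: datetime(2024, 8, 16, 3, 2, 50, 468339, tzinfo=timezone.utc),
    195: datetime(2025, 2, 26, 10, 10, 1, 468822, tzinfo=timezone.utc),
    332: datetime(2025, 7, 19, 13, 10, 1, 305123, tzinfo=timezone.utc),
    333: datetime(2025, 7, 20, 13, 10, 1, 304323, tzinfo=timezone.utc),
}

print(version_at(datetime(2025, 3, 1, tzinfo=timezone.utc), public_versions))  # 195
```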


By default, the client queries the latest materialization version. To access a specific version, pass `materialization_version` to a query. The annotation tables available in the dataset can be listed with `get_tables`:

```python
client.materialize.get_tables()
```

```
['synapses_ca3_v1', 'ca3_cell_type', 'ca3_cell_id', 'c3_nuclei_v1']
```
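Because the default version moves forward as new versions are released, pinning the version keeps an analysis reproducible. The sketch below wraps a table query so that the version is always explicit and returned alongside the result. `query_pinned` is a hypothetical helper; only `get_versions` and `query_table(..., materialization_version=...)` are CAVEclient calls, and treating the highest version number as the latest is an assumption that matches the versions listed above.

```python
from typing import Any, Optional, Tuple


def query_pinned(client: Any, table: str, version: Optional[int] = None) -> Tuple[Any, int]:
    """Query `table` at an explicit materialization version.

    Returns the query result together with the version that was used, so the
    version can be recorded next to any downstream analysis.
    """
    available = client.materialize.get_versions()
    if version is None:
        # Assume the highest version number is the latest, and make the choice explicit
        version = max(available)
    elif version not in available:
        raise ValueError(
            f"Version {version} is not available; choose one of {sorted(available)}"
        )
    result = client.materialize.query_table(table, materialization_version=version)
    return result, version


# Usage with a live client (requires network access):
# cell_types, used_version = query_pinned(client, "ca3_cell_type", version=333)
```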