# Download the public datasets

To get a feel for the structure of the data, we recommend first downloading the ALF data for a single repeated site session and exploring how the data is stored locally on disk. An example ALF folder can be downloaded from [here](https://ibl-brain-wide-map-public.s3.amazonaws.com/sample_data/mainenlab/Subjects/ZM_2241/2020-01-30/001/alf_ZM2241_2020-01-30_001.zip). Documentation explaining the data structure can be found [here](https://int-brain-lab.github.io/iblenv/notebooks_external/data_structure.html).
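Once the example folder is unzipped, a quick way to explore it is to group the files by ALF object. The sketch below is illustrative only: it assumes file names follow the `object.attribute.extension` pattern, optionally prefixed by a namespace such as `_ibl_`, and the helper name `group_alf_files` is hypothetical rather than part of any IBL package.

```python
from collections import defaultdict
from pathlib import Path


def group_alf_files(root):
    """Map each ALF object name to the sorted attributes found under root."""
    objects = defaultdict(set)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        parts = path.name.split(".")
        if len(parts) < 3:
            continue  # not an object.attribute.extension file name
        obj = parts[0]
        # Drop an optional leading namespace, e.g. "_ibl_wheel" -> "wheel"
        if obj.startswith("_") and obj.count("_") >= 2:
            obj = obj.split("_", 2)[2]
        objects[obj].add(parts[1])
    return {obj: sorted(attrs) for obj, attrs in objects.items()}


# Usage (the path is a placeholder for wherever the zip was extracted):
# for obj, attrs in group_alf_files("path/to/unzipped/alf").items():
#     print(obj, attrs)
```

Each key of the returned dictionary is an object (for example `wheel` or `spikes`) and each value lists the attributes stored for it, which gives a compact overview before reading the data structure documentation.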

In the following sections, we explain how to use the [ONE-api](https://int-brain-lab.github.io/ONE/) to search for and download datasets for any session released. Using the ONE-api is the recommended method to browse through and download available datasets.

## Installation
### Environment
To use IBL data you will need a Python environment with Python > 3.7. To create a new environment from scratch you can install [anaconda](https://www.anaconda.com/products/distribution#download-section) and follow the instructions below to create a new Python environment (more information can also be found [here](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)).

```
conda create --name ibl python=3.9
```
Make sure to always activate this environment before installing packages or working with the IBL data:
```
conda activate ibl
```

### Install packages

To use IBL data you will need to install the ONE-api package. We also recommend installing ibllib. These can be installed via pip.
```
pip install ONE-api
pip install ibllib
```
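After installing, it can be useful to confirm that the active environment meets the requirements above. The following is a minimal sketch, not an official check: the helper `check_environment` is hypothetical, and it assumes the packages are importable as `one` (as in the examples in this document) and `ibllib`.

```python
import importlib.util
import sys


def check_environment(modules=("one", "ibllib")):
    """Report whether Python is newer than 3.7 and each module can be found."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


print(check_environment())
```

A `False` entry usually means the `ibl` environment was not activated before running `pip install`.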

### Setting up credentials
Credentials can be set up in a Python terminal as follows:

In [None]:
from one.api import ONE
pw = 'international'
one = ONE(base_url='https://openalyx.internationalbrainlab.org', password=pw, silent=True)

## Explore and download data using the ONE-api

### Useful links
To get a good understanding of the ONE-api and the various methods available we recommend working through these tutorials.

* [ONE quickstart](https://int-brain-lab.github.io/iblenv/notebooks_external/one_quickstart.html)
* [Searching with ONE](https://int-brain-lab.github.io/ONE/notebooks/one_search/one_search.html)
* [Listing with ONE](https://int-brain-lab.github.io/ONE/notebooks/one_list/one_list.html)
* [Loading with ONE](https://int-brain-lab.github.io/ONE/notebooks/one_load/one_load.html)
* [ONE alyx rest API](https://int-brain-lab.github.io/ONE/notebooks/one_advanced/one_advanced.html) (advanced)

Quick-start examples are given below.

### Launch the ONE-api
Before searching for or downloading any data, you need to instantiate ONE:

In [None]:
from one.api import ONE
one = ONE(base_url='https://openalyx.internationalbrainlab.org')

### List all sessions available
Once ONE is instantiated, you can use its REST interface to list all publicly available sessions:

In [None]:
sessions = one.alyx.rest('sessions', 'list')

Each session is given a unique identifier (EID); this EID is what you will use to download data for a given session:

In [None]:
# Take the first session
example_sess = sessions[0]
# Each session has a unique experiment id
eid = example_sess['id']
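Because each session record behaves like a dictionary, the list can be summarized with plain Python. The sketch below counts sessions per lab using mock records so that it runs offline; it assumes each record has the `id` and `lab` keys used elsewhere in this document, and the record values and the helper `sessions_per_lab` are made up for illustration.

```python
from collections import Counter


def sessions_per_lab(sessions):
    """Count how many session records belong to each lab."""
    return Counter(s["lab"] for s in sessions)


# Mock records standing in for the output of one.alyx.rest('sessions', 'list')
mock_sessions = [
    {"id": "eid-a", "lab": "lab_one"},
    {"id": "eid-b", "lab": "lab_two"},
    {"id": "eid-c", "lab": "lab_one"},
]

for lab, count in sessions_per_lab(mock_sessions).most_common():
    print(f"{lab}: {count} session(s)")
```

With a live connection, passing the real `sessions` list to the same helper should work unchanged, provided the records expose a `lab` key.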

### Find a session that has a dataset of interest
Not all sessions will have all the datasets available. As such, it may be important for you to filter and search for only sessions with particular datasets of interest. The detailed list of datasets can be found in this [document](https://docs.google.com/document/d/1OqIqqakPakHXRAwceYLwFY9gOrm8_P62XIfCTnHwstg/edit#).

In the example below, we want to find all sessions that have `spikes.times` data:

In [None]:
# Find sessions that have spikes.times datasets
sessions_with_spikes = one.alyx.rest('sessions', 'list', dataset_types='spikes.times')
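If you need sessions that contain two different datasets, one simple option is to run one query per dataset and keep the sessions whose EID appears in both results. This is only a sketch of that idea: the helper `sessions_in_both` is hypothetical, and the mock records stand in for two REST query results.

```python
def sessions_in_both(first, second):
    """Keep the records of `first` whose 'id' also appears in `second`."""
    second_ids = {s["id"] for s in second}
    return [s for s in first if s["id"] in second_ids]


# Mock results of two separate queries, one per dataset of interest
with_dataset_a = [{"id": "eid-a"}, {"id": "eid-b"}, {"id": "eid-c"}]
with_dataset_b = [{"id": "eid-c"}, {"id": "eid-a"}, {"id": "eid-d"}]

common = sessions_in_both(with_dataset_a, with_dataset_b)
print([s["id"] for s in common])
```

The order of the first list is preserved, so the result can be indexed in the same way as the original query output.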

### Find data associated with a release or publication
Datasets are often associated with a publication, and are tagged as such to facilitate reproducibility of analysis. You can list all tags and their associated publications like this:

In [None]:
# List and print all tags in the public database
tags = {t['name']: t['description'] for t in one.alyx.rest('tags', 'list') if t['public']}
for key, value in tags.items():
    print(f"{key}\n{value}\n")

If you are only interested in data with a specific tag, the cleanest approach is to follow [these instructions](https://int-brain-lab.github.io/ONE/FAQ.html#how-do-i-download-the-datasets-cache-for-a-specific-ibl-paper-release) to work with a tag-specific cache table.

You can also use the tag as a filter when browsing the full public database. The example below lists the sessions tagged for the Reproducible Ephys paper:

In [None]:
# Find sessions that have data and are tagged for the repeated site paper
sessions_rep_site = one.alyx.rest('sessions', 'list', dataset_types='spikes.times', tag='2022_Q2_IBL_et_al_RepeatedSite')

### Downloading data using the ONE-api
Once sessions of interest are identified with the unique identifier (EID), we can download all files in the **alf** collection:

In [None]:
# Download all data in alf collection
files = one.load_collection(eid, 'alf', download_only=True)

# Show where files have been downloaded to
print(f'Files downloaded to {files[0].parent}')
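To see how much data was downloaded and where it went, the returned paths can be summarized per folder. This is a minimal sketch that assumes `files` is a list of local file paths, as the `files[0].parent` call above suggests; the helper `summarize_files` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def summarize_files(files):
    """Return {folder: (file count, total size in bytes)} for a list of paths."""
    summary = defaultdict(lambda: [0, 0])
    for path in map(Path, files):
        entry = summary[str(path.parent)]
        entry[0] += 1
        entry[1] += path.stat().st_size
    return {folder: tuple(counts) for folder, counts in summary.items()}


# Usage with the list returned by one.load_collection(..., download_only=True):
# for folder, (count, size) in summarize_files(files).items():
#     print(f'{folder}: {count} files, {size / 1e6:.1f} MB')
```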

To download the spike sorting data we need to find out which probe label (`probeXX`) was used for this session. This can be done by finding the probe insertion associated with this session:

In [None]:
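# Take the first probe insertion recorded for this session
insertion = one.alyx.rest('insertions', 'list', session=eid)[0]
# The insertion name is the probe label, e.g. 'probe00'
probe_label = insertion['name']
# Download the spike sorting output for that probe
files = one.load_collection(eid, f'alf/{probe_label}/pykilosort', download_only=True)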

# Show where files have been downloaded to
print(f'Files downloaded to {files[0].parent}')

To load the data, we can use loading methods such as the following:

In [None]:
# Load in all trials datasets
trials = one.load_object(eid, 'trials', collection='alf')

# Load in a single wheel dataset
wheel_times = one.load_dataset(eid, '_ibl_wheel.timestamps.npy')
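A loaded object such as `trials` can be inspected by listing its attributes and their sizes. The sketch below assumes the object is dict-like with array-like values; the helper `describe_object` is hypothetical, and the mock attribute names and values are illustrative only so that the example runs without a download.

```python
def describe_object(obj):
    """Return {attribute: shape} for a dict-like object of array-like values."""
    description = {}
    for key, value in obj.items():
        # Arrays expose .shape; fall back to len() for plain sequences
        shape = getattr(value, "shape", None)
        description[key] = tuple(shape) if shape is not None else (len(value),)
    return description


# Mock stand-in for a loaded object
mock_trials = {"choice": [1, -1, 1], "feedback_times": [0.5, 1.2, 2.0]}
print(describe_object(mock_trials))
```

Attributes of the same object are expected to share their first dimension, so mismatched sizes in the output are a quick hint that something went wrong during loading.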

### Loading different objects
Examples for loading different objects can be found in [these tutorials](https://int-brain-lab.github.io/iblenv/loading_examples.html).

### Advanced examples
#### Example 1: Searching for sessions from a specific lab
Let's imagine you are interested in the data from a given lab that was part of the Reproducible Ephys data release.
If you only want to use data associated with that lab, you could simply query the whole dataset as shown above and filter `sessions_rep_site` on the value of the `lab` key, for example:

In [None]:
lab_name = 'mrsicflogellab'
sessions_lab = [item for item in sessions_rep_site if item['lab'] == lab_name]

However, if you only want to query the data for a given lab, it may be more efficient to first list all available labs, pick a lab name from that list, and then query only the sessions for that lab.

To get this list, use [one.alyx.rest](https://openalyx.internationalbrainlab.org/docs/#labs-list):

In [None]:
# List of labs (and all metadata information associated)
labs = one.alyx.rest('labs', 'list',
                     django='session__data_dataset_session_related__tags__name,2022_Q2_IBL_et_al_RepeatedSite')
# Note the change in the django filter compared to searching over 'sessions'

# Example lab name
lab_name = labs[0]['name']  # e.g. 'mrsicflogellab'

# Search for repeated site sessions with a specific lab name
sessions_lab = one.alyx.rest('sessions', 'list', dataset_types='spikes.times', lab=lab_name,
                             tag='2022_Q2_IBL_et_al_RepeatedSite')