# Download full recordings

author: laquitainesteeve@gmail.com

purpose: download all full recordings from Dandi Archive

Description:
* model: biophysical simulation
* duration: 34 min
* size: ~118 GiB (float32)
* layers: 6 (L1, L2/3, L4, L5, L6)
* noise: background noise fitted to the Marques-Smith in vivo recording
* network state: spontaneous

Execution time: ~3h34 for the 34-min, 384-channel recording
* note on writing speed: use a large chunk_size and set n_jobs=20, which maximizes speed while avoiding overhead [1] (the cells below were run with n_jobs=30).
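The memory figures that SpikeInterface prints before writing (`chunk_memory`, `total_memory`, `chunk_duration`) follow from simple arithmetic, which helps when choosing `chunk_size` and `n_jobs` for the available RAM. A minimal sketch, assuming each worker holds one chunk in memory and that the byte size is that of the source dtype; this reproduces the logged values below (1.14 GiB per chunk, 34.33 GiB total for the 384-channel, 40 kHz float32 recording). `write_memory_estimate` is a hypothetical helper, not a SpikeInterface function.

```python
def write_memory_estimate(n_channels, sampling_frequency, chunk_size, n_jobs, bytes_per_sample):
    """Hypothetical helper: estimate RAM used while writing a recording in chunks.

    Assumes each of the n_jobs workers holds one chunk of
    chunk_size samples x n_channels channels in memory at a time.
    """
    chunk_bytes = chunk_size * n_channels * bytes_per_sample
    return {
        "chunk_memory_gib": chunk_bytes / 1024**3,
        "total_memory_gib": n_jobs * chunk_bytes / 1024**3,
        "chunk_duration_s": chunk_size / sampling_frequency,
    }


# NPX spontaneous recording: 384 channels, 40 kHz, float32 (4 bytes per sample)
est = write_memory_estimate(384, 40_000, 800_000, 30, 4)
print({k: round(v, 2) for k, v in est.items()})
# chunk_memory_gib 1.14, total_memory_gib 34.33, chunk_duration_s 20.0
```

With `n_jobs=20` the same recording needs about 22.89 GiB in total, which is one way to trade speed for memory on smaller machines.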

Hardware: CPU

Tested on: 32 cores, 2TB storage, 188GB RAM Ubuntu machine




# Setup 

Activate the virtual environment created from envs/spikebias.yml, then register it as a Jupyter kernel named "dandi":

```bash
python -m ipykernel install --user --name dandi --display-name "dandi"
```
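Before running the cells, it can help to confirm that the selected kernel actually sees the packages this notebook relies on (the log below was produced with spikeinterface 0.101.2). A minimal sketch using only the standard library; the package list is an assumption based on the imports and on `stream_mode="remfile"`, which we assume is backed by the `remfile` package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# packages assumed to be needed by this notebook
print(installed_versions(["spikeinterface", "dandi", "remfile"]))
```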

In [1]:
%%time 

# import python packages
import os
import spikeinterface
import spikeinterface.extractors as se
from dandi.dandiapi import DandiAPIClient
print("spikeinterface", spikeinterface.__version__)

# set the project path
PROJ_PATH = "/home/steeve/steeve/epfl/code/spikebias"

# set the raw dataset path
RAW_DATASET = os.path.join(PROJ_PATH, "dataset/00_raw/")

spikeinterface 0.101.2
CPU times: user 1.02 s, sys: 1.89 s, total: 2.92 s
Wall time: 370 ms
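`PROJ_PATH` above is hard-coded to the author's machine. A minimal sketch of one way to make it portable, assuming a hypothetical environment variable `SPIKEBIAS_PROJ_PATH` that overrides the default; the folder layout `dataset/00_raw` is the one used above.

```python
import os
from pathlib import Path


def resolve_raw_dataset(default_proj_path, env_var="SPIKEBIAS_PROJ_PATH"):
    """Return <project>/dataset/00_raw and make sure it exists.

    The environment variable name is hypothetical; if it is unset,
    default_proj_path is used.
    """
    proj_path = os.environ.get(env_var, default_proj_path)
    raw = Path(proj_path) / "dataset" / "00_raw"
    raw.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    return str(raw)


# usage in this notebook: RAW_DATASET = resolve_raw_dataset(PROJ_PATH)
```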




## Custom functions

In [3]:
class DataLoader:
    """Data loader for DANDI datasets: streams an NWB asset from S3 and saves it locally.
    """
    def __init__(self, raw_dataset_path:str, dandiset_id:str, filepath:str, is_recording=True, is_sorting=True):
        self.raw_dataset_path = raw_dataset_path
        self.dandiset_id = dandiset_id
        self.filepath = filepath
        self.is_recording = is_recording
        self.is_sorting = is_sorting
        self.recording = None
        self.sorting = None
        self.s3_path = None

    def load_data(self):
        # get the asset's S3 URL from the draft version of the dandiset
        with DandiAPIClient() as client:
            asset = client.get_dandiset(self.dandiset_id, 'draft').get_asset_by_path(self.filepath)
            self.s3_path = asset.get_content_url(follow_redirects=1, strip_query=True)
        print("s3_path:", self.s3_path)

        # create extractors that stream the NWB file with remfile (traces are read lazily)
        if self.is_recording:
            self.recording = se.NwbRecordingExtractor(file_path=self.s3_path, stream_mode="remfile")
        if self.is_sorting:
            self.sorting = se.NwbSortingExtractor(file_path=self.s3_path, stream_mode="remfile")

        # report
        print('\nDownloaded recording:', self.recording)
        print('\nDownloaded sorting:', self.sorting)

    def save_data(self, recording_folder:str, sorting_folder:str, n_jobs=30, chunk_size=800000, dtype='float32', duration_secs=None):
        if self.is_recording:
            if duration_secs:
                # keep only the first `duration_secs` seconds (frame indices must be integers)
                end_frame = int(self.recording.get_sampling_frequency() * duration_secs)
                self.recording = self.recording.frame_slice(start_frame=0, end_frame=end_frame)
            # traces are streamed from S3 chunk by chunk and written to disk here
            self.recording.save(folder=recording_folder, n_jobs=n_jobs, verbose=True, progress_bar=True, overwrite=True, dtype=dtype, chunk_size=chunk_size)

        if self.is_sorting:
            if duration_secs:
                # use the sorting's own sampling rate so this also works when is_recording=False
                end_frame = int(self.sorting.get_sampling_frequency() * duration_secs)
                self.sorting = self.sorting.frame_slice(start_frame=0, end_frame=end_frame)
            self.sorting.save(folder=sorting_folder, progress_bar=True, overwrite=True)

        # report
        print('\nSaved recording:', self.recording)
        print('\nSaved sorting:', self.sorting)
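`frame_slice` works on integer frame indices, whereas `sampling_frequency * duration_secs` is a float (20000.0 x 900 = 18000000.0). A minimal sketch of a hypothetical helper that turns a duration into an integer end frame and clamps it to the recording length, so requesting more seconds than the file holds does not produce an out-of-range slice.

```python
def end_frame_for_duration(sampling_frequency, duration_secs, num_samples):
    """Hypothetical helper: integer end frame covering the first `duration_secs` seconds.

    Returns num_samples when duration_secs is None, and never exceeds num_samples.
    """
    if duration_secs is None:
        return num_samples
    end_frame = int(round(sampling_frequency * duration_secs))
    return min(end_frame, num_samples)


# NPX evoked: 20 kHz, 72,359,964 samples, keep the first 900 s
print(end_frame_for_duration(20_000.0, 900, 72_359_964))  # 18000000
```

Inside `save_data`, this could be called as `end_frame_for_duration(self.recording.get_sampling_frequency(), duration_secs, self.recording.get_num_samples())`.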

## NPX spont biophy

In [None]:
%%time

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-001-fitted/sub-001-fitted_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_npx_spont")
sorting_folder = os.path.join(RAW_DATASET, "sorting_npx_spont")

# download and save dataset (~3h35 wall time)
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/0c3/2c4/0c32c475-1251-485b-9934-83667b3ba4ba

 NwbRecordingExtractor: 384 channels - 40.0kHz - 1 segments - 82,319,958 samples 
                       2,058.00s (34.30 minutes) - float32 dtype - 117.76 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/0c3/2c4/0c32c475-1251-485b-9934-83667b3ba4ba

 NwbSortingExtractor: 1388 units - 1 segments - 40.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/0c3/2c4/0c32c475-1251-485b-9934-83667b3ba4ba
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=1.14 GiB - total_memory=34.33 GiB - chunk_duration=20.00s


write_binary_recording: 100%|██████████| 103/103 [3:33:22<00:00, 124.29s/it] 


CPU times: user 1.03 s, sys: 773 ms, total: 1.8 s
Wall time: 3h 34min 5s
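The progress-bar totals in these logs (103 here, 23 for the evoked slice, 43 for dense depth 1, and so on) match the number of samples divided by `chunk_size`, rounded up, since the last chunk may be partial. A minimal sketch of that arithmetic, which also recovers the logged 124.29 s/it from the 3:33:22 progress-bar time; both helpers are hypothetical.

```python
import math


def n_chunks(num_samples, chunk_size):
    """Number of chunks needed to cover num_samples; the last chunk may be partial."""
    return math.ceil(num_samples / chunk_size)


def seconds_per_chunk(wall_time_s, num_samples, chunk_size):
    """Average time spent per chunk."""
    return wall_time_s / n_chunks(num_samples, chunk_size)


# NPX spontaneous: 82,319,958 samples, chunk_size=800,000, progress bar 3:33:22
print(n_chunks(82_319_958, 800_000))  # 103
print(round(seconds_per_chunk(3 * 3600 + 33 * 60 + 22, 82_319_958, 800_000), 2))  # 124.29
```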


## NPX evoked biophy

- Execution time: ~26 min for the first 15 min of the 1-hour recording (DURATION_SECS = 900)

In [3]:
%%time

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-002-fitted/sub-002-fitted_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_npx_evoked")
sorting_folder = os.path.join(RAW_DATASET, "sorting_npx_evoked")
DURATION_SECS = 900 # 15 min

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32', duration_secs=DURATION_SECS) # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/9d6/6ed/9d66ed40-af31-43aa-b4ba-246d2206dcad

Downloaded recording: NwbRecordingExtractor: 384 channels - 20.0kHz - 1 segments - 72,359,964 samples 
                       3,618.00s (1.00 hours) - float32 dtype - 103.51 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/9d6/6ed/9d66ed40-af31-43aa-b4ba-246d2206dcad

Downloaded sorting: NwbSortingExtractor: 1836 units - 1 segments - 20.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/9d6/6ed/9d66ed40-af31-43aa-b4ba-246d2206dcad
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=1.14 GiB - total_memory=34.33 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 23/23 [25:01<00:00, 65.29s/it]   



Saved recording: FrameSliceRecording: 384 channels - 20.0kHz - 1 segments - 18,000,000 samples 
                     900.00s (15.00 minutes) - float32 dtype - 25.75 GiB

Saved sorting: FrameSliceSorting: 1836 units - 1 segments - 20.0kHz
CPU times: user 870 ms, sys: 379 ms, total: 1.25 s
Wall time: 25min 34s
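The size of the saved evoked slice follows from its sample count: 900 s at 20 kHz gives 18,000,000 samples, and each of the 384 float32 channels takes 4 bytes per sample. A minimal sketch of that arithmetic; `recording_size_gib` is a hypothetical helper.

```python
def recording_size_gib(num_samples, n_channels, bytes_per_sample):
    """Size of a (num_samples x n_channels) array in GiB."""
    return num_samples * n_channels * bytes_per_sample / 1024**3


fs = 20_000           # NPX evoked sampling rate (Hz)
duration_secs = 900   # DURATION_SECS above
num_samples = int(fs * duration_secs)
print(num_samples, round(recording_size_gib(num_samples, 384, 4), 2))  # 18000000 25.75
```

Applied to the full 72,359,964-sample file, the same formula gives the 103.51 GiB reported by the extractor.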


In [10]:
import spikeinterface as si

# sanity check: reload the saved spontaneous recording from disk
recording = si.load_extractor(os.path.join(RAW_DATASET, "recording_npx_spont"))
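After reloading a saved folder, its shape can be compared with the values logged when it was streamed, to catch an interrupted write. A minimal sketch that only relies on `get_num_channels()`, `get_sampling_frequency()` and `get_num_samples()`, which SpikeInterface recordings provide; `check_recording` itself is a hypothetical helper.

```python
def check_recording(recording, n_channels, sampling_frequency, num_samples):
    """Return a list of mismatches between a recording and the expected values (empty if OK)."""
    problems = []
    if recording.get_num_channels() != n_channels:
        problems.append(f"channels: {recording.get_num_channels()} != {n_channels}")
    if recording.get_sampling_frequency() != sampling_frequency:
        problems.append(f"sampling frequency: {recording.get_sampling_frequency()} != {sampling_frequency}")
    if recording.get_num_samples() != num_samples:
        problems.append(f"samples: {recording.get_num_samples()} != {num_samples}")
    return problems


# expected values for the NPX spontaneous recording, taken from its log above, e.g.:
# check_recording(si.load_extractor(path_to_saved_folder), 384, 40_000.0, 82_319_958)
```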


## Dense depth 1 biophy

In [None]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-003-fitted/sub-003-fitted_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_dense_probe1")
sorting_folder = os.path.join(RAW_DATASET, "sorting_dense_probe1")

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/dec/e65/dece6568-cee4-4ade-80bf-c1166a03fe2a

 NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 34,299,965 samples 
                       1,715.00s (28.58 minutes) - float32 dtype - 16.36 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/dec/e65/dece6568-cee4-4ade-80bf-c1166a03fe2a

 NwbSortingExtractor: 287 units - 1 segments - 20.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/dec/e65/dece6568-cee4-4ade-80bf-c1166a03fe2a
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=390.62 MiB - total_memory=11.44 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 43/43 [07:57<00:00, 11.10s/it]  


## Dense depth 2 biophy

In [None]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-004-fitted/sub-004-fitted_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_dense_probe2")
sorting_folder = os.path.join(RAW_DATASET, "sorting_dense_probe2")

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/eef/9e9/eef9e95c-fb5b-46d2-a24c-878d8170b5e0

 NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 23,519,976 samples 
                       1,176.00s (19.60 minutes) - float32 dtype - 11.22 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/eef/9e9/eef9e95c-fb5b-46d2-a24c-878d8170b5e0

 NwbSortingExtractor: 770 units - 1 segments - 20.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/eef/9e9/eef9e95c-fb5b-46d2-a24c-878d8170b5e0
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=390.62 MiB - total_memory=11.44 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 30/30 [04:30<00:00,  9.02s/it]  


CPU times: user 245 ms, sys: 241 ms, total: 486 ms
Wall time: 4min 37s


## Dense depth 3 biophy

In [None]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-005-fitted/sub-005-fitted_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_dense_probe3")
sorting_folder = os.path.join(RAW_DATASET, "sorting_dense_probe3")

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/ee2/816/ee2816de-d861-4b55-9cde-416a52e54049

 NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 35,279,964 samples 
                       1,764.00s (29.40 minutes) - float32 dtype - 16.82 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/ee2/816/ee2816de-d861-4b55-9cde-416a52e54049

 NwbSortingExtractor: 1123 units - 1 segments - 20.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/ee2/816/ee2816de-d861-4b55-9cde-416a52e54049
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=390.62 MiB - total_memory=11.44 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 45/45 [07:44<00:00, 10.32s/it]  


CPU times: user 313 ms, sys: 216 ms, total: 529 ms
Wall time: 7min 51s


## Marques-Smith

In [36]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-vivo-marques-smith/sub-vivo-marques-smith_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_marques_smith")
sorting_folder = None

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath, is_sorting=False)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/109/db6/109db6a7-500b-4e59-83ca-8422c27137cf

 NwbRecordingExtractor: 384 channels - 30.0kHz - 1 segments - 36,451,538 samples 
                       1,215.05s (20.25 minutes) - int16 dtype - 26.07 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/109/db6/109db6a7-500b-4e59-83ca-8422c27137cf

 None
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=585.94 MiB - total_memory=17.17 GiB - chunk_duration=26.67s


write_binary_recording: 100%|██████████| 46/46 [05:54<00:00,  7.70s/it]  


CPU times: user 99 ms, sys: 189 ms, total: 288 ms
Wall time: 5min 57s
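The in vivo recordings are stored as int16, but these cells save them with `dtype='float32'`, which doubles their footprint on disk. A minimal sketch of the size arithmetic for the Marques-Smith file (sample and channel counts from the log above); whether the downstream analysis requires float32 is not stated here, so saving with the source dtype (e.g. `dtype='int16'`) is only an option to consider.

```python
import numpy as np


def size_gib(num_samples, n_channels, dtype):
    """On-disk size in GiB of a (num_samples x n_channels) binary array of `dtype`."""
    return num_samples * n_channels * np.dtype(dtype).itemsize / 1024**3


n_samples, n_channels = 36_451_538, 384  # Marques-Smith, from the log above
for dtype in ("int16", "float32"):
    print(dtype, round(size_gib(n_samples, n_channels, dtype), 2))
# int16 26.07
# float32 52.14
```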


## Horvath depth 1

In [35]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-vivo-horvath-depth-1/sub-vivo-horvath-depth-1_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_horvath_probe1")
sorting_folder = None

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath, is_sorting=False)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/10a/c1a/10ac1a40-6918-4eea-b80c-d887bad92ae9

 NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 72,131,040 samples 
                       3,606.55s (1.00 hours) - int16 dtype - 17.20 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/10a/c1a/10ac1a40-6918-4eea-b80c-d887bad92ae9

 None
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=195.31 MiB - total_memory=5.72 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 91/91 [07:46<00:00,  5.13s/it]  


CPU times: user 118 ms, sys: 203 ms, total: 321 ms
Wall time: 7min 49s


## Horvath depth 2

In [37]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-vivo-horvath-depth-2/sub-vivo-horvath-depth-2_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_horvath_probe2")
sorting_folder = None

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath, is_sorting=False)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/4d2/875/4d2875ba-d83b-44d5-9036-74f42e19e8a0

 NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 73,773,360 samples 
                       3,688.67s (1.02 hours) - int16 dtype - 17.59 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/4d2/875/4d2875ba-d83b-44d5-9036-74f42e19e8a0

 None
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=195.31 MiB - total_memory=5.72 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 93/93 [07:55<00:00,  5.11s/it]  


CPU times: user 116 ms, sys: 173 ms, total: 290 ms
Wall time: 7min 58s


## Horvath depth 3

In [38]:
%%time 

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-vivo-horvath-depth-3/sub-vivo-horvath-depth-3_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_horvath_probe3")
sorting_folder = None

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath, is_sorting=False)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/14a/205/14a205ea-306f-47fd-97e8-284a6f00626b

 NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 72,061,920 samples 
                       3,603.10s (1.00 hours) - int16 dtype - 17.18 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/14a/205/14a205ea-306f-47fd-97e8-284a6f00626b

 None
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=195.31 MiB - total_memory=5.72 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 91/91 [07:38<00:00,  5.04s/it]  


CPU times: user 108 ms, sys: 167 ms, total: 275 ms
Wall time: 7min 40s
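The dandiset 001250 cells above differ only in the asset path, the output folder names, whether a sorting exists, and an optional duration. A minimal sketch of driving them from one table; the `DandiAsset` dataclass and `output_folders` helper are hypothetical, while the paths and folder names are the ones used above. The loop that would call `DataLoader` is left commented out because it streams ~300 GiB.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DandiAsset:
    """One row of the download table (hypothetical structure)."""
    filepath: str
    name: str
    has_sorting: bool = True
    duration_secs: Optional[float] = None


ASSETS_001250 = [
    DandiAsset("sub-001-fitted/sub-001-fitted_ecephys.nwb", "npx_spont"),
    DandiAsset("sub-002-fitted/sub-002-fitted_ecephys.nwb", "npx_evoked", duration_secs=900),
    DandiAsset("sub-003-fitted/sub-003-fitted_ecephys.nwb", "dense_probe1"),
    DandiAsset("sub-004-fitted/sub-004-fitted_ecephys.nwb", "dense_probe2"),
    DandiAsset("sub-005-fitted/sub-005-fitted_ecephys.nwb", "dense_probe3"),
    DandiAsset("sub-vivo-marques-smith/sub-vivo-marques-smith_ecephys.nwb", "marques_smith", has_sorting=False),
    DandiAsset("sub-vivo-horvath-depth-1/sub-vivo-horvath-depth-1_ecephys.nwb", "horvath_probe1", has_sorting=False),
    DandiAsset("sub-vivo-horvath-depth-2/sub-vivo-horvath-depth-2_ecephys.nwb", "horvath_probe2", has_sorting=False),
    DandiAsset("sub-vivo-horvath-depth-3/sub-vivo-horvath-depth-3_ecephys.nwb", "horvath_probe3", has_sorting=False),
]


def output_folders(raw_dataset, asset):
    """Recording and sorting folders, following the naming used in the cells above."""
    recording_folder = os.path.join(raw_dataset, f"recording_{asset.name}")
    sorting_folder = os.path.join(raw_dataset, f"sorting_{asset.name}") if asset.has_sorting else None
    return recording_folder, sorting_folder


# for asset in ASSETS_001250:
#     rec_dir, sort_dir = output_folders(RAW_DATASET, asset)
#     loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id="001250",
#                         filepath=asset.filepath, is_sorting=asset.has_sorting)
#     loader.load_data()
#     loader.save_data(recording_folder=rec_dir, sorting_folder=sort_dir, n_jobs=30,
#                      chunk_size=800000, dtype="float32", duration_secs=asset.duration_secs)
```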


## Buccino (MEArec, 250 neurons, Neuropixels)


In [None]:
%%time 

# set dataset parameters
dandiset_id = '000034'
filepath = 'sub-MEAREC-250neuron-Neuropixels/sub-MEAREC-250neuron-Neuropixels_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_buccino")
sorting_folder = os.path.join(RAW_DATASET, "sorting_buccino")

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/465/9c0/4659c033-b336-4b8b-b947-77434cebf494

 NwbRecordingExtractor: 384 channels - 32.0kHz - 1 segments - 19,200,000 samples 
                       600.00s (10.00 minutes) - float32 dtype - 27.47 GiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/465/9c0/4659c033-b336-4b8b-b947-77434cebf494

 NwbSortingExtractor: 250 units - 1 segments - 32.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/465/9c0/4659c033-b336-4b8b-b947-77434cebf494
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=1.14 GiB - total_memory=34.33 GiB - chunk_duration=25.00s


write_binary_recording: 100%|██████████| 24/24 [10:01:15<00:00, 1503.14s/it]   


CPU times: user 1.52 s, sys: 448 ms, total: 1.96 s
Wall time: 10h 1min 27s
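This file took about 10 h for 27.47 GiB, far longer than the larger dandiset 001250 recordings. Turning the logged sizes and wall times into an average throughput makes the comparison explicit. A minimal sketch; the wall times include streaming from S3, so these numbers reflect network conditions at run time as well as disk writes, and they say nothing on their own about the cause of the slowdown.

```python
def throughput_mib_s(size_gib, wall_time_s):
    """Average throughput in MiB/s (size from the extractor repr, time from %%time)."""
    return size_gib * 1024 / wall_time_s


runs = {
    # name: (size in GiB, wall time in s), copied from the logs in this notebook
    "npx_spont": (117.76, 3 * 3600 + 34 * 60 + 5),
    "dense_probe2": (11.22, 4 * 60 + 37),
    "buccino": (27.47, 10 * 3600 + 1 * 60 + 27),
}
for name, (size_gib, wall_s) in runs.items():
    print(f"{name}: {throughput_mib_s(size_gib, wall_s):.1f} MiB/s")
```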


## Single-cell isolated traces with Reyes probe 

- 40-second recording
- Execution time: ~4.5 min
- size: 390.58 MiB

In [4]:
%%time

# set dataset parameters
dandiset_id = '001250'
filepath = 'sub-biophy-isolated-traces-reyes/sub-biophy-isolated-traces-reyes_ses-006_ecephys.nwb'
recording_folder = os.path.join(RAW_DATASET, "recording_reyes_isolated_traces")
sorting_folder = os.path.join(RAW_DATASET, "sorting_reyes_isolated_traces")

# download and save dataset
data_loader = DataLoader(raw_dataset_path=RAW_DATASET, dandiset_id=dandiset_id, filepath=filepath)
data_loader.load_data() # Load the data
data_loader.save_data(recording_folder=recording_folder, sorting_folder=sorting_folder, n_jobs=30, chunk_size=800000, dtype='float32') # save

s3_path: https://dandiarchive.s3.amazonaws.com/blobs/2f3/edc/2f3edc58-b09e-4571-bb1c-1ffb2f768b57

Downloaded recording: NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 799,900 samples - 39.99s 
                       float32 dtype - 390.58 MiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/2f3/edc/2f3edc58-b09e-4571-bb1c-1ffb2f768b57

Downloaded sorting: NwbSortingExtractor: 1 units - 1 segments - 20.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/2f3/edc/2f3edc58-b09e-4571-bb1c-1ffb2f768b57
write_binary_recording 
n_jobs=30 - samples_per_chunk=800,000 - chunk_memory=390.62 MiB - total_memory=11.44 GiB - chunk_duration=40.00s


write_binary_recording: 100%|██████████| 1/1 [04:27<00:00, 267.91s/it]


Saved recording: NwbRecordingExtractor: 128 channels - 20.0kHz - 1 segments - 799,900 samples - 39.99s 
                       float32 dtype - 390.58 MiB
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/2f3/edc/2f3edc58-b09e-4571-bb1c-1ffb2f768b57

Saved sorting: NwbSortingExtractor: 1 units - 1 segments - 20.0kHz
  file_path: https://dandiarchive.s3.amazonaws.com/blobs/2f3/edc/2f3edc58-b09e-4571-bb1c-1ffb2f768b57
CPU times: user 129 ms, sys: 33.8 ms, total: 163 ms
Wall time: 4min 33s





# References

[1] SpikeInterface issue #3252, effect of n_jobs and chunk_size on writing speed: https://github.com/SpikeInterface/spikeinterface/issues/3252