1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -43,6 +43,7 @@
"sphinx_gallery.gen_gallery",
"sphinx_sitemap",
"sphinx_copybutton",
"sphinx.ext.graphviz",
]

templates_path = ["_templates"]
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -54,6 +54,7 @@ The archive is currently still in :bdg-danger:`beta testing` mode, so be kind.
:hidden:

Installing <install/install>
User Guide <user_guide>
API <api/api>
Dataset Catalog <dataset_summary>
Examples <generated/auto_examples/index>
202 changes: 202 additions & 0 deletions docs/source/user_guide.rst
@@ -0,0 +1,202 @@
.. _user_guide:

User Guide
==========

This guide gives an overview of the ``eegdash`` library. It introduces the lower-level ``EEGDash`` metadata client and then focuses on the core data access object, ``EEGDashDataset``. You will learn how to use these objects to find, access, and manage EEG data for your research and analysis tasks.

The EEGDash Object
------------------

While :class:`~eegdash.EEGDashDataset` is the main tool for loading data for machine learning, the :class:`~eegdash.EEGDash` object provides a lower-level interface for directly interacting with the metadata database. It is useful for exploring the available data, performing complex queries, or managing metadata records.

Initializing ``EEGDash``
~~~~~~~~~~~~~~~~~~~~~~~~

You can create a client to connect to the public database like this:

.. code-block:: python

    from eegdash import EEGDash

    # Connect to the public database
    eegdash = EEGDash()

Finding Records
~~~~~~~~~~~~~~~

The ``find()`` method allows you to query the database for records matching specific criteria. You can pass keyword arguments for simple filters or a full MongoDB query dictionary for more advanced searches.

.. code-block:: python

    # Find records for a specific dataset and subject
    records = eegdash.find(dataset="ds002718", subject="012")
    print(f"Found {len(records)} records.")

    # You can also use more complex queries
    query = {"dataset": "ds002718", "subject": {"$in": ["012", "013"]}}
    records_advanced = eegdash.find(query)
    print(f"Found {len(records_advanced)} records with advanced query.")
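
To make the query semantics concrete, here is a minimal pure-Python sketch of how an equality filter and an ``$in`` filter select records. The ``matches`` helper and the sample records are hypothetical. They mimic only a small part of MongoDB's matching rules and are not part of ``eegdash``.

```python
# Hypothetical sample records; real database records carry more fields.
records = [
    {"dataset": "ds002718", "subject": "012", "task": "RestingState"},
    {"dataset": "ds002718", "subject": "013", "task": "RestingState"},
    {"dataset": "ds002718", "subject": "014", "task": "RestingState"},
]


def matches(record, query):
    """Return True if the record satisfies every condition in the query.

    Only plain equality and the ``$in`` operator are supported here.
    """
    for field, condition in query.items():
        value = record.get(field)
        if isinstance(condition, dict) and "$in" in condition:
            # ``$in`` matches when the field value is one of the listed values.
            if value not in condition["$in"]:
                return False
        elif value != condition:
            return False
    return True


query = {"dataset": "ds002718", "subject": {"$in": ["012", "013"]}}
selected = [r for r in records if matches(r, query)]
print([r["subject"] for r in selected])  # ['012', '013']
```

All conditions in the dictionary must hold at once, which is why subject ``014`` is excluded even though its dataset matches.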

``EEGDash`` vs. ``EEGDashDataset``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's important to understand the distinction between these two objects:

- **EEGDash**: Use this for querying and managing metadata. It returns a list of dictionaries, where each dictionary is a record from the database.
- **EEGDashDataset**: Use this when you need to load EEG data for analysis or machine learning. It returns a PyTorch-compatible dataset object where each item can load the actual EEG signal.

In general, you will use ``EEGDashDataset`` for most of your data loading needs.
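
Because ``EEGDash`` returns plain dictionaries, ordinary Python tools are enough to summarize a query result. The sketch below counts records per subject. The records are hypothetical stand-ins, and the field names ``dataset``, ``subject``, and ``task`` are assumed from the query examples above.

```python
from collections import defaultdict

# Hypothetical records shaped like the dictionaries described above.
records = [
    {"dataset": "ds002718", "subject": "012", "task": "RestingState"},
    {"dataset": "ds002718", "subject": "012", "task": "RestingState"},
    {"dataset": "ds002718", "subject": "013", "task": "RestingState"},
]

# Count how many records each subject contributes.
records_per_subject = defaultdict(int)
for record in records:
    records_per_subject[record["subject"]] += 1

for subject, count in sorted(records_per_subject.items()):
    print(f"subject {subject}: {count} record(s)")
```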

The EEGDashDataset Object
-------------------------

The :class:`~eegdash.EEGDashDataset` is the primary entry point for working with EEG recordings in ``eegdash``. It acts as a high-level interface that allows you to query a metadata database and load corresponding EEG data, either from a remote source or from a local cache.

Initializing ``EEGDashDataset``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get started, you need to create an instance of ``EEGDashDataset``. The two most important parameters are ``cache_dir`` and ``dataset``.

- ``cache_dir``: This is the local directory where ``eegdash`` will store downloaded data.
- ``dataset``: The identifier of the dataset you want to work with (e.g., ``"ds002718"``).

Here's a basic example of how to initialize the dataset:

.. code-block:: python

    from eegdash import EEGDashDataset

    # Initialize the dataset for ds002718
    dataset = EEGDashDataset(
        cache_dir="./eeg_data",
        dataset="ds002718",
    )

    print(f"Found {len(dataset)} recordings in the dataset.")

This will create a dataset object containing all recordings from ``ds002718``. The data files will be downloaded to the ``./eeg_data/ds002718/`` directory when accessed.
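
The cache layout described above can be sketched with ``pathlib``. This only computes the expected directory from ``cache_dir`` and the dataset identifier. It does not call ``eegdash``, and the variable names are illustrative.

```python
from pathlib import Path

cache_dir = Path("./eeg_data")
dataset_id = "ds002718"

# Expected location of the dataset inside the cache, per the layout above.
dataset_dir = cache_dir / dataset_id
print(dataset_dir.as_posix())  # eeg_data/ds002718

# The directory only exists once recordings have been downloaded.
print(f"Already cached: {dataset_dir.is_dir()}")
```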

Querying for Specific Data
--------------------------

``EEGDashDataset`` accepts filter arguments that restrict the dataset to a subset of recordings. You can filter by task, subject, session, or run.

Filtering by Task
~~~~~~~~~~~~~~~~~

You can easily select recordings associated with a specific experimental task. For example, to get all resting-state recordings:

.. code-block:: python

    # Filter by a single task
    resting_state_dataset = EEGDashDataset(
        cache_dir="./eeg_data",
        dataset="ds002718",
        task="RestingState"
    )

    print(f"Found {len(resting_state_dataset)} resting-state recordings.")

Filtering by Subject
~~~~~~~~~~~~~~~~~~~~

You can also filter the data to get recordings from one or more subjects.

.. code-block:: python

    # Filter by a single subject
    subject_dataset = EEGDashDataset(
        cache_dir="./eeg_data",
        dataset="ds002718",
        subject="012"
    )

    print(f"Found {len(subject_dataset)} recordings for subject 012.")

    # Filter by a list of subjects
    multi_subject_dataset = EEGDashDataset(
        cache_dir="./eeg_data",
        dataset="ds002718",
        subject=["012", "013", "014"]
    )

    print(f"Found {len(multi_subject_dataset)} recordings for subjects 012, 013, and 014.")


Combining Filters
~~~~~~~~~~~~~~~~~

You can combine multiple filters to create more specific queries. For instance, to get the resting-state recordings for a specific set of subjects:

.. code-block:: python

    # Combine subject and task filters
    combined_filter_dataset = EEGDashDataset(
        cache_dir="./eeg_data",
        dataset="ds002718",
        subject=["012", "013"],
        task="RestingState"
    )

    print(f"Found {len(combined_filter_dataset)} recordings matching the criteria.")

Advanced Querying with MongoDB Syntax
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For more complex queries, you can pass a MongoDB-style query dictionary directly using the ``query`` parameter. This allows for advanced filtering, such as using operators like ``$in``.

.. code-block:: python

    # Use a MongoDB-style query
    query = {
        "dataset": "ds002718",
        "subject": {"$in": ["012", "013"]},
        "task": "RestingState"
    }
    advanced_dataset = EEGDashDataset(cache_dir="./eeg_data", query=query)

    print(f"Found {len(advanced_dataset)} recordings using an advanced query.")
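
The keyword filters and the query dictionary describe the same selection. The sketch below shows one plausible way to translate one form into the other, with list values becoming ``$in`` conditions. The ``build_query`` helper is hypothetical; how ``eegdash`` performs this translation internally is not shown in this guide.

```python
def build_query(**filters):
    """Turn keyword filters into a MongoDB-style query dictionary.

    List or tuple values become ``$in`` conditions; None values are skipped.
    """
    query = {}
    for field, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            query[field] = {"$in": list(value)}
        else:
            query[field] = value
    return query


query = build_query(
    dataset="ds002718",
    subject=["012", "013"],
    task="RestingState",
)
print(query)
```

The resulting dictionary has the same shape as the hand-written query above, so either form can express this selection.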


Working with Local Data (Offline Mode)
--------------------------------------

``eegdash`` also supports working with local data that you have already downloaded or manage separately. By setting ``download=False``, you can instruct ``EEGDashDataset`` to use local BIDS-compliant data instead of accessing the database or remote storage.

To use this feature, your data must be organized in a BIDS-like structure within your ``cache_dir``. For example, if your ``cache_dir`` is ``./eeg_data`` and your dataset is ``ds002718``, the files should be located at ``./eeg_data/ds002718/``.

Here is how to use ``EEGDashDataset`` in offline mode:

.. code-block:: python

    # Initialize in offline mode
    local_dataset = EEGDashDataset(
        cache_dir="./eeg_data",
        dataset="ds002718",
        download=False
    )

    print(f"Found {len(local_dataset)} local recordings.")

When ``download=False``, ``eegdash`` will scan the specified directory for EEG files and construct the dataset from the local file system. This is useful for environments without internet access or when you want to work with your own curated datasets.
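
Before switching to offline mode, it can help to check what a scan of the local directory would find. The sketch below builds a small BIDS-style tree in a temporary directory and lists its EEG files. The ``find_local_recordings`` helper, the glob pattern, and the list of extensions are assumptions for illustration; the files ``eegdash`` itself recognizes may differ.

```python
import tempfile
from pathlib import Path


def find_local_recordings(dataset_dir, extensions=(".set", ".edf", ".bdf", ".vhdr")):
    """Return EEG files under BIDS-style ``sub-*/.../eeg/`` folders, sorted."""
    dataset_dir = Path(dataset_dir)
    return sorted(
        path
        for path in dataset_dir.glob("sub-*/**/eeg/*_eeg.*")
        if path.suffix in extensions
    )


with tempfile.TemporaryDirectory() as cache_dir:
    # Build a tiny BIDS-style layout: <cache_dir>/ds002718/sub-012/eeg/
    eeg_dir = Path(cache_dir) / "ds002718" / "sub-012" / "eeg"
    eeg_dir.mkdir(parents=True)
    (eeg_dir / "sub-012_task-RestingState_eeg.set").touch()
    (eeg_dir / "sub-012_task-RestingState_events.tsv").touch()

    found = find_local_recordings(Path(cache_dir) / "ds002718")
    found_names = [path.name for path in found]

print(found_names)  # ['sub-012_task-RestingState_eeg.set']
```

The events file is skipped because only names ending in ``_eeg`` with a known EEG extension are kept.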

Accessing Data from the Dataset
-------------------------------

Once you have your ``EEGDashDataset`` object, you can access individual recordings as if it were a list. Each item in the dataset is an ``EEGDashBaseDataset`` object, which contains the metadata and methods to load the actual EEG data.

.. code-block:: python

    if len(dataset) > 0:
        # Get the first recording
        recording = dataset[0]

        # Load the EEG data as a raw MNE object
        raw = recording.load()

        print(f"Loaded recording for subject: {recording.description['subject']}")
        print(f"Sampling frequency: {raw.info['sfreq']} Hz")
        print(f"Number of channels: {len(raw.ch_names)}")

The same access pattern works whether the recordings come from remote storage or from local data, so you can integrate ``eegdash`` into your analysis pipelines in either mode.