Research datasets in Brain Imaging Data Structure (BIDS) format, hosted on GitHub + AWS S3 infrastructure.
This organization hosts BIDS-formatted datasets whose restrictive licenses prevent hosting on public repositories (OpenNeuro, Zenodo, etc.) but which remain freely available for research use.
Hosting criteria:
- ✅ BIDS-compliant format
- ✅ Freely available for academic/research use
- ✅ Restrictive license preventing hosting on public repositories (e.g., non-commercial, research-only)
| Dataset | ID | DOI | Size | Modality | Description |
|---|---|---|---|---|---|
| HBN-EEG NC | nm000103 | 10.5281/zenodo.17306881 | 270 GB | EEG | Healthy Brain Network EEG, Non-commercial |
| emg2qwerty | nm000104 | 10.5281/zenodo.17287904 | 149 GB | EMG | Typing task sEMG dataset |
| discrete_gestures | nm000105 | 10.5281/zenodo.17283594 | 14 GB | EMG | Hand gesture recognition |
| handwriting | nm000106 | 10.5281/zenodo.17283866 | 30 GB | EMG | Handwriting sEMG dataset |
| wrist | nm000107 | 10.5281/zenodo.17282508 | 1.9 GB | EMG | Wrist control sEMG dataset |
DataLad enables efficient access to large datasets stored across GitHub (metadata) and S3 (data files).
# Install DataLad (macOS)
brew install datalad
# Clone dataset (lightweight - only downloads metadata)
datalad clone https://github.com/nemarDatasets/nm000107.git
cd nm000107
# Download specific files
datalad get sub-01/emg/sub-01_task-wrist_emg.edf
# Download all data
datalad get .
# Remove data files (keep metadata)
datalad drop .
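To fetch recordings one subject at a time rather than the whole dataset, the file path can be built from a subject label and a task name. The sketch below assumes the naming pattern seen in the example above (no session level, `.edf` files under `emg/`); the helper `bids_emg_path` is hypothetical, and other datasets in this organization may be laid out differently.

```shell
#!/bin/sh
# Build the relative path of one EMG recording, following the pattern of the
# example above: sub-<label>/emg/sub-<label>_task-<task>_emg.edf
# Assumption: no session level and .edf files; check the dataset layout first.
bids_emg_path() {
    sub="$1"    # subject label, e.g. 01
    task="$2"   # task name, e.g. wrist
    printf 'sub-%s/emg/sub-%s_task-%s_emg.edf\n' "$sub" "$sub" "$task"
}

# Usage, inside a cloned dataset (needs DataLad and network access):
#   datalad get  "$(bids_emg_path 01 wrist)"
#   datalad drop "$(bids_emg_path 01 wrist)"
```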
# Clone repository (metadata only, no large files)
git clone https://github.com/nemarDatasets/nm000107.git
cd nm000107
# View S3 URLs for data files
cat .git/annex/objects/.../...
Large binary files (.edf, .bdf) are stored on S3 with public read access:
# List dataset files
aws s3 ls s3://nemar/nm000107/ --recursive --no-sign-request
# Download specific file
aws s3 cp s3://nemar/nm000107/path/to/file.edf . --no-sign-request
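When only one subject is needed, the download can be limited to that subject's prefix. The following is a minimal sketch that prints the command instead of running it, so it can be inspected first. The bucket name comes from the examples above; the `sub-<label>/` directory layout and the helper name `s3_subject_cmd` are assumptions.

```shell
#!/bin/sh
# Print (not run) an AWS CLI command that downloads one subject directory.
# BUCKET is taken from the examples above; the sub-<label>/ prefix is an
# assumption about the dataset layout.
BUCKET="nemar"

s3_subject_cmd() {
    dataset="$1"   # dataset ID, e.g. nm000107
    sub="$2"       # subject label, e.g. 01
    printf 'aws s3 cp s3://%s/%s/sub-%s/ ./%s/sub-%s/ --recursive --no-sign-request\n' \
        "$BUCKET" "$dataset" "$sub" "$dataset" "$sub"
}

# Usage: inspect the command, then run it.
#   s3_subject_cmd nm000107 01
#   eval "$(s3_subject_cmd nm000107 01)"
```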
Found incorrect metadata, missing files, or BIDS compliance issues?
- Go to the dataset repository (e.g., nm000107)
- Click Issues → New Issue
- Describe the problem with:
  - File path or subject ID
  - Expected vs actual behavior
  - BIDS validator output (if applicable)
For metadata corrections (JSON, TSV, README):
- Fork the dataset repository
- Clone your fork locally
- Make changes to metadata files
- Commit with a clear message, e.g., "fix: correct participant age in participants.tsv"
- Push to your fork
- Open Pull Request with description of changes
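Before committing an edited TSV file, a quick structural check can catch rows that lost or gained a tab. This is a minimal sketch, not part of the project's workflow and not a substitute for the BIDS validator; the helper name `tsv_ragged_rows` is hypothetical.

```shell
#!/bin/sh
# Print the line numbers of rows whose tab-separated field count differs
# from the header row. No output means every row matches the header.
tsv_ragged_rows() {
    awk -F '\t' 'NR == 1 { n = NF; next } NF != n { print NR }' "$1"
}

# Usage:
#   tsv_ragged_rows participants.tsv
```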
For data file issues:
- Open an issue rather than a pull request (data files are immutable git-annex content)
- Corrections will be released as new dataset versions
Datasets use semantic versioning (v1.0.0, v1.1.0, etc.):
- Patch (v1.0.1): Metadata fixes, documentation updates
- Minor (v1.1.0): New participants, additional sessions
- Major (v2.0.0): Breaking changes, restructuring
Each version gets:
- Git tag
- GitHub release
- Zenodo DOI (versioned)
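To judge whether an update may affect an existing analysis, two version tags can be compared component by component. The sketch below assumes tags follow the vMAJOR.MINOR.PATCH form described above; `bump_type` is a hypothetical helper, not something shipped with the datasets.

```shell
#!/bin/sh
# Classify the change between two tags of the form vMAJOR.MINOR.PATCH.
# Prints major, minor, patch, or none.
bump_type() {
    old="${1#v}"                      # strip the leading "v"
    new="${2#v}"
    old_major="${old%%.*}";      new_major="${new%%.*}"
    old_rest="${old#*.}";        new_rest="${new#*.}"
    old_minor="${old_rest%%.*}"; new_minor="${new_rest%%.*}"
    old_patch="${old_rest#*.}";  new_patch="${new_rest#*.}"

    if [ "$old_major" != "$new_major" ]; then
        echo major
    elif [ "$old_minor" != "$new_minor" ]; then
        echo minor
    elif [ "$old_patch" != "$new_patch" ]; then
        echo patch
    else
        echo none
    fi
}

# Usage, inside a cloned dataset:
#   git tag --list 'v*'        # see which versions exist
#   bump_type v1.0.0 v1.1.0    # classify the difference
#   git checkout v1.0.0        # pin the working tree to one version
```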
Each dataset has its own license, specified in dataset_description.json and the root LICENSE file. Common restrictions:
- ✅ Academic/research use
- ❌ Commercial use
- ❌ Redistribution without attribution
- ❌ Public repository hosting (e.g., OpenNeuro)
Always check the dataset's LICENSE file before use.
Infrastructure:
- GitHub: Metadata (JSON, TSV, README) + DataLad/git-annex pointers
- AWS S3: Binary data files (EEG and EMG recordings)
- Zenodo: DOI registration + archived releases
BIDS Validation:
- Datasets pass basic BIDS checks (required files, structure)
- Full validator compliance is work in progress
Data Access:
- S3 public read access (no AWS account needed)
- No rate limiting on downloads
- Free egress for research use
- Issues: Use repository-specific issue trackers
- General questions: Open a discussion in the .github repository
- New dataset submissions: Contact dataset maintainers
Hosted by NEMAR (NeuroElectroMagnetic Archive) infrastructure