genomebundle

genomebundle bundles metadata and files for Earth BioGenome Project (EBP) genome assemblies into a single, reproducible package.

The core idea is simple: when you use a genome assembly in your research, you should be able to document exactly what you downloaded, when, and from where — with checksums. genomebundle does this by aggregating data from three sources into a machine-readable manifest.json:

GoaT (Genomes on a Tree) — taxonomy and cross-references
NCBI Datasets — assembly statistics and FTP file URLs
BlobToolKit — BUSCO completeness results (assembly quality metrics)

This makes it easier to cite the data precisely and to keep pipelines reproducible over time.
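As an illustration of the idea (recording what was downloaded, when, from where, and with which checksum), here is a minimal sketch using only the Python standard library. It is not genomebundle's actual implementation: the function names and the field names (`filename`, `source_url`, `retrieved_at`, `size_bytes`, `sha256`, `files`) are assumptions made for this example and may not match the real manifest.json layout.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte FASTA files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_entry(path, source_url):
    """Describe one downloaded file. All field names here are hypothetical."""
    path = Path(path)
    return {
        "filename": path.name,
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of(path),
    }


def write_manifest(bundle_dir, accession, entries):
    """Write a manifest.json (assumed layout) into bundle_dir and return its path."""
    manifest = {"accession": accession, "files": list(entries)}
    out = Path(bundle_dir) / "manifest.json"
    out.write_text(json.dumps(manifest, indent=2))
    return out
```

Storing the retrieval time and source URL next to the checksum is what lets a later reader confirm that they hold the same bytes that were originally used.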

Installation

pip install genomebundle

Basic CLI usage

# Download FASTA and GFF
genomebundle fetch GCF_040938575.1 --files fasta,gff

# Download all associated files
genomebundle fetch GCF_040938575.1 --files all

# Build manifest only (no download)
genomebundle fetch GCF_040938575.1 --no-download

# Verify checksums of a downloaded bundle
genomebundle verify ./GCF_040938575.1/

# Print manifest of an existing bundle
genomebundle show ./GCF_040938575.1/

Python API

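from genomebundle import fetch_assembly, fetch_assembly_report, fetch_busco

# Query each source separately for the same assembly accession
goat = fetch_assembly("GCF_040938575.1")         # GoaT: taxonomy and cross-references
ncbi = fetch_assembly_report("GCF_040938575.1")  # NCBI Datasets: assembly statistics and file URLs
btk = fetch_busco("GCF_040938575.1")             # BlobToolKit: BUSCO completeness results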

Output

Each bundle contains:

manifest.json — machine-readable, includes SHA256 checksums and source URLs
README.txt — human-readable summary
downloaded files (optional)
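Because the manifest records a SHA256 checksum for each file, a bundle can in principle be re-checked without the tool itself. The sketch below shows one way to do that with the standard library. It assumes a hypothetical manifest layout (a top-level `files` list whose entries have `filename` and `sha256` keys); the real manifest.json may be structured differently, and `genomebundle verify` remains the supported route.

```python
import hashlib
import json
from pathlib import Path


def verify_bundle(bundle_dir, chunk_size=1 << 20):
    """Return the names of files that are missing or whose SHA256 differs.

    Assumes manifest.json has the form {"files": [{"filename": ..., "sha256": ...}]},
    which is a hypothetical layout chosen for this example.
    """
    bundle_dir = Path(bundle_dir)
    manifest = json.loads((bundle_dir / "manifest.json").read_text())
    failures = []
    for entry in manifest["files"]:
        target = bundle_dir / entry["filename"]
        if not target.is_file():
            failures.append(entry["filename"])  # listed in the manifest but absent
            continue
        digest = hashlib.sha256()
        with open(target, "rb") as handle:
            # Read in chunks so large assemblies are never loaded fully into memory.
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
        if digest.hexdigest() != entry["sha256"]:
            failures.append(entry["filename"])  # content changed since download
    return failures
```

An empty list means every file listed in the manifest is present and unchanged.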

References

Challis et al. (2023). GoaT: Genomes on a Tree. Wellcome Open Research. https://doi.org/10.12688/wellcomeopenres.18658.1
Byrd et al. (2024). Best practices for genetic and genomic data archiving. Nature Ecology & Evolution. https://doi.org/10.1038/s41559-024-02423-7
Dainat et al. (2025). Guidelines for gene and genome assembly nomenclature. Genetics. https://doi.org/10.1093/genetics/iyaf006

License

MIT
