genomebundle bundles metadata and files for Earth BioGenome Project (EBP) genome assemblies into a single, reproducible package.
The core idea is simple: when you use a genome assembly in your research, you should be able to document exactly what you downloaded, when, and from where — with checksums. genomebundle does this by aggregating data from three sources into a machine-readable manifest.json:
- GoaT (Genomes on a Tree) — taxonomy and cross-references
- NCBI Datasets — assembly statistics and FTP file URLs
- BlobToolKit — BUSCO completeness results (assembly quality metrics)
This makes it easier to cite the data precisely and to keep pipelines reproducible over time.
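To make the "what, when, and from where, with checksums" idea concrete, here is a minimal sketch of a provenance record for one downloaded file. It is not genomebundle's actual implementation: the function names `sha256_of` and `manifest_entry` and every field name in the returned dictionary are hypothetical, chosen only for illustration.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large FASTA files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(path: Path, source_url: str) -> dict:
    """Describe one file: what it is, where it came from, and when it was recorded.

    ASSUMPTION: these keys are illustrative and may not match the real manifest.json.
    """
    return {
        "filename": path.name,
        "source_url": source_url,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),  # UTC timestamp
        "sha256": sha256_of(path),
        "size_bytes": path.stat().st_size,
    }
```

A record like this can be serialized with `json.dumps` and stored next to the file, so a later reader can confirm that the bytes on disk are the ones that were originally fetched.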
pip install genomebundle

# Download FASTA and GFF
genomebundle fetch GCF_040938575.1 --files fasta,gff
# Download all associated files
genomebundle fetch GCF_040938575.1 --files all
# Build manifest only (no download)
genomebundle fetch GCF_040938575.1 --no-download
# Verify checksums of a downloaded bundle
genomebundle verify ./GCF_040938575.1/
# Print manifest of an existing bundle
genomebundle show ./GCF_040938575.1/

from genomebundle import fetch_assembly, fetch_assembly_report, fetch_busco
goat = fetch_assembly("GCF_040938575.1")
ncbi = fetch_assembly_report("GCF_040938575.1")
btk = fetch_busco("GCF_040938575.1")

Each bundle contains:
- manifest.json: machine-readable, includes SHA256 checksums and source URLs
- README.txt: human-readable summary
- downloaded files (optional)
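Because manifest.json stores SHA256 checksums, a bundle can also be checked independently of the CLI. The sketch below shows one way to do that. It assumes a hypothetical manifest layout in which a top-level `"files"` key holds a list of records with `"filename"` and `"sha256"` fields; the real schema may differ, and `check_bundle` is an illustrative name, not part of the genomebundle API.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(1 << 20):
            digest.update(chunk)
    return digest.hexdigest()


def check_bundle(bundle_dir: str) -> dict[str, str]:
    """Return a status per file listed in manifest.json: 'ok', 'missing', or 'mismatch'."""
    root = Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    report = {}
    # ASSUMPTION: "files" is a list of {"filename": ..., "sha256": ...} records.
    for entry in manifest.get("files", []):
        name = entry["filename"]
        target = root / name
        if not target.is_file():
            report[name] = "missing"
        elif file_sha256(target) != entry["sha256"]:
            report[name] = "mismatch"
        else:
            report[name] = "ok"
    return report
```

Reporting a status per file, rather than stopping at the first failure, makes it easier to see at a glance whether a bundle is merely incomplete (for example, built with --no-download) or actually corrupted.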
- Challis et al. (2023). GoaT: Genomes on a Tree. Wellcome Open Research. https://doi.org/10.12688/wellcomeopenres.18658.1
- Byrd et al. (2024). Best practices for genetic and genomic data archiving. Nature Ecology & Evolution. https://doi.org/10.1038/s41559-024-02423-7
- Dainat et al. (2025). Guidelines for gene and genome assembly nomenclature. Genetics. https://doi.org/10.1093/genetics/iyaf006
License: MIT