gbell27/genomebundle

genomebundle

genomebundle bundles metadata and files for Earth BioGenome Project (EBP) genome assemblies into a single, reproducible package.

The core idea is simple: when you use a genome assembly in your research, you should be able to document exactly what you downloaded, when, and from where, with checksums to prove it. genomebundle does this by aggregating data from three sources into a single machine-readable manifest.json:

  • GoaT (Genomes on a Tree) — taxonomy and cross-references
  • NCBI Datasets — assembly statistics and FTP file URLs
  • BlobToolKit — BUSCO completeness results (assembly quality metrics)

This makes it easier to cite the data precisely and to keep pipelines reproducible over time.

Installation

pip install genomebundle

Basic CLI usage

# Download FASTA and GFF
genomebundle fetch GCF_040938575.1 --files fasta,gff

# Download all associated files
genomebundle fetch GCF_040938575.1 --files all

# Build manifest only (no download)
genomebundle fetch GCF_040938575.1 --no-download

# Verify checksums of a downloaded bundle
genomebundle verify ./GCF_040938575.1/

# Print manifest of an existing bundle
genomebundle show ./GCF_040938575.1/
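
As a rough illustration of what checksum verification involves, the sketch below recomputes SHA256 digests and compares them with the values recorded in manifest.json. It is not genomebundle's implementation: the manifest layout (a "files" list with "path" and "sha256" keys) and the function names sha256_of and verify_bundle are assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bundle(bundle_dir: str) -> dict[str, bool]:
    """Check every file listed in manifest.json against its recorded digest.

    Assumed (hypothetical) manifest layout:
        {"files": [{"path": "genome.fna", "sha256": "<hex digest>"}]}
    """
    root = Path(bundle_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    results = {}
    for entry in manifest.get("files", []):
        target = root / entry["path"]
        # A missing file counts as a failed check rather than raising.
        results[entry["path"]] = (
            target.is_file() and sha256_of(target) == entry["sha256"]
        )
    return results
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte assemblies.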

Python API

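from genomebundle import fetch_assembly, fetch_assembly_report, fetch_busco

# Each call queries one of the three sources for the same accession.
goat = fetch_assembly("GCF_040938575.1")         # GoaT record
ncbi = fetch_assembly_report("GCF_040938575.1")  # NCBI Datasets report
btk = fetch_busco("GCF_040938575.1")             # BlobToolKit BUSCO results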
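
Because all three calls take the same accession, it can help to validate the string once before making any network requests. The sketch below checks the standard NCBI assembly accession shape (GCA_ or GCF_ prefix, nine digits, a version suffix). The helper parse_accession is hypothetical and not part of genomebundle.

```python
import re

# NCBI assembly accessions: GCA_ (GenBank) or GCF_ (RefSeq),
# nine digits, then a dot and a version number.
ACCESSION_RE = re.compile(r"^(GCA|GCF)_(\d{9})\.(\d+)$")


def parse_accession(accession: str) -> dict:
    """Split an assembly accession into its parts or raise ValueError.

    Hypothetical helper for illustration; not a genomebundle function.
    """
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a valid assembly accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "source": "RefSeq" if prefix == "GCF" else "GenBank",
        "number": number,
        "version": int(version),
    }
```

Recording the parsed version alongside the accession makes it explicit which assembly release a pipeline was run against.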

Output

Each bundle contains:

  • manifest.json — machine-readable, includes SHA256 checksums and source URLs
  • README.txt — human-readable summary
  • downloaded files (optional)
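
To make the manifest idea concrete, here is a minimal sketch of how a file's checksum, source URL, and retrieval time could be recorded together. The field names ("accession", "retrieved_at", "files", "source_url", "size_bytes") and the function build_manifest are assumptions for illustration; the actual manifest.json schema written by genomebundle may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(accession: str, files: dict[str, str], bundle_dir: str) -> dict:
    """Write a manifest.json describing already-downloaded files.

    `files` maps a local file name inside `bundle_dir` to the URL it came from.
    The schema used here is hypothetical.
    """
    root = Path(bundle_dir)
    entries = []
    for name, url in sorted(files.items()):
        # Read whole file for brevity; hash in chunks for large assemblies.
        data = (root / name).read_bytes()
        entries.append(
            {
                "path": name,
                "source_url": url,
                "sha256": hashlib.sha256(data).hexdigest(),
                "size_bytes": len(data),
            }
        )
    manifest = {
        "accession": accession,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "files": entries,
    }
    (root / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Storing the retrieval timestamp in UTC and sorting the file entries keeps manifests easy to compare between runs.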

License

MIT
