This repository contains the API used to manage the PanGBank database, which stores collections of pangenomes built with PPanGGOLiN.
The API is built with FastAPI and uses SQLModel as its ORM.
It provides a RESTful interface for querying and exploring pangenome collections. Alongside the API, a command-line tool, pangbank_db, is included to manage the database.
- Clone the repository:

    git clone https://github.com/labgem/PanGBank-api.git
    cd PanGBank-api

- Create a virtual environment and install dependencies:

    python -m venv venv
    source venv/bin/activate
    pip install .

- Run the API in development mode:

    export PANGBANK_DB_PATH="<path/to/database.sqlite>"
    export PANGBANK_DATA_DIR="<path/to/pangenome_directory>"
    fastapi dev pangbank_api/main.py

PANGBANK_DB_PATH is the path to your SQLite database file. PANGBANK_DATA_DIR is the root directory containing your pangenome data and mash files.
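
Because both variables must be set before the server starts, a small pre-flight check can catch a typo early. The sketch below is not part of PanGBank: the helper name `check_pangbank_env` is hypothetical, and it assumes the database file already exists on disk, which may not hold for a fresh setup.

```python
import os
from pathlib import Path


def check_pangbank_env(env=None):
    """Return a list of human-readable problems with the PanGBank settings.

    Hypothetical helper for illustration; it is not provided by pangbank_api.
    """
    env = os.environ if env is None else env
    problems = []

    # Assumption: the SQLite file has already been created.
    db_path = env.get("PANGBANK_DB_PATH")
    if not db_path:
        problems.append("PANGBANK_DB_PATH is not set")
    elif not Path(db_path).is_file():
        problems.append(f"PANGBANK_DB_PATH does not point to a file: {db_path}")

    # The data directory is the root that holds pangenome data and mash files.
    data_dir = env.get("PANGBANK_DATA_DIR")
    if not data_dir:
        problems.append("PANGBANK_DATA_DIR is not set")
    elif not Path(data_dir).is_dir():
        problems.append(f"PANGBANK_DATA_DIR is not a directory: {data_dir}")

    return problems


if __name__ == "__main__":
    for problem in check_pangbank_env():
        print(f"warning: {problem}")
```

Running the script in the same shell as the `export` lines prints one warning per problem and nothing when both settings look usable.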
All CLI commands require the PANGBANK_DB_PATH environment variable to be set.

    export PANGBANK_DB_PATH="<path/to/database.sqlite>"

To add a new collection of pangenomes to the database, use:

    pangbank_db add-collection-release <collection_release.json>

Note
This command requires two environment variables:

    export PANGBANK_DB_PATH="<path/to/database.sqlite>"
    export PANGBANK_DATA_DIR="<root/path/serving/pangenomes>"

JSON Schema Example
- Paths for pangenomes_directory and mash_sketch must be relative to PANGBANK_DATA_DIR.
- Paths for taxonomy.file, genome_sources[*].file, and genome_metadata_sources[*].file must be absolute file paths.
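
The two path rules above are easy to get wrong when writing a release file by hand, so they can be checked before calling `add-collection-release`. The following is a minimal sketch, not the validation that pangbank_db itself performs: the function name `check_release_paths` is hypothetical, and it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def check_release_paths(config):
    """Return a list of path-rule violations found in a collection release dict.

    Hypothetical helper for illustration; it only checks relative vs. absolute,
    not whether the files exist.
    """
    errors = []

    # These two entries are resolved against PANGBANK_DATA_DIR, so they must be relative.
    for key in ("pangenomes_directory", "mash_sketch"):
        value = config.get("release", {}).get(key, "")
        if PurePosixPath(value).is_absolute():
            errors.append(f"release.{key} must be relative: {value}")

    # Input files must be given as absolute paths.
    files = [("taxonomy.file", config.get("taxonomy", {}).get("file", ""))]
    for section in ("genome_sources", "genome_metadata_sources"):
        for i, source in enumerate(config.get(section, [])):
            files.append((f"{section}[{i}].file", source.get("file", "")))

    for label, value in files:
        if not PurePosixPath(value).is_absolute():
            errors.append(f"{label} must be absolute: {value}")

    return errors
```

A release file can be loaded with `json.load` and passed to this function, provided it is plain JSON without `//` annotations; an empty list means both rules are satisfied.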
List the collections in the database:

    pangbank_db list-collection

Delete a release of a collection:

    pangbank_db delete-collection <collection_name> --release-version <version>

We use Alembic to manage schema changes in the PanGBank database.
Generate a migration after updating your SQLModel models (e.g., adding or changing columns):

    alembic revision --autogenerate -m "Describe your change here"

Apply all pending migrations:

    alembic upgrade head

If something went wrong, you can revert the last migration:

    alembic downgrade -1

Or go back to the base (empty schema):

    alembic downgrade base

Note
- The SQLite database path is defined in config.py via the pangbank_db_path setting (PANGBANK_DB_PATH env var).
- Alembic is configured to read this dynamically, so there is no need to change alembic.ini.
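
To illustrate what reading the path dynamically can look like, here is a sketch of turning PANGBANK_DB_PATH into a SQLAlchemy SQLite URL. It is an assumption about the general pattern, not a copy of this repository's Alembic setup; the function name `sqlite_url_from_env` is hypothetical.

```python
import os
from pathlib import Path


def sqlite_url_from_env(env=None):
    """Build a SQLAlchemy SQLite URL from PANGBANK_DB_PATH.

    Hypothetical helper for illustration; the repository's own env.py may differ.
    """
    env = os.environ if env is None else env
    db_path = env.get("PANGBANK_DB_PATH")
    if not db_path:
        raise RuntimeError("PANGBANK_DB_PATH must be set before running Alembic")

    # SQLite URLs are "sqlite:///" followed by the file path, so an absolute
    # POSIX path yields four slashes in total.
    return f"sqlite:///{Path(db_path).expanduser().absolute()}"


# In an Alembic env.py, the result would typically be passed along with
# config.set_main_option("sqlalchemy.url", sqlite_url_from_env()).
```

Resolving the URL at run time like this is what makes it unnecessary to hard-code a database location in alembic.ini.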
- Fork the repository.
- Create a feature branch (git checkout -b feature-name).
- Commit your changes (git commit -m 'Add new feature').
- Push to the branch (git push origin feature-name).
- Open a pull request.
For any inquiries or issues, open an issue on the GitHub repository.
{
  "collection": {
    "name": "GTDB_all_sampled",
    "description": "GTDB all is a collection of pangenomes made of GTDB species that have at least 15 genomes."
  },
  "release": {
    "version": "1.0.0",
    "ppanggolin_version": "2.2.4",
    "pangbank_wf_version": "0.0.2",
    "pangenomes_directory": "GTDB_refseq/release_v1.0.0/data/pangenomes/", // relative to PANGBANK_DATA_DIR
    "release_note": "",
    "date": "2025-07-10",
    "mash_sketch": "GTDB_refseq/release_v1.0.0/data/mash_sketch/families_persistent_all.msh", // relative to PANGBANK_DATA_DIR
    "mash_version": "2.3"
  },
  "taxonomy": {
    "name": "GTDB",
    "version": "10-RS226",
    "ranks": "Domain; Phylum; Class; Order; Family; Genus; Species",
    "file": "/absolute/path/to/taxonomy.tsv"
  },
  "genome_sources": [
    {
      "name": "RefSeq",
      "file": "/absolute/path/to/genomes.tsv",
      "version": "",
      "description": "",
      "source": "",
      "url": ""
    }
  ],
  "genome_metadata_sources": [
    {
      "name": "GTDB 10-RS226 metadata",
      "description": "Metadata collected from GTDB. Some columns have been filtered out.",
      "url": "https://data.ace.uq.edu.au/public/gtdb/data/releases/release226/226.0/",
      "strain_attribute": "ncbi_strain_identifiers",
      "organism_name_attribute": "ncbi_organism_name",
      "file": "/absolute/path/to/metadata.tsv"
    }
  ]
}