TaxonSampler is an open-source web platform that integrates taxonomic classifications from the Catalogue of Life (COL) with genome assembly metadata from the NCBI Datasets API.
| Feature | Description |
|---|---|
| Taxonomic Integration | Imports and reconciles species from COL/ChecklistBank with NCBI taxonomy (26+ ranks) |
| Genome Metadata | Assembly level, contig N50, scaffold count, genome size, GC content, annotation status |
| Interactive Visualization | D3.js hierarchical tree with breadcrumb navigation, search, and zoom/pan |
| Sampling Wizard | Three-step workflow: scope selection → quality filters → export |
| Multi-format Export | JSON, TXT, Newick (ETE3), or XLSX with genome metadata |
| Background Processing | Celery + Redis for async NCBI sync and batch sampling |
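
To illustrate the Newick output listed in the table above, here is a minimal standard-library sketch that serializes a nested taxonomy into a Newick string. The application itself relies on ETE3 for this; the `to_newick` function and the dictionary layout are hypothetical and only show the shape of the format.

```python
def to_newick(node: dict) -> str:
    """Serialize a nested {"name": str, "children": [...]} dict to Newick (no trailing ';')."""
    # Unquoted Newick labels cannot contain spaces, so use underscores.
    name = node["name"].replace(" ", "_")
    children = node.get("children", [])
    if not children:
        return name  # leaf: just the label
    inner = ",".join(to_newick(child) for child in children)
    return f"({inner}){name}"  # internal node: children in parentheses, then the label


# Illustrative hierarchy (order -> family -> species).
taxonomy = {
    "name": "Coleoptera",
    "children": [
        {"name": "Carabidae", "children": [{"name": "Carabus granulatus"}]},
        {"name": "Tenebrionidae", "children": [{"name": "Tribolium castaneum"}]},
    ],
}

newick = to_newick(taxonomy) + ";"
print(newick)
```

The resulting string can be loaded by any Newick-aware tool; internal node labels are kept so that higher ranks remain visible in the tree.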
📚 Tutorials available at: /taxonomy/tutorials/
The application includes built-in documentation accessible from the About & Tutorials link in the sidebar:
| Tutorial | Description | Level |
|---|---|---|
| Getting Started | Navigation basics and interface overview | Beginner |
| Basic Quality Filtering | Filter genomes by assembly level and N50 | Beginner |
| Representative Sampling | Select N species per taxonomic group | Intermediate |
| Quality-First Strategy | Rank genomes by weighted quality score | Intermediate |
| Broad Coverage | Maximize phylogenetic diversity | Advanced |
| Research Workflow | Complete use case example (Coleoptera) | Advanced |
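
The Representative Sampling and Quality-First tutorials combine naturally: rank genomes by a weighted score, then keep the top N per taxonomic group. The sketch below shows one way this could work. The weights, the `LEVEL_SCORE` mapping, the field names, and both functions are assumptions for illustration, not TaxonSampler's actual scoring.

```python
from collections import defaultdict

# Hypothetical scores per NCBI assembly level (higher is better).
LEVEL_SCORE = {"Complete Genome": 1.0, "Chromosome": 0.75, "Scaffold": 0.5, "Contig": 0.25}


def quality_score(genome, max_n50, w_level=0.6, w_n50=0.4):
    """Weighted sum of assembly-level score and contig N50 normalized to [0, 1]."""
    level = LEVEL_SCORE.get(genome["assembly_level"], 0.0)
    n50 = genome["contig_n50"] / max_n50 if max_n50 else 0.0
    return w_level * level + w_n50 * n50


def sample_per_group(genomes, group_key, n):
    """Return the n best-scoring genomes for each value of group_key."""
    max_n50 = max((g["contig_n50"] for g in genomes), default=0)
    groups = defaultdict(list)
    for g in genomes:
        groups[g[group_key]].append(g)
    return {
        group: sorted(members, key=lambda g: quality_score(g, max_n50), reverse=True)[:n]
        for group, members in groups.items()
    }


# Placeholder records; names and values are made up.
genomes = [
    {"species": "sp_a", "family": "fam_1", "assembly_level": "Chromosome", "contig_n50": 2_000_000},
    {"species": "sp_b", "family": "fam_1", "assembly_level": "Contig", "contig_n50": 4_000_000},
    {"species": "sp_c", "family": "fam_2", "assembly_level": "Scaffold", "contig_n50": 1_000_000},
]
picked = sample_per_group(genomes, "family", n=1)
```

With these weights, `sp_a` outranks `sp_b` in `fam_1` because its chromosome-level assembly outweighs the other genome's larger N50; shifting weight toward `w_n50` would reverse that.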
- Templates: taxbridge/apps/taxonomy/templates/taxonomy/pages/
  - about.html: Software information and methodology
  - tutorials.html: Step-by-step sampling guides (accordion layout)
  - report_issue.html: GitHub issue wizard
- CSS: taxbridge/apps/taxonomy/static/taxonomy/css/
  - about.css: About and tutorials styles
  - report-issue.css: Issue wizard styles
  - tree.css: Taxonomy tree visualization
- Python ≥ 3.13
- PostgreSQL ≥ 14
- Redis ≥ 7.0
# Clone and setup
git clone https://github.com/joanjir/taxonSampler.git
cd taxonSampler
# Virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1 # Windows
source .venv/bin/activate # Linux/macOS
# Install dependencies
pip install -r requirements.txt

Create taxbridge/.env:
DJANGO_SECRET_KEY=<secret-key>
DJANGO_DEBUG=True
DATABASE_URL=postgres://user:pass@localhost:5432/taxonsampler
CELERY_BROKER_URL=redis://localhost:6379/0
NCBI_API_KEY=<your-ncbi-api-key>

Get a free NCBI API key at ncbi.nlm.nih.gov/account/settings
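
As a rough illustration of how these variables might be consumed, the sketch below parses them with the standard library only. How TaxonSampler actually loads its .env file is not stated here, so `load_settings` and the returned keys are hypothetical.

```python
import os
from urllib.parse import urlparse


def load_settings(env=os.environ):
    """Read the variables from the .env example into a plain dict (illustrative only)."""
    db = urlparse(env["DATABASE_URL"])
    return {
        "debug": env.get("DJANGO_DEBUG", "False").lower() == "true",
        "db_name": db.path.lstrip("/"),
        "db_host": db.hostname,
        "db_port": db.port,
        "broker": env["CELERY_BROKER_URL"],
        # The NCBI key is treated as optional in this sketch.
        "has_ncbi_key": bool(env.get("NCBI_API_KEY")),
    }


# Values copied from the example .env above; no NCBI key is set here.
example_env = {
    "DJANGO_DEBUG": "True",
    "DATABASE_URL": "postgres://user:pass@localhost:5432/taxonsampler",
    "CELERY_BROKER_URL": "redis://localhost:6379/0",
}
settings = load_settings(example_env)
print(settings)
```

Passing a dict instead of `os.environ` makes it easy to check a configuration before starting the server.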
cd taxbridge
python manage.py migrate
python manage.py createsuperuser
python manage.py collectstatic --noinput
python manage.py runserver 0.0.0.0:8000

Open http://localhost:8000/taxonomy/
| Layer | Technology |
|---|---|
| Backend | Django 6.0, Django REST Framework |
| Language | Python 3.13 |
| Task Queue | Celery 5.4 + Redis |
| Database | PostgreSQL |
| Frontend | Tabler (Bootstrap 5), D3.js v5 |
| Phylogenetics | ETE3 |
| Spreadsheet | openpyxl |
| Markdown | Markdown 3.7 + bleach 6.3 |
taxbridge/
├── apps/taxonomy/ # Core application
│ ├── models/ # Taxon, ExternalTaxon, NCBIGenome
│ ├── ncbi/ # NCBI API integration
│ ├── sampling/ # Sampling engine
│ ├── tree/ # Visualization logic
│ ├── api/ # REST endpoints
│ ├── templates/ # HTML templates
│ └── static/taxonomy/ # CSS, JS, images
├── config/ # Django settings
└── requirements.txt
cd taxbridge
# Import NCBI taxa and genomes
python manage.py import_ncbi_from_xlsx <file>
python manage.py import_genomes_from_xlsx <file>
# Match with COL
python manage.py match_ncbi_to_col
# Sync genome metadata
python manage.py sync_ncbi

cd taxbridge
# Start worker
celery -A config worker -l info -Q default,ncbi_sync
# Start scheduler (periodic tasks)
celery -A config beat -l info

On Windows, use the Python module invocation and the solo pool (the prefork pool is not supported on Windows):
# from project root, with your virtualenv activated
python -m celery -A config worker -l info --pool=solo -Q default,ncbi_sync
# start beat (scheduler)
python -m celery -A config beat -l info

On Linux (or other Unix-like systems), you can run the standard celery executable, which uses the prefork pool by default:
# from project root (activate your venv first)
celery -A config worker -l info -Q default,ncbi_sync
celery -A config beat -l info

For production deployments, prefer running Celery under a process manager (systemd, supervisord) or a container orchestration platform.
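
The platform-specific worker commands above can be wrapped in a small launcher so that one script works on every operating system. This is only a sketch: `celery_worker_command` is a hypothetical helper, not part of the project, and it simply rebuilds the two command lines shown earlier.

```python
import sys

QUEUES = "default,ncbi_sync"  # queue names taken from the commands above


def celery_worker_command(platform: str = sys.platform) -> list[str]:
    """Build the worker command line for the given platform string."""
    args = ["-A", "config", "worker", "-l", "info"]
    if platform.startswith("win"):
        # Windows: module invocation plus the solo pool.
        return ["python", "-m", "celery", *args, "--pool=solo", "-Q", QUEUES]
    # Linux/macOS: the celery executable with its default prefork pool.
    return ["celery", *args, "-Q", QUEUES]


print(" ".join(celery_worker_command()))
# To launch the worker, pass the list to subprocess.run(..., check=True).
```

Returning a list rather than a string lets the command be passed to `subprocess.run` without going through a shell.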
@software{taxonsampler2026,
author = {Izquerdo, Joan},
title = {{TaxonSampler}: A web platform for taxonomic sampling
and genome data integration},
year = {2026},
url = {https://github.com/joanjir/taxonSampler}
}

This software is the property of the Universidad de Talca and is distributed under the terms of the MIT License. See LICENSE for details.