TaxonSampler

Python 3.13 | Django 6.0 | PostgreSQL | License: MIT

TaxonSampler is an open-source web platform that integrates taxonomic classifications from the Catalogue of Life (COL) with genome assembly metadata from the NCBI Datasets API.


Features

| Feature | Description |
|---|---|
| Taxonomic Integration | Imports and reconciles species from COL/ChecklistBank with NCBI taxonomy (26+ ranks) |
| Genome Metadata | Assembly level, contig N50, scaffold count, genome size, GC content, annotation status |
| Interactive Visualization | D3.js hierarchical tree with breadcrumb navigation, search, and zoom/pan |
| Sampling Wizard | Three-step workflow: scope selection → quality filters → export |
| Multi-format Export | JSON, TXT, Newick (ETE3), or XLSX with genome metadata |
| Background Processing | Celery + Redis for async NCBI sync and batch sampling |
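To make the three-step Sampling Wizard workflow concrete, here is a minimal sketch of scope selection, quality filtering, and JSON export in plain Python. It is an illustration only, not the project's implementation: `GenomeRecord`, the three helper functions, the thresholds, and the sample records are all hypothetical, and the real application works with Django models rather than dataclasses.

```python
import json
from dataclasses import dataclass, asdict


# Hypothetical record type for illustration; the real project stores
# this kind of data in Django models (Taxon, NCBIGenome).
@dataclass
class GenomeRecord:
    species: str
    family: str
    assembly_level: str  # e.g. "Complete Genome", "Chromosome", "Scaffold", "Contig"
    contig_n50: int


def select_scope(records, family):
    """Step 1: keep only records inside the chosen taxonomic scope."""
    return [r for r in records if r.family == family]


def apply_quality_filters(records, allowed_levels, min_n50):
    """Step 2: keep records that meet the assembly-level and N50 thresholds."""
    return [
        r for r in records
        if r.assembly_level in allowed_levels and r.contig_n50 >= min_n50
    ]


def export_json(records):
    """Step 3: serialize the final selection as JSON text."""
    return json.dumps([asdict(r) for r in records], indent=2)


# Made-up sample data, not real assemblies.
records = [
    GenomeRecord("Species A", "Carabidae", "Chromosome", 1_200_000),
    GenomeRecord("Species B", "Carabidae", "Contig", 40_000),
    GenomeRecord("Species C", "Scarabaeidae", "Scaffold", 900_000),
]

scoped = select_scope(records, "Carabidae")
kept = apply_quality_filters(scoped, {"Complete Genome", "Chromosome"}, 100_000)
payload = export_json(kept)
```

Keeping each step as a separate function mirrors the wizard's stages, so a filter can be changed without touching scope selection or export.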

Documentation & Tutorials

📚 Tutorials available at: /taxonomy/tutorials/

The application includes built-in documentation accessible from the About & Tutorials link in the sidebar:

| Tutorial | Description | Level |
|---|---|---|
| Getting Started | Navigation basics and interface overview | Beginner |
| Basic Quality Filtering | Filter genomes by assembly level and N50 | Beginner |
| Representative Sampling | Select N species per taxonomic group | Intermediate |
| Quality-First Strategy | Rank genomes by weighted quality score | Intermediate |
| Broad Coverage | Maximize phylogenetic diversity | Advanced |
| Research Workflow | Complete use case example (Coleoptera) | Advanced |
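The Representative Sampling and Quality-First tutorials combine naturally: rank genomes by a weighted score, then keep the top N per taxonomic group. The sketch below shows one way this could work. The weights, the level scores, the N50 cap, and the function names are assumptions chosen for illustration; the scoring actually used by TaxonSampler may differ.

```python
from collections import defaultdict

# Assumed mapping from assembly level to a 0..1 score (illustrative values).
LEVEL_SCORE = {
    "Complete Genome": 1.0,
    "Chromosome": 0.75,
    "Scaffold": 0.5,
    "Contig": 0.25,
}


def quality_score(genome, w_level=0.6, w_n50=0.4, n50_cap=1_000_000):
    """Weighted sum of the assembly-level score and the capped, normalized N50."""
    level = LEVEL_SCORE.get(genome["assembly_level"], 0.0)
    n50 = min(genome["contig_n50"], n50_cap) / n50_cap
    return w_level * level + w_n50 * n50


def sample_per_group(genomes, group_key, n):
    """Group genomes by `group_key` and keep the n best-scoring per group."""
    groups = defaultdict(list)
    for genome in genomes:
        groups[genome[group_key]].append(genome)
    return {
        name: sorted(members, key=quality_score, reverse=True)[:n]
        for name, members in groups.items()
    }


# Made-up records for illustration only.
genomes = [
    {"species": "sp_a", "family": "Carabidae", "assembly_level": "Chromosome", "contig_n50": 500_000},
    {"species": "sp_b", "family": "Carabidae", "assembly_level": "Contig", "contig_n50": 2_000_000},
    {"species": "sp_c", "family": "Carabidae", "assembly_level": "Scaffold", "contig_n50": 100_000},
    {"species": "sp_d", "family": "Scarabaeidae", "assembly_level": "Complete Genome", "contig_n50": 800_000},
]

picked = sample_per_group(genomes, "family", n=2)
```

Capping N50 before normalizing keeps a single very contiguous assembly from dominating the score, which is why `sp_b` (a contig-level assembly with a large N50) still ranks below `sp_a` here.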

File Locations

  • Templates: taxbridge/apps/taxonomy/templates/taxonomy/pages/
    • about.html — Software information and methodology
    • tutorials.html — Step-by-step sampling guides (accordion layout)
    • report_issue.html — GitHub issue wizard
  • CSS: taxbridge/apps/taxonomy/static/taxonomy/css/
    • about.css — About and tutorials styles
    • report-issue.css — Issue wizard styles
    • tree.css — Taxonomy tree visualization

Quick Start

Prerequisites

  • Python ≥ 3.13
  • PostgreSQL ≥ 14
  • Redis ≥ 7.0

Installation

# Clone and set up
git clone https://github.com/joanjir/taxonSampler.git
cd taxonSampler

# Virtual environment
python -m venv .venv
.\.venv\Scripts\Activate.ps1  # Windows
source .venv/bin/activate      # Linux/macOS

# Install dependencies
pip install -r requirements.txt

Configuration

Create taxbridge/.env:

DJANGO_SECRET_KEY=<secret-key>
DJANGO_DEBUG=True
DATABASE_URL=postgres://user:pass@localhost:5432/taxonsampler
CELERY_BROKER_URL=redis://localhost:6379/0
NCBI_API_KEY=<your-ncbi-api-key>

Get a free NCBI API key at ncbi.nlm.nih.gov/account/settings
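For readers wondering how a `DATABASE_URL` value maps onto Django's database settings, the following sketch parses the example URL with the standard library. This is an assumption about how the value might be consumed: the project may rely on a helper package instead, and `parse_database_url` is a hypothetical name.

```python
from urllib.parse import urlparse, unquote


def parse_database_url(url):
    """Split a postgres:// URL into the keys Django expects in DATABASES."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        # Credentials may be percent-encoded in the URL, so decode them.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# The example value from the .env file above.
db = parse_database_url("postgres://user:pass@localhost:5432/taxonsampler")
DATABASES = {"default": db}
```

If the password contains characters such as `@` or `/`, they need to be percent-encoded in the URL for the split to work.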

Database Setup

cd taxbridge
python manage.py migrate
python manage.py createsuperuser
python manage.py collectstatic --noinput

Run

python manage.py runserver 0.0.0.0:8000

Open http://localhost:8000/taxonomy/


Technology Stack

| Layer | Technology |
|---|---|
| Backend | Django 6.0, Django REST Framework |
| Language | Python 3.13 |
| Task Queue | Celery 5.4 + Redis |
| Database | PostgreSQL |
| Frontend | Tabler (Bootstrap 5), D3.js v5 |
| Phylogenetics | ETE3 |
| Spreadsheet | openpyxl |
| Markdown | Markdown 3.7 + bleach 6.3 |

Project Structure

taxbridge/
├── apps/taxonomy/           # Core application
│   ├── models/              # Taxon, ExternalTaxon, NCBIGenome
│   ├── ncbi/                # NCBI API integration
│   ├── sampling/            # Sampling engine
│   ├── tree/                # Visualization logic
│   ├── api/                 # REST endpoints
│   ├── templates/           # HTML templates
│   └── static/taxonomy/     # CSS, JS, images
├── config/                  # Django settings
└── requirements.txt

Data Import

cd taxbridge

# Import NCBI taxa and genomes
python manage.py import_ncbi_from_xlsx <file>
python manage.py import_genomes_from_xlsx <file>

# Match with COL
python manage.py match_ncbi_to_col

# Sync genome metadata
python manage.py sync_ncbi

Celery Workers

cd taxbridge

# Start worker
celery -A config worker -l info -Q default,ncbi_sync

# Start scheduler (periodic tasks)
celery -A config beat -l info

On Windows, use the Python module invocation with the solo pool, because the prefork pool is not supported on Windows:

# from project root, with your virtualenv activated
python -m celery -A config worker -l info --pool=solo -Q default,ncbi_sync
# start beat (scheduler)
python -m celery -A config beat -l info

On Linux and other Unix-like systems, you can run the standard celery executable, which uses the prefork pool by default:

# from project root (activate your venv first)
celery -A config worker -l info -Q default,ncbi_sync
celery -A config beat -l info

For production deployments, run Celery under a process manager (systemd, supervisord, or a container orchestration platform) rather than starting it by hand.
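The `-Q default,ncbi_sync` flag in the worker commands tells the worker which queues to consume. A settings fragment along these lines would route tasks to those queues. This is a sketch under assumptions: it presumes the Celery app loads Django settings with the `CELERY` namespace, and the task name shown is hypothetical. The `queue_for` helper is only an illustration of the lookup and is not part of Celery.

```python
# Celery's built-in default queue is named "celery", so a worker started
# with -Q default,ncbi_sync only sees unrouted tasks if the default queue
# is renamed to "default".
CELERY_TASK_DEFAULT_QUEUE = "default"

# Hypothetical task name; the queue names come from the worker commands.
CELERY_TASK_ROUTES = {
    "taxonomy.tasks.sync_ncbi_genomes": {"queue": "ncbi_sync"},
}


def queue_for(task_name):
    """Illustrative lookup: return the queue a task name would be routed to."""
    route = CELERY_TASK_ROUTES.get(task_name, {})
    return route.get("queue", CELERY_TASK_DEFAULT_QUEUE)
```

Sending long-running NCBI sync jobs to their own queue lets you run a dedicated worker for them, so they do not delay short interactive tasks.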


Citation

@software{taxonsampler2026,
  author  = {Izquerdo, Joan},
  title   = {{TaxonSampler}: A web platform for taxonomic sampling
             and genome data integration},
  year    = {2026},
  url     = {https://github.com/joanjir/taxonSampler}
}

License

This software is the property of the Universidad de Talca and is distributed under the terms of the MIT License. See LICENSE for details.
