A comprehensive database and query tool for Quercus (oak) species and their hybrids.
Live site: oakcompendium.org
- 682 Species: Complete iNaturalist Quercus taxonomy with species data
- Web Application: Modern Svelte 5 PWA with offline support
- CLI Tool: Go-based tool for managing taxonomic data
- Multi-Source: Combines data from iNaturalist and Oaks of the World
- Offline-First: Works without internet after initial load
# Install web dependencies first
cd web && npm install && cd ..
# Start both API server (:8080) and web dev server (:5173)
make dev
cd web
npm install
npm run dev # Uses production API
npm run dev:local # Uses local API at localhost:8080
# Open http://localhost:5173
cd cli
go build -o oak .
# View taxonomy tree
./oak taxa list
# Search for species
./oak find alba
# Export to JSON for web app
./oak export ../quercus_data.json
cd cli
# Import iNaturalist taxonomy and species
./oak taxa import --clear data/quercus-taxonomy.yaml
./oak import-bulk data/quercus-species.yaml --source-id 1
Data Sources           CLI Tool                          Deployment
─────────────          ────────                          ──────────
iNaturalist ──────┐
(taxonomy)        │
                  │
Oaks of the World ├──▶ oaks.db ──▶ quercus_data.json ──▶ git push
(descriptions)    │    (SQLite)    (JSON export)            │
                  │                                         ▼
Bear App ─────────┘                                  GitHub Actions
(personal notes)                                            │
                                                            ▼
                                                      GitHub Pages
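The export step in the diagram flattens the SQLite database into the single JSON file the web app loads. The sketch below illustrates that idea in Python with the standard `sqlite3` and `json` modules. It is not the Go CLI's implementation: the `species` table, its three columns, and the `export_species`/`write_export` helpers are hypothetical, invented only for this example.

```python
import json
import sqlite3

# Hypothetical schema for illustration only; the real oaks.db layout may differ.
SCHEMA = """
CREATE TABLE species (
    name TEXT PRIMARY KEY,
    is_hybrid INTEGER NOT NULL,
    author TEXT
);
"""


def export_species(conn: sqlite3.Connection) -> dict:
    """Read every species row and wrap the rows in a top-level {"species": [...]} object."""
    conn.row_factory = sqlite3.Row  # lets us access columns by name
    rows = conn.execute("SELECT name, is_hybrid, author FROM species ORDER BY name")
    species = [
        # SQLite has no boolean type, so convert the stored 0/1 back to False/True.
        {"name": r["name"], "is_hybrid": bool(r["is_hybrid"]), "author": r["author"]}
        for r in rows
    ]
    return {"species": species}


def write_export(conn: sqlite3.Connection, path: str) -> None:
    """Write the export to disk as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps characters such as the hybrid sign "×" readable.
        json.dump(export_species(conn), f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO species VALUES (?, ?, ?)",
        [("Quercus alba", 0, "L. 1753"), ("Quercus × bebbiana", 1, None)],
    )
    print(json.dumps(export_species(conn), ensure_ascii=False))
```

Keeping the database as the single source of truth and regenerating the JSON on every export means the static site never needs a live database connection.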
Data Sources:
- iNaturalist (Source 1): Authoritative taxonomy and species list
- Oaks of the World (Source 2): Morphological descriptions scraped from the site
- Bear App (Source 3): Personal field notes and observations
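When several sources supply the same field, the list above implies a precedence rule: iNaturalist is authoritative for taxonomy, while Oaks of the World supplies descriptions. One way to express per-field precedence is sketched below in Python. The source IDs come from the list above, but the `FIELD_PRECEDENCE` table, its ordering, and the `merge_records` helper are hypothetical and do not describe how the CLI actually merges records.

```python
from typing import Any, Dict

# Source IDs as listed above.
INATURALIST, OAKS_OF_THE_WORLD, BEAR = 1, 2, 3

# Hypothetical precedence: for each field, try sources in this order.
FIELD_PRECEDENCE = {
    "taxonomy": [INATURALIST, OAKS_OF_THE_WORLD],
    "leaves": [OAKS_OF_THE_WORLD, BEAR],
    "range": [OAKS_OF_THE_WORLD, INATURALIST],
}


def merge_records(records_by_source: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Build one record, taking each field from the first source (in precedence order) that has it."""
    merged: Dict[str, Any] = {}
    for field, sources in FIELD_PRECEDENCE.items():
        for source_id in sources:
            value = records_by_source.get(source_id, {}).get(field)
            if value is not None:
                merged[field] = value
                break  # a higher-precedence source won; ignore the rest
    return merged
```

Fields that no source provides are simply left out of the merged record rather than set to `None`.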
Workflow:
cd cli
./oak import-bear # Import from Bear
./oak export ../web/public/quercus_data.json # Export for web
git add -A && git commit -m "Update data" && git push
# GitHub Actions auto-deploys to GitHub Pages
See CLAUDE.md for detailed architecture documentation.
cd scrapers/oaksoftheworld
# First run (or resume from last position)
python3 scraper.py
# Force restart from beginning
python3 scraper.py --restart
# Test mode (first 50 species)
python3 scraper.py --test
# Process specific number of species
python3 scraper.py --limit=10
Features:
- Auto-resume: Automatically continues from where it left off
- Progress tracking: Saves state every 10 species
- Error handling: Continues past failures, tracks failed URLs
- Rate limiting: 0.5-second delay between requests
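These features fit together in a simple loop. The following is a minimal Python sketch under stated assumptions, not the actual `scraper.py`: the `fetch` callable, the `scrape_all`/`load_progress`/`save_progress` names, and the state keys (`next_index`, `results`, `failed_urls`) are hypothetical, and the real progress file format may differ. Only the save interval (every 10 species) and the 0.5-second delay come from the list above.

```python
import json
import os
import time
from typing import Callable, Dict, List

SAVE_EVERY = 10      # save progress every 10 species
REQUEST_DELAY = 0.5  # seconds to wait between requests


def load_progress(path: str) -> Dict:
    """Return saved state, or a fresh state if the progress file is missing (i.e. a restart)."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {"next_index": 0, "results": {}, "failed_urls": []}


def save_progress(path: str, state: Dict) -> None:
    """Write state to a temp file, then rename it, so a crash mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def scrape_all(urls: List[str], fetch: Callable[[str], str], progress_path: str,
               delay: float = REQUEST_DELAY) -> Dict:
    """Fetch each URL once, resuming from the saved index and skipping past failures."""
    state = load_progress(progress_path)
    for i in range(state["next_index"], len(urls)):
        url = urls[i]
        try:
            state["results"][url] = fetch(url)
        except Exception:
            # Keep going past failures, but remember which URL failed.
            state["failed_urls"].append(url)
        state["next_index"] = i + 1
        if state["next_index"] % SAVE_EVERY == 0:
            save_progress(progress_path, state)
        time.sleep(delay)  # rate limiting between requests
    save_progress(progress_path, state)
    return state
```

Calling `scrape_all` a second time with the same progress file fetches nothing that was already processed; deleting the file starts again from the first URL.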
Output files:
- quercus_data.json: Final structured data (in root directory)
- tmp/scraper/scraper_progress.json: Progress state (can be deleted to restart)
- tmp/scraper/data_inconsistencies.log: Taxonomic notes and name mismatches
- tmp/scraper/html_cache/: Cached HTML pages
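An on-disk HTML cache means re-runs can parse pages again without hitting the site, which also helps with the rate limits. A minimal Python sketch of how a directory like tmp/scraper/html_cache/ can work follows; naming cache files by a SHA-256 hash of the URL and the `cached_fetch` helper are assumptions for illustration, not necessarily what `scraper.py` does.

```python
import hashlib
import os
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: str) -> str:
    """Return page HTML from the on-disk cache, calling fetch() only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length filename.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, key + ".html")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = fetch(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```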
Output format:
{
"species": [
{
"name": "Quercus alba",
"is_hybrid": false,
"author": "L. 1753",
"synonyms": [...],
"local_names": ["white oak", "eastern white oak"],
"range": "Eastern North America; 0 to 1600 m",
"growth_habit": "reaches 25 m high...",
"leaves": "8-20 cm long, 5-10 cm wide...",
"taxonomy": {
"subgenus": "Quercus",
"section": "Quercus",
"complex": null
},
"hybrids": ["Quercus × bebbiana", ...],
"url": "http://..."
},
{
"name": "Quercus × bebbiana",
"is_hybrid": true,
"parent_formula": "alba x macrocarpa",
"parent1": "Quercus alba",
"parent2": "Quercus macrocarpa",
...
}
]
}
All species include (when available):
- name: Scientific name
- is_hybrid: Boolean flag
- author: Taxonomic authority
- synonyms: List of alternative names
- local_names: Common names
- range: Geographic distribution
- growth_habit: Size and form description
- leaves: Leaf morphology
- flowers: Flower description
- fruits: Acorn characteristics
- bark_twigs_buds: Bark and twig features
- hardiness_habitat: Growing conditions
- taxonomy: Subgenus, section, subsection, complex classification
- conservation_status: IUCN status if applicable
- subspecies_varieties: Infraspecific taxa
- url: Link to source page
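Because most fields are present only "when available", code that consumes this JSON should treat a missing key and a `null` value the same way. A minimal Python sketch of reading the format shown above and grouping species by `taxonomy.section` follows. It assumes only `name` is always present; the helper names and the `"(unplaced)"` label are illustrative, not part of the project.

```python
import json
from collections import defaultdict
from typing import Dict, List


def load_species(path: str) -> List[Dict]:
    """Load the top-level "species" list from the exported JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["species"]


def group_by_section(species: List[Dict]) -> Dict[str, List[str]]:
    """Map each section name to its species names, tolerating missing or null taxonomy."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sp in species:
        taxonomy = sp.get("taxonomy") or {}  # key may be absent or null
        section = taxonomy.get("section") or "(unplaced)"  # hypothetical fallback label
        groups[section].append(sp["name"])
    return dict(groups)
```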
Hybrids additionally include:
- parent_formula: Original hybrid formula (e.g., "alba x macrocarpa")
- parent1: First parent species
- parent2: Second parent species
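In the example, `parent_formula` uses bare epithets ("alba x macrocarpa") while `parent1` and `parent2` hold full binomials. A hedged Python sketch of that expansion follows, assuming every parent is in Quercus and that a formula names exactly two parents separated by " x " or " × "; the `parse_parent_formula` helper is hypothetical and is not necessarily how the scraper derives these fields.

```python
import re
from typing import Optional, Tuple

GENUS = "Quercus"


def parse_parent_formula(formula: str) -> Optional[Tuple[str, ...]]:
    """Split 'alba x macrocarpa' into full binomials; return None unless there are exactly two parents."""
    # The separator must have whitespace on both sides, so epithets starting with "x" are safe.
    parts = re.split(r"\s+[x×]\s+", formula.strip())
    if len(parts) != 2 or not all(parts):
        return None
    # Prefix the genus only when the parent is a bare epithet.
    return tuple(p if p.startswith(GENUS + " ") else GENUS + " " + p for p in parts)
```

For example, `parse_parent_formula("alba x macrocarpa")` returns `("Quercus alba", "Quercus macrocarpa")`, matching the `parent1`/`parent2` values above.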
Requirements:
- Python 3.7+
- requests
- beautifulsoup4
- lxml
See scrapers/oaksoftheworld/requirements.txt for the complete list.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
Data is scraped from Oaks of the World while respecting the site's rate limits.
This project uses a dual-license structure:
Source Code: MIT License - see LICENSE
Data Files: All Rights Reserved - see DATA_LICENSE
The data files (quercus_data.json, cli/data/*.yaml, oaks.db) are
proprietary and not covered by the MIT License. The data incorporates information
from multiple sources; see the application for individual source attributions.
- Geographic filtering
- Taxonomy visualization
- Export functionality (CSV, PDF)
- Image gallery integration
- Mobile-responsive design
Thanks to the maintainers of Oaks of the World for compiling this comprehensive resource.