Use FlyBase bulk files for agent workloads. Live API: helper only.
- https://api.flybase.org/api/v1.0/ exists.
- some endpoints return useful JSON now, eg domain/FBgn0001250, sequence/id/FBgn0001250.
- some plausible endpoints return empty body today.
- bulk bucket + release files: better for repeatable agent queries.

- release bucket: https://s3ftp.flybase.org/releases/current/
- precomputed files: https://s3ftp.flybase.org/releases/current/precomputed_files/
- Postgres dump: https://s3ftp.flybase.org/releases/current/psql/FB2026_01.sql.gz
- API root: https://api.flybase.org/api/v1.0/
- batch download: https://flybase.org/batchdownload
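
Since some endpoints return JSON and others return an empty body, a caller probably wants to treat "empty" as a distinct outcome and not as a parse error. The following is a minimal sketch using only the standard library; `parse_api_body` and `fetch` are hypothetical helper names and are not taken from the package.

```python
import json
import urllib.request

API_ROOT = "https://api.flybase.org/api/v1.0/"


def parse_api_body(body):
    """Return decoded JSON, or None when the endpoint answered with an empty body."""
    text = body.decode("utf-8").strip()
    if not text:
        return None
    return json.loads(text)


def fetch(endpoint, timeout=30):
    """GET API_ROOT + endpoint (needs network access); hypothetical helper."""
    url = API_ROOT + endpoint.lstrip("/")
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_api_body(response.read())
```

For example, `fetch("domain/FBgn0001250")` would return parsed JSON when the endpoint responds with a body, and `None` otherwise.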
- src/flybase_cli/: package code
- tests/: stdlib unittest
- flybase_cli.py: thin repo-root shim
- pyproject.toml: package metadata / console entrypoint
python3 flybase_cli.py presets
python3 flybase_cli.py sync gene-core
python3 flybase_cli.py sync gene-core --release FB2026_01
python3 flybase_cli.py sync gene-knowledge --release FB2026_01
python3 flybase_cli.py full-sync --release FB2026_01
python3 flybase_cli.py full-sync \
--release FB2026_01 \
--include 'best_gene_summary|entity_publication'
python3 flybase_cli.py sync-incremental \
gene-knowledge \
--from-release FB2025_06 \
--release FB2026_01
python3 flybase_cli.py release-diff \
--preset gene-knowledge \
--from-release FB2025_06 \
--to-release FB2026_01
python3 flybase_cli.py genomes --release FB2026_01
python3 flybase_cli.py sync-genome \
--release FB2026_01 \
--genome dmel_r6.67 \
--section fasta \
--asset mirna
python3 flybase_cli.py genome-presets
python3 flybase_cli.py sync-genome \
--release FB2026_01 \
--genome dmel_r6.67 \
--preset mirna-fasta
PYTHONPATH=src python3 -m flybase_cli sync gene-expression
python3 flybase_cli.py manifest \
--url https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r6.67_FB2026_01/fasta/ \
--include 'miRNA'
python3 flybase_cli.py sync-url \
--url https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r6.67_FB2026_01/fasta/ \
--include 'miRNA'
python3 flybase_cli.py ingest \
data/flybase/precomputed_files/genes/best_gene_summary_fb_2026_01.tsv.gz \
data/flybase/precomputed_files/genes/fbgn_fbtr_fbpp_fb_2026_01.tsv.gz \
data/flybase/precomputed_files/genes/fbgn_annotation_ID_fb_2026_01.tsv.gz
python3 flybase_cli.py tables --columns
python3 flybase_cli.py describe --sample-values 2
python3 flybase_cli.py schema-export --sample-values 1
python3 flybase_cli.py query-plan --sample-values 1 --limit 5
python3 flybase_cli.py query-run --template-name gene-summary-by-fbgn --param fbgn_id=FBgn0002121
python3 flybase_cli.py fts-build
python3 flybase_cli.py search 'memory formation'
python3 flybase_cli.py pg-load --release FB2026_01
python3 flybase_cli.py sql \
"select * from fb_best_gene_summary_fb_2026_01 limit 5"
python3 flybase_cli.py sql \
"select s.fbgn_id, s.gene_symbol, a.annotation_id, p.flybase_fbtr, p.flybase_fbpp \
from fb_best_gene_summary_fb_2026_01 s \
join fb_fbgn_annotation_id_fb_2026_01 a on a.primary_fbgn = s.fbgn_id \
left join fb_fbgn_fbtr_fbpp_fb_2026_01 p on p.flybase_fbgn = s.fbgn_id \
limit 5"
python3 flybase_cli.py api domain/FBgn0001250

- gene-core: summaries + FBgn/FBtr/FBpp + annotation IDs + SO annotations
- gene-expression: curated/high-throughput/scRNA expression slices
- references: publication/link tables
- gene-knowledge: core gene facts + representative publications + orthology tables
- orthology: ortholog, paralog, and disease-association tables
- interactions: gene- and allele-level interaction tables
- full-sync crawls an entire release prefix, default precomputed_files/
- default behavior: download only files the current loaders can ingest into SQLite
- use --all-files if you want non-ingestable release artifacts too
- use --include/--exclude to stage a narrower smoke or partial warehouse
- default manifest path: data/flybase/manifests/<release>/full-sync.json
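
The --include value used earlier ('best_gene_summary|entity_publication') looks like a regular expression searched against each file path. Assuming that is how the filters behave, the selection step can be sketched as below; `select_files` is a hypothetical name and the file paths are illustrative, not a real manifest.

```python
import re


def select_files(names, include=None, exclude=None):
    """Keep names that match `include` (when given) and do not match `exclude`."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    kept = []
    for name in names:
        if include_re and not include_re.search(name):
            continue
        if exclude_re and exclude_re.search(name):
            continue
        kept.append(name)
    return kept


# Illustrative paths only; a real crawl would supply these.
names = [
    "genes/best_gene_summary_fb_2026_01.tsv.gz",
    "references/entity_publication_fb_2026_01.tsv.gz",
    "genes/fbgn_annotation_ID_fb_2026_01.tsv.gz",
]
print(select_files(names, include="best_gene_summary|entity_publication"))
```

Applying include before exclude keeps the two flags composable: include narrows the set, exclude trims what remains.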
- genomes --release FB2026_01 lists genome builds linked from that FlyBase release
- sync-url turns a crawlable FlyBase directory URL into a one-step local sync
- sync-genome resolves a release/build pair into the right genome-section URL automatically
- genome-presets lists reusable genome asset sync recipes
- sections: fasta, gff, gtf, dna, chado-xml
- asset shortcuts include mirna, transcript, translation, gene, chromosome, cds, ncrna, gff, gtf
- presets include mirna-fasta, transcript-fasta, translation-fasta, gene-fasta, chromosome-fasta, ncrna-fasta, gff-all, gtf-all
- use --include/--exclude for narrower file selection on top of the asset preset
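
The genome FASTA URL used with manifest --url suggests how a release/build pair maps to a section URL. The sketch below only reproduces that observed pattern; the real command may resolve builds from the release listing instead. `genome_section_url` and `SPECIES_DIRS` are hypothetical names, and only the dmel mapping is known from the URL in this README.

```python
GENOME_ROOT = "https://s3ftp.flybase.org/genomes"

# Assumption: the build prefix selects the species directory. Only "dmel"
# is confirmed by the example URL; other species would need their own entries.
SPECIES_DIRS = {"dmel": "Drosophila_melanogaster"}


def genome_section_url(release, genome, section):
    """Build a genome-section URL such as .../dmel_r6.67_FB2026_01/fasta/."""
    prefix = genome.split("_", 1)[0]
    species_dir = SPECIES_DIRS[prefix]
    return f"{GENOME_ROOT}/{species_dir}/{genome}_{release}/{section}/"


print(genome_section_url("FB2026_01", "dmel_r6.67", "fasta"))
```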
- delimited: tsv, csv, gzipped variants
- sequence: fasta, fa, fna, faa, gzipped variants
- annotation: gff, gff3, gtf, gzipped variants
- JSON: json, json.gz
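
One way to route a file to a loader family is to strip an optional .gz suffix and look up the remaining extension. This is a sketch of that idea built from the list above, not the package's actual dispatch; `classify` and `FORMAT_FAMILIES` are hypothetical names.

```python
FORMAT_FAMILIES = {
    "delimited": ("tsv", "csv"),
    "sequence": ("fasta", "fa", "fna", "faa"),
    "annotation": ("gff", "gff3", "gtf"),
    "json": ("json",),
}


def classify(filename):
    """Return the loader family for a filename, or None if it is not ingestable."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]  # gzipped variants share the family of the inner file
    extension = name.rsplit(".", 1)[-1] if "." in name else ""
    for family, extensions in FORMAT_FAMILIES.items():
        if extension in extensions:
            return family
    return None


print(classify("best_gene_summary_fb_2026_01.tsv.gz"))
print(classify("FB2026_01.sql.gz"))
```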
- top-level scalar JSON fields become queryable SQLite columns
- one nested dict level is flattened, eg gene.symbol -> gene_symbol
- repeated top-level lists become child tables, eg symbolSynonyms -> <table>_symbolsynonyms
- repeated lists nested inside child dict rows become descendant tables, eg genomeLocations[].exons[] -> <table>_genomelocations_exons
- full source record remains in payload_json
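
The flattening rules above can be illustrated with a small in-memory sketch. It is not the loader's code: `flatten_record` is a hypothetical function, ordinals are assumed to start at 0, and the sample record uses made-up values.

```python
import json


def flatten_record(record_id, record, table):
    """Split one JSON record into a parent row plus child-table rows."""
    row = {"record_id": record_id, "payload_json": json.dumps(record, sort_keys=True)}
    children = {}  # child table name -> list of row dicts
    for key, value in record.items():
        if isinstance(value, dict):
            # One nested dict level: gene.symbol -> gene_symbol
            for sub_key, sub_value in value.items():
                if not isinstance(sub_value, (dict, list)):
                    row[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            child_table = f"{table}_{key.lower()}"
            for ordinal, item in enumerate(value):
                lineage = {"parent_record_id": record_id, "ordinal": ordinal}
                if not isinstance(item, dict):
                    children.setdefault(child_table, []).append({**lineage, "value": item})
                    continue
                scalars = {k: v for k, v in item.items() if not isinstance(v, (dict, list))}
                children.setdefault(child_table, []).append({**lineage, **scalars})
                for sub_key, sub_list in item.items():
                    if isinstance(sub_list, list):
                        name = f"{child_table}_{sub_key.lower()}"
                        _add_descendants(children, name, record_id, ordinal, sub_list)
        else:
            row[key] = value  # top-level scalar becomes a column
    return row, children


def _add_descendants(children, table, record_id, parent_ordinal, items):
    """Rows for lists nested inside child dict rows, with full lineage columns."""
    for ordinal, item in enumerate(items):
        lineage = {"parent_record_id": record_id, "parent_ordinal": parent_ordinal, "ordinal": ordinal}
        body = item if isinstance(item, dict) else {"value": item}
        children.setdefault(table, []).append({**lineage, **body})


# Made-up record for illustration only.
record = {
    "symbol": "example-1",
    "gene": {"geneId": "FBgn0000001", "symbol": "ex"},
    "symbolSynonyms": ["ex1", "EX-1"],
    "genomeLocations": [{"chromosome": "2L", "exons": [{"startPosition": 100, "endPosition": 200}]}],
}
row, children = flatten_record("rec1", record, "fb_demo")
print(sorted(children))
```

Keeping parent_record_id, parent_ordinal, and ordinal on every descendant row is what lets a child table be joined back to its parent without re-parsing payload_json.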
Example:
python3 flybase_cli.py sql \
"select record_id, symbol, gene_geneId from fb_ncrna_genes_fb_2026_01 limit 5"
python3 flybase_cli.py sql \
"select parent_record_id, ordinal, value \
from fb_ncrna_genes_fb_2026_01_symbolsynonyms \
limit 5"
python3 flybase_cli.py sql \
"select parent_record_id, parent_ordinal, ordinal, startPosition, endPosition \
from fb_ncrna_genes_fb_2026_01_genomelocations_exons \
limit 5"

- fts-build creates a local SQLite FTS5 index from ingested tables
- search queries that index without calling the live FlyBase API
- record ids prefer stable FlyBase-like columns such as fbgn_id, primary_fbgn, flybase_fbtr
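
For readers unfamiliar with FTS5, here is a minimal sketch of a local full-text index using Python's sqlite3 module. It assumes the interpreter's SQLite build includes FTS5 (common, but not guaranteed). The table name `search_index`, its columns, and the sample rows are hypothetical and do not reflect the schema fts-build creates.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table from (record_id, source_table, body) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE search_index USING fts5("
        "record_id UNINDEXED, source_table UNINDEXED, body)"
    )
    conn.executemany("INSERT INTO search_index VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query, limit=10):
    """Run an FTS5 MATCH query; space-separated terms must all be present."""
    sql = (
        "SELECT record_id, source_table FROM search_index "
        "WHERE search_index MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


# Made-up rows for illustration only.
conn = build_index([
    ("FBgn0000001", "demo_summaries", "required for long-term memory formation"),
    ("FBgn0000002", "demo_summaries", "involved in wing vein formation"),
])
print(search(conn, "memory formation"))
```

Marking the id columns UNINDEXED keeps them retrievable while restricting matches to the text body.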
- describe summarizes ingested tables with row counts, source paths, semantic tags, columns, and representative non-empty values
- schema-export writes the same metadata to a deterministic JSON artifact beside the SQLite DB, eg FB2026_01.schema.json
- schema-export also includes inferred relationships for nested child tables and common FlyBase ID joins
- schema-export also emits semantic_summary for table/entity tag coverage
- schema-export also emits ready-to-run query_templates
- query-plan prints starter SQL without the larger schema payload
- query-plan now includes named biological templates such as gene-summary-by-fbgn, transcript-protein-links, publications-for-gene, and coordinate lookups when matching tables exist
- query-run selects one template and executes it with parameter values
- useful first step before writing ad hoc SQL or building agent query plans
Example:
python3 flybase_cli.py schema-export \
--db data/flybase/FB2026_01.sqlite \
--sample-values 1
python3 flybase_cli.py query-plan \
--db data/flybase/FB2026_01.sqlite \
--sample-values 1 \
--limit 5
python3 flybase_cli.py query-run \
--db data/flybase/FB2026_01.sqlite \
--template-name gene-summary-by-fbgn \
--param fbgn_id=FBgn0002121

- nested JSON child tables keep lineage columns like parent_record_id, parent_ordinal, ordinal.
- many FlyBase files start with ## metadata lines; loader skips those.
- sync writes a preset manifest under data/flybase/manifests/<release>/.
- full-sync is the broadest offline path for release bulk data without going through the full Postgres dump.
- sync --release FB2026_01 defaults to data/flybase/FB2026_01.sqlite to avoid cross-release mixing.
- sync-incremental uses stable manifest keys so release-renamed files still land in updated instead of noisy add/remove pairs.
- release-diff compares releases either by raw prefix or by curated multi-prefix preset.
- manifest --url lets you crawl non-releases/ FlyBase directories such as genome FASTA/GFF trees.
- sync-url is the shortest path for genome assets once you know the directory URL.
- sync-genome is the shortest path when you know the FlyBase release + genome build label.
- sync-genome --preset ... is the preferred path for common genome asset pulls.
- some FlyBase .gff.gz assets are tar-wrapped gzip archives; loader handles that transparently.
- sql and query-run shape results as record-oriented JSON with summary metadata for agent chaining.
- pg-load stages the full Postgres import script for releases/<release>/psql/<release>.sql.gz.
- pg-load --execute runs the staged script when createdb and psql are installed locally.
- SQLite keeps setup minimal; switch to DuckDB/Postgres if you want bigger joins/faster scans.
- if you only need a few IDs, FlyBase Batch Download may be simpler than syncing files.
- use --no-header for files whose first non-comment row is data, not column names.
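
The ## metadata handling and the --no-header case can be sketched together. This is an illustration, not the loader itself: `read_flybase_tsv` is a hypothetical name, stripping a leading # from the header row is an assumption about FlyBase header style, and the col1, col2, ... fallback names are made up.

```python
import csv
import io


def read_flybase_tsv(stream, has_header=True):
    """Return (header, rows) from a TSV stream, skipping ## metadata lines."""
    data_lines = (line for line in stream if not line.startswith("##"))
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return [], []
    if has_header:
        # Assumption: header cells may carry a leading "#", e.g. "#FBgn_ID".
        header = [cell.lstrip("#").strip() for cell in rows[0]]
        return header, rows[1:]
    # No header row: synthesize positional names and keep every row as data.
    header = [f"col{index + 1}" for index in range(len(rows[0]))]
    return header, rows


sample = "## made-up metadata line\n#FBgn_ID\tSymbol\nFBgn0000001\tex\n"
print(read_flybase_tsv(io.StringIO(sample)))
```

For a gzipped file, the same function should work on a text-mode handle, for example one opened with `gzip.open(path, "rt")`.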
python3 -m unittest discover -s tests