Releases: scverse/gget
v0.30.6 - bugs and fallbacks
- `gget blat`: Improved resilience against UCSC BLAT endpoint failures (fixes intermittently failing tests).
  - Added retry with exponential backoff for transient failures (HTTP 429/5xx, network errors, and non-JSON 200 responses caused by UCSC rate-limiting or HTML error pages). Up to 4 attempts with 1.5s → 3s → 6s backoff.
  - Replaced the misleading "sequence too short or assembly invalid" message with the actual server response (status code, response preview) so failures are diagnosable.
  - `HTTPError` and `URLError` are now caught explicitly instead of bubbling up as unhandled exceptions.
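
As a rough illustration of the retry policy described above, the sketch below retries a JSON fetch up to 4 times with pauses of 1.5s, 3s, and 6s. It is not gget's actual implementation: `fetch_json_with_retry`, its parameters, and the injectable `fetch` and `sleep` hooks are hypothetical names chosen for this example. Only the standard-library `urllib` and `json` calls are real.

```python
import json
import time
import urllib.error
import urllib.request


def _default_fetch(url):
    # Returns (status, text). urlopen raises HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def fetch_json_with_retry(url, fetch=_default_fetch, max_attempts=4,
                          base_delay=1.5, sleep=time.sleep):
    """Hypothetical helper (not gget's API): fetch JSON, retrying transient failures."""
    last_failure = "no attempt made"
    for attempt in range(max_attempts):
        try:
            status, body = fetch(url)
        except urllib.error.HTTPError as err:  # must precede URLError, its parent class
            if err.code != 429 and err.code < 500:
                # Other 4xx errors are not transient, so fail immediately.
                raise RuntimeError(f"HTTP {err.code} from {url}") from err
            last_failure = f"HTTP {err.code}"
        except urllib.error.URLError as err:
            last_failure = f"network error: {err.reason}"
        else:
            if status == 429 or status >= 500:
                last_failure = f"HTTP {status}"
            else:
                try:
                    return json.loads(body)
                except json.JSONDecodeError:
                    # For example, an HTML error page served with status 200.
                    last_failure = f"non-JSON response (status {status}): {body[:80]!r}"
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1.5s, then 3s, then 6s
    raise RuntimeError(
        f"{url} failed after {max_attempts} attempts; last failure: {last_failure}"
    )
```

Including the last failure in the final error mirrors the goal stated above of surfacing the actual server response rather than a generic message.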
- Bug fixes:
  - `gget cosmic`: Fixed a misleading error message when the download step fails. It was reporting the previous command's return code/stderr instead of the failing command's.
  - `gget cosmic`: Narrowed the JSON parse exception handler to `json.JSONDecodeError` so unrelated `ValueError`s are no longer masked by the "Failed to download file" message.
  - `gget --version`, `gget --help`, `gget` invoked with no arguments, and `gget <module>` with no further arguments now all exit with status 0 instead of 1, so CI scripts and shell pipelines no longer treat these informational outputs as failures.
  - Added request timeouts to previously-unguarded `requests` calls in `gget ref`, `gget info`, `gget 8cube`, `gget enrichr`, and `gget opentargets`. Default is 10s connect / 60s read; configurable via the new `DEFAULT_REQUESTS_TIMEOUT` constant.
  - Narrowed a bare `except:` in `utils.get_uniprot_seqs` to `(KeyError, IndexError, TypeError)` so unrelated errors (including `KeyboardInterrupt`) are no longer swallowed.
  - Added `utils.http_json()` and `utils.dig()` helpers that issue a request and parse JSON / walk a nested response path with consistent error reporting. Migrated `gget bgee`, `gget opentargets`, and one `.json()` callsite in `gget virus` to use them; remaining modules will migrate opportunistically. Upstream HTML error pages, malformed JSON, and missing response keys now surface as clear `RuntimeError`s naming the failing service instead of cryptic `JSONDecodeError`/`KeyError` tracebacks.
  - `utils.http_json()` now retries transient failures (connection errors, read timeouts, HTTP 5xx) up to 3 times with exponential backoff. Smooths over short upstream blips (e.g. bgee.org read timeouts) without affecting 4xx errors, which still raise immediately.
  - `gget virus`: Replaced 11 bare `except: pass` blocks around `file.close()`/`os.remove()` cleanup calls with narrowed `except OSError` handlers that log the failure at `DEBUG`. Previously, real I/O issues during cleanup (disk full, permissions) were silently dropped and the cleanup path also swallowed `KeyboardInterrupt`.
  - `gget cbio`: Fixed a code path in `cbio_plot` that called `DataFrame.append()` (removed in pandas 2.0) inside a loop when filling missing CNA genes; the entire branch crashed on modern pandas. It now builds a single DataFrame of missing rows and concatenates once.
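
To make the idea of walking a nested response path with consistent error reporting concrete, here is a minimal sketch. The real signature of `utils.dig()` is not given in these notes, so `walk_path` and its `service` parameter are hypothetical stand-ins and should not be read as gget's API.

```python
def walk_path(data, path, service="upstream service"):
    """Hypothetical helper: follow `path` (keys and list indices) through nested JSON-like data."""
    current = data
    for depth, step in enumerate(path):
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError) as err:
            # Report how far the walk got and which service produced the response,
            # instead of letting a bare KeyError/IndexError reach the user.
            walked = " -> ".join(repr(p) for p in path[: depth + 1])
            raise RuntimeError(
                f"{service} returned an unexpected response: could not resolve {walked}"
            ) from err
    return current
```

A caller would pass the parsed JSON and the expected path, for example `walk_path(payload, ["data", "genes", 0, "id"], service="Example API")`, where the payload shape and the service name are made up for illustration.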
- Performance:
  - `utils.get_uniprot_seqs`: Collect per-ID DataFrames in a list and `pd.concat(..., ignore_index=True)` once at the end, avoiding the O(n²) cost of growing a DataFrame inside the request loop.
  - Cached `utils.find_latest_ens_rel`, `utils.search_species_options`, `utils.ref_species_options`, and `utils.find_nv_kingdom` with `functools.lru_cache`. These hit Ensembl FTP listings that are stable for a release; repeated calls within one Python process are now free.
  - Added `utils.parallel_map`, a thin `ThreadPoolExecutor` wrapper for I/O-bound work. Used to fan out `utils.get_uniprot_seqs` across the input ID list: looking up N IDs is now bounded by ~`N / pool_size` UniProt round-trips instead of `N`. Pool size defaults to 8 and can be overridden via the `GGET_MAX_WORKERS` environment variable.
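
A thin thread-pool wrapper of the kind described above could look roughly like the following. This is a sketch under assumptions: the name `parallel_map_sketch` and its parameters are hypothetical, and the actual signature of `utils.parallel_map` may differ. Only the `GGET_MAX_WORKERS` variable and the default of 8 come from the note above.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_map_sketch(func, items, max_workers=None):
    """Hypothetical stand-in for a thin ThreadPoolExecutor wrapper (I/O-bound work only)."""
    items = list(items)
    if not items:
        return []
    if max_workers is None:
        # Default pool size of 8, overridable through the environment.
        max_workers = int(os.environ.get("GGET_MAX_WORKERS", "8"))
    # Never start more threads than there are items, and always at least one.
    pool_size = max(1, min(max_workers, len(items)))
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Executor.map returns results in input order, regardless of completion order.
        return list(pool.map(func, items))
```

Threads suit this workload because each lookup spends most of its time waiting on the network, so the global interpreter lock is not the bottleneck.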
v0.30.5 - gget virus updates
- `gget opentargets`: Rewrote this module to reflect the new Open Targets API structure
  - Some output column/key names may differ to reflect the new API structure
  - Removed the `--filter_mode` argument
- `gget blast`: Fixed compatibility with newer pandas versions (≥ 2.0) where `pd.read_html()` no longer accepts raw HTML strings directly, causing a `FileNotFoundError` / `OSError: Filename too long` error when parsing BLAST results
- `gget cosmic`: Added overwrite and gzip arguments to internals.
- `gget virus` updates
v0.30.3 - gget virus updates
Version ≥ 0.30.3 (Feb 26, 2026):
- `gget virus`: New filtering options, quiet mode, and improved download reliability
  - Added `--segment` filter for segmented viruses (e.g., Influenza A segments like 'HA', 'NA', 'PB1')
  - Added `--vaccine_strain` filter to include or exclude vaccine strain sequences
  - Added `--source_database` filter to select sequences from 'genbank' or 'refseq' (replaces `refseqOnly`)
  - Added `-q/--quiet` flag to suppress progress information
  - Extended fallback strategies for improved download reliability on large datasets
  - Command summary file now includes software version
v0.30.2 - gget virus updates
- `gget virus` updates: Metadata streaming optimization, improved protein filtering, and enhanced error handling and retry logic
  - Metadata now streams to disk during fetch to prevent memory exhaustion on large datasets (100,000+ records)
  - Fixed metadata CSV mapping (camelCase → snake_case) for organism name, host, and collection date
  - Enhanced protein filtering for segmented viruses with improved FASTA header parsing
  - Added `annotated=False` option for filtering unannotated sequences
  - Added progress bars to batched sequence downloads
  - Fixed collection date naming bug
  - Improved error messages for invalid filter dates
  - Added enhanced retry attempts for virus name resolution
  - Added verbosity to influenza A and COVID-19 checking steps
v0.30.0 - gget virus & gget 8cube
- NEW MODULES:
- SECURITY IMPROVEMENTS:
  - Replaced `os.system()` calls built from f-strings containing URLs from external APIs in `gget/main.py`
  - Replaced `exec()` with `importlib.import_module()` in `gget setup` for safer dynamic imports
  - Replaced `shell=True` subprocess calls with list-based arguments in `gget muscle`, `gget diamond`, and `gget setup` to prevent command injection
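
The sketch below illustrates why list-based subprocess arguments and `importlib.import_module()` are safer than `shell=True` and `exec()`. It is a generic standard-library example, not code taken from gget; the helper names `echo_argument` and `load_module` are hypothetical.

```python
import importlib
import subprocess
import sys


def echo_argument(value):
    """Pass `value` to a child process as one argv entry, with no shell involved."""
    # With a list of arguments and the default shell=False, characters such as
    # ';' or '&&' in `value` are plain text and are never interpreted as commands.
    completed = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.rstrip("\n")


def load_module(name):
    """Import a module by name without building and exec()-ing a code string."""
    # import_module only imports; it cannot run arbitrary statements, unlike
    # exec(f"import {name}") given a crafted `name`.
    return importlib.import_module(name)
```

For instance, `echo_argument("hello; echo injected")` hands the whole string to the child process as a single argument, whereas the same string interpolated into a `shell=True` command line would run a second command.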
v0.29.3 - BLAT access
Version ≥ 0.29.3 (Sep 11, 2025):
- `gget blat`: Updated API request to new permissions.
- `gget pdb`: Added wwpdb mirror; falls back to rcsb if wwpdb fails.
- `gget cellxgene`: Improved argument handling; frontend unchanged. Fixes issue 181.
- `gget setup`/`gget alphafold`: Fixed pip_cmd bug in `gget.setup("alphafold")`
v0.29.2 - uv pip install gget
Version ≥ 0.29.2 (Jul 03, 2025):
- `gget` can now be installed using `uv pip install gget`
  - All package metadata (version, author, description, etc.) is now managed in setup.cfg for full compatibility with modern tools like uv, pip, and PyPI
  - gget now uses a minimal setup.py and is fully PEP 517/518 compatible
- `gget setup` will now try to use `uv pip install` first for speed and modern dependency resolution, and fall back to `pip install` if uv fails or is not available
  - Users are informed at each step which installer is being used and if a retry is happening
  - Note: Some scientific dependencies (e.g., cellxgene-census) may not yet support Python 3.12. If you encounter installation errors, try using Python 3.9 or 3.10. (The pip installation might also still succeed in these cases.)
- All required dependencies are now listed in setup.cfg under install_requires -> Installing gget with `pip install .` or `uv pip install .` will automatically install all dependencies
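
The uv-first, pip-fallback behaviour described above can be sketched as follows. This is an illustration only, not the code in `gget setup`: the function `install_with_fallback` and its injectable `run` and `which` parameters are hypothetical, and the exact commands gget issues may differ.

```python
import shutil
import subprocess
import sys


def install_with_fallback(package, run=subprocess.run, which=shutil.which):
    """Hypothetical helper: try `uv pip install` first, then fall back to pip."""
    candidates = []
    if which("uv") is not None:  # only try uv when it is on PATH
        candidates.append(("uv", ["uv", "pip", "install", package]))
    candidates.append(("pip", [sys.executable, "-m", "pip", "install", package]))

    for name, cmd in candidates:
        # Tell the user which installer is being used at each step.
        print(f"Installing {package} with {name}...")
        try:
            run(cmd, check=True)
        except (subprocess.CalledProcessError, OSError) as err:
            print(f"{name} failed ({err}); trying the next installer, if any.")
        else:
            return name
    raise RuntimeError(f"Could not install {package} with uv or pip")
```

The return value names the installer that succeeded, which makes the fallback easy to log or assert on.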
v0.29.1 - mutate and cosmic overhaul
- `gget mutate`:
  - gget mutate has been simplified to focus on taking as input a list of mutations and the associated reference genome with corresponding annotation information, and producing as output the sequences with the mutation incorporated and a short region of surrounding context. For the full functionality of the previous version and how it integrates in the context of a novel variant screening pipeline, visit the varseek repository being developed by members of the gget team at https://github.com/pachterlab/varseek.git.
  - Added additional information to returned data frames as described here: #169
- `gget cosmic`:
  - Major restructuring of the `gget cosmic` module to adhere to new login requirements set by COSMIC
  - New arguments `email` and `password` were added to let the user supply their login credentials directly, without being prompted for input during data download
  - Default changed: `gget_mutate=False`
  - Deprecated argument: `entity`
  - Argument `mutation_class` is now `cosmic_project`
- `gget bgee`:
  - `type="orthologs"` is now the default, removing the need to specify the `type` argument when calling orthologs
  - Allow querying multiple genes at once.
- `gget diamond`:
  - Now supports translated alignment of nucleotide sequences to amino acid reference sequences using the `--translated` flag.
- `gget elm`:
  - Improved server error handling.
v0.29.0 - cbio, opentargets, bgee and more
- New modules:
- `gget enrichr` now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR
- `gget mutate`:
  - `gget mutate` will now merge identical sequences in the final file by default. Mutation creation was vectorized to decrease runtime. Improved flanking sequence check for non-substitution mutations to make sure no wildtype kmer is retained in the mutation-containing sequence. Addition of several new arguments to customize sequence generation and output.
- `gget cosmic`:
  - Added support for targeted as well as gene screens. The CSV file created for gget mutate now also contains protein mutation info.
- `gget ref`:
  - Added out file option.
- `gget info` and `gget seq`:
  - Switched to Ensembl POST API to increase speed (nothing changes in front end).
- Other "behind the scenes" changes:
  - Unit tests reorganized to increase speed and decrease code
  - Requirements updated to allow newer mysql-connector versions
  - Support NumPy >= 2.0
v0.28.6 - gget mutate, download_cosmic, fixes for Ensembl v112
- New module: `gget mutate`
- `gget cosmic`: You can now download entire COSMIC databases using the `download_cosmic` argument
- `gget ref`: Can now fetch the GRCh37 genome assembly using `species='human_grch37'`
- `gget search`: Adjusted access of human data to the structure of Ensembl release 112 (fixes issue 129)