Releases: scverse/gget

v0.30.6 - bugs and fallbacks

11 Jun 02:11

  • gget blat: Improved resilience against UCSC BLAT endpoint failures (fixes intermittently failing tests).
    • Added retry-with-exponential-backoff for transient failures (HTTP 429/5xx, network errors, and non-JSON 200 responses caused by UCSC rate-limiting or HTML error pages). Up to 4 attempts with 1.5s → 3s → 6s backoff.
    • Replaced the misleading "sequence too short or assembly invalid" message with the actual server response (status code, response preview) so failures are diagnosable.
    • HTTPError and URLError are now caught explicitly instead of bubbling up as unhandled exceptions.
  • Bug fixes:
    • gget cosmic: Fixed misleading error message when the download step fails — was reporting the previous command's return code/stderr instead of the failing command's.
    • gget cosmic: Narrowed the JSON parse exception handler to json.JSONDecodeError so unrelated ValueErrors are no longer masked by the "Failed to download file" message.
    • gget --version, gget --help, gget invoked with no arguments, and gget <module> with no further arguments now all exit with status 0 instead of 1, so CI scripts and shell pipelines no longer treat these informational outputs as failures.
    • Added request timeouts to previously-unguarded requests calls in gget ref, gget info, gget 8cube, gget enrichr, and gget opentargets. Default is 10s connect / 60s read; configurable via the new DEFAULT_REQUESTS_TIMEOUT constant.
    • Narrowed a bare except: in utils.get_uniprot_seqs to (KeyError, IndexError, TypeError) so unrelated errors (including KeyboardInterrupt) are no longer swallowed.
    • Added utils.http_json() and utils.dig() helpers that issue a request and parse JSON / walk a nested response path with consistent error reporting. Migrated gget bgee, gget opentargets, and one .json() callsite in gget virus to use them; remaining modules will migrate opportunistically. Upstream HTML error pages, malformed JSON, and missing response keys now surface as clear RuntimeErrors naming the failing service instead of cryptic JSONDecodeError / KeyError tracebacks.
    • utils.http_json() now retries transient failures (connection errors, read timeouts, HTTP 5xx) up to 3 times with exponential backoff. Smooths over short upstream blips (e.g. bgee.org read timeouts) without affecting 4xx errors, which still raise immediately.
    • gget virus: Replaced 11 bare except: pass blocks around file.close() / os.remove() cleanup calls with narrowed except OSError handlers that log the failure at DEBUG. Previously, real I/O issues during cleanup (disk full, permissions) were silently dropped and the cleanup path also swallowed KeyboardInterrupt.
    • gget cbio: Fixed a code path in cbio_plot that called the removed-in-pandas-2.0 DataFrame.append() inside a loop when filling missing CNA genes — the entire branch crashed on modern pandas. It now builds a single DataFrame of missing rows and concatenates once.
  • Performance:
    • utils.get_uniprot_seqs: Collect per-ID DataFrames in a list and pd.concat(..., ignore_index=True) once at the end, avoiding the O(n²) cost of growing a DataFrame inside the request loop.
    • Cached utils.find_latest_ens_rel, utils.search_species_options, utils.ref_species_options, and utils.find_nv_kingdom with functools.lru_cache. These hit Ensembl FTP listings that are stable for a release; repeated calls within one Python process are now free.
    • Added utils.parallel_map, a thin ThreadPoolExecutor wrapper for I/O-bound work. Used to fan out utils.get_uniprot_seqs across the input ID list — looking up N IDs is now bounded by ~N / pool_size UniProt round-trips instead of N. Pool size defaults to 8 and can be overridden via the GGET_MAX_WORKERS environment variable.
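The retry behaviour described for gget blat and utils.http_json() can be sketched roughly as follows. This is a minimal illustration, not gget's actual code: the name retry_with_backoff and its parameters are hypothetical, and only the attempt count and the 1.5s → 3s → 6s delays are taken from the notes above.

```python
import time
from urllib.error import HTTPError, URLError


def retry_with_backoff(func, attempts=4, base_delay=1.5, sleep=time.sleep):
    """Hypothetical helper: call func(), retrying transient failures.

    Waits base_delay * 2**i seconds after failed attempt i
    (1.5s, 3s, 6s with the defaults), then re-raises on the last attempt.
    """
    for attempt in range(attempts):
        last_attempt = attempt == attempts - 1
        try:
            return func()
        except HTTPError as err:
            # HTTPError is a subclass of URLError, so it is handled first.
            # Only 429 and 5xx are treated as transient; other 4xx raise now.
            transient = err.code == 429 or 500 <= err.code < 600
            if not transient or last_attempt:
                raise
        except (URLError, TimeoutError, ValueError):
            # ValueError also covers json.JSONDecodeError, i.e. a 200
            # response whose body is an HTML error page instead of JSON.
            if last_attempt:
                raise
        sleep(base_delay * 2 ** attempt)
```

The sleep function is a parameter only so that the delays can be observed without actually waiting; callers would normally leave it at time.sleep.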

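As an illustration of the performance items in v0.30.6, the sketch below shows a thread-pool map and a cached lookup. It is an assumption-laden approximation rather than the gget implementation: the body of parallel_map and the cached_lookup function are hypothetical, and only the default pool size of 8, the GGET_MAX_WORKERS variable, and the use of functools.lru_cache come from the release notes.

```python
import functools
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, items, max_workers=None):
    """Hypothetical sketch: apply func to items in threads, keeping order."""
    if max_workers is None:
        # Default of 8 and the variable name are from the release notes.
        max_workers = int(os.environ.get("GGET_MAX_WORKERS", "8"))
    items = list(items)
    if not items:
        return []
    workers = max(1, min(max_workers, len(items)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(func, items))


lookups = []  # records how often the "slow" lookup body actually runs


@functools.lru_cache(maxsize=None)
def cached_lookup(key):
    """Stand-in for a slow listing request; repeated calls hit the cache."""
    lookups.append(key)
    return key.upper()


# Stand-in workload: len() replaces a real per-ID network request.
results = parallel_map(len, ["P12345", "Q9Y6K9", "O00571"])
cached_lookup("homo_sapiens")
cached_lookup("homo_sapiens")  # served from the cache, body not re-run
```

Threads suit this kind of work because the time is spent waiting on the network rather than computing, so the pool mainly overlaps the waits.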
v0.30.5 - gget virus updates

24 May 01:02
39c7ac2

  • gget opentargets: Rewrote this module to reflect the new Open Targets API structure
    • Some output column/key names may differ to reflect the new API structure
    • Removed the --filter_mode argument
  • gget blast: Fixed compatibility with newer pandas versions (≥ 2.0) where pd.read_html() no longer accepts raw HTML strings directly, causing a FileNotFoundError / OSError: Filename too long error when parsing BLAST results
  • gget cosmic: Added overwrite and gzip arguments to internals.
  • gget virus updates

v0.30.3 - gget virus updates

27 Feb 02:17
7abba8c

Version ≥ 0.30.3 (Feb 26, 2026):

  • gget virus: New filtering options, quiet mode, and improved download reliability
    • Added --segment filter for segmented viruses (e.g., Influenza A segments like 'HA', 'NA', 'PB1')
    • Added --vaccine_strain filter to include or exclude vaccine strain sequences
    • Added --source_database filter to select sequences from 'genbank' or 'refseq' (replaces refseqOnly)
    • Added -q / --quiet flag to suppress progress information
    • Extended fallback strategies for improved download reliability on large datasets
    • Command summary file now includes software version

v0.30.2 - gget virus updates

08 Feb 19:23
d34e1ce

  • gget virus updates: Metadata streaming optimization, improved protein filtering, and enhanced error handling and retry logic
    • Metadata now streams to disk during fetch to prevent memory exhaustion on large datasets (100,000+ records)
    • Fixed metadata CSV mapping (camelCase → snake_case) for organism name, host, and collection date
    • Enhanced protein filtering for segmented viruses with improved FASTA header parsing
    • Added annotated=False option for filtering unannotated sequences
    • Added progress bars to batched sequence downloads
    • Fixed collection date naming bug
    • Improved error messages for invalid filter dates
    • Added enhanced retry attempts for virus name resolution
    • Added verbosity to influenza A and COVID-19 checking steps

v0.30.0 - gget virus & gget 8cube

20 Jan 01:21
163db87

  • NEW MODULES:

  • SECURITY IMPROVEMENTS:

    • Replaced os.system() calls that built commands from f-strings containing URLs from external APIs in gget/main.py
    • Replaced exec() with importlib.import_module() in gget setup for safer dynamic imports
    • Replaced shell=True subprocess calls with list-based arguments in gget muscle, gget diamond, and gget setup to prevent command injection
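The two safer patterns named above can be illustrated with the standard library alone. This is a generic sketch, not code from gget: the "untrusted" filename and the choice of the json module are placeholders chosen for the example.

```python
import importlib
import subprocess
import sys

# A value that would be dangerous if pasted into a shell command string.
untrusted = "example.fa; echo injected"

# List-based arguments: each element reaches the child process as one
# argv entry, so the ';' is never interpreted by a shell.
completed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
    check=True,
)
echoed = completed.stdout.strip()

# Dynamic import without exec(): the name is treated purely as a module
# name, so it cannot carry arbitrary statements.
module_name = "json"  # placeholder for a module chosen at runtime
module = importlib.import_module(module_name)
```

With shell=True and a formatted command string, the same input would have run a second command; with a list, it is passed through verbatim as data.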

v0.29.3 - BLAT access

11 Sep 22:32
e9fe5d5

Version ≥ 0.29.3 (Sep 11, 2025):

v0.29.2 - uv pip install gget

03 Jul 19:13
a1c6c88

Version ≥ 0.29.2 (Jul 03, 2025):

  • gget can now be installed using uv pip install gget
    • All package metadata (version, author, description, etc.) is now managed in setup.cfg for full compatibility with modern tools like uv, pip, and PyPI
    • gget now uses a minimal setup.py and is fully PEP 517/518 compatible
  • gget setup will now try to use uv pip install first for speed and modern dependency resolution, and fall back onto pip install if uv fails or is not available
    • Users are informed at each step which installer is being used and if a retry is happening
    • Note: Some scientific dependencies (e.g., cellxgene-census) may not yet support Python 3.12. If you encounter installation errors, try using Python 3.9 or 3.10. (The pip installation might also still succeed in these cases.)
  • All required dependencies are now listed in setup.cfg under install_requires, so installing gget with pip install . or uv pip install . will automatically install all dependencies

v0.29.1 - mutate and cosmic overhaul

21 Apr 23:19
0af4d0c

  • gget mutate:
    • gget mutate has been simplified: it now takes as input a list of mutations and the associated reference genome with its annotation information, and outputs the sequences with each mutation incorporated plus a short region of surrounding context. For the full functionality of the previous version, and how it fits into a novel variant screening pipeline, see the varseek repository being developed by members of the gget team at https://github.com/pachterlab/varseek.git.
    • Added additional information to returned data frames as described here: #169
  • gget cosmic:
    • Major restructuring of the gget cosmic module to adhere to new login requirements set by COSMIC
    • New arguments email and password were added so that users can supply their login credentials up front instead of being prompted for input during data download
    • Default changed: gget_mutate=False
    • Deprecated argument: entity
    • Argument mutation_class is now cosmic_project
  • gget bgee:
    • type="orthologs" is now the default, removing the need to specify the type argument when calling orthologs
    • Now allows querying multiple genes at once.
  • gget diamond:
    • Now supports translated alignment of nucleotide sequences to amino acid reference sequences using the --translated flag.
  • gget elm:
    • Improved server error handling.

v0.29.0 - cbio, opentargets, bgee and more

26 Sep 02:07
5589dee

  • New modules:
  • gget enrichr now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR
  • gget mutate:
    gget mutate will now merge identical sequences in the final file by default. Mutation creation was vectorized to decrease runtime. Improved flanking sequence check for non-substitution mutations to make sure no wildtype kmer is retained in the mutation-containing sequence. Addition of several new arguments to customize sequence generation and output.
  • gget cosmic:
    Added support for targeted as well as gene screens. The CSV file created for gget mutate now also contains protein mutation info.
  • gget ref:
    Added out file option.
  • gget info and gget seq:
    Switched to Ensembl POST API to increase speed (nothing changes in front end).
  • Other "behind the scenes" changes:

fixes #157
fixes #121
fixes #144
fixes #140
fixes #103

v0.28.6 - gget mutate, download_cosmic, fixes for Ensembl v112

03 Jun 06:05
4664916

  • New module: gget mutate
  • gget cosmic: You can now download entire COSMIC databases using the download_cosmic argument
  • gget ref: Can now fetch the GRCh37 genome assembly using species='human_grch37'
  • gget search: Adjust access of human data to the structure of Ensembl release 112 (fixes issue 129)