fix(#176): splice + var_filter parity between open() and with_settings()#179
Merged
Conversation
Three bugs identified: 1. with_settings(var_filter=...) doesn't propagate to _recon 2. choose_exonic_variants mis-indexes 2-D SVAR offsets 3. Existing regression test used wrong 2-D layout Plan: unify open/with_settings, fix kernel indexing, rebuild tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
6 tasks: rebuild 2-D regression test, fix choose_exonic_variants indexing, fix with_settings var_filter propagation, unify Dataset.open through with_settings, add end-to-end parity test, verify. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
One configuration path. Eliminates the open()/with_settings divergence that masked #176 — open() previously reached into Haps(filter=...) and constructed SpliceIndexer inline; with_settings had its own copy (broken for var_filter). Both now flow through with_settings.
SVAR offsets are constructed as offsets.reshape(2, -1) in Haps.from_path. geno_offsets[o_idx] returns a length-n_slices row, not a 2-tuple, producing garbage o_s/o_e -> MemoryError or negative-dim ValueError. Mirror filter_af's geno_offsets[:, o_idx]. Fixes #176 (kernel half).
…info)
Dataset.open(splice_info=...) now calls with_seqs("haplotypes") internally
before delegating to with_settings, so the caller does not need to
interleave with_seqs between open and with_settings. Prevents
_check_valid_state raising "Splicing is not supported with variants" when
splice_info is passed to open on a genotype+reference-backed dataset.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…utput End-to-end regression: state probe (filter on _recon, splice indexer present) and output equality on a small SVAR-backed dataset with per-region single-exon transcripts. Catches both halves of #176 — the kernel mis-indexing and the with_settings propagation gap. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s) SVAR layout The previous fixture used (total_variants, 2) which accidentally made geno_offsets[o_idx] return a 2-element row. Real SVAR offsets are (2, n_slices) per Haps.from_path; with n_slices > 2 the existing indexing returns a row that can't unpack into (o_s, o_e), exposing #176.
The next assertion (`is not None`) subsumes the `(None) == (None)` check. Review nit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ce-exonic-filter # Conflicts: # python/genvarloader/_dataset/_impl.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #176. Three concrete bugs, one refactor:
with_settings(var_filter=...)silently failed to propagate to_recon. After.with_seqs("haplotypes"),_reconis a separateHapsinstance with the user-chosenkind.__getitem__uses_recon, not_seqs, so updating only_seqsmeant the filter was a no-op for the code path that actually ran. (This is why the reporter's "Path B" appeared to work — it was producing un-filtered haplotypes.)choose_exonic_variantsmis-indexed 2-D SVAR offsets:geno_offsets[o_idx]returned a length-n_slicesrow, not a 2-tuple, producing garbageo_s/o_eandMemoryError/negative-dim ValueError. Mirror the siblingfilter_afkernel:geno_offsets[:, o_idx].(total_variants, 2)), which coincidentally made the wrong indexing return 2 elements per row. Rebuilt on the real(2, n_slices)SVAR layout withn_slices > 2so wrong indexing fails loudly.Dataset.opennow delegatessplice_info/var_filterconfiguration toself.with_settings(...), eliminating the two-code-path divergence that masked Dataset.open(splice_info, var_filter) breaks choose_exonic_variants; with_settings(splice_info, var_filter) works #176 in the first place.One in-flight scope expansion
After the refactor,
Dataset.open(splice_info=...)now reaches_check_valid_stateviawith_settings, which rejects splicing with the defaultsequence_type == "variants".Dataset.openauto-promotes towith_seqs("haplotypes")in that case (narrow guard: only whensplice_infois set,_seqsisHaps, andsequence_typeis the default "variants"). This eliminates a pre-existing silent footgun where the OLD code returned a Dataset in an invalid state until the user called.with_seqs("haplotypes")themselves.Files changed
python/genvarloader/_dataset/_genotypes.pychoose_exonic_variantspython/genvarloader/_dataset/_impl.pywith_settingspropagatesvar_filterto_recon;Dataset.opendelegates splice/filter towith_settings; auto-promote to haplotypes whensplice_infois suppliedtests/dataset/genotypes/test_choose_exonic_variants.py(2, n_slices)SVAR layouttests/dataset/test_with_settings_var_filter.pyvar_filter→_reconpropagationtests/dataset/test_open_vs_settings_parity.pyTest plan
pixi run -e dev pytest tests/dataset/genotypes/test_choose_exonic_variants.py— both pass on the real(2, n_slices)layoutpixi run -e dev pytest tests/dataset/test_with_settings_var_filter.py— direct_recon.filterpropagation probes passpixi run -e dev pytest tests/dataset/test_open_vs_settings_parity.py— both paths produce identical state and byte-for-byte identical[0,:]output on SVAR-backed datapixi run -e dev test— full pytest + cargo, 459 passed, 5 skipped, 2 xfailedpixi run -e dev ruff check python/ tests/— cleanOut of scope
min_af/max_af/var_fieldsrecon-rebuild block at_impl.py:427-434has a separate latent bug (clobbers_recon.kind). This fix deliberately does not touch it.🤖 Generated with Claude Code