Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix smoothing bug when signal is shorter than a window #590

Merged
merged 6 commits into from
May 11, 2021

Conversation

tskir
Copy link
Collaborator

@tskir tskir commented Apr 5, 2021

Closes #587. Thank you @tetedange13 for an amazingly detailed investigation which helped a lot!

In Savitzky-Golay smoothing, for certain combinations of parameters, a (wing-padded) signal length can turn out to be shorter than the smoothing window requested, which is mathematically impossible to compute and causes an index mismatch exception downstream.

This change avoids this by checking for this condition and reducing the smoothing window length and, if necessary, also the polynomial order. This ensures (compared to e.g. returning the original signal) that at least some sensible smoothing is still being performed.

@tskir
Copy link
Collaborator Author

tskir commented Apr 5, 2021

@etal I'm reasonably certain that this should be a sound solution (and I also verified that it definitely fixes #587), but perhaps it would be useful if you could double-check that the math checks out, as it's not quite my area of expertise

@etal
Copy link
Owner

etal commented Apr 5, 2021

Interesting, I think I meant for check_inputs to do this operation internally, but apparently it doesn't.

Would you like to attempt merging check_inputs and check_smoothing_parameters into a single function? If not, and you're happy with the results of this function, then I'm fine with merging this PR as it is.

@tetedange13
Copy link
Contributor

This PR should also fix a tinu bug I got from running cnvkit.py scatter --by-bin --trend my_patient.cnr, where chrY crashes due to --trend flag

Full traceback here: traceback.txt
=> It is a bit different from #587, IMHO core problem is the same

Here is a minimal".cnr" that reproduce the bug: sample.cnr.txt

Hope this helps.
Thanks again.
Felix.

In Savitzky-Golay smoothing, for certain combinations of parameters, a
(wing-padded) signal length can turn out to be shorter than the
smoothing window requested, which is mathematically impossible to
compute and causes an index mismatch exception downstream.

This change avoids this by checking for this condition and reducing the
smoothing window length and, if necessary, also the polynomial order.
This ensures (compared to e.g. returning the original signal) that at
least some sensible smoothing is still being performed.
@tskir
Copy link
Collaborator Author

tskir commented Apr 24, 2021

@etal Thank you for your comments! I've now investigated this further and here's what I found.

You are right: check_inputs already checks for this condition via _width2wing, and for short signals the wing size is correctly reduced when necessary:

elif width >= 2 and int(width) == width:
# Ensure window width <= len(x) to avoid TypeError
width = min(width, len(x) - 1)
wing = int(width // 2)

The problem is how these results are then used. After calling check_inputs, all functions except savgol() recalculate the final window width as 2 * total_wing + 1, which will always work.

Because savgol() was not taking that into account, this could potentially cause two distinct problems:

  • For short signals, the originally requested total_width is too big;
  • And for really short signals, even the one-iteration window width (7 by default) is too big.

I have now rewritten this PR to properly address both of them. An additional function is no longer required, as well as any modifications to check_inputs.

@tskir tskir requested a review from etal April 24, 2021 01:30
@tskir
Copy link
Collaborator Author

tskir commented Apr 24, 2021

@tetedange13 Thank you for the additional test file! I've verified that the latest version of this PR should indeed fix the scatter crash.

Copy link
Owner

@etal etal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good. Thanks!

@etal etal merged commit b05e57b into etal:master May 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CNVkit v0.9.8] Errored smoothing with HAAR-segmentation ?
3 participants