Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CNVkit v0.9.8] Errored smoothing with HAAR-segmentation ? #587

Closed
tetedange13 opened this issue Mar 24, 2021 · 2 comments · Fixed by #590
Closed

[CNVkit v0.9.8] Errored smoothing with HAAR-segmentation ? #587

tetedange13 opened this issue Mar 24, 2021 · 2 comments · Fixed by #590

Comments

@tetedange13
Copy link
Contributor

tetedange13 commented Mar 24, 2021

Hi,


First, thanks for this very useful tool and your investment developping it !

Informatic context: Running CNVkit v0.9.8 on CentOS 6 virtual machine, installed through:

  1. conda create -n my_env pip
  2. conda activate my_env && pip install cnvkit (+ conda installation of 'r-base' for DNAcopy R package)

Data:
Hybrid capture panel of genes and hotspots (capture probes coordinates used as manifest), somatic DNA (FFPE) with no normal sample
I was previously using CNVkit v0.9.6 on the same panel but with an amplicon tech. Segmenting with HAAR used to give better results than CBS, so I wanted to give it a try on my new data and with an upgraded CNVkit.
=> I should precise that using v0.9.8 on my hybrid-capture data with a CBS-segmentation went fine (so no installation issues I guess ?)

Exact command and error:
cnvkit.py batch my_patient.bam -p 1 -n --segment-method haar -f hg19.fa -t capture_probes.manifest.bed --drop-low-coverage
Most patient when fine, except for a couple, that raised following error:

  File "/home/bioinfo/miniconda3/envs/cnvkit_repo/lib/python3.9/site-packages/cnvlib/segmentation/haar.py", line 328, in HaarConv
      highNonNormed += signal[highEnd] * weight[highEnd] - signal[k-1] * weight[k-1]
IndexError: index 2 is out of bounds for axis 0 with size 2

(put complete traceback here for clarity)

Possible cause:
This error happens during segmentation of chrY and can be reproduced: cnvkit.py segment --drop-low-coverage -m haar my_patient.cnr (I think I can even share with you both a buggy and a fonctionning ".cnr" with equivalent chrY depth, if needed)
My panel has a single probe on SRY used as a control, which is "binned" by CNVkit into 1 antitarget and 1 target region within each .cnr as examplified here:

Y  2507140  2654431  Antitarget                               0        -10.8612  0.0001
Y  2654931  2655203  NM_003140|SRY|3PUTR|1;NM_003140|SRY|1/1  1286.11  0.110645  0.932772

  • Every patient "HAAR-segmentable" has a 0 mean_depth for SRY antitarget region and a "classic" depth for target region (> 1000X for male and around 0 for female)
    => low-covered antitarget is excluded lead to a 1-sized signal_array, so smoothing returns a np.zeros(signalSize, dtype=np.float_) thanks to this condition
    if stepHalfSize > signalSize:
  • In the contrary, my bogued patients have a very small depth at their antitarget SRY region (0.000536353, but not 0)
    => So antitarget is kept and this leads to a 2-sized signal_arr, next given to smoothing.savgol() via cnarr.smooth_log() and after "winging", we obtain a 4-sized signal_arr
    => It is here that something's wrong: Going through smoothing.convolve_weighted(), the array shape is messed up (7 for both signal_arr and weights_arr, when we expect them to be 4-shaped), leading to this "index out of range" error described abovehead

Possible fixes:
My understanding of these "smoothing things" is too limited to understand the precise origin of this bug, but here are some ideas:

  • In smoothing.savgol(), you have the following condition
    if len(x) < 2:

    => Including case where len(x)==2 should not have a huge impact on segmentation ?
  • I don't know where in the code regions from the ".cnr" with depth at 0 are excluded. But maybe you should insert a cutoff there, to exclude regions having a depth bellow a certain value ?

Thanks in advance for your help.
Best regards.
Felix.

@tetedange13 tetedange13 changed the title [CNVkit v0.9.8] Errored smoothing when HAAR-segmentation ? [CNVkit v0.9.8] Errored smoothing with HAAR-segmentation ? Mar 25, 2021
@tetedange13
Copy link
Contributor Author

My current hot fix is to return "UNsmoothed"(original) signal when a shape issue is detected at the end of smoothing:savgol():

bad_idx = (y > x.max()) | (y < x.min())
    if bad_idx.any():
        logging.warning("Smoothing overshot at {} / {} indices: "
                        "({}, {}) vs. original ({}, {})"
                        .format(bad_idx.sum(), len(bad_idx),
                                y.min(), y.max(),
                                x.min(), x.max()))

# >> START MODIF
if y[total_wing:-total_wing].shape != x.shape: # FE ADD:
        print("[WARNING]: array shape mess up --> returnin UNsmoothed values")
        return x 
# << END MODIF

return y[total_wing:-total_wing]

@tskir
Copy link
Collaborator

tskir commented Apr 5, 2021

Hi @tetedange13, thank you for this amazingly detailed bug report! I've investigated this and submitted a PR with what I think might be a permanent solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants