
Hard coded additional GC content filter in fix.py module #738

Closed
tsivaarumugam opened this issue May 5, 2022 · 3 comments

@tsivaarumugam

Hello,

In CNVkit version 0.9.8 and above, there is a hard-coded GC content filter at line 125 of the fix.py module, in the definition of mask_bad_bins.

I understand the reason behind the filter and have read the supporting publication, but this new condition filters out approximately 10% of our regions of interest.

Would it be possible to tie this filtering condition to the existing --no-gc command-line flag, and to move the hard-coded values into the params.py module alongside the existing hard-coded values, so that users can toggle the filter on or off without modifying the code?

Attaching the screenshots for reference.

Please take a look and let me know.

[Screenshot: mask_bad_bins code snippet, CNVkit version 0.9.6]

[Screenshot: mask_bad_bins code snippet, CNVkit version 0.9.8]

Thanks

@tetedange13
Contributor

Hi @tsivaarumugam,

I took a look, and I do not think it is possible to tie the hard-coded GC filter you mention to the --no-gc param
=> Simply because they do not control the same thing:

  • The --no-gc param toggles whether or not correction for GC bias is performed (fix_gc == False when running fix --no-gc):

    cnvkit/cnvlib/fix.py

    Lines 86 to 90 in e29ec7d

    if fix_gc:
        if 'gc' in ref_matched:
            logging.info("Correcting for GC bias...")
            cnarr = center_by_window(cnarr, .1, ref_matched['gc'])
            cnarr_index_reset = True

  • Whereas mask_bad_bins() is always run, in order to filter out bad bins (i.e. ones with extreme GC-content values, as shown in your 2nd screenshot):

    ok_cvg_indices = ~mask_bad_bins(ref_matched)

But I can probably submit a PR adding a new EXTREME_GC_FRACTION param to params.py, as you suggested
=> You would then just have to set it to 1 to disable GC-masking of bad bins
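
To make the idea concrete, here is a rough sketch of how such a param could drive the masking. This is not CNVkit's code: the gc_mask helper, the 0.7 default and the symmetric lower bound (1 - EXTREME_GC_FRACTION) are all assumptions made for illustration only.

```python
import numpy as np

# Hypothetical setting standing in for a value that would live in params.py.
# The 0.7 default is illustrative, not taken from CNVkit.
EXTREME_GC_FRACTION = 0.7


def gc_mask(gc, extreme_gc_fraction=EXTREME_GC_FRACTION):
    """Return a boolean array that is True where a bin's GC fraction is extreme.

    Assumption: a bin is "extreme" if its GC fraction is above the threshold
    or below (1 - threshold), i.e. the two bounds are symmetric around 0.5.
    """
    gc = np.asarray(gc, dtype=float)
    upper = extreme_gc_fraction
    lower = 1.0 - extreme_gc_fraction
    return (gc > upper) | (gc < lower)


gc_values = [0.15, 0.35, 0.50, 0.68, 0.82]
print(gc_mask(gc_values))       # masks only the first and last bins
print(gc_mask(gc_values, 1.0))  # a threshold of 1 masks nothing
```

With the threshold at 1, the bounds become 0 and 1, so no valid GC fraction falls outside them and the GC part of the masking is effectively switched off.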

Hope this helps!
Have a nice day.
Felix.

@tsivaarumugam
Author

tsivaarumugam commented Jun 10, 2022 via email

@etal
Owner

etal commented Aug 31, 2022

Hi Felix and Siva, thanks for your attention to this. I like the solution of specifying the thresholds in params.py to ensure they're used consistently across the analysis (#753), rather than specifying a command-line option each time (#752). I'll comment in those PRs.
