Excluded regions in cnvkit.py access don't work #574

Closed
dajana17 opened this issue Feb 24, 2021 · 2 comments · Fixed by #581

Comments

@dajana17

Hi,

When I pass a region on one chromosome that I want to exclude, the access command excludes that region from all chromosomes. This can be seen in the output file, access-excludes.test.bed.
I ran the command on dummy files (5 chromosomes, each with start 0 and end 1000):

cnvkit.py access test.fa -x excludes.bed -o access-excludes.test.bed

For example, with this excludes.bed:
chr2 0 200

I get access-excludes.test.bed:
chr1 200 1000
chr2 200 1000
chr3 200 1000
chr4 200 1000
chr5 200 1000

Also, it is only possible to exclude regions at the start or at the end of a chromosome. If I pass a region from the middle, nothing changes in access-excludes.test.bed.

excludes.bed
chr2 200 400

access-excludes.test.bed
chr1 0 1000
chr2 0 1000
chr3 0 1000
chr4 0 1000
chr5 0 1000

Any help would be appreciated.

@tskir
Collaborator

tskir commented Mar 18, 2021

Hi, thank you for reporting this! This is indeed a bug; I've submitted a PR to fix it in #581.

The bug only manifests when every chromosome in excludes.bed appears exactly once. So until the PR is merged, you can work around the bug by duplicating a row (any row) in that file. For example, if you populate it with:

chr2 0 200
chr2 0 200

Then it will work.
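
If the exclusion file is generated by a script, the duplication can be automated. The following is only a sketch of that workaround: the helper name `duplicate_first_row` and the output file name are made up for illustration and are not part of CNVkit.

```python
from pathlib import Path


def duplicate_first_row(src, dst):
    """Copy a BED file to dst, appending a duplicate of its first row.

    Hypothetical helper (not part of CNVkit). Blank lines are dropped;
    every other line is copied unchanged.
    """
    rows = [ln for ln in Path(src).read_text().splitlines() if ln.strip()]
    if not rows:
        raise ValueError(f"{src} contains no rows to duplicate")
    rows.append(rows[0])  # any row would do; the first is simplest
    Path(dst).write_text("\n".join(rows) + "\n")
    return rows
```

For instance, `duplicate_first_row("excludes.bed", "excludes.dup.bed")` writes a new file that can then be passed to `-x` in place of the original.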

@tskir
Collaborator

tskir commented Mar 18, 2021

And regarding your second question about the region in the middle of the chromosome: this actually works as intended. If the remaining access regions are separated by less than MIN_GAP_SIZE, they are joined. So when [200, 400] is subtracted from [0, 1000], you get two regions [0, 200] and [400, 1000], which are only separated by 200 bases, so they are joined back together into [0, 1000].

The default value for MIN_GAP_SIZE is 5000 bases.
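
To make the subtract-then-join behaviour concrete, here is a minimal sketch in plain Python. It is not CNVkit's actual implementation: the function names `subtract` and `join_close` are invented for illustration, and regions are modelled as simple (start, end) tuples on a single chromosome.

```python
MIN_GAP_SIZE = 5000  # default value mentioned above


def subtract(region, exclude):
    """Remove `exclude` from `region`; both are (start, end) tuples."""
    start, end = region
    ex_start, ex_end = exclude
    pieces = []
    if ex_start > start:  # something survives to the left of the exclusion
        pieces.append((start, min(ex_start, end)))
    if ex_end < end:  # something survives to the right of the exclusion
        pieces.append((max(ex_end, start), end))
    return pieces


def join_close(regions, min_gap=MIN_GAP_SIZE):
    """Merge regions separated by fewer than `min_gap` bases."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] < min_gap:
            # Gap too small: extend the previous region instead
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


pieces = subtract((0, 1000), (200, 400))
print(pieces)                           # [(0, 200), (400, 1000)]
print(join_close(pieces))               # [(0, 1000)]
print(join_close(pieces, min_gap=100))  # [(0, 200), (400, 1000)]
```

With 1000-base dummy chromosomes, any exclusion in the middle leaves a gap shorter than 5000 bases, so the two pieces are always joined back together. An exclusion touching the start or the end leaves only one piece, so there is nothing to join and the exclusion stays visible.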

@etal etal closed this as completed in #581 Apr 5, 2021