Handle an edge case when requested region is not in fragment file #1348
Motivation

When GetReadsInRegion is called for a region that no fragments overlap, Rsamtools::scanTabix does not return an empty result. Instead, it returns a named list with a single 'region' named NA that lists every fragment in the file. When TabixOutputToDataFrame then processes that output, it incorrectly treats this named list element as a real overlapping region.

Solution

- Within GetReadsInRegion, don't bother scanning the tabix file and converting the output with TabixOutputToDataFrame if the requested region and the fragment file have no contigs in common. Return a dataframe with no entries.
- Within TabixOutputToDataFrame, correctly check whether there really are no overlaps. If there are none, return an empty reads dataframe.
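The second check can be sketched in base R roughly as follows. This is a minimal illustration, not the code in this PR: `empty_reads` and `tabix_list_to_df` are hypothetical helpers, the column names are assumed, and the input is a hand-built list that mimics the shape described above (region names as list names, tab-separated fragment lines as elements).

```r
# Hypothetical helper: an empty reads dataframe with assumed column names.
empty_reads <- function() {
  data.frame(
    chr = character(0),
    start = integer(0),
    end = integer(0),
    cell = character(0),
    count = integer(0),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-in for the conversion step: takes a named list shaped
# like the scanTabix output described above and returns a dataframe.
tabix_list_to_df <- function(tbx) {
  # Keep only elements that are named after a real region and contain lines.
  keep <- !is.na(names(tbx)) & lengths(tbx) > 0
  tbx <- tbx[keep]
  if (length(tbx) == 0) {
    # No genuine overlaps: return the empty dataframe instead of parsing.
    return(empty_reads())
  }
  lines <- unlist(tbx, use.names = FALSE)
  read.table(
    text = lines, sep = "\t", header = FALSE,
    col.names = names(empty_reads()),
    colClasses = c("character", "integer", "integer", "character", "integer"),
    stringsAsFactors = FALSE
  )
}

# Mock of the edge case: one element named NA holding chr22 fragments.
no_overlap <- setNames(
  list(c("chr22\t100\t200\tAAAC\t1", "chr22\t300\t400\tAAAG\t2")),
  NA_character_
)
# Mock of a genuine hit on chr22.
hit <- list("chr22:1-1000" = c("chr22\t100\t200\tAAAC\t1",
                               "chr22\t300\t400\tAAAG\t2"))

nrow(tabix_list_to_df(no_overlap))
nrow(tabix_list_to_df(hit))
```

Filtering on the list names rather than on the list length is the point of the sketch: in the edge case the list is not empty, so a length check alone would not catch it.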
Here's a code snippet to reproduce the error: I have a fragment_object that only has chr22 fragments. I use GetFragmentData to find all fragments that overlap a region on chr1 (spoiler alert, there shouldn't be any). The current version incorrectly outputs all [chr22] fragments. The patch correctly outputs no fragments.
Here's the output w/ Signac_1.9.0 (I just grabbed the first few lines):
Here's the output w/ the commit:
Here's my sessionInfo:
Hi @nrockweiler, thanks for the PR! This is an interesting edge case, not something that I've come across before. Thanks for the fix and clear explanation!
np @timoast. When I was debugging this, I came across another potential issue. I think there will be an error when the following happens:
This is because I think the fix would be to just initialize empty dataframes with a
Hi @nrockweiler, I agree, I have updated the function to make sure the structure of the dataframe is the same whether reads are present or not.
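To illustrate why a consistent structure matters, here is a small base R sketch. It is an assumption-laden example rather than the updated Signac function: `empty_fragments` is a hypothetical helper and the column names and types are chosen for illustration only.

```r
# Hypothetical helper: zero rows, but with the same (assumed) columns and
# types that a populated reads dataframe would have.
empty_fragments <- function() {
  data.frame(
    chr = character(0),
    start = integer(0),
    end = integer(0),
    cell = character(0),
    count = integer(0),
    stringsAsFactors = FALSE
  )
}

bare <- data.frame()        # zero rows and zero columns
typed <- empty_fragments()  # zero rows, five typed columns

# Column access differs: the bare dataframe has no 'cell' column at all,
# while the typed one yields an empty character vector.
is.null(bare$cell)
is.character(typed$cell)

# Downstream code can combine the typed empty result with real reads
# and still rely on the same column names.
some_reads <- data.frame(
  chr = "chr22", start = 100L, end = 200L, cell = "AAAC", count = 1L,
  stringsAsFactors = FALSE
)
combined <- rbind(typed, some_reads)
names(combined)
```

With the typed version, code that selects columns by name behaves the same way whether or not any reads were found.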