Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoiding infinite loop in bed_makewindows() and deprecate genome argument #312

Closed
kriemo opened this issue Dec 7, 2017 · 1 comment · Fixed by #313

Comments

@kriemo
Copy link
Member

@kriemo kriemo commented Dec 7, 2017

In bed_makewindows(), if an interval is much smaller than the requested number of windows (num_win arg), then makewindows_impl() will enter into an infinite loop and crash R.

Additionally, when an interval is smaller than num_win, the default output is to simply truncate the number of windows to match the interval size. This is an undesired behavior as it will lead to misleading summary statistics computed per .win_id.

bedtools handles these cases by skipping intervals that are smaller than than the requested number of windows and issuing a warning:
https://github.com/arq5x/bedtools2/blob/master/test/makewindows/test-makewindows.sh#L172

I propose to implement the same behavior.

Also, I've noticed that the genome input is unnecessary and can be deprecated. I can put together a PR to address these issues.

Here's the current behavior.

library(valr)
bed <- trbl_interval(~chrom, ~start, ~end,
              "chr1", 0, 10)
genome <- trbl_genome(~chrom, ~size,
                      "chr1", 1e3)

# requesting a larger number of windows that the interval size
# results in fewer bins than requested
bed_makewindows(bed, genome, num_win = 15)
#> # A tibble: 10 x 4
#>    chrom start   end .win_id
#>    <chr> <int> <int>   <int>
#>  1  chr1     0     1       1
#>  2  chr1     1     2       2
#>  3  chr1     2     3       3
#>  4  chr1     3     4       4
#>  5  chr1     4     5       5
#>  6  chr1     5     6       6
#>  7  chr1     6     7       7
#>  8  chr1     7     8       8
#>  9  chr1     8     9       9
#> 10  chr1     9    10      10

Instead, if we follow the bedtools approach the behavior would be as follows:

bed_makewindows(bed, genome, num_win = 15)
#> A tibble: 0 x 4
#> ... with 4 variables: chrom <chr>, start <int>, end <int>, .win_id <int>
Warning message:
In makewindows_impl(x, win_size, num_win, step_size, reverse) :
  WARNING: Interval "chr1":0-10, smaller than requested number of windows. skipping

Lastly, here's the condition that will cause an infinite loop.

# if the number of windows is far larger the interval size
# then will enter into an infinite loop with a push.back() call
# do not run!
#bed_makewindows(bed, genome, num_win = 1000)
@jayhesselberth

This comment has been minimized.

Copy link
Member

@jayhesselberth jayhesselberth commented Dec 7, 2017

Sounds good, PR is welcome

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.