Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'negative extents to matrix' from permuteGeneral(..., repetition=TRUE,...) #45

Closed
astrophys opened this issue Jul 10, 2023 · 3 comments
Closed

Comments

@astrophys
Copy link

astrophys commented Jul 10, 2023

Hello,

I'm debugging a colleague's R code and am perplexed by what exactly is causing the following error. I'm running this on Red Hat 8.6 with R/4.3.0 and RcppAlgos_2.7.2.

Fails for ng=16

library(RcppAlgos)
ng=16;
aa = permuteGeneral(v = c(0:ng), m=31-ng, repetition = TRUE, constraintFun = "sum", comparisonFun = c(">","<"), limitConstraints = c((ng)-1,(ng)+1))
Error in permuteGeneral(v = c(0:ng), m = 31 - ng, repetition = TRUE, constraintFun = "sum", :
negative extents to matrix
No traceback available
`

Works for ng=17

ng=17
aa = permuteGeneral(v = c(0:ng), m=31-ng, repetition = TRUE, constraintFun = "sum", comparisonFun = c(">","<"), limitConstraints = c((ng)-1,(ng)+1))`

Question :

  1. What is causing "negative extents to matrix"?

I feel like I may be reaching the bounds of an integer overflow, given how large the numbers are, but I'm perplexed about why ng=17 works fine.

@jwood000
Copy link
Owner

@astrophys ,

Thanks for posting. I believe you are right that this will boil down to an integer overflow problem.

I've started looking at it. I hope to have this complete by the end of the week.

Regards,
Joseph

@astrophys astrophys changed the title 'negative extents to matrix' from permuteGeneral(..., repitition=TRUE,...) 'negative extents to matrix' from permuteGeneral(..., repetition=TRUE,...) Jul 11, 2023
@jwood000
Copy link
Owner

@astrophys,

A few notes:

First, I found the faulty code.

const int vecLen = cnstrntVec.size();

Details

This algorithm is the most general and is employed when there is no closed form solution (i.e. we do not know the number of results upfront). Because of this, we rely heavily on std::vector from the STL in C++. In particular, when we find a new result, we call push_back. Generally, array like containers are limited by the integer type std::size_t and not plain old vanilla int which you see in the code above. On most platforms std::size_t is 8 bytes (max size of 2^64 (unsigned)) whereas int is only 4 bytes (max size of 2^31 - 1 (signed)).

In your example, we end up generating 145422675 results and since there are 15 elements per result, we end up with a vector size of 145422675 * 15 = 2181340125, which is barely larger than 2^31 - 1 = 2147483647. When the code referenced above runs, we end up with overflow and vecLen becomes negative, hence the error you see.

This is all fixed now by #46.

A Better Way

As I was investigating your problem, I couldn't help but notice that these are simply weak integer compositions. RcppAlgos has highly optimized algorithms specifically for this. Plus, the results are in lexicographical order. See Integer Compositions for more information.

library(RcppAlgos)

ng <- 17

system.time({
    aa <- permuteGeneral(v = c(0:ng), m=31-ng, repetition = TRUE,
                         constraintFun = "sum", comparisonFun = c(">","<"),
                         limitConstraints = c((ng)-1,(ng)+1))
})
#>   user  system elapsed 
#>  4.339   1.439   5.798 
  
system.time({
    bb <- compositionsGeneral(
        v = c(0:ng), m=31-ng, repetition = TRUE, weak = TRUE
    )
})
#>   user  system elapsed 
#>  1.032   0.369   1.405

dim(aa)
#> [1] 119759850        14

dim(bb)
#> [1] 119759850        14

An Even Better Way

Since you are dealing with so many results, a better option is to use iterators. Right now, we are needing over 12 Gb of memory to obtain all of your results:

print(object.size(bb), unit = "Gb")
#> 6.2 Gb

We need 6.2 Gb for the finished product and about the same for the intermediate vector.

With iterators, we can generate n at a time just as fast while keeping memory low:

system.time({
    cc <- compositionsIter(
        v = c(0:ng), m=31-ng, repetition = TRUE, weak = TRUE
    )
    
    iter <- cc@nextIter()
    total <- 1L
    
    while (!is.null(iter)) {
        iter <- cc@nextNIter(1e5) ## Generate 100,000 at a time
        ## Your processing code here
        if (!is.null(iter)) total <- total + nrow(iter)
    }
    
    print(total)
})
#> [1] 119759850
#>    user  system elapsed 
#>   1.021   0.424   1.443

We get the same number of results in about the same time while only using a fraction of the memory:

cc@startOver()
print(object.size(cc@nextNIter(1e5)), units = "Mb")
#> 5.3 Mb

With the additional iterator fix as part PR #46, we can even use the permuteIter to achieve your result:

system.time({
    dd <- permuteIter(
        v = c(0:ng), m=31-ng, repetition = TRUE,
        constraintFun = "sum", comparisonFun = c(">","<"),
        limitConstraints = c((ng)-1,(ng)+1)
    )
    
    iter <- dd@nextIter()
    total <- 1L
    
    while (!is.null(iter)) {
        iter <- dd@nextNIter(1e5) ## Generate 100,000 at a time
        ## Your processing code here
        if (!is.null(iter)) total <- total + nrow(iter)
    }
    
    print(total)
})
#> No more results.

#> [1] 119759850
#>    user  system elapsed 
#>   4.483   0.606   5.087

Again, not as efficient, but you still don't have to deal with memory issues.

This fix should be on CRAN in about a week or so. I'm waiting for my recent publication to go through all of the checks before I submit this. In the meantime, you can check out the development version to use this new update.

Please let me know if you have any questions.

Regards,
Joseph

@astrophys
Copy link
Author

Thank you for the quick response to this issue; I really appreciate it. I'll offer your suggestions to my colleague. I just installed the development version and the fix works for me as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants