Getting different results for Sample #141

Closed
shokoohi opened this issue Dec 8, 2022 · 2 comments

shokoohi commented Dec 8, 2022

Hello,
I ran the following code, but R's sample() and the RcppArmadillo version return different results for the same seed:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadilloExtensions/sample.h>
using namespace Rcpp ;
// [[Rcpp::export]]
CharacterVector csample_char( CharacterVector x, 
                              int size,
                              bool replace, 
                              NumericVector prob = NumericVector::create()
) {
    CharacterVector ret = RcppArmadillo::sample(x, size, replace, prob) ;
    return ret ;
}


/*** R
N <- 10
set.seed(7)
sample.r <- sample(letters, N, replace=T)
print(sample.r)
set.seed(7)
sample.c <- csample_char(letters, N, replace=T)
print(sample.c)
print(identical(sample.r, sample.c))
*/

Here is what I get:

> N <- 10
> set.seed(7)
> sample.r <- sample(letters, N, replace=T)
> print(sample.r)
 [1] "j" "s" "g" "b" "o" "z" "v" "h" "x" "c"
> set.seed(7)
> sample.c <- csample_char(letters, N, replace=T)
> print(sample.c)
 [1] "z" "k" "d" "b" "g" "u" "i" "z" "e" "l"
> print(identical(sample.r, sample.c))
[1] FALSE
@eddelbuettel
Member

Wow. That is an excellent issue ticket. The only thought I had -- and it turned out to be the right one -- is that the original article on this sample() approach via RcppArmadillo is old enough to be affected by the change to the default sampling method made in R 3.6.0. See under 'Details' in help(set.seed):

 ‘sample.kind’ can be ‘"Rounding"’ or ‘"Rejection"’, or partial
 matches to these.  The former was the default in versions prior to
 3.6.0: it made ‘sample’ noticeably non-uniform on large
 populations, and should only be used for reproduction of old
 results.  See PR#17494 for a discussion.

When I switch R to the "Rounding" sampler and rerun your code, sample() returns the same sample as sample.c:

> RNGkind(sample.kind="Rounding")
Warning message:
In RNGkind(sample.kind = "Rounding") : non-uniform 'Rounding' sampler used
> set.seed(7)
> (sample.r <- sample(letters, N, replace=T))
 [1] "z" "k" "d" "b" "g" "u" "i" "z" "e" "l"
> print(identical(sample.r, sample.c))
[1] TRUE
> 
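
To illustrate why the two sampler kinds disagree under the same seed, here is a minimal standalone C++ sketch of the two index-selection strategies. It is a simplification, not R's or RcppArmadillo's actual code: `Uniform01`, `index_rounding`, `index_rejection`, and `sample_letters` are hypothetical names, and `std::mt19937` is only a stand-in for R's `unif_rand()`, so the letters it yields will not match the output above. The point is that the rounding approach maps each uniform draw straight to an index, while the rejection approach maps it onto a power-of-two range and retries on a miss, so the same stream of draws produces a different sequence of indices.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <string>

// Hypothetical stand-in for R's unif_rand(): uniform on [0, 1).
// It does NOT reproduce R's random number stream.
struct Uniform01 {
    std::mt19937 gen;
    explicit Uniform01(unsigned seed) : gen(seed) {}
    double operator()() { return gen() / 4294967296.0; }  // 32-bit draw / 2^32
};

// "Rounding"-style index: scale a single uniform draw and truncate.
int index_rounding(Uniform01& u, int n) {
    assert(n > 0);
    return static_cast<int>(std::floor(n * u()));
}

// "Rejection"-style index: draw a value below the next power of two
// and retry until it falls inside [0, n).
int index_rejection(Uniform01& u, int n) {
    assert(n > 0 && n <= (1 << 30));
    int range = 1;
    while (range < n) range *= 2;  // smallest power of two >= n
    int candidate;
    do {
        candidate = static_cast<int>(std::floor(range * u()));
    } while (candidate >= n);
    return candidate;
}

// Sample `size` lowercase letters with replacement using either strategy.
std::string sample_letters(unsigned seed, int size, bool rejection) {
    const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    const int n = static_cast<int>(letters.size());
    Uniform01 u(seed);
    std::string out;
    for (int i = 0; i < size; ++i) {
        int idx = rejection ? index_rejection(u, n) : index_rounding(u, n);
        out += letters[idx];
    }
    return out;
}
```

Calling `sample_letters(7, 10, false)` and `sample_letters(7, 10, true)` uses the same seed for both strategies. Each call is reproducible on its own, but the two strings will generally differ, which mirrors the mismatch between the R and C++ results reported above.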

I will add a note to the article.

@eddelbuettel
Member

I just added a short note to the article. Thanks again for pointing this out.
