Loading the fst packages initializes the RNG even though it is not used #251
Comments
Hi @mlell, thanks for submitting your issue! Parameter .Random.seed is not set directly in I'm not sure I understand your samplertest_1 and samplertest_2 tests, doesn't that show that you can set the random seed and it's not changed by loading the (so loading the namespace only initializes the seed when it's not initialized yet?)
@MarcusKlik, samplertest_1 and samplertest_2 show that the RNG is not used by fst: no random number is actually generated during package init. This was meant to show that fst (or one of its dependencies) does not need the RNG for anything. But the example above (".Random.seed") shows that it (or a dependency) does initialize the RNG, in contrast to other packages like dplyr. (I chose dplyr because it makes heavy use of C code. I suspected that the RNG is initialized by Rcpp or similar, but apparently it isn't.) Yes, I also first suspected that RStudio initializes the RNG; that's why I included the "negative controls" and ran the scripts in base R to eliminate this source of error. I hope I managed to be clearer this time...
@MarcusKlik, @mlell, this is due to the functions called I did a quick check on a local checkout of branch A nice example of this is package qs, with NOTE: I am referring to the
@riccardoporreca, thank you for this research! So is it as simple as adding this attribute, or should we include tests that make sure that the RNG is not used? I don't know how to do that, though. First I thought of something similar to the script above:
set.seed(123) # we know that the first random number in {1..100} with this seed will be 31
C_function_to_test()
expect_equal(sample(1:100, 1), 31)
But would that work? If a C function uses the RNG without initializing it, maybe the RNG state will not be forwarded to R, so the test cannot detect that? Did I understand that correctly? Or am I overthinking this?
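An alternative to drawing a number and comparing it with a known value is to compare the RNG state itself before and after the call. The following is only a minimal sketch in plain R: the helper name rng_untouched is hypothetical, and it only detects changes that are written back to R's global .Random.seed. Whether C-level RNG use that never synchronizes its state with R would be caught is exactly the open question above, and this sketch does not settle it.

```r
# Hypothetical helper (not part of fst or testthat): returns TRUE if
# evaluating `expr` leaves the global .Random.seed unchanged, FALSE otherwise.
rng_untouched <- function(expr) {
  # Read the current RNG state, or NULL if the RNG is not initialized yet.
  get_seed <- function() {
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      get(".Random.seed", envir = globalenv(), inherits = FALSE)
    } else {
      NULL
    }
  }
  before <- get_seed()
  force(expr)  # `expr` is a lazy promise, so it is evaluated only here
  after <- get_seed()
  identical(before, after)
}

set.seed(123)
rng_untouched(sqrt(2))   # deterministic code: TRUE
rng_untouched(runif(1))  # draws a random number: FALSE
```

In a test suite, the call to check (for example the C_function_to_test() placeholder from the comment above) would be passed in place of sqrt(2).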
@mlell, I think one should just follow what the vignette recommends
I doubt package maintainers would have to rely on unit tests to know whether their C++ code uses the RNG. If it does (unless that is completely expected, as in C++ functions doing random things), this should be investigated and at least documented. Overall I think we can summarize the matter as follows, which makes it ultimately a matter of choice for @MarcusKlik.
Hi @riccardoporreca, great, thanks a lot for clearing that up, I didn't know about In that case, I think we can safely use the
Hi @mlell and @riccardoporreca, with the latest commit (in
exists(".Random.seed")
#> [1] FALSE
require(fstcore)
#> Loading required package: fstcore
exists(".Random.seed")
#> [1] FALSE
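To rule out RNG state left over from an interactive session or an IDE, the same check can be run in a fresh R process. This is only a sketch under assumptions: it presumes an Rscript executable under R.home("bin"), the helper name creates_seed is hypothetical, and the package name passed in must be installed.

```r
# Hypothetical helper: start a clean R process, load the namespace of `pkg`,
# and report whether .Random.seed exists afterwards.
creates_seed <- function(pkg) {
  code <- sprintf(
    paste0(
      'suppressMessages(loadNamespace("%s")); ',
      'cat(exists(".Random.seed", envir = globalenv(), inherits = FALSE))'
    ),
    pkg
  )
  out <- system2(
    file.path(R.home("bin"), "Rscript"),
    c("--vanilla", "-e", shQuote(code)),
    stdout = TRUE
  )
  # The last line printed by the child process is "TRUE" or "FALSE".
  identical(tail(out, 1), "TRUE")
}

# Example call with a base package; replace "stats" with the package to check.
creates_seed("stats")
```

Running the check in a child process keeps the result independent of whatever the calling session has already done with the RNG.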
To make sure that all my scripts are reproducible, I ensure that they either receive a command line argument that sets the RNG seed via set.seed() or that they do not use the RNG. As the R variable .Random.seed is defined on the first RNG run in a session, I throw an error if .Random.seed exists but no SEED was provided to the script.
However, loading the fst packages creates .Random.seed even though the RNG is not used. I show this in the script below: running sample() before and after loading the packages does not give a different result, but .Random.seed is created.
To provide a means of writing verifiably deterministic R scripts, fst should not initialize the RNG if it is not needed.
I installed the development version of fst using remotes::install_github("fstpackage/fst", ref = "59a18110").
This is the script I use:
And this is the script output in my shell ($ = my input):
You can see that .Random.seed is set by the loadNamespace("fst") call but not by (for example) the loadNamespace("dplyr") call. Moreover, a random number between 1 and 100 is chosen identically regardless of whether it is drawn before or after the loadNamespace("fst") call.