Random number generators always use the same seed #176

Closed
rcurtin opened this Issue Dec 29, 2014 · 6 comments

Projects

None yet

1 participant

@rcurtin
Member
rcurtin commented Dec 29, 2014

Reported by march on 3 Mar 42036560 10:14 UTC
The boost random number generation stuff we're wrapping is always using the same seed on startup. Is there a simple way to make it use the system time or something, or at least allow the user to specify a seed before generating numbers?

I'm using boost 1.47, so this may not be the case for older versions.

@rcurtin rcurtin self-assigned this Dec 29, 2014
@rcurtin rcurtin added this to the mlpack 1.0.1 milestone Dec 29, 2014
@rcurtin rcurtin closed this Dec 29, 2014
@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 12 Aug 42036572 21:13 UTC
I didn't realize we weren't initializing a random seed on startup. That one's pretty easy to fix, but the bigger issue I've struggled with for a bit longer.

at least allow the user to specify a seed before generating numbers

There are a couple ways to do this and I am not sure which to go with.

  1. Have CLI take another builtin option (--seed), like --help and --info, which sets the random seed. This has a drawback in that now, a --seed option is available for a method which has no use or need of random number generators. Alternately... only include the PARAM_INT("seed", ...) in random.hpp so that --seed only appears when the developer includes random.hpp.
  2. Have CLI (or some other similar singleton which is always built at startup time) just set the seed to time(NULL) or similar. Then, if an application wants to be able to change the seed, it just adds its own option in its main() file and handles it there.

I'm CCing Neil and James because I'd like their opinions, too, on which to go with. I'm not sure which is the better choice.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by jcline3 on 8 Sep 42036592 02:02 UTC
The default if nothing is specified should probably be to use the system time. It's what most people would expect, I should think.

Having a --seed option would also be nice for debugging purposes, at least. If we do have one, it makes more sense I think to do it the first way with the PARAM_INT("seed", ...) in random.hpp, unless that goes against what we had decided for CLI stuff.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by nslagle on 15 Oct 42042327 06:52 UTC
Frankly, this isn't really a bug in MLPack, unless our wrappers fail to provide the means to change the seed value. (In my code, I've accessed boost::random objects directly where necessary.)

From what I understand, one can modify the seed value as easily as invoking

generator.seed(someUnsignedInt);

on the current generator object. So long as random.hpp provides the means to do this, let the author of the machine learning method decide the importance of modifying the seed by placing a hook in his/her/its code.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 22 Nov 42047773 09:40 UTC
I'll create a wrapper function in random.hpp: void RandomSeed(const size_t seed), and this will set the random seed for each internal object which the random functions wrap. Unfortunately the Boost Random library is particularly unwieldy and I haven't yet entirely figured out how to do this...

Neil, what do you think of the --seed option? We've violated the idea of "parameters only in main files" for --help and --info already (since they are special cases), so the question is whether or not --seed also qualifies as a special case.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by nslagle on 13 Apr 42048229 05:41 UTC
I guess I'm not familiar with how the --help and --info options function; if the function the way I think they do, having them available from CLI's header is probably okay as a free-floating feature. I would draw the line there, as --help and --info don't take arguments (at least I think they don't...)

Modifying the seed value of a pseudo random number generator shouldn't be a command line argument that anyone from anywhere clinging to the same process can apply, much to the chagrin of the process owner; while tempting, following the path of implementing a feature which seems to offer a shortcut like this led us, inexorably, to the untenable world of the universal CLI, or rather, CLI for all, and all for CLI.

We should access features of a class/interface/API in more conventional ways. So, if you choose to use a random number generator, and you want to change the seed value, do so through the appropriate interface. If the command line argument seems essential to ease the user's burden in a particular main file case, let the author of that main file create a CLI hook for it.

So there's my response, handmade in exhaustion...

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 7 Jul 42052679 06:38 UTC
r11143 adds mlpack::math::RandomSeed(const size_t) which can be used to set the random seed. I've not added a call to RandomSeed() to the CLI constructor and I think it's better that way. All the necessary programs (GMM, HMM, K-Means, RADICAL) now have that option. As far as I know, our code is compatible across all relatively recent versions of Boost.Random (I went as far back as 1.39.0 and as far forward as 1.48.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment