Continuing a random number sequence #16

Closed
sinkovit opened this issue Jul 9, 2018 · 4 comments

Comments

sinkovit commented Jul 9, 2018

In our research, we anticipate the need to generate very large synthetic repertoires. Since this process can take a long time, it would be nice to have the ability to pick up the random number sequence where we left off so that the repertoire generation does not need to be done as a single compute job.

Although we could probably just choose a new seed for each run (with a 64-bit random number generator, it is highly unlikely that a new seed would produce a sequence overlapping the previous one), it would be better to continue where we stopped.

This is a low-priority request and we are happy to assist.
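
For illustration, here is a minimal sketch of what "picking up where we left off" could look like with a standard C++ engine. It assumes a `std::mt19937_64` generator, which may not be what IGoR uses internally; the helper names `save_engine_state` and `restore_engine_state` are hypothetical and not part of IGoR.

```cpp
#include <cstdint>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: serialize the full internal state of the engine as text.
// Standard library engines support operator<< for exactly this purpose.
std::string save_engine_state(const std::mt19937_64& engine) {
    std::ostringstream out;
    out << engine;
    return out.str();
}

// Hypothetical helper: rebuild an engine from a previously saved state string.
// The restored engine continues the sequence where the saved one stopped.
std::mt19937_64 restore_engine_state(const std::string& state) {
    std::mt19937_64 engine;
    std::istringstream in(state);
    in >> engine;
    return engine;
}
```

The state string could be written to a file at the end of one compute job and read back at the start of the next. Note that distribution objects may hold internal state of their own, so restoring the engine alone does not necessarily reproduce the exact downstream draws.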

Owner

qmarcou commented Aug 25, 2018

Hi @sinkovit,
Sorry it took me so long to answer this one.
I originally looked into making this possible, but it would require changing quite a few of the generation functions.
I ran a small experiment with IGoR's new random seed generator, using the following piece of code in the custom code section of the main:

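	else{
		//Write your custom procedure here
		//Draw n_seeds seeds and write them one per line for later inspection
		const size_t n_seeds = 99999999;
		ofstream file ("/tmp/random_seeds.csv");
		for(size_t i=0;i!=n_seeds;++i){
			//'\n' avoids flushing the stream on every line, unlike endl
			file<<draw_random_64bits_seed()<<'\n';
		}
	}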

Importing the resulting file with pandas in Python and plotting a histogram of it with 5000 bins, I get the following figure:
[Figure: seed_test]

I think this shows that the seed generator is indeed uniformly random (1.84x10^19 is roughly 2^64, the upper limit of the unsigned 64-bit integer range) and that getting the same seed twice is very unlikely. I also checked, and all 99999999 seeds were unique.
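
As a rough cross-check of the "very unlikely" claim, the birthday bound gives the chance that at least two of n uniform 64-bit draws coincide. The sketch below assumes perfectly uniform, independent seeds; the function names `collision_probability_64bit` and `all_unique` are hypothetical and not part of IGoR.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Birthday-bound approximation: probability that at least two of n_draws
// independent uniform draws from a space of 2^64 values are equal,
// p ~= 1 - exp(-n(n-1) / (2 * 2^64)).
double collision_probability_64bit(double n_draws) {
    const double space = std::ldexp(1.0, 64);  // 2^64 as a double
    const double expected_pairs = n_draws * (n_draws - 1.0) / (2.0 * space);
    return -std::expm1(-expected_pairs);  // accurate for very small arguments
}

// Returns true if no value appears twice. Takes the vector by value so the
// caller's data is left unsorted.
bool all_unique(std::vector<std::uint64_t> seeds) {
    std::sort(seeds.begin(), seeds.end());
    return std::adjacent_find(seeds.begin(), seeds.end()) == seeds.end();
}
```

For 99999999 draws this estimate comes out at roughly 2.7x10^-4, so finding no duplicate in the experiment is what a uniform generator would be expected to produce.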

Given this information, I think this is a won't fix (at least not in the near future).

Quentin

Author

sinkovit commented Aug 27, 2018 via email

@qmarcou qmarcou closed this as completed Sep 5, 2018
Owner

qmarcou commented Sep 6, 2018

Hi Bob,
I have closed the issue, but it is great to hear that you managed to produce so many reads!
Out of curiosity: has this process taken a lot of computation time? Did you feel it was a bottleneck in your analysis? Do you think parallelizing random sequence generation is worth doing?
It would most likely be limited by I/O time then.
Best,

Author

sinkovit commented Sep 6, 2018 via email
