Continuing a random number sequence #16

sinkovit · 2018-07-09T15:29:08Z

In our research, we anticipate the need to generate very large synthetic repertoires. Since this process can take a long time, it would be nice to have the ability to pick up the random number sequence where we left off so that the repertoire generation does not need to be done as a single compute job.

Although we can probably choose a new seed for each run (for a 64-bit random number generator it is highly unlikely that a new seed would produce a sequence that overlaps the previous one), it would be better to continue where we stopped.
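For reference, the standard C++ random engines can be checkpointed by streaming their state to text and reading it back. The following is a minimal sketch, assuming the generator is a `std::mt19937_64`; the thread does not say which engine IGoR uses, and the helper names `save_rng_state`, `load_rng_state` and `draw_values` are hypothetical. It only illustrates the idea of resuming a sequence, not how IGoR would implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Serialize the full internal state of the engine to text.
// Standard engines support stream insertion (operator<<).
std::string save_rng_state(const std::mt19937_64& rng) {
    std::ostringstream out;
    out << rng;
    return out.str();
}

// Restore an engine from text produced by save_rng_state.
void load_rng_state(std::mt19937_64& rng, const std::string& state) {
    std::istringstream in(state);
    in >> rng;
    assert(!in.fail());  // the state text was malformed
}

// Hypothetical helper: draw `count` raw 64-bit values from the engine.
std::vector<std::uint64_t> draw_values(std::mt19937_64& rng, std::size_t count) {
    std::vector<std::uint64_t> values;
    values.reserve(count);
    for (std::size_t i = 0; i != count; ++i) {
        values.push_back(rng());
    }
    return values;
}
```

A first job would write the string returned by `save_rng_state` to disk when it finishes, and a later job would pass it to `load_rng_state` before generating anything. Note that some distribution objects (for example `std::normal_distribution`) keep internal state of their own, so a complete checkpoint may need to save those as well.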

This is a low-priority request and we are happy to assist.

qmarcou · 2018-08-25T19:03:41Z

Hi @sinkovit ,
Sorry for the long time it took me to answer this one.
I originally looked into making this possible; however, it would require changing quite a few of the functions involved in generation.
I ran a small experiment with IGoR's new random seed generator, using the following piece of code in the custom code section of the main:

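	else{
		//Write your custom procedure here
		// Draw n_seeds seeds and write them one per line for offline analysis
		size_t n_seeds = 99999999;
		ofstream file ("/tmp/random_seeds.csv");
		for(size_t i=0;i!=n_seeds;++i){
			// '\n' instead of endl avoids flushing the stream on every line
			file<<draw_random_64bits_seed()<<'\n';
		}
	}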

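Importing the resulting file with pandas in Python and plotting a histogram of it with 5000 bins, I get the following figure:

[seed_test] https://user-images.githubusercontent.com/18257721/44621623-292a1b00-a877-11e8-9b4c-f7fe84b58eef.png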

I think this shows that the seed generator is indeed uniformly random (1.84x10^19 is approximately 2^64, just above the maximum value of an unsigned 64-bit integer), so getting the same seed twice is very unlikely. I have actually checked, and all 99999999 seeds were unique.
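To put a number on "very unlikely", the birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) gives the chance of at least one repeated value among n uniform draws from N possible values. The sketch below assumes the seeds are independent and uniform over all 2^64 values, which is what the histogram suggests but does not prove; the function names are hypothetical. For the 99999999 seeds drawn here, the estimate is about 2.7x10^-4 (roughly 0.03%), so finding no duplicates is the expected outcome. For ten seeds it is on the order of 10^-18.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Approximate probability that at least two of n_draws independent, uniform
// draws from n_values possible values coincide (birthday approximation):
//   p ~= 1 - exp(-n(n-1) / (2N))
double collision_probability(double n_draws, double n_values) {
    assert(n_draws >= 0.0 && n_values > 0.0);
    const double exponent = n_draws * (n_draws - 1.0) / (2.0 * n_values);
    return -std::expm1(-exponent);  // 1 - exp(-x), accurate for small x
}

// Check that no seed occurs twice. The vector is taken by value so that
// the caller's copy is left in its original order.
bool all_unique(std::vector<std::uint64_t> seeds) {
    std::sort(seeds.begin(), seeds.end());
    return std::adjacent_find(seeds.begin(), seeds.end()) == seeds.end();
}
```

Calling `collision_probability(99999999.0, std::ldexp(1.0, 64))` reproduces the estimate for this experiment. Distinct seeds are a necessary condition for independent runs; whether the resulting streams overlap also depends on the period of the underlying generator.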

Given this information, I think this is a won't fix (at least not in the near future).

Quentin

sinkovit · 2018-08-27T02:38:32Z

Hi Quentin, Thanks for the follow-up. I agree that this could be left off the list of software improvements. By the way, I recently used IGoR to generate some repertoires containing on the order of one billion productive reads.

…

-- Bob

qmarcou · 2018-09-06T16:54:16Z

Hi Bob,
I have closed the issue, but it is great to hear that you managed to produce so many reads!
Out of curiosity: did this process take a lot of computation time? Did you feel it was a bottleneck in your analysis? Do you think parallelizing random sequence generation is worth doing?
It would most likely be limited by I/O time then.
Best,

sinkovit · 2018-09-06T17:42:33Z

Hi Quentin, I don’t think that you need to parallelize the sequence generation process. I just ran ten instances of IGoR in parallel using different seeds. Given the repeat length of the RNG, it’s highly unlikely that the different seeds would generate overlapping sets of random numbers. I was able to confirm this by constructing rarefaction curves (number of unique sequences vs. total number of sequences).
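A rarefaction curve of this kind can be sketched as follows. This is an illustration under stated assumptions, not the code used in the thread: it assumes the generated sequences from all runs have been pooled into memory as strings, and `rarefaction_curve` and its `step` parameter are hypothetical names. If two runs had drawn from overlapping random streams, they would emit the same sequences again, and the unique count for the pooled data would fall below that of a single run of the same total size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Return (total sequences seen, unique sequences seen) pairs, sampled every
// `step` sequences and once more at the end of the input.
std::vector<std::pair<std::size_t, std::size_t>>
rarefaction_curve(const std::vector<std::string>& sequences, std::size_t step) {
    assert(step > 0);
    std::vector<std::pair<std::size_t, std::size_t>> curve;
    std::unordered_set<std::string> seen;
    for (std::size_t i = 0; i != sequences.size(); ++i) {
        seen.insert(sequences[i]);  // duplicates leave the set unchanged
        const std::size_t total = i + 1;
        if (total % step == 0 || total == sequences.size()) {
            curve.emplace_back(total, seen.size());
        }
    }
    return curve;
}
```

For a billion reads the set of strings would not fit in memory on most machines, so a real analysis would more likely hash the sequences or work on sorted files; the sketch only shows what the curve measures.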

…

-- Bob

qmarcou added wontfix enhancement labels Aug 25, 2018

qmarcou closed this as completed Sep 5, 2018
