Continuing a random number sequence #16
Hi Quentin,
Thanks for the follow-up. I agree that this could be left off the list of software improvements.
By the way, I recently used IGoR to generate some repertoires containing on the order of one billion productive reads.
-- Bob
From: qmarcou <notifications@github.com>
Date: Saturday, August 25, 2018 at 12:03 PM
Hi @sinkovit,
Sorry for taking so long to answer this one.
I originally looked into making this possible; however, it would require changing quite a few of the generation functions.
Instead, I conducted a small experiment with IGoR's new random seed generator, using the following piece of code in the custom code section of the main:
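else{
    // Write your custom procedure here
    // Draw n_seeds 64-bit seeds and write them one per line to a CSV file
    size_t n_seeds = 99999999;
    ofstream file("/tmp/random_seeds.csv");
    for(size_t i=0; i!=n_seeds; ++i){
        file << draw_random_64bits_seed() << '\n'; // '\n' avoids flushing the stream on every line
    }
}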
Importing the resulting file with pandas in Python and plotting a histogram of it with 5000 bins, I get the following figure:
[seed_test]<https://user-images.githubusercontent.com/18257721/44621623-292a1b00-a877-11e8-9b4c-f7fe84b58eef.png>
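The histogram above was produced with pandas. As a rough illustration of the binning it relies on, here is a minimal C++ sketch, assuming the seeds have already been read into a vector. The function name `histogram_u64` is hypothetical and not part of IGoR.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Counts how many values fall into each of n_bins equal-width bins
// spanning the full unsigned 64-bit range [0, 2^64 - 1].
// Hypothetical helper for illustration only.
std::vector<std::size_t> histogram_u64(const std::vector<std::uint64_t>& values,
                                       std::size_t n_bins) {
    std::vector<std::size_t> counts(n_bins, 0);
    if (n_bins == 0) {
        return counts;
    }
    if (n_bins == 1) {
        counts[0] = values.size();
        return counts;
    }
    const std::uint64_t max_value = std::numeric_limits<std::uint64_t>::max();
    // Integer bin width, rounded up so that max_value maps to the last bin.
    const std::uint64_t width = max_value / static_cast<std::uint64_t>(n_bins) + 1;
    for (std::uint64_t v : values) {
        ++counts[static_cast<std::size_t>(v / width)];
    }
    return counts;
}
```

For uniformly distributed seeds, each bin count should be close to `values.size() / n_bins`, which is what a flat histogram indicates.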
I think this shows that the seed generator is indeed uniformly random (1.84x10^19 is approximately 2^64, one more than the maximum value of an unsigned 64-bit integer) and that getting the same seed twice is very unlikely. I have actually checked, and all 99999999 seeds were unique.
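Both claims can be checked numerically. The sketch below is not from IGoR; the function names are hypothetical. It assumes the seeds are independent and uniform over 2^64 values, estimates the chance of at least one repeated seed with the standard birthday approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^64)), and tests a list of seeds for duplicates by sorting.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Birthday approximation: probability that at least two of n independent,
// uniformly distributed 64-bit values coincide.
// Assumes ideal uniformity; hypothetical helper for illustration only.
double collision_probability_64(std::uint64_t n) {
    const long double space = 18446744073709551616.0L;  // 2^64
    const long double count = static_cast<long double>(n);
    const long double pairs = count * (count - 1.0L) / 2.0L;
    // expm1 keeps precision when the probability is very small.
    return static_cast<double>(-std::expm1(-pairs / space));
}

// Returns true if no value occurs more than once. Works on a copy so the
// caller's data is left untouched; sorting makes duplicates adjacent.
bool all_unique(std::vector<std::uint64_t> values) {
    std::sort(values.begin(), values.end());
    return std::adjacent_find(values.begin(), values.end()) == values.end();
}
```

Under these assumptions, `collision_probability_64(99999999)` comes out at roughly 2.7x10^-4, so finding no duplicates among the 99999999 seeds is the expected outcome.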
Given this information I think this is a won't fix (at least not in the near future)
Quentin
Hi Quentin,
I don’t think that you need to parallelize the sequence generation process. I just ran ten instances of IGoR in parallel using different seeds. Given the repeat length of the RNG, it’s highly unlikely that the different seeds would generate overlapping sets of random numbers. I was able to confirm this by constructing rarefaction curves (number of unique sequences vs. total number of sequences).
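For readers unfamiliar with rarefaction curves, here is a minimal C++ sketch of how one could be computed. It is an illustration, not the procedure actually used in this thread: the function name `rarefaction_curve` and the choice to hold all sequences in memory are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Walks through the sequences in order and records, every `step` sequences
// (and once more at the end), the pair (total sequences seen, unique
// sequences seen). Hypothetical helper for illustration only.
std::vector<std::pair<std::size_t, std::size_t>>
rarefaction_curve(const std::vector<std::string>& sequences, std::size_t step) {
    std::vector<std::pair<std::size_t, std::size_t>> curve;
    if (step == 0) {
        return curve;
    }
    std::unordered_set<std::string> seen;
    for (std::size_t i = 0; i < sequences.size(); ++i) {
        seen.insert(sequences[i]);
        const std::size_t total = i + 1;
        if (total % step == 0 || total == sequences.size()) {
            curve.emplace_back(total, seen.size());
        }
    }
    return curve;
}
```

If runs started from different seeds had produced overlapping random number streams, the merged output would contain repeated stretches of sequences, and the unique count would be expected to grow more slowly than in a single run of the same total size.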
-- Bob
From: qmarcou <notifications@github.com>
Date: Thursday, September 6, 2018 at 9:54 AM
Hi Bob,
I have closed the issue, but it is great to hear that you managed to produce so many reads!
Out of curiosity: did this process take a lot of computation time? Did you feel it was a bottleneck in your analysis? Do you think parallelizing random sequence generation is worth doing?
It would most likely be limited by I/O time then.
Best,
In our research, we anticipate the need to generate very large synthetic repertoires. Since this process can take a long time, it would be nice to have the ability to pick up the random number sequence where we left off so that the repertoire generation does not need to be done as a single compute job.
Although we can probably choose a new seed for each run (for a 64-bit random number generator it is highly unlikely that we would choose a seed that overlaps the previous sequence), it would be better to continue where we stopped.
This is a low-priority request and we are happy to assist.
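To make the request concrete: the C++ standard library engines can be written to and read from streams, which preserves their exact position in the sequence. The sketch below assumes a `std::mt19937_64` engine purely for illustration. The thread does not say which engine IGoR uses, and the helper names are hypothetical.

```cpp
#include <random>
#include <sstream>
#include <string>

// Serializes the full internal state of the engine to text.
// Hypothetical helper; in practice the text would be written to a file.
std::string save_engine_state(const std::mt19937_64& engine) {
    std::ostringstream out;
    out << engine;
    return out.str();
}

// Rebuilds an engine that continues exactly where the saved one stopped.
std::mt19937_64 load_engine_state(const std::string& state) {
    std::mt19937_64 engine;
    std::istringstream in(state);
    in >> engine;
    return engine;
}
```

A job would call `save_engine_state` before exiting, and the next job would call `load_engine_state` before drawing again. Standard distribution objects can also carry internal state and are streamable in the same way, so a complete solution would likely need to save those too.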