-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IGoR sequence generation - random number generation issue #1
Comments
Hi @sinkovit , PS: although I guess you did it for debuging purposes, one should probably refrain to chose actually chose the seed (in your case 12345678) as it will most likely not exploit all available 64bits. Note that is is however mostly a problem if you're running many different simulations. |
This issue is now fully fixed:
All this is already available on the dev branch and will constitute part of IGoR's v1.2.0 next release. |
I’ve been using the IGoR software to generate simulated T cell beta chain repertoires and ran into an issue that I believe is related to the random number generation. My colleagues at Vanderbilt University Medical Center, Cinque Soto and James Crowe, suggested I get in touch with you.
We need to generate very large repertoires to compare with the results obtained from deep immunome sequencing being carried out at Vanderbilt. When I was processing IGoR output, I noticed that the same results are generated every 52,895,649 sequences. We discovered this when calculating the number of unique clonotypes as a function of the number of sequences – after 52M+ reads, the number of clonotypes was unchanged.
Here’s the igor command line that we’ve been using. Note that I tried this with two different seeds and get the same behavior and same cycle length (52,895,649 sequences).
igor -threads -1 -set_wd $PWD -batch tcr_beta -species human -chain beta
-generate 1100000000 --CDR3 --seed 12345678 --noerr
For reasons that I don’t understand, the first few sequences are not repeated, but after a short time the random number generator settles into a predictable cycle.
The example grep output below shows how the 750th sequence is repeated. As you probably know, the “-n” option to grep prefixes the output with the line number for the matching line. You’ll notice that the difference in line numbers is always a multiple of 52,895,649 (e.g. 1,057,913,732 - 752 = 52,895,649 x 20)
$ grep -n 'TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT' \ generated_seqs_noerr_CDR3_info.csv
752:750,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
52896401:52896399,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
105792050:105792048,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
158687699:158687697,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
211583348:211583346,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
264478997:264478995,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
317374646:317374644,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
370270295:370270293,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
423165944:423165942,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
476061593:476061591,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
528957242:528957240,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
581852891:581852889,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
634748540:634748538,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
687644189:687644187,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
740539838:740539836,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
793435487:793435485,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
846331136:846331134,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
899226785:899226783,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
952122434:952122432,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
1005018083:1005018081,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
1057913732:1057913730,TGTGCCAGCATCCCTACCTTTTGGTCACCGGGGGGGGAGCACAGATACGCAGTATTTT,1,0
$ grep -n '(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)' \ generated_realizations_noerr.csv
752:750;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
52896401:52896399;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
105792050:105792048;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
158687699:158687697;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
211583348:211583346;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
264478997:264478995;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
317374646:317374644;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
370270295:370270293;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
423165944:423165942;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
476061593:476061591;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
528957242:528957240;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
581852891:581852889;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
634748540:634748538;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
687644189:687644187;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
740539838:740539836;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
793435487:793435485;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
846331136:846331134;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
899226785:899226783;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
952122434:952122432;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
1005018083:1005018081;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
1057913732:1057913730;(78);(9);(1);(11);(12);(5);(5);(18);(3,1,1,1,3,0,1,1,3,3,3,3,2,2,3,1,0,1);(3);(0,2,2)
Any help would be greatly appreciated. Let me add that this is fantastic software. We’ve been looking for a rigorous way to generate simulated repertoires and we’re convinced that IGoR is the right way to do this.
The text was updated successfully, but these errors were encountered: