Large number of sites overflows in parser #8
Comments
I'll try to fix this soon, I don't think that the fix is easy if you

Alexis

On 11.05.2015 19:43, A.P. Jason de Koning wrote:
Alexandros (Alexis) Stamatakis Research Group Leader, Heidelberg Institute for Theoretical Studies |
Dear Jason, I think that I have fixed it but I need access to the dataset for testing. Cheers, Alexis On 20.05.2015 21:36, Alexandros Stamatakis wrote:
Hey Alexis, Thanks so much and sorry for the delay in responding. You can download the compressed dataset (5GB, sorry!) here http://hyperion.ucalgary.ca/example.phy.bz2. I’ll leave the link up for a couple of days. If you have a problem downloading it, you could just simulate a similar dataset. The dimensions are 7 OTUs and 3,036,303,846 sites with very little divergence (most of this will compress out if indexing site patterns). Best wishes,
A.P. Jason de Koning, Ph.D. Assistant Professor Health Sciences Centre 1150 Suite Office: 403-210-7638 | Fax: 403-270-8928
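To illustrate what "indexing site patterns" means for a dataset like this, here is a minimal sketch, not the ExaML parser's actual code. It collapses identical alignment columns into unique patterns and keeps a 64-bit weight per pattern, so a pattern that occurs billions of times cannot overflow its counter. The names `Pattern`, `MAX_TAXA`, and `compress_patterns` are hypothetical, and the linear search is only suitable for small examples.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TAXA 16 /* hypothetical upper bound; the dataset here has 7 OTUs */

typedef struct {
    char column[MAX_TAXA]; /* one character per taxon, zero-padded */
    int64_t weight;        /* number of sites showing this pattern */
} Pattern;

/*
 * Collapse identical columns of an alignment into unique patterns.
 * rows[t][s] is the character of taxon t at site s.
 * Returns the number of unique patterns, or -1 on bad input or
 * allocation failure. The caller frees *out.
 */
int64_t compress_patterns(const char *const *rows, int taxa, int64_t sites,
                          Pattern **out)
{
    Pattern *patterns = NULL;
    int64_t count = 0, capacity = 0;

    if (taxa <= 0 || taxa > MAX_TAXA || sites < 0)
        return -1;

    for (int64_t site = 0; site < sites; site++) {
        char column[MAX_TAXA] = {0};
        int64_t found = -1;

        for (int t = 0; t < taxa; t++)
            column[t] = rows[t][site];

        /* Linear search keeps the sketch short; a real parser would
           hash or sort the columns instead. */
        for (int64_t p = 0; p < count; p++) {
            if (memcmp(patterns[p].column, column, MAX_TAXA) == 0) {
                found = p;
                break;
            }
        }

        if (found >= 0) {
            patterns[found].weight++;
            continue;
        }

        if (count == capacity) {
            int64_t new_capacity = capacity ? capacity * 2 : 8;
            Pattern *grown = realloc(patterns,
                                     (size_t)new_capacity * sizeof *grown);
            if (grown == NULL) {
                free(patterns);
                return -1;
            }
            patterns = grown;
            capacity = new_capacity;
        }

        memcpy(patterns[count].column, column, MAX_TAXA);
        patterns[count].weight = 1;
        count++;
    }

    *out = patterns;
    return count;
}
```

With very little divergence, most columns repeat an existing pattern, so the number of stored patterns stays small while the weights grow large. That is why the weights, and not only the site count, need a wide integer type.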
Hi Jason,

The modified parser works now, how quickly do you need the fix? I am in the middle of a larger re-design, thus the code with the fixed

Below is the output of the parser, does that look right? It looks rather

Alexis

Pattern compression: ON
Alignment has 200630281 completely undetermined sites that will be
Your alignment has 5956 unique patterns
Under CAT the memory required by ExaML for storing CLVs and tip vectors
Under GAMMA the memory required by ExaML for storing CLVs and tip
Please note that, these are just the memory requirements for doing
Binary and compressed alignment file written to file HUGE.binary
Parsing completed, exiting now ...

On 26.05.2015 23:06, A.P. Jason de Koning wrote:
Hey Alexis, this looks approximately correct to me. We’d previously run just the variable sites from this dataset and had similar results. Can you possibly make the binary output of the parser for this dataset available to us for download? Or allow us access to the revised parser? This is for the last piece of a student project that is otherwise complete. Thanks! Jason
just sent the code to your university email, alexis On 29.05.2015 16:23, A.P. Jason de Koning wrote:
In `axml.h`, the `rawdata->sites` variable is defined as type `int`. Attempting to compress an alignment with about 3B positions is resulting in a "too few sites" error, presumably because we are overflowing the `int`. We will also have more than 32k site patterns after compression, and some of these will occur more than 32k times in the dataset - so we will still be causing overflows in the site/alias indexes even after changing `sites` to `long long int`. Is there a quick fix for this? Thanks!
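The overflow described above can be checked with a short sketch. It assumes the site count is read from the alignment header as text. `fits_in_int` and `parse_site_count` are hypothetical helpers written for this illustration, not functions from `axml.h`, and this is not necessarily how the fix was implemented.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Site count of the alignment discussed in this thread. */
#define EXAMPLE_SITES INT64_C(3036303846)

/* True if a 64-bit site count can be stored in a plain int without loss. */
bool fits_in_int(int64_t sites)
{
    return sites >= INT_MIN && sites <= INT_MAX;
}

/*
 * Hypothetical helper: parse a site count from header text into 64 bits.
 * Returns false for malformed, non-positive, or out-of-range input.
 */
bool parse_site_count(const char *text, int64_t *out)
{
    char *end = NULL;
    long long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtoll(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0' || value <= 0)
        return false;

    *out = (int64_t)value;
    return true;
}
```

On platforms where `int` is 32 bits wide, `fits_in_int(EXAMPLE_SITES)` is false because `INT_MAX` is 2,147,483,647. Any counter that can reach the number of sites, including per-pattern weights, would need the same 64-bit treatment.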