masked alignment different from global tree in GISAID #21
Hi @lpipes, there are two parts to this answer. First, the most recent global tree contains almost 3M sequences, although that's still fewer than the alignment. The discrepancy arises because the alignment contains all sequences, while the tree is built only from those that are good enough to build a tree from. The older trees had only 600K sequences, because that's all FastTree could handle. These were subsampled to include all of the most recent sequences plus something like 100K other sequences for context. In both cases, the way to get an alignment containing only the sequences in the tree is to pull just those sequences out of the alignment. To do that, I'd:
Hope that helps!
Rob
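A minimal sketch of that extraction step in shell, assuming the tree is a single-line Newick file whose unquoted tip labels exactly match the first word of each FASTA header in the alignment. The function names `tree_tips` and `filter_fasta` and the file names in the example are hypothetical; the awk filter does roughly what UCSC's `faSomeRecords in.fa listFile out.fa` does.

```shell
# Hypothetical helper: print the tip labels of a single-line Newick tree,
# one per line. Assumes labels are unquoted and contain none of ( ) , : ;
# A tip label always follows '(' or ','; internal-node labels follow ')'
# and are therefore skipped.
tree_tips() {
  grep -oE '[(,][^(),:;]+' "$1" | cut -c2-
}

# Hypothetical helper: print only the FASTA records whose name (the first
# word after '>') appears in the list file given as the first argument.
filter_fasta() {
  awk 'FILENAME == ARGV[1] { keep[$1] = 1; next }   # first file: names to keep
       /^>/ { name = substr($1, 2); printing = (name in keep) }
       printing                                      # print lines of kept records
      ' "$1" "$2"
}

# Example (hypothetical file names):
#   tree_tips global.tree > tips.txt
#   filter_fasta tips.txt masked_alignment.fasta > tree_alignment.fasta
```

Because matching is on exact names, tip labels and FASTA headers that are formatted differently would need to be reconciled before filtering.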
On Fri, 8 Oct 2021 at 17:09, Lenore Pipes wrote:
Hi Rob,
Thanks for your explanation. The tree I recently downloaded (dated 2021-09-26) only had ~600K sequences in it. But I just downloaded the most recent tree (dated 2021-10-05), which had ~3 million. Using faSomeRecords makes sense, but I am actually having a lot of trouble extracting the MSA from the tar file:
tar xf mmsa_2021-10-06.tar.xz
xz: (stdin): Unexpected end of input
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
I also encountered this error with the previous *tar.xz files that were posted. Any idea what the problem could be?
-Lenore
Huh, that's odd. I would have done the same, like:
tar -xf alignment.tar.xz
I'll have to take a look next week. But you could try running xz first, like:
xz -d alignment.tar.xz
and then un-tarring the resulting .tar file.
--
Rob Lanfear
Division of Ecology and Evolution,
Research School of Biology,
The Australian National University,
Canberra
www.robertlanfear.com
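The `xz: (stdin): Unexpected end of input` line means the compressed stream ended before xz expected it to; one common cause is an incomplete download. A minimal sketch for checking an archive before extracting it, assuming `xz` is installed; `check_xz` is a hypothetical helper name. `xz -t` tests integrity by decompressing without writing any output.

```shell
# Hypothetical helper: report whether an .xz file decompresses cleanly.
# `xz -t` decompresses without writing output and exits non-zero if the
# stream is damaged or truncated.
check_xz() {
  if xz -t -- "$1" 2>/dev/null; then
    echo "ok: $1"
  else
    echo "damaged or truncated: $1"
    return 1
  fi
}

# Example (file name from the thread):
#   check_xz mmsa_2021-10-06.tar.xz && xz -dc mmsa_2021-10-06.tar.xz | tar -xf -
```

Piping `xz -dc` into `tar -xf -` gives the same result as running `xz -d` and then `tar -xf`, without writing the uncompressed .tar to disk. If the test fails, re-downloading the file (and comparing its size or checksum against a published value, if one is available) is probably the next thing to try.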
Hmm, seems like that doesn't work either, ugh...
In fact, I've tried to extract every single MSA file they have posted, and all of them have an
Hello, I tried to download the masked alignment from GISAID, but it contains >3 million sequences, while the global tree they uploaded contains only ~600K sequences. Do you know where I can download the MSA file for the most recent global tree? Thanks.
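To compare the two numbers in the question locally, a minimal sketch, assuming the decompressed alignment is FASTA (one '>' header per sequence) and the tree is a single-line Newick file with unquoted tip labels; `count_fasta` and `count_tips` are hypothetical helper names.

```shell
# Hypothetical helper: count FASTA records (lines starting with '>').
count_fasta() {
  grep -c '^>' "$1"
}

# Hypothetical helper: count tips in a single-line Newick tree.
# A tip label follows '(' or ','; labels after ')' belong to internal nodes.
count_tips() {
  grep -oE '[(,][^(),:;]+' "$1" | wc -l | tr -d ' '
}

# Example (hypothetical file names):
#   count_fasta masked_alignment.fasta
#   count_tips global.tree
```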