Group file sorted differently in SMMAT vs. SMMAT.meta #46
Hi Grace, Thank you for your interest in SMMAT! I have not seen this issue before, but I suspect the problem is that this tri-allelic marker is ordered differently in the GDS file and in the group definition file. In SMMAT (which uses the GDS file to generate the meta-analysis files), variants are sorted by variant.id. In SMMAT.meta, since we assume no access to the individual GDS files, we can only sort them by chr and pos. For tri-allelic markers with the same chr and pos, the order in the GDS files may differ (it is not necessarily alphabetical). If that is the case, the easiest solution is to use a group definition file that lists the variants in the same order as your GDS files. For example, suppose C/G comes before C/T in your group definition file, but C/T comes before C/G in the GDS files. You may then be able to fix the problem by swapping C/G and C/T in your group definition file, without having to ask each cohort to rerun. Please let me know if it does not work. Best, Han
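The workaround suggested above can be sketched as follows. This is a minimal Python sketch and not part of SMMAT or GMMAT, which are R code. It reorders group-file rows so that variants sharing a chr and pos follow a cohort's GDS order. It assumes three things. First, each group-file row is (group, chr, pos, ref, alt, weight). Second, rows for the same site are adjacent. Third, the GDS order is available as a list of (chr, pos, ref, alt) tuples in variant.id order. The function name `reorder_group_rows` is hypothetical.

```python
from typing import List, Tuple

# Assumed group-file row layout: (group, chr, pos, ref, alt, weight).
Row = Tuple[str, str, int, str, str, float]
# Assumed variant key: (chr, pos, ref, alt).
Key = Tuple[str, int, str, str]


def reorder_group_rows(rows: List[Row], gds_order: List[Key]) -> List[Row]:
    """Reorder rows so variants sharing a group, chr and pos follow gds_order.

    Assumes rows for the same site are adjacent. Rows at different sites keep
    their relative order; only ties at one site (e.g. the ALT alleles of a
    tri-allelic marker) are rearranged. Variants missing from gds_order go
    last within their site, in their original relative order.
    """
    rank = {key: i for i, key in enumerate(gds_order)}
    out: List[Row] = []
    i = 0
    while i < len(rows):
        # Find the run of adjacent rows with the same (group, chr, pos).
        j = i
        while j < len(rows) and rows[j][:3] == rows[i][:3]:
            j += 1
        # sorted() is stable, so unranked variants keep their original order.
        block = sorted(
            rows[i:j],
            key=lambda r: rank.get((r[1], r[2], r[3], r[4]), len(rank)),
        )
        out.extend(block)
        i = j
    return out


if __name__ == "__main__":
    rows = [
        ("g1", "1", 100, "C", "G", 1.0),
        ("g1", "1", 100, "C", "T", 1.0),
        ("g1", "1", 200, "A", "G", 1.0),
    ]
    # Suppose the cohort's GDS stores C/T before C/G at chr1:100.
    gds = [("1", 100, "C", "T"), ("1", 100, "C", "G"), ("1", 200, "A", "G")]
    for row in reorder_group_rows(rows, gds):
        print(*row, sep="\t")
```

If different cohorts store the same tri-allelic site in different orders, no single group-file order can satisfy all of them. This is the concern raised later in the thread.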
Thanks, Han, for the quick reply!
I expect it will be complicated if different cohorts have multiallelic variants ordered differently in their GDS files… but so far we only see this issue with one cohort. I will go with your suggestion and update you on how it goes.
Best wishes,
Grace
Hello. I tried alphabetical ordering. I tried combining the variant positions, outputting a GDS file and then using that order. I tried running a fake dataset and using the resulting score file. None of these worked. I ended up just dropping multiallelic positions. Thanks,
Hi Andrew, Have you tried fixing the order in your group definition file (instead of the order in the GDS file), as I suggested above? If you could send me a small reproducible example, I am happy to take a look. Thanks,
Hi Han, I haven't tried your suggestion of changing the order of variants in the group file. Still, couldn't that fix work for one cohort while the same error occurs in another? I'm still trying to wrap my head around how the analysis is performed in the SMMAT.meta function, so this might be a stupid question. Is there a fundamental reason why the variants need to be strictly ordered to perform the analysis? I understand we need to "know" the order of variants because we're dealing with score files and covariance matrices across multiple cohorts that may have different sets of variants. But once the variants are collected from each group's score files and covariance matrices to run the meta-analysis, why do the files from different cohorts need to conform to the order of variants in the group file? Best wishes,
Hi YC, That's a very good question. Best,
Hi @hanchenphd, I have thoroughly gone through the
Both the single-cohort analysis and the meta-analysis are performed per group. When running the meta-analysis, before performing the calculations, what ultimately happens is that you read the score file and covariance matrix files across all cohorts and create a large score vector (
The current code relies heavily on indexing to match variants across the group file and the per-cohort summary statistics and covariance matrix. Because of this reliance on indexing to align variants, it appears you added a check that the variants in the score file follow the order of variants in the group file. When running SMMAT for the single-cohort analysis, the per-group analysis results (variant scores and covariance matrix) were appended to the output
If we change the code to match on variant ID instead of relying on indices, I believe the analysis can work without needing access to the individual GDS files to confirm the order of variants. I'm not yet 100% certain, but I think it's okay to remove that check even with the current indexing-based code.
I hope my explanation made sense 😅. I wanted to share my thoughts and check with you whether I'm correct. I am going to adapt the code to match on variant ID instead of indexing so I can run my meta-analysis, because I cannot ask our collaborators to rerun all of their analyses 😅. I'll let you know how this goes. Best wishes,
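The key-based alignment proposed above can be illustrated with a minimal Python sketch. It is not SMMAT.meta's actual code and rests on two assumptions. First, a variant is identified by (chr, pos, ref, alt). Second, per-cohort score vectors and covariance matrices are combined by summation, as in a standard fixed-effect score-based meta-analysis. The function name `meta_combine` is hypothetical. Each cohort's entries are looked up by key rather than by position, so the order of variants within a cohort's files does not matter.

```python
from typing import Dict, List, Optional, Tuple

# Assumed variant key: (chr, pos, ref, alt).
Key = Tuple[str, int, str, str]
# One cohort's results for a group: variant keys, scores and covariance
# matrix, all in that cohort's own variant order.
Cohort = Tuple[List[Key], List[float], List[List[float]]]


def meta_combine(
    group_keys: List[Key], cohorts: List[Cohort]
) -> Tuple[List[float], List[List[float]]]:
    """Sum scores and covariances across cohorts after aligning by variant key."""
    slot: Dict[Key, int] = {k: i for i, k in enumerate(group_keys)}
    m = len(group_keys)
    U = [0.0] * m
    V = [[0.0] * m for _ in range(m)]
    for keys, scores, cov in cohorts:
        # Position of each cohort variant in the group; None if not in the group.
        idx: List[Optional[int]] = [slot.get(k) for k in keys]
        for a, ia in enumerate(idx):
            if ia is None:
                continue
            U[ia] += scores[a]
            for b, ib in enumerate(idx):
                if ib is not None:
                    V[ia][ib] += cov[a][b]
    return U, V
```

A cohort that does not report a given variant simply contributes zero for it. This fits the point that cohorts may have different sets of variants.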
Dear Han,
We are running a large meta-analysis and have collected intermediate files from several cohorts. However, I realised that SMMAT.meta fails the following check specifically at multiallelic sites, even though all cohorts use the same group file.
An example of where this fails (for a single cohort) is:
In this case, index 823 comes before 822, which causes the error. I am guessing this is because SMMAT did not initially order variants by ALT allele at multiallelic sites.
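The exact check in SMMAT.meta is not reproduced in this thread. As a hypothetical illustration, the Python sketch below assumes the check locates each score-file variant in the group file and requires those indices to increase. When the two ALT alleles of a site are swapped, it reports a decreasing pair like the 823-before-822 case above. The function name `first_order_violation` and the (chr, pos, ref, alt) key are assumptions.

```python
from typing import List, Optional, Tuple

Key = Tuple[str, int, str, str]  # assumed (chr, pos, ref, alt)


def first_order_violation(
    group_keys: List[Key], score_keys: List[Key]
) -> Optional[Tuple[int, int]]:
    """Return the first (previous, current) pair of group-file indices that decreases.

    Each score-file variant is looked up in the group file by its full key;
    variants absent from the group file are skipped. Returns None if the
    score file follows the group-file order.
    """
    where = {k: i for i, k in enumerate(group_keys)}
    idx = [where[k] for k in score_keys if k in where]
    for prev, cur in zip(idx, idx[1:]):
        if cur < prev:
            return prev, cur
    return None


if __name__ == "__main__":
    group = [("1", 100, "A", "G"), ("1", 200, "C", "G"), ("1", 200, "C", "T")]
    # The score file lists C/T before C/G at chr1:200.
    score = [("1", 100, "A", "G"), ("1", 200, "C", "T"), ("1", 200, "C", "G")]
    print(first_order_violation(group, score))
```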
Is there any way around this?
Edit:
I have just read about the issue here regarding SMMAT being designed for biallelic variants. I would love to know what you think anyway, and whether there are (near-)future plans to support multiallelic variants.
Thanks for your help in advance,
Grace