2021_Saag_EastEuropean - SOP002 - EstCWC has broken genotype data #5

Closed
nevrome opened this issue Apr 16, 2021 · 14 comments

Comments

@nevrome
Member

nevrome commented Apr 16, 2021

Something is wrong with this specific individual, possibly the whole package.

trident forge stops with "encountered illegal genotype Hom-Ref with Ref-Allele missing" if this individual is included.

Any ideas, @AyGhal and @stschiff?

@AyGhal
Contributor

AyGhal commented Apr 16, 2021

I don't know if it is only one individual, but the convertf issue I mentioned in our meeting was:
1 rs3094315 0.02013 752566 3 1
1 rs12124819 0.020242 776546 0 1
These lines are from this package, for example: using convertf messes up the Ref/Alt alleles, which can also cause problems when using PLINK.
I am redoing this package, but maybe I should check all of them.
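For illustration, here is a minimal sketch of how such rows could be spotted automatically. It assumes the two lines above are rows of a PLINK .bim file (chromosome, SNP ID, genetic position, physical position, allele 1, allele 2) and that "0" is the missing-allele code. The function name `snps_with_missing_allele` is hypothetical and not part of trident, convertf, or PLINK.

```python
from typing import Iterable, List, Tuple

MISSING_ALLELE = "0"  # PLINK's conventional code for an unknown allele


def snps_with_missing_allele(bim_lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (snp_id, allele1, allele2) for rows where either allele is missing."""
    flagged = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        _chrom, snp_id, _gen_pos, _phys_pos, allele1, allele2 = fields
        if MISSING_ALLELE in (allele1, allele2):
            flagged.append((snp_id, allele1, allele2))
    return flagged


# The two rows quoted in the comment above
rows = [
    "1 rs3094315 0.02013 752566 3 1",
    "1 rs12124819 0.020242 776546 0 1",
]
print(snps_with_missing_allele(rows))  # [('rs12124819', '0', '1')]
```

A missing allele in a row is not an error on its own. It only becomes a problem when an individual carries a genotype that needs that allele.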

@nevrome
Member Author

nevrome commented Apr 16, 2021

Aha! So this is linked to the issue you mentioned. Would be great if this could be fixed.

Don't worry about the rest of the packages though - we will have to automate this validation anyway (see #6)

@stschiff
Member

Aha OK, thank you @AyGhal, that helps a lot. I will check whether my current handling of this is too strict. Perhaps I am making a tacit assumption that the first allele is always the reference (I didn't think so, though).

@stschiff
Member

OK, I checked. I'm definitely not assuming anything. This is really a hard error because apparently in this genotype data there is a Hom-Ref genotype at a position where the first allele is missing (0). Of course, one way to deal with this without throwing an error is to simply mark the genotype in such a case as missing... but I'm worried that I'm missing something here...
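The two options described above (hard error versus marking the genotype as missing) can be sketched roughly as follows. This is not the poseidon-hs implementation, which is written in Haskell. The names `Genotype`, `check_genotype`, and the `lenient` flag are hypothetical, and "0" is assumed to be the missing-allele code.

```python
from enum import Enum


class Genotype(Enum):
    HOM_REF = "HomRef"
    HET = "Het"
    HOM_ALT = "HomAlt"
    MISSING = "Missing"


def check_genotype(snp_id: str, ref_allele: str, genotype: Genotype,
                   lenient: bool = False) -> Genotype:
    """Validate one genotype call against the reference allele of its SNP.

    Strict mode raises on a Hom-Ref call whose reference allele is missing;
    lenient mode downgrades that call to MISSING instead.
    """
    if genotype is Genotype.HOM_REF and ref_allele == "0":
        if lenient:
            return Genotype.MISSING
        # Hypothetical message format, not the actual trident output
        raise ValueError(
            f"illegal genotype Hom-Ref with Ref-Allele missing at SNP {snp_id}"
        )
    return genotype


# Lenient handling of the problematic combination
print(check_genotype("rs12124819", "0", Genotype.HOM_REF, lenient=True))
```

The strict variant surfaces broken input early, whereas the lenient variant would silently hide a conversion problem like the one in this package.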

@nevrome
Member Author

nevrome commented Apr 16, 2021

You can reproduce the error with this code:

mkdir miniposeidon
trident fetch -d miniposeidon -f "*2020_Cassidy_IrishDynastic*,*2021_Saag_EastEuropean*"
trident forge -d miniposeidon -f "<SOP002>,<GNM1007>" -n testpac -o testpac

Error in the genotype data: "encountered illegal genotype Hom-Ref with Ref-Allele missing"

@stschiff
Member

I just pushed an update to poseidon-hs (bumped version to 0.14.1), which should at least now give more helpful error messages also indicating which SNP caused the error. If you have time to explore this further, feel free, otherwise I'll have time tonight to check myself.

@stschiff
Member

@AyGhal it's clear now that the data for this package is broken. I doubt that any of our main tools caused this; I simply blame the combination of conversion steps that were necessary for creating this dataset. Could you generate a new dataset, either using trident genoconvert or directly using the new Plink output option of pileupcaller?

@AyGhal
Contributor

AyGhal commented Apr 20, 2021

Okay, but is the second one also broken?

@stschiff
Member

What do you mean by "second one"?

@AyGhal
Contributor

AyGhal commented Apr 20, 2021

The one I prepared on Friday, I will share the link.

@stschiff
Member

OK, I think that works now!

@stschiff
Member

OK, so I've just tried to forge your new package under /projects1/paleorider/data_published/2021_Saag_EastEurope/genotyping/2021_Saag and that seems to work.

@stschiff
Member

@AyGhal could you make sure that this new package is correctly in our local SDAG poseidon repo? And also make any necessary updates to the public repo (pull request...). Also, please update the package version. When all that is done I'll replace the version on the server. Thanks!

@nevrome
Member Author

nevrome commented Apr 28, 2021

I guess that is solved now. 👍

nevrome closed this as completed Apr 28, 2021
3 participants