Make Canto alleles match what's in Chado #2776
Comments
I started writing a script to match up Chado alleles and Canto alleles by comparing allele names, types and descriptions but then I thought of a more reliable plan. There are internal IDs for alleles in Canto that uniquely identify alleles within sessions. We can store those internal IDs in Chado which will make it easy to match up Chado and Canto IDs later. It will also make it easier to assign stable IDs to the alleles in Canto.
Cope with alleles from PHAF files which don't have those IDs. Refs pombase/canto#2776
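A rough sketch of the matching idea, in case it helps anyone following along. It assumes the Canto internal ID is stored on each Chado allele record; the `ChadoAllele` class, the `canto_allele_id` field and the example ID strings are hypothetical names made up for illustration, not the real Chado or Canto schema. Alleles that came from PHAF files have no Canto ID, so they are set aside rather than matched.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ChadoAllele:
    # Hypothetical record layout, for illustration only.
    uniquename: str
    name: str
    # Canto internal ID stored in Chado; None for alleles loaded from PHAF files.
    canto_allele_id: Optional[str]


def index_by_canto_id(
    chado_alleles: Iterable[ChadoAllele],
) -> Tuple[Dict[str, ChadoAllele], List[ChadoAllele]]:
    """Return (index keyed by Canto internal ID, alleles that have no Canto ID)."""
    index: Dict[str, ChadoAllele] = {}
    without_id: List[ChadoAllele] = []
    for allele in chado_alleles:
        if allele.canto_allele_id is None:
            # PHAF-derived alleles cannot be matched by ID.
            without_id.append(allele)
        else:
            index[allele.canto_allele_id] = allele
    return index, without_id


alleles = [
    ChadoAllele("allele-1", "example-G345R", "session-a:1"),  # made-up ID
    ChadoAllele("allele-2", "example-delta", None),           # e.g. from a PHAF file
]
index, without_id = index_by_canto_id(alleles)
```

With the IDs stored, the lookup from a Canto allele to its Chado counterpart is a single dictionary access instead of a comparison of names, types and descriptions.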
That's implemented now and committed. I'm running a local load which seems fine so far and I'll check the main load tomorrow.
We discussed this and the consensus was 'no problem' because the descriptions are already added on Chado loading. It makes sense therefore to correct them in Canto.
It will help to do this issue first:
After careful testing, I now have a script ready to go that will set allele details in Canto that are missing, using the details from Chado. I hope to apply this fix over the weekend after applying: #2642
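For illustration, the core of a fix like this could look roughly as follows. This is only a sketch under assumptions and not the actual script: alleles are modelled as plain dictionaries, the field names `type` and `description` are assumed, and a value of "unknown" is treated the same as a missing value.

```python
from typing import Dict, Optional

# Values treated as "missing" in Canto (an assumption for this sketch).
MISSING_VALUES = (None, "", "unknown")


def fill_missing_details(
    canto_allele: Dict[str, Optional[str]],
    chado_allele: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Return a copy of the Canto allele with missing fields copied from Chado."""
    updated = dict(canto_allele)
    for field in ("type", "description"):
        chado_value = chado_allele.get(field)
        # Only fill a gap, and only when Chado actually has a usable value.
        if updated.get(field) in MISSING_VALUES and chado_value not in MISSING_VALUES:
            updated[field] = chado_value
    return updated


canto = {"name": "example-G345R", "type": "unknown", "description": "unknown"}
chado = {"name": "example-G345R", "type": "amino_acid_mutation", "description": "G345R"}
fixed = fill_missing_details(canto, chado)
```

Details that are already set in Canto are left alone, so running it again later only touches alleles added since the previous run.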
fingers x'd.
I've applied that script now. We'll probably need to run it again just before the final switch over to stable allele IDs to catch any alleles that have been added between now and then. See: #2770
Great! Hopefully no new problem alleles will be added.
While working on #2770 I noticed a bunch of allele names in Canto sessions that are inconsistent with what's in Chado. Mostly these aren't a surprise.
The most common case is an allele where the type or description is "unknown" in Canto but the allele is merged with a gene with the same name in Chado. For example this allele in Canto is merged and ends up as amino_acid_mutation / G345R in Chado:
In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?
There are also some nonsense mutations that need fixing, for example:
I think now would be a good time to tidy up as many of these alleles as possible, before assigning unique identifiers.