Make Canto alleles match what's in Chado #2776

Closed
kimrutherford opened this issue Sep 5, 2023 · 8 comments

@kimrutherford
Member

While working on #2770 I noticed a bunch of allele names in Canto sessions that are inconsistent with what's in Chado. Mostly these aren't a surprise.

The most common case is an allele where the type or description is "unknown" in Canto but the allele is merged with a gene with the same name in Chado. For example, this allele in Canto is merged and ends up as amino_acid_mutation / G345R in Chado:

09b5fcafa2826d4a-1  myo2-E1 

In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?

There are also some nonsense mutations that need fixing, for example:

155dba22ce11dc4f:  SPAC17G6.05c nonsense mutation E644->stop 

I think now would be a good time to tidy up as many of these alleles as possible, before assigning unique identifiers.

@kimrutherford
Member Author

I started writing a script to match up Chado alleles and Canto alleles by comparing allele names, types and descriptions but then I thought of a more reliable plan.

There are internal IDs for alleles in Canto that uniquely identify alleles within sessions. We can store those internal IDs in Chado which will make it easy to match up Chado and Canto IDs later. It will also make it easier to assign stable IDs to the alleles in Canto.
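
A minimal sketch of the ID-based lookup described above, written in Python purely for illustration (it is not the project's code). The `ChadoAllele` class, its `canto_ids` field and the `index_by_canto_id` helper are hypothetical names, and the second allele in the sample data is made up; only the `myo2-E1` values come from this issue. It assumes one Chado allele can carry several Canto internal IDs, since several Canto alleles may merge into one Chado allele.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ChadoAllele:
    name: str
    allele_type: str
    description: str
    # Hypothetical field: Canto internal IDs recorded at load time.
    # Left empty for alleles that have no Canto ID (e.g. from PHAF files).
    canto_ids: Tuple[str, ...] = ()


def index_by_canto_id(chado_alleles: Iterable[ChadoAllele]) -> Dict[str, ChadoAllele]:
    """Map each stored Canto internal ID to the Chado allele that holds it."""
    index: Dict[str, ChadoAllele] = {}
    for allele in chado_alleles:
        for canto_id in allele.canto_ids:
            index[canto_id] = allele
    return index


chado_alleles = [
    ChadoAllele("myo2-E1", "amino_acid_mutation", "G345R", ("09b5fcafa2826d4a-1",)),
    # Made-up allele with no Canto ID, standing in for a PHAF-loaded allele.
    ChadoAllele("example-phaf-allele", "deletion", "deletion"),
]

canto_index = index_by_canto_id(chado_alleles)
match = canto_index.get("09b5fcafa2826d4a-1")
```

With the IDs stored, matching is an exact dictionary lookup, so it does not depend on names, types or descriptions agreeing between the two databases.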

kimrutherford added a commit to pombase/pombase-legacy that referenced this issue Sep 6, 2023
kimrutherford added a commit to pombase/pombase-chado that referenced this issue Sep 7, 2023
kimrutherford added a commit to pombase/pombase-chado that referenced this issue Sep 7, 2023
Cope with alleles from PHAF files which don't have those IDs.

Refs pombase/canto#2776
@kimrutherford
Member Author

We can store those internal IDs in Chado which will make it easy to match up Chado and Canto IDs later.

That's implemented now and committed. I'm running a local load which seems fine so far and I'll check the main load tomorrow.

@ValWood
Member

ValWood commented Sep 7, 2023

In this sort of case is there any problem with setting the allele type and description in the Canto sessions where it's unknown?

We discussed this and the consensus was 'no problem' because the descriptions are already added on Chado loading. It therefore makes sense to correct them in Canto.

@kimrutherford
Member Author

@kimrutherford
Member Author

After careful testing, I now have a script ready to go that will set allele details in Canto that are missing, using the details from Chado. I hope to apply this fix over the weekend after applying: #2642
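
For illustration only, the core of such a fix could look roughly like the Python sketch below. This is not the actual script. The dictionary layout, the `fill_missing_details` name and the choice to treat `None`, an empty string and "unknown" as missing are all assumptions; the sample values are the `myo2-E1` example from this issue.

```python
from typing import Dict, Optional

UNKNOWN = "unknown"


def fill_missing_details(
    canto_allele: Dict[str, Optional[str]],
    chado_allele: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Return a copy of canto_allele in which a missing or "unknown" type or
    description is replaced by the Chado value; known values are untouched."""
    updated = dict(canto_allele)
    for field in ("type", "description"):
        if updated.get(field) in (None, "", UNKNOWN):
            updated[field] = chado_allele[field]
    return updated


canto = {"name": "myo2-E1", "type": "unknown", "description": "unknown"}
chado = {"name": "myo2-E1", "type": "amino_acid_mutation", "description": "G345R"}

fixed = fill_missing_details(canto, chado)
```

The function returns a copy and never overwrites a value that is already set, so running it again later only touches alleles that are still incomplete.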

@ValWood
Member

ValWood commented Sep 22, 2023

fingers x'd.

kimrutherford added a commit that referenced this issue Sep 25, 2023
@kimrutherford
Member Author

After careful testing, I now have a script ready to go that will set allele details in Canto that are missing, using the details from Chado.

I've applied that script now. We'll probably need to run it again just before the final switch over to stable allele IDs to catch any alleles that have been added between now and then. See: #2770

@ValWood
Member

ValWood commented Sep 25, 2023

Great! Hopefully no new problem alleles will be added.
