Add missing variables from Cassavabase #1

Closed
nmenda opened this issue Sep 23, 2015 · 4 comments

@nmenda
Collaborator

nmenda commented Sep 23, 2015

Terms 77, 123, 224, and 256 are missing from the CO file. For example:
http://www.cassavabase.org/chado/cvterm?action=view&cvterm_id=76806

@nmenda nmenda self-assigned this Sep 23, 2015
@nmenda
Collaborator Author

nmenda commented Sep 23, 2015

Actually, there are more missing terms.
The latest file from Cassavabase has 257 variable terms, but the working copy in this repo has only 239.
Working on fixing this.
These terms must have been lost in one of the reformatting cycles.

@leova

leova commented Sep 24, 2015

Good that you brought this issue up.
I think we should first be clear about which files are involved. Let’s call them:

  • File A: the "latest file from Cassavabase", presumably the file from before the 2015 curation
  • File B: the TD in template v5, whose structure has been discussed and approved and whose curated content has been accepted by Afola
  • File C: the OBO file converted by Marie from file B. This file is currently on the Planteome GitHub.

Is it correct that you identified 18 variables missing between file A and file C (257 variables in file A and 239 in file C)?

What is file A?

If you mean that file A is the OBO on https://github.com/nextgencassava/cassava_ontology, I don't understand, because:

  • 1/ I count 256 terms, of which 241 are variables (even though they are called traits in this file). The terms CGIAR cassava trait ontology, agronomic trait, morphological trait, physiological trait, quality trait, stress trait, abiotic stress trait, biotic stress trait, bacterial disease, viral disease, fungal disease, insect damage, derives_from, method_of, and scale_of are not variables. That would mean only 2 variables are missing (241 - 239 = 2).
  • 2/ The OBO does not include the term 0000256 that you identified as missing.
    In fact, I cannot find 0000256 anywhere (not on http://www.cassavabase.org/chado/cvterm?action=view&cvterm_id=70760, not on http://www.cropontology.org/rdf/CO_334:0000256, and not in any file I have locally).
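The count in point 1/ is a total-minus-exclusions calculation. Below is a minimal Python sketch of it, assuming the OBO file uses the usual `[Term]`/`[Typedef]` stanza headers with `id:` and `name:` tags. The helper names `parse_obo_stanzas` and `count_variables` are hypothetical, and the exclusion list is simply copied from the comment above.

```python
# Grouping terms and relations that the comment above lists as "not variables".
NON_VARIABLE_NAMES = {
    "CGIAR cassava trait ontology", "agronomic trait", "morphological trait",
    "physiological trait", "quality trait", "stress trait",
    "abiotic stress trait", "biotic stress trait", "bacterial disease",
    "viral disease", "fungal disease", "insect damage",
    "derives_from", "method_of", "scale_of",
}


def parse_obo_stanzas(text):
    """Return (stanza_type, id, name) for every stanza that has an id."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza header such as [Term] or [Typedef].
            current = {"type": line[1:-1], "id": None, "name": None}
            stanzas.append(current)
        elif current is not None and ":" in line:
            # Split only on the first colon so "id: CO_334:0000077" keeps its prefix.
            key, _, value = line.partition(":")
            if key in ("id", "name") and current[key] is None:
                current[key] = value.strip()
    return [(s["type"], s["id"], s["name"]) for s in stanzas if s["id"]]


def count_variables(text):
    """Return (total stanzas, stanzas whose name is not in NON_VARIABLE_NAMES)."""
    stanzas = parse_obo_stanzas(text)
    variables = [s for s in stanzas if s[2] not in NON_VARIABLE_NAMES]
    return len(stanzas), len(variables)
```

Passing the text of a local copy of the OBO to `count_variables` returns both numbers at once. Names are matched exactly, so any spelling or capitalization difference in the file would need to be reflected in the exclusion set.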

As I could not look at the file A you meant, I was not able to derive the full list of missing terms. Nevertheless, I worked through the other examples you gave (15, 77, 123, 224) and identified 2 causes for the losses.

1/ the terms were not present in the original working curation file

On 04/03/2015, Afola sent an updated version of the cassava ontology to Elizabeth. He sent 2 versions of this ontology:

  • File A1: an Excel TD in template version 4 with 242 Trait-Method-Scale triplets (note: there are actually 243 triplets in the file, but it includes "days to flower 109", which Afola replaced with "root constriction 109")
  • File A2: an OBO file with 248 variables (back then called traits)

Leaving aside CO_334:0000027 bacterial disease, CO_334:0000028 viral disease, CO_334:0000029 fungal disease, and CO_334:0000030 insect damage, the OBO has 2 variables that are absent from the TD: CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf.
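The comparison in the previous paragraph is a set difference with four grouping terms excluded. Here is a minimal Python sketch, assuming the Excel TD has been exported to CSV with one variable ID per row and that the OBO IDs have already been extracted into a set. The column name `Variable ID` and the helper names are hypothetical placeholders.

```python
import csv
import io

# The four grouping terms set aside before comparing.
CATEGORY_IDS = {
    "CO_334:0000027",  # bacterial disease
    "CO_334:0000028",  # viral disease
    "CO_334:0000029",  # fungal disease
    "CO_334:0000030",  # insect damage
}


def ids_from_td_csv(csv_text, id_column="Variable ID"):
    """Collect variable IDs from a CSV export of the TD.

    "Variable ID" is a placeholder; use whatever header the template actually has.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    ids = set()
    for row in reader:
        value = (row.get(id_column) or "").strip()
        if value:  # skip blank cells
            ids.add(value)
    return ids


def obo_ids_missing_from_td(obo_ids, td_ids):
    """IDs present in the OBO but absent from the TD, ignoring the grouping terms."""
    return sorted(set(obo_ids) - set(td_ids) - CATEGORY_IDS)
```

Sorting the result keeps the output stable, which makes it easy to paste the list into an issue comment.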

At that time, I had assumed that these two files were equivalent, so I worked on file A1, the Excel TD, and not on file A2, the OBO. This might account for the loss of 2 variables (CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf).

2/ the terms were lost during the curation/formatting process

I looked for IDs that were lost while curating, converting, and exchanging files by comparing file A1 and file B (I saw no conversion issues between file B and file C). Only 2 variables present in file A1 disappeared from file B: CO_334:0000015 Harvest Index and CO_334:0000224 staygreen.
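This check amounts to asking, for each ID in file A1, at which stage of the A1 → B → C chain it first disappears. The following is a rough Python sketch, assuming each file has been exported to plain text (CSV or OBO) and that IDs follow the `CO_334:` prefix plus digits. The function names are hypothetical. IDs are zero-padded to seven digits so that an under-padded ID does not show up as a spurious loss. Because the sketch scans raw text, an ID that survives only as a cross-reference (for example in an `is_a:` line) would still count as present.

```python
import re

# Matches IDs such as CO_334:0000015; also tolerates under-padded variants.
ID_PATTERN = re.compile(r"CO_334:(\d+)")


def normalized_ids(text):
    """Extract CO_334 IDs from a text export and zero-pad them to 7 digits."""
    return {f"CO_334:{int(num):07d}" for num in ID_PATTERN.findall(text)}


def first_stage_missing(stages):
    """Map each ID of the first stage to the label of the first later stage lacking it.

    stages: list of (label, text) pairs in pipeline order, e.g. A1, B, C.
    IDs that survive every stage are not included in the result.
    """
    id_sets = [(label, normalized_ids(text)) for label, text in stages]
    _, baseline = id_sets[0]
    lost = {}
    for term_id in sorted(baseline):
        for label, ids in id_sets[1:]:
            if term_id not in ids:
                lost[term_id] = label  # first stage where the ID is gone
                break
    return lost
```

Reporting the first stage where an ID vanishes separates losses during curation (A1 → B) from losses during conversion (B → C), which is the distinction drawn in this comment.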

I have not checked, so I cannot say when or why they were lost. I apologize in advance if the loss of these 2 variables is my responsibility.

My conclusion

To the best of my knowledge and understanding, the only explanation I can see is that CO_334:0000015 Harvest Index and CO_334:0000224 staygreen were lost during curation/formatting/conversion, and that CO_334:0000077 post-harvest physiological deterioration and CO_334:0000123 plant height with leaf were left out of the curation process.

Thanks in advance for sharing any further information that could help identify other missing variables.

@nmenda
Collaborator Author

nmenda commented Sep 24, 2015

Leo,

I checked the versions again, and it looks like you are correct: the only missing terms are 0000015, 0000077, 0000123, and 0000224! There may also be other terms that did not make it into the CO version of 4/3/15.
I will add these 4 now and will try to add proper methods and scales.
If more variables fell through the cracks between April and now, we will add them back to this OBO file.

@nmenda
Collaborator Author

nmenda commented Sep 25, 2015

7fa2a63 closes this issue

@nmenda nmenda closed this as completed Sep 25, 2015