Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the check for taxon constraint violations #3336

Merged
merged 10 commits into from
Aug 27, 2024
Merged

Conversation

gouttegd
Copy link
Collaborator

Since last November, we normally have a check in place to automatically detect any violations of the taxon constraints set forth in the ontology.

Unfortunately, since we migrated to ODK 1.5 back in March, that check has been neutered. This is because the ODK 1.5 defaults to stripping most annotation properties from the imported modules. One of the properties that ended up being stripped is OMO:0002000 (“defined by construct”), which is used by ROBOT‘s expand command to inject constructed axioms into the ontology wherever the annotation is used. In particular, the expansion of RO:0002175 (“present in taxon“) is a critical step in the aforementioned check for violations of taxon constraints. Without that property, RO:0002175 is, in effect, meaningless, and we cannot enforce any constraint that depends on it.

This PR updates the ODK configuration to stop it from stripping the OMO:0002000 annotation property when refreshing the imports, thereby restoring the enforcement of taxon constraints.

My sincerest apologies to the Uberon community for having missed that issue not once, but twice: when we implemented the annotation stripping in ODK 1.5 and when we first updated the imports here after the 1.5 migration.

Since version 1.5, the ODK import system strips by default almost all
annotation properties from the imported modules, except rdfs:label and
IAO:0000115. This is intended to reduce the size of the import modules,
on the rationale that most annotations are "not important" anyway.

Unfortunately, this results in the OMO:0002000 annotation property being
stripped from the RO import, and that property is definitely not a "not
important" one, since it defines the axioms that ROBOT's `expand`
command should inject into the ontology wherever that annotation is
used.

In particular, the expansion of RO:0002175 (`present_in_taxon`) is a
critical step of the check for taxon constraint violations. As a result
of OMO:0002000 being stripped, that expansion is not performed, and the
taxon constraints check is inoperant -- it has been inoperant ever since
we migrated to ODK 1.5, back in March 2024!

This commit explicitly declares which annotation properties should be
preserved by the ODK when refreshing the import modules, instead of
relying on the ODK default (rdfs:label and IAO:0000115), to make sure
that OMO:0002000 is among them.
This is to ensure we get the OMO:0002000 annotations back immediately,
without waiting for the next time the imports would normally be
refreshed.
@gouttegd gouttegd self-assigned this Aug 13, 2024
@gouttegd
Copy link
Collaborator Author

Unfortunately, in the 6 months since we did not have a functioning TC violations check, TC violations did find their way in the ontology.

First one:

Possible fix: Removing the following axiom:

spleen primordium part of some dorsal mesogastrium

Without knowing much about the development of the spleen, it seems like a bold claim to state that the spleen primordium is always part of the dorsal mesogastrium, in any species where a spleen (and therefore a spleen primordium) exists.

@gouttegd
Copy link
Collaborator Author

gouttegd commented Aug 13, 2024

Next violation:

Here I am skeptical about that property chain:

has potential to develop into o in taxon SubPropertyOf: in taxon

Just because a “vertebra cartilage element“ may develop into a vertebra should not necessarily imply, I believe, that such elements can only exist in taxa where vertebrae exist.

That chain does not exist in upstream RO, it is explicitly added in Uberon:

[Typedef]
id: in_taxon
name: in taxon
xref: RO:0002162
holds_over_chain: has_potential_to_develop_into in_taxon

@balhoff @cmungall Thoughts?

@gouttegd
Copy link
Collaborator Author

The last one has its cause in CL, I believe.

According to the taxon notes on placenta labyrinth, Uberon is correct here, so one of the CL assertions is wrong:

@matentzn
Copy link
Contributor

I am very sorry to have caused this - my mental reasoner should have been alert when answering the question: are there Annotation Properties that are needed for logical reasoning - as AP != logic was so ingrained in my mind it didnt make the leap.

@anitacaron
Copy link
Collaborator

anitacaron commented Aug 14, 2024

It seems the taxon constraint QC is still not working in the GitHub Action. Shouldn't this PR fail and post the comment with the violations?

@gouttegd
Copy link
Collaborator Author

It seems the taxon constraint QC is still not working in the GitHub Action. Shouldn't this PR fail and post the comment with the violations?

Indeed. There are other problems than the non-expansion of OMO:0002000, I am looking into it (I am doing some tests in #3337). From what I can tell for now, there are at least two more problems:

  • The violations I highlighted above (that I found by running the check locally on my machine with ROBOT 1.9.6) are not caught by ROBOT 1.9.5 (probably because 1.9.5 uses ELK 0.5 while 1.9.6 uses ELK 0.6). The CI check is done with ODK 1.5 which still uses ROBOT 1.9.5 (1.9.6 was not released yet when we put ODK 1.5 out back in March), so that alone is enough to explain why the CI check passes on this PR.

  • But even then, when running the test suite with ODK 1.5.2 (which includes ROBOT 1.9.6), the violations are caught (see log, but the workflow somehow does not fail. I am currently at a loss understanding why that is the case.

@gouttegd
Copy link
Collaborator Author

But even then, when running the test suite with ODK 1.5.2 (which includes ROBOT 1.9.6), the violations are caught (see log, but the workflow somehow does not fail. I am currently at a loss understanding why that is the case.

OK, that was a false problem, just a side effect of one of the changes I made in #3337 for testing.

So, the TC check now (with this PR) is, in fact, working again, it’s just that the ODK 1.5 that we use in CI doesn’t catch what ODK 1.5.2 is catching.

@anitacaron @matentzn Any objection to switch to ODK 1.5.2 in Uberon QC? Presumably Anita does the releases with 1.5.2 anyway, so it’d make sense to use 1.5.2 in QC as well.

Copy link

This PR violates some taxon constraints. Here is what the reasoner has to say:

http://purl.obolibrary.org/obo/UBERON_0001250 in taxon http://purl.obolibrary.org/obo/NCBITaxon_7955 SubClassOf Nothing

http://purl.obolibrary.org/obo/CL_2000062 in taxon http://purl.obolibrary.org/obo/NCBITaxon_9606 SubClassOf Nothing

http://purl.obolibrary.org/obo/UBERON_0011094 in taxon http://purl.obolibrary.org/obo/NCBITaxon_7764 SubClassOf Nothing

Axiom Impact

Axioms used 3 times

Axioms used 2 times

Axioms used 1 times

Ontologies used:

@balhoff
Copy link
Member

balhoff commented Aug 14, 2024

Next violation:

Here I am skeptical about that property chain:

has potential to develop into o in taxon SubPropertyOf: in taxon

Just because a “vertebra cartilage element“ may develop into a vertebra should not necessarily imply, I believe, that such elements can only exist in taxa where vertebrae exist.

That chain does not exist in upstream RO, it is explicitly added in Uberon:

[Typedef]
id: in_taxon
name: in taxon
xref: RO:0002162
holds_over_chain: has_potential_to_develop_into in_taxon

@balhoff @cmungall Thoughts?

I'm not sure, but @wdahdul may also have an opinion about this.

@gouttegd
Copy link
Collaborator Author

gouttegd commented Aug 19, 2024

The unsat involving CL’s placental villus capillary endothelial cell has been fixed in CL and a CL release with the fix is already available, so that particular issue will be automatically fixed in Uberon the next time we’ll refresh the imports.

As for the issue with the vertebra cartilage element (see comment), editors at the Uberon call of August 19th, 2024 agreed with the proposed removal of the property chain:

has potential to develop into o in taxon SubPropertyOf: in taxon

I plan to proceed with said removal unless someone objects to it in the next 3 weeks.

@cmungall
Copy link
Member

Keep the property chain. You can make an RO ticket for me to clarify the relation. Think of it as being there is genetic programming to make A develop into B. It is stronger than "may" (mays-as-existentials-considered-harmful).

The fix is to change the individual relationship to potential to develop into some UBERON:0010913 ! vertebral element.

@gouttegd
Copy link
Collaborator Author

Keep the property chain. You can make an RO ticket for me to clarify the relation.

Will do.

But if the property chain is correct, I believe it should belong in RO – that is, the property chain should still be removed from here and added to RO instead.

@gouttegd
Copy link
Collaborator Author

gouttegd commented Aug 20, 2024

The fix is to change the individual relationship to potential to develop into some UBERON:0010913 ! vertebral element.

There are two has_potential_to_develop_into relationships involved here:

  • 'vertebra cartilage element' has_potential_to_develop_into some 'vertebra' (1)
  • 'vertebra pre-cartilage condensation' has_potential_to_develop_into some 'vertebra' (2)

Any of the two is enough to make 'vertebra cartilage element' (which is present_in_taxon some 'Eptatretus burgeri') unsatisfiable, because they both link it (through 'vertebra') to 'vertebral column' (which is never_in_taxon some Myxinidae).

Changing both of these relationships to has_potential_to_develop_into some 'vertebral element' (as proposed) would indeed “fix” the unsat but I am not sure it would make much sense, because both 'vertebral cartilage element' and 'vertebra pre-cartilage condensation' are 'vertebral element' (they are both logically defined as “'vertebral element' that...”). So we would end up saying that something that is a 'vertebral element' has_potential_to_develop_into some 'vertebral element'.

So what I propose to do instead:

  • remove relationship (1) above entirely;
  • change relationship (2) into

'vertebra pre-cartilage condensation' has_potential_to_develop_into some 'vertebra cartilage element'

That is, we would go from this (current hierarchy)

  • 'vertebral element'
    • 'vertebra pre-cartilage condensation'
      • has_potential_to_develop_into some 'vertebra'
    • 'vertebra cartilage element'
      • develops_from some 'vertebra pre-cartilage condensation'
      • has_potential_to_develop_into some 'vertebra'
    • 'vertebra'
      • develops_from some 'vertebra cartilage element'

to this:

  • 'vertebral element'
    • 'vertebra pre-cartilage condensation'
      • has_potential_to_develop_into some 'vertebra cartilage element'
    • 'vertebra cartilage element'
      • develops_from some 'vertebra pre-cartilage condensation`
    • 'vertebra'
      • develops_from some 'vertebra cartilage element'

'vertebra cartilage element' violates a taxon constraint because it is
present_in_taxon some 'Eptatetrus burgeri' (inshore hagfish) but
has_potential_to_develop_into some vertebra, and vertebra is ultimately
(through 'vertebral column') never_in_taxon some Myxinidae (hagfishes).

We fix that by removing the has_potential_to_develop_into relationship
between 'vertebra cartilage element' and 'vertebra'.

This is however not enough, as 'vertebra cartilage element'
develops_from some 'vertebra pre-cartilage condensation' which also
has_potential_to_develop_into some vertebra, providing another path to
unsatisfiability.

We fix that by changing the target of the has_potential_to_develop_into
relationship of 'vertebra pre-cartilage condensation' from 'vertebra' to
'vertebra cartilage element'.
'red pulp of spleen' violates a taxon constraint because it is
present_in_taxon some 'Danio rerio' but is linked (through 'spleen
primordium', from which the spleen develops) to stomach, which is
never_in_taxon some 'Danio rerio'.

We fix that by removing the following axiom:

  'spleen primordium' part_of some 'dorsal mesogastrium'

That axiom is what links the existence of the spleen to the existence of
the stomach, and is probably too strong a claim. If a spleen can exist
in species without a stomach (as is the case in zebrafish), then the
spleen primordium cannot always be part of a component of the stomach.
'placental villus capillary endothelial cell' (from CL) was violating a
taxon constraint. This has been fixed in CL already, so here we import
the fix directly in our merged_import module (rather than refreshing the
entire module).
@gouttegd gouttegd merged commit 25213c6 into master Aug 27, 2024
1 check passed
@gouttegd gouttegd deleted the fix-taxon-constraints branch August 27, 2024 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants