Axiom ordering - sort or keep original order? #273

Closed
ignazio1977 opened this Issue Aug 14, 2014 · 28 comments

7 participants
@ignazio1977
Contributor

ignazio1977 commented Aug 14, 2014

Having chanced upon http://douroucouli.wordpress.com/2014/03/30/the-perils-of-managing-owl-in-a-version-control-system/, I wonder whether the OWLAPI should alleviate the pain of diffs on Turtle/XML syntaxes.

The simplest solution I can think of is a counter on OWLObject that records the order in which objects were created. Then, when sorting axioms, class expressions and so on for output, the counter is used as a tie-breaker after the current criteria.

Example:

The ontology contains three equivalence axioms and one class assertion.

During parsing, the axioms are numbered 1, 2, 3, 4.

A new equivalence axiom is added and numbered 5.

The output order is
1, 2, 3, 5, 4

i.e., the new equivalence axiom comes last among the equivalence axioms, still ahead of the class assertion.
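A minimal sketch of this proposal, under stated assumptions: `CreationOrderSketch`, `Kind`, `Axiom` and `Factory` are hypothetical stand-ins rather than OWLAPI types, the existing output criterion is reduced to the axiom kind, and equivalence axioms are assumed to sort before class assertions as in the example. The creation counter only breaks ties within a kind:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these types exist in the OWLAPI.
class CreationOrderSketch {
    // Existing output criterion, reduced to the axiom kind (declaration order = output order).
    enum Kind { EQUIVALENT_CLASSES, CLASS_ASSERTION }

    static final class Axiom {
        final Kind kind;
        final long seq; // creation number assigned by the factory

        Axiom(Kind kind, long seq) {
            this.kind = kind;
            this.seq = seq;
        }
    }

    // Single creation counter; every new axiom takes the next number.
    static final class Factory {
        private final AtomicLong counter = new AtomicLong();

        Axiom create(Kind kind) {
            return new Axiom(kind, counter.incrementAndGet());
        }
    }

    // Current criterion first, creation order only as a tie-breaker.
    static final Comparator<Axiom> OUTPUT_ORDER =
        Comparator.comparing((Axiom a) -> a.kind).thenComparingLong(a -> a.seq);

    static List<Long> example() {
        Factory factory = new Factory();
        List<Axiom> ontology = new ArrayList<>();
        ontology.add(factory.create(Kind.EQUIVALENT_CLASSES)); // 1 (parsed)
        ontology.add(factory.create(Kind.EQUIVALENT_CLASSES)); // 2 (parsed)
        ontology.add(factory.create(Kind.EQUIVALENT_CLASSES)); // 3 (parsed)
        ontology.add(factory.create(Kind.CLASS_ASSERTION));    // 4 (parsed)
        ontology.add(factory.create(Kind.EQUIVALENT_CLASSES)); // 5 (added later)
        ontology.sort(OUTPUT_ORDER);
        List<Long> order = new ArrayList<>();
        for (Axiom a : ontology) {
            order.add(a.seq);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(example()); // [1, 2, 3, 5, 4]
    }
}
```

Sorting by kind first is what places axiom 5 before axiom 4; ordering by the counter alone would append it at the very end.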

What are your thoughts? @matthewhorridge @cmungall anyone else?

@ansell

ansell Aug 15, 2014

Member

Given that OWLAPI Turtle and RDF/XML files are rendered based on categories (classes/individuals/etc.), the counter in that example may need to be localised to the category.

From a technical point of view, it shouldn't be difficult to implement a counter, as in practice all changes go through the OWLOntologyManager so there would just need to be an AtomicLong for each category, basically.
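A sketch of the per-category variant, assuming every change passes through one place that can call `next(...)`; `CategoryCounters` and its `Category` values are illustrative, not OWLAPI API:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one counter per rendering category, not an OWLAPI class.
class CategoryCounters {
    enum Category { CLASSES, OBJECT_PROPERTIES, INDIVIDUALS }

    private final Map<Category, AtomicLong> counters = new EnumMap<>(Category.class);

    CategoryCounters() {
        for (Category c : Category.values()) {
            counters.put(c, new AtomicLong());
        }
    }

    // Meant to be called from the single point all changes go through,
    // e.g. wherever the manager applies an ontology change.
    long next(Category c) {
        return counters.get(c).incrementAndGet();
    }

    public static void main(String[] args) {
        CategoryCounters counters = new CategoryCounters();
        System.out.println(counters.next(Category.CLASSES));     // 1
        System.out.println(counters.next(Category.CLASSES));     // 2
        System.out.println(counters.next(Category.INDIVIDUALS)); // 1, independent sequence
    }
}
```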

@cmungall

cmungall Aug 15, 2014

Member

Not totally sure I understand the numbering strategy. Would the ordering be lost if the ontology were roundtripped via a format that does not preserve the numbers?

My naive thoughts were that it would be possible to define a sort order on any set of constructs. For example:

  • Named entities first, ordered alphanumerically by IRI
  • Anonymous expressions next, e.g. SomeValuesFrom < IntersectionOf < UnionOf ...
    • within SomeValuesFrom (SVF), order by property expression first, then by filler ...
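One way to express such an ordering as a comparator, sketched over a hypothetical mini model of class expressions (`Named`, `SomeValuesFrom`, `IntersectionOf` and `UnionOf` here are not the OWLAPI interfaces). Operands of n-ary expressions are sorted on construction, so equal operand sets compare equal regardless of input order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini model of class expressions; not the OWLAPI interfaces.
class StructuralOrderSketch {
    interface Expr {}

    static final class Named implements Expr {
        final String iri;
        Named(String iri) { this.iri = iri; }
        @Override public String toString() { return iri; }
    }

    static final class SomeValuesFrom implements Expr {
        final String property;
        final Expr filler;
        SomeValuesFrom(String property, Expr filler) {
            this.property = property;
            this.filler = filler;
        }
        @Override public String toString() { return "SVF(" + property + " " + filler + ")"; }
    }

    static abstract class NAry implements Expr {
        final List<Expr> operands;
        NAry(Expr... ops) {
            List<Expr> sorted = new ArrayList<>(Arrays.asList(ops));
            sorted.sort(StructuralOrderSketch::compareExpr); // canonical operand order
            this.operands = sorted;
        }
    }

    static final class IntersectionOf extends NAry {
        IntersectionOf(Expr... ops) { super(ops); }
        @Override public String toString() { return "and" + operands; }
    }

    static final class UnionOf extends NAry {
        UnionOf(Expr... ops) { super(ops); }
        @Override public String toString() { return "or" + operands; }
    }

    // Named entities first, then SomeValuesFrom < IntersectionOf < UnionOf.
    static int rank(Expr e) {
        if (e instanceof Named) return 0;
        if (e instanceof SomeValuesFrom) return 1;
        if (e instanceof IntersectionOf) return 2;
        return 3; // UnionOf
    }

    static int compareExpr(Expr a, Expr b) {
        int byRank = Integer.compare(rank(a), rank(b));
        if (byRank != 0) return byRank;
        if (a instanceof Named) {
            // Both named: alphanumeric order of IRIs.
            return ((Named) a).iri.compareTo(((Named) b).iri);
        }
        if (a instanceof SomeValuesFrom) {
            // Property expression first, then filler (recursively).
            SomeValuesFrom x = (SomeValuesFrom) a;
            SomeValuesFrom y = (SomeValuesFrom) b;
            int byProperty = x.property.compareTo(y.property);
            return byProperty != 0 ? byProperty : compareExpr(x.filler, y.filler);
        }
        // Same n-ary construct: compare operands pairwise, then by count.
        List<Expr> xs = ((NAry) a).operands;
        List<Expr> ys = ((NAry) b).operands;
        for (int i = 0; i < Math.min(xs.size(), ys.size()); i++) {
            int c = compareExpr(xs.get(i), ys.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(xs.size(), ys.size());
    }

    static List<Expr> sortedExample() {
        List<Expr> exprs = new ArrayList<>(Arrays.asList(
            new UnionOf(new Named("ex:B"), new Named("ex:A")),
            new SomeValuesFrom("ex:partOf", new Named("ex:Cell")),
            new Named("ex:Nucleus"),
            new IntersectionOf(new Named("ex:A"), new Named("ex:C")),
            new SomeValuesFrom("ex:hasPart", new Named("ex:Cell")),
            new Named("ex:Axon")));
        exprs.sort(StructuralOrderSketch::compareExpr);
        return exprs;
    }

    public static void main(String[] args) {
        System.out.println(sortedExample());
    }
}
```

Because the ordering depends only on the expressions themselves, nothing needs to survive a roundtrip through another tool.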
@ignazio1977

ignazio1977 Aug 15, 2014

Contributor

Yes, defining an ordering is good, but it does not preserve the existing structure: e.g., if an ontology file with the "wrong" ordering is read, the output will not line up with the previous version. It will work well for successive versions, but there would need to be an 'update' step.
It's basically the same problem you mention about the ordering being lost when roundtripping through another tool. I'm not sure there's a catch-all solution here.

@cmungall

cmungall Aug 15, 2014

Member

I guess I'm OK with that. But my bias is primarily toward the VCS use case.

I can see how your proposal would be nice if people were hand-editing the files, found some axiom ordering appealing, and wished to preserve it. But I think anyone hand-editing RDF/XML long term would be certifiable (we've all done it short term...)

@matthewhorridge

matthewhorridge Aug 15, 2014

Contributor

Would these be implemented as separate comparators?

Perhaps ontologies, axioms, class expressions, etc., and the objects they contain should preserve the order in which they are supplied. Sorting on rendering, or whenever required, could then just use the appropriate comparator.

I would actually like to have a well-defined sort order for things like creating a digest of a set of axioms (unless there is a better way of doing this).
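A sketch of both ideas, with axioms represented as plain strings (for example rendered in functional syntax) instead of OWLAPI objects: storage keeps the supplied order, and sorting only happens when a canonical sequence is needed, here to compute a SHA-256 digest that does not depend on insertion order. `AxiomDigestSketch` and its methods are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: axioms are strings standing in for OWLAxiom objects.
class AxiomDigestSketch {
    // Storage preserves the order in which axioms were supplied.
    static Set<String> load(String... axioms) {
        return new LinkedHashSet<>(Arrays.asList(axioms));
    }

    // Sorting happens only on demand, with whatever comparator the caller needs.
    static List<String> sorted(Set<String> axioms, Comparator<String> order) {
        List<String> copy = new ArrayList<>(axioms);
        copy.sort(order);
        return copy;
    }

    // SHA-256 over the sorted sequence, so the digest ignores insertion order.
    static String digest(Set<String> axioms) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
        for (String axiom : sorted(axioms, Comparator.naturalOrder())) {
            md.update(axiom.getBytes(StandardCharsets.UTF_8));
            md.update((byte) '\n'); // separator between axioms
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Set<String> a = load("SubClassOf(:B :A)", "Declaration(Class(:A))");
        Set<String> b = load("Declaration(Class(:A))", "SubClassOf(:B :A)");
        System.out.println(a);                           // supplied order is kept
        System.out.println(digest(a).equals(digest(b))); // true
    }
}
```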

@sesuncedu

sesuncedu Aug 16, 2014

Contributor

Sorting seems to give a big improvement in compression ratios (at least for the functional-style syntax, FSS).

@whitten

whitten Aug 16, 2014

Since each axiom is true and the axioms are effectively ANDed together, and since AND is commutative (as well as idempotent), there should not be any "preferred" order for axioms. First Order Logic requires that they all be treated as if they have no particular order, so a topographic sort should work fine.

David Whitten

@ignazio1977

ignazio1977 Aug 16, 2014

Contributor

Of course, the semantics of the ontology is unaffected by the order of axioms.

The point of this change is purely to minimise changes to the text output, for the greater good of text-based version control systems and other non-OWL-aware tooling.

@ignazio1977

ignazio1977 May 17, 2015

Contributor

Simon has added sorting to the syntaxes, save for Manchester syntax (and the legacy ones, e.g., KRSS).

@cmungall

cmungall Aug 6, 2015

Member

@sesuncedu which version of the OWLAPI are these fixes in? It would be useful to know, to make sure everyone's Protege is in sync.

@cmungall

cmungall Aug 7, 2015

Member

@ignazio1977 do you know?

@ignazio1977

ignazio1977 Aug 7, 2015

Contributor

Should be in all versions. I'll double-check.

@cmungall

cmungall Aug 7, 2015

Member

What does "all versions" mean? I'm trying to figure out which versions of Protege support this, and whether we need a new Protege build.

@ignazio1977

ignazio1977 Aug 7, 2015

Contributor

All the most recent versions: 3.5.2, 4.0.2 and version 5 master. It's included in the 4.1.0 release candidate as well.

From past experience, Protege 4.3 and 5 can be adapted to use 3.5.2 by dropping the 3.5.2 osgidistribution jar into the Protege plugins folder.

@cmungall

cmungall Oct 7, 2015

Member

I am using a Protege 5beta18 snapshot; it saves with this version of the OWLAPI:

<!-- Generated by the OWL API (version 3.5.3.20150903-2211) http://owlapi.sourceforge.net -->

Yet we still get spurious diffs, e.g.
oborel/obo-relations@f9e17bf

This is in RDF/XML. I'm going to reopen this, as my understanding was that the intent was to implement deterministic ordering for non-legacy syntaxes (unless RDF/XML is considered legacy...)

Feel free to re-close, but let me know where this is fully implemented.

@cmungall cmungall reopened this Oct 7, 2015

@ignazio1977

ignazio1977 Oct 8, 2015

Contributor

I seem to have missed a commit on 3.5.2 when I checked. My bad.

@cmungall

cmungall Oct 8, 2015

Member

OK, was that just for RDF/XML, or does it affect all syntaxes?

@ignazio1977

ignazio1977 Oct 8, 2015

Contributor

Not sure yet; it looks like Turtle and RDF/XML.

@sesuncedu

sesuncedu Oct 9, 2015

Contributor

I just finished slouching into Bethlehem, so I'm not really brain-enabled, but I think the relevant code is in one of the base RDF renderers. (I know that in version 4 it changed blank node IDs for the Rio writers, since I had to adjust test cases.)

@ignazio1977 ignazio1977 self-assigned this Oct 9, 2015

@ignazio1977

ignazio1977 Oct 9, 2015

Contributor

@cmungall I've fixed the issue, but one problem you'll see for that ontology is that the next save will still introduce seemingly random changes, because the previous versions were not sorted. After that, things should normalize.

I'll put a Protege build with the updated jar up for evaluation once I'm done.
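A minimal sketch of why that churn is one-off, assuming a saved file can be modelled as a list of lines and that saving simply sorts them (`SortedSaveSketch`, `save` and `changedLines` are hypothetical names, and the positional line count is a crude stand-in for a real diff):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a "file" is a list of lines, and saving sorts them.
class SortedSaveSketch {
    static List<String> save(List<String> lines) {
        List<String> out = new ArrayList<>(lines);
        Collections.sort(out);
        return out;
    }

    // Crude positional diff: number of line positions whose content differs.
    static int changedLines(List<String> before, List<String> after) {
        int changed = Math.abs(before.size() - after.size());
        for (int i = 0; i < Math.min(before.size(), after.size()); i++) {
            if (!before.get(i).equals(after.get(i))) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<String> v0 = Arrays.asList("C", "A", "B"); // written by an unsorted release
        List<String> v1 = save(v0);                     // first save with sorting
        List<String> v2 = save(v1);                     // any later save
        System.out.println(changedLines(v0, v1));       // 3: one-off reordering
        System.out.println(changedLines(v1, v2));       // 0: output is now stable
    }
}
```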

@ignazio1977

ignazio1977 Oct 9, 2015

Contributor

@sesuncedu one thing I'm not clear about is the change to RDFXMLRenderer:

```java
private void writeCommentForEntity(String msg, OWLEntity entity) {
    checkNotNull(entity, msg);
    String iriString = entity.getIRI().toString();
    // Short form from the renderer's labelMaker, e.g. one of the entity's labels
    String labelString = labelMaker.getShortForm(entity);
    String commentString;
    if (!iriString.equals(labelString)) {
        // A distinct label exists: use it as the banner text
        commentString = labelString;
    } else {
        // No distinct label: fall back to the full IRI
        commentString = iriString;
    }
    // Banner comment written before the entity's RDF/XML block
    writer.writeComment(XMLUtils.escapeXML(commentString));
}
```

If I interpret the results correctly, this will change the banner in XML files to use one of the labels of the entity being written out. That sounds like a great idea to me, but it will also introduce a number of changes to existing ontologies. Was the intention to make this configurable?
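A minimal sketch of how such a switch could look, assuming a renderer-level boolean that defaults to the existing IRI banner; `BannerSketch` and `useLabelsInBanner` are hypothetical names, not existing OWLAPI configuration:

```java
import java.util.Optional;

// Hypothetical sketch; BannerSketch and useLabelsInBanner are not OWLAPI names.
class BannerSketch {
    private final boolean useLabelsInBanner;

    BannerSketch(boolean useLabelsInBanner) {
        this.useLabelsInBanner = useLabelsInBanner;
    }

    // Text for the comment written above an entity's block.
    String bannerFor(String iri, Optional<String> label) {
        if (useLabelsInBanner && label.isPresent()) {
            return label.get();
        }
        return iri; // default: same banner as before, so existing files do not change
    }

    public static void main(String[] args) {
        String iri = "http://example.org/onto#partOf";
        Optional<String> label = Optional.of("part of");
        System.out.println(new BannerSketch(false).bannerFor(iri, label)); // the IRI
        System.out.println(new BannerSketch(true).bannerFor(iri, label));  // part of
    }
}
```

Defaulting to the IRI keeps existing files stable, while users who opt in accept a one-off change to every banner.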

@ignazio1977

ignazio1977 Oct 10, 2015

Contributor

Now fixed in the version3 branch. I used the ontology linked by @cmungall to verify the fix, and cherry-picked the Manchester syntax sorting as well.
The sorting test is now the same for versions 3 and 4.

I've not enabled @sesuncedu's change to use a label in the RDF/XML banner for entities, as this would introduce more changes in the output. I'm planning to add it and make it switchable.

@Public-Health-Bioinformatics

Public-Health-Bioinformatics Jan 18, 2017

I couldn't quite tell from the thread, so could someone summarize the sort order now used in OWLAPI/Protege >= 5.0.0-beta-18? Is it deterministic down to the triple? Does it parallel sorting an XML document by tag name, then by attribute name and value, then by content, or some such? It sounds like OWLAPI sorts before it writes out to the various formats, which sounds great.

In other words, do we now have diffable ontology output via OWLAPI and Protege, with no caveats?

I appreciate all the work done on this!

@sesuncedu

sesuncedu Jan 18, 2017

Contributor

@ignazio1977

ignazio1977 Jan 18, 2017

Contributor

To the extent that it can be tested, it is deterministic, and tests check that it stays so. As @sesuncedu said, this is not an absolute guarantee, for a few reasons. However, blank node IDs are generated in sequence during parsing and are used when sorting blank nodes, so corner cases should be fairly uncommon.

Node identity comes after a number of other factors; ordering is implemented as follows:

  • Declarations first; entities are grouped by type (annotation properties, object properties, data properties, datatypes, classes, named individuals; I believe that's the sequence).
  • Axioms next, ordered by type (the exact sequence is embedded in the axiom type declaration and matches the strategy used for hashcode computation).
  • General containment axioms come last; they are ordered again by axiom type.

Sequences of axioms or any other OWL objects are sorted by type first, then by the values of their contained properties/expressions, down to IRIs (alphabetically) when necessary. Most of the time this is enough to give a stable order.
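The ordering described above can be sketched as a comparator chain. This is a simplification under assumptions: `Section`, `EntityType`, `AxiomKind` and `Item` are hypothetical, the axiom-kind sequence is illustrative rather than the OWLAPI's actual one, and the final tie-breaker compares a flattened text key instead of walking nested expressions down to IRIs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the rendering order; the enums are illustrative, not OWLAPI types.
class RenderOrderSketch {
    // Declarations, then ordinary axioms, then general axioms.
    enum Section { DECLARATIONS, AXIOMS, GENERAL_AXIOMS }

    // Entity grouping for declarations, in the sequence recalled in the comment.
    enum EntityType { ANNOTATION_PROPERTY, OBJECT_PROPERTY, DATA_PROPERTY, DATATYPE, CLASS, NAMED_INDIVIDUAL }

    // Illustrative axiom-type sequence only.
    enum AxiomKind { DECLARATION, SUBCLASS_OF, EQUIVALENT_CLASSES, CLASS_ASSERTION }

    static final class Item {
        final Section section;
        final EntityType entityType; // null for non-declaration axioms
        final AxiomKind kind;
        final String key;            // flattened contents, the final tie-breaker

        Item(Section section, EntityType entityType, AxiomKind kind, String key) {
            this.section = section;
            this.entityType = entityType;
            this.kind = kind;
            this.key = key;
        }
    }

    static Item declaration(EntityType type, String iri) {
        return new Item(Section.DECLARATIONS, type, AxiomKind.DECLARATION, iri);
    }

    static Item axiom(AxiomKind kind, String key) {
        return new Item(Section.AXIOMS, null, kind, key);
    }

    static Item generalAxiom(AxiomKind kind, String key) {
        return new Item(Section.GENERAL_AXIOMS, null, kind, key);
    }

    // Section, then entity type (declarations only; others use -1), then axiom kind, then contents.
    static final Comparator<Item> ORDER = Comparator
        .comparing((Item i) -> i.section)
        .thenComparingInt(i -> i.entityType == null ? -1 : i.entityType.ordinal())
        .thenComparing((Item i) -> i.kind)
        .thenComparing((Item i) -> i.key);

    static List<String> example() {
        List<Item> items = new ArrayList<>(Arrays.asList(
            generalAxiom(AxiomKind.SUBCLASS_OF, "ObjectSomeValuesFrom(:p :B) :C"),
            axiom(AxiomKind.CLASS_ASSERTION, ":A :i"),
            declaration(EntityType.CLASS, ":B"),
            axiom(AxiomKind.SUBCLASS_OF, ":B :A"),
            declaration(EntityType.OBJECT_PROPERTY, ":p"),
            declaration(EntityType.CLASS, ":A")));
        items.sort(ORDER);
        List<String> rendered = new ArrayList<>();
        for (Item i : items) {
            rendered.add(i.kind + " " + i.key);
        }
        return rendered;
    }

    public static void main(String[] args) {
        example().forEach(System.out::println);
    }
}
```

Ties that survive every comparison, such as two structurally identical anonymous nodes, are where the sequential blank node IDs mentioned above would come in.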

@cmungall

cmungall Jan 18, 2017

Member

Everything has been working perfectly for me for the last year or so.

@Public-Health-Bioinformatics

Public-Health-Bioinformatics Jan 18, 2017

Great, thanks for this feedback.
