use IRIs uniformly in HETS #1596

Closed
mcodescu opened this issue Mar 14, 2016 · 14 comments · Fixed by #1763

Comments

@mcodescu
Collaborator

mcodescu commented Mar 14, 2016

We currently have IRIs defined in OWL2 and in Common.IRI, and in addition we have CASL identifiers. This complicates things, e.g. when translating from OWL to CASL or when using Common.IRI for OWL2. Ideally, there should be just one type of identifier, which we could create by extending Common.IRI with some mixfix annotations to cover CASL identifiers as well. We should also have a convention, documented in the wiki, about implicit values for fields in the IRI type. As a result, all logics (including CASL and OWL2) should use the new Common.IRI type.

Hets modules to look at: Common.IRI (datatype IRI), OWL2.AS (datatype QName, same as IRI, but different from IRI in Common.IRI), Common.Id (datatype Id). These need to be integrated into a uniform datatype. Later on, look at the parsers: Common.IRI, OWL2.Parse, Common.Token (e.g. parseId).

@tillmo
Contributor

tillmo commented Sep 18, 2017

What happens with fully qualified names f:s->t, infixes __+__ and compound IDs List[Elem]? First idea: just include them verbatim. Does the IRI syntax allow this?
Some answers: in compound IDs, only List could be an IRI, not Elem. If Elem is later instantiated by an IRI, only the local part of that IRI will be substituted for Elem. This of course can lead to name clashes.
Special symbols like : would be escaped with their hex code (while still being nicely displayed in Ontohub).
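
A minimal sketch of the hex-escaping idea, assuming that "escaped with their hex code" means ordinary percent-encoding. The helper names `escape_casl_id` and `unescape_casl_id` are hypothetical and not part of Hets. The sketch escapes every character outside the unreserved set, which is more than the IRI syntax strictly requires.

```python
from urllib.parse import quote, unquote


def escape_casl_id(casl_id: str) -> str:
    """Percent-encode every character except letters, digits, '_', '.', '-' and '~'."""
    # safe="" also escapes '/', which quote() would otherwise leave untouched.
    return quote(casl_id, safe="")


def unescape_casl_id(escaped: str) -> str:
    """Invert escape_casl_id, e.g. for display purposes."""
    return unquote(escaped)
```

Because the escaping is reversible, a front end could still show the original identifier while the stored name stays within plain ASCII.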

@tillmo
Contributor

tillmo commented Sep 20, 2017

The problem is that in the standard http://www.ietf.org/rfc/rfc3987.txt brackets [] are forbidden in IRIs, while some symbols like + and : are allowed (at least in the local part). However, all unicode characters beyond ASCII are allowed, too. Hence we could replace square brackets by some unicode brackets. Alternatively, round brackets () are allowed, too (but these are already used for application to arguments), as well as curly braces {} (but these already are used for DOL structured OMS).

Possible solution: use, when displaying the OMS (i.e. for "show theory"):
Id〚X﹐Y〛 for Id[X,Y]
f:s×t→u for f:s*t->u
× for *
/ for /
\ for \
< for <
> for >
? for ?
: for :
# for #
^ for ^
❙ for |
The CASL SIGNs _+-&=!.$@~¡¿÷£©±¶§¹²³·¢◦¬μ are legal in IRIs.
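
The display substitution above could be sketched as a simple replacement table. This is only an illustration: the names `DISPLAY_REPLACEMENTS` and `to_display` are hypothetical, and the table contains only those pairs from the list whose replacement character is unambiguous in this thread (brackets, comma, `*`, `->` and `|`).

```python
# Pairs of (CASL notation, display notation), taken from the list above.
DISPLAY_REPLACEMENTS = [
    ("->", "→"),  # the only multi-character rule, listed first
    ("[", "〚"),
    ("]", "〛"),
    (",", "﹐"),
    ("*", "×"),
    ("|", "❙"),
]


def to_display(casl_id: str) -> str:
    """Replace characters that are problematic in IRIs by their display variants."""
    for plain, fancy in DISPLAY_REPLACEMENTS:
        casl_id = casl_id.replace(plain, fancy)
    return casl_id
```

Characters not covered by the table, such as the colon in a qualified name, pass through unchanged in this sketch.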

Another question is whether all CASL SIGNs should be legal in full DOL IRIs. Probably it would be better to allow them only in unprefixed identifiers.

@tillmo
Contributor

tillmo commented Sep 25, 2017

For mapping CASL IDs to IRIs, if there is no empty prefix, we need a default. This could be the IRI of the library and should be consistent with Ontohub. However, in Ontohub, we need to disambiguate between OMS, mappings, symbols, axiom names etc. (see ontohub/ontohub-backend#9 (comment)). We could append the entity kind (OMS, symbol, etc.) to the IRI. This is quite unusual, but from a Hets perspective it would work better than prepending it, because Hets needs to work not only with Ontohub, but also with other sources on the web. We could allow the omission of the appended entity kind if there is no overloading, i.e. if a resolution to a unique entity is possible. There could be a redirection mechanism from the default to the version with the appended entity kind. For example, in a library http://ontohub.org/user/pizza-repo/pizza-library, a symbol PeperoniPizza in OMS pizza-oms would be expanded to the loc/id (IRI)

http://ontohub.org/user/pizza-repo/pizza-library//pizza-oms//PeperoniPizza//symbol

but there also would be the default loc/id

http://ontohub.org/user/pizza-repo/pizza-library//pizza-oms//PeperoniPizza
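
One way to picture the redirection from the default loc/id to the version with the appended entity kind is the following sketch. It assumes a lookup table from default loc/ids to the set of entity kinds declared under that name; the function `resolve_default` and the table are hypothetical and do not describe existing Hets or Ontohub code.

```python
from typing import Dict, Set


def resolve_default(loc_id: str, kinds_by_loc_id: Dict[str, Set[str]]) -> str:
    """Redirect a loc/id without entity kind to the variant with the kind appended.

    Resolution only succeeds if there is no overloading, i.e. exactly one
    entity kind is known for the given loc/id.
    """
    kinds = kinds_by_loc_id.get(loc_id, set())
    if len(kinds) != 1:
        raise ValueError(f"cannot resolve {loc_id!r} to a unique entity: {sorted(kinds)}")
    (kind,) = kinds
    return f"{loc_id}//{kind}"
```

With overloading (for instance a symbol and an axiom name sharing one default loc/id), the sketch refuses to guess and the caller has to supply the entity kind explicitly.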

Note that this mechanism is different from the usual DOL prefixing mechanism. For example, if we had

prefix : http://ontohub.org/user/pizza-repo/pizza-library//

a symbol PeperoniPizza would be expanded to the loc/id

http://ontohub.org/user/pizza-repo/pizza-library//PeperoniPizza

which does not follow the DOL/Hets convention that a loc/id should include the library, the OMS and the entity name. The general form of a loc/id therefore is

:iri-of-library//:oms//:entity-kind
:iri-of-library//:oms//:entity-name//:entity-kind
:iri-of-native_document//:entity-name//:entity-kind

and in Ontohub, this specialises to

http://ontohub.org/:user/:repo/:path-to-library//:oms//:entity-kind
http://ontohub.org/:user/:repo/:path-to-library//:oms//:entity-name//:entity-kind
http://ontohub.org/:user/:repo/:path-to-native_document//:entity-name//:entity-kind
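
The three forms above can be sketched as one constructor that joins the available components with `//` and appends the entity kind last. The function name `loc_id` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Optional


def loc_id(base_iri: str, entity_kind: str,
           oms: Optional[str] = None, entity_name: Optional[str] = None) -> str:
    """Build a loc/id with the entity kind appended.

    base_iri is the IRI of the library or of the native document;
    oms and entity_name are included only when given.
    """
    parts = [base_iri]
    if oms is not None:
        parts.append(oms)
    if entity_name is not None:
        parts.append(entity_name)
    parts.append(entity_kind)
    return "//".join(parts)
```

For the pizza example, `loc_id("http://ontohub.org/user/pizza-repo/pizza-library", "symbol", oms="pizza-oms", entity_name="PeperoniPizza")` yields the loc/id with the `//symbol` suffix shown earlier in this comment.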

Alternatively, we could use the usual web convention to prefix each name with a kind:

:iri-of-library//oms//:oms
:iri-of-library//oms//:oms//symbol//:symbol-name
:iri-of-native_document//symbol//:symbol-name

Note that we also have to process native OWL, RDFa documents etc. The symbols there have IRIs that do not follow the above mechanism. But these symbols do not follow linked data principles anyway. We provide alternative IRIs (loc/ids) for these symbols in the above form, which follow linked data principles.

@tillmo
Contributor

tillmo commented Sep 25, 2017

Another question is whether we use the IRI of the location of the document or the IRI in the library declaration within the document. (This is comparable to: IRI of the location of an OWL document versus IRI declared in the Ontology declaration.) It seems that we should use the latter, because this is the explicitly declared one, even if this breaks linked data principles in cases where the two IRIs differ.

@BerndKB

BerndKB commented Sep 26, 2017

Since the problem of incomplete character sets seems to occur with several people, we might try
"‚" - 1738 - U+201A single low-9 quotation mark
that displays well here, in Protégé and in my TextEdit editor.
I have not found a really FAT comma ...

@BerndKB

BerndKB commented Oct 11, 2017

I encountered another problem, perhaps only with Protégé (?): if we use the following names
C[XˌY]
D〚X‚Y〛
E〚X❜Y〛
then the full IRI (with the "Show Full IRI" option) will show correctly, but without this option Protégé displays
C[XˌY]
‚Y〛
❜Y〛
i.e. the text before the resp. "comma" is taken to be part of the URI prefix.
If it stays that way, it will be extremely confusing for the user.

@tillmo
Contributor

tillmo commented Oct 11, 2017

Yes, I can confirm this. The problem is the comma. Only ˌ seems to work. So for example F〚XˌY〛 works for me.

@BerndKB

BerndKB commented Oct 11, 2017

Yes, F〚XˌY〛works for me also.

@clange
Contributor

clange commented Nov 5, 2017

Some thoughts while I'm reading this. I have not yet read the comments to the end. @BerndKB BTW good to "see" you here. I was also at ISWC (just the main conference); didn't manage to talk to you there, but I heard some of my colleagues did.

Another alternative character for enclosing type arguments might be <...> (think of Java or C++ generics), but this is even less allowed in IRIs than other bracket characters, probably because it's commonly used to enclose complete IRIs.

@clange
Contributor

clange commented Nov 5, 2017

The discussion about prepending the entity kind to IRIs reminds me of another possible solution: following the approach of punning in OWL 2, i.e. allowing the use of the same name for different entities having different entity kinds, where the kind of an entity is determined from the syntactic context.

@clange
Contributor

clange commented Nov 5, 2017

In any case I would not mess with the mechanism of "prefix expansion by concatenation", which DOL reuses from the specification of CURIEs in RDFa, and which occurs similarly in related standards, including SPARQL and Turtle. The beauty of this mechanism is its simplicity. Or, on the contrary, if we wanted to deviate from this mechanism, we should be bold instead of half-hearted and do away with it completely, i.e. devise a powerful mechanism for abbreviating long identifiers that doesn't have to respect the restrictions of any existing standard (except maybe RFC 3987). As an analogy, compare the URI syntax of MMT, which, IIRC, has its own approach to relative paths, which is not supported by any other implementation but is self-contained and makes a lot of sense for the MMT use cases.
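
For reference, "prefix expansion by concatenation" amounts to very little code, which is part of its appeal. The sketch below is an assumption-laden illustration, not the DOL or Hets implementation: the function `expand` and the prefix map are hypothetical, and an identifier without a colon is treated as using the empty prefix.

```python
from typing import Dict


def expand(abbreviated: str, prefixes: Dict[str, str]) -> str:
    """Expand 'prefix:local' by concatenating the prefix IRI and the local part."""
    prefix, sep, local = abbreviated.partition(":")
    if not sep:
        # No colon at all: fall back to the empty prefix.
        prefix, local = "", abbreviated
    # An undeclared prefix raises KeyError.
    return prefixes[prefix] + local
```

With the empty prefix bound to `http://ontohub.org/user/pizza-repo/pizza-library//`, the call `expand("PeperoniPizza", prefixes)` produces the same IRI as the prefix example in the comment from Sep 25, 2017.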

@clange
Contributor

clange commented Nov 5, 2017

On whether or not to follow linked data principles, one can also take inspiration from MMT's approach of deliberately not following them. Not following them might of course cause confusion because DOL intends to be compatible with languages whose best practice is to follow them, and DOL should also be inviting to the users of such languages.

@clange
Contributor

clange commented Nov 5, 2017

Done with my comments. No straightforward solution I could offer, but I hope you'll find my input useful at least.

@tillmo
Contributor

tillmo commented Nov 5, 2017

many thanks for your comments. Do you have the impression that we mess with the mechanism of "prefix expansion by concatenation"?

4 participants