WordNet part-of-speech vs Ontolex part-of-speech #58

simongray · 2021-09-22T08:59:14Z

Scanning through the Turtle file, I noticed that you define your own POS relations and classes rather than use the lexinfo:partOfSpeech relation which is heavily used in the Ontolex specification, which I understand that @jmccrae helped bring to life. I'm unsure why this is the case?

In the Ontolex specification it is specifically stated that

the model abstracts from specific linguistic theory or category systems used to describe the linguistic properties of lexical entries and their syntactic behavior, encouraging reuse of existing data category systems or linguistic ontologies.

I think that this is an excellent ideal as it makes integration of existing datasets mostly a matter of merging sets of triples. The second best option would be having some kind of derived lexinfo relation triple which can be inferred via equivalent/subclass relations.

Unfortunately, the GWA schema's partOfSpeech relation and PartOfSpeech class are proprietary and not linked to any external definitions. I have used Ontolex as the basis for the new version of DanNet, so my part-of-speech tags are all defined using the lexinfo:partOfSpeech relation rather than wn:partOfSpeech.

How do you suggest we bridge this gap? The way I see it, either version 1.2 of the schema removes this bit and datasets use lexinfo:partOfSpeech directly -OR- a direct equivalency to lexinfo:partOfSpeech is established in the schema -- preferably the first as it simplifies things.

I could also add both wn and lexinfo relations for all LexicalEntry instances in the new DanNet, but that's both confusing and a messy fix IMO. Better to fix the schema than work around its flaws. Having competing standards for this is not a great situation.
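
For concreteness, the alternatives discussed above might look roughly like this in Turtle. This is only a sketch under assumptions: the ex: entries are hypothetical, the wn: and lexinfo: prefix URIs are assumed (LexInfo 3.0 is assumed here), and owl:equivalentProperty is just one possible reading of "a direct equivalency".

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix wn:      <https://globalwordnet.github.io/schemas/wn#> .            # assumed prefix URI
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .           # assumed LexInfo version
@prefix ex:      <http://example.org/> .                                    # hypothetical data

# Option 1: the schema drops its own POS terms and datasets use LexInfo directly.
ex:entryA a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun .

# Option 2: the schema keeps wn:partOfSpeech but ties it to the LexInfo property.
wn:partOfSpeech owl:equivalentProperty lexinfo:partOfSpeech .

# Workaround: the dataset states both triples on every entry.
ex:entryB a ontolex:LexicalEntry ;
    wn:partOfSpeech wn:noun ;
    lexinfo:partOfSpeech lexinfo:noun .
```

The three fragments are alternatives, not parts of one design; they are shown in a single file only for comparison.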

The relevant part of the schema:

:partOfSpeech a owl:ObjectProperty ;
  rdfs:domain ontolex:LexicalEntry ;
  rdfs:range :PartOfSpeech ;
  rdfs:label "part of speech"@en ;
  rdfs:comment "The syntactic class of the entry, e.g., noun, verb"@en .

:PartOfSpeech a owl:Class ;
  rdfs:label "part of speech"@en ;
  rdfs:comment "The syntactic class of the entry, e.g., noun, verb"@en ;
  owl:oneOf (
    :noun :verb :adjective :adverb :adjective_satellite :named_entity 
    :conjunction :adposition :other_pos :unknown_pos ) .

:noun a :PartOfSpeech ;
  rdfs:label "noun"@en .
:verb a :PartOfSpeech ;
  rdfs:label "verb"@en .
:adjective a :PartOfSpeech ;
  rdfs:label "adjective"@en .
:adverb a :PartOfSpeech ;
  rdfs:label "adverb"@en .
:adjective_satellite a :PartOfSpeech ;
  rdfs:label "adjective satellite"@en .
:named_entity a :PartOfSpeech ;
  rdfs:label "named entity"@en .
:conjunction a :PartOfSpeech ;
  rdfs:label "conjunction"@en .
:adposition a :PartOfSpeech ;
  rdfs:label "adposition"@en .
:other_pos a :PartOfSpeech ;
  rdfs:label "other pos"@en .
:unknown_pos a :PartOfSpeech ;
  rdfs:label "unknown pos"@en .

jmccrae · 2021-09-22T10:24:53Z

We chose this because there were some values, e.g., adjective_satellite, that aren't in LexInfo and wouldn't really make sense to add. We also wanted to avoid mixing namespaces (e.g., lexinfo:noun with wn:adjective_satellite).

However, adding some owl:sameAs links to LexInfo would be very useful!

simongray · 2021-09-22T11:37:19Z

Ok, then I will make an attempt at that for 1.2.

simongray · 2021-09-27T14:14:09Z

Having taken a look at it, there doesn't seem to be a way to make it compatible in a satisfactory way. Defining owl:sameAs doesn't map the actual relations, just the instances.

Unfortunately, lexinfo:partOfSpeech has a definite range (the lexinfo:PartOfSpeech class), so I'm unsure whether it's even possible to define wn:partOfSpeech as a sub-property of lexinfo:partOfSpeech while extending its range to encompass both PartOfSpeech classes. It doesn't seem like it will be possible.

jmccrae · 2021-09-27T14:24:59Z

Not sure I get what the issue is here. I would assume that wn:PartOfSpeech ⊑ lexinfo:PartOfSpeech for the class and wn:partOfSpeech ⊑ lexinfo:partOfSpeech for the property?

simongray · 2021-09-27T14:33:31Z

I am not an OWL expert, that is probably the main issue ;-)

What you're saying is true. I guess owl:unionOf could be used to define the 1.2 wn:partOfSpeech range to be both wn:PartOfSpeech and lexinfo:PartOfSpeech? Still, not using lexinfo:partOfSpeech directly does make it harder to integrate directly with other Ontolex datasets.

jmccrae · 2021-09-27T14:46:42Z

If you add a subclass axiom between wn:PartOfSpeech and lexinfo:PartOfSpeech, then there is no need for a unionOf statement: every wn:PartOfSpeech is also a lexinfo:PartOfSpeech, so wn:PartOfSpeech ⊔ lexinfo:PartOfSpeech ≡ lexinfo:PartOfSpeech.

That is, if we add

wn:partOfSpeech rdfs:subPropertyOf lexinfo:partOfSpeech .
wn:PartOfSpeech rdfs:subClassOf lexinfo:PartOfSpeech .
wn:noun owl:sameAs lexinfo:noun . # etc.

Then if we have

X wn:partOfSpeech wn:noun

We then infer

X lexinfo:partOfSpeech lexinfo:noun
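
Spelled out as a self-contained Turtle file, the reasoning above could be sketched as follows. Note that the sub-property and subclass predicates live in the RDFS namespace (rdfs:subPropertyOf, rdfs:subClassOf). The wn: and lexinfo: prefix URIs are assumptions (LexInfo 3.0 is assumed), and ex:entry1 is a hypothetical entry standing in for X.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix wn:      <https://globalwordnet.github.io/schemas/wn#> .            # assumed prefix URI
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .           # assumed LexInfo version
@prefix ex:      <http://example.org/> .                                    # hypothetical data

# Schema-level links from the wn vocabulary to LexInfo.
wn:partOfSpeech rdfs:subPropertyOf lexinfo:partOfSpeech .
wn:PartOfSpeech rdfs:subClassOf    lexinfo:PartOfSpeech .
wn:noun         owl:sameAs         lexinfo:noun .

# Asserted in the dataset.
ex:entry1 wn:partOfSpeech wn:noun .

# Entailed, not asserted (a reasoner would derive these in two steps):
#   ex:entry1 lexinfo:partOfSpeech wn:noun .       # via rdfs:subPropertyOf
#   ex:entry1 lexinfo:partOfSpeech lexinfo:noun .  # via owl:sameAs on the object
```

The subclass axiom is not needed for this particular inference; its role is to keep wn:noun and the other wn values inside the declared range of lexinfo:partOfSpeech.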

simongray · 2021-09-28T08:11:54Z

Ah, that makes a lot of sense. I was stuck thinking that of course the subProperty can't extend the range of its parent property, but yeah... if we define everything as subclasses it will technically not be doing that. Thanks a lot for explaining in this detail.

I do wonder what to do about the owl:oneOf relation from the wn:PartOfSpeech currently defined in the schema. I believe that the set of POS tags in lexinfo is more extensive - or at least not finite - so I wonder if it makes sense to also remove this restriction if, say, you wanted to use a lexinfo:PartOfSpeech as the object of a wn:partOfSpeech relation...? What are your thoughts on this?

...

As for the actual inference, I think I will also have to take another look at the Prolog-like logic rule DSL used in Apache Jena to make sure it applies the same kind of reasoning you just described in practice. The default level of OWL inference was a bit too comprehensive (and therefore slow) on the DanNet data, so to make it snappier I basically removed all statements that didn't infer inverse relationships ;-)

jmccrae · 2021-09-28T11:27:46Z

> I do wonder what to do about the owl:oneOf relation from the wn:PartOfSpeech currently defined in the schema. I believe that the set of POS tags in lexinfo is more extensive - or at least not finite - so I wonder if it makes sense to also remove this restriction if, say, you wanted to use a lexinfo:PartOfSpeech as the object of a wn:partOfSpeech relation...? What are your thoughts on this?

It would not be compatible with this schema to use a part-of-speech value other than the ones specified. We need this to ensure interoperability. (Of course, we are open to proposals for new values.)

simongray · 2021-09-29T07:33:12Z

I'm sorry @jmccrae, but I have a couple of remaining questions about things that are still not quite clear to me...

Can you tell me what the distinction is between e.g. the capitalised and all-lowercase versions of POS tags in lexinfo, e.g. Adjective vs adjective? It is not immediately clear to me.
Since you're involved with writing both this schema and the lexinfo one, how come you don't just add e.g. adjective_satellite to lexinfo and use lexinfo directly? Is it because adjective_satellite is non-standard...?

jmccrae · 2021-09-29T07:45:52Z

> Can you tell me what the distinction is between e.g. the capitalised and all-lowercase versions of POS tags in lexinfo, e.g. Adjective vs adjective? It is not immediately clear to me.

Essentially, capitalised names are for classes and lower-case names are for values. So Adjective is a subclass of LexicalEntry, while adjective is a value of the part-of-speech property. The following equivalence basically holds:

X rdf:type ontolex:LexicalEntry and X lexinfo:partOfSpeech lexinfo:adjective <=> X rdf:type lexinfo:Adjective
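
As an illustration only, that equivalence could be written as an OWL axiom in Turtle along these lines. This is a sketch of the logical shape, not necessarily how LexInfo itself declares the class, and the lexinfo: prefix URI is an assumption (LexInfo 3.0).

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .           # assumed LexInfo version

# An entry is a lexinfo:Adjective exactly when it is a LexicalEntry
# whose lexinfo:partOfSpeech value is lexinfo:adjective.
lexinfo:Adjective owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
        ontolex:LexicalEntry
        [ a owl:Restriction ;
          owl:onProperty lexinfo:partOfSpeech ;
          owl:hasValue   lexinfo:adjective ]
    )
] .
```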

> Since you're involved with writing both this schema and the lexinfo one, how come you don't just add e.g. adjective_satellite to lexinfo and use lexinfo directly? Is it because adjective_satellite is non-standard...?

Exactly

simongray mentioned this issue Sep 28, 2021

Infer ontolex:partOfSpeech from wn:partOfSpeech kuhumcst/DanNet#17

Open

simongray closed this as completed Oct 25, 2022
