Syntactic behaviour should be better modelled #8

Closed
jmccrae opened this issue Nov 6, 2019 · 15 comments · Fixed by #29

@jmccrae
Member

jmccrae commented Nov 6, 2019

@fcbond

Why don't we add syntactic behavior to senses (and possibly synsets), which is where it is in PWN?
It should not be on the lexical entry, ...

@jmccrae
Member Author

jmccrae commented Jul 30, 2020

I would propose the following:

  • The tag <SyntacticBehaviour> can now also appear under the <Lexicon> tag
  • <Sense> and <LexicalEntry> can refer to syntactic behaviours by ID

For example:

<Lexicon>
  <SyntacticBehaviour id="transitive" subcategorizationFrame="Someone %s something"/>
  <LexicalEntry id="ewn-do-v" syntacticBehaviour="transitive">
     ...
    <!-- Ideally we either indicate syntactic behaviour on the entry OR the sense... no need to do both -->
    <Sense id="sense1" syntacticBehaviour="transitive"/>
  </LexicalEntry>
</Lexicon>

@jmccrae jmccrae added this to the v1.1 milestone Jul 30, 2020
@fcbond
Member

fcbond commented Jul 31, 2020

I think <Synset> and <Sense> should have syntacticBehaviour, not <Sense> and <LexicalEntry>.

Otherwise I agree (although can we call it subCat, to make it easier to fit things on our screens?).

@lmorgadodacosta

Correct me if I am wrong, but a single sense should be able to have multiple values for SyntacticBehaviour.
See, for example, here: 'give' in 02199590-v (OMW).
This being the case, wouldn't it be preferable to use nested elements instead of an attribute?

@1313ou

1313ou commented Jul 31, 2020

Use IDREFS (note the S), so that a sense can refer to multiple verb frames.

@jmccrae
Member Author

jmccrae commented Jul 31, 2020

Yes, I was proposing using IDREFS to give multiple links.

We could certainly use subCat as the attribute name... shorter can be better
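
A minimal sketch of how an IDREFS-valued attribute could be read, assuming the attribute ends up being called `subcat` and that frames are declared under `<Lexicon>` as proposed above. The entry and sense IDs, the frame strings, and the helper `frames_for_sense` are hypothetical and only serve the illustration. An IDREFS value is a whitespace-separated list of IDs, so splitting on whitespace recovers the individual references.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: all IDs and frame strings are illustrative only.
DOC = """
<Lexicon>
  <SyntacticBehaviour id="ditransitive" subcategorizationFrame="Somebody %s somebody something"/>
  <SyntacticBehaviour id="transitive" subcategorizationFrame="Somebody %s something"/>
  <LexicalEntry id="ex-give-v">
    <Sense id="ex-give-v-1" subcat="ditransitive transitive"/>
  </LexicalEntry>
</Lexicon>
"""

def frames_for_sense(lexicon, sense_id):
    """Return the frames referenced by a sense's (assumed) subcat attribute."""
    # Build an ID -> frame table from every declared SyntacticBehaviour.
    frames = {
        sb.get("id"): sb.get("subcategorizationFrame")
        for sb in lexicon.iter("SyntacticBehaviour")
    }
    for sense in lexicon.iter("Sense"):
        if sense.get("id") == sense_id:
            # IDREFS is whitespace-separated, so split() yields one ID per frame.
            return [frames[ref] for ref in sense.get("subcat", "").split()]
    raise KeyError(sense_id)

lexicon = ET.fromstring(DOC)
give_frames = frames_for_sense(lexicon, "ex-give-v-1")
```

With this layout a sense with several frames needs no nested elements; the cost is that a reader has to resolve the IDs itself.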

@goodmami
Member

Two things here:

  1. (emphasis added)

    The tag <SyntacticBehaviour> can now also appear under the <Lexicon> tag

    The <LexicalEntry> element can now (in Improve representation of sense subcategorizations #29) take a subcat attribute. Why should we continue to allow <SyntacticBehaviour> elements to be defined in <LexicalEntry> elements? If you're concerned about backward compatibility, can we at least deprecate the old pattern (e.g., document it as such, tools can generate a warning) and then properly remove it in the future?

  2. Actually, what is the purpose of allowing subcat frames on both lexical entries and senses? Is the intuition that a frame on a lexical entry is shared by all its senses? If so, instead of adding this layer of interpretation onto the data, why don't we just be explicit and specify the frames on senses only?
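
As a rough illustration of the "tools can generate a warning" idea in point 1, a validator could flag entries that still nest `<SyntacticBehaviour>`. This is only a sketch: the function name `check_deprecated_placement`, the warning text, and the sample IDs are assumptions, not part of any existing tool.

```python
import warnings
import xml.etree.ElementTree as ET

def check_deprecated_placement(lexicon):
    """Warn for each LexicalEntry that still nests SyntacticBehaviour elements.

    Returns the IDs of the offending entries.
    """
    offenders = []
    for entry in lexicon.iter("LexicalEntry"):
        # find() only looks at direct children, which is the old placement.
        if entry.find("SyntacticBehaviour") is not None:
            offenders.append(entry.get("id"))
            warnings.warn(
                f"<SyntacticBehaviour> under <LexicalEntry id={entry.get('id')!r}> "
                "is deprecated; declare it under <Lexicon> instead",
                DeprecationWarning,
                stacklevel=2,
            )
    return offenders

# Hypothetical document mixing an old-style entry and a clean one.
DOC = """
<Lexicon>
  <LexicalEntry id="ex-old-v">
    <SyntacticBehaviour subcategorizationFrame="Somebody %s something"/>
  </LexicalEntry>
  <LexicalEntry id="ex-new-v"/>
</Lexicon>
"""

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    offenders = check_deprecated_placement(ET.fromstring(DOC))
```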

@fcbond
Member

fcbond commented Aug 28, 2020 via email

@arademaker
Member

Why not allow it only on senses, to avoid extra confusion, even if all senses of a given synset have the same syntactic behavior?

@goodmami
Member

[...] or synsets (where all senses in the synset share the same syntactic behaviour).

Whether on <LexicalEntry> or <Synset>, this kind of interpretation needs to be implemented by the software and isn't explicit in the data.

To be more precise, here's what I (and @arademaker, it seems) am proposing (subcat only on <Sense>) for LMF:

--- a/WN-LMF-1.0.dtd
+++ b/WN-LMF-1.0.dtd
@@ -2,7 +2,7 @@
 <!ELEMENT LexicalResource (Lexicon+)>
 <!ATTLIST LexicalResource
     xmlns:dc CDATA #FIXED "http://purl.org/dc/elements/1.1/">
-<!ELEMENT Lexicon (LexicalEntry+, Synset*)>
+<!ELEMENT Lexicon (LexicalEntry+, Synset*, SyntacticBehaviour*)>
 <!ATTLIST Lexicon
     id ID #REQUIRED
     label CDATA #REQUIRED
@@ -29,7 +29,7 @@
     status CDATA #IMPLIED
     note CDATA #IMPLIED
     confidenceScore CDATA "1.0">
-<!ELEMENT LexicalEntry (Lemma, Form*, Sense*, SyntacticBehaviour*)>
+<!ELEMENT LexicalEntry (Lemma, Form*, Sense*)>
 <!ATTLIST LexicalEntry
     id ID #REQUIRED
     dc:contributor CDATA #IMPLIED
@@ -83,7 +83,8 @@
     note CDATA #IMPLIED
     confidenceScore CDATA #IMPLIED
     lexicalized (true|false) "true"
-    adjposition (a|ip|p) #IMPLIED>
+    adjposition (a|ip|p) #IMPLIED
+    subcat IDREFS #IMPLIED>
 <!ELEMENT Synset (Definition*, ILIDefinition?, SynsetRelation*, Example*)>
 <!ATTLIST Synset
     id ID #REQUIRED
@@ -211,6 +212,7 @@
     confidenceScore CDATA #IMPLIED>
 <!ELEMENT SyntacticBehaviour EMPTY>
 <!ATTLIST SyntacticBehaviour
+  id ID #REQUIRED
   subcategorizationFrame CDATA #REQUIRED
   senses IDREFS #IMPLIED>
 <!ELEMENT Count (#PCDATA)>
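
Note that this diff keeps the existing `senses` attribute on `<SyntacticBehaviour>` while adding `subcat` on `<Sense>`, so a reader may meet links in either direction. The sketch below merges both into one table. It assumes that the two directions are meant to be equivalent, which the thread does not state; the helper name `frame_ids_by_sense` and all IDs are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical document following the content model in the diff:
# LexicalEntry elements first, SyntacticBehaviour elements last.
DOC = """
<Lexicon>
  <LexicalEntry id="ex-eat-v">
    <Sense id="ex-eat-v-1" subcat="intransitive"/>
    <Sense id="ex-eat-v-2"/>
  </LexicalEntry>
  <SyntacticBehaviour id="intransitive" subcategorizationFrame="Somebody %s"/>
  <SyntacticBehaviour id="transitive" subcategorizationFrame="Somebody %s something"
                      senses="ex-eat-v-1 ex-eat-v-2"/>
</Lexicon>
"""

def frame_ids_by_sense(lexicon):
    """Map each sense ID to the sorted frame IDs linked from either side."""
    links = defaultdict(set)
    # Direction 1: Sense -> frames via the new subcat attribute.
    for sense in lexicon.iter("Sense"):
        links[sense.get("id")].update(sense.get("subcat", "").split())
    # Direction 2: frame -> senses via the pre-existing senses attribute.
    # Assumes every SyntacticBehaviour carries an id, as the diff requires.
    for sb in lexicon.iter("SyntacticBehaviour"):
        for sense_id in sb.get("senses", "").split():
            links[sense_id].add(sb.get("id"))
    return {sense_id: sorted(ids) for sense_id, ids in links.items()}

sense_frames = frame_ids_by_sense(ET.fromstring(DOC))
```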

@jmccrae
Member Author

jmccrae commented Sep 1, 2020

There are other models like OntoLex/LMF, which model syntactic behaviour solely on the entry level. However, for the moment, I only know of wordnets that model this on the sense level so we can introduce this modelling in v1.1. If there is a demand for modelling at the entry level too later, we can easily add this.

I have updated the PR.

@jmccrae
Member Author

jmccrae commented Sep 1, 2020

NB. Small note on @goodmami's version. I think to keep backwards compatibility we should still allow <SyntacticBehaviour> to appear under <LexicalEntry>

@goodmami
Member

goodmami commented Sep 1, 2020

I think to keep backwards compatibility we should still allow <SyntacticBehaviour> to appear under <LexicalEntry>

Fair enough. Is this equivalent to putting it under <Lexicon>? That is, it only introduces a syntactic behavior that we can refer to, and doesn't carry any meaning about it being associated with the <LexicalEntry>?

And relatedly, do we have a process for breaking backward compatibility (e.g., "deprecate, then remove after a year")? If we keep everything backward compatible, the format will accumulate a lot of cruft.

@jmccrae
Member Author

jmccrae commented Oct 5, 2020

On backwards compatibility, I think we should go by version numbering: 1.x is fully backwards compatible with 1.y (where x > y), but 2.0 can introduce breaking changes.
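
One way to read this rule, sketched as a check a tool might run. It assumes versions are always written as `major.minor`; the function names `parse_version` and `guaranteed_compatible` are hypothetical.

```python
def parse_version(version):
    """Split a 'major.minor' string into a pair of integers."""
    major, minor = version.split(".")
    return int(major), int(minor)

def guaranteed_compatible(schema_version, doc_version):
    """True if a document written for doc_version is guaranteed to be valid
    under schema_version: same major version, and the schema is not older."""
    schema_major, schema_minor = parse_version(schema_version)
    doc_major, doc_minor = parse_version(doc_version)
    return schema_major == doc_major and schema_minor >= doc_minor

# A 1.0 document stays valid under 1.1; nothing is promised across 2.0.
ok_minor = guaranteed_compatible("1.1", "1.0")
ok_major = guaranteed_compatible("2.0", "1.1")
```

A `False` result only means the guarantee does not apply; a particular 1.x document may still happen to validate against 2.0.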

@fcbond
Member

fcbond commented Oct 6, 2020 via email

@jmccrae
Member Author

jmccrae commented Apr 20, 2021

Closed by #38

@jmccrae jmccrae closed this as completed Apr 20, 2021