
Fix WordNet 3.0 gloss inconsistencies #160

Open
genericallyterrible opened this issue Sep 10, 2021 · 5 comments
@genericallyterrible

@fcbond, @stevenbird There are several consistency issues with the gloss portions of WordNet 3.0 that make parsing difficult. Would it be possible for us to manually fix these issues without breaking word associations, as happened with the problems currently facing the update to WordNet 3.1?

@fcbond
Contributor

fcbond commented Sep 11, 2021 via email

@goodmami

Would it be possible for us to manually fix these issues without breaking word associations [...]

Replying specifically to this: it is incredibly difficult to alter WNDB data without breaking things. First, the synset IDs are byte offsets into the data files, so any modified gloss would have to occupy exactly the same number of bytes as before. Second, we're not allowed to change the Princeton WordNet data and still call it Princeton WordNet (it would have to be called the "NLTK Wordnet of English" or something).
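The byte-offset constraint can be shown with a toy sketch. The records below are synthetic, hand-written lines in the general shape of a WNDB data file, not real WordNet data; the point is only that a record's ID must equal its byte offset, so any length change ripples through every later offset:

```python
# Toy sketch of the WNDB byte-offset constraint (synthetic records,
# not real WordNet data). In WNDB files such as data.noun, a synset's
# ID is its byte offset in the file, so an edit that changes a gloss's
# length invalidates every pointer to all later synsets.

records = [
    b"00000000 03 n 01 entity 0 000 | that which exists\n",
    b"00000050 03 n 01 thing 0 001 @ 00000000 n 0000 | a separate item\n",
]

# The second record's ID claims it lives at byte offset 50.
claimed_offset = int(records[1].split()[0])
actual_offset = len(records[0])
assert claimed_offset == actual_offset  # consistent before editing

# Lengthen the first gloss by a few bytes...
edited = records[0].replace(b"that which exists", b"something that exists")

# ...and the second record no longer sits at its claimed offset, so any
# "@ 00000050"-style pointer elsewhere in the database would now dangle.
print(actual_offset, len(edited))  # 50 vs 54
```

Any fix to a gloss would therefore have to be length-preserving, or else every offset-based ID in the database would need to be rewritten.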

the problems currently facing the update to WordNet 3.1?

That issue was closed 2 years ago, which suggests to me that there are no plans to add WordNet 3.1 to the NLTK. There was an attempt at adding next-generation wordnet support to, or alongside, the NLTK (see https://github.com/nltk/wordnet), and it included WordNet 3.1 data as an option. Development stalled, however, so I took over the effort (and package name on PyPI) with an entirely new module, which Francis has linked above.

@stevenbird
Member

@goodmami, thanks for the update. This sounds like a more sustainable option. How easily could a user of the NLTK wordnet package port their code to use your package? Does it include the similarity metrics?

@fcbond
Contributor

fcbond commented Sep 14, 2021 via email

@goodmami

Thanks, @fcbond!

@stevenbird, Wn has the similarity metrics, information content (it even reads the wordnet_ic files from nltk_data), Morphy, etc. Some absent features that may be desired are looking things up by sense keys (e.g., eat%2:34:02::; workaround) or the NLTK's shorthand synset identifiers (feed.v.06). If you wish to discuss a plan for deprecating the NLTK's wordnet module in favor of Wn, we should open separate issues to track the necessary changes to the code, data, documentation, and book.
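For readers unfamiliar with sense keys: a key such as eat%2:34:02:: is a plain colon-delimited string, so the lookup workaround mostly amounts to decomposing it. A minimal sketch, following the field layout documented in Princeton's senseidx(5WN) man page (the function name here is made up for illustration):

```python
# Sketch: decompose a WordNet sense key. Per senseidx(5WN), the format is
#   lemma%ss_type:lex_filenum:lex_id:head_word:head_id
# where head_word/head_id are non-empty only for adjective satellites.
SS_TYPES = {"1": "n", "2": "v", "3": "a", "4": "r", "5": "s"}

def parse_sense_key(key: str) -> dict:
    lemma, lex_sense = key.split("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return {
        "lemma": lemma,
        "pos": SS_TYPES[ss_type],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }

print(parse_sense_key("eat%2:34:02::"))
# {'lemma': 'eat', 'pos': 'v', 'lex_filenum': 34, 'lex_id': 2,
#  'head_word': None, 'head_id': None}
```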

Back to the current issue: in the modern WN-LMF format for wordnets, Definition and Example elements are structurally separate, having been split from WNDB's combined "gloss" line in the format-conversion process. That process, however, may not account for the inconsistencies noted by @genericallyterrible, who did a thorough analysis in nltk/nltk#2527. So as not to let that effort go to waste, it might be good to compare it with the WNDB-to-LMF converter. The relevant code is here.
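To make the structural split concrete, here is a minimal sketch of reading Definition and Example from a WN-LMF-style synset. The XML fragment is hand-written in the style of WN-LMF for illustration, not real exported data, and the synset ID and texts are invented:

```python
# Sketch: in WN-LMF, Definition and Example are separate elements,
# unlike WNDB's single combined gloss line. The fragment below is a
# hand-written illustration, not real exported wordnet data.
import xml.etree.ElementTree as ET

lmf_fragment = """
<Synset id="ewn-00000000-n">
  <Definition>a device that conveys people or goods up and down</Definition>
  <Example>"the elevator was out of order"</Example>
  <Example>"she took the lift to the third floor"</Example>
</Synset>
"""

synset = ET.fromstring(lmf_fragment)
definition = synset.findtext("Definition")
examples = [e.text for e in synset.findall("Example")]
print(definition)
print(examples)
```

A converter from WNDB has to decide where the gloss's definition ends and its quoted examples begin, which is exactly where the inconsistencies analyzed in nltk/nltk#2527 matter.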


4 participants