Add multilingual wordnet #9

Closed
stevenbird opened this issue Oct 28, 2013 · 22 comments
@stevenbird
Member

@francisbond is contributing the Open Multilingual Wordnet to NLTK (http://www.casta-net.jp/~kuribayashi/multi/).

We need to settle on a short name to use: multiwordnet?

@ghost ghost assigned stevenbird Oct 28, 2013
@fcbond
Contributor

fcbond commented Nov 5, 2013

There is an Italian project called 'MultiWordNet', so I would like to avoid just 'multiwordnet'. How about omw?

@stevenbird
Member Author

OK. We're often writing "from nltk.corpus import wordnet as wn", and so wn has gained some currency as an abbreviation for WordNet.

We could have omwn. But in a world where openness is the unmarked case, we could have mwn.

Do either of these appeal or would you still prefer omw?

@fcbond
Contributor

fcbond commented Nov 6, 2013

G'day,


I also like to think of openness as the default, but 'mwn' is still a bit
close to MultiWordNet. I guess omwn is ok, although I have a slight
preference for 'omw'. 'wngrid' is another possibility: this is the name
chosen by the Global WordNet Association, and we are now the current
implementation.

Francis Bond http://www3.ntu.edu.sg/home/fcbond/
Division of Linguistics and Multilingual Studies
Nanyang Technological University

@stevenbird
Member Author

OK, omw it is then, thanks.

@stevenbird
Member Author

The list of languages in the supplied omw corpus is as follows. I think fre is spurious (a copy of fra) and we seem to be missing ind even though it is mentioned in the documentation.

als cmn eng fin fre ita mcr nor por
arb dan fas fra heb jpn msa pol tha

@fcbond would you please advise.
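One quick way to check whether fre really is a copy of fra (a sketch, assuming the corpus is unpacked under ~/nltk_data/corpora/omw as in the directory listing later in this thread):

import filecmp, os

base = os.path.expanduser('~/nltk_data/corpora/omw')
# report which files differ between the two language directories
filecmp.dircmp(os.path.join(base, 'fre'), os.path.join(base, 'fra')).report()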

@stevenbird stevenbird reopened this May 3, 2014
@fcbond
Contributor

fcbond commented May 4, 2014

The current list is as follows:

langs = ("eng", "ind", "zsm", "jpn", "tha",
         "cmn", "qcn",
         "fas", "arb", "heb", "ita", "por",
         "nob", "nno", "dan", "swe",
         "fra", "fin", "ell",
         "glg", "cat", "spa", "eus",
         "als", "pol", "slv")

We use qcn for traditional Chinese (and the slightly differently designed NTU, Taiwan Chinese Wordnet).

We will try to upload a new omw.zip sometime today.

# build a nested table: t[language][display-language] -> language name
from collections import defaultdict as dd

t = dd(lambda: dd(unicode))

t['eng']['eng'] = 'English'
t['eng']['ind'] = 'Inggeris'
t['eng']['zsm'] = 'Inggeris'
t['ind']['eng'] = 'Indonesian'
t['ind']['ind'] = 'Bahasa Indonesia'
t['ind']['zsm'] = 'Bahasa Indonesia'
t['zsm']['eng'] = 'Malaysian'
t['zsm']['ind'] = 'Bahasa Malaysia'
t['zsm']['zsm'] = 'Bahasa Malaysia'
t['msa']['eng'] = 'Malay'

t["swe"]["eng"] = "Swedish";
t["ell"]["eng"] = "Greek";
t["cmn"]["eng"] = "Chinese (simplified)";
t["qcn"]["eng"] = "Chinese (traditional)";
t['eng']['cmn'] = u'英语'
t['cmn']['cmn'] = u'汉语'
t['qcn']['cmn'] = u'漢語'
t['cmn']['qcn'] = u'汉语'
t['qcn']['qcn'] = u'漢語'
t['jpn']['cmn'] = u'日语'
t['jpn']['qcn'] = u'日语'

t['als']['eng'] = 'Albanian'
t['arb']['eng'] = 'Arabic'
t['cat']['eng'] = 'Catalan'
t['dan']['eng'] = 'Danish'
t['eus']['eng'] = 'Basque'
t['fas']['eng'] = 'Farsi'
t['fin']['eng'] = 'Finnish'
t['fra']['eng'] = 'French'
t['glg']['eng'] = 'Galician'
t['heb']['eng'] = 'Hebrew'
t['ita']['eng'] = 'Italian'
t['jpn']['eng'] = 'Japanese'
t['mkd']['eng'] = 'Macedonian'
t['nno']['eng'] = 'Nynorsk'
t['nob']['eng'] = u'Bokmål'
t['pol']['eng'] = 'Polish'
t['por']['eng'] = 'Portuguese'
t['slv']['eng'] = 'Slovene'
t['spa']['eng'] = 'Spanish'
t['tha']['eng'] = 'Thai'
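Once built, the table gives the name of one language as written in another, e.g. (values from the assignments above):

t['jpn']['eng']   # -> 'Japanese'
t['eng']['ind']   # -> 'Inggeris'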

@franquattri

Hi, I've got the same problem that somebody posted on Quora some months ago:
"I can call:
from nltk.corpus import sinica_treebank

but when I call
from nltk.corpus import omw
the result is: cannot import name omw
No module named omw."

I checked the downloader and the omw is installed. I am using Python 2.7.
Other modules work fine.
Any clues? Thanks in advance.

@franquattri

One just needed to read the NLTK cookbook more carefully. You don't need to import an 'omw' module; you can access the data directly by simply importing wordnet (wn). More under: http://www.nltk.org/howto/wordnet.html
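For example, a minimal sketch (the synset and language codes here are just illustrative):

from nltk.corpus import wordnet as wn

wn.langs()                                 # ISO 639-3 codes of the installed OMW languages
wn.synset('dog.n.01').lemma_names('jpn')   # Japanese lemmas for an English synset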

@alvations
Contributor

A user reported missing spanish lemmas from OMW: http://stackoverflow.com/questions/26474731/missing-spanish-wordnet-from-nltk/26494099#26494099

@DarrenCook

@franquattri It would be useful if the howto showed full installation instructions. On Ubuntu 14.04, with the data URL fixed (http://askubuntu.com/a/527408/93794), I have wordnet and omw installed (I see them under ~/nltk_data/corpora), but when I work through http://www.nltk.org/howto/wordnet.html many of the examples fail; in particular wn.langs() fails with "AttributeError: 'WordNetCorpusReader' object has no attribute 'langs'".
Is that manual for a specific version?

@franquattri

Hi Darren, the manual has been updated for NLTK 3.0, but it should work fine with previous NLTK versions too. I'm working on Windows with Python 2.7 and IPython (which I also suggest for Unicode matters). Both attempts work for me:

from nltk.corpus import wordnet as wn
wn.langs()

and

from nltk.corpus import wordnet as wn
sorted(wn.langs())  # as shown at http://www.nltk.org/howto/wordnet.html

Can you be more specific about the examples that fail?

@alvations
Contributor

@DarrenCook, there are discrepancies between the API, the documentation and the nltk_data, but I'm sure the OMW team will fix them and the documentation will follow shortly.

Please note that Catalan seems to be missing from wn.langs() although it's in the MCR.

>>> import nltk
>>> nltk.__version__
'3.0.0'
>>> nltk.download('omw')
[nltk_data] Downloading package omw to /home/alvas/nltk_data...
[nltk_data]   Package omw is already up-to-date!
True

>>> from nltk.corpus import wordnet as wn
>>> wn.langs()
[u'als', u'arb', u'cmn', u'dan', u'eng', u'fas', u'fin', u'fra', u'fre', u'heb', u'ita', u'jpn', u'cat', u'eus', u'glg', u'spa', u'ind', u'zsm', u'nno', u'nob', u'pol', u'por', u'tha']
>>> exit()
alvas@ubi:~$ cd ~/nltk_data/corpora/omw/
alvas@ubi:~/nltk_data/corpora/omw$ ls
als  cmn  eng  fin  fre  ita  mcr  nor  por     tha
arb  dan  fas  fra  heb  jpn  msa  pol  README

alvas@ubi:~/nltk_data/corpora/omw$ cd mcr/
alvas@ubi:~/nltk_data/corpora/omw/mcr$ ls
LICENSE     wn-data-cat.tab  wn-data-glg.tab  wn-data-spa.tab.gz
mcr2tab.py  wn-data-eus.tab  wn-data-spa.tab

@DarrenCook

nltk.__version__
'2.0b9'

Is that too old?

(apt-get install python-nltk tells me "python-nltk is already the newest version.")

Working through the examples, the first one that fails is "print(wn.synset('dog.n.01').definition())", which says "TypeError: 'str' object is not callable". The three commands before that worked fine.

@alvations
Contributor

Using pip install -U nltk would update to 3.0.0. apt-get is still holding the older version.

With regards to accessing synsets from the wordnet API in NLTK, i think the major change would be nltk/nltk@ba8ab7e
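In short, NLTK 2.x exposed synset fields as plain attributes while NLTK 3.0 makes them methods, which is exactly what the TypeError above points at (a sketch of the change, not the full diff):

from nltk.corpus import wordnet as wn

wn.synset('dog.n.01').definition     # NLTK 2.x: a string attribute; calling it raises TypeError
wn.synset('dog.n.01').definition()   # NLTK 3.0: a method call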

Possibly you'll find errors from nltk.download() too if you're using the apt-get version of NLTK; see http://askubuntu.com/questions/527388/python-nltk-on-ubuntu-12-04-lts-nltk-downloadbrown-results-in-html-error-40

See also:
Change Log: https://github.com/nltk/nltk/blob/develop/ChangeLog
API Changes: https://github.com/nltk/nltk/wiki/Porting-your-code-to-NLTK-3.0

@franquattri

@DarrenCook are you sure you have installed NLTK correctly? You can take a look here: http://www.nltk.org/install.html

To find out which NLTK version you have:
import nltk
nltk.__version__

To update NLTK / modules (on Windows): Command Prompt > python -m pip install --upgrade SomePackage

Are you using the WN version that comes with NLTK (WN 3.0) or the newest release (i.e. have you imported it into NLTK)? There might be some issues for that reason as well.

@DarrenCook

Thanks @alvations and Francesca for your help. These two commands got everything working:

sudo apt-get install python-pip
sudo pip install -U nltk

@franquattri I think I may have downloaded the latest wordnet, while having the 2.0b9 of nltk installed, so maybe that was the issue.

@franquattri

Hi,
does anybody know of multilingual framenets (apart from the English FrameNet) that can be searched with NLTK?

@bryant1410

This is already done, isn't it?

@stevenbird
Member Author

Thanks @bryant1410. Yes, this is resolved.

@nicoleljc1227

I downloaded cow from http://globalwordnet.org/wordnets-in-the-world/ to process Chinese. How can I use cow in Python?
For example, after from nltk.corpus import wordnet as wn, how can I use cow?

@fcbond
Contributor

fcbond commented Apr 13, 2017 via email

@tvrbanec

Can we use wn.synsets('dog')[0].lemmas(lang='jpn') with more than one language, i.e. wn.synsets('dog')[0].lemmas(lang='jpn, ita')?
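For reference: lang takes a single ISO 639-3 code, so one way to cover several languages is to loop over them (a sketch):

from nltk.corpus import wordnet as wn

synset = wn.synsets('dog')[0]
for lang in ('jpn', 'ita'):
    # print the lemmas of the same synset in each language
    print(lang, synset.lemmas(lang=lang))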
