![logo](wn-logo-rotate.svg)

# Wn Demonstration

This is a demonstration of the [Wn](https://github.com/goodmami/wn/) library for working with wordnets in Python. To run this notebook locally, you will need [Python 3.6](https://www.python.org/) or higher, the `wn` and `jupyter` packages, and some wordnet data:

* Linux/macOS

  ```console
  $ python3 -m pip install wn jupyter
  $ python3 -m wn download pwn:3.0 omw ewn:2020
  ```
  
* Windows

  ```console
  > py -3 -m pip install wn jupyter
  > py -3 -m wn download pwn:3.0 omw ewn:2020
  ```

Now you should be able to import the `wn` package:

In [1]:
import wn

## Primary Queries

A **primary query** uses basic parameters, such as word forms, parts of speech, or public identifiers (e.g., synset IDs), to retrieve basic wordnet entities from the database. You can perform these searches via module-level functions such as [wn.words()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.words), [wn.senses()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.senses), and [wn.synsets()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.synsets):

In [2]:
wn.words('Malacca')

[Word('ewn-Malacca-n')]

In [3]:
wn.synsets('Malacca')

[Synset('ewn-08985168-n')]

### Filtering by Language / Lexicon

Once you've added multiple wordnets, however, you will often get many results for such queries. If that's not clear, then the following will give you some idea(s):

In [4]:
wn.words('idea')

[Word('pwn-idea-n'),
 Word('finwn-lex50473'),
 Word('iwn-lex23768'),
 Word('slkwn-lex16202'),
 Word('zsmwn-lex71794'),
 Word('glgwn-lex1079'),
 Word('euswn-lex10715'),
 Word('spawn-lex29439'),
 Word('itawn-lex15594'),
 Word('polwn-lex24963'),
 Word('catwn-lex37718'),
 Word('ewn-idea-n')]

You can filter down the results by language, but that may not be enough if you have multiple wordnets for the same language (e.g., the [Princeton WordNet](https://wordnet.princeton.edu/) of English and the [English WordNet](https://en-word.net/)):

In [5]:
wn.words('idea', lang='en')

[Word('pwn-idea-n'), Word('ewn-idea-n')]

The [wn.lexicons()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.lexicons) function can show which lexicons have been added for a language:

In [6]:
wn.lexicons(lang='en')

(<Lexicon pwn:3.0 [en]>, <Lexicon ewn:2020 [en]>)

You can use the `id:version` string to restrict queries to a particular lexicon:

In [7]:
wn.words('idea', lexicon='pwn:3.0')

[Word('pwn-idea-n')]

But it can become tedious to enter these specifiers each time. Instead, a [wn.Wordnet](https://wn.readthedocs.io/en/latest/api/wn.html#the-wordnet-class) object can be used to make the language/lexicon filters persistent:

In [8]:
pwn = wn.Wordnet(lexicon='pwn:3.0')
pwn.words('idea')

[Word('pwn-idea-n')]

### Filtering by Word Form and Part of Speech

Even within a single lexicon a word may return multiple results:

In [9]:
pwn.words('pencil')

[Word('pwn-pencil-n'), Word('pwn-pencil-v')]

You can also restrict results by part of speech. For example, to get the verbal sense of *pencil* (as in *to pencil in an appointment*), use the `pos` filter:

In [10]:
pwn.words('pencil', pos='v')

[Word('pwn-pencil-v')]

This works for getting senses and synsets, too:

In [11]:
pwn.senses('pencil')

[Sense('pwn-pencil-n-hs-03908204'),
 Sense('pwn-pencil-n-hs-03908456'),
 Sense('pwn-pencil-n-hs-13863020'),
 Sense('pwn-pencil-n-hs-14796748'),
 Sense('pwn-pencil-v-hs-01688604')]

In [12]:
pwn.senses('pencil', pos='v')

[Sense('pwn-pencil-v-hs-01688604')]

In [13]:
pwn.synsets('pencil', pos='v')

[Synset('pwn-01688604-v')]

The word form itself is just a filter on the results. If you leave it off, you get all results for a particular part of speech:

In [14]:
len(pwn.words(pos='v'))

11531

Or all results, regardless of the part of speech:

In [15]:
len(pwn.words())

158828
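
The way these filters combine can be pictured with a small, self-contained sketch. It does not use Wn or its database; `ToyWord`, `TOY_WORDS`, and `toy_words()` are hypothetical names made up for illustration, and only the word IDs are borrowed from the outputs above. The point is simply that every argument, including the word form, is an optional filter, and omitting one widens the result:

```python
from typing import NamedTuple


class ToyWord(NamedTuple):
    id: str
    form: str
    pos: str
    lexicon: str


# Hypothetical miniature "database"; not how Wn stores its data.
TOY_WORDS = [
    ToyWord('pwn-pencil-n', 'pencil', 'n', 'pwn:3.0'),
    ToyWord('pwn-pencil-v', 'pencil', 'v', 'pwn:3.0'),
    ToyWord('pwn-idea-n', 'idea', 'n', 'pwn:3.0'),
    ToyWord('ewn-idea-n', 'idea', 'n', 'ewn:2020'),
]


def toy_words(form=None, pos=None, lexicon=None):
    """Return the IDs of words matching every filter that is not None."""
    return [
        w.id for w in TOY_WORDS
        if (form is None or w.form == form)
        and (pos is None or w.pos == pos)
        and (lexicon is None or w.lexicon == lexicon)
    ]


print(toy_words('pencil'))                   # ['pwn-pencil-n', 'pwn-pencil-v']
print(toy_words('pencil', pos='v'))          # ['pwn-pencil-v']
print(toy_words('idea', lexicon='pwn:3.0'))  # ['pwn-idea-n']
print(len(toy_words(pos='n')))               # 3 (no form: every noun)
print(len(toy_words()))                      # 4 (no filters: everything)
```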

## Secondary Queries

**Secondary queries** are used when you want to get additional information from a retrieved entity, such as the forms of a word or the definition of a synset. They are also used for finding links between entities, such as the senses of a word or the relations of a sense or synset.

In [16]:
pencil = pwn.words('pencil', pos='v')[0]
pencil.lemma()

'pencil'

In [17]:
pencil.forms()

['pencil', 'pencilled', 'pencilling']

In [18]:
pencil.pos

'v'

In [19]:
pencil.senses()

[Sense('pwn-pencil-v-hs-01688604')]

In [20]:
pencil.senses()[0].synset()

Synset('pwn-01688604-v')

In [21]:
pencil.synsets()  # shorthand for the above

[Synset('pwn-01688604-v')]

In [22]:
pencil_ss = pencil.synsets()[0]
pencil_ss.definition()

'write, draw, or trace with a pencil; "he penciled a figure"'

In [23]:
pencil_ss.examples()  # depends on the wordnet data; see below

[]

In [24]:
wn.synsets('pencil', pos='v', lexicon='ewn:2020')[0].examples()  # EWN splits examples from definitions

['"he penciled a figure"']

In [25]:
pencil_ss.hypernyms()

[Synset('pwn-01690294-v')]

In [26]:
pencil_ss.hypernyms()[0].lemmas()

['draw']

## Taxonomy Queries

A common use of wordnets is exploring the taxonomic structure via hypernym and hyponym relations, so Wn provides dedicated methods for these operations. For instance, path methods such as [Synset.hypernym_paths()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.hypernym_paths) list the synsets between the starting synset and some other synset or the taxonomic root:

In [27]:
for path in pencil_ss.hypernym_paths():
    for i, ss in enumerate(path):
        print('  ' * i, ss, ss.lemmas())

 Synset('pwn-01690294-v') ['draw']
   Synset('pwn-01686132-v') ['represent', 'interpret']
     Synset('pwn-01619354-v') ['re-create']
       Synset('pwn-01617192-v') ['make', 'create']


Paths do not include the starting synset, so the length of the path (i.e., number of edges) is the length of the list of synsets. The length from a synset to the root is called the *depth*. However, as some synsets have multiple paths to the root, there is not always one single depth. Instead, the [Synset.min_depth()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.min_depth) and [Synset.max_depth()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.max_depth) methods find the lengths of the shortest and longest paths.

In [28]:
dog = pwn.synsets('dog', pos='n')[0]
len(dog.hypernym_paths())  # two paths

2

In [29]:
dog.min_depth(), dog.max_depth()

(8, 13)
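
To see why one synset can have several depths, here is a minimal sketch on a toy taxonomy. It is plain Python rather than Wn's implementation, and `TOY_TAXONOMY`, `toy_hypernym_paths()`, `toy_min_depth()`, and `toy_max_depth()` are hypothetical names invented for this illustration. As described above, paths exclude the starting node, so a path's length equals its number of edges:

```python
# Hypothetical toy taxonomy: each node maps to its direct hypernyms.
# 'dog' has two parents, so it has two paths to the root.
TOY_TAXONOMY = {
    'dog': ['canine', 'domestic_animal'],
    'canine': ['carnivore'],
    'carnivore': ['animal'],
    'domestic_animal': ['animal'],
    'animal': ['entity'],
    'entity': [],
}


def toy_hypernym_paths(node):
    """List every path from node to a root, excluding node itself."""
    paths = []
    for parent in TOY_TAXONOMY[node]:
        parent_paths = toy_hypernym_paths(parent)
        if parent_paths:
            paths.extend([parent] + rest for rest in parent_paths)
        else:
            paths.append([parent])  # parent is a root
    return paths


def toy_min_depth(node):
    """Length of the shortest path to a root (0 for a root)."""
    return min((len(p) for p in toy_hypernym_paths(node)), default=0)


def toy_max_depth(node):
    """Length of the longest path to a root (0 for a root)."""
    return max((len(p) for p in toy_hypernym_paths(node)), default=0)


for path in toy_hypernym_paths('dog'):
    print(path)
# ['canine', 'carnivore', 'animal', 'entity']
# ['domestic_animal', 'animal', 'entity']
print(toy_min_depth('dog'), toy_max_depth('dog'))  # 3 4
```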

It is also possible to find paths between two synsets by their lowest common hypernym (also called *least common subsumer*). Here I compare the verbs *pencil* and *pen*:

In [30]:
pen_ss = pwn.synsets('pen', pos='v')[0]
for path in pen_ss.hypernym_paths():
    for i, ss in enumerate(path):
        print('  ' * i, ss, ss.lemmas())

 Synset('pwn-01697816-v') ['create verbally']
   Synset('pwn-01617192-v') ['make', 'create']


In [31]:
pencil_ss.lowest_common_hypernyms(pen_ss)

[Synset('pwn-01617192-v')]

In [32]:
for ss in pencil_ss.shortest_path(pen_ss):
    print(ss, ss.lemmas())

Synset('pwn-01690294-v') ['draw']
Synset('pwn-01686132-v') ['represent', 'interpret']
Synset('pwn-01619354-v') ['re-create']
Synset('pwn-01617192-v') ['make', 'create']
Synset('pwn-01697816-v') ['create verbally']
Synset('pwn-01698271-v') ['write', 'indite', 'compose', 'pen']
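
The idea behind these two queries can be sketched without Wn. The toy graph below copies the shape of the *pencil* and *pen* paths printed above (one lemma per node), and every name in it (`TOY_VERBS`, `toy_ancestor_distances()`, `toy_lowest_common_hypernyms()`, `toy_path_up()`, `toy_shortest_path()`) is hypothetical. It assumes the two nodes share at least one ancestor, and it is one possible way to compute the result, not necessarily the way Wn does:

```python
from collections import deque

# Hypothetical toy graph: each node maps to its direct hypernyms.
TOY_VERBS = {
    'pencil': ['draw'],
    'draw': ['represent'],
    'represent': ['re-create'],
    're-create': ['make'],
    'pen': ['create verbally'],
    'create verbally': ['make'],
    'make': [],
}


def toy_ancestor_distances(node):
    """Map node and each ancestor to its shortest distance (edges) from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in TOY_VERBS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def toy_lowest_common_hypernyms(a, b):
    """Common ancestors that are not above any other common ancestor."""
    common = set(toy_ancestor_distances(a)) & set(toy_ancestor_distances(b))
    higher = set()
    for c in common:
        higher |= set(toy_ancestor_distances(c)) - {c}
    return sorted(common - higher)


def toy_path_up(node, target):
    """Shortest chain of hypernyms from node up to target, excluding node."""
    previous = {node: None}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current == target:
            break
        for parent in TOY_VERBS[current]:
            if parent not in previous:
                previous[parent] = current
                queue.append(parent)
    path = []
    current = target
    while current != node:
        path.append(current)
        current = previous[current]
    return path[::-1]


def toy_shortest_path(a, b):
    """Path from a to b through their closest common ancestor, excluding a."""
    dist_a = toy_ancestor_distances(a)
    dist_b = toy_ancestor_distances(b)
    common = set(dist_a) & set(dist_b)  # assumed non-empty
    meet = min(common, key=lambda c: (dist_a[c] + dist_b[c], c))
    up = toy_path_up(a, meet)
    if meet == b:
        return up
    down = toy_path_up(b, meet)[::-1][1:]  # from just below meet toward b
    return up + down + [b]


print(toy_lowest_common_hypernyms('pencil', 'pen'))  # ['make']
print(toy_shortest_path('pencil', 'pen'))
# ['draw', 'represent', 're-create', 'make', 'create verbally', 'pen']
```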


## Interlingual Queries

In Wn, each wordnet (lexicon) added to the database is given its own, independent structure. All queries that traverse across wordnets make use of the Interlingual Index (ILI) on synsets.

In [33]:
pencil_ss = pwn.synsets('pencil', pos='n')[0]  # for this we'll use the nominal sense
pencil_ss.definition()

'a thin cylindrical pointed writing implement; a rod of marking substance encased in wood'

To get the corresponding words, senses, or synsets in some other lexicon, use the [Word.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Word.translate), [Sense.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Sense.translate), and [Synset.translate()](https://wn.readthedocs.io/en/latest/api/wn.html#wn.Synset.translate) methods. Of these, the method on the sense is the most natural, as it translates a specific meaning of a specific word, although all translations go through the synsets. As a word may have many senses, translating a word returns a mapping of each sense to its list of translations.

In [34]:
pencil_ss.translate(lang='it')[0].lemmas()

['lapis', 'matita']

In [35]:
pencil_ss.translate(lexicon='jpnwn')[0].lemmas()

['鉛筆', 'ペンシル', '木筆']

In [36]:
pwn.words('pencil', pos='n')[0].translate(lexicon='jpnwn')

{Sense('pwn-pencil-n-hs-03908204'): [Word('jpnwn-lex80679'),
  Word('jpnwn-lex82279'),
  Word('jpnwn-lex82280')],
 Sense('pwn-pencil-n-hs-03908456'): [Word('jpnwn-lex82279')],
 Sense('pwn-pencil-n-hs-13863020'): [],
 Sense('pwn-pencil-n-hs-14796748'): [Word('jpnwn-lex80679')]}
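
The shape of that mapping follows from routing every translation through a synset and its ILI. Here is a rough, Wn-free sketch of the idea: all identifiers (`EN_SENSES`, `SYNSET_ILI`, `JA_SYNSET_WORDS`, `toy_translate_synset()`, `toy_translate_word()`, and the made-up synset and ILI IDs) are hypothetical, and only the Japanese forms are taken from the output above. A sense whose synset has no counterpart in the target lexicon simply gets an empty list:

```python
# Hypothetical source lexicon: each sense of a word belongs to one synset.
EN_SENSES = {
    'pencil': {
        'pencil-sense-1': 'en-ss-1',
        'pencil-sense-2': 'en-ss-2',
    },
}

# Hypothetical ILI assignments shared across lexicons.
SYNSET_ILI = {
    'en-ss-1': 'ili-A',
    'en-ss-2': 'ili-B',
    'ja-ss-1': 'ili-A',
}

# Hypothetical target lexicon: synsets and their member words.
JA_SYNSET_WORDS = {
    'ja-ss-1': ['鉛筆', 'ペンシル'],
}


def toy_translate_synset(synset):
    """Target-lexicon synsets that share the source synset's ILI."""
    ili = SYNSET_ILI.get(synset)
    if ili is None:
        return []
    return [ss for ss in JA_SYNSET_WORDS if SYNSET_ILI.get(ss) == ili]


def toy_translate_word(word):
    """Map each sense of word to the translations of that sense's synset."""
    result = {}
    for sense, synset in EN_SENSES[word].items():
        words = []
        for target in toy_translate_synset(synset):
            words.extend(JA_SYNSET_WORDS[target])
        result[sense] = words
    return result


print(toy_translate_synset('en-ss-1'))  # ['ja-ss-1']
print(toy_translate_word('pencil'))
# {'pencil-sense-1': ['鉛筆', 'ペンシル'], 'pencil-sense-2': []}
```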

Interlingual synsets are also used for traversing relations from another wordnet. For instance, many of the lexicons in the [Open Multilingual Wordnet](https://lr.soh.ntu.edu.sg/omw/omw) were created using the *expand* method where only words were translated on top of Princeton WordNet synsets. All relations (hypernyms, hyponyms, etc.) then depend on those from the PWN. In Wn, a [Wordnet](https://wn.readthedocs.io/en/latest/api/wn.html#the-wordnet-class) object may be instantiated with an `expand` parameter which selects lexicons containing such relations. By default, all lexicons are used (i.e., `expand='*'`), but you can tell Wn to not use any expand lexicons (`expand=''`) or to use a specific lexicon (`expand='pwn:3.0'`). By being specific, you can better control the behaviour of your program, e.g., for experimental reproducibility.

In [37]:
wn.Wordnet(lexicon='jpnwn').synsets('鉛筆')[0].hypernyms()  # by default, any other installed lexicon may be used

[Synset('jpnwn-04608567-n')]

In [38]:
wn.Wordnet(lexicon='jpnwn', expand='').synsets('鉛筆')[0].hypernyms()  # disable interlingual query expansion

[]

In [39]:
wn.Wordnet(lexicon='jpnwn', expand='pwn:3.0').synsets('鉛筆')[0].hypernyms()  # specify the expand set

[Synset('jpnwn-04608567-n')]
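
The three results above can be pictured with a toy model of expansion. This is a sketch of the general idea only, not Wn's implementation: `ILI_OF`, `LEXICON_OF`, `HYPERNYMS`, and `toy_hypernyms()` are hypothetical names, the lexicon IDs `'ja'` and `'en'` are made up, and consulting the expand lexicon only when a synset has no relations of its own is a simplifying assumption. The relation is looked up on the ILI-equivalent synset in the expand lexicon, and the result is mapped back into the original lexicon:

```python
# Hypothetical data: two toy lexicons linked by shared ILI identifiers.
ILI_OF = {
    'ja-pencil': 'i-pencil',
    'ja-implement': 'i-implement',
    'en-pencil': 'i-pencil',
    'en-implement': 'i-implement',
}
LEXICON_OF = {
    'ja-pencil': 'ja',
    'ja-implement': 'ja',
    'en-pencil': 'en',
    'en-implement': 'en',
}
# Only the toy English lexicon defines relations.
HYPERNYMS = {'en-pencil': ['en-implement']}


def toy_hypernyms(synset, expand=()):
    """Hypernyms of synset, borrowing relations from the expand lexicons."""
    own = HYPERNYMS.get(synset, [])
    if own:
        return list(own)
    home = LEXICON_OF[synset]
    results = []
    for other, ili in ILI_OF.items():
        # Find the same concept in one of the expand lexicons.
        if ili == ILI_OF[synset] and LEXICON_OF[other] in expand:
            for hyper in HYPERNYMS.get(other, []):
                # Map the borrowed hypernym back into the home lexicon.
                results.extend(
                    s for s, i in ILI_OF.items()
                    if i == ILI_OF[hyper] and LEXICON_OF[s] == home
                )
    return results


print(toy_hypernyms('ja-pencil'))                  # []
print(toy_hypernyms('ja-pencil', expand=('en',)))  # ['ja-implement']
```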