# Wordnet

The Estonian WordNet API provides a way to query the Estonian WordNet (EstWN). A WordNet is a network of synsets: each synset is a set of synonymous words, and synsets are connected to other synsets via relations.

To use the Estonian WordNet, you need the Wordnet database files. There are two ways to get them:

* If you create a new instance of `Wordnet` and the database files are missing, you will be prompted for permission to download them;
* Alternatively, you can download the database files manually beforehand via the `download` function:

```python
from estnltk import download
download('wordnet')
```

First, let's import the module and create a WordNet object:

In [1]:
from estnltk.wordnet import Wordnet

In [2]:
wn = Wordnet()

## Synsets

The most common use for the API is to query synsets. Synsets can be queried in several ways. 

The Wordnet class is iterable, so to get all synset objects, we can iterate over it.

In [3]:
for i, synset in enumerate(wn):
    if i == 25:
        break
    print(synset)

Synset('korda seadma.v.03')
Synset('korraldamine.n.03')
Synset('küsima.v.02')
Synset('küsimine.n.02')
Synset('kallutama.v.01')
Synset('laskma.v.04')
Synset('laskmine.n.03')
Synset('nõus olema.v.01')
Synset('informeerima.v.01')
Synset('informeerimine.n.02')
Synset('seletama.v.02')
Synset('seletamine.n.02')
Synset('väljendama.v.03')
Synset('väljendamine.n.04')
Synset('kõnelema.v.03')
Synset('avaldama.v.04')
Synset('avaldamine.n.02')
Synset('mõtlema.v.02')
Synset('hoolduspere.n.01')
Synset('häälitsema.v.01')
Synset('valimistulemus.n.01')
Synset('kirjutama.v.02')
Synset('kirjutamine.n.02')
Synset('fikseerima.v.02')
Synset('registreerimine.n.02')
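
Since only iteration is needed here, the same kind of preview can also be written with `itertools.islice` from the standard library. The sketch below uses a stand-in generator (`fake_synsets`, a hypothetical name) so that it runs without the database files; with EstNLTK installed, one would presumably pass `wn` to `preview` instead.

```python
from itertools import islice

def fake_synsets():
    """Hypothetical stand-in for iterating over a Wordnet object."""
    n = 0
    while True:
        n += 1
        yield f"Synset('example.n.{n:02d}')"

def preview(iterable, limit=25):
    """Return at most `limit` items from the start of any iterable."""
    return list(islice(iterable, limit))

first_items = preview(fake_synsets(), limit=3)
for item in first_items:
    print(item)
```

`islice` stops pulling items as soon as the limit is reached, so the loop no longer needs a counter and a `break`.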


We can specify a part of speech (pos) to get all synsets whose words belong to that word class:

In [4]:
verbs = wn.synsets_with_pos('v')
for i, synset in enumerate(verbs):
    if i == 25:
        break
    print(synset)

Synset('aadeldama.v.01')
Synset('aaderdama.v.01')
Synset('aasama.v.01')
Synset('aasima.v.01')
Synset('aasima.v.02')
Synset('aatlema.v.01')
Synset('abandoonima.v.01')
Synset('abhorreerima.v.01')
Synset('abielluma.v.01')
Synset('abielu sõlmima.v.01')
Synset('abiks olema.v.01')
Synset('abistama.v.01')
Synset('ablakteerima.v.01')
Synset('ablakteerima.v.02')
Synset('ablastuma.v.01')
Synset('aboneerima.v.01')
Synset('abordeerima.v.01')
Synset('aborteerima.v.01')
Synset('aborteeruma.v.01')
Synset('abortima.v.02')
Synset('abortima.v.03')
Synset('absolutiseerima.v.01')
Synset('absolveerima.v.01')
Synset('absorbeerima.v.01')
Synset('absorbeeruma.v.01')


This returns all synsets whose part of speech is “verb”. We can also query synsets by providing a lemma:

In [5]:
wn['laulma']

["Synset('laulma.v.01')", "Synset('laulma.v.02')"]

or provide both a lemma and pos:

In [6]:
wn['laulma', 'v']

["Synset('laulma.v.01')", "Synset('laulma.v.02')"]

In [7]:
wn[('laulma', 'v')]

["Synset('laulma.v.01')", "Synset('laulma.v.02')"]

The previous options return a list of synsets. It is also possible to pick a single synset by its position in that list. For example, if you only want the second synset with the lemma 'laulma', pass the position after the lemma (positions are counted from 1, and a single synset object is returned instead of a list):

In [8]:
wn['laulma', 2]

"Synset('laulma.v.02')"

It's also possible to retrieve a synset's details, like name and pos:

In [9]:
synset = wn['laulma'][0]
print(synset.name)
print(synset.pos)

laulma.v.01
v
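
Judging from the outputs shown so far, synset names appear to follow the pattern `lemma.pos.sense_number`. Assuming that pattern holds (it is an inference from the examples, not a documented guarantee), a name can be taken apart with plain string handling. `parse_synset_name` is a hypothetical helper, not part of the API.

```python
def parse_synset_name(name):
    """Split 'lemma.pos.sense' from the right, so multi-word lemmas stay intact."""
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)

print(parse_synset_name("laulma.v.01"))
print(parse_synset_name("korda seadma.v.03"))
```

Splitting from the right means that a lemma containing spaces, or even a dot, does not confuse the parser.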


We can also query the definition and examples:

In [10]:
synset.definition

'hääleelundite abil teat. rütmi ja kõrgusega helide jada kuuldavale tooma (EKSS)'

In [11]:
synset.examples

['Vaikselt, tasa, kõvasti, ilusasti, kõlavalt, kähinal, heleda häälega laulma (EKSS).']

And we can get all lemmas of the synset:

In [12]:
print( wn['laulja'][0] )
print( wn['laulja'][0].lemmas )

Synset('laulja.n.01')
['laulja', 'vokalist']


Finally, you can also retrieve a Synset object by its name:

In [13]:
wn.get_synset_by_name('laulja.n.01')

"Synset('laulja.n.01')"

Note: if you need to retrieve a large number of `Synset` objects by name, iterating over the `Wordnet` object once and picking out the synsets you need may be more efficient than calling `get_synset_by_name` repeatedly.
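
A single-pass lookup of this kind could look like the sketch below. It only assumes that the iterable yields objects with a `name` attribute, as shown earlier for synsets. `collect_by_name` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without the database files.

```python
from types import SimpleNamespace

def collect_by_name(synsets, wanted_names):
    """Scan `synsets` once and return {name: synset} for the requested names."""
    remaining = set(wanted_names)
    found = {}
    for synset in synsets:
        if synset.name in remaining:
            found[synset.name] = synset
            remaining.discard(synset.name)
            if not remaining:  # stop early once every name has been found
                break
    return found

# Stand-in objects; with EstNLTK one would presumably pass `wn` itself.
toy = [SimpleNamespace(name=n) for n in ("laulja.n.01", "laulma.v.01", "laulma.v.02")]
result = collect_by_name(toy, ["laulma.v.02", "laulja.n.01", "puudub.n.01"])
print(sorted(result))
```

Names that never occur are simply absent from the returned dictionary, so the caller can check for missing entries afterwards.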

## Relations

We can also query related synsets. The most common relations have dedicated attributes on the synset:

In [14]:
synset.hypernyms

["Synset('häälitsema.v.01')"]

In [15]:
synset.hyponyms

["Synset('aiduraidutama.v.01')",
 "Synset('lillutama.v.02')",
 "Synset('joodeldama.v.01')",
 "Synset('leelotama.v.01')",
 "Synset('kõõrutama.v.02')",
 "Synset('kaasitama.v.01')",
 "Synset('joiguma.v.01')",
 "Synset('helletama.v.01')",
 "Synset('üles laulma.v.01')",
 "Synset('ümisema.v.02')",
 "Synset('tremoleerima.v.01')"]

In [16]:
synset.holonyms

[]

In [17]:
synset.meronyms

[]

In [18]:
synset.member_holonyms

[]

Any other relation type can be queried by name with the generic method `get_related_synset`:

In [19]:
synset.get_related_synset("involved_agent")

["Synset('laulja.n.01')"]

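We can also collect all synsets that are reachable from a synset by following a specified relation repeatedly. With `"hyponym"`, this yields the direct and indirect hyponyms of the synset: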

In [20]:
wn["jalats"][0].closure("hyponym")

["Synset('juust.n.02')",
 "Synset('koturn.n.01')",
 "Synset('itšiig.n.01')",
 "Synset('spordijalats.n.01')",
 "Synset('võimlemissuss.n.01')",
 "Synset('rattaking.n.01')",
 "Synset('golfiking.n.01')",
 "Synset('jooksuking.n.01')",
 "Synset('tennis.n.01')",
 "Synset('botas.n.01')",
 "Synset('kets.n.01')",
 "Synset('jalgpallijalats.n.01')",
 "Synset('uisusaabas.n.01')",
 "Synset('suusasaabas.n.01')",
 "Synset('mäesuusasaabas.n.01')",
 "Synset('balletisuss.n.01')",
 "Synset('kaloss.n.01')",
 "Synset('soome suss.n.01')",
 "Synset('papu.n.01')",
 "Synset('vahetusjalats.n.01')",
 "Synset('plätu.n.01')",
 "Synset('saabas.n.01')",
 "Synset('matkasaabas.n.01')",
 "Synset('sukksaabas.n.01')",
 "Synset('mägisaabas.n.01')",
 "Synset('vildik.n.01')",
 "Synset('kirsa.n.01')",
 "Synset('kroomik.n.01')",
 "Synset('kalavinsk.n.01')",
 "Synset('kummik.n.01')",
 "Synset('kalamehesaabas.n.01')",
 "Synset('suss.n.02')",
 "Synset('ratsasaabas.n.01')",
 "Synset('kamass.n.01')",
 "Synset('unta.n.01')",
 ...]
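
The idea behind `closure` can be illustrated on a toy graph. This is a minimal sketch, not EstNLTK's implementation: the `HYPONYMS` dictionary and its edges are invented for illustration (only the synset names are borrowed from the output above), and the traversal order of the real method may differ.

```python
from collections import deque

# Hypothetical toy graph: synset name -> direct hyponyms (edges are made up).
HYPONYMS = {
    "jalats.n.01": ["saabas.n.01", "spordijalats.n.01"],
    "saabas.n.01": ["matkasaabas.n.01", "kummik.n.01"],
    "spordijalats.n.01": ["jooksuking.n.01"],
}

def closure(start, relation):
    """Collect every node reachable from `start` by following `relation` repeatedly."""
    seen = {start}  # guards against cycles leading back to the start
    order = []
    queue = deque(relation.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(relation.get(node, []))
    return order

descendants = closure("jalats.n.01", HYPONYMS)
print(descendants)
```

The `seen` set ensures that each synset is reported once even if it is reachable along several paths.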

Finally, you can use the method `all_relation_types` to list all relation types available in this Wordnet:

In [21]:
wn.all_relation_types()

['hypernym',
 'similar',
 'domain_topic',
 'hyponym',
 'is_caused_by',
 'role',
 'state_of',
 'also',
 'holo_part',
 'location',
 'mero_location',
 'mero_part',
 'causes',
 'target_direction',
 'involved_location',
 'involved_agent',
 'instrument',
 'mero_substance',
 'involved',
 'holonym',
 'meronym',
 'be_in_state',
 'agent',
 'involved_instrument',
 'has_domain_topic',
 'mero_member',
 'mero_portion',
 'holo_member',
 'holo_substance',
 'antonym',
 'subevent',
 'holo_location',
 'patient',
 'involved_patient',
 'holo_portion',
 'is_subevent_of',
 'other',
 'involved_source_direction',
 'involved_target_direction',
 'in_manner',
 'has_domain_region',
 'involved_result',
 'instance_hyponym',
 'domain_region',
 'classified_by']

## Similarities

We can measure distance or similarity between two synsets in several ways. For calculating similarity, we provide path, Leacock-Chodorow and Wu-Palmer similarities:

In [22]:
synset = wn['aprill'][0]
target_synset = wn['mai'][0]

In [23]:
wn.path_similarity(synset, target_synset)

0.3333333333333333

In [24]:
wn.lch_similarity(synset, target_synset)

2.159484249353372

In [25]:
wn.wup_similarity(synset, target_synset)

0.8333333333333334
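
For intuition, the three measures can be sketched with their textbook definitions on a toy taxonomy. Everything here is an assumption made for illustration: the node names, the `HYPERNYM` dictionary, and all helper functions are hypothetical, every node has exactly one hypernym, and depth is counted in nodes (the root has depth 1). EstNLTK's own implementation may differ in such details.

```python
import math

# Hypothetical toy taxonomy: node -> its single hypernym (None marks the root).
HYPERNYM = {
    "entity": None,
    "time_period": "entity",
    "month": "time_period",
    "april": "month",
    "may": "month",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = HYPERNYM[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def common_hypernym(a, b):
    """Deepest node that lies on both paths to the root."""
    ancestors_of_a = set(path_to_root(a))
    return next(n for n in path_to_root(b) if n in ancestors_of_a)

def shortest_path_length(a, b):
    """Number of edges between a and b via their common hypernym."""
    lcs = common_hypernym(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))

def path_similarity(a, b):
    return 1 / (shortest_path_length(a, b) + 1)

def lch_similarity(a, b, max_depth):
    return -math.log((shortest_path_length(a, b) + 1) / (2 * max_depth))

def wup_similarity(a, b):
    lcs = common_hypernym(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))

TAXONOMY_DEPTH = max(depth(n) for n in HYPERNYM)  # 4 in this toy taxonomy
print(path_similarity("april", "may"))
print(lch_similarity("april", "may", TAXONOMY_DEPTH))
print(wup_similarity("april", "may"))
```

Under these definitions, two siblings are two edges apart, which gives a path similarity of 1/3.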

We can also find the lowest (most specific) common ancestors of two synsets along hypernym links:

In [26]:
wn.lowest_common_hypernyms(synset, target_synset)

["Synset('kalendrikuu.n.01')"]
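
One plausible reason for returning a list is that a synset may have more than one hypernym, so several equally deep common ancestors can exist. The sketch below shows this on a hypothetical toy graph; the `HYPERNYMS` dictionary and the helper functions are made up for illustration and do not reproduce EstNLTK's implementation.

```python
# Hypothetical toy graph: node -> list of direct hypernyms (a node may have several).
HYPERNYMS = {
    "thing": [],
    "animal": ["thing"],
    "pet": ["thing"],
    "dog": ["animal", "pet"],
    "cat": ["animal", "pet"],
}

def ancestors_with_self(node):
    """Return the node together with everything reachable via hypernym links."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in HYPERNYMS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def max_depth(node):
    """Length in nodes of the longest hypernym chain from `node` up to a root."""
    return 1 + max((max_depth(p) for p in HYPERNYMS[node]), default=0)

def lowest_common_hypernyms(a, b):
    """Return the deepest nodes shared by the ancestor sets of a and b."""
    common = ancestors_with_self(a) & ancestors_with_self(b)
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return sorted(n for n in common if max_depth(n) == deepest)

print(lowest_common_hypernyms("dog", "cat"))
```

Here both `animal` and `pet` are returned, because they are equally deep and shared by both nodes.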

## EstWN versions

By default, the `Wordnet` object uses the latest EstWN version that is available in `estnltk_resources`.

In [27]:
wn.version

'2.6.0'

Use `ResourceView` to browse which versions are available:

In [28]:
from estnltk import ResourceView
ResourceView('wordnet')

| name | description | license | downloaded |
|---|---|---|---|
| estwordnet_2023-07-20 | Database files for Estonian Wordnet API. This resource is based on Estonian Wordnet version 2.6.0. The original source for creating the database files: https://github.com/estnltk/estnltk-model-training/blob/main/wordnet/data_extraction.ipynb (size: 26M) | CC BY-SA 4.0 | True |
| estwordnet_2020-06-30 | Database files of Estonian Wordnet version 2.3.2. This resource was created by Birgit Sõrmus via XML wordnet conversion utilities from: https://github.com/estnltk/estnltk/blob/version_1.6/estnltk/wordnet/data_import/data_extract.ipynb (size: 22M) | CC BY-SA 4.0 | False |


To use a different version of EstWN, first get the resource with the `download` function, then initialize the `Wordnet` object with the `version` parameter:

```python
from estnltk import download
from estnltk.wordnet import Wordnet

# Download the resource that contains EstWN version 2.3.2
download('estwordnet_2020-06-30')

# Initialize a Wordnet object that uses EstWN version 2.3.2
wn_old = Wordnet(version='2.3.2')
```