# Example of AI Usage

Searching Google for "How to look up a word in an English dictionary using Python" returns a Gemini overview that suggests three approaches: (1) PyDictionary, (2) WordNet, and (3) a custom dictionary. PyDictionary does not work reliably, so it is skipped here.
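
For comparison with the WordNet approach used in the rest of this notebook, the third option (a custom dictionary) can be sketched with a plain Python `dict`. This is a minimal illustration only: the `CUSTOM_DICTIONARY` entries and the `lookup` helper are made up for this example and do not come from any library.

```python
# A tiny hand-built dictionary; the entries are illustrative only.
CUSTOM_DICTIONARY = {
    "spy": "a secret agent who gathers information about an enemy or competitor",
    "language": "a system of words used by a community to communicate",
}


def lookup(word, dictionary=CUSTOM_DICTIONARY):
    """Return the definition of `word`, or None if it is not in the dictionary."""
    # Normalize case and surrounding whitespace so "Spy " matches "spy".
    return dictionary.get(word.strip().lower())


print(lookup("Spy"))
print(lookup("7788"))
```

A hand-built dictionary like this only knows the words you put into it, which is why the cells below rely on WordNet instead.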

# [Using WordNet](https://wordnet.princeton.edu)

In [1]:
%%bash
# `conda activate cs5293-1` fails inside this %%bash cell (conda: command not found),
# so activate the environment before launching Jupyter.
pip install nltk

Collecting nltk
  Downloading nltk-3.9.1-py3-none-any.whl.metadata (2.9 kB)
Collecting click (from nltk)
  Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting joblib (from nltk)
  Downloading joblib-1.4.2-py3-none-any.whl.metadata (5.4 kB)
Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 20.9 MB/s eta 0:00:00
Downloading click-8.1.8-py3-none-any.whl (98 kB)
Downloading joblib-1.4.2-py3-none-any.whl (301 kB)
Installing collected packages: joblib, click, nltk
Successfully installed click-8.1.8 joblib-1.4.2 nltk-3.9.1


In [None]:
# Download the WordNet corpus with NLTK

import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /Users/jcao/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [9]:
from nltk.corpus import wordnet as wn
# The WordNet corpus reader gives access to the Open Multilingual WordNet, 
# using ISO-639 language codes. 
# These languages are not loaded by default, but only lazily, when needed.
wn.langs()

['eng']

In [17]:
# download the Open Multilingual WordNet
nltk.download('omw-1.4')
wn.synset('spy.n.01').lemma_names('jpn')


[nltk_data] Downloading package omw-1.4 to /Users/jcao/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


['いぬ',
 'まわし者',
 'スパイ',
 '回し者',
 '回者',
 '密偵',
 '工作員',
 '廻し者',
 '廻者',
 '探',
 '探り',
 '犬',
 '秘密捜査員',
 '諜報員',
 '諜者',
 '間者',
 '間諜',
 '隠密']

In [18]:
# Now that the Open Multilingual WordNet is downloaded, words in other languages are accessible
sorted(wn.langs())

['als',
 'arb',
 'bul',
 'cat',
 'cmn',
 'dan',
 'ell',
 'eng',
 'eus',
 'fin',
 'fra',
 'glg',
 'heb',
 'hrv',
 'ind',
 'isl',
 'ita',
 'ita_iwn',
 'jpn',
 'lit',
 'nld',
 'nno',
 'nob',
 'pol',
 'por',
 'ron',
 'slk',
 'slv',
 'spa',
 'swe',
 'tha',
 'zsm']
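
To compare one sense across several of these languages at once, a small helper along the following lines may be useful. It is a sketch, not part of NLTK: `lemma_names_by_language` is a name invented here, and it relies only on the `lemma_names(lang)` method already used above.

```python
def lemma_names_by_language(synset, languages):
    """Map each language code to the synset's lemma names in that language."""
    names = {}
    for lang in languages:
        found = synset.lemma_names(lang)
        # Skip languages that have no lemma recorded for this synset.
        if found:
            names[lang] = found
    return names
```

With the corpora above downloaded, `lemma_names_by_language(wn.synset('spy.n.01'), ['fra', 'spa', 'jpn'])` should return one list of lemma names per language; the codes passed in are assumed to come from `wn.langs()`.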

In [6]:
print("Number of words (lemmas) in English WordNet:", len(list(wn.words())))

Number of words (lemmas) in English WordNet: 147306


In [31]:
# test_word = "7788"  # a word that is not in WordNet
test_word = "spy"  # a word that is in WordNet
eg_synsets = wn.synsets(test_word)
print(f"Number of senses for the word {test_word}: {len(eg_synsets)}")
# An empty list of synsets means the word is not in WordNet
if len(eg_synsets) > 0:
    print(f"The test word {test_word} exists, and the first sense: {eg_synsets[0].definition()}")

Number of senses for the word spy: 6
The test word spy exists, and the first sense: (military) a secret agent hired by a state to obtain information about its enemies or by a business to obtain industrial secrets from competitors
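
The existence check above can be wrapped in a reusable function. The sketch below is one possible shape, not an NLTK API: `first_definition` and its `synsets_fn` parameter are hypothetical names, and the parameter exists only so the helper can be tried with a stand-in lookup when the corpus is not installed.

```python
def first_definition(word, synsets_fn=None):
    """Return the definition of the first WordNet sense of `word`, or None."""
    if synsets_fn is None:
        # Lazy import: NLTK and the wordnet corpus are only needed
        # when no custom lookup function is supplied.
        from nltk.corpus import wordnet as wn
        synsets_fn = wn.synsets
    senses = synsets_fn(word)
    # An empty list means the word has no entry.
    return senses[0].definition() if senses else None
```

Calling `first_definition("spy")` should give the same definition printed above, while a word with no synsets (the commented-out `"7788"` is the example in the cell above) yields `None`.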


In [32]:
# First sense (the most common usage of "spy")
eg_sense_1 = eg_synsets[0]
print(eg_sense_1)

Synset('spy.n.01')


In [33]:
# Let's see what is in this sense

# lemma
print('Lemma:', eg_sense_1.lemmas()[0].name())

# POS
print('POS:', eg_sense_1.pos())

# Definition
print("Definition:", eg_sense_1.definition())

# Example Usage
print("Example Usage:", '; '.join(eg_sense_1.examples()))

Lemma: spy
POS: n
Definition: (military) a secret agent hired by a state to obtain information about its enemies or by a business to obtain industrial secrets from competitors
Example Usage: 
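
The "Example Usage" line is blank because `examples()` returned an empty list for this synset. When listing every sense of a word, it can help to handle that case explicitly. The following is a minimal sketch; `format_sense` is a hypothetical helper that uses only the synset methods `name()`, `pos()`, `definition()`, and `examples()`.

```python
def format_sense(synset):
    """Render one WordNet synset as a single readable line."""
    line = f"{synset.name()} ({synset.pos()}): {synset.definition()}"
    examples = synset.examples()
    # Some synsets have no example sentences; only append them when present.
    if examples:
        line += " | e.g. " + "; ".join(examples)
    return line
```

Looping over `wn.synsets("spy")` and printing `format_sense(sense)` for each item would then show all six senses, one per line.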


In [13]:
# Which languages are usable through wn?

print("Languages available in WN:", ', '.join(wn.langs()))

Languages available in WN: eng


More languages? The `pyiwn` package provides access to IndoWordNet, which covers Hindi and other Indian languages.

In [14]:
%%bash
pip install pyiwn

Collecting pyiwn
  Downloading pyiwn-0.0.5-py3-none-any.whl.metadata (778 bytes)
Downloading pyiwn-0.0.5-py3-none-any.whl (12 kB)
Installing collected packages: pyiwn
Successfully installed pyiwn-0.0.5


## [Hindi WordNet](https://github.com/cfiltnlp/pyiwn)

In [15]:
# Hindi Wordnet: https://github.com/cfiltnlp/pyiwn

import pyiwn

wn_h = pyiwn.IndoWordNet()

2025-01-20:10:35:28,555 INFO     [helpers.py:20] Downloading IndoWordNet data of size ~31 MB...


[██████████████████████████████████████████████████]

2025-01-20:10:35:30,981 INFO     [helpers.py:43] Extracting /Users/jcao/iwn_data.tar.gz into /Users/jcao...





2025-01-20:10:35:31,459 INFO     [helpers.py:48] Removing temporary zip file from /Users/jcao/iwn_data.tar.gz
2025-01-20:10:35:31,461 INFO     [helpers.py:51] IndoWordNet data successfully downloaded at /Users/jcao/iwn_data
2025-01-20:10:35:44,719 INFO     [iwn.py:43] Loading hindi language synsets...


In [16]:
print('Number of words (lemmas) in Hindi WordNet:', len(wn_h.all_words()))

Number of words (lemmas) in Hindi WordNet: 105458


In [17]:
# Synsets for the Hindi word "भाषा" (bhaasha, meaning "language")

bhaasha_synsets = wn_h.synsets('भाषा')

In [18]:
bhaasha_synsets

[Synset('वचन.noun.2934'),
 Synset('सरस्वती.noun.3499'),
 Synset('भाषा.noun.5489'),
 Synset('हिंदी.noun.10893'),
 Synset('अभियोग-पत्र.noun.30944'),
 Synset('भाषा.noun.40836'),
 Synset('भाषा.noun.40837'),
 Synset('भाषा.noun.40838'),
 Synset('भाषा.noun.40839')]
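
To see what distinguishes these nine synsets, their glosses can be listed side by side. The sketch below assumes that pyiwn synsets expose `head_word()`, `pos()`, and `gloss()` methods, as described in the pyiwn README; treat those method names as an assumption to verify against the installed version. `glosses` itself is a hypothetical helper.

```python
def glosses(synsets):
    """Return a (head word, part of speech, gloss) tuple for each synset."""
    # Assumes each synset provides head_word(), pos() and gloss().
    return [(s.head_word(), s.pos(), s.gloss()) for s in synsets]
```

For example, `for head, pos, gloss in glosses(bhaasha_synsets): print(head, pos, gloss)` would print one line per sense of "भाषा".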