**Heteronyms** are words that have the same spelling but different meanings, signalled by different pronunciations.


- Recall the word *lead* from the lectures. It can refer to the metal lead or to the act of leading. The two pronunciations have different meanings.

- For machine translation and text-to-speech systems, the ability to identify the correct sense of the word is crucial.




Let us have a look at this example:

https://translate.google.com/?sl=en&tl=hi&text=She%20wished%20she%20could%20desert%20him%20in%20the%20desert.%0A&op=translate

Example taken from: http://www-personal.umich.edu/~cellis/heteronym.html


In [2]:
# Install and import the spaCy library
!pip install spacy
import spacy

Collecting spacy
  Downloading spacy-3.5.3-cp39-cp39-macosx_10_9_x86_64.whl (6.9 MB)
Collecting preshed<3.1.0,>=3.0.2
  Downloading preshed-3.0.8-cp39-cp39-macosx_10_9_x86_64.whl (107 kB)
Collecting pydantic!=1.8,!=1.8.1,<1.11.0,>=1.7.4
  Downloading pydantic-1.10.9-cp39-cp39-macosx_10_9_x86_64.whl (2.9 MB)
Collecting spacy-legacy<3.1.0,>=3.0.11
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl (29 kB)
Collecting pathy>=0.10.0
  Downloading pathy-0.10.1-py3-none-any.whl (48 kB)
Collecting thinc<8.2.0,>=8.1.8
  Downloading thinc-8.1.10-cp39-cp39-macosx_10_9_x86_64.whl (867 kB)
Collecting spacy-loggers<2.0.0,>=1.0



In [4]:
# Download the small pre-trained English pipeline, then load it
# for basic NLP tasks such as PoS tagging, parsing, etc.
!spacy download en_core_web_sm
model = spacy.load("en_core_web_sm")

Collecting en-core-web-sm==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl (12.8 MB)
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.5.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [14]:
# Use the model to process the input sentence
tokens = model("The speakers are pathetic")

In [15]:
# Print each token with its coarse-grained (pos_) and fine-grained (tag_) PoS tag.
for token in tokens:
    print(token.text, "--", token.pos_, "--", token.tag_)

The -- DET -- DT
speakers -- NOUN -- NNS
are -- AUX -- VBP
pathetic -- ADJ -- JJ


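Note that when the same code is run on the *desert* sentence from the translation example above, the two instances of *desert* receive different PoS tags (a verb meaning "to abandon" and a noun meaning "an arid region"). A text-to-speech system can use this information to generate the correct pronunciation.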
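The idea can be sketched in plain Python, assuming a small hand-written pronunciation table keyed by (word, coarse PoS tag). The table `PRONUNCIATIONS`, the helper `pronounce`, the approximate ARPAbet transcriptions, and the tag sequence in `tagged` are all hypothetical: the tags are what we would expect a tagger to produce for the *desert* sentence, typed by hand rather than taken from a model run.

```python
# Hypothetical, hand-written table: approximate ARPAbet pronunciations for
# one heteronym, keyed by (lower-cased word, coarse PoS tag).
PRONUNCIATIONS = {
    ("desert", "NOUN"): "D EH1 Z ER0 T",  # the arid region
    ("desert", "VERB"): "D IH0 Z ER1 T",  # to abandon
}


def pronounce(word: str, pos: str) -> str:
    """Return the PoS-specific pronunciation if the table has one, else the word itself."""
    return PRONUNCIATIONS.get((word.lower(), pos), word)


# Assumed (hand-typed) tagger output for "She wished she could desert him in the desert."
tagged = [
    ("She", "PRON"), ("wished", "VERB"), ("she", "PRON"), ("could", "AUX"),
    ("desert", "VERB"), ("him", "PRON"), ("in", "ADP"), ("the", "DET"),
    ("desert", "NOUN"), (".", "PUNCT"),
]

# Look up a pronunciation for every occurrence of the heteronym.
for word, pos in tagged:
    if word.lower() == "desert":
        print(word, pos, "->", pronounce(word, pos))
```

In the notebook, `tagged` could instead be built from the model, e.g. `[(t.text, t.pos_) for t in model("She wished she could desert him in the desert.")]`. Real text-to-speech systems rely on far larger lexicons, so treat this only as an illustration of the lookup step.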

Disambiguating heteronyms is a specific instance of a larger NLP problem called Word Sense Disambiguation (WSD). For a word with more than one meaning, WSD is the task of identifying the meaning intended in a given context.
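One classic family of WSD methods, Lesk-style overlap, picks the sense whose description shares the most words with the surrounding context. The sketch below is a heavily simplified variant and not a reference implementation: instead of dictionary glosses it uses hypothetical hand-written signature words for two senses of *bass* (`SENSE_SIGNATURES`), a fixed context window on each side of the target word, and no stemming or stop-word handling.

```python
# Hypothetical sense inventory: hand-written signature words stand in
# for the dictionary glosses a full Lesk implementation would use.
SENSE_SIGNATURES = {
    "bass (fish)": {"fish", "swim", "swam", "river", "lake", "ocean", "caught"},
    "bass (music)": {"music", "guitar", "drum", "voice", "low", "play", "band"},
}


def disambiguate(words, index, window=2):
    """Return the sense whose signature overlaps most with the words within
    `window` positions of words[index]; ties go to the first sense listed."""
    lo = max(0, index - window)
    hi = index + window + 1
    context = {
        w.lower()
        for i, w in enumerate(words[lo:hi], start=lo)
        if i != index  # exclude the ambiguous word itself
    }
    return max(SENSE_SIGNATURES, key=lambda sense: len(SENSE_SIGNATURES[sense] & context))


words = "The bass swam around the bass drum on the ocean floor".split()
for i, w in enumerate(words):
    if w == "bass":
        print(i, disambiguate(words, i))
```

With a window of two, the first *bass* sees *swam* and is labelled as the fish, while the second sees *drum* and is labelled as the instrument. A window of three would put both *swam* and *drum* in the context of the second *bass* and produce a tie, which shows how sensitive such heuristics are to their settings.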



Note, however, that PoS tags alone cannot resolve a heteronym whose different meanings share the same PoS tag, as in the *bass* example below.

https://translate.google.com/?sl=en&tl=hi&text=The%20bass%20swam%20around%20the%20bass%20drum%20on%20the%20ocean%20floor.&op=translate

In [None]:
# Let's take a new example.
tokens = model("The bass swam around the bass drum on the ocean floor")
for token in tokens:
    print(token.text, "--", token.pos_, "--", token.tag_)

The -- DET -- DT
bass -- NOUN -- NN
swam -- PROPN -- NNP
around -- ADP -- IN
the -- DET -- DT
bass -- NOUN -- NN
drum -- NOUN -- NN
on -- ADP -- IN
the -- DET -- DT
ocean -- NOUN -- NN
floor -- NOUN -- NN
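The limitation shows up directly if a pronunciation lookup is keyed on (word, tag): both senses of *bass* land on the same key. The sketch below uses a small hypothetical sense table (`SENSE_TABLE`, with approximate hand-written ARPAbet transcriptions and Penn Treebank tags) and reports every key that would still have to choose between pronunciations.

```python
from collections import defaultdict

# Hypothetical sense table: (word, Penn Treebank tag, meaning, approximate ARPAbet).
SENSE_TABLE = [
    ("desert", "NN", "arid region", "D EH1 Z ER0 T"),
    ("desert", "VB", "to abandon", "D IH0 Z ER1 T"),
    ("bass", "NN", "a fish", "B AE1 S"),
    ("bass", "NN", "low-pitched music", "B EY1 S"),
]

# Collect every pronunciation that a (word, tag) lookup key would have to cover.
by_key = defaultdict(set)
for word, tag, _meaning, pron in SENSE_TABLE:
    by_key[(word, tag)].add(pron)

# Keys that still map to more than one pronunciation cannot be resolved by PoS alone.
ambiguous = sorted(key for key, prons in by_key.items() if len(prons) > 1)
print(ambiguous)  # [('bass', 'NN')]
```

Here `('desert', 'NN')` and `('desert', 'VB')` each map to a single pronunciation, but `('bass', 'NN')` maps to two, so choosing between them needs context beyond the PoS tag, i.e. word sense disambiguation.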
