# Question 351

## Description

Word sense disambiguation is the problem of determining which sense a word takes on in a particular setting when that word has multiple meanings. For example, in the sentence "I went to get money from the bank", the word "bank" probably means the place where people deposit money, not the land beside a river or lake.

Suppose you are given a list of meanings for several words, formatted like so:

```json
{
    "word_1": ["meaning one", "meaning two", ...],
    ...
    "word_n": ["meaning one", "meaning two", ...]
}
```

Given a sentence, most of whose words are contained in the meaning list above, create an algorithm that determines the likely sense of each possibly ambiguous word.

## Solution 

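Word sense disambiguation is a non-trivial task that has been approached in many ways, including supervised learning and dictionary-based methods. Here is a simple context-based algorithm, similar in spirit to the simplified Lesk algorithm, that scores each sense by how many words its definition shares with the words around the target: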

### Algorithm

* For each ambiguous word in the sentence, extract a context window: a fixed number of words before and after the target word.
* For each sense of the ambiguous word, count how many words from its definition appear in the context window.
* Choose the sense with the highest overlap as the most likely meaning. If several senses tie, the first one listed is kept.
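
To make the scoring step concrete, here is a minimal sketch that works through the three steps by hand for a single target word. The sentence is a made-up illustration, and the two definitions of "bark" are short sample glosses, not part of the problem statement.

```python
# Hypothetical example sentence; "bark" is the ambiguous target word
words = "the loud dog began to bark".split()
target_index = words.index("bark")
window_size = 3

# Step 1: collect up to window_size words on each side, excluding the target
start = max(0, target_index - window_size)
end = min(len(words), target_index + window_size + 1)
context = set(words[start:target_index] + words[target_index + 1:end])

# Step 2: count shared words between each definition and the context
senses = ["sound of a dog", "covering of a tree"]
overlaps = {sense: len(context & set(sense.split())) for sense in senses}

# Step 3: keep the sense with the largest overlap
best_sense = max(overlaps, key=overlaps.get)

print(overlaps)    # {'sound of a dog': 1, 'covering of a tree': 0}
print(best_sense)  # sound of a dog
```

Only the word "dog" is shared between the context and a definition, so the first sense wins with an overlap of 1.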

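```python
def disambiguate_sentence(sentence, word_meanings, window_size=3):
    """Map each ambiguous word to the sense that best overlaps its context."""
    words = sentence.split()
    results = {}

    for index, word in enumerate(words):
        if word in word_meanings:
            # Context window: up to window_size words on each side,
            # excluding the target word itself
            start = max(0, index - window_size)
            end = min(len(words), index + window_size + 1)
            context = set(words[start:index] + words[index + 1:end])

            # Score each sense by the number of definition words in the context
            best_sense = None
            max_overlap = -1

            for sense in word_meanings[word]:
                sense_words = set(sense.split())
                overlap = len(context & sense_words)

                # Strict comparison: on a tie, the first listed sense is kept
                if overlap > max_overlap:
                    max_overlap = overlap
                    best_sense = sense

            results[word] = best_sense

    return results


meanings = {
    "bank": ["a financial institution", "sides of a river"],
    "bark": ["sound of a dog", "covering of a tree"]
}

sentence = "I went to get money from the bank"
print(disambiguate_sentence(sentence, meanings))  # Expected: {'bank': 'a financial institution'}
```

```
{'bank': 'a financial institution'}
```

Note that in this example neither definition of "bank" shares a word with the context window ("money", "from", "the"), so both senses score 0 and the first listed sense is returned by default. The overlap only becomes informative when the definitions contain words likely to appear near the target, such as "money" or "river".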
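
Splitting on whitespace means that case, punctuation, and common words such as "a" or "of" can distort the overlap counts. The following is a sketch of one possible refinement, not a definitive implementation: the `tokenize` helper, the `disambiguate_normalized` name, and the tiny `STOPWORDS` set are all assumptions introduced for illustration. The two definitions of "bank" are taken from the wording in the problem description.

```python
import re

# Hypothetical, deliberately small stopword list; a real system would use a fuller one
STOPWORDS = {"a", "an", "the", "of", "to", "from", "in", "on", "at", "i"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs only, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def disambiguate_normalized(sentence, word_meanings, window_size=3):
    """Overlap-based disambiguation on normalized tokens (keys assumed lowercase)."""
    words = tokenize(sentence)
    results = {}
    for index, word in enumerate(words):
        if word not in word_meanings:
            continue
        start = max(0, index - window_size)
        context = set(words[start:index] + words[index + 1:index + window_size + 1])
        # max() returns the first sense with the highest score, so ties go to
        # the earliest sense; default=None covers a word with no senses listed
        results[word] = max(
            word_meanings[word],
            key=lambda sense: len(context & set(tokenize(sense))),
            default=None,
        )
    return results


meanings = {
    "bank": [
        "the place where people deposit money",
        "the land beside a river or lake",
    ],
}

print(disambiguate_normalized("I went to get money from the Bank.", meanings))
# {'bank': 'the place where people deposit money'}
print(disambiguate_normalized("We sat on the bank of the river.", meanings))
# {'bank': 'the land beside a river or lake'}
```

With these richer definitions, "money" and "river" each match exactly one sense, so the choice is driven by real overlap instead of the tie-breaking default.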
