# POS-Lemma

> Specifying the part of speech (POS) of a word lets the `WordNetLemmatizer` find the correct lemma. Without it, every word is treated as a noun. Run the code below to see the difference.

In [1]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print("Without POS tag :", lemmatizer.lemmatize("loving"))
print("With POS tag :", lemmatizer.lemmatize("loving", pos="v"))

Without POS tag : loving
With POS tag : love


🧑🏻‍🎓 Understanding the `pos_tag` from `nltk`.

Run the following cells:

In [2]:
from nltk import pos_tag
import nltk
nltk.download('averaged_perceptron_tagger') # tagger model used by pos_tag

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/laurameyer/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [3]:
noun = "love"
adjective = "big"
adverb = "lovely"
verb = "loving"

In [4]:
pos_tag([noun]) # returns a list of (word, tag) tuples

In [5]:
pos_tag([noun])[0][1][0].upper() # --> N for noun

In [None]:
pos_tag([adjective])[0][1][0].upper() # --> J for adjective

In [None]:
pos_tag([adverb])[0][1][0].upper() # --> R for adverb

In [None]:
pos_tag([verb])[0][1][0].upper() # --> V for verb
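
The `[0][1][0]` chain in the cells above can be unpacked one index at a time. The sketch below assumes the tagger output has the shape shown. The `"VBG"` tag is written by hand for illustration and is not the result of running `pos_tag`.

```python
# Assumed shape of the tagger output for one word; the tag value is
# written by hand here, not produced by running nltk.pos_tag.
tagged = [("loving", "VBG")]

first_pair = tagged[0]      # ("loving", "VBG"): the (word, tag) tuple
penn_tag = first_pair[1]    # "VBG": the full Penn Treebank tag
first_letter = penn_tag[0]  # "V": the coarse word class

print(first_pair, penn_tag, first_letter.upper())
```

Only the first letter is kept because it is enough to tell nouns, verbs, adjectives and adverbs apart.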

❓ **Question** ❓

Create a function that lemmatizes your text, taking into account the associated POS tags. 

💡 Hint: The `WordNetLemmatizer` requires POS tags in a specific form, different from the tags output by `nltk.pos_tag`. You will need to map them to the correct form.
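
One possible way to express this mapping is a lookup table keyed on the first letter of the tag. This is only a sketch: `PENN_PREFIX_TO_WORDNET` and `penn_to_wordnet` are hypothetical names, not part of NLTK, and the one-letter values are assumed to match the constants `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV`.

```python
# Hypothetical helper, not part of NLTK: maps a Penn Treebank tag
# (the kind nltk.pos_tag returns) to the one-letter POS code that
# WordNetLemmatizer.lemmatize accepts.
PENN_PREFIX_TO_WORDNET = {
    "J": "a",  # adjective (same value as wordnet.ADJ)
    "V": "v",  # verb      (wordnet.VERB)
    "N": "n",  # noun      (wordnet.NOUN)
    "R": "r",  # adverb    (wordnet.ADV)
}


def penn_to_wordnet(penn_tag, default="n"):
    """Return the WordNet POS letter for a Penn Treebank tag."""
    # Any tag whose first letter is not in the table falls back to noun.
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:1].upper(), default)


# Tags written by hand for illustration; they are not tagger output.
for tag in ["JJ", "VBG", "NNP", "RB", "PRP"]:
    print(tag, "->", penn_to_wordnet(tag))
```

A dictionary keeps the mapping in one place and makes the noun fallback explicit. An `if`/`elif` chain works equally well.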

In [None]:
# ------
# Map a POS tag to a format WordNetLemmatizer accepts:
# ------

from nltk.corpus import wordnet

def get_wordnet_pos(word):
    '''returns the POS tag in a format understood
    by the WordNetLemmatizer'''
    # YOUR CODE HERE
    postag = pos_tag([word])[0][1][0]
    
    if postag.startswith('J'):
        return wordnet.ADJ
    elif postag.startswith('V'):
        return wordnet.VERB
    elif postag.startswith('N'):
        return wordnet.NOUN
    elif postag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun, the lemmatizer's default POS
        return wordnet.NOUN


# ------
# Lemmatize
# ------

from nltk.tokenize import word_tokenize


def pos_lemma(text):
    '''returns the lemmas of the tokens in `text`, using each token's POS tag.

    Valid `pos` values for the lemmatizer are "n" for nouns, "v" for verbs,
    "a" for adjectives, "r" for adverbs and "s" for satellite adjectives.'''
    # YOUR CODE HERE
    lemmatizer = WordNetLemmatizer()  # create once and reuse for every token
    tokens = word_tokenize(text)
    pos_tokens = [get_wordnet_pos(token) for token in tokens]
    lemmas = []
    for token, pos_token in zip(tokens, pos_tokens):
        print(token, pos_token)  # show the POS chosen for each token
        lemmas.append(lemmatizer.lemmatize(token, pos=pos_token))
    return lemmas
    

👇 Try your function:

In [None]:
sentence = "I am loving Paris"

In [None]:
# YOUR CODE HERE
pos_lemma(sentence)

🏁 Congratulations. With this mini-challenge, you've learned how to find the root of a word, whether it is a noun, an adjective, an adverb or a verb.

💾 Don't forget to `git add / commit / push`