# NLP
Natural language processing (NLP) is a field of AI that focuses on the interaction between computers and humans through natural language. Its ultimate goal is to enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful.

## Applications of NLP
- Search Engines
- Chatbots
- Language Translation

# Regular Expressions
- `.` - matches any character except a newline
- `\w` - matches any word character (alphanumeric or underscore; for ASCII text, equivalent to `[a-zA-Z0-9_]`)
- `\d` - matches any digit (`[0-9]`)
- `\s` - matches any whitespace character
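
The character classes above can be tried out with Python's built-in `re` module. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

# Illustrative sample text (two lines, separated by a newline).
text = "Order 66 shipped on 2024-05-17\nto user_42."

# `.` matches any character except a newline, so `.+` stops at the line break.
first_line = re.match(r".+", text).group()

# `\d` matches a single digit; `\d+` matches a run of digits.
numbers = re.findall(r"\d+", text)

# `\w` matches letters, digits, and underscore; `\w+` pulls out word-like tokens.
tokens = re.findall(r"\w+", text)

# `\s` matches whitespace (space, tab, newline); splitting on `\s+` breaks on any gap.
parts = re.split(r"\s+", text)

print(first_line)  # Order 66 shipped on 2024-05-17
print(numbers)     # ['66', '2024', '05', '17', '42']
print(tokens)      # ['Order', '66', 'shipped', 'on', '2024', '05', '17', 'to', 'user_42']
print(parts)       # ['Order', '66', 'shipped', 'on', '2024-05-17', 'to', 'user_42.']
```

Note that `\w+` splits `2024-05-17` at the hyphens and drops the final period, while splitting on `\s+` keeps punctuation attached to its token.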

# Stemming
A text normalization technique used to reduce words to their base/root form. It simplifies text data by reducing derived words to a common base form so that they can be analyzed as a single item.

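Stemming algorithms typically remove common word suffixes (`ing`, `ly`, `ed`) to transform a word into its root form. Because the rules are purely mechanical, the resulting stem is not always a dictionary word.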

__Example:__ `running` -> `run`, `better` -> `bet` (the latter only with an aggressive stemmer such as Lancaster)
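
To make the idea of suffix removal concrete, here is a deliberately naive sketch. It is not the Porter, Snowball, or Lancaster algorithm; the suffix list, the `naive_stem` name, and the `min_stem_len` parameter are assumptions chosen for illustration.

```python
# Illustrative suffix list only; real stemmers use much larger, ordered rule sets.
SUFFIXES = ("ing", "ly", "ed", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least `min_stem_len` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["running", "happily", "jumped", "cats", "ran"]:
    print(f"{w} -> {naive_stem(w)}")

# running -> runn
# happily -> happi
# jumped -> jump
# cats -> cat
# ran -> ran
```

The `runn` result shows why real stemmers need extra rules (for example, undoing doubled consonants), and `ran` shows that irregular forms are out of reach for suffix stripping altogether.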

In [None]:
!pip install nltk

In [22]:
from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer

# Sample words reused by every stemmer below
words = ["running", "ran", "runner", "happily", "happiness", "better", "cats"]

## PorterStemmer

In [20]:
porter = PorterStemmer()
for word in words:
    print(f"{word} -> {porter.stem(word)}")

running -> run
ran -> ran
runner -> runner
happily -> happili
happiness -> happi
better -> better
cats -> cat


## SnowballStemmer

In [24]:
snowball = SnowballStemmer(language='english')
for word in words:
    print(f"{word}->{snowball.stem(word)}")

running->run
ran->ran
runner->runner
happily->happili
happiness->happi
better->better
cats->cat


## LancasterStemmer

In [25]:
lancaster = LancasterStemmer()
for word in words:
    print(f"{word}->{lancaster.stem(word)}")

running->run
ran->ran
runner->run
happily->happy
happiness->happy
better->bet
cats->cat


# Lemmatization
A text normalization technique that also reduces words to a base form. Unlike stemming, it relies on vocabulary lookup and morphological analysis and takes the word's part of speech into account, so the result (the lemma) is a meaningful dictionary word.

__Example:__ `running` -> `run` (as a verb), `better` -> `good` (as an adjective)
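
The difference from stemming can be sketched with a toy lemmatizer that combines an irregular-form lookup with suffix rules checked against a vocabulary. Everything here (`IRREGULAR`, `VOCABULARY`, `SUFFIXES`, `toy_lemmatize`) is hypothetical and tiny; a real lemmatizer consults a full lexical database such as WordNet.

```python
# Hypothetical mini-lexicon, for illustration only.
IRREGULAR = {("better", "a"): "good", ("ran", "v"): "run"}
VOCABULARY = {"run", "cat", "good", "happy"}

# Suffixes that may be detached, per part of speech ("n" noun, "v" verb).
SUFFIXES = {"n": ("s",), "v": ("ing", "ed", "s")}


def toy_lemmatize(word, pos="n"):
    """Return a lemma for `word` given its part of speech, or `word` itself if unknown."""
    # 1. Irregular forms are resolved by lookup, which suffix stripping cannot do.
    if (word, pos) in IRREGULAR:
        return IRREGULAR[(word, pos)]
    # 2. Otherwise detach a suffix, but accept the result only if it is a known word.
    for suffix in SUFFIXES.get(pos, ()):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            candidates = [stem]
            # Undo a doubled final consonant, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2]:
                candidates.append(stem[:-1])
            for candidate in candidates:
                if candidate in VOCABULARY:
                    return candidate
    return word


print(toy_lemmatize("running", "v"))    # run
print(toy_lemmatize("better", "a"))     # good
print(toy_lemmatize("better", "n"))     # better
print(toy_lemmatize("cats", "n"))       # cat
print(toy_lemmatize("happiness", "n"))  # happiness
```

The same surface form (`better`) yields different lemmas depending on the part of speech passed in, and a candidate is never returned unless it is an actual vocabulary entry.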

In [31]:
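import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

# WordNetLemmatizer needs the WordNet corpus; the download is skipped if it is already up to date.
nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()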

In [40]:
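def get_wordnet_pos(treebank_tag):
    """Map a Penn Treebank tag (or just its first letter) to a WordNet POS constant."""
    if treebank_tag.startswith("J"):
        return wordnet.ADJ
    elif treebank_tag.startswith("V"):
        return wordnet.VERB
    elif treebank_tag.startswith("N"):
        return wordnet.NOUN
    elif treebank_tag.startswith("R"):
        return wordnet.ADV
    else:
        # Unknown tags fall back to noun, which is also the lemmatizer's default
        return wordnet.NOUN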

In [48]:
# Tags are first letters of Penn Treebank tags: 'v' verb, 'n' noun, 'r' adverb, 'j' adjective.
# An 'a' tag would not match any branch of get_wordnet_pos and would silently fall back to noun.
test_words = [('running', 'v'), ('ran', 'v'), ('runner', 'n'), ('happily', 'r'), ('happiness', 'n'), ('better', 'j'), ('cats', 'n')]
for word, pos in test_words:
    lemmatized_word = lemmatizer.lemmatize(word, get_wordnet_pos(pos.upper()))
    print(f"{word} -> {lemmatized_word}")

running -> run
ran -> run
runner -> runner
happily -> happily
happiness -> happiness
better -> good
cats -> cat
