## <h1 align="center">Natural Language Processing Basic Operations</h1>
<br>

### Table of Contents:
    1. Introduction
    2. Word Tokenization
    3. Sentence Tokenization
    4. Removing Stop Words
    5. Stemming of Text
    6. Lemmatizing
    7. Part-of-Speech Tagging
    8. References

### 1. Introduction
    Natural language processing (NLP) is a branch of artificial intelligence that helps computers understand, interpret and manipulate human language. 

### 2. Word Tokenization

In [10]:
# Word tokenization: split a sentence into individual word tokens
from nltk.tokenize import word_tokenize

sentence = "Sun rises in the east"
print(word_tokenize(sentence))

['Sun', 'rises', 'in', 'the', 'east']


### 3. Sentence Tokenization

In [11]:
# Sentence tokenization: split a text into individual sentences
from nltk.tokenize import sent_tokenize

example_text = "Sun rises in the east. Sun sets in the west."
print(sent_tokenize(example_text))

['Sun rises in the east.', 'Sun sets in the west.']


### 4. Removing Stop Words from a Sentence
    Stop words are commonly used words in a language that carry little meaning on their own, for example "a", "the", "is", and "are".

In [12]:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

example_text = "This is an example showing off stop word filtration."

# NLTK's English stop-word list is all lowercase
stop_words = set(stopwords.words('english'))
words = word_tokenize(example_text)

# The comparison is case-sensitive, so the capitalized 'This' is kept
filtered_sentence = [w for w in words if w not in stop_words]
print(filtered_sentence)

['This', 'example', 'showing', 'stop', 'word', 'filtration', '.']
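
The output above still contains `'This'` and `'.'`, because the comparison is case-sensitive and punctuation is not a stop word. One way to handle both is sketched below. The helper name `remove_stop_words` and the small `sample_stop_words` set are assumptions made for illustration; the set is only a stand-in for `set(stopwords.words('english'))`, so the block runs without downloading any corpus.

```python
import string


def remove_stop_words(tokens, stop_words):
    """Drop stop words (ignoring case) and tokens made only of punctuation."""
    stop_words = {w.lower() for w in stop_words}
    return [
        t for t in tokens
        if t.lower() not in stop_words
        and not all(ch in string.punctuation for ch in t)
    ]


# Hypothetical stand-in for set(stopwords.words('english'))
sample_stop_words = {"this", "is", "an", "off"}

# Tokens as produced by word_tokenize in the cell above
tokens = ["This", "is", "an", "example", "showing", "off",
          "stop", "word", "filtration", "."]

filtered = remove_stop_words(tokens, sample_stop_words)
print(filtered)
# ['example', 'showing', 'stop', 'word', 'filtration']
```

Lowercasing only for the comparison keeps the original casing of the tokens that survive the filter.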


### 5. Stemming of Text
* Stemming is a process that reduces words to a root form (the stem) by stripping suffixes. The stem is not always a valid word.

In [3]:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# Stem a list of made-up variants of the same word
example_words = ["python", "pythoner", "pythoning", "pythoned"]
for w in example_words:
    print(ps.stem(w), end=" ")

print("\n")

# Stem every token of a full sentence
text = "It is very important to be pythonly while you are pythoning with python"
words = word_tokenize(text)
for t in words:
    print(ps.stem(t))

python python python python 

It
is
veri
import
to
be
pythonli
while
you
are
python
with
python
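
A common use of stemming is to treat different inflections of a word as one term. The sketch below groups words that share a Porter stem. The helper name `group_by_stem` is an assumption made for illustration; `PorterStemmer` needs no extra NLTK data download.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer


def group_by_stem(words):
    """Map each Porter stem to the list of input words that reduce to it."""
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(
    ["connect", "connected", "connecting", "connection", "connections"]
)
print(groups)
```

All five words should end up under the single stem `connect`, which is how a search index can match "connected" against a query for "connection".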


### 6. Lemmatizing
1. Stemming and lemmatization both reduce inflected words to a root form. The difference is that a stem might not be an actual word, whereas a lemma is always a valid word of the language.

2. Stemming applies a fixed sequence of rule-based steps to each word, which makes it fast. Lemmatization looks words up in the WordNet corpus, which makes it slower than stemming. You also need to supply the part of speech to obtain the correct lemma.

In [14]:
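import nltk
from nltk.stem import WordNetLemmatizer

# The lemmatizer needs the WordNet corpus
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer()
# pos='a' tells the lemmatizer to treat "lowest" as an adjective
print(lemmatizer.lemmatize("lowest", pos='a'))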

[nltk_data] Error loading wordnet: <urlopen error [Errno -3] Temporary
[nltk_data]     failure in name resolution>
low
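
The contrast between the two approaches can be sketched without NLTK. The block below is a toy illustration only: `toy_stem` uses a handful of made-up suffix rules (it is not the Porter algorithm), and `toy_lemma` uses a tiny hand-written dictionary as a stand-in for WordNet. All names and table entries are assumptions made for this example.

```python
# Toy suffix rules, tried in order. NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Tiny hand-written lookup table standing in for WordNet.
LEMMAS = {"studies": "study", "better": "good", "feet": "foot"}


def toy_stem(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word.lower(), word)


for w in ["studies", "better", "feet"]:
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemma(w))
# studies -> stem: studi | lemma: study
# better -> stem: better | lemma: good
# feet -> stem: feet | lemma: foot
```

The rule-based function is cheap but produces `studi`, which is not a word, and cannot relate `feet` to `foot`. The lookup handles both, at the cost of needing a dictionary.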


### 7. Part-of-Speech Tagging
* POS tagging means labeling each word with its part of speech, such as noun, verb, or determiner.

In [15]:
import nltk
from nltk.tokenize import word_tokenize

example_text = "Sun rises in the east"
words = word_tokenize(example_text)

def POS_Tagging():
    try:
        # pos_tag labels the whole token list in one call, so no loop is needed
        tagged = nltk.pos_tag(words)
        print(tagged)
    except Exception as e:
        print(str(e))

POS_Tagging()

[('Sun', 'NNP'), ('rises', 'VBZ'), ('in', 'IN'), ('the', 'DT'), ('east', 'NN')]
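
The tags above follow the Penn Treebank convention, while `WordNetLemmatizer.lemmatize` expects one of the WordNet part-of-speech letters (`'n'`, `'v'`, `'a'`, `'r'`) in its `pos` argument. A minimal sketch of a conversion is shown below. The function name `penn_to_wordnet` is an assumption made for illustration, and mapping by the first letter of the tag is a common simplification rather than an official NLTK mapping.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter by its first character."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, also used as the fallback for everything else


# Tagged tokens copied from the output above
tagged = [('Sun', 'NNP'), ('rises', 'VBZ'), ('in', 'IN'),
          ('the', 'DT'), ('east', 'NN')]

wordnet_tags = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(wordnet_tags)
# [('Sun', 'n'), ('rises', 'v'), ('in', 'n'), ('the', 'n'), ('east', 'n')]
```

Each pair could then be passed to the lemmatizer as `lemmatizer.lemmatize(word, pos=letter)`. Falling back to `'n'` matches the lemmatizer's own default part of speech.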


### 8. References
    1. https://www.nltk.org/
    2. https://github.com/nltk/nltk
    

### <h1 align="center"> Thank You!!</h1>

<h5 align="Right"> Rajan Sahu</h5>