# Natural Language Processing with NLTK

## Introduction
Natural Language Processing (NLP) is a field of Artificial Intelligence focused on enabling machines to understand, interpret, and generate human language. This notebook introduces NLP concepts using the Natural Language Toolkit (NLTK), one of Python's most popular NLP libraries.


## Basics of NLTK
### Installation and Setup
Install the NLTK library and download the required datasets.
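```python
# Install NLTK (the leading "!" runs a shell command from a notebook cell)
!pip install nltk

# Import NLTK and download its datasets.
# 'all' fetches every corpus and model, which is a large download; individual
# packages such as 'vader_lexicon' can be requested by name instead.
import nltk
nltk.download('all')
```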


## Key Features of NLTK
- Text Preprocessing - Tokenization, stemming, lemmatization, etc.
    - Tokenization - Breaking text into words, sentences, etc.
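    - Stemming - Stripping suffixes with heuristic rules to get a stem, which may not be a real word.
    - Lemmatization - Mapping words to their dictionary form (lemma) using vocabulary and part-of-speech information.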
- Text Analysis - POS tagging, parsing, and semantic analysis.
    - POS Tagging - Assigning parts of speech to words.
    - Parsing - Analyzing the grammatical structure of sentences.
    - Semantic Analysis - Understanding the meaning of words.
- Applications - Sentiment analysis, language modeling, etc.
    - Sentiment Analysis - Determining the sentiment of text.
    - Language Modeling - Predicting the next word in a sentence.
    - Named Entity Recognition - Identifying named entities in text.
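
To make the Parsing item above concrete, here is a minimal sketch that parses one sentence with a hand-written context-free grammar. The grammar rules and the sentence are invented for illustration and are far smaller than anything used in practice; `nltk.CFG.fromstring` and `nltk.ChartParser` need no downloaded data.

```python
import nltk

# Toy grammar, assumed for this example only.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'ball'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
tokens = "the dog chased the ball".split()

# parse() yields every tree the grammar licenses for the token sequence.
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Each result is an `nltk.Tree`, so `tree.label()` gives the root category and `tree.leaves()` gives back the tokens.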


## Basic NLP Tasks with NLTK
### Tokenization
Tokenization is the process of breaking text into words or sentences.

```python
from nltk.tokenize import word_tokenize, sent_tokenize

# Example text
text = "NLTK makes it simple to process text data. Let's explore its features!"

# Word tokenization
words = word_tokenize(text)
print("Words:", words)

# Sentence tokenization
sentences = sent_tokenize(text)
print("Sentences:", sentences)
```

Output:
```
Words: ['NLTK', 'makes', 'it', 'simple', 'to', 'process', 'text', 'data', '.', 'Let', "'s", 'explore', 'its', 'features', '!']
Sentences: ['NLTK makes it simple to process text data.', "Let's explore its features!"]
```
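
`word_tokenize` splits the contraction "Let's" into `'Let'` and `"'s"` and keeps punctuation as separate tokens. When that is not wanted, one option is a rule-based tokenizer. The sketch below uses `RegexpTokenizer` with a pattern chosen for this example (an assumption, not a recommended default); it keeps contractions whole and drops punctuation.

```python
from nltk.tokenize import RegexpTokenizer

text = "NLTK makes it simple to process text data. Let's explore its features!"

# Match a run of word characters, optionally followed by an apostrophe suffix.
# The group is non-capturing so each match is returned as one whole token.
tokenizer = RegexpTokenizer(r"\w+(?:'\w+)?")
tokens = tokenizer.tokenize(text)
print(tokens)
```

Which behavior is preferable depends on the downstream task: splitting contractions helps taggers and parsers, while keeping them whole can be simpler for word counts.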


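### Stemming
Stemming reduces words to a root form by stripping suffixes with heuristic rules, so the result is not always a dictionary word (for example, "easily" becomes "easili").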

```python
from nltk.stem import PorterStemmer

# Initialize the stemmer
stemmer = PorterStemmer()

# Example words
words = ["running", "runner", "easily", "quickly"]

# Apply stemming
stems = [stemmer.stem(word) for word in words]
print("Stemmed Words:", stems)
```

Output:
```
Stemmed Words: ['run', 'runner', 'easili', 'quickli']
```
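
The Porter algorithm is only one of the stemmers in `nltk.stem`. As a rough comparison, the sketch below runs the same words through the Porter, Snowball (English), and Lancaster stemmers. The choice of these three is an assumption made for illustration; none of them requires downloaded data.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

words = ["running", "runner", "easily", "quickly"]

# Three rule-based stemmers with different levels of aggressiveness.
stemmers = {
    "porter": PorterStemmer(),
    "snowball": SnowballStemmer("english"),
    "lancaster": LancasterStemmer(),
}

# Apply every stemmer to every word.
stems_by_algorithm = {
    name: [stemmer.stem(word) for word in words]
    for name, stemmer in stemmers.items()
}

for name, stems in stems_by_algorithm.items():
    print(f"{name:>9}: {stems}")
```

Comparing the rows side by side shows where the algorithms disagree, which is a quick way to pick one for a given vocabulary.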


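### Semantic Analysis (Synonyms and Antonyms)
Semantic analysis deals with the meaning of words. WordNet, a lexical database available through NLTK, groups words into sets of synonyms (synsets) and records antonym relations between their lemmas.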

```python
from nltk.corpus import wordnet

# Synonyms and antonyms of 'happy'
synonyms = []
antonyms = []

for syn in wordnet.synsets("happy"):
    for lemma in syn.lemmas():
        synonyms.append(lemma.name())
        if lemma.antonyms():
            antonyms.append(lemma.antonyms()[0].name())

print("Synonyms:", set(synonyms))
print("Antonyms:", set(antonyms))
```

Output:
```
Synonyms: {'happy', 'well-chosen', 'glad', 'felicitous'}
Antonyms: {'unhappy'}
```


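## Sentiment Analysis
### Preprocessing Text for Sentiment Analysis
Stop words are very common words (such as "I", "for", and "it") that carry little sentiment on their own and are often removed before analysis.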

```python
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Example text
text = "I absolutely love using NLTK for natural language processing. It's amazing!"

# Remove stop words
stop_words = set(stopwords.words('english'))
words = word_tokenize(text)
filtered_words = [word for word in words if word.lower() not in stop_words]
print("Filtered Words:", filtered_words)
```

Output:
```
Filtered Words: ['absolutely', 'love', 'using', 'NLTK', 'natural', 'language', 'processing', '.', "'s", 'amazing', '!']
```
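
The filtered list above still contains punctuation and the fragment `"'s"`. A minimal sketch of one way to drop those as well is shown below. `clean_tokens` is a hypothetical helper, and `demo_stop_words` is a tiny stand-in for `set(stopwords.words('english'))` so that the example runs without downloaded corpora.

```python
def clean_tokens(tokens, stop_words):
    """Lowercase tokens and keep only alphabetic, non-stop-word entries."""
    cleaned = []
    for token in tokens:
        lowered = token.lower()
        # str.isalpha() is False for '.', '!' and for fragments such as "'s".
        if lowered.isalpha() and lowered not in stop_words:
            cleaned.append(lowered)
    return cleaned


# Tokens as word_tokenize splits the example sentence.
tokens = ['I', 'absolutely', 'love', 'using', 'NLTK', 'for', 'natural',
          'language', 'processing', '.', 'It', "'s", 'amazing', '!']

# Hypothetical stand-in for the full English stop-word list.
demo_stop_words = {"i", "for", "it"}

cleaned = clean_tokens(tokens, demo_stop_words)
print(cleaned)
# ['absolutely', 'love', 'using', 'nltk', 'natural', 'language', 'processing', 'amazing']
```

This kind of cleaning suits bag-of-words style models. VADER, in contrast, uses capitalization and punctuation as intensity cues, so it is normally given the raw text.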


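### Sentiment Analysis with NLTK's SentimentIntensityAnalyzer
`SentimentIntensityAnalyzer` implements VADER, a lexicon- and rule-based scorer. Its `polarity_scores` method returns the negative, neutral, and positive proportions of the text plus a normalized `compound` score between -1 and 1.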

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# Initialize the sentiment analyzer
sia = SentimentIntensityAnalyzer()

# Example text
text = "I absolutely love using NLTK for natural language processing. It's amazing!"

# Analyze sentiment
sentiment = sia.polarity_scores(text)
print("Sentiment Analysis:", sentiment)
```

Output:
```
Sentiment Analysis: {'neg': 0.0, 'neu': 0.387, 'pos': 0.613, 'compound': 0.9019}
```
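
The `compound` value is usually the one turned into a label. The sketch below applies the commonly cited VADER convention of a ±0.05 cutoff. `label_sentiment` is a hypothetical helper, and the threshold is an assumption that may need tuning for a particular dataset.

```python
def label_sentiment(scores, threshold=0.05):
    """Map a VADER score dict to 'positive', 'negative', or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Scores reported above for the example sentence.
scores = {'neg': 0.0, 'neu': 0.387, 'pos': 0.613, 'compound': 0.9019}
label = label_sentiment(scores)
print(label)  # positive
```

The helper only reads the `compound` key, so it can be applied directly to the dictionary returned by `sia.polarity_scores(text)`.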

