# Natural Language Processing

### 🌐 Introduction to Natural Language Processing (NLP) in Machine Learning Using NLTK

Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) and Machine Learning (ML) that focuses on enabling machines to understand, interpret, and generate human language. It bridges the gap between human communication and computer understanding.

---

### 🔍 What is NLP?

NLP combines linguistics and computer science to process and analyze large amounts of natural language data. Common tasks include:

* **Text classification** (e.g., spam detection)
* **Sentiment analysis**
* **Named Entity Recognition (NER)**
* **Machine translation**
* **Speech recognition**
* **Text summarization**

---

### 🧠 NLP in Machine Learning

In ML-based NLP, models are trained on text so they can make predictions or extract information from it. The pipeline typically involves:

1. **Text Preprocessing**
2. **Feature Extraction** (e.g., Bag of Words, TF-IDF)
3. **Model Training** (e.g., Naive Bayes, SVM, LSTM)
4. **Evaluation and Prediction**
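
The four steps above can be sketched end to end with nothing but the Python standard library. This is a minimal illustration, not a production recipe: the toy `TRAIN` corpus, the helper names (`preprocess`, `bag_of_words`, `tfidf`, `train_naive_bayes`, `predict`), the plain `log(N / df)` IDF formula, and add-one smoothing are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Toy labeled corpus, invented purely for illustration.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

def preprocess(text):
    """Step 1: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(tokens):
    """Step 2a: Bag of Words, i.e. a token -> raw count mapping."""
    return Counter(tokens)

def tfidf(docs_tokens):
    """Step 2b: TF-IDF with tf = count / doc length and idf = log(N / df)."""
    n_docs = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def train_naive_bayes(examples):
    """Step 3: gather the counts a multinomial Naive Bayes model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = preprocess(text)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Step 4: return the label with the highest log posterior."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_label in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_label / total_docs)
        for token in preprocess(text):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train_naive_bayes(TRAIN)
print(bag_of_words(preprocess("Free prize! Free cash!")))
print(tfidf([preprocess(text) for text, _ in TRAIN])[0])
print(predict(model, "free prize offer"))           # spam
print(predict(model, "project meeting on monday"))  # ham
```

In practice the feature extraction and model steps are usually delegated to a library; the point here is only to show what each stage consumes and produces.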

---

### 📦 Introduction to NLTK (Natural Language Toolkit)

**NLTK** is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources along with libraries for:

* Tokenization
* Lemmatization
* POS tagging
* Parsing
* WordNet access

Install NLTK:

```bash
pip install nltk
```

---

### ✅ Basic NLP Tasks Using NLTK

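```python
import nltk

# One-time downloads of the data used below
nltk.download('punkt')      # tokenizer models
nltk.download('punkt_tab')  # tokenizer data expected by recent NLTK releases
nltk.download('stopwords')
nltk.download('wordnet')

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

text = "Natural Language Processing makes it possible for machines to understand human language."

# Tokenization: split the sentence into word and punctuation tokens
tokens = word_tokenize(text)
print("Tokens:", tokens)

# Stopword removal: build the set once instead of reloading the list for every token
stop_words = set(stopwords.words('english'))
filtered = [word for word in tokens if word.lower() not in stop_words]
print("Filtered Tokens:", filtered)

# Lemmatization: lemmatize() treats each word as a noun unless a pos argument is given
lemmatizer = WordNetLemmatizer()
lemmatized = [lemmatizer.lemmatize(word) for word in filtered]
print("Lemmatized:", lemmatized)
```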

---

### 📌 Why Use NLTK?

* Beginner-friendly
* Extensive documentation and corpora
* Excellent for prototyping NLP workflows
* Good integration with other ML libraries (e.g., scikit-learn)

---




### NLTK Modules and Their Functionalities

| **Module**                    | **Functionality**                                  | **Example Functions**                       | **Use Case**                                         |
| ----------------------------- | -------------------------------------------------- | ------------------------------------------- | ---------------------------------------------------- |
| `nltk.tokenize`               | Splits text into sentences or words (tokenization) | `word_tokenize()`, `sent_tokenize()`        | Preprocessing input text                             |
| `nltk.corpus`                 | Access to text corpora and lexical resources       | `stopwords.words()`, `names.words()`        | Working with language datasets                       |
| `nltk.stem`                   | Reduces words to their stem/root form              | `PorterStemmer()`, `LancasterStemmer()`     | Normalizing words (e.g., "running" → "run")          |
| `nltk.stem.wordnet`           | Converts words to their lemma (dictionary form)    | `WordNetLemmatizer().lemmatize()`           | Semantic normalization (more accurate than stemming) |
| `nltk.probability`            | Support for frequency distributions                | `FreqDist()`                                | Word frequency analysis                              |
| `nltk.tag`                    | Assigns POS (Part of Speech) tags to tokens        | `pos_tag()`                                 | Syntax analysis                                      |
| `nltk.chunk`                  | Groups tokens into meaningful phrases              | `ne_chunk()`                                | Named Entity Recognition (NER)                       |
| `nltk.parse`                  | Syntax parsing of sentences                        | Various parsers                             | Tree-based parsing                                   |
| `nltk.classify`               | Text classification using machine learning         | `NaiveBayesClassifier`, `SklearnClassifier` | Sentiment analysis, spam detection                   |
| `nltk.translate`              | Tools for machine translation and BLEU scoring     | `sentence_bleu()`, `corpus_bleu()`          | Translation evaluation                               |
| `nltk.draw`                   | Visualization of parse trees and relationships     | `tree.draw()`, `dispersion_plot()`          | NLP visualizations                                   |
| `nltk.metrics`                | Evaluation metrics for NLP models                  | `edit_distance()`, `accuracy()`             | Model evaluation                                     |
| `nltk.sentiment`              | Pre-built tools for sentiment analysis             | `SentimentIntensityAnalyzer`                | Polarity scoring (positive/negative)                 |
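
A few of the modules in the table can be tried without any `nltk.download()` step. The sketch below touches `nltk.tokenize`, `nltk.stem`, `nltk.probability`, and `nltk.metrics`; the sample sentence and the toy label lists are made up for illustration, and `wordpunct_tokenize` is used here only because it is a regex-based tokenizer that needs no downloaded models.

```python
from nltk.metrics import accuracy, edit_distance
from nltk.probability import FreqDist
from nltk.stem import LancasterStemmer, PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The runner was running while other runners ran."

# nltk.tokenize: regex-based tokenization, then drop punctuation tokens
tokens = [tok.lower() for tok in wordpunct_tokenize(text) if tok.isalpha()]

# nltk.stem: compare two stemmers on the same word
porter = PorterStemmer()
lancaster = LancasterStemmer()
print(porter.stem("running"), lancaster.stem("running"))

# nltk.probability: count how often each Porter stem occurs
stems = [porter.stem(tok) for tok in tokens]
freq = FreqDist(stems)
print(freq.most_common(1))  # [('runner', 2)]

# nltk.metrics: Levenshtein distance between two strings
print(edit_distance("kitten", "sitting"))  # 3

# nltk.metrics: fraction of positions where predictions match the reference
reference = ["spam", "ham", "spam"]   # toy gold labels
predicted = ["spam", "ham", "ham"]    # toy model output
print(accuracy(reference, predicted))
```

Stemmers work by rule-based suffix stripping, so their output is not always a dictionary word; when valid words matter, the lemmatizer row of the table is usually the better fit.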

---

