```{contents}
```
## NLP

**Natural Language Processing (NLP)** is a field of AI that enables machines to understand, interpret, and generate human language (text or speech).
It connects **linguistics, computer science, and machine learning** to process natural language.

---

### Why NLP is Important

* Human communication is mostly through **language** (text, speech, chat, documents).
* Computers operate on **numbers**, not words.
* NLP bridges this gap by converting **language → numerical representations** that machine learning models can use for tasks such as translation, chatbots, and summarization.

---

### Core Concepts in NLP

1. **Text Preprocessing**

   * Tokenization (splitting text into words/sentences)
   * Stopword removal (removing common words like *is, the, and*)
   * Stemming / Lemmatization (reducing words to a stem, e.g. *running → run*, or to a dictionary form, e.g. *better → good*)
   * Lowercasing, punctuation removal, handling emojis/special chars

2. **Feature Representation**

   * Bag of Words (BoW)
   * TF-IDF (Term Frequency–Inverse Document Frequency)
   * Word Embeddings (Word2Vec, GloVe, FastText)
   * Contextual embeddings (ELMo, BERT, GPT, etc.)

3. **Language Models**

   * Statistical models (n-grams, Markov chains)
   * Neural models (RNN, LSTM, GRU)
   * Transformer-based models (BERT, GPT, T5, LLaMA, etc.)

4. **Core NLP Tasks**

   * Text classification (spam detection, sentiment analysis)
   * Named Entity Recognition (NER) (extract names, dates, organizations)
   * Part-of-Speech (POS) tagging
   * Machine Translation (Google Translate, DeepL)
   * Question Answering & Chatbots
   * Summarization (extractive, abstractive)
   * Text generation (GPT models, story generation)

5. **Speech-related NLP**

   * Speech-to-Text (ASR – Automatic Speech Recognition)
   * Text-to-Speech (TTS – Siri, Alexa voices)
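
The first two items above (text preprocessing and feature representation) can be sketched with the standard library alone. This is a minimal illustration, not a production pipeline: the stopword list, the function names (`preprocess`, `bag_of_words`, `tf_idf`), and the sample documents are all assumptions made for the example, and the weighting uses one common TF-IDF variant (`tf = count / length`, `idf = log(N / df)`) among several.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"is", "the", "and", "a", "an", "of", "on", "in", "to"}


def preprocess(text):
    """Lowercase, drop punctuation, tokenize, and remove stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Bag of Words: map each token to its raw count."""
    return Counter(tokens)


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = bag_of_words(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs are pets!",
]
vectors = tf_idf(docs)
print(preprocess(docs[0]))  # ['cat', 'sat', 'mat']
print(vectors[0])
```

In this toy corpus, *sat* appears in two of the three documents and therefore gets a lower weight than *cat* or *mat*, which each appear in only one. Note that *cat* and *cats* are counted as different terms here; stemming or lemmatization would merge them.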

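As a rough illustration of the statistical language models listed under item 3, the sketch below trains a bigram (2-gram) model by counting adjacent word pairs and estimates `P(current | previous)` by maximum likelihood. The corpus, the `<s>`/`</s>` boundary markers, and the function names are assumptions for the example; real models add smoothing for unseen pairs.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count bigrams over whitespace-tokenized sentences padded with <s> and </s>."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_prob(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev was never seen."""
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[curr] / sum(following.values())


corpus = ["i like nlp", "i like python", "i love nlp"]
counts = train_bigram_model(corpus)

# "i" is followed by "like" in 2 of its 3 occurrences.
print(bigram_prob(counts, "i", "like"))
# Most frequent word after "i" in this corpus.
print(counts["i"].most_common(1)[0][0])  # like
```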
---

### NLP Workflow

1. Collect & clean text data
2. Preprocess (tokenize, normalize, remove noise)
3. Convert to numeric vectors (TF-IDF, embeddings)
4. Train ML/DL model (e.g., classification, sequence modeling)
5. Evaluate with a metric that fits the task (accuracy or F1-score for classification, BLEU for translation, ROUGE for summarization)
6. Deploy (API, chatbot, search engine, recommendation system, etc.)
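
Steps 1 to 5 of the workflow above can be walked through end to end on a toy spam filter. The sketch below is one possible instance, assuming a hand-made four-example dataset, a multinomial Naive Bayes classifier with add-one smoothing as the model, and F1 as the metric; none of these choices come from the workflow itself, and deployment (step 6) is left out.

```python
import math
import re
from collections import Counter, defaultdict

# Step 1: a tiny hand-made labeled dataset (illustrative only).
train_set = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the team tomorrow", "ham"),
]
eval_set = [
    ("free prize money", "spam"),
    ("team meeting at noon", "ham"),
]


def tokenize(text):
    # Step 2: lowercase and split into word tokens.
    return re.findall(r"[a-z]+", text.lower())


def fit(examples):
    # Steps 3-4: per-class word counts (a bag-of-words representation)
    # plus class frequencies, which give the prior.
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counter in word_counts.values() for w in counter}
    return word_counts, class_counts, vocab


def predict(model, text):
    word_counts, class_counts, vocab = model
    n_examples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_examples)
        for w in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f1_score(y_true, y_pred, positive):
    # Step 5: F1 is the harmonic mean of precision and recall.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


model = fit(train_set)
y_true = [label for _, label in eval_set]
y_pred = [predict(model, text) for text, _ in eval_set]
print(y_pred)  # ['spam', 'ham']
print(f1_score(y_true, y_pred, positive="spam"))  # 1.0
```

A perfect score on two held-out examples says nothing about real-world quality; the point is only to show how the workflow stages connect.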

---

### Challenges in NLP

* Ambiguity (e.g., “bank” → riverbank or financial bank?)
* Sarcasm & irony detection
* Multilingual processing
* Domain-specific jargon
* Low-resource languages (few datasets available)

---

### Applications of NLP

* Chatbots & virtual assistants (ChatGPT, Alexa, Siri)
* Sentiment analysis (Twitter, reviews)
* Document summarization (news, research papers)
* Search engines (Google, Bing)
* Fraud detection in finance
* Healthcare text mining (clinical notes, prescriptions)

---

```{dropdown} Click here for Sections
```{tableofcontents}