<a href="https://colab.research.google.com/github/harshkumar999/EDA-Project-Workflow/blob/main/What_is_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Natural Language Processing (NLP)**

**NLP** is a branch of artificial intelligence (AI) that enables machines to understand, interpret, and generate human language.

NLP has two components:

1. **NLU** (Natural Language Understanding)
2. **NLG** (Natural Language Generation)

A `corpus` is a collection of texts or documents. A `document` is a single unit of text in a corpus, such as an article, a paragraph, or a webpage.
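
The relationship between a corpus and its documents can be sketched in a few lines of Python. This is only an illustration: the sample sentences and the names `corpus` and `vocabulary` are assumptions made for the example, and splitting on whitespace is a simplification.

```python
# A toy corpus: each string is one document (the sample texts are made up).
corpus = [
    "The dog barked loudly.",
    "Shelly went to the store.",
    "She bought some milk.",
]

# The vocabulary is the set of distinct lowercase words across all documents.
vocabulary = set()
for document in corpus:
    for word in document.lower().split():
        vocabulary.add(word.strip(".,!?"))

num_documents = len(corpus)
print(num_documents, "documents,", len(vocabulary), "distinct words")
```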

# **NLP Applications**

- **Product Insights**: Mine customer reviews to find out:
  - What people like or dislike most
  - What could be improved
  - What people think of competitors' products

- **Security Monitoring**: Monitor news articles and reports and extract information from them for national security purposes.

- **Pattern Identification**: Extract specific patterns from text using pattern matching, such as:
  - Phone numbers
  - Email addresses

- **Document Clustering**: Group similar documents into meaningful categories (e.g., sorting news articles into sports, politics, and so on).

- **Spam Filtering**: Automatically detect and filter out spam emails.

- **Speech Recognition**: Convert spoken language into text. The reverse direction, text to speech, is speech synthesis.

- **E-commerce Personalization**: Recommend products to customers based on their past behavior or preferences.

- **Sentiment Analysis**: Determine whether a piece of text (e.g., a review) is positive, negative, or neutral.

- **Text Summarization**: Create short summaries of long documents, keeping only the important information.

- **Topic Modeling**: Discover the topics that run through a collection of documents and group the documents by topic.

- **Text Generation**: Write new text automatically.

- **Compliance and Risk Management**: Check whether documents follow the applicable rules and flag potential risks.

- **Entity Extraction**: Extract key details such as names and places from text.

- **Synthetic Data Generation**: Create artificial data that resembles real data.

- **Intelligent Chatbots**: Build chatbots that can talk and help customers.
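
The pattern-matching item in the list above (phone numbers and email addresses) is commonly handled with regular expressions. The following is a minimal sketch using Python's standard `re` module. The two patterns, the function name `extract_patterns`, and the sample text are assumptions for illustration, and the patterns are deliberately simple: they will miss many valid real-world formats.

```python
import re

# Deliberately simple patterns; real emails and phone formats vary widely.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")  # e.g. 555-123-4567 only


def extract_patterns(text):
    """Return the email addresses and phone numbers found in text."""
    return {
        "emails": EMAIL_PATTERN.findall(text),
        "phones": PHONE_PATTERN.findall(text),
    }


sample = "Contact support@example.com or call 555-123-4567 for help."
found = extract_patterns(sample)
print(found)
```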


# **NLP Levels**

## **1. Lexical Analysis: Tokenizing the Text**
Lexical analysis is the first step in NLP. It involves breaking the text into smaller units known as tokens, typically words or phrases.

### **Example:**
**Sentence**: "The dog barked loudly."

- **Tokens** (punctuation omitted):  
  `["The", "dog", "barked", "loudly"]`
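
As a rough sketch, the tokens above can be reproduced with a regular expression that keeps runs of word characters and drops punctuation. The function name `tokenize` is an assumption for this example. Tokenizers in libraries such as NLTK or spaCy handle contractions, punctuation, and other edge cases far more carefully.

```python
import re


def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)


tokens = tokenize("The dog barked loudly.")
print(tokens)  # ['The', 'dog', 'barked', 'loudly']
```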

---

## **2. Syntactic Analysis: Analyzing Sentence Structure**
Syntactic analysis involves examining the grammatical structure of the sentence. It identifies parts of speech and the sentence's syntactic structure.

### **Example:**
**Sentence**: "The dog barked loudly."

| Word    | Part of Speech |
|---------|----------------|
| The     | Article        |
| dog     | Noun           |
| barked  | Verb           |
| loudly  | Adverb         |

- **Sentence Structure**:  
  **Subject** (`The dog`) + **Verb** (`barked`) + **Adverbial modifier** (`loudly`)
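
To make the tagging step concrete, here is a toy part-of-speech tagger that looks each word up in a hand-written dictionary. The lexicon `TOY_LEXICON` and the function `tag` are hypothetical and cover only the example sentence. Real taggers, such as those shipped with NLTK or spaCy, rely on trained statistical models rather than a fixed lookup table.

```python
# Hypothetical hand-written lexicon covering only the example sentence.
TOY_LEXICON = {
    "the": "Article",
    "dog": "Noun",
    "barked": "Verb",
    "loudly": "Adverb",
}


def tag(tokens):
    """Pair each token with its part of speech, or 'Unknown' if not listed."""
    return [(token, TOY_LEXICON.get(token.lower(), "Unknown")) for token in tokens]


tagged = tag(["The", "dog", "barked", "loudly"])
for word, pos in tagged:
    print(f"{word:<8}{pos}")
```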

---

## **3. Semantic Analysis: Understanding Meaning**
Semantic analysis goes beyond syntax to interpret the meaning of the sentence. It focuses on what the words mean and how those meanings combine into what the sentence conveys.

### **Example:**
**Sentence**: "The dog barked loudly."

- **Meaning**: The dog made a loud noise.
- **Interpretation**: `dog` is the one performing the action, `barked` is the action, and `loudly` describes how the action was carried out.

---

## **4. Discourse Integration: Contextual Understanding**
Discourse integration connects individual sentences to form a coherent context, improving the understanding of the conversation or paragraph as a whole.

### **Example:**

- **Sentence 1**: "Shelly went to the store."
- **Sentence 2**: "She bought some milk."

In this case, "She" refers to Shelly, and the second sentence logically follows from the first.
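
The pronoun link in this example can be imitated with a very naive heuristic: remember the most recent known name and substitute it for a later pronoun. The name list `KNOWN_NAMES`, the pronoun set, and the function `resolve_pronouns` are all assumptions for illustration. Real coreference resolution has to handle gender, number, multiple candidates, and much more.

```python
KNOWN_NAMES = {"Shelly"}          # hypothetical list of person names
PRONOUNS = {"she", "he", "they"}  # subject pronouns handled by this sketch


def resolve_pronouns(sentences):
    """Replace pronouns with the most recently mentioned known name."""
    last_name = None
    resolved = []
    for sentence in sentences:
        new_words = []
        for word in sentence.split():
            bare = word.strip(".,!?")
            if bare in KNOWN_NAMES:
                last_name = bare
                new_words.append(word)
            elif bare.lower() in PRONOUNS and last_name is not None:
                # Swap the pronoun but keep any attached punctuation.
                new_words.append(word.replace(bare, last_name))
            else:
                new_words.append(word)
        resolved.append(" ".join(new_words))
    return resolved


result = resolve_pronouns(["Shelly went to the store.", "She bought some milk."])
print(result)
```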

---

## **5. Pragmatic Analysis: Identifying Intent**
Pragmatics focuses on understanding the speaker's intent or the purpose behind the sentence. It considers the context and how language is used beyond its literal meaning.

### **Example:**
**Sentence**: "Could you help me with this task?"

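- **Intent**: The speaker is politely requesting assistance. Taken literally, the sentence asks about ability, but it is meant as a request rather than a yes/no question or a command.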
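
A rule-based sketch shows the idea of mapping surface form to intent. The cue phrases in `REQUEST_CUES`, the three labels, and the function `detect_intent` are assumptions made for this example. Practical intent detection usually relies on trained classifiers and conversational context rather than a handful of patterns.

```python
import re

# Hypothetical cue phrases that often signal a polite request in English.
REQUEST_CUES = re.compile(r"^(could|can|would|will) you\b", re.IGNORECASE)


def detect_intent(sentence):
    """Classify a sentence as 'request', 'question', or 'statement'."""
    text = sentence.strip()
    if REQUEST_CUES.match(text):
        return "request"
    if text.endswith("?"):
        return "question"
    return "statement"


print(detect_intent("Could you help me with this task?"))  # request
print(detect_intent("Where is the store?"))                # question
```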


# **NLU (Natural Language Understanding)**  
NLU focuses on analyzing and interpreting human language, understanding its meaning, structure, and context.

`Analysis Tasks`:

**Semantic Tasks**:
- **Entity Extraction**: Extracts key details like names and places from text.  
- **Text Classification**: Categorizes text into predefined labels (e.g., spam vs. non-spam).  
- **Sentiment Analysis**: Detects sentiment (positive, negative, neutral).  
- **Topic Modeling**: Identifies and groups documents by topic.    
- **Similarity/Relatedness**: Measures the similarity between words or texts.
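
One simple way to measure similarity between two texts is the cosine similarity of their word-count vectors. The sketch below uses only the standard library. The function name and the sample phrases are assumptions, and a bag-of-words comparison ignores word order and meaning, so it is only a rough proxy for relatedness.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


same = cosine_similarity("the dog barked", "the dog barked")
partial = cosine_similarity("the dog barked", "the dog slept")
unrelated = cosine_similarity("the dog barked", "fresh milk")
print(round(same, 3), round(partial, 3), round(unrelated, 3))
```

Scores close to 1 mean the two texts share most of their words, and a score of 0 means they share none.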

**Syntactic Tasks**:
- **Part-of-Speech Tagging**: Labels words with their grammatical role (noun, verb, etc.).  
- **Chunking**: Groups words into meaningful units (e.g., noun phrases).  
- **Dependency Parsing**: Analyzes grammatical relationships between words in a sentence.
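
Chunking can be sketched as grouping consecutive tagged words into phrases. The example below builds noun-phrase chunks from runs of article, adjective, and noun tags. The tag names, the input pairs, and the function `chunk_noun_phrases` are assumptions for illustration, and the rule is crude (for instance, a lone article would also count as a chunk).

```python
def chunk_noun_phrases(tagged_tokens):
    """Group runs of Article/Adjective/Noun tags into noun-phrase chunks."""
    noun_phrase_tags = {"Article", "Adjective", "Noun"}
    chunks, current = [], []
    for word, pos in tagged_tokens:
        if pos in noun_phrase_tags:
            current.append(word)
        elif current:
            # A non-matching tag closes the chunk being built.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks


tagged_sentence = [
    ("The", "Article"),
    ("dog", "Noun"),
    ("barked", "Verb"),
    ("loudly", "Adverb"),
]
chunks = chunk_noun_phrases(tagged_sentence)
print(chunks)  # ['The dog']
```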

**NLU Applications**:
- **Document Classification**: Classifies documents (e.g., spam detection, sentiment analysis).  
- **Topic Modeling**: Identifies and groups documents by topic.  
- **Document Recommendation**: Suggests relevant documents based on content.
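
Document classification can be illustrated with a tiny lexicon-based sentiment classifier: count positive and negative words and compare the totals. The word lists and the function `classify_sentiment` are hypothetical, and this approach ignores negation, sarcasm, and context. In practice, classifiers are usually trained on labeled examples.

```python
# Hypothetical word lists; a real sentiment lexicon is far larger.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}


def classify_sentiment(text):
    """Label text as 'positive', 'negative', or 'neutral' by word counts."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


print(classify_sentiment("I love this great phone!"))       # positive
print(classify_sentiment("Terrible battery, poor screen."))  # negative
print(classify_sentiment("The phone arrived today."))        # neutral
```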

---

# **NLG (Natural Language Generation)**  
NLG focuses on generating human-like text or speech from data. It aims to create meaningful and logical responses or outputs.

`Generation Tasks`:
- **Question Answering**: Provides answers to user queries (e.g., chatbots).  
- **Text Generation**: Creates new text or predicts the next word.  
- **Machine Translation**: Translates text from one language to another.
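
Next-word prediction, mentioned under text generation above, can be sketched with a bigram model: count which word follows each word in some training text, then predict the most frequent follower. The training sentences and the function names are assumptions for this example. Modern text generators use neural language models, but the counting idea is a useful starting point.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            followers[current_word][next_word] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of word, or None if unseen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


training_text = [
    "the dog barked loudly",
    "the dog barked again",
    "the dog slept",
]
model = train_bigram_model(training_text)
print(predict_next(model, "dog"))  # barked
```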

**NLG Applications**:
- **Image Captioning**: Generates descriptive captions for images.  
- **Text Summarization**: Creates concise summaries of longer documents.  
- **Machine Translation**: Translates text from one language to another using approaches such as phrase-based machine translation (PBMT) or neural systems like Google Neural Machine Translation (GNMT).  
- **Chatbots**: Provides real-time responses to user queries.
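
As a rough illustration of text summarization, the sketch below performs extractive summarization: it scores each sentence by how frequent its words are in the whole text and keeps the top-scoring sentences. Note that this selects existing sentences rather than generating new ones, so it is a simpler stand-in for the generative summarization that NLG systems perform. The function `summarize`, its `num_sentences` parameter, and the sample text are assumptions, and the scoring favors long sentences.

```python
import re
from collections import Counter


def summarize(text, num_sentences=1):
    """Return the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        # Sum of corpus-wide frequencies of the words in this sentence.
        return sum(word_counts[w] for w in re.findall(r"\w+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return [s for s in sentences if s in ranked]


sample_text = "The dog barked. The dog barked at the cat near the store. Milk is white."
summary = summarize(sample_text)
print(summary)
```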

---

**Key Differences**:
- **NLU**: Focuses on understanding and interpreting language (analyzing meaning, structure, and context).
- **NLG**: Focuses on generating language (creating text or speech from data).
