### Q1: Compare and contrast NLTK and spaCy in terms of features, ease of use, and performance.

**Answer:**
- **NLTK (Natural Language Toolkit):**
  - Rich collection of linguistic resources (corpora, lexical databases).
  - Great for teaching, research, and prototyping.
  - Provides granular control over tokenization, parsing, POS tagging.
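  - Generally slower than spaCy on large volumes of text, since most of it is implemented in pure Python.
  - More manual to use: each step (tokenizer, tagger, parser) is selected and wired together by hand.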

- **spaCy:**
  - Industrial-strength NLP library optimized for production.
  - Fast and memory-efficient; the core is written in Cython.
  - Built-in pipelines for tokenization, lemmatization, POS tagging, NER.
  - Easy integration with deep learning frameworks (TensorFlow, PyTorch).
  - Less focused on linguistic theory, more on practical applications.

**Comparison:**  
NLTK is academic and resource-rich but slower, while spaCy is modern, fast, and production-ready.

---

### Q2: What is TextBlob and how does it simplify common NLP tasks like sentiment analysis and translation?

**Answer:**
- **TextBlob** is a Python library built on top of NLTK and Pattern.
- Provides a simple API for common NLP tasks:
  - **Sentiment Analysis** → Returns polarity (-1 to 1) and subjectivity (0 to 1).
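  - **Translation** → Older releases wrapped the Google Translate web API through `translate()` and `detect_language()`; both were deprecated in TextBlob 0.16 and removed in later releases, so current code should call a dedicated translation library or service instead.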
  - **Tokenization, POS tagging, noun phrase extraction** with minimal code.
- **Simplification:** Abstracts complex NLP pipelines into easy-to-use functions, making it beginner-friendly.
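
To make the polarity and subjectivity scores concrete, here is a minimal sketch of lexicon-based scoring, the general approach behind TextBlob's default analyzer. The tiny `LEXICON` and the name `toy_sentiment` are assumptions made up for illustration, not TextBlob's actual data or API, and plain averaging ignores negation and intensifiers, which the real analyzer accounts for.

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# Values are illustrative assumptions, not taken from any library.
LEXICON = {
    "love": (0.5, 0.6),
    "great": (0.8, 0.75),
    "frustrating": (-0.4, 0.9),
    "terrible": (-1.0, 1.0),
}


def toy_sentiment(text):
    """Average the scores of known words; unknown words are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    scored = [LEXICON[w] for w in words if w in LEXICON]
    if not scored:
        return 0.0, 0.0  # no opinion words found: neutral and objective
    polarity = sum(p for p, _ in scored) / len(scored)
    subjectivity = sum(s for _, s in scored) / len(scored)
    return polarity, subjectivity


polarity, subjectivity = toy_sentiment(
    "I love learning about NLP, but sometimes debugging models can be frustrating."
)
print("Polarity:", round(polarity, 2))
print("Subjectivity:", round(subjectivity, 2))
```

One positive and one negative word nearly cancel out, which is why mixed sentences tend to score close to zero polarity while still being rated highly subjective.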

---

### Q3: Explain the role of Stanford NLP in academic and industry NLP projects.

**Answer:**
- **Stanford NLP (CoreNLP):**
  - Developed by the Stanford NLP Group and widely used in academia.
  - Provides well-established models for POS tagging, dependency parsing, sentiment analysis, and NER.
  - Java-based; usable from Python through wrappers such as Stanza, the group's own Python library, which includes a CoreNLP client.
- **Academic Role:** Benchmark tool for linguistic research and NLP experiments.
- **Industry Role:** Used in production systems requiring robust linguistic analysis, especially where accuracy is prioritized over speed.

---

### Q4: Describe the architecture and functioning of a Recurrent Neural Network (RNN).

**Answer:**
- **Architecture:**
  - Sequential model designed for time-series or text data.
  - Each unit takes input at time `t` and hidden state from time `t-1`.
  - Hidden state acts as memory, capturing context from previous steps.
- **Functioning:**
  - Processes sequences one step at a time.
  - Updates the hidden state using weights shared across all time steps and a nonlinearity (typically tanh).
  - Outputs predictions based on current input and past context.
- **Limitation:** Struggles with long-term dependencies due to vanishing/exploding gradients.
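
The recurrence described above can be sketched in plain Python. This is a minimal Elman-style forward pass, assuming a tanh activation and the update `h_t = tanh(W_xh·x_t + W_hh·h_{t-1} + b_h)`; the function names, weight values, and dimensions are arbitrary choices for illustration, and no training is shown.

```python
import math


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: combine the current input with the previous hidden state."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run the same weights over every step, carrying the hidden state along."""
    h = [0.0] * len(b_h)  # initial hidden state: no context yet
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Toy sizes: 2 input features, 3 hidden units (arbitrary values).
W_xh = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_hh = [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1], [0.1, -0.2, 0.4]]
b_h = [0.0, 0.0, 0.0]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = rnn_forward(sequence, W_xh, W_hh, b_h)
reversed_states = rnn_forward(sequence[::-1], W_xh, W_hh, b_h)

for t, h in enumerate(states):
    print(f"t={t}: {[round(v, 3) for v in h]}")
```

Feeding the same inputs in reverse order ends in a different final hidden state, which illustrates that the hidden state encodes order and not just which inputs were seen.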

---

### Q5: What is the key difference between LSTM and GRU networks in NLP applications?

**Answer:**
- **LSTM (Long Short-Term Memory):**
  - Uses three gates: input, forget, output.
  - Maintains a cell state for long-term memory.
  - More complex, higher computational cost.
- **GRU (Gated Recurrent Unit):**
  - Uses two gates: reset and update.
  - Combines hidden and cell state into one.
  - Simpler, faster, fewer parameters.
- **Key Difference:**  
  LSTM keeps a separate cell state controlled by three gates, which can help with very long dependencies; GRU merges that memory into the hidden state with two gates, so it has fewer parameters, trains faster, and often performs comparably.
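
The "fewer parameters" point can be checked with simple arithmetic. The sketch below assumes the textbook formulation with one bias vector per weight block; the function names are made up for illustration, and some implementations keep two bias vectors per block, so the counts they report can be slightly higher.

```python
def lstm_param_count(input_dim, hidden_dim):
    """Four weight blocks: input, forget, and output gates plus the candidate cell update."""
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return 4 * per_block


def gru_param_count(input_dim, hidden_dim):
    """Three weight blocks: reset and update gates plus the candidate hidden state."""
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return 3 * per_block


input_dim, hidden_dim = 32, 32
lstm = lstm_param_count(input_dim, hidden_dim)
gru = gru_param_count(input_dim, hidden_dim)
print(f"LSTM: {lstm} parameters")
print(f"GRU:  {gru} parameters ({gru / lstm:.0%} of the LSTM)")
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer of the same size.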

---

### Q10: You are working on a chatbot for a mental health platform. Explain how you would leverage LSTM or GRU networks along with libraries like spaCy or Stanford NLP to understand and respond to user input effectively. Detail your architecture, data preprocessing pipeline, and any ethical considerations.

**Answer:**
- **Architecture:**
  - Input text → Preprocessing (spaCy/Stanford NLP) → Embedding (Word2Vec/GloVe/BERT) → LSTM/GRU → Dense layers → Response generation.
- **Data Preprocessing Pipeline:**
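  - Tokenization and lemmatization (spaCy); remove stopwords cautiously, because default stopword lists include negations such as "not" and "never" that change the meaning of a message.
  - Entity recognition to flag sensitive terms (e.g., medical conditions); stock NER models do not label emotions, so this requires a custom-trained component or rule-based matching.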
  - Embedding for semantic representation.
- **Model:**
  - LSTM/GRU captures sequential context of user queries.
  - Output layer classifies intent (e.g., “seeking help”, “expressing emotion”).
  - Response generator maps intent to empathetic replies.
- **Ethical Considerations:**
  - Ensure privacy and confidentiality of user data.
  - Avoid giving medical diagnoses; provide supportive, non-clinical responses.
  - Include escalation mechanism to connect users with human professionals when needed.
  - Mitigate bias in training data to avoid harmful stereotypes.
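
The intent-to-reply mapping and the escalation mechanism can be sketched as follows. Everything here is a hypothetical illustration: the phrase list, intent labels, reply templates, and the keyword-based `classify_intent` stand in for the trained LSTM/GRU classifier, and a real crisis-phrase list would need to be built and reviewed with clinicians.

```python
import re

# Hypothetical crisis phrases (illustrative only).
CRISIS_PATTERNS = [r"\bhurt myself\b", r"\bend my life\b", r"\bsuicid"]

# Hypothetical intent labels and supportive, non-clinical reply templates.
RESPONSES = {
    "seeking_help": "Thank you for reaching out. Would you like to tell me more?",
    "expressing_emotion": "That sounds really hard. I'm here to listen.",
}
FALLBACK = "I'm not sure I understood. Could you say a bit more?"
ESCALATION = "I'm concerned about your safety. I'm connecting you with a human counsellor now."


def classify_intent(text):
    """Stand-in for the trained LSTM/GRU classifier: returns (intent, confidence)."""
    lowered = text.lower()
    if "help" in lowered:
        return "seeking_help", 0.9
    if re.search(r"\b(sad|anxious|lonely|stressed)\b", lowered):
        return "expressing_emotion", 0.8
    return "unknown", 0.3


def respond(text, min_confidence=0.6):
    """Return (reply, escalated). The safety check runs before the model."""
    lowered = text.lower()
    if any(re.search(pattern, lowered) for pattern in CRISIS_PATTERNS):
        return ESCALATION, True
    intent, confidence = classify_intent(text)
    if confidence < min_confidence or intent not in RESPONSES:
        return FALLBACK, False  # low confidence: ask instead of guessing
    return RESPONSES[intent], False


for message in ["I feel anxious all the time", "Can you help me?", "What's the weather?"]:
    reply, escalated = respond(message)
    print(f"{message!r} -> escalated={escalated}: {reply}")
```

Running the rule-based safety check before the learned model means a misclassification cannot suppress an escalation, and the confidence threshold keeps the bot from replying with a template it is unsure about.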

---


```python
# Q6
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download required resources (run once)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger_eng')  # English-specific tagger model
nltk.download('punkt_tab')

# Sample text
text = "Natural Language Processing is fascinating and widely used in AI applications."

# Tokenization
tokens = word_tokenize(text)
print("Tokens:", tokens)

# POS Tagging
pos_tags = pos_tag(tokens)
print("POS Tags:", pos_tags)
```

```text
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!

Tokens: ['Natural', 'Language', 'Processing', 'is', 'fascinating', 'and', 'widely', 'used', 'in', 'AI', 'applications', '.']
POS Tags: [('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('is', 'VBZ'), ('fascinating', 'VBG'), ('and', 'CC'), ('widely', 'RB'), ('used', 'VBN'), ('in', 'IN'), ('AI', 'NNP'), ('applications', 'NNS'), ('.', '.')]
```


```python
# Q7
import spacy

# Load English model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Apple is planning to open a new office in Mumbai by 2026."

# Process text
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(f"{ent.text} → {ent.label_}")
```

```text
Apple → ORG
Mumbai → GPE
2026 → DATE
```


```python
# Q8
from textblob import TextBlob

# Sample text
text = "I love learning about NLP, but sometimes debugging models can be frustrating."

# Create TextBlob object
blob = TextBlob(text)

# Sentiment analysis
print("Polarity:", blob.sentiment.polarity)  # Range: -1 (negative) to +1 (positive)
print("Subjectivity:", blob.sentiment.subjectivity)  # Range: 0 (objective) to 1 (subjective)
```

```text
Polarity: 0.04999999999999999
Subjectivity: 0.75
```


```python
# Q9
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample dataset
texts = ["I love NLP", "NLP is challenging", "I dislike debugging"]
labels = [1, 1, 0]  # 1 = positive, 0 = negative

# Tokenization: map words to integer ids, then pad every sequence to length 5
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
X = pad_sequences(sequences, maxlen=5)

# Convert labels to numpy array
y = np.array(labels)

# Build RNN model
# input_length is omitted: recent Keras versions deprecate it and infer the length from the data
model = Sequential([
    Embedding(input_dim=1000, output_dim=32),
    SimpleRNN(32),
    Dense(1, activation='sigmoid')
])

# Compile model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train model
model.fit(X, y, epochs=10, verbose=1)

# Training only; no separate evaluation is run on this toy dataset
print("Model training complete. Ready for predictions!")
```



Epoch 1/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 4s/step - accuracy: 0.3333 - loss: 0.7154
Epoch 2/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step - accuracy: 0.6667 - loss: 0.6971
Epoch 3/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step - accuracy: 0.6667 - loss: 0.6798
Epoch 4/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 38ms/step - accuracy: 0.6667 - loss: 0.6634
Epoch 5/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step - accuracy: 0.6667 - loss: 0.6477
Epoch 6/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 39ms/step - accuracy: 0.6667 - loss: 0.6327
Epoch 7/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 37ms/step - accuracy: 0.6667 - loss: 0.6183
Epoch 8/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step - accuracy: 0.6667 - loss: 0.6044
Epoch 9/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m