Question 1: Compare and contrast NLTK and spaCy in terms of features, ease of use, and performance.

Answer: NLTK is a research- and teaching-oriented NLP library with a broad collection of text-processing algorithms, corpora, and lexical resources. It gives fine-grained control over each step, but the steps (tokenizing, tagging, parsing) have to be wired together by hand. spaCy is an industrial-strength library built for speed and production use: a single call to a pretrained pipeline runs tokenization, part-of-speech tagging, dependency parsing, and named entity recognition. NLTK suits learning and experimentation, while spaCy is usually preferred for real-world applications because of its faster, optimized pipelines.

Question 2: What is TextBlob and how does it simplify common NLP tasks like sentiment analysis and translation?

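Answer: TextBlob is a high-level NLP library built on top of NLTK and Pattern. It wraps common tasks such as sentiment analysis, part-of-speech tagging, and noun phrase extraction behind a simple API: a TextBlob object exposes properties like sentiment, tags, and noun_phrases, so most operations take one or two lines of code. Older releases also offered translation through the Google Translate web API, but that feature was deprecated and has been removed from recent versions.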
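
To give a feel for what a lexicon-based polarity score involves, here is a toy sketch in plain Python. It is not TextBlob's implementation: the TOY_LEXICON values and the toy_polarity helper are made up for illustration, and TextBlob's default analyzer relies on a much larger lexicon and additional rules.

```python
import re

# Made-up word scores in [-1, 1]; purely illustrative, not TextBlob's lexicon.
TOY_LEXICON = {
    "great": 0.8,
    "intuitive": 0.5,
    "quick": 0.3,
    "frustrating": -0.4,
}


def toy_polarity(text: str) -> float:
    """Average the scores of the words found in the toy lexicon (0.0 if none)."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("Great app, but the crash was frustrating."))
print(toy_polarity("The sky is blue."))
```

Averaging is why a text with both positive and negative words ends up with a mildly positive or mildly negative score rather than an extreme one.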

Question 3: Explain the role of Stanford NLP in academic and industry NLP projects.

Answer: The Stanford NLP Group provides widely used toolkits such as CoreNLP (Java) and Stanza (Python) that support tasks like constituency and dependency parsing, named entity recognition, coreference resolution, and sentiment analysis. They are widely used in academic research for their linguistic accuracy and in industry for building robust NLP systems, especially when deep linguistic analysis is required.

Question 4: Describe the architecture and functioning of a Recurrent Neural Network (RNN).

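Answer: A Recurrent Neural Network (RNN) is a neural network designed for sequential data. It maintains a hidden state that summarizes the previous time steps. At each step t, the network combines the current input x_t with the previous hidden state h_(t-1), typically as h_t = tanh(W_x x_t + W_h h_(t-1) + b), and the same weights are reused at every step. This recurrence lets RNNs model temporal dependencies in text, speech, and time-series data, although plain RNNs struggle with long-range dependencies because gradients tend to vanish or explode over many steps.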
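
The recurrence can be sketched in a few lines of plain Python. This is a minimal forward pass of a simple (Elman-style) RNN with randomly initialized weights and no training; the function names, weight ranges, and toy input sequence are assumptions made for illustration.

```python
import math
import random


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One RNN step: h_t = tanh(W_xh x_t + W_hh h_(t-1) + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


def run_rnn(inputs, hidden_size, seed=0):
    """Run the same weights over every time step and collect the hidden states."""
    rng = random.Random(seed)
    input_size = len(inputs[0])
    W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)]
            for _ in range(hidden_size)]
    W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
            for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    h = [0.0] * hidden_size  # initial hidden state
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)  # new state depends on the old one
        states.append(h)
    return states


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three time steps, two features
states = run_rnn(sequence, hidden_size=3)
for t, h in enumerate(states):
    print(t, [round(v, 3) for v in h])
```

Each printed row is the hidden state after one time step; the last row is what a classifier layer would typically consume.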

Question 5: What is the key difference between LSTM and GRU networks in NLP applications?

Answer: LSTM networks use three gates (input, forget, and output) and a separate cell state to carry long-term information, while GRU networks use two gates (update and reset) and merge the cell and hidden states into a single state. GRUs therefore have fewer parameters and are usually faster to train; LSTMs are somewhat more expressive and can do better on very long sequences, although in practice the difference is task-dependent.
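
The size difference can be made concrete by counting trainable parameters. The sketch below assumes the classic formulations, in which an LSTM has four weight blocks (three gates plus the candidate cell state) and a GRU has three (two gates plus the candidate hidden state); the helper names are hypothetical. Frameworks may report slightly different numbers; for example, Keras's default GRU uses an extra set of bias terms.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Four blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim: int, units: int) -> int:
    """Three blocks, each with input weights, recurrent weights, and a bias."""
    return 3 * (units * (input_dim + units) + units)


# Same sizes as the Question 8 model: 8-dimensional embeddings, 16 units.
print("LSTM:", lstm_params(8, 16))  # LSTM: 1600
print("GRU: ", gru_params(8, 16))   # GRU:  1200
```

For the same input and hidden sizes, the GRU layer has three quarters of the LSTM layer's parameters under these assumptions.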

In [1]:
# Question 6: Sentiment analysis using TextBlob

# Install TextBlob
!pip install -q textblob

from textblob import TextBlob

text = """I had a great experience using the new mobile banking app.
The interface is intuitive, and customer support was quick to resolve my issue.
However, the app did crash once during a transaction, which was frustrating."""

blob = TextBlob(text)
print("Polarity:", blob.sentiment.polarity)
print("Subjectivity:", blob.sentiment.subjectivity)

Polarity: 0.21742424242424244
Subjectivity: 0.6511363636363636


In [2]:
# Question 7: Tokenization and frequency distribution using NLTK

import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

# Tokenizer data: recent NLTK releases need 'punkt_tab'; older ones use 'punkt'
nltk.download('punkt')
nltk.download('punkt_tab')

text = """Natural Language Processing (NLP) is a fascinating field that combines
linguistics, computer science, and artificial intelligence. It enables machines
to understand, interpret, and generate human language. Applications of NLP
include chatbots, sentiment analysis, and machine translation."""

tokens = word_tokenize(text.lower())
freq_dist = FreqDist(tokens)

print(freq_dist.most_common(10))

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.

[(',', 6), ('and', 3), ('.', 3), ('language', 2), ('nlp', 2), ('natural', 1), ('processing', 1), ('(', 1), (')', 1), ('is', 1)]
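
The counts above are dominated by punctuation because word_tokenize emits punctuation marks as tokens. One possible way to focus on words is to keep only alphabetic tokens before counting. The sketch below uses the standard library (re and collections.Counter) instead of NLTK so that it runs without any downloads; the regular expression is a simplification of NLTK's tokenizer, and a similar filter could be applied to the NLTK tokens with str.isalpha().

```python
import re
from collections import Counter

text = """Natural Language Processing (NLP) is a fascinating field that combines
linguistics, computer science, and artificial intelligence. It enables machines
to understand, interpret, and generate human language. Applications of NLP
include chatbots, sentiment analysis, and machine translation."""

# Keep runs of letters only, which drops punctuation and brackets.
words = re.findall(r"[a-z]+", text.lower())
word_counts = Counter(words)

print(word_counts.most_common(3))
# [('and', 3), ('language', 2), ('nlp', 2)]
```

Removing common function words such as "and" with a stop-word list would be a natural next step.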


In [5]:
# Question 8: Basic LSTM text classification using Keras

import tensorflow as tf
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Dataset
texts = [
    "I love this project",
    "This is an amazing experience",
    "I hate waiting in line",
    "This is the worst service",
    "Absolutely fantastic"
]

labels = np.array([1, 1, 0, 0, 1], dtype="float32")

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(sequences, padding="post")

# Convert the padded input to an int32 tensor (model.fit also accepts NumPy arrays)
X = tf.convert_to_tensor(padded, dtype=tf.int32)

# Model
model = Sequential([
    Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=8),
    LSTM(16),
    Dense(1, activation="sigmoid")
])

model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

# Train
model.fit(X, labels, epochs=10, verbose=1)


Epoch 1/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - accuracy: 0.6000 - loss: 0.6927
Epoch 2/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - accuracy: 0.6000 - loss: 0.6913
Epoch 3/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step - accuracy: 0.6000 - loss: 0.6900
Epoch 4/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - accuracy: 0.6000 - loss: 0.6886
Epoch 5/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 61ms/step - accuracy: 0.6000 - loss: 0.6872
Epoch 6/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 61ms/step - accuracy: 0.6000 - loss: 0.6857
Epoch 7/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 62ms/step - accuracy: 0.6000 - loss: 0.6843
Epoch 8/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - accuracy: 0.6000 - loss: 0.6829
Epoch 9/10
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m

<keras.src.callbacks.history.History at 0x7aa77aa3ccb0>

In [4]:
# Question 9: spaCy pipeline for tokenization, lemmatization, and NER

# Install spaCy and model
!pip install -q spacy
!python -m spacy download en_core_web_sm

import spacy

nlp = spacy.load("en_core_web_sm")

text = """Homi Jehangir Bhaba was an Indian nuclear physicist who played a key role
in the development of India’s atomic energy program. He was the founding director
of the Tata Institute of Fundamental Research and helped establish the Atomic
Energy Commission of India."""

doc = nlp(text)

print("Tokens and Lemmas:")
for token in doc:
    print(token.text, "->", token.lemma_)

print("\nNamed Entities:")
for ent in doc.ents:
    print(ent.text, "-", ent.label_)

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 82.6 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
Tokens and Lemmas:
Homi -> Homi
Jehangir -> Jehangir
Bhaba -> Bhaba
was -> be
an -> an
Indian -> indian
nuclear -> nuclear
physicist -> physicist
who -> who
played -> play
a -> a
key -> key
role -> role

 -> 

in -> in
the -> the
development -> development
of -> of
India -> India
’s -> ’s
atomic -> atomic
energy -> energy
program -> 

Question 10: NLP-based chatbot for a mental health platform

Answer: I would preprocess user input with spaCy (tokenization, lemmatization, and entity recognition) to normalize the text and surface cues about intent and emotional state. An LSTM or GRU model trained on labeled conversational data would then capture contextual dependencies across a user's messages. The architecture would consist of an embedding layer, one or more recurrent layers, and either a classification head (for intent or emotion) or a response generation layer. Ethical considerations include data privacy, anonymization of user data, and bias mitigation, as well as ensuring the chatbot never gives harmful or misleading advice and instead escalates critical cases to human professionals.
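
The safety-related parts of this design can be sketched without any model at all. The code below is a minimal illustration, not a clinically validated system: the crisis phrase list, the regular expressions, the reply texts, and all function names are hypothetical, and classify_intent is only a placeholder for the trained LSTM/GRU classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase list; a real platform would need clinically reviewed criteria.
CRISIS_PHRASES = ("hurt myself", "end my life", "suicide")

# Simplified patterns for illustration; real PII detection needs more care.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


@dataclass
class Reply:
    text: str
    escalated: bool


def anonymize(message: str) -> str:
    """Mask e-mail addresses and phone numbers before storing or modeling."""
    message = EMAIL_RE.sub("[EMAIL]", message)
    return PHONE_RE.sub("[PHONE]", message)


def classify_intent(message: str) -> str:
    """Placeholder for the trained recurrent classifier."""
    return "general_support"


def respond(message: str) -> Reply:
    clean = anonymize(message)
    # Safety gate runs before any model output is produced.
    if any(phrase in clean.lower() for phrase in CRISIS_PHRASES):
        return Reply("I am connecting you with a human professional now.", True)
    intent = classify_intent(clean)
    return Reply(f"[{intent}] Thank you for sharing. Tell me more.", False)


print(respond("I feel stressed, call me on 555-123-4567"))
print(respond("Sometimes I want to end my life"))
```

Running the rule-based gate before the model means a critical message is escalated even if the classifier is wrong or unavailable.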