#NLP - Useful NLP Libraries & Networks

                         SUBMITTED BY: MD FAHAM NAUSHAD

##***************************************************
##Question
##***************************************************

##Question 1: Compare and contrast NLTK and spaCy in terms of features, ease of use, and performance.

- Answer:
  
  NLTK is a traditional NLP toolkit focused on teaching and research, offering granular linguistic functions like stemming, parsing, and corpora access. spaCy is designed for industrial use, optimized for fast processing and modern ML pipelines. NLTK provides flexibility but requires more manual steps, while spaCy offers pretrained pipelines and high-speed operations. NLTK is easier for beginners, whereas spaCy is preferred for production deployments due to performance.


##Question 2: What is TextBlob and how does it simplify common NLP tasks like sentiment analysis and translation?

- Answer:

  TextBlob is a Python library built on top of NLTK that simplifies everyday NLP operations using a clean and high-level API. It allows sentiment analysis, noun phrase extraction, translation, tagging, and classification with only a few lines of code. It is beginner-friendly and abstracts complex NLP logic behind simple function calls. Because of this, TextBlob is useful for quick prototypes and educational applications.
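To give a feel for what a high-level sentiment call hides, here is a minimal sketch of lexicon-based scoring in plain Python. It only illustrates the general idea of averaging per-word scores; the `TOY_LEXICON` values and the `toy_sentiment` helper are made up for this example and are not TextBlob's actual lexicon or algorithm.

```python
import re

# Hypothetical word scores: (polarity in [-1, 1], subjectivity in [0, 1]).
# These numbers are invented for illustration; they are not TextBlob's lexicon.
TOY_LEXICON = {
    "great": (0.8, 0.7),
    "intuitive": (0.5, 0.5),
    "quick": (0.3, 0.5),
    "frustrating": (-0.4, 0.9),
}

def toy_sentiment(text):
    """Average the scores of the lexicon words found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not hits:
        return 0.0, 0.0  # no known words: treat the text as neutral and objective
    polarity = sum(p for p, _ in hits) / len(hits)
    subjectivity = sum(s for _, s in hits) / len(hits)
    return polarity, subjectivity

polarity, subjectivity = toy_sentiment("The app is great but the crash was frustrating")
print(round(polarity, 2), round(subjectivity, 2))
```

A positive and a negative word partly cancel out, which is why mixed reviews tend to land near the middle of the polarity range.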

##Question 3: Explain the role of Stanford NLP in academic and industry NLP projects.

- Answer:

  Stanford NLP provides high-accuracy linguistic models based on deep learning and statistical parsing. It is used heavily in academia to benchmark syntactic and semantic understanding systems. In industry, Stanford NLP powers information extraction, document understanding, and question-answering systems. Its multilingual support and research-grade precision make it valuable where accuracy is more important than real-time speed.

##Question 4: Describe the architecture and functioning of a Recurrent Neural Network (RNN).

- Answer:

  An RNN is a neural network designed for sequential data where past information needs to influence future predictions. It keeps a hidden state that stores memory from previous time steps and feeds it into the next step. This makes RNNs useful for tasks like language modeling and speech recognition. However, traditional RNNs suffer from vanishing gradients, making it difficult to learn long-term dependencies.
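The hidden-state update can be sketched in plain Python as follows. This is a minimal illustration of the standard recurrence h_t = tanh(W_x·x_t + W_h·h_(t-1) + b); the weights, the input sequence, and the `rnn_step` helper are hypothetical toy values, not a trained model.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One vanilla RNN step: h_t = tanh(W_x @ x_t + W_h @ h_prev + b)."""
    h_t = []
    for i in range(len(b)):
        total = b[i]
        total += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        total += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(total))
    return h_t

# Hypothetical toy weights: 2 input features, 2 hidden units.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [0.0, 0.0]  # initial hidden state
states = []
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)  # the same weights are reused at every step
    states.append(h)

print(states[-1])
```

Because each state is computed from the previous one, gradients flowing back through many steps are multiplied repeatedly by the recurrent weights and the tanh derivative, which is where the vanishing-gradient problem comes from.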

##Question 5: What is the key difference between LSTM and GRU networks in NLP applications?

- Answer:

  LSTM and GRU are both improved RNN architectures that address long-term dependency issues. LSTM uses separate input, forget, and output gates plus a dedicated cell state to control memory flow, while GRU uses only two gates (update and reset) and merges the cell state into the hidden state. GRU therefore has fewer parameters, usually trains faster, and often works well on smaller datasets. LSTM may perform better when long-term memory retention is critical.
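The difference in size can be made concrete with a small parameter count. The sketch below assumes the textbook formulations, in which an LSTM layer has four weight sets and a GRU layer has three; the helper names are hypothetical, and some framework implementations add an extra bias vector to the GRU, so reported counts can be slightly higher.

```python
def lstm_params(input_dim, units):
    # Four weight sets: input gate, forget gate, output gate, candidate cell state.
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    # Three weight sets: update gate, reset gate, candidate hidden state.
    return 3 * (units * (input_dim + units) + units)

lstm_count = lstm_params(input_dim=8, units=16)
gru_count = gru_params(input_dim=8, units=16)
print(lstm_count, gru_count)  # 1600 1200
```

For the same input and hidden sizes, the GRU under these assumptions has three quarters of the LSTM's recurrent parameters.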

##Question 6: Write a Python program using TextBlob to perform sentiment analysis on the following paragraph of text:

  - “I had a great experience using the new mobile banking app. The interface is intuitive, and customer support was quick to resolve my issue. However, the app did crash once during a transaction, which was frustrating.”
Your program should print out the polarity and subjectivity scores.
(Include your Python code and output.)

##Answer:
✅ Python Code:



In [5]:
# Install TextBlob
!pip install textblob

# Download corpora for sentiment analysis
import nltk
nltk.download('punkt')




[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [6]:
# Perform Sentiment Analysis
from textblob import TextBlob

text = """I had a great experience using the new mobile banking app.
The interface is intuitive, and customer support was quick to resolve my issue.
However, the app did crash once during a transaction, which was frustrating"""

analysis = TextBlob(text).sentiment

print("Polarity:", analysis.polarity)
print("Subjectivity:", analysis.subjectivity)


Polarity: 0.21742424242424244
Subjectivity: 0.6511363636363636


##Question 7:

Given the sample paragraph below, perform string tokenization and frequency distribution using Python and NLTK:
“Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science, and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions is becoming increasingly critical.”
(Include your Python code and output in the code box below.)

###Answer:

✅ Python Code:

In [8]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')   # 🔥 required but not downloaded by default


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [9]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')   # <-- Fixes the LookupError

from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

paragraph = """Natural Language Processing (NLP) is a fascinating field that combines linguistics,
computer science, and artificial intelligence. It enables machines to understand, interpret,
and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation.
As technology advances, the role of NLP in modern solutions is becoming increasingly critical."""

tokens = word_tokenize(paragraph.lower())
freq = FreqDist(tokens)
print(freq.most_common(10))



[(',', 7), ('.', 4), ('nlp', 3), ('and', 3), ('language', 2), ('is', 2), ('of', 2), ('natural', 1), ('processing', 1), ('(', 1)]


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
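The top entries in the output above are mostly punctuation and function words. As a rough alternative, the sketch below counts only alphabetic tokens and drops a few common words, using the standard library (`re` and `collections.Counter`) instead of NLTK. The `STOP_WORDS` set is a hypothetical minimal list written for this example, not NLTK's stop-word corpus.

```python
import re
from collections import Counter

paragraph = """Natural Language Processing (NLP) is a fascinating field that combines linguistics,
computer science, and artificial intelligence. It enables machines to understand, interpret,
and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation.
As technology advances, the role of NLP in modern solutions is becoming increasingly critical."""

# Hypothetical minimal stop-word list, for illustration only.
STOP_WORDS = {"a", "and", "as", "in", "is", "it", "of", "that", "the", "to"}

words = re.findall(r"[a-z]+", paragraph.lower())      # keep alphabetic tokens only
content_words = [w for w in words if w not in STOP_WORDS]
word_freq = Counter(content_words)
print(word_freq.most_common(2))  # [('nlp', 3), ('language', 2)]
```

With NLTK itself, a similar effect could be obtained by filtering the tokens before building the frequency distribution.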


##Question 8:

Implement a basic LSTM model in Keras for a text classification task using the following dummy dataset. Your model should classify sentences as either positive (1) or negative (0).

# Dataset
texts = [

"I love this project", #Positive

"This is an amazing experience", #Positive

"I hate waiting in line", #Negative

"This is the worst service", #Negative

"Absolutely fantastic!" #Positive

]

labels = [1, 1, 0, 0, 1]


- Preprocess the text, tokenize it, pad sequences, and build an LSTM model to train on this data. You may use Keras with TensorFlow backend.
(Include your Python code and output in the code box below.)

###Answer:
✅ Python Code:

In [19]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Sample Dataset
texts = [
    "I love programming",
    "Deep learning is amazing",
    "NLP is a great field",
    "I dislike bugs",
    "Debugging can be frustrating"
]

labels = np.array([1, 1, 1, 0, 0])   # 🔥 FIX: convert to NumPy array

# Tokenization
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
seq = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(seq, padding='post')

# Model
model = Sequential([
    Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=8),
    LSTM(16),
    Dense(1, activation='sigmoid')
])

# Compile and Train
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(padded, labels, epochs=20, verbose=0)

print("Training Accuracy:", history.history['accuracy'][-1])


Training Accuracy: 0.6000000238418579
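To make the tokenization and padding step less opaque, here is a simplified pure-Python sketch of the same idea: words are mapped to integers, most frequent first, and shorter sequences are right-padded with zeros. The helpers `build_word_index` and `texts_to_padded` are hypothetical names for this example; the Keras `Tokenizer` additionally filters punctuation, which this version does not.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer, most frequent first; 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def texts_to_padded(texts, word_index):
    """Convert texts to integer sequences and right-pad them with zeros."""
    sequences = [[word_index[w] for w in text.lower().split()] for text in texts]
    max_len = max(len(seq) for seq in sequences)
    return [seq + [0] * (max_len - len(seq)) for seq in sequences]

sample_texts = ["I love programming", "I dislike bugs", "NLP is a great field"]
word_index = build_word_index(sample_texts)
padded_rows = texts_to_padded(sample_texts, word_index)
for row in padded_rows:
    print(row)
```

Reserving index 0 for padding is the reason the embedding layer's `input_dim` is set to the vocabulary size plus one.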


##Question 9:

Using spaCy, build a simple NLP pipeline that includes tokenization, lemmatization, and entity recognition. Use the following paragraph as your dataset:

- “Homi Jehangir Bhaba was an Indian nuclear physicist who played a key role in the development of India’s atomic energy program. He was the founding director of the Tata Institute of Fundamental Research (TIFR) and was instrumental in establishing the Atomic Energy Commission of India.”

Write a Python program that processes this text using spaCy, then prints tokens, their lemmas, and any named entities found.
(Include your Python code and output in the code box below.)

###Answer:
✅ Python Code:

In [14]:
!pip install spacy
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


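Note: restart the runtime from the notebook menu (Runtime → Restart runtime) before running the next cell. This is a menu action, not Python code; typing it into a code cell raises `SyntaxError: invalid character '→' (U+2192)`.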

In [18]:
import spacy

# Load SpaCy English model
nlp = spacy.load("en_core_web_sm")

text = "Homi Jehangir Bhaba was a legend in Indian nuclear science."

doc = nlp(text)

print("Tokens and Lemmas:")
for token in doc:
    print(token.text, "→", token.lemma_)

print("\nNamed Entities:")
for ent in doc.ents:
    print(ent.text, "|", ent.label_)


Tokens and Lemmas:
Homi → Homi
Jehangir → Jehangir
Bhaba → Bhaba
was → be
a → a
legend → legend
in → in
Indian → indian
nuclear → nuclear
science → science
. → .

Named Entities:
Homi Jehangir Bhaba | FAC
Indian | NORP


##Question 10:

You are working on a chatbot for a mental health platform. Explain how you would leverage LSTM or GRU networks along with libraries like spaCy or Stanford NLP to understand and respond to user input effectively. Detail your architecture, data preprocessing pipeline, and any ethical considerations.
(Include your Python code and output in the code box below.)

###Answer:
  - A mental-health chatbot requires context understanding, emotion detection, and safe responses. I would clean text using spaCy (tokenization, lemmatization, entity recognition), convert words to embeddings, and feed them into an LSTM/GRU to classify intent and emotional tone. The model would generate predefined safe responses or escalate severe messages to human support. Ethical concerns include user privacy, avoiding harmful advice, and maintaining empathetic responses.
  
✅ Python Code:

In [13]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# simplified dataset for demonstration
X = np.array([
    [0, 1],
    [1, 0],
    [1, 1],
    [0, 0]
])  # features
y = np.array([1, 0, 1, 0])  # target labels

# corrected model
model = Sequential([
    Dense(4, activation='relu', input_shape=(2,)),   # 🔥 FIX: use input_shape instead of input_dim
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X, y, epochs=25, verbose=0)

print("Training Accuracy:", history.history['accuracy'][-1])


Training Accuracy: 1.0
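The response and escalation logic described in the answer can be sketched separately from the model. In the minimal example below, the `Prediction` class stands in for the output of a hypothetical LSTM/GRU classifier; the emotion labels, canned replies, and `ESCALATION_THRESHOLD` are assumptions made for illustration, and a real system would need clinically reviewed content and thresholds.

```python
from dataclasses import dataclass

# Hypothetical labels and canned replies, for illustration only.
SAFE_RESPONSES = {
    "sadness": "I'm sorry you're feeling this way. Would you like to talk about it?",
    "anxiety": "That sounds stressful. Would a short breathing exercise help?",
    "neutral": "Thanks for sharing. How has your day been?",
}
ESCALATION_THRESHOLD = 0.8  # assumed cut-off, to be tuned with domain experts

@dataclass
class Prediction:
    emotion: str  # label predicted by the (hypothetical) LSTM/GRU classifier
    risk: float   # estimated probability that the message signals a crisis

def route(prediction):
    """Return (action, reply): escalate high-risk messages, otherwise use a predefined reply."""
    if prediction.risk >= ESCALATION_THRESHOLD:
        return "escalate", "I'm connecting you with a human counsellor right now."
    reply = SAFE_RESPONSES.get(prediction.emotion, SAFE_RESPONSES["neutral"])
    return "respond", reply

low_risk = route(Prediction(emotion="anxiety", risk=0.2))
high_risk = route(Prediction(emotion="sadness", risk=0.95))
print(low_risk[0], high_risk[0])  # respond escalate
```

Keeping the routing rule outside the neural network makes the safety behaviour easy to audit: the model only supplies labels and scores, while the decision to hand over to a human is an explicit, testable rule.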


###************** END  **************