#NLP - Introduction & Text Processing
SUBMITTED BY: MD FAHAM NAUSHAD

#***************************************************
##Question

#***************************************************

##Question 1: What is Computational Linguistics and how does it relate to NLP?

- Answer:

  Computational Linguistics is the scientific study of language using computational models. It asks how linguistic structure (words, syntax, meaning) can be represented and processed by computers. Natural Language Processing (NLP) applies these models and methods to real-world tasks such as translation, chatbots, and sentiment analysis. In short, computational linguistics is the more theory-oriented field, and NLP is its applied, engineering-oriented counterpart used to build systems in which humans and machines interact through language.

##Question 2: Briefly describe the historical evolution of Natural Language Processing.

- Answer:

  NLP started in the 1950s with rule-based machine translation systems built from hand-written dictionaries and grammar rules. In the 1980s and 1990s, statistical models became popular as large collections of digital text became available. After 2010, deep learning (for example, word embeddings and recurrent neural networks) greatly improved language understanding. Today, NLP is dominated by transformer-based models (the transformer architecture was introduced in 2017) such as BERT and GPT, which approach or exceed human baselines on many benchmark tasks.

##Question 3: List and explain three major use cases of NLP in today’s tech industry.

- Answer:

  
- 1️⃣ Chatbots & Virtual Assistants:

    - NLP enables chatbots and voice assistants like Siri, Alexa, and Google Assistant to understand natural human language. It analyses the user’s message, identifies intent (what the user wants), and generates a suitable response. Modern systems also learn user preferences over time to give more personalized replies. Many companies now use NLP chatbots for 24×7 customer support, reducing response time and operating cost.

- 2️⃣ Sentiment Analysis:

    - Sentiment analysis helps businesses understand people’s emotions from written text such as product reviews, social media comments, and surveys. NLP models classify the tone of a message as positive, negative, or neutral, allowing companies to measure satisfaction and detect problems early. It is heavily used in brand monitoring, market research, and feedback analysis to support decision-making and marketing strategies.

- 3️⃣ Machine Translation:

    - Machine translation automatically converts text from one language to another (e.g., English → Hindi). Early systems relied on manually created dictionaries and grammar rules, and later systems on statistical phrase tables; modern neural systems (like Google Translate) model the context of the whole sentence rather than mapping words one by one. As a result, they handle idioms, culture-specific expressions, and differences in sentence structure much better, making global communication easier for business, education, and tourism.


| Use Case            | What NLP Does             | Real-World Example                 |
| ------------------- | ------------------------- | ---------------------------------- |
| Chatbots            | Understand intent & reply | Amazon Alexa, Bank support bots    |
| Sentiment Analysis  | Detect human emotions     | Review analysis on Flipkart/Amazon |
| Machine Translation | Convert languages         | Google Translate, subtitles        |
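To make the chatbot's "identify intent, then reply" step concrete, here is a deliberately simple keyword-overlap intent matcher. It is only a sketch: the intent names, keyword sets, and canned replies are hypothetical, and production assistants use trained intent classifiers rather than hand-written keyword lists.

```python
import re
from typing import Optional

# Hypothetical intents for a banking support bot: intent -> (keywords, canned reply)
INTENTS = {
    "check_balance": ({"balance", "account", "money"}, "Your balance is shown on the Accounts tab."),
    "block_card": ({"block", "lost", "stolen", "card"}, "I can block your card. Please confirm the last 4 digits."),
    "greeting": ({"hi", "hello", "hey"}, "Hello! How can I help you today?"),
}
FALLBACK = "Sorry, I did not understand. Could you rephrase?"


def detect_intent(message: str) -> Optional[str]:
    """Return the intent whose keyword set overlaps most with the message, or None."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    best, best_overlap = None, 0
    for intent, (keywords, _reply) in INTENTS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best, best_overlap = intent, overlap
    return best


def reply(message: str) -> str:
    """Map the detected intent to its canned reply, or fall back if nothing matched."""
    intent = detect_intent(message)
    return INTENTS[intent][1] if intent else FALLBACK


print(reply("I lost my card, please block it"))
print(reply("What is the weather?"))
```

The first message shares three keywords with `block_card` and gets that reply; the second matches no intent and receives the fallback.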


##Question 4: What is text normalization and why is it essential in text processing tasks?

- Answer:

  Text normalization converts raw text into a clean, consistent form before it is passed to a model. Typical steps include converting to lowercase, removing punctuation, expanding contractions ("don't" → "do not"), and correcting spelling. This reduces superficial variation, so the algorithm focuses on meaning rather than formatting. Without normalization, "Apple", "apple" and "APPLE!" are treated as three different tokens, which fragments word counts, enlarges the vocabulary, and typically lowers model performance.
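A minimal sketch of these normalization steps using only Python's standard library. The `CONTRACTIONS` table is a small hypothetical sample (real pipelines use a much larger list), and spelling correction is left out.

```python
import re
import string

# Hypothetical, deliberately tiny contraction table for illustration
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is", "i'm": "i am"}


def normalize(text: str) -> str:
    text = text.lower()                       # 1. case folding
    text = text.replace("\u2019", "'")        # treat curly apostrophes as straight ones
    for short, full in CONTRACTIONS.items():  # 2. expand contractions before apostrophes are stripped
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = text.translate(str.maketrans("", "", string.punctuation))  # 3. drop punctuation
    return re.sub(r"\s+", " ", text).strip()  # 4. collapse repeated whitespace


print(normalize("I'm SURE it's Apple, not APPLE!!  Don't   worry."))
# → i am sure it is apple not apple do not worry
```

Contractions are expanded before punctuation removal on purpose: once the apostrophe is gone, "don't" becomes "dont" and no longer matches the table.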

##Question 5: Compare and contrast stemming and lemmatization with suitable examples.

- Answer:

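  Stemming chops affixes off words using heuristic rules to reduce them to a common stem (e.g., playing → play, studies → studi with the Porter stemmer). It is fast and needs no dictionary, but the stem is often not a real word. Lemmatization maps each word to its dictionary form (lemma) using a vocabulary and morphological analysis, usually guided by the word's part of speech (e.g., studies → study; better → good when tagged as an adjective). It is slower but produces valid words and is more accurate. Both reduce inflectional variation so that related word forms are counted together.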
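The contrast can be illustrated with a toy, standard-library-only sketch. `crude_stem` applies a few suffix rules loosely modeled on step 1 of the Porter stemmer (it is not the full algorithm), and `LEMMAS` is a hypothetical hand-made lookup table standing in for a real lexicon such as WordNet. In practice you would use NLTK's `PorterStemmer` and `WordNetLemmatizer` instead.

```python
# Ordered (suffix, replacement) rules, loosely modeled on Porter step 1 (not the full algorithm)
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("ing", ""), ("ed", ""), ("s", "")]


def crude_stem(word: str) -> str:
    """Rewrite the first matching suffix, keeping at least 3 characters of stem."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + replacement
    return word


# Hypothetical mini-lexicon keyed by (word, part of speech)
LEMMAS = {
    ("studies", "NOUN"): "study",
    ("studies", "VERB"): "study",
    ("playing", "VERB"): "play",
    ("better", "ADJ"): "good",
}


def lemmatize(word: str, pos: str) -> str:
    """Look up the dictionary form; fall back to the word itself if it is unknown."""
    return LEMMAS.get((word, pos), word)


for word, pos in [("playing", "VERB"), ("studies", "NOUN"), ("better", "ADJ")]:
    print(f"{word:8} stem={crude_stem(word):7} lemma={lemmatize(word, pos)}")
```

The stemmer turns "studies" into the non-word "studi" and leaves "better" untouched, while the lookup returns "good" for "better" only when its part of speech is ADJ.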

##Question 6: Write a Python program that uses regular expressions (regex) to extract all email addresses from the following block of text:
- “Hello team, please contact us at support@xyz.com for technical issues, or reach out to our HR at hr@xyz.com. You can also connect with John at john.doe@xyz.org and jenny via jenny_clarke126@mail.co.us. For partnership inquiries, email partners@xyz.biz.”


###Answer:

✅Python Code:

```python
import re

text = """Hello team, please contact us at support@xyz.com for technical issues, or reach out to our HR at hr@xyz.com.
You can also connect with John at john.doe@xyz.org and jenny via jenny_clarke126@mail.co.us.
For partnership inquiries, email partners@xyz.biz."""

# local part (letters, digits, _, ., -) + "@" + domain labels + "." + top-level domain
emails = re.findall(r'\b[\w\.-]+@[\w\.-]+\.\w+\b', text)
print(emails)
```

Output:

```
['support@xyz.com', 'hr@xyz.com', 'john.doe@xyz.org', 'jenny_clarke126@mail.co.us', 'partners@xyz.biz']
```


```python
import nltk

# Tokenizer models required by nltk.word_tokenize (used in Questions 7 and 9)
nltk.download('punkt')
nltk.download('punkt_tab')
```

Output:

```
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
True
```

##Question 7: Given the sample paragraph below, perform string tokenization and frequency distribution using Python and NLTK:

  - “Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science, and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions is becoming increasingly critical.”


###Answer:

✅Python Code:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

paragraph = """Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science,
and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP
include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions
is becoming increasingly critical."""

tokens = word_tokenize(paragraph.lower())
fd = FreqDist(tokens)

print("Top 10 most common words:")
print(fd.most_common(10))
```

Output:

```
Top 10 most common words:
[(',', 7), ('.', 4), ('nlp', 3), ('and', 3), ('language', 2), ('is', 2), ('of', 2), ('natural', 1), ('processing', 1), ('(', 1)]
```
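In the output above, punctuation tokens (',' and '.') and function words such as 'and' take the top slots. A common refinement is to drop non-alphabetic tokens and stopwords before counting. The sketch below does this with the standard library only; the regex tokenizer and the small `STOPWORDS` set are assumptions standing in for `word_tokenize` and NLTK's stopword corpus.

```python
import re
from collections import Counter

paragraph = """Natural Language Processing (NLP) is a fascinating field that combines linguistics, computer science,
and artificial intelligence. It enables machines to understand, interpret, and generate human language. Applications of NLP
include chatbots, sentiment analysis, and machine translation. As technology advances, the role of NLP in modern solutions
is becoming increasingly critical."""

# Small, hand-picked stopword list (an assumption; NLTK ships a much fuller list)
STOPWORDS = {"a", "an", "and", "as", "in", "is", "it", "of", "that", "the", "to"}

tokens = re.findall(r"[a-z]+", paragraph.lower())      # alphabetic tokens only, punctuation dropped
content = [t for t in tokens if t not in STOPWORDS]    # remove function words
counts = Counter(content)

print(counts.most_common(3))
# → [('nlp', 3), ('language', 2), ('natural', 1)]
```

With punctuation and stopwords removed, the ranking surfaces the topic words "nlp" and "language" instead of commas and conjunctions.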


##Question 8: Create a custom annotator using spaCy or NLTK that identifies and labels proper nouns in a given text.


###Answer:

✅Python Code:

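```python
import spacy
from spacy.language import Language
from spacy.tokens import Token
Token.set_extension("noun_label", default=None, force=True)  # custom token attribute

@Language.component("proper_noun_annotator")
def proper_noun_annotator(doc):
    for token in doc:
        if token.pos_ == "PROPN":  # POS tag assigned earlier in the pipeline
            token._.noun_label = "Proper Noun"
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("proper_noun_annotator", last=True)  # run after the built-in components
doc = nlp("Steve Jobs founded Apple in California and later introduced the iPhone.")
for token in doc:
    if token._.noun_label:
        print(token.text, "→", token._.noun_label)
```

Output:

```
Steve → Proper Noun
Jobs → Proper Noun
Apple → Proper Noun
California → Proper Noun
iPhone → Proper Noun
```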


##Question 9: Using Gensim, demonstrate how to train a simple Word2Vec model on the following dataset:

    dataset = [
    "Natural language processing enables computers to understand human language",
    "Word embeddings are a type of word representation that allows words with similar meaning to have similar representation",
    "Word2Vec is a popular word embedding technique used in many NLP applications",
    "Text preprocessing is a critical step before training word embeddings",
    "Tokenization and normalization help clean raw text for modeling"
    ]

Write code that tokenizes the dataset, preprocesses it, and trains a Word2Vec model using Gensim.

###Answer:
✅Python Code:

```python
!pip install gensim
```

Output:

```
Collecting gensim
  Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl.metadata (8.4 kB)
Downloading gensim-4.4.0-cp312-cp312-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (27.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.9/27.9 MB 62.1 MB/s eta 0:00:00
Installing collected packages: gensim
Successfully installed gensim-4.4.0
```


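```python
from gensim.models import Word2Vec
import nltk

dataset = [
 "Natural language processing enables computers to understand human language",
 "Word embeddings are a type of word representation that allows words with similar meaning to have similar representation",
 "Word2Vec is a popular word embedding technique used in many NLP applications",
 "Text preprocessing is a critical step before training word embeddings",
 "Tokenization and normalization help clean raw text for modeling"
]

# Preprocessing: lowercase each sentence, then split it into word tokens
sentences = [nltk.word_tokenize(sent.lower()) for sent in dataset]

# vector_size: embedding dimension; window: max distance between target and context word;
# min_count=1 keeps every word (the corpus is tiny); workers: number of training threads
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, workers=4)

word = "language"
print("Closest words to 'language':")
print(model.wv.most_similar(word, topn=5))
```

Output:

```
Closest words to 'language':
[('used', 0.3151129484176636), ('for', 0.2373713254928589), ('meaning', 0.2184678167104721), ('of', 0.20302735269069672), ('to', 0.1844237893819809)]
```

With only five short sentences, each word is seen just a few times, so these neighbors are essentially noise rather than meaningful semantic similarity. The scores also vary between runs, because multi-threaded training (`workers=4`) is not fully deterministic. Useful embeddings need a much larger corpus or pretrained vectors.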
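`most_similar` ranks words by the cosine similarity between their vectors. The standard-library sketch below computes that score for three made-up 3-dimensional vectors; the values are hypothetical and chosen only to illustrate the formula (the trained model above uses 50 dimensions).

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, not taken from the trained model
vectors = {
    "language": [0.9, 0.1, 0.0],
    "text": [0.8, 0.3, 0.1],
    "money": [0.0, 0.2, 0.9],
}

query = "language"
ranked = sorted(
    ((w, cosine_similarity(vectors[query], v)) for w, v in vectors.items() if w != query),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

"text" points in almost the same direction as "language" (similarity about 0.96), while "money" is nearly orthogonal to it (about 0.02), so "text" is ranked first.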


##Question 10: Imagine you are a data scientist at a fintech startup. You’ve been tasked with analyzing customer feedback. Outline the steps you would take to clean, process, and extract useful insights using NLP techniques from thousands of customer reviews.

  - (Include your Python code and output in the code box below.)
###Answer:
- Explanation:
  
  To analyze thousands of customer reviews, I would first clean the text using normalization, tokenization, stopword removal, and lemmatization. Then I would convert text into numeric form using TF-IDF or Word2Vec embeddings. Next, I would apply models like sentiment analysis or topic modeling to extract insights such as common complaints and satisfaction scores. Finally, I would visualize results in dashboards to support business decision-making.

✅Python Code:

```python
!pip install textblob

from textblob import TextBlob

reviews = [
    "The loan approval process was smooth and fast!",
    "Customer support never responds. Very disappointing.",
    "I love the app. It is easy to use and secure.",
    "The transaction failed twice and I lost money."
]

for r in reviews:
    score = TextBlob(r).sentiment.polarity
    sentiment = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
    print(f"{r} → {sentiment} (score = {score})")
```

Output:

```
The loan approval process was smooth and fast! → Positive (score = 0.325)
Customer support never responds. Very disappointing. → Negative (score = -0.78)
I love the app. It is easy to use and secure. → Positive (score = 0.4444444444444445)
The transaction failed twice and I lost money. → Negative (score = -0.5)
```
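The TF-IDF step in the outline can be sketched with the standard library alone. The four reviews are the same examples used above; the regex tokenizer and the `tf * log(N / df)` weighting (one of several common TF-IDF variants) are assumptions for illustration. In a real project, scikit-learn's `TfidfVectorizer` would do this at scale.

```python
import math
import re
from collections import Counter

# The four example reviews from the TextBlob cell above
reviews = [
    "The loan approval process was smooth and fast!",
    "Customer support never responds. Very disappointing.",
    "I love the app. It is easy to use and secure.",
    "The transaction failed twice and I lost money.",
]


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a simple stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


docs = [tokenize(r) for r in reviews]
n_docs = len(docs)
# Document frequency: number of reviews that contain each term at least once
df = Counter(term for doc in docs for term in set(doc))


def tfidf(doc):
    """Weight = (term count / doc length) * log(N / df)."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}


weights = tfidf(docs[0])
for term, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term:10} {w:.3f}")
```

Words like "the" and "and", which occur in three of the four reviews, receive a much lower weight than review-specific words such as "loan" or "smooth". This is how TF-IDF highlights the terms that characterize each piece of feedback without a hand-made stopword list.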


###************** END  **************