##### NLTK (Natural Language Toolkit)

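- How do you tokenize a sentence into words using NLTK?

pip install nltk

Download the tokenizer models (only once), then import the function and tokenize a sentence:

import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')      # Tokenizer models (older NLTK releases)
nltk.download('punkt_tab')  # Tokenizer tables (recent NLTK releases)

sentence = "Hello! How are you doing today?"
words = word_tokenize(sentence)

print(words)  # ['Hello', '!', 'How', 'are', 'you', 'doing', 'today', '?']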
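To see roughly what word_tokenize produces, here is a standard-library approximation that splits text into runs of word characters and single punctuation marks. It is only a sketch under that simplifying assumption, not NLTK's algorithm: word_tokenize applies Penn Treebank rules that, for example, split "don't" into "do" and "n't", which this regex does not.

```python
import re

def simple_tokenize(text):
    """Rough stand-in for word_tokenize: words plus single punctuation marks."""
    # \w+ matches a run of letters/digits/underscores; [^\w\s] matches one punctuation char
    return re.findall(r"\w+|[^\w\s]", text)

sentence = "Hello! How are you doing today?"
print(simple_tokenize(sentence))
# ['Hello', '!', 'How', 'are', 'you', 'doing', 'today', '?']

# Contractions are where the approximation diverges from word_tokenize
print(simple_tokenize("don't"))
# ['don', "'", 't']
```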


- How do you remove stopwords from a text using NLTK?

pip install nltk

Download the stopwords list and the tokenizer models (only once):

import nltk

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # Tokenizer tables (recent NLTK releases)

Remove stopwords from the text:

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

sentence = "This is an example sentence to demonstrate stopword removal."
words = word_tokenize(sentence)  # Tokenize the sentence

stop_words = set(stopwords.words('english'))  # A set gives fast membership checks

# Compare in lowercase, because the NLTK stopword list is all lowercase
filtered_words = [word for word in words if word.lower() not in stop_words]

print(filtered_words)  # Punctuation such as '.' is kept, since it is not a stopword
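The NLTK example keeps punctuation tokens such as '.'. A common extra step is to drop punctuation as well. The sketch below shows that filtering logic with only the standard library; SAMPLE_STOPWORDS is a small hypothetical stand-in for stopwords.words('english'), and the token list is written out by hand rather than produced by word_tokenize.

```python
import string

# Hypothetical, tiny stand-in for NLTK's English stopword list (which is all lowercase)
SAMPLE_STOPWORDS = {"this", "is", "an", "to", "the", "a"}

def remove_stopwords_and_punct(tokens, stop_words):
    """Keep tokens that are neither stopwords (case-insensitive) nor pure punctuation."""
    return [
        tok for tok in tokens
        if tok.lower() not in stop_words
        and not all(ch in string.punctuation for ch in tok)
    ]

tokens = ["This", "is", "an", "example", "sentence", "to",
          "demonstrate", "stopword", "removal", "."]
print(remove_stopwords_and_punct(tokens, SAMPLE_STOPWORDS))
# ['example', 'sentence', 'demonstrate', 'stopword', 'removal']
```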

- How do you perform stemming in NLTK? Provide an example.

Stemming reduces words to a root form (the "stem") by stripping suffixes with rule-based heuristics. NLTK provides several stemmers; this example uses PorterStemmer.

pip install nltk

from nltk.stem import PorterStemmer

ps = PorterStemmer()

words = ["running", "flies", "easily", "happiness", "studies"]

# Apply the stemmer to each word
stemmed_words = [ps.stem(word) for word in words]

print(stemmed_words)
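PorterStemmer applies several ordered steps of suffix rules. To illustrate the rule-based idea only, here is a toy stemmer with a handful of hypothetical suffix rules. It is NOT the Porter algorithm and will get many words wrong; note that "easily" comes out as "easi" here, whereas PorterStemmer has its own rules for such cases.

```python
# Hypothetical toy rules: (suffix, replacement), checked in order; the first match wins.
RULES = [
    ("ness", ""),   # happiness -> happi
    ("ies", "i"),   # flies -> fli, studies -> studi
    ("ing", ""),    # running -> runn (undoubled below)
    ("ly", ""),     # easily -> easi
    ("s", ""),      # cats -> cat
]
VOWELS = set("aeiou")

def toy_stem(word):
    for suffix, replacement in RULES:
        # Only strip when at least two characters of stem remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[: len(word) - len(suffix)] + replacement
            # Undouble a final consonant left behind by "-ing" (runn -> run)
            if suffix == "ing" and len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in VOWELS:
                stem = stem[:-1]
            return stem
    return word

words = ["running", "flies", "easily", "happiness", "studies"]
print([toy_stem(w) for w in words])
# ['run', 'fli', 'easi', 'happi', 'studi']
```

Outputs like "fli" and "happi" show why stems are not always valid words.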


- What is the difference between stemming and lemmatization in NLTK?

Stemming and lemmatization both reduce words to a base form, but they differ in approach and results:

1. Stemming:
   - Approach: strips prefixes or suffixes using fixed rules or patterns, without consulting a dictionary.
   - Result: the output is a "stem", which is not always a valid word in the language.
   - Example (PorterStemmer):
     - "running" → "run"
     - "flies" → "fli"
     - "better" → "better"
   - Tools in NLTK: PorterStemmer, SnowballStemmer
2. Lemmatization:
   - Approach: looks words up in a dictionary (WordNet) to find their base form, called a "lemma". The result depends on the part of speech, but NLTK's WordNetLemmatizer does not infer it from context: you pass it with the pos argument, which defaults to "n" (noun).
   - Result: the output is a valid dictionary word.
   - Example (WordNetLemmatizer):
     - "running" → "run" with pos="v" (with the default noun POS it stays "running")
     - "flies" → "fly" (plural noun → singular noun)
     - "better" → "good" with pos="a" (comparative → base form)
   - Tools in NLTK: WordNetLemmatizer
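In practice, the lemma NLTK's WordNetLemmatizer returns depends on the part-of-speech argument you pass (pos, default "n"), not on context it infers itself. The sketch below mimics that behavior with a tiny hand-written lemma table keyed by (word, POS); the table is hypothetical and only stands in for WordNet.

```python
# Hypothetical mini lexicon standing in for WordNet: (word, pos) -> lemma
LEMMAS = {
    ("running", "v"): "run",
    ("flies", "n"): "fly",
    ("flies", "v"): "fly",
    ("better", "a"): "good",
}

def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos), or the word unchanged if it is not in the table."""
    return LEMMAS.get((word.lower(), pos), word)

print(toy_lemmatize("running"))           # running (default POS is noun)
print(toy_lemmatize("running", pos="v"))  # run
print(toy_lemmatize("flies"))             # fly
print(toy_lemmatize("better", pos="a"))   # good
```

With NLTK itself, the equivalent call is WordNetLemmatizer().lemmatize("running", pos="v") after downloading the wordnet resource.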


- How do you perform part-of-speech (POS) tagging using NLTK?

Part-of-speech (POS) tagging assigns a grammatical category (noun, verb, adjective, etc.) to each word in a sentence. NLTK provides this through the built-in pos_tag function, which returns (word, tag) pairs using Penn Treebank tags such as NN (noun) and VBZ (verb, 3rd person singular present).

Steps to perform POS tagging:

Install NLTK (if not installed):

pip install nltk

Import the necessary components and perform POS tagging:

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download the required NLTK resources (only once)
nltk.download('punkt')                           # Tokenizer
nltk.download('punkt_tab')                       # Tokenizer tables (recent NLTK releases)
nltk.download('averaged_perceptron_tagger')      # POS tagger
nltk.download('averaged_perceptron_tagger_eng')  # POS tagger (recent NLTK releases)

sentence = "NLTK is a powerful tool for text processing."
words = word_tokenize(sentence)  # Tokenize the sentence
pos_tags = pos_tag(words)        # Perform POS tagging

print(pos_tags)  # A list of (word, tag) tuples
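Penn Treebank tags are terse, so a small lookup helper can make pos_tag output easier to read and filter. The dictionary below covers only a few common tags (a subset written for this example), and the tagged input is a hand-written list in the same (word, tag) shape that pos_tag returns, not captured output.

```python
# Descriptions for a few common Penn Treebank tags (subset, for illustration)
PENN_TAGS = {
    "DT": "determiner",
    "IN": "preposition/subordinating conjunction",
    "JJ": "adjective",
    "NN": "noun, singular",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "VBZ": "verb, 3rd person singular present",
    ".": "sentence-final punctuation",
}

def describe(tagged):
    """Turn (word, tag) pairs into (word, tag, description) triples."""
    return [(word, tag, PENN_TAGS.get(tag, "other")) for word, tag in tagged]

def nouns(tagged):
    """Keep words whose tag starts with NN (NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged if tag.startswith("NN")]

# Hand-written example input in the same shape pos_tag returns
tagged = [("NLTK", "NNP"), ("is", "VBZ"), ("a", "DT"), ("tool", "NN"), (".", ".")]
for row in describe(tagged):
    print(row)
print(nouns(tagged))  # ['NLTK', 'tool']
```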
    
    

##### spaCy

- How do you load an English language model in spaCy?

Install spaCy (if not installed):

pip install spacy

Download the English language model:

python -m spacy download en_core_web_sm

en_core_web_sm is a small model (fast, but less accurate). Other options are en_core_web_md (medium) and en_core_web_lg (large).

Load the model in your code:

import spacy

# Load the English language model (raises OSError if it has not been downloaded)
nlp = spacy.load("en_core_web_sm")

# Process a text
doc = nlp("spaCy is an amazing library for NLP!")

# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token.text, token.pos_)

- How do you extract named entities (NER) from a text using spaCy?

To extract named entities, process the text with a spaCy pipeline and iterate over the ents attribute of the resulting Doc. Each entity is a Span that carries the matched text and a label_ such as PERSON or GPE.

Install spaCy (if not installed):

pip install spacy

Download the English language model (if not downloaded yet):

python -m spacy download en_core_web_sm

Load the model and extract the entities:

import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Process a text
doc = nlp("Barack Obama was born in Hawaii and became the President of the United States.")

# Print each named entity and its label
for ent in doc.ents:
    print(ent.text, ent.label_)


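- How do you find the dependency relations between words using spaCy?

Each token in a parsed Doc exposes its dependency label (token.dep_) and its syntactic head (token.head). You can print them directly or visualize the parse with displaCy:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

doc = nlp("spaCy is an amazing library for natural language processing.")

# Print each token, its dependency label, and the word it depends on
for token in doc:
    print(token.text, token.dep_, token.head.text)

# Visualize the dependency relations (renders inline in Jupyter;
# in a plain script, displacy.serve(doc, style="dep") starts a local web server instead)
displacy.render(doc, style="dep")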

- How do you perform sentence segmentation in spaCy?

A processed Doc exposes its sentences through doc.sents:

import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Process the text
doc = nlp("Hello! How are you doing today? I hope you're having a great time.")

# Segment into sentences
for sent in doc.sents:
    print(sent.text)
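For comparison, a naive rule-based splitter that breaks after ".", "!" or "?" followed by whitespace handles this example text, but it wrongly splits after abbreviations such as "Dr.", a case that model-based segmentation like spaCy's is meant to handle. A standard-library sketch of that naive approach:

```python
import re

def naive_sentences(text):
    # Split at whitespace that directly follows a sentence-ending punctuation mark
    return re.split(r"(?<=[.!?])\s+", text.strip())

text = "Hello! How are you doing today? I hope you're having a great time."
for s in naive_sentences(text):
    print(s)

print(naive_sentences("Dr. Smith arrived."))  # ['Dr.', 'Smith arrived.'] (wrong split)
```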
- How do you check if a word is a stopword in spaCy?

Each token has a boolean is_stop attribute:

import spacy

# Load the English language model
nlp = spacy.load("en_core_web_sm")

# Process a text
doc = nlp("This is a sample sentence for checking stopwords.")

# Check if each word is a stopword
for token in doc:
    print(token.text, "is stopword:", token.is_stop)


##### TextBlob

- How do you compute the sentiment polarity of a text using TextBlob?

pip install textblob

blob.sentiment returns a named tuple (polarity, subjectivity): polarity ranges from -1.0 (negative) to 1.0 (positive), and subjectivity from 0.0 (objective) to 1.0 (subjective).

from textblob import TextBlob

# Sample text
text = "I love programming, but it can sometimes be frustrating."

# Create a TextBlob object
blob = TextBlob(text)

# Get the sentiment polarity and subjectivity
polarity, subjectivity = blob.sentiment

print(f"Polarity: {polarity}")
print(f"Subjectivity: {subjectivity}")
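TextBlob's default analyzer is lexicon-based: words carry prior polarity scores that are combined into one value for the text. The sketch below shows only the core averaging idea with a hypothetical three-word lexicon; TextBlob's real lexicon and its handling of intensifiers and negation are more involved, so its scores will differ.

```python
# Hypothetical polarity scores in [-1.0, 1.0]; not TextBlob's actual lexicon values
LEXICON = {"love": 0.5, "great": 0.8, "frustrating": -0.4}

def average_polarity(text):
    """Average the polarity of known words; 0.0 if no word is in the lexicon."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

text = "I love programming, but it can sometimes be frustrating."
print(round(average_polarity(text), 2))  # 0.05: one positive and one negative word nearly cancel
```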


- How do you perform spelling correction using TextBlob?

pip install textblob

The correct() method returns a new TextBlob with its best guess at the correct spelling of each word:

from textblob import TextBlob

# Sample text with spelling errors
text = "I hav a beautifull gardn and I lovee it."

# Create a TextBlob object
blob = TextBlob(text)

# Correct the spelling
corrected_text = blob.correct()

print("Original Text:", text)
print("Corrected Text:", corrected_text)
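TextBlob documents its spelling correction as based on Peter Norvig's "How to Write a Spelling Corrector". A condensed version of that idea: generate every string one edit away from a word and pick the most frequent known candidate. The word counts here are a tiny hypothetical corpus and the input is lowercased for simplicity, so this only illustrates the approach (Norvig's full version also considers words two edits away).

```python
from collections import Counter

# Hypothetical word frequencies; a real corrector counts words in a large corpus
WORD_COUNTS = Counter({"i": 50, "have": 30, "a": 60, "beautiful": 5,
                       "garden": 4, "and": 40, "love": 10, "it": 35})
LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the word if known, else the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

text = "i hav a beautifull gardn and i lovee it"
print(" ".join(correct(w) for w in text.split()))
# i have a beautiful garden and i love it
```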

- How do you get noun phrases from a sentence using TextBlob?

pip install textblob

Noun phrase extraction needs the corpora downloaded with python -m textblob.download_corpora. The noun_phrases property returns a WordList of the detected phrases:

from textblob import TextBlob

# Sample sentence
sentence = "The quick brown fox jumped over the lazy dog."

# Create a TextBlob object
blob = TextBlob(sentence)

# Extract noun phrases
noun_phrases = blob.noun_phrases

print("Noun Phrases:", noun_phrases)



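- How do you translate a text from English to another language using TextBlob?

TextBlob's translate() method relied on an unofficial Google Translate endpoint; it was deprecated and is not available in recent TextBlob releases. The googletrans library can be used as a workaround:

pip install googletrans==3.1.0a0

from googletrans import Translator

# Sample sentence
sentence = "The quick brown fox jumped over the lazy dog."

# Create a Translator object
translator = Translator()

# Translate from English ("en") to French ("fr")
translated = translator.translate(sentence, src="en", dest="fr")

print("Translated Text:", translated.text)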

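- How do you detect the language of a given text using TextBlob?

TextBlob's detect_language() method relied on an unofficial Google Translate endpoint and is not available in recent TextBlob releases, so this example uses the googletrans library as a workaround:

!pip install textblob googletrans==3.1.0a0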

from textblob import TextBlob
from googletrans import Translator

# Sample text
text = "Rani Chennamma was the queen of Kittur, a princely state in present-day Karnataka, India. She is remembered as one of the earliest Indian rulers to fight against British colonial rule. Born in 1778, she was well-trained in horse riding, sword fighting, and archery. After her husband's death, she ascended the throne and fiercely opposed the Doctrine of Lapse policy imposed by the British, which prevented her adopted son from inheriting the kingdom. In 1824, Rani Chennamma led an armed rebellion against the British East India Company, refusing to surrender her land. Despite initial victories, she was ultimately captured and imprisoned in Bailhongal Fort, where she died in 1829. Her resistance is seen as a precursor to India's independence movement, and she remains a symbol of courage and patriotism."

# Create a TextBlob object
blob = TextBlob(text)

# Detect the language using googletrans as a workaround
translator = Translator()  # Create a Translator object
detected = translator.detect(text)  # Detect the language

print("Detected Language:", detected.lang)  # Print the detected language code


#### Question:

Flipkart has collected customer reviews for a smartphone. Your task is to train a sentiment analysis model using TextBlob's Naive Bayes Classifier to predict whether a given review is positive or negative.

You have a dataset of 50 reviews, out of which 40 reviews are for training and 10 reviews are for testing. Train your model and evaluate its performance on the test data.



##### Dataset (Train & Test)

The data are defined in the code cells below as Python lists of (review, label) tuples, labeled "pos" or "neg":

- Training data: 40 reviews (20 positive, 20 negative)
- Testing data: 10 reviews (5 positive, 5 negative)

##### Your Task

- Train a Naive Bayes classifier using textblob.classifiers.NaiveBayesClassifier on the training dataset.
- Predict the sentiment (positive or negative) for the test data.
- Evaluate the model's accuracy and analyze the results.

import nltk

from textblob.classifiers import NaiveBayesClassifier

# Download the NLTK corpora that TextBlob relies on (only once)
!python -m textblob.download_corpora



# Download the NLTK tokenizer models (only once)
nltk.download('punkt')

# Training Data (40 Reviews)
train = [
    ("This phone is amazing, I love the camera!", "pos"),
    ("Battery life is superb, lasts more than a day.", "pos"),
    ("The display quality is top-notch.", "pos"),
    ("Fast charging works like a charm!", "pos"),
    ("Very smooth performance and no lag.", "pos"),
    ("Great phone at this price range.", "pos"),
    ("I am highly satisfied with the features.", "pos"),
    ("Audio quality is crystal clear.", "pos"),
    ("Superb design and looks premium.", "pos"),
    ("Face unlock is super fast.", "pos"),
    ("I regret buying this phone, worst experience.", "neg"),
    ("Battery drains too fast, not recommended.", "neg"),
    ("Overheating issue while gaming, disappointing.", "neg"),
    ("Camera quality is terrible in low light.", "neg"),
    ("Too many ads in the UI, very annoying.", "neg"),
    ("Fingerprint sensor is slow and unresponsive.", "neg"),
    ("Touchscreen sometimes freezes.", "neg"),
    ("Speaker volume is too low.", "neg"),
    ("Performance is sluggish, lags a lot.", "neg"),
    ("Build quality feels very cheap.", "neg"),
    ("The phone is very lightweight and easy to hold.", "pos"),
    ("I love the color and design of this phone.", "pos"),
    ("The UI is clean and easy to use.", "pos"),
    ("Gaming performance is fantastic.", "pos"),
    ("Storage capacity is more than enough.", "pos"),
    ("This phone is a complete package!", "pos"),
    ("Good for video calling, camera quality is decent.", "pos"),
    ("5G connectivity works smoothly.", "pos"),
    ("This phone exceeded my expectations.", "pos"),
    ("The processor is fast and efficient.", "pos"),
    ("Network issues, keeps disconnecting.", "neg"),
    ("Screen started flickering after a few days.", "neg"),
    ("The phone heats up even with normal usage.", "neg"),
    ("No software updates, outdated security patches.", "neg"),
    ("Customer support is unhelpful.", "neg"),
    ("Charger stopped working within a month.", "neg"),
    ("Speakers produce distorted sound.", "neg"),
    ("Camera app crashes frequently.", "neg"),
    ("Poor optimization, apps take forever to load.", "neg"),
    ("Too heavy to hold for long hours.", "neg")
]

# Testing Data (10 Reviews)
test = [
    ("I love the camera, it clicks amazing pictures!", "pos"),
    ("Battery backup is disappointing, drains quickly.", "neg"),
    ("The display is very bright and vibrant.", "pos"),
    ("Phone lags when switching between apps.", "neg"),
    ("Very stylish and premium-looking design.", "pos"),
    ("Face unlock doesn't work properly.", "neg"),
    ("The performance is super smooth!", "pos"),
    ("Charging speed is very slow, takes ages.", "neg"),
    ("Call quality is excellent, voice is very clear.", "pos"),
    ("The phone keeps hanging, very frustrating.", "neg")
]

# Train Naive Bayes Classifier
classifier = NaiveBayesClassifier(train)
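NaiveBayesClassifier(train) learns, for each label, how likely each word feature is, and classify() picks the label with the highest posterior probability. TextBlob's default feature extractor marks whether each training-vocabulary word is present in the text (contains(word) features) and delegates to NLTK's Naive Bayes classifier. The sketch below swaps in a simpler variant to show the mechanics: a multinomial Naive Bayes over lowercase word counts with add-one smoothing, trained on six reviews copied from the training list above. Its numbers will not match TextBlob's.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase words only; punctuation is dropped
    return re.findall(r"[a-z0-9']+", text.lower())

def train_nb(data):
    """Return label counts, per-label word counts, and the vocabulary."""
    label_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    for text, label in data:
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return label_counts, word_counts, vocab

def classify_nb(text, label_counts, word_counts, vocab):
    """Pick the label with the highest log posterior (add-one smoothing)."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Six reviews copied from the training list above
mini_train = [
    ("This phone is amazing, I love the camera!", "pos"),
    ("Battery life is superb, lasts more than a day.", "pos"),
    ("Very smooth performance and no lag.", "pos"),
    ("Battery drains too fast, not recommended.", "neg"),
    ("Camera quality is terrible in low light.", "neg"),
    ("Performance is sluggish, lags a lot.", "neg"),
]
model = train_nb(mini_train)
print(classify_nb("I love the camera", *model))        # pos
print(classify_nb("The battery drains fast", *model))  # neg
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied.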

# Evaluate on test data
accuracy = classifier.accuracy(test)

# Predict sentiment for each test review
predictions = [(review[0], classifier.classify(review[0])) for review in test]


# Print results
print(f"Model Accuracy: {accuracy * 100:.2f}%")
print("\nPredictions:")
for review, sentiment in predictions:
    print(f"Review: {review}\nPredicted Sentiment: {sentiment}\n")

Model Accuracy: 90.00%

Predictions:
Review: I love the camera, it clicks amazing pictures!
Predicted Sentiment: pos

Review: Battery backup is disappointing, drains quickly.
Predicted Sentiment: neg

Review: The display is very bright and vibrant.
Predicted Sentiment: pos

Review: Phone lags when switching between apps.
Predicted Sentiment: neg

Review: Very stylish and premium-looking design.
Predicted Sentiment: pos

Review: Face unlock doesn't work properly.
Predicted Sentiment: pos

Review: The performance is super smooth!
Predicted Sentiment: pos

Review: Charging speed is very slow, takes ages.
Predicted Sentiment: neg

Review: Call quality is excellent, voice is very clear.
Predicted Sentiment: pos

Review: The phone keeps hanging, very frustrating.
Predicted Sentiment: neg
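To analyze the results beyond a single accuracy number, the gold labels from the test list can be compared with the predictions printed above in a confusion matrix with per-class precision and recall. This standard-library sketch uses a prediction list copied by hand from that output.

```python
from collections import Counter

# Gold labels from the `test` list and predictions copied from the printed output
gold = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
pred = ["pos", "neg", "pos", "neg", "pos", "pos", "pos", "neg", "pos", "neg"]

# Count (gold, predicted) pairs to build the confusion matrix
confusion = Counter(zip(gold, pred))
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.90

for label in ("pos", "neg"):
    tp = confusion[(label, label)]
    predicted = sum(n for (g, p), n in confusion.items() if p == label)
    actual = sum(n for (g, p), n in confusion.items() if g == label)
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    print(f"{label}: precision={precision:.2f}, recall={recall:.2f}")
# pos: precision=0.83, recall=1.00
# neg: precision=1.00, recall=0.80
```

The single error is "Face unlock doesn't work properly." (actual neg, predicted pos). A plausible explanation, not verified here, is that "Face" and "unlock" occur in the training data only in the positive review "Face unlock is super fast.", while the negation "doesn't" never appears in training, so the classifier sees no negative evidence for this review.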

