# NLP with spaCy for NER and Sentiment Analysis
**Objective**

This task focuses on using the spaCy library to perform two core Natural Language Processing (NLP) tasks on a sample of Amazon product reviews:


1.   Named Entity Recognition (NER): To automatically identify and extract entities like product names and brands (ORG, PRODUCT).
2.   Sentiment Analysis: To determine if a review is positive or negative using a custom, rule-based approach.

**Plan of Action**


1.   Setup & Installation: Import spaCy and download the necessary language model.
2.   Load Model & Data: Load the spaCy model and create a sample dataset of Amazon reviews.
3.   Task A: Named Entity Recognition (NER): Apply the pre-trained NER model to extract entities and visualize the results.
4.   Task B: Rule-Based Sentiment Analysis: Build a simple sentiment classifier based on keyword matching.
5.   Integrated Analysis: Create a pipeline that performs both NER and sentiment analysis on a given review.
6.   Conclusion: Summarize the results and confirm completion of the task.




**Step 1: Set Up & Installation**

Import spaCy and make sure the small English language model (en_core_web_sm) is downloaded. This model includes the pre-trained components used for NER.

In [1]:
# Import necessary libraries
import spacy
import pandas as pd
from spacy import displacy

# Download the spaCy English model if you don't have it.
# This command only needs to be run once.
# !python -m spacy download en_core_web_sm

print("Setup complete. spaCy is ready.")

Setup complete. spaCy is ready.
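
Before loading the model in the next step, it can help to check whether the model package is installed at all. The following is a minimal sketch using only the standard library. It relies on the fact that models installed with `spacy download` are regular Python packages; the helper name `model_is_installed` is hypothetical and not part of spaCy.

```python
from importlib.util import find_spec


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return find_spec(package_name) is not None


if model_is_installed("en_core_web_sm"):
    print("Model package found.")
else:
    print("Model package missing. Run: python -m spacy download en_core_web_sm")
```

This check only confirms that the package can be located; loading it with `spacy.load` can still fail if the model version does not match the installed spaCy version.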


**Step 2: Load Model and Data**

Load the spaCy model into an object, which we'll call nlp. We'll also define our sample dataset of Amazon reviews. Using a small, inline dataset makes this notebook self-contained and easy to run.

In [2]:
# Load the small English model
try:
    nlp = spacy.load("en_core_web_sm")
    print("spaCy model 'en_core_web_sm' loaded successfully.")
except OSError:
    print("Model 'en_core_web_sm' not found. Please run: python -m spacy download en_core_web_sm")
    # In a real script, you might exit here. In a notebook, we let the user see the error.
    nlp = None

# Sample Amazon reviews
sample_reviews = [
    "I absolutely love my new Apple iPhone 14! The camera is fantastic and the battery lasts all day.",
    "The Samsung Galaxy Buds Pro have terrible battery life. I'm very disappointed with this purchase from Samsung.",
    "I bought the Anker Power Bank and it stopped working after just one week. Really poor quality.",
    "The Sony WH-1000XM5 headphones are a masterpiece of engineering. Best noise cancellation I've ever experienced.",
    "This is a decent laptop charger, it does the job but nothing special about it."
]

# Create a DataFrame for easier handling
df = pd.DataFrame({'review_text': sample_reviews})

df.head()

spaCy model 'en_core_web_sm' loaded successfully.


|   | review_text |
|---|---|
| 0 | I absolutely love my new Apple iPhone 14! The ... |
| 1 | The Samsung Galaxy Buds Pro have terrible batt... |
| 2 | I bought the Anker Power Bank and it stopped w... |
| 3 | The Sony WH-1000XM5 headphones are a masterpie... |
| 4 | This is a decent laptop charger, it does the j... |


**Step 3: Task A - Named Entity Recognition (NER)**

NER is the process of locating and classifying named entities in text into pre-defined categories. spaCy's pre-trained models can identify entities like ORG (organizations, brands), PRODUCT, PERSON, GPE (geopolitical entities), etc.

Let's process each review and print the entities found.

In [3]:
if nlp:
    print("--- Detected Entities in Reviews ---\n")
    for index, row in df.iterrows():
        review_text = row['review_text']
        print(f"Review #{index + 1}: {review_text}")

        # Process the text with the spaCy model
        doc = nlp(review_text)

        # Check if any entities were found
        if doc.ents:
            for ent in doc.ents:
                # Print the entity text and its label
                print(f"  -> Entity: '{ent.text}', Label: '{ent.label_}'")
        else:
            print("  -> No entities found.")
        print("-" * 30)

--- Detected Entities in Reviews ---

Review #1: I absolutely love my new Apple iPhone 14! The camera is fantastic and the battery lasts all day.
  -> Entity: 'Apple', Label: 'ORG'
  -> Entity: '14', Label: 'CARDINAL'
  -> Entity: 'all day', Label: 'DATE'
------------------------------
Review #2: The Samsung Galaxy Buds Pro have terrible battery life. I'm very disappointed with this purchase from Samsung.
  -> Entity: 'Samsung', Label: 'ORG'
------------------------------
Review #3: I bought the Anker Power Bank and it stopped working after just one week. Really poor quality.
  -> Entity: 'the Anker Power Bank', Label: 'ORG'
  -> Entity: 'just one week', Label: 'DATE'
------------------------------
Review #4: The Sony WH-1000XM5 headphones are a masterpiece of engineering. Best noise cancellation I've ever experienced.
  -> Entity: 'Sony', Label: 'ORG'
------------------------------
Review #5: This is a decent laptop charger, it does the job but nothing special about it.
  -> No entities found.
------------------------------
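
The entity/label pairs printed above can be collected into a structure that is easier to query. The sketch below hard-codes the pairs from that output instead of calling spaCy, so it runs without the model; `group_by_label` is a hypothetical helper name.

```python
from collections import defaultdict

# (entity text, label) pairs copied from the output above.
detected = [
    ("Apple", "ORG"),
    ("14", "CARDINAL"),
    ("all day", "DATE"),
    ("Samsung", "ORG"),
    ("the Anker Power Bank", "ORG"),
    ("just one week", "DATE"),
    ("Sony", "ORG"),
]


def group_by_label(pairs):
    """Map each label to the list of entity texts carrying it, in order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(detected)
print(by_label["ORG"])
```

With a live `doc`, the same helper could be fed `[(ent.text, ent.label_) for ent in doc.ents]`. Note that no entry carries the PRODUCT label in this sample.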

**Visualizing NER with displacy**

spaCy includes a fantastic visualizer, displacy, which highlights entities directly in the text. This makes it much easier to see what the model is identifying. Let's visualize the entities for the first review.

In [4]:
if nlp:
    # Process the first review
    doc = nlp(df['review_text'][0])

    # Render the visualization. In a Jupyter notebook, this will display automatically.
    displacy.render(doc, style="ent", jupyter=True)

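Observation: The model identifies "Apple" as an ORG (organization/brand), but it does not capture "iPhone 14" as a single product. As the Step 3 output shows, only "14" is tagged (as CARDINAL), and "all day" is tagged as a DATE. The small model often misses product names or assigns them broader categories; larger models might label them PRODUCT. The brand is extracted here, while the product name is not.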
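
The Step 3 output lists only `Apple`, `14`, and `all day` for this review, so the product name is not captured as one entity. One possible way to close that gap is spaCy's rule-based `entity_ruler` component. The sketch below is an illustration under assumptions, not part of the original pipeline: it uses `spacy.blank("en")` so that it runs without a downloaded model, the variable name `nlp_rules` is chosen to avoid overwriting `nlp`, and the two patterns are hand-picked from the sample reviews.

```python
import spacy

# A blank English pipeline: tokenizer only, no statistical NER.
nlp_rules = spacy.blank("en")

# entity_ruler is a built-in spaCy v3 component for pattern-based entities.
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PRODUCT", "pattern": "iPhone 14"},
    {"label": "PRODUCT", "pattern": "Galaxy Buds Pro"},
])

doc = nlp_rules("I absolutely love my new Apple iPhone 14!")
product_ents = [(ent.text, ent.label_) for ent in doc.ents]
print(product_ents)
```

To combine such rules with the statistical model, the ruler would typically be added to the loaded pipeline with `nlp.add_pipe("entity_ruler", before="ner")`, so that the rule matches are set before the NER component runs.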

**Step 4: Task B - Rule-Based Sentiment Analysis**

Build a simple classifier based on a list of positive and negative keywords. This is a common and transparent approach for sentiment analysis.

In [5]:
# Define keywords for our rule-based system
positive_keywords = ["love", "fantastic", "masterpiece", "best", "great", "amazing", "excellent"]
negative_keywords = ["terrible", "disappointed", "poor", "stopped working", "bad", "awful"]

def analyze_sentiment(text: str) -> str:
    """
    Analyzes the sentiment of a text based on keyword matching.
    Returns 'Positive', 'Negative', or 'Neutral'.
    """
    text_lower = text.lower()

    # Count positive and negative keywords
    pos_count = sum(1 for word in positive_keywords if word in text_lower)
    neg_count = sum(1 for word in negative_keywords if word in text_lower)

    if pos_count > neg_count:
        return "Positive"
    elif neg_count > pos_count:
        return "Negative"
    else:
        return "Neutral"

# Apply the sentiment analysis function to our DataFrame
df['sentiment'] = df['review_text'].apply(analyze_sentiment)

print("--- Sentiment Analysis Results ---")
df[['review_text', 'sentiment']]

--- Sentiment Analysis Results ---


|   | review_text | sentiment |
|---|---|---|
| 0 | I absolutely love my new Apple iPhone 14! The ... | Positive |
| 1 | The Samsung Galaxy Buds Pro have terrible batt... | Negative |
| 2 | I bought the Anker Power Bank and it stopped w... | Negative |
| 3 | The Sony WH-1000XM5 headphones are a masterpie... | Positive |
| 4 | This is a decent laptop charger, it does the j... | Neutral |


Observation: Our simple rule-based system correctly classified the sentiment for our sample reviews based on the presence of keywords like "love" and "terrible". The neutral review was also correctly identified.
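
One caveat of the `word in text_lower` check is that it matches substrings, so `"bad"` would also fire on `"badge"`. A possible refinement, sketched here with the standard `re` module, is to match keywords only at word boundaries. The function name `count_keyword_hits` and the shortened keyword list are illustrative assumptions, not part of the notebook's classifier.

```python
import re
from typing import Iterable

# Shortened, illustrative subset of the negative keyword list.
negative_subset = ["terrible", "poor", "stopped working", "bad"]


def count_keyword_hits(text: str, keywords: Iterable[str]) -> int:
    """Count how many keywords occur in the text as whole words or phrases."""
    text_lower = text.lower()
    hits = 0
    for keyword in keywords:
        # \b anchors the match at word boundaries; re.escape guards
        # against keywords that contain regex metacharacters.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text_lower):
            hits += 1
    return hits


print(count_keyword_hits("The badge holder works fine.", negative_subset))
print(count_keyword_hits("Really poor quality, it stopped working.", negative_subset))
```

On the five sample reviews this variant would give the same counts as the substring check, since every keyword there appears as a whole word.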


**Step 5: Integrated Analysis Pipeline**

Create a single function that takes a review and returns both the extracted entities and the calculated sentiment, giving a complete analysis package.

In [6]:
def process_review(text: str) -> dict:
    """
    Performs both NER and sentiment analysis on a review text.
    Returns a dictionary with extracted entities and sentiment.
    """
    if not nlp:
        return {"error": "spaCy model not loaded."}

    # Process text for NER
    doc = nlp(text)
    entities = {
        "products_brands": [ent.text for ent in doc.ents if ent.label_ in ["ORG", "PRODUCT", "WORK_OF_ART"]]
    }

    # Analyze sentiment
    sentiment = analyze_sentiment(text)

    return {
        "entities": entities,
        "sentiment": sentiment
    }

# Run our integrated pipeline on all reviews
print("--- Integrated Analysis Pipeline ---")
for review in sample_reviews:
    analysis_result = process_review(review)
    print(f"\nReview: '{review}'")
    print(f"Result: {analysis_result}")

--- Integrated Analysis Pipeline ---

Review: 'I absolutely love my new Apple iPhone 14! The camera is fantastic and the battery lasts all day.'
Result: {'entities': {'products_brands': ['Apple']}, 'sentiment': 'Positive'}

Review: 'The Samsung Galaxy Buds Pro have terrible battery life. I'm very disappointed with this purchase from Samsung.'
Result: {'entities': {'products_brands': ['Samsung']}, 'sentiment': 'Negative'}

Review: 'I bought the Anker Power Bank and it stopped working after just one week. Really poor quality.'
Result: {'entities': {'products_brands': ['the Anker Power Bank']}, 'sentiment': 'Negative'}

Review: 'The Sony WH-1000XM5 headphones are a masterpiece of engineering. Best noise cancellation I've ever experienced.'
Result: {'entities': {'products_brands': ['Sony']}, 'sentiment': 'Positive'}

Review: 'This is a decent laptop charger, it does the job but nothing special about it.'
Result: {'entities': {'products_brands': []}, 'sentiment': 'Neutral'}
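
Once every review has been turned into a result dictionary, the results can be aggregated, for example to tally sentiment per brand. The sketch below copies the result dictionaries from the output above instead of re-running spaCy; `tally_by_brand` is a hypothetical helper.

```python
from collections import Counter, defaultdict

# Result dictionaries copied from the pipeline output above.
results = [
    {"entities": {"products_brands": ["Apple"]}, "sentiment": "Positive"},
    {"entities": {"products_brands": ["Samsung"]}, "sentiment": "Negative"},
    {"entities": {"products_brands": ["the Anker Power Bank"]}, "sentiment": "Negative"},
    {"entities": {"products_brands": ["Sony"]}, "sentiment": "Positive"},
    {"entities": {"products_brands": []}, "sentiment": "Neutral"},
]


def tally_by_brand(analysis_results):
    """Count sentiment labels for each extracted brand or product string."""
    tally = defaultdict(Counter)
    for result in analysis_results:
        for name in result["entities"]["products_brands"]:
            tally[name][result["sentiment"]] += 1
    return {name: dict(counts) for name, counts in tally.items()}


brand_sentiment = tally_by_brand(results)
print(brand_sentiment["Samsung"])
```

Reviews without any extracted entity, such as the laptop charger review, do not contribute to the tally.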


**Step 6: Conclusion**

In this notebook, we have successfully demonstrated how to use spaCy for two fundamental NLP tasks:


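1.   Named Entity Recognition: We used spaCy's pre-trained model to extract brand names such as Apple, Samsung, and Sony from text. Full product names (for example, "iPhone 14") were mostly not captured as single entities by the small model.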

2.   Rule-Based Sentiment Analysis: We built a custom, keyword-based classifier to determine the sentiment of reviews, fulfilling the specific requirement of the task.

The final integrated pipeline shows how these components can be combined to create a useful NLP tool for analyzing user feedback.

