# 🧠 What is Sentiment Analysis?

Imagine you’re reading a news article and trying to guess how the writer feels about the topic. Is it cheerful and upbeat, or gloomy and critical? That’s what **sentiment analysis** does—it figures out the mood or opinion in a piece of text, labeling it as *positive*, *negative*, or *neutral*. Here are some examples:

- 😊 "The company’s profits soared!" → **Positive** (sounds happy and successful).
- 😟 "The economy is collapsing." → **Negative** (sounds worrying and bad).
- 😐 "The meeting is at 2 PM." → **Neutral** (just a fact, no strong feelings).

### Why Does Sentiment Analysis Matter?

In the real world, sentiment analysis is a big deal:
- **Companies** use it to see if customers love or hate their products by analyzing reviews.
- **Investors** check news or social media to gauge how people feel about the markets.
- **You** will use it to analyze how news articles "feel" about different subtopics—like whether technology gets more positive coverage than politics.

By the end of this notebook, you’ll have the skills to uncover these emotional insights from data. Pretty cool, right?

---

## ✅ Option 1: Sentiment Analysis Using TextBlob

Let’s start with **TextBlob**, a simple Python tool that’s perfect for beginners. It reads text and tells us how positive or negative it sounds. Think of it as a friendly assistant who can quickly scan an article and give you a thumbs-up or thumbs-down.

### Step 1: Install and Import TextBlob

Before we can use TextBlob, we need to set it up in our Colab environment.

```python
# Install TextBlob if it’s not already available in Colab
!pip install textblob

# Import the TextBlob class so we can use it
from textblob import TextBlob
```

**What’s happening here?**
- `!pip install textblob`: This command installs the TextBlob library. The `!` tells Colab to run it as a system command.
- `from textblob import TextBlob`: This brings the TextBlob tool into our notebook, ready for action.

**Why this matters:** We need to install and import TextBlob because it’s not built into Python by default. This step ensures we have the tools we need to analyze sentiment.

---

### Step 2: Try TextBlob on a Single Sentence

Let’s test TextBlob with a simple sentence to see how it works.

```python
# Create a TextBlob object with a sample sentence
blob = TextBlob("The market is doing terribly today.")

# Get the sentiment of the sentence
sentiment = blob.sentiment

# Print the result to see what TextBlob thinks
print(sentiment)
```

**Sample Output:**
```
Sentiment(polarity=-1.0, subjectivity=1.0)
```

**What does this mean?**
- **Polarity**: A number between -1 and +1 that shows how positive or negative the text is.
  - -1 = very negative (like this example).
  - 0 = neutral.
  - +1 = very positive.
- **Subjectivity**: A number between 0 and 1 that shows how opinionated the text is.
  - 0 = very factual (like a weather report).
  - 1 = very opinionated (like this sentence).

Here, the polarity is -1.0 (very negative), and subjectivity is 1.0 (very opinionated), which fits because the sentence expresses a strong negative feeling.
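
The intro used three labels (positive, negative, neutral), but TextBlob hands back a number. One way to bridge the two is a small helper that buckets polarity into a label. This is only a sketch: the name `polarity_to_label` and the 0.1 cutoff are assumptions made for illustration, not something TextBlob provides.

```python
def polarity_to_label(polarity, threshold=0.1):
    """Bucket a polarity score (-1 to +1) into a coarse label.

    The 0.1 threshold is an arbitrary assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Example polarities: strongly negative, close to zero, mildly positive
for score in (-1.0, 0.05, 0.4):
    print(score, "->", polarity_to_label(score))
```

In the notebook you could call `polarity_to_label(blob.sentiment.polarity)`. Try a few cutoffs and see how many of your sentences land in "neutral".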

**Let’s experiment!** Try these sentences by replacing the text in the code above:
- 😊 "I love this new technology!"
- 😐 "The sky is blue."
- 😟 "This is the worst day ever."

What polarity and subjectivity do you expect? Run the code and check!

**Why this matters:** Testing single sentences helps us get comfortable with TextBlob. It’s like peeking under the hood of a car before driving it—we understand how it works before using it on our whole dataset.

---

### Step 3: Apply TextBlob to the News Articles

Now that we’ve practiced, let’s use TextBlob on our dataset of news articles. We’ll assume your dataset is loaded into a DataFrame called `df` with a column named `Description` (if your column names are different, adjust accordingly!).

```python
# Apply TextBlob to each article’s description and store the polarity in a new column
df['Sentiment_Polarity'] = df['Description'].apply(lambda x: TextBlob(x).sentiment.polarity)
```

**What’s happening here?**
- `df['Description']`: This selects the `Description` column, where each row is an article’s text.
- `.apply(lambda x: ...)`: This runs the same function on every row. The `lambda x` is a small, temporary function that takes each description (`x`) and processes it.
- `TextBlob(x).sentiment.polarity`: This calculates the polarity for each description.
- `df['Sentiment_Polarity']`: This creates a new column to save the polarity scores.

**Check your work:** Run `df.head()` to see the first few rows. Do the polarity scores match the tone of the descriptions?

**Why this matters:** Adding a sentiment score to each article lets us analyze the emotions across the dataset. Now we can ask big questions, like which subtopics are covered more positively or negatively. This is the heart of our project!
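
Real datasets often have a few empty or missing descriptions, and a missing value is not text that a sentiment tool can read. A small wrapper can guard against that. The sketch below is hedged: `safe_score` and `stub_scorer` are hypothetical names, and the stub only stands in for TextBlob so the cell runs anywhere.

```python
import math


def safe_score(text, score_fn, default=float("nan")):
    """Return score_fn(text), or `default` when text is missing or blank."""
    # Missing values usually show up as None or float('nan')
    if text is None or (isinstance(text, float) and math.isnan(text)):
        return default
    cleaned = str(text).strip()
    if not cleaned:
        return default
    return score_fn(cleaned)


# Stand-in scorer (assumption): replace with a real one in the notebook
def stub_scorer(text):
    return -1.0 if "terribly" in text.lower() else 0.0


print(safe_score("The market is doing terribly today.", stub_scorer))
print(safe_score(None, stub_scorer))
print(safe_score("   ", stub_scorer))
```

In the notebook, the real scorer would be `lambda t: TextBlob(t).sentiment.polarity`, passed as `score_fn` inside `.apply(...)`. Returning NaN for missing text is a design choice: pandas skips NaN when it averages, so empty rows don’t get counted as "neutral".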

---

### Step 4: Reflect and Discuss

TextBlob is easy to use, but it’s not perfect. Let’s think about its limits:
- **Can it always get the sentiment right?** What about sarcasm, like "Great, another market crash..."? (Spoiler: It might think "great" is positive!)
- **Does it work well with short texts?** Like headlines or tweets?
- **What about complex emotions?** Can it tell if someone’s joking?

**🧠 Reflection Prompt:**  
When might TextBlob struggle to understand the true tone of a sentence? Why? Think of an example where the words sound positive but the meaning is negative—or the other way around.
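
To see why word-based scoring trips over sarcasm, here is a deliberately tiny toy scorer. It is an illustration only: the word list and its scores are made up, and this is not TextBlob’s actual lexicon or algorithm.

```python
# Toy word list with made-up scores (assumption: NOT TextBlob's real lexicon)
TOY_LEXICON = {"great": 0.8, "love": 0.5, "crash": -0.6, "worst": -1.0}


def toy_polarity(text):
    """Average the scores of known words, ignoring word order and context."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("Great, another market crash..."))  # sarcastic
print(toy_polarity("Great news for the market!"))      # sincere
```

The toy scorer sees "great" in both sentences and has no way to know the first one is sarcastic, so the sarcastic sentence still comes out above zero.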

---

## 🤖 Option 2: Sentiment Analysis Using Hugging Face Transformers

Ready to level up? Let’s try **Hugging Face Transformers**, a more advanced tool that’s like a super-smart AI. It’s trained on tons of text, so it can pick up on nuances that TextBlob might miss.

### What Are Transformers?

Think of TextBlob as a basic dictionary that knows some words and their feelings. Transformers are like expert English teachers who’ve read millions of books and can understand tone, context, and even hidden meanings. They’re more powerful because they’ve been trained on massive amounts of data.

---

### Step 1: Install and Set Up Hugging Face Transformers

First, we need to install the transformers library and bring in the tools we’ll use.

```python
# Install the transformers library
!pip install transformers

# Import the pipeline function, which makes using transformers easy
from transformers import pipeline
```

**What’s happening here?**
- `!pip install transformers`: This installs the transformers library in Colab.
- `from transformers import pipeline`: This imports a handy tool called `pipeline` that simplifies working with pretrained models.

**Why this matters:** We need these tools to access the power of transformers. This step sets us up for advanced sentiment analysis.

---

### Step 2: Load a Pretrained Sentiment Analysis Model

Hugging Face offers ready-to-use models. Let’s load one designed for sentiment analysis.

```python
# Create a sentiment analysis pipeline with a pretrained model
sentiment_model = pipeline("sentiment-analysis")
```

**What’s happening here?**
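- `pipeline("sentiment-analysis")`: This sets up a pipeline that uses a pretrained model to analyze sentiment. It’s like hiring an expert who’s already trained and ready to work! Because we don’t name a specific model, the library downloads a default English model and prints a warning suggesting that you pick one explicitly. That’s fine for learning.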

**Why this matters:** Using a pretrained model saves us time and effort. It’s already learned how to detect emotions from tons of text, so we can jump straight to using it.

---

### Step 3: Try the Model on One Example

Let’s test the transformer model with a single sentence.

```python
# Analyze a sample sentence
result = sentiment_model("The new technology is breaking boundaries.")

# Print the result
print(result)
```

**Sample Output:**
```
[{'label': 'POSITIVE', 'score': 0.9998}]
```

**What does this mean?**
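- **Label**: The model predicts "POSITIVE" or "NEGATIVE." Unlike the three labels from the intro, there is no neutral option here, so even a plain factual sentence gets pushed to one side.
- **Score**: A number between 0 and 1 showing how confident the model is in that label (here, 99.98% sure it’s positive).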

**Try it yourself:** Test "This product is terrible." What label and score do you get?

**Why this matters:** The confidence score tells us how sure the model is, which is not the same as being right. A high score means it’s very sure, while a score near 0.5 (the lowest it can go with only two labels) suggests the text is tricky. Testing it builds our understanding before we scale up.
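
One way to make the label and score easier to work with is to fold them into a single signed number and to flag shaky predictions. This is a sketch under assumptions: `signed_score`, `is_uncertain`, and the 0.75 cutoff are hypothetical choices, not part of the transformers library.

```python
def signed_score(result):
    """Turn {'label': ..., 'score': ...} into a value between -1 and +1."""
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * result["score"]


def is_uncertain(result, cutoff=0.75):
    """Flag predictions whose confidence is below `cutoff` (an assumed value)."""
    return result["score"] < cutoff


example = {"label": "POSITIVE", "score": 0.9998}  # same shape as the sample output
print(signed_score(example), is_uncertain(example))
```

Keep in mind that the signed number measures confidence, not intensity, so it is not directly the same thing as TextBlob’s polarity.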

---

### Step 4: Apply the Model to the Whole Dataset

Now, let’s use the transformer model on all our articles’ descriptions.

```python
# Apply the sentiment model to each description
df['HF_Sentiment'] = df['Description'].apply(lambda x: sentiment_model(x)[0])

# Split the result into two columns: label and score
df['HF_Sentiment_Label'] = df['HF_Sentiment'].apply(lambda x: x['label'])
df['HF_Sentiment_Score'] = df['HF_Sentiment'].apply(lambda x: x['score'])
```

**What’s happening here?**
- `sentiment_model(x)[0]`: When given one description, the model returns a list containing a single dictionary. We take that dictionary out with `[0]`.
- `df['HF_Sentiment']`: This temporarily stores the full result.
- We then extract:
  - `HF_Sentiment_Label`: The prediction ("POSITIVE" or "NEGATIVE").
  - `HF_Sentiment_Score`: The confidence score.

**Check your work:** Run `df.head()` to see the new columns. Do the labels match the descriptions’ tones?
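
Scoring one description at a time can be slow on a large dataset. The pipeline also accepts a list of texts and returns one dictionary per text, so a possible speed-up is to send the descriptions in chunks. The helper below is a hedged sketch: `label_in_batches` and `stub_model` are hypothetical names, and the stub only mimics the list-in, list-of-dictionaries-out shape so the cell runs without downloading a model.

```python
def label_in_batches(texts, model, batch_size=32):
    """Call `model` on slices of `texts` and collect one result per text."""
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        results.extend(model(batch))
    return results


# Stand-in model (assumption): returns a fixed answer, analyzes nothing
def stub_model(batch):
    return [{"label": "POSITIVE", "score": 0.5} for _ in batch]


demo = label_in_batches(["first", "second", "third"], stub_model, batch_size=2)
print(len(demo), demo[0])
```

In the notebook, `texts` would be something like `df['Description'].fillna('').tolist()` and `model` would be `lambda b: sentiment_model(b, truncation=True)`, where `truncation=True` asks the tokenizer to cut off descriptions that are longer than the model can read.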

**Why this matters:** Using a second model lets us compare approaches. Does Hugging Face agree with TextBlob? If not, why? This comparison makes our analysis stronger and more reliable.

---

### Step 5: Reflect and Discuss

Transformers are advanced, but they’re not flawless. Let’s think about them:
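- **What’s better about this model?** It reads each word in the context of the whole sentence, so it usually copes better than TextBlob with phrasing where individual words are misleading, like "not exactly a success."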
- **Can it still mess up?** Yes, especially with sarcasm or niche topics it wasn’t trained on.
- **Which model do you trust more?** If they disagree, why might that happen?
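
To put a number on how often the two models agree, one option is to compare the sign of TextBlob’s polarity with the Hugging Face label. This is a rough sketch: `agreement_rate` is a hypothetical helper, the example scores are made up, and skipping near-zero polarities is an assumption made because the two-label model has no neutral option.

```python
def agreement_rate(polarities, labels, threshold=0.0):
    """Fraction of rows where the sign of polarity matches the label.

    Rows whose polarity is within `threshold` of zero are skipped.
    """
    compared = 0
    agreed = 0
    for polarity, label in zip(polarities, labels):
        if abs(polarity) <= threshold:
            continue
        compared += 1
        textblob_label = "POSITIVE" if polarity > 0 else "NEGATIVE"
        agreed += textblob_label == label
    return agreed / compared if compared else float("nan")


# Made-up scores for four articles (illustrative only)
polarities = [0.6, -0.4, 0.0, 0.2]
labels = ["POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
print(agreement_rate(polarities, labels))
```

In the notebook you could pass `df['Sentiment_Polarity']` and `df['HF_Sentiment_Label']`. The rows where the models disagree are often the most interesting ones to read by hand.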

**🎁 Bonus Idea:**  
Try feeding both models a sarcastic sentence like "Oh great, another delay." Which one gets it right? Run the test and find out!

---

## 📊 Analyze Sentiment by Subtopic

We’ve got sentiment scores from both models—now let’s use them to answer our big question: Which subtopics have more positive or negative coverage?

### Step 1: Group by Subtopic and Calculate Average Sentiment

Let’s start with TextBlob’s polarity scores.

```python
# Group articles by subtopic and calculate the average polarity
avg_sentiment = df.groupby('Subtopic')['Sentiment_Polarity'].mean().sort_values()
```

**What’s happening here?**
- `df.groupby('Subtopic')`: This organizes the data by subtopic (from Part 1’s clustering).
- `['Sentiment_Polarity'].mean()`: This averages the polarity scores for each subtopic.
- `.sort_values()`: This sorts from most negative to most positive.

**Why this matters:** This step turns our raw scores into actionable insights. We can now see which subtopics are the most positive or negative overall—our main goal!
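
If `groupby` feels like magic, it can help to see the same idea in plain Python. The sketch below uses made-up rows and hypothetical subtopic names; it mirrors the group, average, and sort steps rather than showing how pandas works internally.

```python
from collections import defaultdict

# Made-up rows: (subtopic, polarity). Subtopic names are hypothetical.
rows = [
    ("Technology", 0.4),
    ("Politics", -0.2),
    ("Technology", 0.2),
    ("Politics", -0.4),
    ("Sports", 0.1),
]

# "groupby": collect the scores that share a subtopic
groups = defaultdict(list)
for subtopic, polarity in rows:
    groups[subtopic].append(polarity)

# "mean": average each group; "sort_values": order from lowest to highest
averages = {name: sum(vals) / len(vals) for name, vals in groups.items()}
ranked = sorted(averages.items(), key=lambda item: item[1])
for name, avg in ranked:
    print(f"{name}: {avg:+.2f} (n={len(groups[name])})")
```

Printing the group size next to each average is a useful habit: an average built from one or two articles is much noisier than one built from hundreds.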

---

### Step 2: Plot the Results

Let’s visualize this with a bar chart using seaborn, a library that makes plots look great.

```python
# Import libraries for plotting
import seaborn as sns
import matplotlib.pyplot as plt

# Set up the plot size
plt.figure(figsize=(10, 6))

# Create a bar plot of average sentiment by subtopic
sns.barplot(x=avg_sentiment.values, y=avg_sentiment.index, palette="coolwarm")

# Add clear labels and a title
plt.title("Average Sentiment by Subtopic (TextBlob)")
plt.xlabel("Average Sentiment Polarity (-1 to +1)")
plt.ylabel("Subtopic")

# Display the plot
plt.show()
```

**What’s happening here?**
- `sns.barplot()`: This draws bars where the length shows the average polarity, and the y-axis lists subtopics.
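- `palette="coolwarm"`: Colors run from blue to red in bar order, not by value. Because the bars are sorted, the most negative subtopic ends up blue and the most positive ends up red. Use `"coolwarm_r"` if you’d rather have red mean negative.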
- Labels and title make it clear what we’re looking at.

**Why this matters:** A chart makes it easy to spot patterns—like which subtopics are positive or negative—at a glance. Bar charts are perfect for comparing categories like this.

**🧠 Reflection Prompt:**  
Were you surprised by which subtopics had the most positive or negative coverage? Why might that be?

---

### Step 3: Compare with Hugging Face Results

Let’s analyze the Hugging Face results too, focusing on the percentage of positive articles per subtopic.

```python
# Calculate the percentage of articles labeled "POSITIVE" per subtopic
positive_percentage = df.groupby('Subtopic')['HF_Sentiment_Label'].apply(lambda x: (x == 'POSITIVE').mean() * 100).sort_values()

# Set up the plot
plt.figure(figsize=(10, 6))

# Create a bar plot
sns.barplot(x=positive_percentage.values, y=positive_percentage.index, palette="viridis")

# Add labels and title
plt.title("Percentage of Positive Articles by Subtopic (Hugging Face)")
plt.xlabel("Percentage of Positive Articles (%)")
plt.ylabel("Subtopic")

# Show the plot
plt.show()
```

**What’s happening here?**
- `(x == 'POSITIVE').mean() * 100`: This calculates the fraction of "POSITIVE" labels per subtopic, then converts it to a percentage.
- The plot shows what percentage of articles in each subtopic are positive according to Hugging Face.

**Why this matters:** This gives us a different angle—focusing on positivity rates rather than average scores. Comparing both models helps us see the full picture.
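
The "average of True/False" trick can look odd at first, so here it is in plain Python with made-up labels. The variable names are only for illustration.

```python
sample_labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]  # made-up labels

# Comparing each label to "POSITIVE" gives True/False; True counts as 1
matches = [label == "POSITIVE" for label in sample_labels]
pct_positive = sum(matches) / len(matches) * 100
print(pct_positive)
```

Three of the four labels are positive, so the result is 75 percent. pandas does the same thing when it takes the mean of a True/False column.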

---

## 🗣 Visual Storytelling & Interpretation

You’ve got the data—now let’s tell a story with it! Create 1–2 new charts to explore further. Ideas:
- Compare TextBlob’s average polarity with Hugging Face’s positive percentage.
- Show the spread of sentiment scores for one subtopic (e.g., using a histogram).

**Tips for Awesome Charts:**
- **Label axes** clearly (e.g., "Sentiment Score" or "Subtopic").
- **Add a title** that explains the chart’s purpose.
- **Pick colors** that make sense (green for positive, red for negative).
- **Keep it honest**—don’t stretch the y-axis to exaggerate differences.

**🧠 Reflection Prompt:**  
What story do your charts tell? If you were presenting this to a team, what would your headline be? Example: "Tech News Shines Bright While Politics Dims."

---

## 💬 Final Discussion & Extensions

Amazing work! Let’s wrap up with some big questions:
- **Are these models biased?** Do they favor certain words or topics? How could we test that?
- **Do they treat all subtopics fairly?** Maybe some topics naturally use more emotional language.
- **What about article length?** Do short headlines or long articles work better?
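
One simple way to test for bias is to keep a sentence fixed, swap in one word at a time, and compare the scores. The sketch below is hedged: `probe_bias` is a hypothetical helper, and `biased_stub` is a deliberately biased stand-in scorer, not a real model, so the cell runs on its own.

```python
def probe_bias(template, fillers, score_fn):
    """Score one sentence template with different words swapped in.

    Large gaps between scores for neutral swaps hint at a learned bias.
    """
    return {word: score_fn(template.format(word)) for word in fillers}


# Stand-in scorer (assumption): pretends to dislike one topic word
def biased_stub(text):
    return -0.5 if "politics" in text.lower() else 0.0


probe = probe_bias("I read an article about {} today.",
                   ["technology", "politics", "sports"], biased_stub)
print(probe)
```

In the notebook you could pass `lambda t: TextBlob(t).sentiment.polarity` as `score_fn`. The sentence itself is neutral, so any gap between the scores comes from the swapped word alone.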

**🎁 Bonus Idea:**  
Combine the `Title` and `Description` into one text (e.g., `df['Title'] + " " + df['Description']`) and rerun the sentiment analysis. Does adding the title improve the results?

---

## 🧾 Final Deliverables

To finish Part 2, prepare these for your hackathon submission:
- ✅ **What your models do:** Explain how TextBlob and Hugging Face analyze sentiment.
- 📊 **Visualizations:** Include at least two charts showing your insights.
- 🤔 **Reflections:** Share what worked, what didn’t, and what surprised you.
- 🛠 **One improvement:** Suggest something you’d try with more time (e.g., testing more models).

---

## 🎉 Congratulations!

You’ve just completed a real-world sentiment analysis project! You’ve learned to use two powerful tools, compare their results, and turn data into a story with charts. Even better, you’re thinking like a data scientist—asking questions, experimenting, and reflecting.

Keep exploring and stay curious. The skills you’ve gained here are the start of something big. The tech world is lucky to have you! 🌟