# Sentiment Analysis

Goal: Scrape news from a website such as Bloomberg, separate the articles into categories, then assign each one a sentiment score.

spaCy was used for preprocessing.

The following libraries are to be explored:

1. VADER
2. TextBlob
3. Flair
4. Models - RoBERTa (HuggingFace), DistilBERT (HuggingFace)
5. LLM
6. Self Built (self-sourced Dataset)


## Exploring the Python libraries

In [6]:
import nltk
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('punkt_tab')

[nltk_data] Downloading package wordnet to C:\Users\Jay
[nltk_data]     Tai\AppData\Roaming\nltk_data...
[nltk_data] Downloading package stopwords to C:\Users\Jay
[nltk_data]     Tai\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
[nltk_data] Downloading package punkt_tab to C:\Users\Jay
[nltk_data]     Tai\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt_tab.zip.


True

In [7]:
# setup

sentence = "Trump to Leave G-7 Tonight Due to Middle East Crisis"

# Preprocessing

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Initialize NLTK tools
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# Preprocess text
tokens = word_tokenize(sentence.lower())  # Tokenize and lowercase
cleaned_tokens = [lemmatizer.lemmatize(token) for token in tokens if token.isalpha() and token not in stop_words]
test_sentence = " ".join(cleaned_tokens)
print(test_sentence)

trump leave tonight due middle east crisis
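The `token.isalpha()` filter in the cell above discards any token that contains a digit or a hyphen, which is why "G-7" is missing from the printed result. Below is a minimal sketch of a looser filter. It assumes plain whitespace splitting in place of `word_tokenize`, skips lemmatization, and uses a tiny hand-picked stop-word set in place of NLTK's list, so that it runs without any downloads. `keep_token` and `STOP_WORDS` are hypothetical names.

```python
import re

# Hypothetical stand-in for NLTK's English stop-word list (illustration only).
STOP_WORDS = {"to", "the", "a", "of", "in"}


def keep_token(token: str) -> bool:
    """Keep tokens with at least one letter, e.g. 'g-7', unless they are stop words."""
    return bool(re.search(r"[a-z]", token)) and token not in STOP_WORDS


headline = "Trump to Leave G-7 Tonight Due to Middle East Crisis"
tokens = headline.lower().split()  # whitespace split, not word_tokenize

# Same rule as the notebook cell: purely alphabetic tokens only.
strict = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
# Looser rule: hyphenated or alphanumeric tokens survive.
loose = [t for t in tokens if keep_token(t)]

print(" ".join(strict))  # trump leave tonight due middle east crisis
print(" ".join(loose))   # trump leave g-7 tonight due middle east crisis
```

Whether keeping tokens like "g-7" helps depends on the downstream model, so this is worth comparing rather than assuming.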


### 1. VADER

In [23]:
# Prebuilt Vader sentiment package

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
scores = analyzer.polarity_scores(test_sentence)
print(scores)

{'neg': 0.515, 'neu': 0.485, 'pos': 0.0, 'compound': -0.6486}
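The `compound` value is a normalized score between -1 and 1. The VADER documentation suggests treating 0.05 and above as positive, -0.05 and below as negative, and anything in between as neutral. A minimal sketch of that mapping follows. `label_from_compound` is a hypothetical helper, not part of the `vaderSentiment` package.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse label using the conventional cut-offs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Scores copied from the cell output above.
scores = {'neg': 0.515, 'neu': 0.485, 'pos': 0.0, 'compound': -0.6486}
print(label_from_compound(scores['compound']))  # negative
```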


### 2. TextBlob

In [24]:
# Prebuilt Textblob sentiment package

from textblob import TextBlob
text = TextBlob(test_sentence)
score = text.sentiment
print(score)

Sentiment(polarity=-0.0625, subjectivity=0.1875)


### 3. Flair

Flair is optimized for sequence labeling but also ships a prebuilt sentiment classifier.

In [None]:
# Prebuilt Flair sentiment package/Model

from flair.data import Sentence
from flair.nn import Classifier

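# Use a separate name so the raw `sentence` string from the setup cell is not overwritten
flair_sentence = Sentence(test_sentence)
tagger = Classifier.load('sentiment')
tagger.predict(flair_sentence)
print(flair_sentence)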

### 4. HuggingFace Transformers

In [None]:
from transformers import pipeline

#### - RoBERTa

In [None]:
classifier = pipeline('sentiment-analysis', model='cardiffnlp/twitter-roberta-base-sentiment')
result = classifier(test_sentence)
print(result)

#### - DistilBERT

In [None]:
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
result = classifier(test_sentence)
print(result)
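The two pipelines above return a list of `{'label': ..., 'score': ...}` dictionaries, but they use different label names, which makes them awkward to compare with VADER or TextBlob. The sketch below folds both into one signed score. It assumes the `cardiffnlp/twitter-roberta-base-sentiment` labels follow the model card (`LABEL_0` negative, `LABEL_1` neutral, `LABEL_2` positive) and that the SST-2 DistilBERT model emits `NEGATIVE`/`POSITIVE`. It is worth confirming both against `classifier.model.config.id2label`. `signed_score` is a hypothetical helper, and the example predictions are made up, not real model output.

```python
# Assumed label conventions; verify with classifier.model.config.id2label.
LABEL_SIGN = {
    "LABEL_0": -1.0,   # RoBERTa: negative
    "LABEL_1": 0.0,    # RoBERTa: neutral
    "LABEL_2": 1.0,    # RoBERTa: positive
    "NEGATIVE": -1.0,  # DistilBERT SST-2
    "POSITIVE": 1.0,   # DistilBERT SST-2
}


def signed_score(prediction: dict) -> float:
    """Turn one pipeline prediction into a score in [-1, 1]."""
    sign = LABEL_SIGN[prediction["label"].upper()]
    return sign * prediction["score"]


# Made-up predictions in the shape the pipeline returns (not real output).
example_roberta = [{"label": "LABEL_0", "score": 0.8}]
example_distilbert = [{"label": "NEGATIVE", "score": 0.9}]

print(signed_score(example_roberta[0]))     # -0.8
print(signed_score(example_distilbert[0]))  # -0.9
```

Multiplying the confidence by the sign is only one possible convention. A neutral prediction collapses to 0 regardless of its confidence.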

#### - Google FLAN-T5 base LLM (open source via HuggingFace)

In [None]:
classifier = pipeline("text2text-generation", model="google/flan-t5-base")
prompt = f"Classify the sentiment of '{test_sentence}' as positive, negative, or neutral, and give a sentiment score from -1 to 1."
result = classifier(prompt)
print(result)
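A `text2text-generation` pipeline returns free text under the `generated_text` key rather than a label and score, so the answer has to be parsed. The sketch below is a minimal parser that looks for one of the three label words and, if present, a number between -1 and 1. `parse_generation` is a hypothetical helper, and the sample strings are invented for illustration. A small model may well return only a single word.

```python
import re


def parse_generation(result: list) -> tuple:
    """Extract (label, score) from a text2text-generation result; either may be None."""
    text = result[0]["generated_text"].lower()

    label_match = re.search(r"\b(positive|negative|neutral)\b", text)
    label = label_match.group(1) if label_match else None

    score = None
    score_match = re.search(r"-?\d+(?:\.\d+)?", text)
    if score_match:
        value = float(score_match.group())
        # Only accept numbers inside the range the prompt asked for.
        if -1.0 <= value <= 1.0:
            score = value
    return label, score


# Invented outputs in the pipeline's return shape.
print(parse_generation([{"generated_text": "Negative"}]))        # ('negative', None)
print(parse_generation([{"generated_text": "negative, -0.6"}]))  # ('negative', -0.6)
```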

### 5. Sending API requests to an LLM via OpenRouter

In [8]:
# LLM Qwen

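import os

from openai import OpenAI

client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  # Read the key from the environment instead of hardcoding it in the notebook.
  # The variable name OPENROUTER_API_KEY is an assumption; use whatever name you export.
  api_key=os.environ["OPENROUTER_API_KEY"],
)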

completion = client.chat.completions.create(
  extra_body={},
  model="deepseek/deepseek-r1-0528-qwen3-8b:free",
  messages=[
    {
      "role": "user",
      "content": f"Conduct Sentimental Analysis on the following statement(s) and give me a polarity and score. {test_sentence}"
    }
  ]
)
print(completion.choices[0].message.content)

Okay, let's break down the statement: "trump leave tonight due middle east crisis".

1.  **Understanding the Statement:** This statement is saying that someone named Trump will leave tonight because there is a "middle east crisis". Assuming "leave" means departing from the Middle East, the context implies leadership withdrawal during a time of instability.

2.  **Sentiment Analysis:**
    *   **Polarity:** The inherent relationship described (领导人因危机离开 [a leader leaving because of a crisis]) generally carries negative connotations. Crises are disruptive, stressful, and often seen negatively. His departure *during* a crisis, especially if he holds a position of authority often considered stabilizing in such situations (like Commander-in-Chief), creates uncertainty. Sentiment analysis tools often interpret this as having a **slightly negative or negative polarity.**
    *   **Score:** Standard sentiment analysis gives scores (e.g., from -1 to +1). Most tools would lean towards the negative side.
        *   A simple analysis m
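The reply above is long free text, which is hard to turn into a number. One option is to ask for a small JSON object and parse it defensively. The sketch below shows that idea. `build_prompt` and `parse_reply` are hypothetical helpers, the JSON shape is an assumption, and the model is not guaranteed to follow it. The sample reply is invented, not real model output.

```python
import json
import re


def build_prompt(text: str) -> str:
    """Ask for a machine-readable answer instead of an essay."""
    return (
        "Classify the sentiment of the headline below. "
        'Reply with JSON only, in the form {"polarity": "negative", "score": -0.5}, '
        "where polarity is positive, negative, or neutral "
        "and score is between -1 and 1.\n"
        f"Headline: {text}"
    )


def parse_reply(reply: str):
    """Return {'polarity': str, 'score': float}, or None if no usable JSON is found."""
    # Grab the first flat {...} object; nested objects are not handled.
    match = re.search(r"\{.*?\}", reply, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group())
        return {
            "polarity": str(data["polarity"]).lower(),
            "score": float(data["score"]),
        }
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None


# Invented reply for illustration.
print(parse_reply('Sure! {"polarity": "Negative", "score": -0.6}'))
# {'polarity': 'negative', 'score': -0.6}
print(parse_reply("The statement feels negative."))  # None
```

The prompt would be passed as the `content` of the user message, and `parse_reply` applied to `completion.choices[0].message.content`.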

### 6. Building our own model from data taken from Kaggle

cnbc: (3080, 3) --> used for testing

guardian: (17800, 2)

reuters: (32770, 3)

In [None]:
import pandas as pd
cnbc = pd.read_csv("data/cnbc_headlines.csv")  # used as test data across the board
guardian = pd.read_csv("data/guardian_headlines.csv")
reuters = pd.read_csv("data/reuters_headlines.csv")

In [None]:
# preparing training data --> combining guardian and reuters together

cols = ['Headlines', 'Time']
train_set = pd.concat([guardian[cols], reuters[cols]], ignore_index=True)

train_set.shape

In [None]:
# preparing testing data

test_set = cnbc[['Headlines', 'Time']].copy()
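Combining sources can leave empty or duplicated headlines in the frames, which would distort both training and evaluation. The sketch below shows one possible cleanup pass. It runs on a small made-up frame rather than the Kaggle files, and `clean_headlines` is a hypothetical helper. Whether the real data contains such rows has not been checked here.

```python
import pandas as pd


def clean_headlines(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing, blank, and duplicate headlines, then renumber the rows."""
    out = df.dropna(subset=["Headlines"]).copy()
    out["Headlines"] = out["Headlines"].str.strip()
    out = out[out["Headlines"] != ""]
    return out.drop_duplicates(subset=["Headlines"]).reset_index(drop=True)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Headlines": ["Stocks rally ", "Stocks rally", None, "Oil slips"],
    "Time": ["t1", "t2", "t3", "t4"],
})

cleaned = clean_headlines(toy)
print(cleaned.shape)  # (2, 2)
```

The same function could be applied to `train_set` and `test_set` before any labels are assigned.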