<a href="https://colab.research.google.com/github/sahanyafernando/My_NLP_Learning/blob/main/Public_Response_Analysis/notebooks/05_ner_aspect_temporal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 05 – Entity Extraction, Aspect-Based Sentiment, and Temporal Analysis

This notebook performs transformer-based NER, simple aspect-based sentiment analysis
around policy topics, and temporal sentiment trend exploration.

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
import pickle, pathlib

artifacts_root = pathlib.Path("/content/drive/MyDrive/My_NLP_Learning/Public_Response_Analysis")
artifacts_path = artifacts_root / "artifacts/preprocessing_outputs.pkl"

if artifacts_path.exists():
    with open(artifacts_path, "rb") as f:
        artifacts = pickle.load(f)
    df = artifacts["df"]
    print("Loaded preprocessing artifacts and DataFrame.")
else:
    raise FileNotFoundError(
        "Artifacts not found. Please run 01_data_loading_and_preprocessing.ipynb first "
        "and execute the 'Save preprocessing artifacts' cell."
    )


## Transformer-based Named Entity Recognition (NER)

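We use `Davlan/xlm-roberta-base-ner-hrl`, a multilingual XLM-RoBERTa model fine-tuned for NER, to extract the entities mentioned in posts. Setting `aggregation_strategy="simple"` merges sub-word tokens into whole entity spans.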

In [None]:
!pip install -q transformers sentencepiece

from transformers import pipeline

ner_pipeline = pipeline(
    task="ner",
    model="Davlan/xlm-roberta-base-ner-hrl",
    aggregation_strategy="simple",
)

sample_texts = df["text"].head(5).tolist()
for text in sample_texts:
    print("\nText:", text)
    ents = ner_pipeline(text)
    print("Entities:", ents)
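
To move from per-post output to a corpus-level view, the entity dictionaries can be tallied by type and surface form. The sketch below is one way to do that, assuming each entity is a dict with `entity_group`, `word`, and `score` keys (the shape the pipeline returns when an aggregation strategy is set). The helper `count_entities`, its `min_score` parameter, and the sample data are hypothetical and not part of the notebook.

```python
from collections import Counter


def count_entities(entity_lists, min_score=0.0):
    """Tally (entity_group, word) pairs across the entity lists of many posts."""
    counts = Counter()
    for ents in entity_lists:
        for ent in ents:
            # Skip low-confidence predictions.
            if ent["score"] >= min_score:
                counts[(ent["entity_group"], ent["word"].strip())] += 1
    return counts


# Hypothetical pipeline output for two posts (not real model predictions).
fake_output = [
    [
        {"entity_group": "ORG", "word": "Parliament", "score": 0.98},
        {"entity_group": "LOC", "word": "Colombo", "score": 0.95},
    ],
    [
        {"entity_group": "ORG", "word": "Parliament", "score": 0.91},
        {"entity_group": "PER", "word": "Someone", "score": 0.30},
    ],
]

entity_counts = count_entities(fake_output, min_score=0.5)
print(entity_counts.most_common(3))
```

In the notebook itself, the input would be the list of `ner_pipeline(text)` results collected over the posts of interest.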


## Aspect-based sentiment around policy topics

As a simple proxy for aspect-based sentiment analysis (ABSA), we treat each post's `topic` as its aspect and reuse the existing sentiment labels:
for each topic, we tabulate the sentiment distribution and print a few example posts.

In [None]:
topic_sent = (
    df.groupby(["topic", "sentiment_label"])
    .size()
    .unstack(fill_value=0)
    .sort_index()
)

print("Sentiment distribution per topic:")
print(topic_sent)

for topic in df["topic"].unique():
    print(f"\n=== Topic: {topic} ===")
    subset = df[df["topic"] == topic]
    print("Example positive posts:")
    print(subset[subset["sentiment_label"] == "positive"]["text"].head(3).to_string(index=False))
    print("\nExample negative posts:")
    print(subset[subset["sentiment_label"] == "negative"]["text"].head(3).to_string(index=False))
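
Raw counts are hard to compare when topics differ in volume. The sketch below shows one possible normalization, a net sentiment score per topic (positive share minus negative share), written without pandas so the arithmetic is easy to inspect. The function `net_sentiment` and the sample rows are hypothetical; on the real data a similar result could be obtained by dividing each row of `topic_sent` by its row sum.

```python
from collections import Counter, defaultdict


def net_sentiment(rows):
    """Return {topic: (positive - negative) / total} for (topic, label) pairs."""
    per_topic = defaultdict(Counter)
    for topic, label in rows:
        per_topic[topic][label] += 1

    scores = {}
    for topic, counts in per_topic.items():
        total = sum(counts.values())
        # Neutral posts count toward the total but not the numerator.
        scores[topic] = (counts["positive"] - counts["negative"]) / total
    return scores


# Hypothetical (topic, sentiment_label) pairs.
rows = [
    ("tax", "positive"),
    ("tax", "negative"),
    ("tax", "negative"),
    ("tax", "neutral"),
    ("health", "positive"),
    ("health", "positive"),
]

topic_scores = net_sentiment(rows)
print(topic_scores)
```

A score near -1 means a topic is dominated by negative posts, near +1 by positive posts, and near 0 either balanced or mostly neutral.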


## Temporal sentiment trends

We convert timestamps to datetime, map the sentiment labels to numeric scores (-1, 0, 1),
and plot the daily average score to see how sentiment changes over time.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df_time = df.copy()
df_time["timestamp"] = pd.to_datetime(df_time["timestamp"])

# Map labels to numeric scores; labels missing from the map become NaN
# and are ignored by the mean below.
sentiment_map = {"negative": -1, "neutral": 0, "positive": 1}
df_time["sentiment_score"] = df_time["sentiment_label"].map(sentiment_map)

# Daily average sentiment; days without posts show up as NaN.
daily = df_time.set_index("timestamp").resample("D")["sentiment_score"].mean()

plt.figure(figsize=(10, 4))
daily.plot(marker="o")
plt.title("Average sentiment over time")
plt.xlabel("Date")
plt.ylabel("Average sentiment score")
plt.grid(True)
plt.show()
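
A daily mean computed from only one or two posts can swing widely. The sketch below illustrates one way to guard against that: aggregate scores per day and keep only days with a minimum number of posts. It uses the standard library only; the function `daily_mean_sentiment`, the `min_posts` parameter, and the sample records are assumptions for illustration and not part of the notebook.

```python
from collections import defaultdict
from datetime import datetime

SENTIMENT_MAP = {"negative": -1, "neutral": 0, "positive": 1}


def daily_mean_sentiment(records, min_posts=2):
    """Average sentiment per calendar day, dropping low-volume days.

    records: iterable of (ISO 8601 timestamp string, sentiment label).
    """
    by_day = defaultdict(list)
    for ts, label in records:
        if label not in SENTIMENT_MAP:
            continue  # ignore labels outside the mapping
        day = datetime.fromisoformat(ts).date()
        by_day[day].append(SENTIMENT_MAP[label])

    return {
        day: sum(scores) / len(scores)
        for day, scores in sorted(by_day.items())
        if len(scores) >= min_posts
    }


# Hypothetical records.
records = [
    ("2024-01-01T09:00:00", "positive"),
    ("2024-01-01T18:30:00", "negative"),
    ("2024-01-02T12:00:00", "negative"),  # only post that day, so dropped
    ("2024-01-03T08:00:00", "positive"),
    ("2024-01-03T09:00:00", "positive"),
    ("2024-01-03T10:00:00", "neutral"),
    ("2024-01-03T11:00:00", "unknown"),   # unmapped label, so ignored
]

daily_avg = daily_mean_sentiment(records, min_posts=2)
for day, score in daily_avg.items():
    print(day, round(score, 3))
```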


### Simple change-point style analysis

We flag days whose average sentiment lies more than one standard deviation above or below
the overall daily mean as potential change-points related to key events.

In [None]:
# Thresholds use the mean and sample standard deviation of the daily averages.
mean_sent = daily.mean()
std_sent = daily.std()
threshold = mean_sent + 1.0 * std_sent

print("Global mean sentiment:", mean_sent)
print("Global std sentiment:", std_sent)

anomalies = daily[daily > threshold]
print("\nPotential positive sentiment spikes:")
print(anomalies)

threshold_neg = mean_sent - 1.0 * std_sent
anomalies_neg = daily[daily < threshold_neg]
print("\nPotential negative sentiment drops:")
print(anomalies_neg)
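
The two threshold checks above can also be expressed as a single z-score test, which handles both directions in one pass and makes the cutoff easy to tune. The following is a minimal sketch under that framing, using the standard library's sample standard deviation (the same `ddof=1` convention as the pandas default). The function `flag_deviations`, the `z_threshold` parameter, and the sample values are hypothetical.

```python
from statistics import mean, stdev


def flag_deviations(daily_values, z_threshold=1.0):
    """Return {day: z-score} for days whose |z| exceeds z_threshold."""
    values = list(daily_values.values())
    if len(values) < 2:
        return {}  # a standard deviation needs at least two points
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return {}  # constant series: nothing deviates

    flagged = {}
    for day, value in daily_values.items():
        z = (value - mu) / sigma
        if abs(z) > z_threshold:
            flagged[day] = round(z, 2)
    return flagged


# Hypothetical daily average sentiment scores.
sample_daily = {
    "2024-01-01": 0.0,
    "2024-01-02": 0.1,
    "2024-01-03": -0.1,
    "2024-01-04": 0.9,
    "2024-01-05": 0.0,
}

flagged = flag_deviations(sample_daily, z_threshold=1.0)
print(flagged)
```

A positive z-score corresponds to a sentiment spike and a negative one to a drop. Because a single extreme day inflates both the mean and the standard deviation, this kind of global test is a rough screen, not a full change-point method.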
