<a href="https://colab.research.google.com/github/nananair/Research-NLP-projects/blob/main/Distributional_analysis_toolkit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook accompanies the article: "Methodological pathways to studying the shareability of news values"

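It offers a quartile-based distributional analysis of the shareability of news values: articles are split into four equal-sized groups (quartiles) by share count, and the average score of each news value is compared across the quartiles.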
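As a worked formula, using notation introduced here for illustration (not taken from the article): let $s_a$ be the share count of article $a$, $x_{a,v}$ its score on news value $v$, and $c_0, \dots, c_4$ the 0th, 25th, 50th, 75th and 100th percentiles of the share counts. Ignoring ties, the quartiles and the average score compared across them are

$$
A_1 = \{a : c_0 \le s_a \le c_1\}, \qquad A_q = \{a : c_{q-1} < s_a \le c_q\} \quad (q = 2, 3, 4),
$$

$$
\bar{x}_{q,v} = \frac{1}{|A_q|} \sum_{a \in A_q} x_{a,v}.
$$

A news value whose $\bar{x}_{q,v}$ rises from $A_1$ to $A_4$ scores higher, on average, among the most-shared articles.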

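To use this notebook, run the cells in order and upload your labelled dataset (Excel .xlsx or CSV) when prompted. The dataset needs a "Share counts" column and one numeric column for each news value listed in the Analysis cell. No coding experience is required.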


In [None]:
import pandas as pd               # loading and grouping the dataset
import matplotlib.pyplot as plt   # figure setup and labels
import seaborn as sns             # line plots of the quartile averages


In [None]:
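# Upload dataset (this cell uses google.colab, so it only runs in Google Colab)
import io

from google.colab import files

print("Please upload your labelled dataset (Excel or CSV).")
uploaded = files.upload()  # dict mapping each uploaded file name to its contents (bytes)

if not uploaded:
    raise ValueError("No file was uploaded.")

# Only the first uploaded file is used
filename, content = next(iter(uploaded.items()))
if filename.lower().endswith(".xlsx"):
    df = pd.read_excel(io.BytesIO(content))
elif filename.lower().endswith(".csv"):
    df = pd.read_csv(io.BytesIO(content))
else:
    raise ValueError("File must be .xlsx or .csv")

print(f"Dataset loaded successfully: {filename} ({len(df)} rows)")
df.head()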
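The upload cell relies on `google.colab`, so it only runs in Colab. Outside Colab (for example in a local Jupyter session), a minimal sketch of the same step is to read the file from disk and pick the pandas reader by file extension. `load_dataset` is a hypothetical helper, and reading .xlsx files assumes the openpyxl package is installed.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path):
    """Read a labelled dataset from a local .xlsx or .csv file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()  # case-insensitive, so "data.CSV" also works
    if suffix == ".xlsx":
        return pd.read_excel(path)  # assumes openpyxl is installed
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError("File must be .xlsx or .csv")
```

Usage: `df = load_dataset("my_news_dataset.csv")`, where the file name is a placeholder for your own file.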


In [None]:
# Check dataset structure
# The dataset needs a numeric "Share counts" column and one numeric column per news value
# (e.g. "Negativity", "Impact"); extra columns such as "Article" are kept but not analysed.

df.info()
print(df.head())
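`df.info()` lists the columns but does not check them. A minimal sketch of an explicit check, assuming the column names used in the Analysis cell: `missing_columns` and `non_numeric_columns` are hypothetical helpers, and the toy table is made up for illustration.

```python
import pandas as pd

# Column names the Analysis cell expects
REQUIRED_COLUMNS = [
    "Share counts", "Negativity", "Positivity", "Impact", "Personalisation",
    "Eliteness", "Superlativeness", "Consonance", "Unexpectedness",
    "Proximity", "Timeliness",
]


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return required column names that are absent from df (hypothetical helper)."""
    return [col for col in required if col not in df.columns]


def non_numeric_columns(df, required=REQUIRED_COLUMNS):
    """Return required columns present in df whose dtype is not numeric."""
    return [
        col for col in required
        if col in df.columns and not pd.api.types.is_numeric_dtype(df[col])
    ]


# Toy table: most news values are missing and "Impact" holds text instead of numbers
toy = pd.DataFrame({"Share counts": [3, 10], "Negativity": [1, 0], "Impact": ["yes", "no"]})
print(missing_columns(toy))
print(non_numeric_columns(toy))
```

Running both checks on your own `df` before the Analysis cell turns a later `KeyError` or averaging error into a readable list of columns to fix.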


In [None]:
# Analysis

# News-value columns to average; each must be a numeric column in your dataset
news_values = [
    "Negativity",
    "Positivity",
    "Impact",
    "Personalisation",
    "Eliteness",
    "Superlativeness",
    "Consonance",
    "Unexpectedness",
    "Proximity",
    "Timeliness"
]

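# Split articles into four equal-sized groups by share count (Q1 = least shared, Q4 = most shared).
# Share counts are ranked first so that ties (e.g. many unshared articles) do not make qcut fail;
# tied articles that straddle a quartile boundary are assigned by row order.
df["Share Quartile"] = pd.qcut(
    df["Share counts"].rank(method="first"), q=4, labels=["Q1", "Q2", "Q3", "Q4"]
)

# Mean score of each news value within each quartile
avg_scores = df.groupby("Share Quartile", observed=False)[news_values].mean().reset_index()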

# Plot selected news values across the quartiles; add names from news_values to compare more
plot_values = ["Negativity", "Impact", "Personalisation"]

plt.figure(figsize=(8, 6))
for value in plot_values:
    sns.lineplot(data=avg_scores, x="Share Quartile", y=value, marker="o", label=value)
plt.title("Average News Values across Share Quartiles")
plt.xlabel("Share quartile (Q1 = least shared, Q4 = most shared)")
plt.ylabel("Average frequency")
plt.legend()
plt.show()
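Share counts are often heavily tied (many articles may never be shared at all). When several quartile cut points fall on the same value, `pd.qcut` raises a `ValueError` because the bin edges are not unique. The toy example below uses made-up counts to show the failure and one common workaround, ranking the counts with `rank(method="first")` before binning.

```python
import pandas as pd

# Made-up share counts: half of the twelve articles were never shared
shares = pd.Series([0, 0, 0, 0, 0, 0, 2, 5, 9, 40, 120, 300])
labels = ["Q1", "Q2", "Q3", "Q4"]

# The 0th and 25th percentiles are both 0, so the bin edges are not unique
try:
    pd.qcut(shares, q=4, labels=labels)
    raw_failed = False
except ValueError as err:
    raw_failed = True
    print("qcut on raw counts failed:", err)

# Ranking gives every article a distinct value; ties are broken by row order
quartiles = pd.qcut(shares.rank(method="first"), q=4, labels=labels)
print(quartiles.value_counts().sort_index())
```

Breaking ties by row order keeps four equal-sized groups, but two articles with the same share count can land in different quartiles. Passing `duplicates="drop"` to `pd.qcut` avoids the error differently, at the cost of producing fewer than four groups (so the four labels can no longer be used as-is).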


In [None]:
# Save the dataset, including the new "Share Quartile" column, to Excel and download it

df.to_excel("shareworthiness_results.xlsx", index=False)
files.download("shareworthiness_results.xlsx")
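The Excel export contains the article-level table only. If you also want the quartile averages behind the plot, a minimal sketch is to write both tables as CSV files. `save_results` and the output file names are hypothetical.

```python
from pathlib import Path


def save_results(df, avg_scores, out_dir="."):
    """Write article-level results and quartile averages to two CSV files.

    Hypothetical helper: the file names below are placeholders.
    """
    out_dir = Path(out_dir)
    articles_path = out_dir / "shareworthiness_results.csv"
    averages_path = out_dir / "quartile_averages.csv"
    df.to_csv(articles_path, index=False)          # one row per article
    avg_scores.to_csv(averages_path, index=False)  # one row per share quartile
    return articles_path, averages_path
```

In Colab, `save_results(df, avg_scores)` writes both files to the working directory, and each returned path can then be passed to `files.download` as a string.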


Done!  

You have successfully run a shareworthiness analysis on your dataset.  
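This notebook is fully reproducible and can be adapted to any dataset annotated with news values: change the column names in the Analysis cell ("Share counts" and the `news_values` list) to match your own.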
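Adapting the notebook to another dataset mostly means changing column names. A minimal sketch, assuming you want the quartile step as a reusable function: `quartile_profile` is a hypothetical helper (not part of the original notebook) that breaks share-count ties by row order, and the toy columns "shares" and "negativity" are made up.

```python
import pandas as pd


def quartile_profile(df, share_col, value_cols, labels=("Q1", "Q2", "Q3", "Q4")):
    """Average each value column within share-count quartiles (hypothetical helper).

    Share counts are ranked first (ties broken by row order) so that tied
    counts do not make qcut fail.
    """
    quartile = pd.qcut(
        df[share_col].rank(method="first"), q=len(labels), labels=list(labels)
    ).rename("Share Quartile")
    # Group rows by their quartile label and average the requested columns
    return df.groupby(quartile, observed=False)[value_cols].mean()


# Toy data whose column names differ from the notebook's (made up for illustration)
toy = pd.DataFrame({
    "shares": [0, 0, 3, 8, 15, 40, 90, 200],
    "negativity": [0, 0, 0, 1, 1, 1, 1, 1],
})
profile = quartile_profile(toy, "shares", ["negativity"])
print(profile)
```

With the notebook's own data, the equivalent call would be `quartile_profile(df, "Share counts", news_values)`.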
