# Moral Foundations Classification of SCU Dilemma Dataset

This notebook classifies a dataset of dilemmas by moral foundation using the [MoralFoundationsClassifier](https://huggingface.co/MMADS/MoralFoundationsClassifier) model.

Workflow:

1. Cloning the classifier model from Hugging Face.
2. Loading the model, tokenizer, and label mappings.
3. Reading the dilemma dataset from a CSV file.
4. Running the classifier on each dilemma, handling long texts by chunking.
5. Averaging the model's predictions across text chunks for each moral foundation.
6. Saving the results, including the most probable moral foundation per dilemma, to an Excel file.
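
Step 4 relies on a sliding window over the token sequence. As a rough illustration of where the windows fall, the sketch below computes the token spans produced when a window of `chunk_size` tokens advances so that consecutive windows overlap by `stride` tokens. The helper name `window_spans` and the token counts are hypothetical; the sketch works on plain integers rather than on real tokenizer output.

```python
def window_spans(n_tokens, chunk_size=512, stride=256):
    """Return (start, end) token spans for overlapping windows.

    Hypothetical helper for illustration: windows advance by
    chunk_size - stride tokens, so neighbours overlap by `stride` tokens.
    """
    if stride >= chunk_size:
        raise ValueError("stride must be smaller than chunk_size")
    spans = []
    step = chunk_size - stride
    for start in range(0, n_tokens, step):
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break  # this window already reaches the end of the text
    return spans


short_spans = window_spans(300)    # fits in a single window
long_spans = window_spans(1000)    # needs several overlapping windows
print(short_spans)
print(long_spans)
```

A text of 300 tokens fits in one window, while 1000 tokens are covered by three overlapping windows. An empty text yields no windows at all, so any per-chunk average needs a guard against dividing by zero.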

In [None]:
!git clone https://huggingface.co/MMADS/MoralFoundationsClassifier

In [None]:
%pip install transformers torch pandas openpyxl

In [None]:
"""
Load the model, tokenizer, and label names
"""
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import json

model_path = "MoralFoundationsClassifier"
model = RobertaForSequenceClassification.from_pretrained(model_path)
tokenizer = RobertaTokenizer.from_pretrained(model_path)
# Label names, in the order of the model's output indices
with open(f"{model_path}/label_names.json") as f:
    label_names = json.load(f)

In [None]:
"""
Read the dilemma dataset
"""
import pandas as pd

input_path = "crafting_tech_8_business_ethics.csv"
df = pd.read_csv(input_path)


In [None]:
from transformers import TextClassificationPipeline

# 1. Build your pipeline once
pipeline = TextClassificationPipeline(
    model=model,
    tokenizer=tokenizer,
    framework="pt",
    top_k=None,                   # return a score for every label, not only the top one
    function_to_apply="sigmoid"   # score each label independently in [0, 1]
)

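# 2. Define a chunking function
def chunk_text(text, tokenizer, chunk_size=512, stride=256):
    """Split text into overlapping chunks of at most chunk_size tokens.

    Consecutive chunks overlap by `stride` tokens. The pipeline adds special
    tokens when it re-tokenizes a chunk, so a full chunk may be trimmed
    slightly by truncation=True in the pipeline call.
    """
    # Tokenize once without special tokens
    tokens = tokenizer(
        text,
        return_tensors="pt",
        add_special_tokens=False
    )["input_ids"][0]
    chunks = []
    # Advance by chunk_size - stride so neighbouring windows overlap
    for start in range(0, len(tokens), chunk_size - stride):
        end = start + chunk_size
        chunk_ids = tokens[start:end]
        # Decode back to text because the pipeline expects strings
        chunks.append(tokenizer.decode(chunk_ids, skip_special_tokens=True))
        if end >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks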

results = []
for _, row in df.iterrows():
    text = row["Content"]
    # 3. Split the text into chunks
    chunks = chunk_text(text, tokenizer, chunk_size=512, stride=256)

    # 4. Get a dict of label→score for each chunk
    chunk_scores = [
        {
            item["label"]: item["score"]
            for item in pipeline(
                chunk,
                truncation=True,
                max_length=512
            )[0]   # [0] unwraps the list-of-dicts for this single input
        }
        for chunk in chunks
    ]

    # 5. Average each label's score across all chunks
    # 5a. Remap each chunk's dict from LABEL_i → score to label_name → score
    #     (assumes the model config exposes the default LABEL_i ids)
    idx_to_label = {f"LABEL_{i}": name for i, name in enumerate(label_names)}
    mapped_chunk_scores = [
        {idx_to_label[k]: v for k, v in chunk.items()}
        for chunk in chunk_scores
    ]

    # 5b. Compute the average score per moral category
    avg_scores = {
        label: sum(d[label] for d in mapped_chunk_scores) / len(mapped_chunk_scores)
        for label in label_names
    }

    # 6. Merge with original row and collect
    results.append({**row.to_dict(), **avg_scores})

# 7. Save the averaged scores to an Excel file
out_df = pd.DataFrame(results)
output_path = "moral_classification_results.xlsx"
out_df.to_excel(output_path, index=False)
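
The remapping and averaging in step 5 can be checked by hand on a toy input. The sketch below uses two made-up label names and three made-up chunk score dictionaries; none of these numbers come from the model, and the `toy_` names are hypothetical so they do not overwrite the notebook's own variables.

```python
# Hypothetical label names and per-chunk scores, for illustration only.
toy_label_names = ["care_virtue", "care_vice"]
toy_chunk_scores = [
    {"LABEL_0": 0.75, "LABEL_1": 0.25},
    {"LABEL_0": 0.25, "LABEL_1": 0.75},
    {"LABEL_0": 0.5, "LABEL_1": 0.0},
]

# Map the generic LABEL_i keys to readable names.
toy_idx_to_label = {f"LABEL_{i}": name for i, name in enumerate(toy_label_names)}
toy_mapped = [
    {toy_idx_to_label[k]: v for k, v in chunk.items()}
    for chunk in toy_chunk_scores
]

# Unweighted mean of each label's score over the chunks.
toy_avg = {
    name: sum(chunk[name] for chunk in toy_mapped) / len(toy_mapped)
    for name in toy_label_names
}
print(toy_avg)
```

Because the mean is unweighted, a short final chunk counts as much as a full-length one.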


In [None]:
# Moral-foundation score columns:
moral_labels = [
    "care_virtue","care_vice",
    "fairness_virtue","fairness_vice",
    "loyalty_virtue","loyalty_vice",
    "authority_virtue","authority_vice",
    "sanctity_virtue","sanctity_vice"
]

# 1. Compute the max probability score per row
out_df["max_score"] = out_df[moral_labels].max(axis=1)

# 2. Identify which label gave that max score
out_df["max_label"] = out_df[moral_labels].idxmax(axis=1)

# 3. Move the two new columns to the end
cols = out_df.columns.tolist()
new_order = [c for c in cols if c not in ("max_score","max_label")] + ["max_score","max_label"]
out_df = out_df[new_order]

# 4. Save the enriched table to a new Excel file
output_path = "moral_classification_scored.xlsx"
out_df.to_excel(output_path, index=False)

print(f"Saved enriched results to {output_path}")
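
For a single row, the `max`/`idxmax` pair above amounts to picking the highest-scoring key of a dictionary. A minimal plain-Python sketch with invented scores (the `toy_` names and values are hypothetical):

```python
# Invented averaged scores for one dilemma, for illustration only.
toy_row_scores = {
    "care_virtue": 0.12,
    "fairness_vice": 0.61,
    "loyalty_virtue": 0.61,
    "authority_vice": 0.05,
}

# Highest-scoring label; on a tie, max() keeps the first key it encounters.
toy_max_label = max(toy_row_scores, key=toy_row_scores.get)
toy_max_score = toy_row_scores[toy_max_label]
print(toy_max_label, toy_max_score)
```

pandas documents the same first-occurrence behaviour for `idxmax`, so the column order in `moral_labels` decides which label is reported when two scores tie.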
