This notebook walks you through getting Riveter scores out of a Riveter object and into a pandas DataFrame. To run it, you will need your own pickled Riveter object, which you can create using Riveter (Antoniak et al. 2023). You can learn more about using Riveter on its [GitHub page](https://github.com/maartensap/riveter-nlp/tree/main).

You can also use the [Python runfile available on GitHub](https://github.com/julianeugarten/CCLS2025/blob/main/riveter_runfile.py), which was used for the paper *A Powerful Hades is an Unpopular Dude: Dynamics of Power and Agency in Hades/Persephone Fanfiction* for the Conference of Computational Literary Studies 2025.

To run that file, you will need a CSV of work IDs associated with works on AO3.

The Riveter objects resulting from the analysis in that paper have not been shared in open access to preserve the privacy of the fanfiction community.

If you just want to explore the Riveter models analysed in that paper, you can skip this notebook and go straight to the next two, which import the scores from a CSV that is the output of this notebook.

## Import libraries

In [1]:
# importing the requirements

# standard library
from collections import defaultdict
import csv
import os
import pickle
import random

# third-party
import numpy as np
import pandas as pd
from riveter import Riveter


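# SPACY & COREF IMPORTS
import spacy
import spacy_experimental

# Load the base English pipeline and the experimental coreference pipeline
nlp = spacy.load("en_core_web_sm")
nlp_coref = spacy.load("en_coreference_web_trf")

# Give each coreference component its own copy of the transformer,
# so it no longer depends on a shared listener and can be moved to another pipeline
nlp_coref.replace_listeners("transformer", "coref", ["model.tok2vec"])
nlp_coref.replace_listeners("transformer", "span_resolver", ["model.tok2vec"])

# Add the coreference components to the base pipeline
nlp.add_pipe("coref", source=nlp_coref)
nlp.add_pipe("span_resolver", source=nlp_coref)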

<spacy_experimental.coref.span_resolver_component.SpanResolver at 0x7fc6b3a390a0>

I created the Riveter models from all stories under 10,000 characters in the HadPer subset, because coreference resolution limits the length of text that can be processed.
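
The length cutoff described above can be applied before any text reaches Riveter. A minimal sketch, assuming the stories are held in a dict mapping work ID to text; `select_short_texts`, `MAX_CHARS`, and the sample data are hypothetical names for illustration and not part of the original pipeline:

```python
MAX_CHARS = 10_000  # cutoff stated above; stories at or above this length are skipped


def select_short_texts(texts_by_id, max_chars=MAX_CHARS):
    """Return only the stories shorter than max_chars characters."""
    return {
        work_id: text
        for work_id, text in texts_by_id.items()
        if len(text) < max_chars
    }


# Hypothetical sample data: work IDs mapped to story text
sample_texts = {
    101: "Hades offered Persephone the pomegranate.",
    102: "x" * 12_000,  # longer than the cutoff, so it is dropped
}

short_texts = select_short_texts(sample_texts)
print(sorted(short_texts))
```

Filtering on character count is only a proxy for what the coreference model can handle; the threshold itself is the one reported above.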

## Loading models and data

In [2]:
with open("YOUR_RIVETER_OBJECT.pkl", 'rb') as f: # insert the name of your pickled Riveter object here.
    riveter = pickle.load(f)

In [4]:
# also load your csv of work-ids

df = pd.read_csv('YOUR_FILE_NAME_HERE.csv')

In [None]:
# quick sanity check
df.head()

## What are the power scores?

In [None]:
# Create a function to get the score for the desired entities
# Hades is used as an example here, assuming Riveter's power lexicon has been applied

def get_character_score(identifier):
    scores = riveter.get_scores_for_doc(identifier)  # get the word-score dictionary
    return scores.get('hades', None)  # get the score for 'hades', or None if not found

# Apply the function to each identifier and create a new column for the scores
df['hades_power'] = df['work_id'].apply(get_character_score).astype(float)
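
The same lookup generalizes to several characters at once, which is useful when you want scores for more than one entity. A sketch under assumptions: `collect_entity_scores` and `fake_scores` are hypothetical, and `get_scores` stands in for `riveter.get_scores_for_doc`, which is assumed (as in the cell above) to return a dict mapping entity names to scores:

```python
def collect_entity_scores(get_scores, work_ids, entities):
    """Build one row per work with a '<entity>_power' value for each entity.

    get_scores: callable taking a work ID and returning {entity: score};
                in this notebook that would be riveter.get_scores_for_doc (assumption).
    """
    rows = []
    for work_id in work_ids:
        scores = get_scores(work_id)
        row = {"work_id": work_id}
        for entity in entities:
            # None marks an entity that was not scored in this work
            row[f"{entity}_power"] = scores.get(entity)
        rows.append(row)
    return rows


# Stand-in scores so the sketch runs without a Riveter object (made-up values)
fake_scores = {
    1: {"hades": 0.25, "persephone": -0.5},
    2: {"hades": 0.1},
}

rows = collect_entity_scores(fake_scores.get, [1, 2], ["hades", "persephone"])
print(rows)
```

In the notebook, `pd.DataFrame(rows)` would turn these rows into a frame that can be merged with `df` on `work_id`; missing scores become NaN once the columns are cast to float.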

In [None]:
# You can get some descriptive statistics of the scores

df['hades_power'].dropna().describe()

## Get power differential

In [9]:
# If you have used the code above to extract power scores for two entities, you can calculate difference scores.
# Hades and Persephone are used as an example here; 'persephone_power' is assumed to have been
# created in the same way as 'hades_power'.

# Ensure both score columns are floats. Missing scores stay as NaN: assigning the result of
# .dropna() back to a column realigns on the index, so it would not remove any rows.
df['hades_power'] = df['hades_power'].astype(float)
df['persephone_power'] = df['persephone_power'].astype(float)

# Calculate the power difference; it is NaN wherever either score is missing
df['power_diff'] = df['hades_power'] - df['persephone_power']

This way, a *negative* power difference indicates that Persephone was the higher-powered entity, and a *positive* one indicates that Hades was.
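
To make the sign convention concrete, here is a small sketch in plain Python; `higher_powered` is a hypothetical helper and the scores are made up:

```python
import math


def higher_powered(hades_power, persephone_power):
    """Name the higher-powered entity from the sign of hades - persephone."""
    diff = hades_power - persephone_power
    if math.isnan(diff):
        return "unknown"  # at least one score is missing
    if diff > 0:
        return "hades"
    if diff < 0:
        return "persephone"
    return "equal"


print(higher_powered(0.25, -0.5))     # positive difference
print(higher_powered(-0.25, 0.5))     # negative difference
print(higher_powered(math.nan, 0.5))  # missing score
```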

In [12]:
df['power_diff'].describe()

## You can now save these scores to a csv

In [None]:
df.to_csv('scores.csv', index=False)

# In the notebooks that follow, my own csv of scores is titled 'CCLS2025.csv'