# 0. What kind of narrative are we talking about?

The most common approaches to narrative discovery are keyword-based and
topic-modeling methods.

* **Keyword-based** methods treat a set of coherent individual tokens, or keywords, as a narrative (e.g., Lazard et al. 2015).

* **Topic-modeling** methods discover “topics”, another proxy for narrative, each of which is a probability distribution over tokens. Latent Dirichlet Allocation (Blei, Ng, and Jordan 2003) and its variations (Yu et al. 2019; Yu and Qiu 2019), including a BERT-based model (Grootendorst 2022), and matrix factorization methods like NMF (Lee and Seung 2000) are commonly used.


However, despite the great utility of these methods and the
fact that keywords and topics are crucial elements of a narrative, **they are still not equivalent to narratives**.
One useful way to conceptualize a narrative is that it contains **actors and motives** (Dourish and
Gómez Cruz 2018). This is intimately related to semantic role labeling (SRL),
which identifies triplets of action, agent, and patient:   

<img style="float: left;" src="../figs/Action_Agent_Patient.png" width="350">  



For example, the sentence **"I love this coffee shop"**    
contains the triplet     
<img style="float: left;" src="../figs/I_love_coffeeShop.png" width="450">
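
As a minimal sketch, such a triplet can be held in a small Python structure. The class name `NarrativeTriplet` and its field names are illustrative assumptions, not part of any library used in this notebook.

```python
from typing import NamedTuple


class NarrativeTriplet(NamedTuple):
    """Hypothetical container for one (agent, action, patient) triplet."""
    agent: str    # who performs the action
    action: str   # the verb (predicate)
    patient: str  # who or what the action is applied to


# The example sentence "I love this coffee shop" as a triplet.
triplet = NarrativeTriplet(agent="I", action="love", patient="this coffee shop")
print(triplet)
```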



# 1. Set up the environment

In [None]:
%pip install allennlp
%pip install allennlp-models
%pip install ordered-set
%pip install spacy
%pip install nltk
%pip install networkx pyvis pandas

In [None]:
!python -m spacy download en_core_web_sm

# 2. What is Semantic Role Labeling?

Semantic Role Labeling (SRL) takes each verb (predicate) in a sentence and aims to identify all elements that fulfill a semantic role in relation to it, determining their specific roles, such as Agent, Patient, or Instrument, as well as adjuncts, like Locative, Temporal, or Manner. For instance, in the sentence "Mary sold the book to John," the predicate is "sold." Here, "Mary" is the agent performing the action, "the book" is the patient being sold, and "John" is the recipient. The diversity of sentence structures makes SRL a fascinating and challenging task.

<img style="float: left;" src="../figs/SRL_demo.png" width="450">

### You might want to explore the online demo and experiment with it.

https://cogcomp.seas.upenn.edu/page/demo_view/SRLEnglish

# 3. Deploy SRL using AllenNLP

In [9]:
from allennlp.predictors import Predictor
import spacy
spacy.load('en_core_web_sm')

<spacy.lang.en.English at 0x7f118fddffa0>

In [10]:
sentence = "The NBA (National Basketball Association) is a professional basketball league in North America, featuring the best players in the world competing in exciting games."

### It would be better to download the pre-trained model locally beforehand.
https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz

In [None]:
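# Predictor.from_path accepts either a local archive path or a URL;
# with a URL, the archive is downloaded and cached on first use.
model_path = "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
predictor = Predictor.from_path(model_path)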
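
One way to follow the advice above and keep a local copy of the model is sketched below, using only the standard library. The helper name `ensure_local_copy` and the destination path `models/...` are assumptions made for illustration; the archive is large, so the download lines are left commented out.

```python
import os
import urllib.request

MODEL_URL = (
    "https://storage.googleapis.com/allennlp-public-models/"
    "structured-prediction-srl-bert.2020.12.15.tar.gz"
)


def ensure_local_copy(url: str, dest: str) -> str:
    """Download `url` to `dest` unless the file already exists; return `dest`."""
    if not os.path.exists(dest):
        parent = os.path.dirname(dest)
        if parent:
            os.makedirs(parent, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage (uncomment to run; the first call downloads the archive):
# local_path = ensure_local_copy(
#     MODEL_URL, "models/structured-prediction-srl-bert.2020.12.15.tar.gz"
# )
# predictor = Predictor.from_path(local_path)
```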

In [12]:
value = predictor.predict(sentence)

In [13]:
value

{'verbs': [{'verb': 'is',
   'description': '[ARG1: The NBA ( National Basketball Association )] [V: is] [ARG2: a professional basketball league in North America] , [ARGM-PRD: featuring the best players in the world competing in exciting games] .',
   'tags': ['B-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'B-V',
    'B-ARG2',
    'I-ARG2',
    'I-ARG2',
    'I-ARG2',
    'I-ARG2',
    'I-ARG2',
    'I-ARG2',
    'O',
    'B-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'I-ARGM-PRD',
    'O']},
  {'verb': 'featuring',
   'description': 'The NBA ( National Basketball Association ) is [ARG0: a professional basketball league in North America] , [V: featuring] [ARG1: the best players in the world competing in exciting games] .',
   'tags': ['O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
   

### As you can see,   
the SRL predictor returns a JSON-like structure (a Python dict) that lists each verb detected in the sentence together with its semantic roles. In our example sentence, besides the copula "is" shown above, two verbs were detected, "featuring" and "competing":

##### raw sentence:  
"The NBA (National Basketball Association) is a professional basketball league in North America, featuring the best players in the world competing in exciting games."

##### description:   
* [ARG0: a professional basketball league in North America] , **[V: featuring]** [ARG1: the best players in the world competing in exciting games] .

* [ARG0: the best players in the world] **[V: competing]** [ARGM-LOC: in exciting games] .
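
The `tags` list in each verb entry uses the BIO scheme (`B-` begins a span, `I-` continues it, `O` is outside any span). The sketch below shows one way to turn tokens and tags into a role-to-text mapping. The function name `bio_to_spans` is hypothetical, and the token list is written out by hand here; in the notebook the tokens are assumed to be available from the predictor output (for example under a `words` key).

```python
from typing import Dict, List, Optional


def bio_to_spans(words: List[str], tags: List[str]) -> Dict[str, str]:
    """Group BIO tags into {role: text}.

    Assumes each role appears at most once per verb frame; if a role
    repeats, the later span overwrites the earlier one.
    """
    spans: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            spans[current] = [word]
        elif tag.startswith("I-") and current == tag[2:]:
            spans[current].append(word)
        else:
            current = None  # "O" tag or an inconsistent "I-" tag
    return {role: " ".join(tokens) for role, tokens in spans.items()}


# Tokens and tags for the verb "is" from the output above.
words = (
    "The NBA ( National Basketball Association ) is a professional "
    "basketball league in North America , featuring the best players "
    "in the world competing in exciting games ."
).split()
tags = (
    ["B-ARG1"] + ["I-ARG1"] * 6
    + ["B-V"]
    + ["B-ARG2"] + ["I-ARG2"] * 6
    + ["O"]
    + ["B-ARGM-PRD"] + ["I-ARGM-PRD"] * 10
    + ["O"]
)
roles = bio_to_spans(words, tags)
print(roles)
```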

### AllenNLP uses PropBank Annotation,   
where each verb sense has numbered arguments such as ARG0 and ARG1, along with modifier arguments such as ARGM-TMP and ARGM-LOC. For more details, please check [this article](https://medium.com/@ajazturki10/allennlp-dependency-parsing-semantic-role-labelling-coreference-resolution-70637afbe8cf). 

Given that we want to extract   
<img style="float: left;" src="../figs/Action_Agent_Patient.png" width="350">  





And in Semantic Role Labeling, ARG0 typically corresponds to the agent and ARG1 to the patient:   
<img style="float: left;" src="../figs/ARG01.png" width="250"> 


Therefore, we keep the first description, since it contains a verb together with both an ARG0 and an ARG1. 
* [ARG0: a professional basketball league in North America] , **[V: featuring]** [ARG1: the best players in the world competing in exciting games] . 
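
The selection rule above can be sketched as a small filter over the `description` strings. This is an illustrative sketch, not the implementation from the paper: the function name `description_to_triplet` and the regular expression are assumptions, and the pattern would need adjusting if the sentence itself contains square brackets.

```python
import re
from typing import Dict, Optional, Tuple

# Matches bracketed spans such as "[ARG0: some text]".
ROLE_PATTERN = re.compile(r"\[([A-Z0-9-]+): ([^\]]*)\]")


def description_to_triplet(description: str) -> Optional[Tuple[str, str, str]]:
    """Return (agent, action, patient) if the frame has ARG0, V and ARG1, else None."""
    roles: Dict[str, str] = dict(ROLE_PATTERN.findall(description))
    if all(key in roles for key in ("ARG0", "V", "ARG1")):
        return roles["ARG0"], roles["V"], roles["ARG1"]
    return None


descriptions = [
    "[ARG0: a professional basketball league in North America] , [V: featuring] "
    "[ARG1: the best players in the world competing in exciting games] .",
    "[ARG0: the best players in the world] [V: competing] [ARGM-LOC: in exciting games] .",
]

# Keep only the frames that yield a complete triplet.
triplets = [t for t in map(description_to_triplet, descriptions) if t is not None]
print(triplets)
```

Only the "featuring" frame survives the filter; the "competing" frame has an ARG0 but no ARG1, so it is dropped.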

# 4. Aggregate narrative triplets

We collect narrative triplets with similar semantic meanings and then apply clustering methods to aggregate them into a more concise form. For the detailed method, please refer to our paper [*Discovering Collective Narrative Shifts in Online Discussions*](https://ojs.aaai.org/index.php/ICWSM/article/view/31427), and see the [GitHub repository](https://github.com/wanyingzhao/collective_narrative_shift/tree/main) for the implementation.

For example, the following variants of the sentence "CDC Confirms First US Coronavirus Case"

* The Centers for Disease Control and Prevention (CDC) has reported the initial known case of the coronavirus in the United States.
* The CDC has officially verified the first incidence of coronavirus within the US.
* In the United States, the initial case of coronavirus has been confirmed by the CDC.
* The CDC has confirmed the first documented case of coronavirus in the US.
* The first known case of coronavirus within the United States has been validated by the CDC.
* The CDC has announced that the first case of coronavirus within the US has been confirmed.
* A person in the United States has been identified with coronavirus, as confirmed by the CDC.
* The CDC has verified the first instance of coronavirus in the US.
* The initial incidence of coronavirus in the United States has been validated by the CDC.
* The CDC has confirmed the first recorded case of coronavirus within the United States.

could be aggregated into a single triplet:  

<img style="float: left;" src="../figs/cdc_confirm_us_covid.png" width="450"> 
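
To give a feel for the aggregation step, here is a deliberately simple sketch that groups triplets by token overlap (Jaccard similarity) with a greedy pass. It is not the clustering method from the paper: token overlap merely stands in for a semantic similarity measure, and the function names, the threshold of 0.5, and the hand-written example triplets are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Triplet = Tuple[str, str, str]  # (agent, action, patient)


def tokens(triplet: Triplet) -> Set[str]:
    """Lowercased set of whitespace-separated tokens in the triplet."""
    return set(" ".join(triplet).lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(triplets: List[Triplet], threshold: float = 0.5) -> List[List[Triplet]]:
    """Assign each triplet to the first cluster whose first member is similar enough."""
    clusters: List[List[Triplet]] = []
    for triplet in triplets:
        for cluster in clusters:
            if jaccard(tokens(triplet), tokens(cluster[0])) >= threshold:
                cluster.append(triplet)
                break
        else:
            clusters.append([triplet])  # no similar cluster found
    return clusters


# Hand-written example triplets (not actual model output).
examples = [
    ("The CDC", "confirmed", "the first case of coronavirus in the US"),
    ("The CDC", "verified", "the first instance of coronavirus in the US"),
    ("the CDC", "confirmed", "the first documented case of coronavirus in the US"),
    ("The NBA", "suspended", "the season"),
]
clusters = greedy_cluster(examples)
for cluster in clusters:
    print(len(cluster), cluster[0])
```

With these inputs the three CDC triplets fall into one cluster and the NBA triplet forms its own. A real pipeline would likely replace the token-overlap score with embedding-based similarity.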
