# 0. What kind of narrative are we talking about?

The most common approaches to narrative discovery are keyword-based and
topic-modeling methods. 
* Keyword-based   
treats a set of coherent individual tokens, or keywords (e.g., Lazard et al. 2015), as a narrative.

* Topic-modeling   
discovers “topics”, another proxy for narrative, each of which is a probability distribution over tokens. Latent Dirichlet Allocation (Blei, Ng, and Jordan 2003) and its variations (Yu et al. 2019; Yu and Qiu 2019), BERT-based models (Grootendorst 2022), and matrix factorization methods such as NMF (Lee and Seung 2000) are commonly used.


However, despite the great utility of these methods and the fact that keywords and topics are crucial elements of a narrative, **they are still not equivalent to narratives**.
One useful way to conceptualize narratives is that they contain **actors and motives** (Dourish and Gómez Cruz 2018). This is closely related to semantic role labeling (SRL),
which identifies triplets of <span style="font-size:1.5em;color:red">Action</span> (a verb), <span style="font-size:1.5em;color:green">Agent</span> (who initiates the action) and <span style="font-size:1.5em;color:blue">Patient</span> (the recipient of the action) from a sentence (Fillmore 1967). 

For example, the sentence <span style="font-size:1.5em">“I love this coffee shop”</span>   
contains the triplet   
<span style="font-size:1.5em;color:green">“I” (agent)</span>,  <span style="font-size:1.5em;color:red">“love” (action)</span>, <span style="font-size:1.5em;color:blue">“coffee shop” (patient)</span>
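
As a minimal sketch, a triplet like this can be held in a small named record so that later steps can refer to its parts by name. The class name `Triplet` and its field names are illustrative assumptions, not part of any library used in this tutorial.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """One (agent, action, patient) unit; the class name is illustrative."""
    agent: str    # who initiates the action
    action: str   # the verb
    patient: str  # the recipient of the action


# The coffee-shop example from the text above.
example = Triplet(agent="I", action="love", patient="coffee shop")
print(example)
```

Because a `NamedTuple` is hashable, such records can later be counted or used as dictionary keys when triplets are aggregated.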


# 1. Set up the environment

<div class="alert alert-block alert-info">
<b>Please follow:</b> <br>

For extract_triplets.ipynb <br>
    
* conda create -n ic2s2_narrative  <br>
* conda activate ic2s2_narrative  <br>
* conda install python=3.8 <br><br>
* conda install pytorch torchvision torchaudio cpuonly -c pytorch<br>
* conda install matplotlib <br>
* pip install allennlp<br>
* pip install ipywidgets<br>
* pip install protobuf==3.20.* <br>

* conda install -c anaconda ipykernel  <br>
* python -m ipykernel install --user --name=ic2s2_narrative <br>
    
    
For visual_narrative_network.ipynb <br>
* conda install networkx
* pip install pyvis

</div>
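
After installation, a quick check like the one below can confirm that the packages are visible from the notebook kernel. This is only a convenience sketch: the helper name `missing_packages` is hypothetical, and the list holds the import names assumed for the packages installed above.

```python
import importlib.util

# Import names assumed for the packages installed above.
REQUIRED = ["torch", "matplotlib", "allennlp", "ipywidgets", "networkx", "pyvis"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages(REQUIRED) or "none")
```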


# 2. What is Semantic Role Labeling?

For each verb (predicate) in a sentence, the aim is to identify all elements that fulfill a semantic role in relation to that verb and to determine their specific roles, such as Agent, Patient, or Instrument, as well as adjuncts such as Locative, Temporal, or Manner. For instance, in the sentence "Mary sold the book to John," the predicate is "sold." Here, "Mary" acts as the agent of the predicate, performing the action on "the book," which is transferred to the recipient "John." The diversity of sentence structures makes Semantic Role Labeling (SRL) a fascinating and challenging task.

<img src="SRL_demo.png" width="450">

### You might want to explore the online demo and experiment with it.

https://cogcomp.seas.upenn.edu/page/demo_view/SRLEnglish

# 3. Deploy SRL using AllenNLP

In [1]:
from allennlp.predictors import Predictor

In [11]:
sentence = "The NBA (National Basketball Association) is a professional basketball league in North America, featuring the best players in the world competing in exciting games."

### It would be better to download the pre-trained model locally beforehand.
https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz

In [None]:
local_path = "../structured-prediction-srl-bert.2020.12.15.tar.gz"
predictor = Predictor.from_path(local_path)

In [12]:
value = predictor.predict(sentence)

In [13]:
value

{'verbs': [{'verb': 'featuring',
   'description': 'The NBA ( National Basketball Association ) is [ARG0: a professional basketball league in North America] , [V: featuring] [ARG1: the best players in the world competing in exciting games] .',
   'tags': ['O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'B-ARG0',
    'I-ARG0',
    'I-ARG0',
    'I-ARG0',
    'I-ARG0',
    'I-ARG0',
    'I-ARG0',
    'O',
    'B-V',
    'B-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'I-ARG1',
    'O']},
  {'verb': 'competing',
   'description': 'The NBA ( National Basketball Association ) is a professional basketball league in North America , featuring [ARG0: the best players in the world] [V: competing] [ARGM-LOC: in exciting games] .',
   'tags': ['O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'O',
    'B-

### As you can see,   
the SRL predictor returns a JSON-like dictionary that lists the core verbs in the sentence and their corresponding semantic roles. In our example sentence, two core verbs were detected:

##### raw sentence:  
"The NBA (National Basketball Association) is a professional basketball league in North America, featuring the best players in the world competing in exciting games."

##### description:   
* [ARG0: a professional basketball league in North America] , **[V: featuring]** [ARG1: the best players in the world competing in exciting games] .


* [ARG0: the best players in the world] **[V: competing]** [ARGM-LOC: in exciting games] .
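
The `tags` lists in the output above use BIO encoding: `B-X` opens a span with label `X`, `I-X` continues it, and `O` marks tokens outside any span. The sketch below shows one way to turn a token list and its tags back into labeled spans. The function name `bio_to_spans` is hypothetical, and the toy tags are written by hand for illustration rather than produced by the model.

```python
def bio_to_spans(words, tags):
    """Group BIO tags into (label, text) spans, skipping 'O' tokens."""
    spans = []
    label, tokens = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("I-") and label == tag[2:]:
            tokens.append(word)        # continue the open span
            continue
        if label is not None:          # any other tag closes the open span
            spans.append((label, " ".join(tokens)))
            label, tokens = None, []
        if tag != "O":                 # "B-X" (or a stray "I-X") opens a new span
            label, tokens = tag[2:], [word]
    if label is not None:              # close a span that runs to the last token
        spans.append((label, " ".join(tokens)))
    return spans


# Hand-written toy input in the same tag format as the output above.
words = ["Mary", "sold", "the", "book", "to", "John", "."]
tags = ["B-ARG0", "B-V", "B-ARG1", "I-ARG1", "B-ARG2", "I-ARG2", "O"]
print(bio_to_spans(words, tags))
```

Working from the tags rather than the `description` string avoids parsing brackets, at the cost of needing the token list alongside each tag list.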

### AllenNLP uses PropBank Annotation,   
where each verb sense has numbered core arguments such as ARG0 and ARG1, along with modifier arguments such as ARGM-TMP (temporal) and ARGM-LOC (locative). For more details, please check [this article](https://medium.com/@ajazturki10/allennlp-dependency-parsing-semantic-role-labelling-coreference-resolution-70637afbe8cf). 

Given that we want to extract <span style="font-size:1.5em;color:red">Action</span> (a verb), <span style="font-size:1.5em;color:green">Agent</span> (who initiates the action), and <span style="font-size:1.5em;color:blue">Patient</span> (the recipient of the action) from a sentence, we focus on ARG0, the verb (V), and ARG1. <span style="font-size:1.5em;color:green">ARG0</span> refers to the <span style="font-size:1.5em;color:green">Agent</span>, and <span style="font-size:1.5em;color:blue">ARG1</span> refers to the <span style="font-size:1.5em;color:blue">Patient</span>.

Therefore, we keep only the first description, since it is the only one that contains all three parts we need (ARG0, V, and ARG1); the second has no ARG1. 
* [ARG0: a professional basketball league in North America] , **[V: featuring]** [ARG1: the best players in the world competing in exciting games] .
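
The filtering step described above can be sketched in a few lines of plain Python. The regular expression, the helper names, and the `sample_output` variable are assumptions made for illustration; `sample_output` copies only the `verb` and `description` fields of the two frames shown earlier.

```python
import re

# Matches bracketed spans such as "[ARG0: the best players]" in a description string.
SPAN_PATTERN = re.compile(r"\[([A-Z0-9-]+): ([^\]]+)\]")


def description_to_roles(description):
    """Map each role label to its text (a repeated label keeps its last span)."""
    return {label: text for label, text in SPAN_PATTERN.findall(description)}


def extract_triplets(srl_output):
    """Return (ARG0, V, ARG1) tuples for frames that contain all three roles."""
    triplets = []
    for frame in srl_output["verbs"]:
        roles = description_to_roles(frame["description"])
        if all(key in roles for key in ("ARG0", "V", "ARG1")):
            triplets.append((roles["ARG0"], roles["V"], roles["ARG1"]))
    return triplets


# The two descriptions from the example above, in the same dict layout.
sample_output = {
    "verbs": [
        {"verb": "featuring",
         "description": "The NBA ( National Basketball Association ) is "
                        "[ARG0: a professional basketball league in North America] , "
                        "[V: featuring] "
                        "[ARG1: the best players in the world competing in exciting games] ."},
        {"verb": "competing",
         "description": "The NBA ( National Basketball Association ) is a professional "
                        "basketball league in North America , featuring "
                        "[ARG0: the best players in the world] [V: competing] "
                        "[ARGM-LOC: in exciting games] ."},
    ]
}
print(extract_triplets(sample_output))
```

Only the "featuring" frame survives the filter, because the "competing" frame has an ARGM-LOC modifier but no ARG1.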



# 4. Aggregate narrative triplets

We collect narrative triplets with similar semantic meanings and then apply clustering methods to aggregate them into a more concise form. For implementation details, please refer to our paper [*Discovering Collective Narrative Shifts in Online Discussions*](https://ojs.aaai.org/index.php/ICWSM/article/view/31427) or visit the [GitHub repository](https://github.com/wanyingzhao/collective_narrative_shift/tree/main).

For example, the following paraphrases of the headline "CDC Confirms First US Coronavirus Case"

* The Centers for Disease Control and Prevention (CDC) has reported the initial known case of the coronavirus in the United States.
* The CDC has officially verified the first incidence of coronavirus within the US.
* In the United States, the initial case of coronavirus has been confirmed by the CDC.
* The CDC has confirmed the first documented case of coronavirus in the US.
* The first known case of coronavirus within the United States has been validated by the CDC.
* The CDC has announced that the first case of coronavirus within the US has been confirmed.
* A person in the United States has been identified with coronavirus, as confirmed by the CDC.
* The CDC has verified the first instance of coronavirus in the US.
* The initial incidence of coronavirus in the United States has been validated by the CDC.
* The CDC has confirmed the first recorded case of coronavirus within the United States.

could be aggregated into the triplet  

<span style="font-size:1.5em">(</span>
<span style="font-size:1.5em;color:green">CDC</span>
<span style="font-size:1.5em"> ,</span>
<span style="font-size:1.5em;color:red">confirm:1</span>
<span style="font-size:1.5em"> ,</span>
<span style="font-size:1.5em;color:blue">First US Coronavirus Case</span>
<span style="font-size:1.5em"> )</span>  

    ↓           ↓         ↓

<span style="font-size:1.5em">(</span>
<span style="font-size:1.5em;color:green">ARG0</span>
<span style="font-size:1.5em"> ,</span>
<span style="font-size:1.5em;color:red">Verb/Frame</span>
<span style="font-size:1.5em"> ,</span>
<span style="font-size:1.5em;color:blue">ARG1</span>
<span style="font-size:1.5em"> )</span>
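
To give a feel for what aggregation does, the toy sketch below maps surface variants of an agent and a verb to one canonical label and counts the results. It is not the clustering procedure from the paper: the lookup tables, the function names, and the hand-written raw triplets are all hypothetical stand-ins for groupings that a real pipeline would learn from data.

```python
from collections import Counter

# Hypothetical lookup tables standing in for learned clusters.
AGENT_ALIASES = {
    "the cdc": "CDC",
    "the centers for disease control and prevention (cdc)": "CDC",
}
VERB_FRAMES = {"confirmed": "confirm", "verified": "confirm", "validated": "confirm"}


def normalize(triplet):
    """Map the agent and verb of a raw triplet to canonical labels when known."""
    agent, verb, patient = triplet
    agent = AGENT_ALIASES.get(agent.lower(), agent)
    verb = VERB_FRAMES.get(verb.lower(), verb)
    return agent, verb, patient


def aggregate(triplets):
    """Count how often each normalized (agent, verb) pair occurs."""
    return Counter(normalize(t)[:2] for t in triplets)


# Hand-written raw triplets based on the paraphrases above (not model output).
raw = [
    ("The CDC", "confirmed", "the first documented case of coronavirus in the US"),
    ("The CDC", "verified", "the first instance of coronavirus in the US"),
    ("The Centers for Disease Control and Prevention (CDC)", "reported",
     "the initial known case of the coronavirus in the United States"),
]
print(aggregate(raw))
```

Exact-match tables like these miss any variant they do not list ("reported" stays separate here), which is why clustering on semantic similarity scales better than hand-built aliases.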

