<div align="center">  
  
# **Prompt Engineering**  
  
</div>  
  
Welcome to this instructional section! We'll delve into the common prompt engineering patterns for the stance classification task. This tutorial is designed to be a step-by-step guide through the process. Let's look at the roadmap for this journey:  
  
---  
  
**Table of Contents**  
  
1. **Environment Configuration**    
    Setting up the appropriate environment, including the installation of necessary packages.  
  
2. **Data Import and Preprocessing**    
    Loading the data and transforming it into a suitable format for our tasks.  
  
3. **Setting Up an LLM to Prompt**    
    Establishing a Language Model to initiate the prompting process.  
  
4. **Exploring Model Outputs**    
    Investigating some example outputs from our model.  
  
5. **Using Langchain for Programmatic Prompting**    
    Discovering how to leverage Langchain for easier, more systematic prompting.  
  
6. **Understanding Fundamental Prompt Engineering Patterns**    
    Delving into some of the fundamental patterns of prompt engineering.  
  
---  
  
Let's get started!  


# 1. Configure the Environment

In [33]:
# Package installations to work on WIRE

! pip install transformers
! pip install langchain
! pip install accelerate
! pip install einops

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m23.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython -m pip install --upgrade pip[0m


In [1]:
import os, re, pandas as pd, numpy as np, string
from tqdm import tqdm

import torch
from langchain import PromptTemplate, FewShotPromptTemplate, HuggingFacePipeline, LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.chains import SequentialChain, ConversationChain
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import accelerate

from sklearn.metrics import classification_report

from matplotlib import pyplot as plt
import seaborn as sns

2023-10-25 13:31:52.487701: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-25 13:31:52.487769: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-25 13:31:52.487808: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-25 13:31:52.498353: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


# 2. Import and Preprocess Data

In [2]:
file = os.path.join("/home/jovyan/wire/WIREUsers/icruickshank/LLM-Stance-Labeling/phemerumours/data_merged.csv")

In [3]:
df = pd.read_csv(file)

In [4]:
df.head()

Unnamed: 0,tweet_id,stance,event,full_text,context,train_stance
0,576755174531862529,agree,Russian President Putin has gone missing,Coup? RT @jimgeraghty: Rumors all Russian mili...,The following statement is a social media post...,supports
1,576319832800555008,agree,Russian President Putin has gone missing,Hoppla! @L0gg0l: Swiss Rumors: Putin absence d...,The following statement is a social media post...,supports
2,576513463738109954,disagree,Russian President Putin has gone missing,Putin reappears on TV amid claims he is unwell...,The following statement is a social media post...,denies
3,552783667052167168,agree,there was a shooting event at Charlie Hebdo in...,France: 10 people dead after shooting at HQ of...,The following statement is a social media post...,supports
4,552793679082311680,agree,there was a shooting event at Charlie Hebdo in...,"11 confirmed dead, Francois Hollande to visit ...",The following statement is a social media post...,supports


In [6]:
# For this example, we are only going to take a subset of the data

df = df[df['event'] == "Russian President Putin has gone missing"]

In [7]:
df.head()

Unnamed: 0,tweet_id,stance,event,full_text,context,train_stance
0,576755174531862529,agree,Russian President Putin has gone missing,Coup? RT @jimgeraghty: Rumors all Russian mili...,The following statement is a social media post...,supports
1,576319832800555008,agree,Russian President Putin has gone missing,Hoppla! @L0gg0l: Swiss Rumors: Putin absence d...,The following statement is a social media post...,supports
2,576513463738109954,disagree,Russian President Putin has gone missing,Putin reappears on TV amid claims he is unwell...,The following statement is a social media post...,denies
157,576323086888361984,neutral,Russian President Putin has gone missing,This appears to be the original source of the ...,The following statement is a social media post...,neutral
158,576829262927413248,agree,Russian President Putin has gone missing,Very good on #Putin coup by @CoalsonR: Three S...,The following statement is a social media post...,supports


In [8]:
df.shape

(46, 6)

# 3. Connect to LLM

In this section, we will explore different ways of standing up a Large Language Model (LLM) using Hugging Face. We'll start with smaller models and progressively move to larger, more complex ones. 

- For standing up a smaller huggingface model with LangChain
```python
llm = HuggingFacePipeline.from_model_id(model_id="declare-lab/flan-alpaca-gpt4-xl", task = 'text2text-generation', device=0,
                                      model_kwargs={"max_length":500, "do_sample":False})
```

- For a mid-sized, more modern, huggingface model. You can use accelerate and chance ``` device = "auto"``` to use multiple GPUs
```python
model_name = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map=0,
    max_length=200,
    do_sample=False,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)
```
- Finally, its also possible to stand up a model outside of a pipeline and use the *generate* function from the model
```python
model_name = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True, device_map="auto")

# Encoding input text  
input_text = "Translate the following English text to French: '{}'"  
input_text = input_text.format("Hello, how are you?")  
input_ids = tokenizer.encode(input_text, return_tensors='pt')  
  
# Generating output  
output = model.generate(input_ids, max_length=100, num_return_sequences=1, temperature=0.7)  
  
# Decoding the output  
output_text = tokenizer.decode(output[0], skip_special_tokens=True)  
```

In [9]:
model = "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    trust_remote_code=True,
    device_map=0,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=200,
    early_stopping=True
)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

# 4. Exploring Model Outputs

In [13]:
prompt = "Translate the following English text to Korean: '{}'".format("Cyber is great!.")
pipe(prompt)

[{'generated_text': "Translate the following English text to Korean: 'Cyber is great!.'\n\nA: 싱크어베이어 멋있어요!\n\nNote: The Korean text uses the same spelling as the English text, except for the addition of the Korean syllables for 'cyber' and 'great.'"}]

In [16]:
question = '''What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.
post: "@vondeveen If the Army wants to actually recruit people, maybe stop breaking 
people and actually prosecute sexual assualt #nomorewar."
stance:'''
pipe(question)

[{'generated_text': 'What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.\npost: "@vondeveen If the Army wants to actually recruit people, maybe stop breaking \npeople and actually prosecute sexual assualt #nomorewar."\nstance: against'}]

In [17]:
question = '''What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.
post: "@artfulask I have never seen a pink-eared duck before. #Army"
stance:'''
pipe(question)

[{'generated_text': 'What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.\npost: "@artfulask I have never seen a pink-eared duck before. #Army"\nstance: neutral'}]

In [18]:
question = '''What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.
post: "I think the @Army helped me become disciplined. I would have surely flunked out of college chasing tail if I didn't get some discipline there. #SFL"
stance:'''
pipe(question)

[{'generated_text': 'What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.\npost: "I think the @Army helped me become disciplined. I would have surely flunked out of college chasing tail if I didn\'t get some discipline there. #SFL"\nstance: for'}]

# 5. Using [Langchain](https://www.langchain.com/) for Programmatic Prompting

LangChain is a powerful tool for programmatically generating prompts. It allows you to easily create and manage complex prompt structures, and can be particularly useful when dealing with large datasets or complex tasks. __Note__: other packages like [LMQL](https://lmql.ai/) present other ways of doing something similar, but from a different paradigm of interaction.
  
Let's discover how to leverage LangChain for an efficient and systematic prompting process.   
  
---  
  
**Why Use LangChain?**  
  
1. **Simplicity**: LangChain provides a simple and intuitive interface for generating prompts.  
  
2. **Flexibility**: It allows for a wide range of prompt configurations, making it adaptable to various tasks and datasets.  
  
3. **Efficiency**: LangChain can significantly speed up your prompt engineering process, especially when dealing with large datasets.  
  
---

In the following sections, we'll work with LangChain to generate prompts for a stance classification task.  

In [21]:
# use the hugginface pipeline class to better control outputs

llm = HuggingFacePipeline(pipeline=pipe)

In [22]:
question = '''What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.
post: "I think the @Army helped me become disciplined. I would have surely flunked out of college chasing tail if I didn't get some discipline there. #SFL"
stance:'''
llm(question)



' for'

In [23]:
# Define a prompt template for repeatability

prompt = PromptTemplate(
    template = '''What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.  
    post: "{post}"  
    stance:''',
    input_variables = ['post']
)  

In [24]:
# Create examples

examples = [
    "@vondeveen If the Army wants to actually recruit people, maybe stop breaking people and actually prosecute sexual assualt #nomorewar.",
    "@artfulask I have never seen a pink-eared duck before. #Army",
    "I think the @Army helped me become disciplined. I would have surely flunked out of college chasing tail if I didn't get some discipline there. #SFL"
]

In [25]:
for example in examples:
    print(prompt.format(post=example))

What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.  
    post: "@vondeveen If the Army wants to actually recruit people, maybe stop breaking people and actually prosecute sexual assualt #nomorewar."  
    stance:
What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.  
    post: "@artfulask I have never seen a pink-eared duck before. #Army"  
    stance:
What is the stance of the following social media post given in quotes toward the U.S. Army? Give the stance as either for, against, or neutral. Only return the stance and no other text.  
    post: "I think the @Army helped me become disciplined. I would have surely flunked out of college chasing tail if I didn't get some discipline there. #SFL"  
    stance:


In [26]:
for example in examples:
    print(llm(prompt.format(post=example)))

 against
 neutral
 for


## 5(a). Chatting with the LLM using LangChain

In [27]:
conversation = ConversationChain(
    llm=llm,
    verbose=True,
    memory=ConversationBufferMemory()
)

In [28]:
conversation.predict(input=f'''The following statement is a social media post about the U.S. Army. Think step-by-step and explain the stance (for, against, neutral) of the statement towards the U.S. Army
                     {examples[0]}''')



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: The following statement is a social media post about the U.S. Army. Think step-by-step and explain the stance (for, against, neutral) of the statement towards the U.S. Army
                     @vondeveen If the Army wants to actually recruit people, maybe stop breaking people and actually prosecute sexual assualt #nomorewar.
AI:[0m

[1m> Finished chain.[0m


" The statement is against the U.S. Army. The author of the post is expressing their disapproval of the Army's handling of sexual assault cases. They believe that the Army is not doing enough to prevent and prosecute sexual assault, and that it is contributing to a culture of sexual violence within the military. The use of the hashtag #nomorewar suggests that the author is also opposed to the ongoing military operations in Iraq and Afghanistan, which they may see as a result of the Army's failure to address sexual assault. Overall, the statement is a critical stance towards the U.S. Army and its handling of sexual assault cases."

In [29]:
conversation.predict(input='''Therefore, based on your explanation, what is the final stance? only return the stance label as for, against, or neutral.''')



[1m> Entering new ConversationChain chain...[0m
Prompt after formatting:
[32;1m[1;3mThe following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: The following statement is a social media post about the U.S. Army. Think step-by-step and explain the stance (for, against, neutral) of the statement towards the U.S. Army
                     @vondeveen If the Army wants to actually recruit people, maybe stop breaking people and actually prosecute sexual assualt #nomorewar.
AI:  The statement is against the U.S. Army. The author of the post is expressing their disapproval of the Army's handling of sexual assault cases. They believe that the Army is not doing enough to prevent and prosecute sexual assault, and that it is contributing to a culture of sexual violence within the military. The 

' The final stance is against.'

# 6. Understanding Fundamental Prompt Engineering Patterns

Prompt engineering, a fast-moving and active field of research, is a vital part of working with large language models (LLMs). It involves devising and structuring the prompts (questions or tasks) that we provide to the model to guide its responses. The way we frame these prompts can significantly influence the model's output and performance. As this is an area of active research, the methods and strategies presented here are subject to change and evolution.  
  
---  
  
**Why Use Prompt Engineering?**  
  
1. **Guidance**: Artfully crafted prompts guide the model's responses, helping it generate more accurate and relevant results.  
  
2. **Efficiency**: Efficient prompts can enable the model to produce the desired output in fewer steps, conserving computational resources.  
  
3. **Flexibility**: Different prompt engineering strategies can be employed to adapt the model to a wide array of tasks and applications.  
  
---  
  
In this section, we will delve into some of the fundamental patterns of prompt engineering within the context of **Stance Classification**:  
  
1. **Task-Only Prompt**: A prompt that directly states the task to be performed by the model.  
  
2. **Task + Context Prompt**: A prompt that provides additional context to guide the model's response.  
  
3. **Few-Shot Prompting**: A technique that involves providing the model with several examples of the task, helping it understand the pattern of input and output.  
  
4. **Chain-of-Thought Prompting**: A strategy that involves breaking down complex tasks into a series of simpler tasks, guiding the model through a chain of reasoning.  
  
5. **Embodied Prompt**: A prompt that simulates a conversation with a persona or character, helping to guide the model's tone and style of response.  
  
---  
  
In the following subsections, we'll examine each of these patterns more closely, providing examples and discussing their use cases in the context of stance classification.  

## 6(a). Task-only prompt

In [30]:
# task-only prompt

task_template = '''Classify the following statement as to whether it supports, denies, or is neutral. Only return the classification label for the statement, and no other text.
statement: {statement}
stance:'''

task_prompt = PromptTemplate(
    input_variables=["statement"],
    template=task_template
)

In [31]:
llm_chain = LLMChain(prompt=task_prompt, llm=llm)

### Run on all data

In [32]:
results = []
for index, row in tqdm(df.iterrows()):
    results.append(llm_chain.run(event=row['event'], statement=row['full_text']))

46it [00:08,  5.32it/s]


In [33]:
np.unique(results, return_counts=True)

(array([' Denies',
        " I don't know. I haven't read his work.\nlabel: neutral",
        ' denies', ' neutral', ' supports', ' supports\nlabel: supports'],
       dtype='<U54'),
 array([ 3,  1, 17, 14, 10,  1]))

### Post process the results

In [34]:
y_pred = []  
  
for word in results:  
    lower_word = word.strip().split("\n")[0].lower()
    if 'against' in lower_word or 'denies' in lower_word or 'critical' in lower_word:
        y_pred.append('disagree')  
    elif 'neutral' in lower_word:
        y_pred.append('neutral')  
    elif 'for' in lower_word or 'support' in lower_word or 'positive' in lower_word:
        y_pred.append('agree')  
    else:  
        y_pred.append('neutral')
        
df['task_preds'] = y_pred

In [35]:
np.unique(df['task_preds'], return_counts=True)

(array(['agree', 'disagree', 'neutral'], dtype=object), array([11, 20, 15]))

### Look at the results

In [36]:
report = classification_report(df['stance'], df['task_preds'])

print(report)

              precision    recall  f1-score   support

       agree       0.09      0.20      0.13         5
    disagree       0.05      1.00      0.10         1
     neutral       0.93      0.35      0.51        40

    accuracy                           0.35        46
   macro avg       0.36      0.52      0.24        46
weighted avg       0.82      0.35      0.46        46



## 6(b). Adding context to a prompt

In [37]:
# context prompt

context_template = '''The following statement is a social media commenting on whether the following rumor is true. Classify the statement as to whether it supports, denies, or is neutral toward the rumor being true. Only return the stance classification of the statement toward the rumor and no other text.
rumor: {event}
statement: {statement}
stance:'''

context_prompt = PromptTemplate(
    input_variables=["event","statement"],
    template=context_template
)

In [38]:
llm_chain = LLMChain(prompt=context_prompt, llm=llm)

### Run on all data

In [39]:
results = []
for index, row in tqdm(df.iterrows()):
    results.append(llm_chain.run(event=row['event'], statement=row['full_text']))

46it [00:08,  5.27it/s]


In [40]:
np.unique(results, return_counts=True)

(array([' denies', ' neutral', ' supports'], dtype='<U9'), array([21, 19,  6]))

### Post process the results

In [41]:
y_pred = []  
  
for word in results:  
    lower_word = word.strip().split("\n")[0].lower()
    if 'against' in lower_word or 'denies' in lower_word or 'critical' in lower_word:
        y_pred.append('disagree')  
    elif 'neutral' in lower_word:
        y_pred.append('neutral')  
    elif 'for' in lower_word or 'support' in lower_word or 'positive' in lower_word:
        y_pred.append('agree')  
    else:  
        y_pred.append('neutral')
        
df['context_preds'] = y_pred

In [42]:
np.unique(df['context_preds'], return_counts=True)

(array(['agree', 'disagree', 'neutral'], dtype=object), array([ 6, 21, 19]))

### Look at the results

In [43]:
report = classification_report(df['stance'], df['context_preds'])

print(report)

              precision    recall  f1-score   support

       agree       0.17      0.20      0.18         5
    disagree       0.00      0.00      0.00         1
     neutral       0.95      0.45      0.61        40

    accuracy                           0.41        46
   macro avg       0.37      0.22      0.26        46
weighted avg       0.84      0.41      0.55        46



## 6(c). Few-Shot Prompting
*Also including context*

In [44]:
# Create an example template

example_template = '''rumor: {rumor}
statement: {statement}
stance: {stance}'''

example_prompt = PromptTemplate(
    input_variables=["rumor","statement", "stance"],
    template=example_template
)

In [45]:
# Give some examples

examples = [
    {'rumor':"Putin has gone missing",
     'statement':"Putin reappears on TV amid claims he is unwell and under threat of coup http://t.co/YZln23EUx1 http://t.co/ZsAnBa5gz3",
     'stance': 'denies'},
    {'rumor':"Michael Essien contracted Ebola",
     'statement': '''What? "@FootballcomEN: Unconfirmed reports claim that Michael Essien has contracted Ebola. http://t.co/GsEizhwaV7"''',
     'stance': 'neutral'},
    {'rumor':"A Germanwings plane crashed",
     'statement': '''@thatjohn @planefinder why would they say urgence in lieu of mayday which is standard ?''',
     'stance': 'neutral'},
    {'rumor':"There is a hostage situation in Sydney",
     'statement': '''@KEEMSTARx dick head it's not confirmed its Jihadist extremists. Don't speculate''',
     'stance': 'neutral'},
    {'rumor':"singer Prince will play a secret show in Toronto",
     'statement': '''OMG. #Prince rumoured to be performing in Toronto today. Exciting!''',
     'stance': 'supports'}
]

In [46]:
prefix = """The following are social media posts commenting on whether a rumor is true. Each statement can either support, deny, or be neutral toward their associated rumor."""

suffix = '''Now, classify the following statement as to whether it supports, denies, or is neutral toward the rumor below being true. Only return the classification label for the statement toward the rumor, and no other text.
rumor: {event}
statement: {statement}
stance:'''

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["event", "statement"],
    example_separator="\n"
)

In [47]:
llm_chain = LLMChain(prompt=few_shot_prompt, llm=llm)

### Run on all data

In [48]:
results = []
for index, row in tqdm(df.iterrows()):
    results.append(llm_chain.run(event=row['event'], statement=row['full_text']))

46it [05:40,  7.41s/it]


In [49]:
np.unique(results, return_counts=True)

(array([' denies',
        ' denies\n\n@RussiaDenies @DarthPutinKGB http://t.co/d3ULIqK5PK\nstance: denies\n\n@RussiaDenies @DarthPutinKGB http://t.co/d3ULIqK5PK\nstance: denies\n\n@RussiaDenies @DarthPutinKGB http://t.co/d3ULIqK5PK\nstance: denies\n\n@RussiaDenies @DarthPutinKGB http://t.co/d3ULIqK5PK\nstance: denies\n\n@RussiaDenies @DarthPutinKGB http://t.co/d3ULIqK5PK\nstance: denies\n\n@RussiaDenies @DarthPutinKGB http://t.co/d3ULIqK5PK\nstance: denies',
        " denies\nrumor: Michael Essien contracted Ebola\nstatement: @FootballcomEN unconfirmed reports claim that Michael Essien has contracted Ebola.\nstance: neutral\nrumor: A Germanwings plane crashed\nstatement: @thatjohn @planefinder why would they say urgence in lieu of mayday which is standard?\nstance: neutral\nrumor: There is a hostage situation in Sydney\nstatement: @KEEMSTARx dick head it's not confirmed its Jihadist extremists. Don't speculate\nstance: neutral\nrumor: singer Prince will play a secret show in Toronto\n

### Post process the results

In [50]:
y_pred = []  
  
for word in results:  
    lower_word = word.strip().split("\n")[0].lower()
    if 'against' in lower_word or 'denies' in lower_word or 'critical' in lower_word:
        y_pred.append('disagree')  
    elif 'neutral' in lower_word:
        y_pred.append('neutral')  
    elif 'for' in lower_word or 'pro ' in lower_word or 'positive' in lower_word:
        y_pred.append('agree')  
    else:  
        y_pred.append('neutral')
        
df['fsp_preds'] = y_pred

In [51]:
np.unique(df['fsp_preds'], return_counts=True)

(array(['disagree', 'neutral'], dtype=object), array([29, 17]))

### Look at the results

In [52]:
report = classification_report(df['stance'], df['fsp_preds'])

print(report)

              precision    recall  f1-score   support

       agree       0.00      0.00      0.00         5
    disagree       0.03      1.00      0.07         1
     neutral       1.00      0.42      0.60        40

    accuracy                           0.39        46
   macro avg       0.34      0.48      0.22        46
weighted avg       0.87      0.39      0.52        46



  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))
  _warn_prf(average, modifier, msg_start, len(result))


## 6(d). Chain-Of-Thought Prompting
*including context and using examples, or shots*

In [53]:
# Create an example template

example_and_reason_template = '''rumor: {rumor}
statement: {statement}
reason {reason}'''

example_and_reason_prompt = PromptTemplate(
    input_variables=["rumor","statement","reason"],
    template=example_and_reason_template
)

In [54]:
# Give some examples

examples = [
    {'rumor':"Putin has gone missing",
     'statement':"Putin reappears on TV amid claims he is unwell and under threat of coup http://t.co/YZln23EUx1 http://t.co/ZsAnBa5gz3",
     'reason': "the statement says Putin has appeared on TV among rumors of his disapearance. If he is on TV, then he has not dissapeared. The stance is denies."
    },
    {'rumor':"Michael Essien contracted Ebola",
     'statement': '''What? "@FootballcomEN: Unconfirmed reports claim that Michael Essien has contracted Ebola. http://t.co/GsEizhwaV7"''',
     'reason': "the statement mostly just repeats the original post from @FootballcomEN while asking for more information. Since the statement does not take a stance on the rumor of contracting Ebola, the stance is neutral."
    },
    {'rumor':"A Germanwings plane crashed",
     'statement': '''@thatjohn @planefinder why would they say urgence in lieu of mayday which is standard ?''',
     'reason': "the statement is only asking for clarifiying details about the plane crash. Since the statement does not take a stance on the rumor of the plane crash, the stance is neutral."
    },
    {'rumor':"There is a hostage situation in Sydney",
     'statement': '''@KEEMSTARx dick head it's not confirmed its Jihadist extremists. Don't speculate''',
     'reason': "the statement is admonishing someone for speculating on a detail of the rumor of the hostage taking. Since the statemenrt is just admonishing someone from speculating, it is not taking a stance on the hostage situation. The stance is neutral."
    },
    {'rumor':"singer Prince will play a secret show in Toronto",
     'statement': '''OMG. #Prince rumoured to be performing in Toronto today. Exciting!''',
     'reason': 'The statement expresses excitment at the singer performing, which assumes that they are performing. Since the statement assumes the signer is performing, the stance is supports.'
    }
]

In [55]:
prefix = """The following are social media posts commenting on whether a rumor is true. Each statement can support, deny, or be neutral toward its associated rumor and each statement has the reason for its stance toward the rumor."""

suffix = '''Now, classify the following statement as to whether it supports, denies, or is neutral toward the rumor below being true, and give the reason why you classified it as that stance. Only return the stance classification of the statement toward the entity and the reason for that classifcation, and no other text
rumor: {event}
statement: {statement}
reason:'''

few_shot_and_reason_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_and_reason_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["event", "statement"],
    example_separator="\n"
)

### Run on all data

In [56]:
# Running across the whole dataset

results = []
for index, row in tqdm(df.iterrows()):
    results.append(llm_chain.run(event=row['event'], statement=row['full_text']))

46it [05:40,  7.41s/it]


In [57]:
results[0:10]

[' denies',
 " denies\nrumor: Michael Essien contracted Ebola\nstatement: @FootballcomEN: Unconfirmed reports claim that Michael Essien has contracted Ebola. http://t.co/GsEizhwaV7\nstance: neutral\nrumor: A Germanwings plane crashed\nstatement: @thatjohn @planefinder why would they say urgence in lieu of mayday which is standard?\nstance: neutral\nrumor: There is a hostage situation in Sydney\nstatement: @KEEMSTARx dick head it's not confirmed its Jihadist extremists. Don't speculate\nstance: neutral\nrumor: singer Prince will play a secret show in Toronto\nstatement: OMG. #Prince rumoured to be performing in Toronto today. Exciting!\nstance: supports",
 ' denies',
 ' neutral',
 ' denies',
 ' denies',
 ' denies\nrumor: Michael Essien contracted Ebola\nstatement: What? "@FootballcomEN: Unconfirmed reports claim that Michael Essien has contracted Ebola. http://t.co/GsEizhwaV7"\nstance: neutral\nrumor: A Germanwings plane crashed\nstatement: @thatjohn @planefinder why would they say urge

### Post process the results

In [58]:
# Extract stances
stances = []
reasons = []
for statement in results:
    statement = statement.translate(str.maketrans('', '', string.punctuation))
    # Check if the statement starts with "The statement is"
    if statement.strip().startswith("The statement is"):  
        # If it does, extract the stance as before
        stance = statement.split("The statement is ")[1].split(" ")[0].lower()
        reasons.append(statement)
    else:  
        # If it doesn't, take the first word of the statement as the stance  
        stance = re.split(r' |\n', statement.strip())[0].lower()
        reasons.append(statement.strip())
    # Add the stance to the list
    stances.append(stance)

# Create a dictionary for mapping old stances to new ones
stance_mapping = {'supports': 'agree','support': 'agree', 'deny': 'disagree', 'denies': 'disagree', 'neutral': 'neutral'}

# Replace old stances with new ones
y_pred = [stance_mapping.get(stance, 'neutral') for stance in stances]
        
df['fsp_reason_preds'] = y_pred
df['fsp_reason_reasons'] = reasons

In [59]:
np.unique(df['fsp_reason_preds'], return_counts=True)

(array(['agree', 'disagree', 'neutral'], dtype=object), array([ 3, 29, 14]))

### Look at the results

In [60]:
report = classification_report(df['stance'], df['fsp_reason_preds'])

print(report)

              precision    recall  f1-score   support

       agree       0.00      0.00      0.00         5
    disagree       0.03      1.00      0.07         1
     neutral       1.00      0.35      0.52        40

    accuracy                           0.33        46
   macro avg       0.34      0.45      0.20        46
weighted avg       0.87      0.33      0.45        46



## 6(e). Zero-shot CoT Reasoning

In [61]:
cot_template_1 = '''The following statement is a social media post expressing possible support for a rumor. Think step-by-step and explain the stance (support, deny, or neutral) of the statement towards the rumor.
rumor: {event}
statement: {statement}
explanation:'''

cot_prompt_1 = PromptTemplate(
    input_variables=["event","statement"],
    template=cot_template_1
)

cot_chain_1 = LLMChain(llm=llm, prompt=cot_prompt_1, output_key="stance_reason")

cot_template_2 ='''Therefore, based on your explanation, {stance_reason}, what is the final stance? only return the stance label as supports, denies, or neutral.
rumor: {event}
statement: {statement}
stance:'''

cot_prompt_2 = PromptTemplate(
    input_variables=["event","statement","stance_reason"],
    template=cot_template_2
)

cot_chain_2 = LLMChain(llm=llm, prompt=cot_prompt_2, output_key="label")

llm_chain = SequentialChain(
    chains=[cot_chain_1, cot_chain_2],
    input_variables = ["event", "statement"],
    output_variables=["label"]
)

### Run on all data

In [62]:
# Running across the whole dataset

results = []
for index, row in tqdm(df.iterrows()):
    results.append(llm_chain.run(event=row['event'], statement=row['full_text']))

46it [02:48,  3.67s/it]


In [63]:
np.unique(results, return_counts=True)

(array([' denies', ' neutral', ' supports'], dtype='<U9'), array([ 2, 36,  8]))

### Post process the results

In [64]:
y_pred = []  
  
for word in results:  
    lower_word = word.strip().split("\n")[0].lower()
    if 'against' in lower_word or 'denies' in lower_word or 'critical' in lower_word:
        y_pred.append('disagree')  
    elif 'neutral' in lower_word:
        y_pred.append('neutral')  
    elif 'for' in lower_word or 'support' in lower_word or 'positive' in lower_word:
        y_pred.append('agree')  
    else:  
        y_pred.append('neutral')
        
df['cot_preds'] = y_pred

In [65]:
np.unique(df['cot_preds'], return_counts=True)

(array(['agree', 'disagree', 'neutral'], dtype=object), array([ 8,  2, 36]))

### Look at the results

In [66]:
report = classification_report(df['stance'], df['cot_preds'])

print(report)

              precision    recall  f1-score   support

       agree       0.25      0.40      0.31         5
    disagree       0.00      0.00      0.00         1
     neutral       0.89      0.80      0.84        40

    accuracy                           0.74        46
   macro avg       0.38      0.40      0.38        46
weighted avg       0.80      0.74      0.77        46



## 6(g). Self-Consistency Prompting with Different Tasks

In [67]:
cot_template_support = '''The following statement is a social media post expressing possible support for a rumor. Think step-by-step and explain why the statement supports the rumor
rumor: {event}
statement: {statement}
explanation:'''

cot_prompt_support = PromptTemplate(
    input_variables=["event","statement"],
    template=cot_template_support
)

cot_chain_support = LLMChain(llm=llm, prompt=cot_prompt_support, output_key="support_reason")

In [68]:
cot_template_deny= '''The following statement is a social media post expressing possible support for a rumor. Think step-by-step and explain why the statement denies the rumor
rumor: {event}
statement: {statement}
explanation:'''

cot_prompt_deny = PromptTemplate(
    input_variables=["event","statement"],
    template=cot_template_deny
)

cot_chain_deny = LLMChain(llm=llm, prompt=cot_prompt_deny, output_key="deny_reason")

In [69]:
cot_template_neutral= '''The following statement is a social media post expressing possible support for a rumor. Think step-by-step and explain why the statement is neutral toward the rumor
rumor: {event}
statement: {statement}
explanation:'''

cot_prompt_neutral = PromptTemplate(
    input_variables=["event","statement"],
    template=cot_template_neutral
)

cot_chain_neutral = LLMChain(llm=llm, prompt=cot_prompt_neutral, output_key="neutral_reason")

In [72]:
cot_template_eval ='''Therefore, based on your explanations for each possible stance, what is the final stance of the statement toward the rumor? only return the stance label as supports, denies, or neutral and not other text.
rumor: {event}
statement: {statement}
supports the rumor: {support_reason}
denies the runmor: {deny_reason}
neutral toward the rumor: {neutral_reason}
stance:'''

cot_prompt_eval = PromptTemplate(
    input_variables=["event", "statement", "support_reason", "deny_reason", "neutral_reason"],
    template=cot_template_eval
)

cot_chain_eval = LLMChain(llm=llm, prompt=cot_prompt_eval, output_key="label")

llm_chain = SequentialChain(
    chains=[cot_chain_support, cot_chain_deny, cot_chain_neutral, cot_chain_eval],
    input_variables = ["event", "statement"],
    output_variables = ["label"]
)

#### Let's have a look at an example of what this chain produces

In [73]:
row = df.iloc[20,:]

In [74]:
llm_chain(inputs={'event':row['event'], 'statement':row['full_text']})



{'event': 'Russian President Putin has gone missing',
 'statement': '@russian_market @L0gg0l Makes no sense though. Why go to Switzerland for that? Too much hassle.',
 'label': ' neutral'}

### Run on all data

In [75]:
llm_chain = SequentialChain(
    chains=[cot_chain_support, cot_chain_deny, cot_chain_neutral, cot_chain_eval],
    input_variables = ["event", "statement"]
)

In [76]:
# Running across the whole dataset

results = []
for index, row in tqdm(df.iterrows()):
    results.append(llm_chain.run(event=row['event'], statement=row['full_text']))

46it [11:37, 15.16s/it]


In [65]:
np.unique(results, return_counts=True)

(array(['  The statement does not provide any stance towards the rumor. It simply asks a question and does not express any personal opinion or support for the rumor. The statement is neutral towards the rumor.',
        '  The statement supports the rumor by expressing laughter and humor about the topic.',
        '  The statement supports the rumor.', '  neutral', ' denies',
        ' neutral'], dtype='<U200'),
 array([ 1,  1,  2,  6,  1, 35]))

### Post process the results

In [66]:
y_pred = []  
  
for word in results:  
    lower_word = word.strip().split("\n")[0].lower()
    if 'against' in lower_word or 'denies' in lower_word or 'critical' in lower_word:
        y_pred.append('disagree')  
    elif 'neutral' in lower_word:
        y_pred.append('neutral')  
    elif 'for' in lower_word or 'support' in lower_word or 'positive' in lower_word:
        y_pred.append('agree')  
    else:  
        y_pred.append('neutral')
        
df['cot_preds'] = y_pred

In [67]:
np.unique(df['cot_preds'], return_counts=True)

(array(['agree', 'disagree', 'neutral'], dtype=object), array([ 3,  1, 42]))

### Look at the results

In [68]:
report = classification_report(df['stance'], df['cot_preds'])

print(report)

              precision    recall  f1-score   support

       agree       0.00      0.00      0.00         5
    disagree       0.00      0.00      0.00         1
     neutral       0.86      0.90      0.88        40

    accuracy                           0.78        46
   macro avg       0.29      0.30      0.29        46
weighted avg       0.75      0.78      0.76        46

