# LLM for stance detection


In this notebook, we will use Large Language Models (LLMs) to perform stance detection, that is, to classify the stance of a given text towards a given target. As I mentioned, LLMs can perform this task with few-shot or even zero-shot learning. I will provide a zero-shot example in this notebook.

I will demonstrate how to use LLMs for stance detection with the `transformers` library, which provides a simple API for using LLMs in various NLP tasks. We will use the `mistralai/Mistral-7B-Instruct-v0.2` model for this task.

We will use the `AutoModelForCausalLM` class to load the model and the `AutoTokenizer` class to load the tokenizer. Causal language models generate the next token given the previous tokens, and the tokenizer converts text into tokens that can be fed into the model.


## Load the data


In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix

stance = pd.read_csv("../../data/cleaned_data.csv")

## Load the model and tokenizer


In [2]:
# Load the Hugging Face Transformers library
# https://huggingface.co/transformers/
from IPython.display import clear_output  # Clears the Jupyter notebook output
from transformers import AutoModelForCausalLM, AutoTokenizer


device = 'cuda'  # Assumes a CUDA-capable GPU is available

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)  # Load tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device)  # Load model
clear_output()
print(model.device)

cuda:0


## Specify the prompt template


Before we start formatting the prompt template, we need to understand what the inputs should look like and why the template is formatted this way.

When training a Large Language Model, we provide the model with a prompt that contains both the input text and the target text, and the model must be able to distinguish between the human input and the desired output. Therefore, you will typically see two types of prompt templates: one only distinguishes between the human input and the model output, while the other also marks the instruction. The only difference is that the former treats the instruction as part of the input text, whereas the latter treats it as a separate entity.

According to the [model specification](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) provided by Mistral AI, the first type of prompt template was used. Therefore, we will follow the same format for our prompt template.

The prompt template should look like this:

```
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
```

Here, `<s>` and `</s>` are the special tokens that mark the start and end of a sequence, and `[INST]` and `[/INST]` mark the start and end of an instruction. `Instruction` is where the input text that we want to classify goes.

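Luckily, for an inference-only task (i.e., zero-shot or few-shot learning), we don't need to provide the model with the target text. We only need to provide the input text, and we only need a single round of inference. As for the special tokens, the tokenizer automatically adds the start token `<s>` to the input text. Note that a plain `tokenizer(prompt)` call does not add the `[INST]` and `[/INST]` markers; `tokenizer.apply_chat_template` can be used if you want them included.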
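
To make the template above concrete, here is a minimal sketch that assembles the `[INST] ... [/INST]` layout by hand. The helper name `format_mistral_prompt` and its arguments are hypothetical (they are not part of `transformers`); in practice, `tokenizer.apply_chat_template` is the library's own way to apply a model's chat format.

```python
def format_mistral_prompt(turns, add_bos=False):
    """Render (instruction, answer) pairs in the [INST] ... [/INST] layout.

    `turns` is a list of (instruction, answer) tuples; use answer=None for
    the turn the model should respond to. Hypothetical helper for illustration.
    """
    parts = []
    for instruction, answer in turns:
        parts.append(f"[INST] {instruction} [/INST]")
        if answer is not None:
            # A completed turn ends with the end-of-sequence token
            parts.append(f" {answer}</s> ")
    text = "".join(parts).strip()
    # The tokenizer normally adds <s> itself, so it is off by default here
    return f"<s> {text}" if add_bos else text


# Single-round inference, as used in this notebook: one instruction, no answer yet
print(format_mistral_prompt([("Classify the stance of this comment.", None)]))
```

For single-round inference, only one instruction is wrapped, and the model's reply is whatever it generates after `[/INST]`.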

To reuse the prompt template for different inputs, we will create an f-string that takes the input text and the target (the topic) and returns the formatted prompt. If you are not familiar with f-strings, you can think of them as strings that take variables as input and return the formatted result. More details can be found [here](https://realpython.com/python-f-strings/).
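
As a small illustration of the idea, the sketch below wraps an f-string in a function so the same template can be reused for any topic and comment. The function name `build_prompt` and the shortened wording are assumptions made for this example; the full prompt used in this notebook is defined further down.

```python
FENCE = "`" * 3  # Triple backticks used to delimit the comment


def build_prompt(topic: str, comment: str) -> str:
    """Fill a (shortened, illustrative) stance prompt with a topic and a comment."""
    return (
        f"Instruction: Decide whether the author favors, opposes, or is neutral "
        f"about whether the {topic}.\n"
        'Your response should be formatted as "Stance: [F, O, N]".\n'
        f"Post/Comment: {FENCE}{comment}{FENCE}"
    )


print(build_prompt("Economy is better now than before", "Prices keep going up."))
```

Because the variables are function arguments, the same template can be applied to every row of the dataset without copying the text.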


Let's take an example to see what the prompt template looks like for the stance detection task.


In [3]:
topic = "Economy is better now than before"
comment = stance.loc[0, "comment"]

In [4]:
print(f"Topic: {topic}\nPost/Comment: {comment}\n")

Topic: Economy is better now than before
Post/Comment: Most youve ever made nominally could still be less than you made 5 years ago in real terms if your* raises since then are not in the 25% range. Inflation Jan 2019 to Jan 2024: 22.53%



In [5]:
# Define the prompt template

prompts = []

for i in range(len(stance)):  # Build one prompt per comment in the dataset
    prompts.append(f"""
                    Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a reddit post/comment, delimited by triple backticks, concerning whether the {topic}. Please make the following assessment:
                    (1) Determine whether the comment/post discusses the topic of whether the {topic}. If so, please indicate whether the Reddit user who posted the tweet favors, opposes, or is neutral about whether the {topic}.

                    Your response should be formatted as follows and include nothing else: "Stance: [F, O, N]"

                    Here F stands for Favor, O stands for Oppose and N stands for Neutral.

                    Post/Comment: ```{stance.loc[i, 'comment']}```
                   """)


In [6]:
print(f"Prompt: {prompts[836]}\n")

Prompt: 
                    Instruction: You have assumed the role of a human annotator. In this task, you will be presented with a reddit post/comment, delimited by triple backticks, concerning whether the Economy is better now than before. Please make the following assessment:
                    (1) Determine whether the comment/post discusses the topic of whether the Economy is better now than before. If so, please indicate whether the Reddit user who posted the tweet favors, opposes, or is neutral about whether the Economy is better now than before.

                    Your response should be formatted as follows and include nothing else: "Stance: [F, O, N]"

                    Here F stands for Favor, O stands for Oppose and N stands for Neutral.

                    Post/Comment: ```"It is estimated there is more than $6 trillion in money markets currently and most of those accounts are earning close to 5%," Detrick said.


Also plenty of HYSA out there currently:

https://w

## LLM inference


### Tokenize the input text


Since we already have the raw input text, we need to transform it into a format that the model can understand. We will use the `AutoTokenizer` class to convert the input text into tokens that can be fed into the model. The `AutoTokenizer` class automatically selects the appropriate tokenizer for the model.

For more information on how the tokenizer works and how to use the `AutoTokenizer` class, you can refer to the [official documentation](https://huggingface.co/transformers/model_doc/auto.html#autotokenizer).


In [7]:
# Tokenize the prompt
inputs_list = []

for prompt in prompts:
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    inputs_list.append(inputs)

print(f"Tokens: {inputs_list[4]['input_ids']}\n")

Tokens: tensor([[    1, 28705,    13,   359,  2287,  3133,  3112, 28747,   995,   506,
         11012,   272,  3905,   302,   264,  2930,   396,  1478,  1028, 28723,
           560,   456,  3638, 28725,   368,   622,   347,  7567,   395,   264,
         22003,  1704, 28748,  9318, 28725,   882,   321,  1345,   486, 22212,
           852, 28707,  5446, 28725, 15942,  3161,   272, 11003, 28724,   349,
          1873,  1055,   821,  1159, 28723,  5919,  1038,   272,  2296, 15081,
         28747,    13,   359,  2287,   325, 28740, 28731,  5158, 21824,  3161,
           272,  4517, 28748,  3192,  3342,   274,   272,  9067,   302,  3161,
           272, 11003, 28724,   349,  1873,  1055,   821,  1159, 28723,  1047,
           579, 28725,  4665, 11634,  3161,   272, 22233,  2188,   693, 10198,
           272,  9394,   299,  7556,   734, 28725,  5793,   274, 28725,   442,
           349, 14214,   684,  3161,   272, 11003, 28724,   349,  1873,  1055,
           821,  1159, 28723,    13,    13, 

In [10]:
# Note: the tokenized prompt can always be decoded back to the original prompt
# Decode the tokenized prompt
# decoded_prompt = tokenizer.decode(inputs['input_ids'][0])
# print(decoded_prompt)

### Feed the input text (tokens) into the model


In [None]:
stance['LLM Stance'] = ''

for i in range(len(inputs_list)):
    inputs = inputs_list[i]
    outputs = model.generate(**inputs, max_new_tokens=20)  # Generate the model output
    # Keep only the newly generated tokens (everything after the prompt tokens)
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    # Decode the generated output
    generated_output = tokenizer.decode(new_tokens, skip_special_tokens=True)
    stance.loc[i, 'LLM Stance'] = generated_output.strip()
    print(i)  # Progress indicator

stance.to_csv("../../data/updated_cleaned_data.csv", index=False)

There are several parameters that we can use to control the output of the model. We used the `max_new_tokens` parameter to limit the number of newly generated tokens. We also used the tokenizer's `return_tensors` parameter to control the format of the tokenized input; setting it to `pt` returns PyTorch tensors.

Besides these parameters, we can also use the `temperature` parameter to control the randomness of the output, the `top_k` and `top_p` parameters to control its diversity, and the `num_return_sequences` parameter to control the number of output sequences. Note that in `transformers`, `temperature`, `top_k`, and `top_p` only take effect when sampling is enabled with `do_sample=True`.

There is a great explanation of the temperature parameter (which you will also use with OpenAI's models) in this [blog post](https://lukesalamone.github.io/posts/what-is-temperature/).
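
To get a feel for what temperature does, here is a minimal sketch of temperature-scaled softmax in plain Python. The logits are made-up numbers for illustration, and the function `softmax_with_temperature` is a hypothetical helper, not a `transformers` API.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # Subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # Made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate the probability on the highest-scoring token, while higher temperatures flatten the distribution and make sampling more random.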

Feel free to play around with these parameters to see how they affect the output of the model.
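
The prompt asks the model to answer in the form `Stance: [F, O, N]`, but the generated text is still free-form, so it helps to reduce it to a single label before scoring. The sketch below is one possible way to do that; the name `parse_stance`, the regular expression, and the `"Unparsed"` fallback are my assumptions, not something prescribed by the model or the library.

```python
import re

# Matches "Stance: F", "Stance: [O]", "stance:N", ...
STANCE_PATTERN = re.compile(r"Stance:\s*\[?\s*([FON])\b", re.IGNORECASE)


def parse_stance(generated_text: str, default: str = "Unparsed") -> str:
    """Return 'F', 'O' or 'N' if the text contains a stance label, else `default`."""
    match = STANCE_PATTERN.search(generated_text)
    return match.group(1).upper() if match else default


for text in [" Stance: F", "Stance: [O]", "I am not sure."]:
    print(repr(text), "->", parse_stance(text))
```

Keeping unparseable answers as a separate value, rather than guessing a class, makes it easy to see how often the model ignores the requested format.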


In [None]:
# Analysing performance measures of the LLM

df = pd.read_csv('../../data/updated_cleaned_data.csv')

labels = ["F", "O", "N"]
human_labels = df['label'].tolist()
# Assumption: the model answered in the requested "Stance: X" format, so we
# extract the letter; anything that does not match is marked "Unparsed"
llm_labels = (
    df['LLM Stance'].astype(str)
    .str.extract(r'Stance:\s*\[?\s*([FON])\b', expand=False)
    .fillna('Unparsed')
    .tolist()
)

# Calculating metrics
accuracy = accuracy_score(human_labels, llm_labels)
print(f'Accuracy: {accuracy:.2f}')

# Calculating precision, recall, and F1-score for each class
precision, recall, f1, support = precision_recall_fscore_support(human_labels, llm_labels, labels=labels)
print("Precision per class:")
for label, prec in zip(labels, precision):
    print(f'{label}: {prec:.2f}')

print("Recall per class:")
for label, rec in zip(labels, recall):
    print(f'{label}: {rec:.2f}')

# Support-weighted averages (a plain .mean() would give the macro average instead)
w_precision, w_recall, w_f1, _ = precision_recall_fscore_support(
    human_labels, llm_labels, labels=labels, average='weighted')
print(f'Precision (Weighted): {w_precision:.2f}')
print(f'Recall (Weighted): {w_recall:.2f}')
print(f'F1-Score (Weighted): {w_f1:.2f}')

# Calculating specificity for each class
# In cm, rows are the true labels and columns are the predicted labels
cm = confusion_matrix(human_labels, llm_labels, labels=labels)
specificity_scores = []
for i in range(len(cm)):
    true_negatives = np.delete(np.delete(cm, i, 0), i, 1).sum()
    false_positives = np.delete(cm[:, i], i).sum()  # Predicted as class i, but truly another class
    specificity = true_negatives / (true_negatives + false_positives)
    specificity_scores.append(specificity)

print(f'Specificity per class (F, O, N): {specificity_scores}')
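
Specificity for a class is TN / (TN + FP), where the false positives are the items predicted as that class whose true label is something else. As a small worked example, the sketch below computes it in plain Python on a made-up 3x3 confusion matrix (rows are true labels, columns are predicted labels, the same convention as scikit-learn). The helper name `specificity_per_class` and the numbers are assumptions for illustration only.

```python
def specificity_per_class(cm):
    """cm[i][j] = number of items with true class i that were predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # Row k without the diagonal
        fp = sum(cm[r][k] for r in range(n)) - tp   # Column k without the diagonal
        tn = total - tp - fn - fp
        scores.append(tn / (tn + fp) if (tn + fp) else float("nan"))
    return scores


# Made-up confusion matrix for the classes F, O, N
example_cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 1, 4],
]
print(specificity_per_class(example_cm))
```

For class F in this example, two items were wrongly predicted as F and nine items were correctly kept out of F, so its specificity is 9 / 11.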
