# Generating Counter Speech to Conspiracy Theories with Mistral-7B-Instruct-v0.3

This notebook demonstrates how to use the Mistral-7B-Instruct-v0.3 language model to generate counter speech for a dataset of 100 conspiracy-theory comments. The script is designed to run on a GPU so that it can handle the model's high compute requirements efficiently. The model processes each comment and produces a concise counter speech that follows specific guidelines such as empathy, factual accuracy, and clarity. The resulting dataset contains both the original conspiracy-theory comments and the generated counter speech, so it can serve as a basis for further analysis and studies.
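
Because the full run needs a GPU, it can help to dry-run the data flow first. The following is a minimal sketch under stated assumptions, not part of the original notebook: `fake_generate` is a hypothetical stand-in for the model call, and the two sample comments are invented placeholders. Only the column names `comment_text` and `counter_speech_mistral` are taken from the code cell.

```python
import csv
import io


def fake_generate(comment: str) -> str:
    """Hypothetical stand-in for the model call; returns a placeholder string."""
    return f"[counter speech for: {comment[:30]}]"


# Invented placeholder rows; the real data comes from the CSV file.
rows = [
    {"comment_text": "Example comment A"},
    {"comment_text": "Example comment B"},
]

# Add the generated column next to each original comment.
for row in rows:
    row["counter_speech_mistral"] = fake_generate(row["comment_text"])

# Write to an in-memory CSV to inspect the resulting layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["comment_text", "counter_speech_mistral"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Swapping `fake_generate` for the real generation function should leave the rest of the flow unchanged, which makes it cheap to check column names and file layout before loading the model.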


In [None]:
import pandas as pd
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load dataset with conspiracy theory comments
ct_dataset = pd.read_csv('../data/qanon_deepstate_comments.csv')

# Initialize model and tokenizer 
model_id = "mistralai/Mistral-7B-Instruct-v0.3"
device = "cuda"  # the device to load the model onto

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side='left')
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16
).to(device)

# Function to generate counter speech
def generate_counter_speech(comment):
    
    # Defining the system and user messages
    system_prompt = """You are a trained expert in generating counter speech to conspiracy theory comments.
             Follow these response guidelines: 
                1. Show empathy and positivity in your response.
                2. Do not state 'this is a conspiracy theory' directly.
                3. Use narrative storytelling, including a first-person perspective, detailed accounts of characters' internal lives, metaphors and figurative language.  Include a relatable protagonist (well-known figures only) or credible real-life examples to illustrate your point. 
                4. Ensure clarity in your argumentation with defined objectives.
                5. Challenge the statement and refute it with specific facts from reliable sources. If appropriate, ask for sources or factual basis.
                6. Maintain a respectful and calm tone throughout your response. Be cautious with sarcasm, humor, parody, and satire.
                7. Always respond concisely, directly, and clearly. Limit your response to 800 characters. 
                """
    user_prompt = f"Generate counter speech to the following conspiracy theory comment: {comment}."
    prefix = "Very concise and short counter speech that uses less than 200 tokens:"
    
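    # Format the messages with the tokenizer's chat template. The system prompt
    # is folded into the user turn; the assistant turn carries the prefix that
    # the model should continue from.
    messages = [
        {"role": "user", "content": f"{system_prompt} {user_prompt}"},
        {"role": "assistant", "content": prefix}
    ]
    # continue_final_message=True leaves the assistant turn open so that the
    # model continues after the prefix instead of seeing a closed turn
    # (requires a recent transformers release).
    # return_dict=True also returns the attention mask for generate().
    model_inputs = tokenizer.apply_chat_template(
        messages,
        continue_final_message=True,
        return_dict=True,
        return_tensors="pt",
    ).to(device)
    
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=350, 
        do_sample=True,
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens by slicing off the prompt
    prompt_length = model_inputs["input_ids"].shape[-1]
    outputs = generated_ids[:, prompt_length:]
    response = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return response[0].strip()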
    
# Generate counter speech for every comment and save in a new column
ct_dataset['counter_speech_mistral'] = ct_dataset['comment_text'].apply(generate_counter_speech)

# Save updated dataset
ct_dataset.to_csv('../data/counterspeech_dataset_mistral.csv', index=False)

# Release the model and tokenizer, then free cached GPU memory
del model, tokenizer
torch.cuda.empty_cache()
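
The prompt asks the model to stay within 800 characters, but generation itself is only bounded by `max_new_tokens`, so individual responses may exceed that limit. The following is a minimal sketch of a post-processing check, assuming that trimming at the last sentence end is acceptable. `enforce_char_limit` is a hypothetical helper and not part of the original notebook.

```python
def enforce_char_limit(text: str, limit: int = 800) -> str:
    """Trim text to at most `limit` characters, preferring a sentence boundary."""
    text = text.strip()
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    # Find the last sentence-ending punctuation mark inside the clipped window.
    cut = max(clipped.rfind("."), clipped.rfind("!"), clipped.rfind("?"))
    if cut == -1:
        # No sentence boundary found; fall back to a hard cut.
        return clipped.rstrip()
    return clipped[: cut + 1]
```

If the dataset is still in memory, the helper could be applied to the generated column with `ct_dataset['counter_speech_mistral'].apply(enforce_char_limit)` before saving.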