In [2]:
import pandas as pd

## Data Transformation

#### Merging the Datasets

After three individual datasets of conspiracy-theory comments and their corresponding counter-speech responses were generated, the next step consolidates them into a single, unified dataset. The following script joins the counter-speech responses of the three models on `comment_id`, `tweet_id`, and `comment_text` and stores them together in one table.

In [5]:
# Load .csv files
df_gpt4 = pd.read_csv('../data/counterspeech_dataset_gpt-4o.csv')
df_llama3 = pd.read_csv('../data/counterspeech_dataset_llama3.csv')
df_mistral = pd.read_csv('../data/counterspeech_dataset_mistral.csv')

# Merge data columns
df_merged = df_gpt4.merge(df_llama3, on=['comment_id', 'tweet_id', 'comment_text'], how='outer')
df_merged = df_merged.merge(df_mistral, on=['comment_id', 'tweet_id', 'comment_text'], how='outer')

# Save results in new file
df_merged.to_csv('../data/counterspeech_responses_all_models.csv', index=False)

# Check results
df_merged.head(15)

Unnamed: 0,comment_id,tweet_id,comment_text,counter_speech_gpt-4o,counter_speech_llama3,counter_speech_mistral
0,1,1782877166998331434,Keep this in mind:\nRussia is destroying democ...,I understand your concerns about safeguarding ...,I understand your concern about the perceived ...,"In the spirit of American resilience, let's re..."
1,2,1733184143885025552,FIFTH NOVEMBER TWENTY TWENTY FOUR DONALD TRUMP...,I understand that the future can bring a lot o...,I understand your concern about the recent eve...,In a world painted with the brushstrokes of hi...
2,3,1783101908938322120,"Just a reminder, #PizzaGate was not made up by...",I understand that topics like this can underst...,"I understand your concern, but let's take a cl...","John Pedestrian, like many high-profile indivi..."
3,4,1782596239302721881,"The same strategy they used to ""debunk"" #QAnon...","I understand the concern about misinformation,...",I understand your concern about the potential ...,"Ah, the intriguing dance of truth and specula..."
4,5,1782476196107911645,958/-- New Prediction Date for World War 3--- ...,I can understand how predictions can capture o...,I understand your concern about the prediction...,"""Imagine, if you will, a world where unity and..."
5,6,1782499781572739564,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,I can see where concerns about world events co...,I understand your concern about the date and t...,"""As a history buff like Neil deGrasse Tyson, I..."
6,7,1781880739644211452,QFS = CBDC = 666 The mark of BEAST financial s...,I understand that discussions around financial...,I understand your concerns about the potential...,"In the grand tapestry of human history, we fin..."
7,8,1782796945582612869,Fact: Brainwashed deep state operatives are ma...,I understand your concerns—it can feel overpow...,I understand your concern about the integrity ...,"""A beacon of democracy, America... Though some..."
8,9,1781568827693965722,"Wake up, sheeple!\nBiden's just a puppet for t...",It's understandable to feel concerned about th...,I understand your concerns about President Bid...,Hello there! It's understandable to feel unset...
9,10,1781737844349817116,Most can't see what's right in front of their ...,"Eight years ago, I, too, embarked on a quest f...",I understand your enthusiasm for the idea of a...,"Hello there, I remember the time when I wasn't..."
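
An outer join silently keeps rows that exist in only one file, so it can be worth checking that the keys actually line up. The following is a minimal sketch, not part of the original notebook, using pandas' `validate` and `indicator` options on `merge`. The toy frames `df_a` and `df_b` are hypothetical stand-ins for the per-model CSV files, and the mismatch between them is deliberate.

```python
import pandas as pd

keys = ['comment_id', 'tweet_id', 'comment_text']

# Hypothetical toy data standing in for two of the per-model CSV files
df_a = pd.DataFrame({'comment_id': [1, 2], 'tweet_id': [10, 20],
                     'comment_text': ['x', 'y'],
                     'counter_speech_gpt-4o': ['a1', 'a2']})
df_b = pd.DataFrame({'comment_id': [1, 3], 'tweet_id': [10, 30],
                     'comment_text': ['x', 'z'],
                     'counter_speech_llama3': ['b1', 'b3']})

# validate='one_to_one' raises a MergeError if a key combination is duplicated;
# indicator=True adds a '_merge' column telling where each row came from
merged = df_a.merge(df_b, on=keys, how='outer',
                    validate='one_to_one', indicator=True)

# Rows present in only one of the two inputs
unmatched = merged[merged['_merge'] != 'both']
print(unmatched[['comment_id', '_merge']])
```

If `unmatched` is empty for the real files, every comment has a response from each model and the outer join behaves like an inner join.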


#### Selecting the Sample
This Python script creates a new, smaller dataset containing 30 conspiracy-theory comments and the associated counter-speech responses. The sample is drawn systematically by selecting every third comment, starting at ID 6 and ending at ID 93.

In [7]:
# Load the combined dataset
df_merged = pd.read_csv('../data/counterspeech_responses_all_models.csv')

# Zero-based row positions 5, 8, ..., 92 (every third row, comment_id 6 to 93)
indices = range(5, 95, 3)

# Select the rows based on the computed indices
sampled_df = df_merged.iloc[indices]

# Verify the sampled data
sampled_df.head(30)

Unnamed: 0,comment_id,tweet_id,comment_text,counter_speech_gpt-4o,counter_speech_llama3,counter_speech_mistral
5,6,1782499781572739564,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,I can see where concerns about world events co...,I understand your concern about the date and t...,"""As a history buff like Neil deGrasse Tyson, I..."
8,9,1781568827693965722,"Wake up, sheeple!\nBiden's just a puppet for t...",It's understandable to feel concerned about th...,I understand your concerns about President Bid...,Hello there! It's understandable to feel unset...
11,12,1780467566479126845,PATRIOTS TAP IN:\nWe’re doing a nation wide gr...,I appreciate the concern for the well-being of...,"""I understand the enthusiasm, but let's focus ...","In the heart of Washington D.C., Dr. Fauci, a ..."
14,15,1780332562356768852,It’s time to take over the digital billboards...,I understand your passion and concern for the ...,I understand your frustration with the media a...,"Martin Luther King Jr., a beacon of hope and ..."
17,18,1781689670503661784,Biden Crime Family: Another Dynasty of Corrupt...,I understand the frustration that can come wit...,I understand your concerns about the Biden adm...,"""Think about the everyday folks, not the 'Crim..."
20,21,1793570499446124852,#QAnon has recently asserted that Benjamin Net...,I understand how intriguing such narratives ca...,I understand your enthusiasm for the QAnon mov...,"""Imagine Benjamin Netanyahu, a leader with a s..."
23,24,1792917786236174787,Lots going on behind the scenes of $Trump2024 ...,"I get where you're coming from, the sense of a...","""I understand the excitement and anticipation,...",Encouraging times for those eagerly awaiting p...
26,27,1794952977985446366,Do you get paid to shill for vaccines?\nOr are...,"I understand your concern, and it’s important ...","I understand your concerns about vaccines, but...","Dear friend, if I was a 'paid shill' for vacci..."
29,30,1792228285360869479,WE KNOW THE CIA COINED THE TERM “CONSPIRACY TH...,Hey there! I get where you're coming from; it'...,"""I understand the curiosity behind this questi...","""Though the CIA may have popularized the term ..."
32,33,1790012018985971975,We will be cannibalising you on the streets of...,It sounds like you're feeling deeply concerned...,"I understand your concerns and frustrations, b...","""Ah, a vivid vision of the future you paint! A..."
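
Because `iloc` selects by position, the sample matches IDs 6 to 93 only if the rows are ordered by `comment_id` with no gaps. The sketch below checks this on hypothetical data; the 100-row frame is an assumption, not the real dataset. It compares the positional selection with an equivalent selection by `comment_id`.

```python
import pandas as pd

# Hypothetical stand-in for the merged dataset: comment_id 1..100 in order
df = pd.DataFrame({'comment_id': range(1, 101)})

# Positional selection, equivalent to df.iloc[range(5, 95, 3)]
sample_by_position = df.iloc[5:95:3]

# Label-based selection on the ID column itself
wanted_ids = list(range(6, 94, 3))  # 6, 9, ..., 93
sample_by_id = df[df['comment_id'].isin(wanted_ids)]

print(len(sample_by_position))
print(sample_by_position.equals(sample_by_id))
```

Selecting by `comment_id` is more robust if the file is ever re-sorted or if rows are missing, since it does not depend on row order.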


#### Reshaping the Counter-Speech Columns into Rows
To make the dataset usable for evaluation, the counter-speech responses, which currently sit in separate columns, must be reshaped so that each response occupies its own row. With 30 comments and three models, this yields 90 rows.

In [10]:
# Transform the counter_speech columns into rows
melted_data = sampled_df.melt(id_vars=['comment_id', 'comment_text'],
                              value_vars=['counter_speech_gpt-4o', 'counter_speech_llama3', 'counter_speech_mistral'],
                              var_name='model', value_name='counter_speech')

# Sort the data by 'comment_id'
sorted_melted_data = melted_data.sort_values(by='comment_id')

# Display the sorted, melted data
sorted_melted_data.head(20)

Unnamed: 0,comment_id,comment_text,model,counter_speech
0,6,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,counter_speech_gpt-4o,I can see where concerns about world events co...
30,6,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,counter_speech_llama3,I understand your concern about the date and t...
60,6,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,counter_speech_mistral,"""As a history buff like Neil deGrasse Tyson, I..."
31,9,"Wake up, sheeple!\nBiden's just a puppet for t...",counter_speech_llama3,I understand your concerns about President Bid...
61,9,"Wake up, sheeple!\nBiden's just a puppet for t...",counter_speech_mistral,Hello there! It's understandable to feel unset...
1,9,"Wake up, sheeple!\nBiden's just a puppet for t...",counter_speech_gpt-4o,It's understandable to feel concerned about th...
2,12,PATRIOTS TAP IN:\nWe’re doing a nation wide gr...,counter_speech_gpt-4o,I appreciate the concern for the well-being of...
32,12,PATRIOTS TAP IN:\nWe’re doing a nation wide gr...,counter_speech_llama3,"""I understand the enthusiasm, but let's focus ..."
62,12,PATRIOTS TAP IN:\nWe’re doing a nation wide gr...,counter_speech_mistral,"In the heart of Washington D.C., Dr. Fauci, a ..."
3,15,It’s time to take over the digital billboards...,counter_speech_gpt-4o,I understand your passion and concern for the ...
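
In the output above, the three models do not always appear in the same order within a comment, because sorting on `comment_id` alone leaves the order of ties unspecified. The following sketch, on hypothetical toy data, shows one possible variant: it sorts on both `comment_id` and `model` and strips the shared `counter_speech_` prefix. Neither step is part of the original notebook.

```python
import pandas as pd

# Hypothetical wide-format sample with one column per model
wide = pd.DataFrame({
    'comment_id': [9, 6],
    'comment_text': ['c9', 'c6'],
    'counter_speech_gpt-4o': ['g9', 'g6'],
    'counter_speech_llama3': ['l9', 'l6'],
    'counter_speech_mistral': ['m9', 'm6'],
})

# One row per (comment, model) pair
long_df = wide.melt(
    id_vars=['comment_id', 'comment_text'],
    value_vars=['counter_speech_gpt-4o', 'counter_speech_llama3',
                'counter_speech_mistral'],
    var_name='model', value_name='counter_speech')

# Keep only the model name in the 'model' column
long_df['model'] = long_df['model'].str.replace('counter_speech_', '', regex=False)

# Sorting on both keys gives a deterministic row order
long_df = long_df.sort_values(['comment_id', 'model']).reset_index(drop=True)
print(long_df)
```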


#### Adding Evaluation-Criteria Columns

In this step, new columns are added to the DataFrame to record the evaluation criteria for the counter-speech responses. These columns are intended for manual evaluation and cover the following criteria:

- `empathy_factor`: rating of the empathy shown in the response.
- `fidelity_to_facts`: assessment of factual accuracy.
- `tone`: rating of the tone of the response.
- `clarity_of_argumentation`: assessment of how clearly the argument is made.
- `successful_use_of_storytelling`: rating of how successfully storytelling elements are used.

Initially, these columns are set to `NA`, indicating that they have not yet been rated. This structure prepares the DataFrame for the subsequent detailed rating of each counter-speech response.

In [9]:
# Add the rating columns, initialised to pd.NA (not yet rated) for manual evaluation
sorted_melted_data['empathy_factor'] = pd.NA
sorted_melted_data['fidelity_to_facts'] = pd.NA
sorted_melted_data['tone'] = pd.NA
sorted_melted_data['clarity_of_argumentation'] = pd.NA
sorted_melted_data['successful_use_of_storytelling'] = pd.NA

# Save the updated DataFrame to a CSV file
sorted_melted_data.to_csv('../data/evaluation/counterspeech_90_samples_to_evaluate.csv', index=False)

# Display the updated DataFrame to verify the new columns
sorted_melted_data.head(90)

Unnamed: 0,comment_id,comment_text,model,counter_speech,empathy_factor,fidelity_to_facts,tone,clarity_of_argumentation,successful_use_of_storytelling
0,6,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,counter_speech_gpt-4o,I can see where concerns about world events co...,,,,,
30,6,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,counter_speech_llama3,I understand your concern about the date and t...,,,,,
60,6,FOURTEEN OCTOBER TWENTY TWENTY FOUR BEGIN THIR...,counter_speech_mistral,"""As a history buff like Neil deGrasse Tyson, I...",,,,,
31,9,"Wake up, sheeple!\nBiden's just a puppet for t...",counter_speech_llama3,I understand your concerns about President Bid...,,,,,
61,9,"Wake up, sheeple!\nBiden's just a puppet for t...",counter_speech_mistral,Hello there! It's understandable to feel unset...,,,,,
...,...,...,...,...,...,...,...,...,...
58,90,Creep!\nGo live in Hell where you & your croni...,counter_speech_llama3,"I understand your frustration, but let's focus...",,,,,
28,90,Creep!\nGo live in Hell where you & your croni...,counter_speech_gpt-4o,I understand your frustration and concern abou...,,,,,
29,93,New court document raises serious questions ab...,counter_speech_gpt-4o,I can understand why news like this might rais...,,,,,
59,93,New court document raises serious questions ab...,counter_speech_llama3,I understand your concern about the recent cou...,,,,,
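
The five assignments can also be driven by a list of criteria, which keeps the column names in one place and makes it easy to count how many rows still need rating. This is a minimal sketch on hypothetical toy rows; the `CRITERIA` constant and the `n_unrated` check are illustrative additions, not part of the original notebook.

```python
import pandas as pd

# Criteria names taken from the list above
CRITERIA = ['empathy_factor', 'fidelity_to_facts', 'tone',
            'clarity_of_argumentation', 'successful_use_of_storytelling']

# Hypothetical long-format rows standing in for sorted_melted_data
df = pd.DataFrame({
    'comment_id': [6, 6],
    'model': ['counter_speech_gpt-4o', 'counter_speech_llama3'],
    'counter_speech': ['response A', 'response B'],
})

# Add one empty rating column per criterion
for col in CRITERIA:
    df[col] = pd.NA

# Rows where no criterion has been rated yet
n_unrated = int(df[CRITERIA].isna().all(axis=1).sum())
print(n_unrated)
```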
