# Regenerate the data of all used attacks

This notebook regenerates all data augmentations/attacks that are not based on prompting. Prompt-based attacks create their text through calls to the LLM provider, whereas the attacks covered here take the already generated `direct_prompt` text and postprocess it.
As the original `direct_prompt` might contain contamination, we use the cleaned data to reprocess these attacks. Some attacks differ slightly from the original DetectRL code (e.g. because of an update to a recent Python package version or to make generation more efficient and stable).
Therefore, except for Dipper, the human-written text was reprocessed as well to ensure identical augmentations for human and LLM-generated content.
Attacks that are based on prompting have already been cleaned in previous steps.

Columns to be cleaned here:
 - adversarial_character_human
 - adversarial_character_llm
 - adversarial_word_human
 - adversarial_word_llm 
 - adversarial_character_word_human
 - adversarial_character_word_llm
 - paraphrase_back_translation_human
 - paraphrase_back_translation_llm
 - paraphrase_dipper_human (not reprocessed to save computational resources, as the original code was reused one-to-one for dipper)
 - paraphrase_dipper_llm
 
Columns already cleaned:
 - direct_prompt
 - paraphrase_polish_human
 - paraphrase_polish_llm
 - prompt_few_shot
 - prompt_SICO

The notebook is organized as follows:
1. TextAttack perturbations (character, word and sentence level perturbations)
2. Paraphrase back translation (translates text to German and back to English)
3. Dipper (well-known pre-trained language model for paraphrasing, controlled here via its lexical and order diversity settings)

# 1. Setup

## 1.1 Imports

In [5]:
from pathlib import Path
import pandas as pd
from warnings import filterwarnings
import os
from tqdm import tqdm
import json
import sys
from typing import Literal
from google.cloud import translate
import nltk
nltk.download('punkt_tab')
filterwarnings("ignore", category=pd.errors.SettingWithCopyWarning)


# === CONFIG ===
BASE_DIR = "../../"
sys.path.append(BASE_DIR)
DETECT_RL_DIR = "../../DetectRL/Data_Generation/"
sys.path.append(DETECT_RL_DIR)
sys.path.append(os.path.join(BASE_DIR, "datasets"))
from src.config import *


# textattack for adversarial attacks;
#   for sentence based attacks either CLAREAugmenter could be used or TextBuggerAugmenter
# from textattack.augmentation.recipes import CLAREAugmenter
from textattack.augmentation import CharSwapAugmenter
from textattack.augmentation import EmbeddingAugmenter
from src.TextAttackTextBugger.text_bugger import TextBuggerAugmenter

from src.general_functions_and_patterns_for_detection import (
    get_info_based_on_input_path,
    CLEANED_FILES_DIR, RESULT_DIR, RECLEANED_FILES_DIR
)

from DetectRL.Data_Generation.DIPPER import DipperParaphraser, spilt_paragraph

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /home/pdingfelder/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


In [2]:
DEBUG = False
PRINT_RESULTS = False
PRINTING = 0
INTERMEDIATE_RESULTS = f"{RESULT_DIR}/intermediate_results/"

# dipper
DIPPER_INTERMEDIATE_DIR = Path(f"{INTERMEDIATE_RESULTS}/dipper")
os.makedirs(DIPPER_INTERMEDIATE_DIR, exist_ok=True)

# paraphrase back translations
OUTPUT_DIR = Path(f"{INTERMEDIATE_RESULTS}/translations")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
TRANSLATIONS_DIR = Path(f"{BASE_DIR}/results/translations")
TRANSLATIONS_DIR.mkdir(parents=True, exist_ok=True)

## 1.2 Read files

In [6]:
path_writing = f'{RECLEANED_FILES_DIR}/writing_prompt_2800_recleaned.parquet'
path_abstract = f'{RECLEANED_FILES_DIR}/arxiv_2800_recleaned.parquet'
path_review = f'{RECLEANED_FILES_DIR}/yelp_review_2800_recleaned.parquet'
path_xsum = f'{RECLEANED_FILES_DIR}/xsum_2800_recleaned.parquet'

df_writing_cleaned = pd.read_parquet(path_writing)
df_abstract_cleaned = pd.read_parquet(path_abstract).drop(columns=["human_length", "direct_prompt_length"])
df_review_cleaned = pd.read_parquet(path_review)
df_xsum_cleaned = pd.read_parquet(path_xsum)

domain_dfs = {
    "arxiv": df_abstract_cleaned.copy(deep=True),
    "writing_prompt": df_writing_cleaned.copy(deep=True),
    "yelp_review": df_review_cleaned.copy(deep=True),
    "xsum": df_xsum_cleaned.copy(deep=True)
}

In [7]:
domain_dfs["arxiv"].info()

<class 'pandas.core.frame.DataFrame'>
Index: 2800 entries, 1 to 2800
Data columns (total 20 columns):
 #   Column                             Non-Null Count  Dtype 
---  ------                             --------------  ----- 
 0   id                                 2800 non-null   int64 
 1   title                              2800 non-null   object
 2   abstract                           2800 non-null   object
 3   direct_prompt                      2796 non-null   object
 4   llm_type                           2800 non-null   object
 5   domain                             2800 non-null   object
 6   prompt_few_shot                    2793 non-null   object
 7   prompt_SICO                        2798 non-null   object
 8   paraphrase_polish_human            2771 non-null   object
 9   paraphrase_polish_llm              2796 non-null   object
 10  adversarial_character_human        2800 non-null   object
 11  adversarial_character_llm          2800 non-null   object
 12  adversarial

# 2. Text-attacks

- Implements character, word and sentence level perturbations for human and LLM-generated text.
- Compared to DetectRL, the character and word level perturbations have been updated to the most recent textattack package.
- As `TextBuggerAugmenter` is not available as a plain data augmentation method in version 3.1 of textattack, the code used in DetectRL has been copied to ensure the same behavior as in the original dataset. `CLAREAugmenter` could be used as a more recent alternative for sentence level perturbations.
- Both human and LLM-generated text is reprocessed to ensure identical data augmentations.
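
The sentence-wise pattern used below (split, augment each sentence, keep the first candidate, rejoin) can be sketched without textattack. This is a minimal illustration only: `ToySwapAugmenter` and the regex-based `split_sentences` are hypothetical stand-ins for the textattack augmenters and `nltk.sent_tokenize`; they merely mimic the assumption that `augment_many` returns one list of candidates per input sentence.

```python
import re
from typing import List, Optional


class ToySwapAugmenter:
    """Hypothetical stand-in for a textattack augmenter (not the real API)."""

    def augment(self, text: str) -> List[str]:
        # Swap the first two characters of the first purely alphabetic word
        # with at least four letters; return a one-element candidate list.
        words = text.split(" ")
        for i, word in enumerate(words):
            if len(word) >= 4 and word.isalpha():
                words[i] = word[1] + word[0] + word[2:]
                break
        return [" ".join(words)]

    def augment_many(self, texts: List[str]) -> List[List[str]]:
        # One list of candidates per input sentence.
        return [self.augment(t) for t in texts]


def split_sentences(text: str) -> List[str]:
    # Naive splitter for this sketch; the notebook uses nltk.sent_tokenize.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def augment_text(text: Optional[str], augmenter) -> str:
    # Missing input yields an empty string, mirroring the notebook's behavior.
    if text is None:
        return ""
    candidates = augmenter.augment_many(split_sentences(text))
    # Keep the first candidate of every sentence and rejoin with spaces.
    return " ".join(c[0] for c in candidates)


result = augment_text("Detectors can fail. Small edits matter!", ToySwapAugmenter())
print(result)
```

Perturbing sentence by sentence keeps each augmenter call short and guarantees at least one transformation attempt per sentence, at the cost of normalizing the whitespace between sentences to a single space.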

In [6]:
class DataGenerationAugmentation:
    """
    A class to augment text data in a DataFrame using character, word,
    and sentence-level perturbation attacks.
    """
    
    def __init__(self, model_clare: str = "distilroberta-base", tokenizer_clare: str = "distilroberta-base"):
        """
        Initializes the augmenters for different attack types.

        Args:
            model_clare (str): Model name or path for the CLARE sentence augmenter.
            tokenizer_clare (str): Tokenizer name or path for the CLARE sentence augmenter.
        """
        # A list of available attack strategies
        self.attacks = ["perturbation_character", "perturbation_word", "perturbation_sent"]
        
        # Initialize augmenters from external library textattack
        self.word_augmenter = EmbeddingAugmenter(transformations_per_example=1) 
        self.character_augmenter = CharSwapAugmenter(transformations_per_example=1)
        # self.sentence_augmenter = CLAREAugmenter(model=model_clare, tokenizer=tokenizer_clare, transformations_per_example=1)
        self.sentence_augmenter = TextBuggerAugmenter(transformations_per_example=1)
        
    def _get_augmenter(self, attack: str):
        """A simple factory method to retrieve the correct augmenter based on the attack type."""
        if attack == "perturbation_character":
            return self.character_augmenter
        elif attack == "perturbation_word":
            return self.word_augmenter
        elif attack == "perturbation_sent":
            return self.sentence_augmenter
        else:
            raise ValueError(f"{attack} is not in perturbation_attacks")
        
    @staticmethod
    def split_text_in_sentences(input_text: str) -> list:
        """Splits a block of text into a list of sentences using NLTK."""
        return nltk.sent_tokenize(input_text)
        
    def augment_text(self, input_text: str, 
                     attack: Literal["perturbation_character", "perturbation_word", "perturbation_sent"] = "perturbation_character") -> str:
        """
        Applies a specified augmentation attack to each sentence in the input text.

        Args:
            input_text (str): The text to augment.
            attack (Literal): The type of attack to perform.

        Returns:
            str: The fully augmented text with sentences rejoined.
        """
        if input_text is None:
            return ""
        if attack not in ["perturbation_character", "perturbation_word", "perturbation_sent"]:
            raise ValueError('Attack has to be "perturbation_character", "perturbation_word" or "perturbation_sent"')
        # First, split the text into sentences to apply augmentation sentence-by-sentence
        split_text = self.split_text_in_sentences(input_text)
        augmenter = self._get_augmenter(attack)
        
        modified_sentences = augmenter.augment_many(split_text)
        modified_sentences = [item[0] for item in modified_sentences]
        # Rejoin the modified sentences into a single string
        output_text: str = ' '.join(modified_sentences)
        return output_text
    
    def execute_perturbation_attacks(self, row: pd.Series, columns_to_use: str|list, attacks: list = None,
                                     output_column_suffix: str = ""):
        """
        Takes a DataFrame row (as a Series) and applies multiple attacks to multiple columns.

        Args:
            row (pd.Series): A single row from a DataFrame.
            columns_to_use (str | list): The name of the column(s) to augment.
            attacks (list, optional): A list of attacks to apply. Defaults to all available attacks.
            output_column_suffix (str, optional): A suffix to add to the generated column names.

        Returns:
            dict: A dictionary where keys are the new column names and values are the augmented texts.
        """
        # Use all default attacks if none are specified
        attacks = self.attacks if attacks is None else attacks
        # Ensure 'columns_to_use' is always a list for consistent processing
        columns_to_use = [columns_to_use] if isinstance(columns_to_use, str) else columns_to_use
        
        # The output dictionary starts with the row's ID for potential merging later
        output = {"id": row["id"]}
        
        # Nested loops to apply each attack to each specified column
        for _column_to_attack in columns_to_use:
            for _attack in attacks:
                # Create a dynamic key for the output dictionary, e.g., "perturbation_word_direct_prompt"
                new_col_name = f"{_attack}_{_column_to_attack}{output_column_suffix}"
                try:
                    output[new_col_name] = self.augment_text(row[_column_to_attack], attack=_attack)
                except Exception as e:
                    raise e
                    # print(e)
                    # output[new_col_name] = "ERROR"
        return output

In [13]:
# Instantiate the class
augmenter = DataGenerationAugmentation()

for _domain, df in domain_dfs.items():
    # Extract the needed key for this domain
    _, _, human_key = get_info_based_on_input_path(_domain)
    
    
    # Make sure to set the tqdm description dynamically
    tqdm.pandas(desc=f"Processing rows for {_domain}")

    # Progress bar per row with domain as description
    results_series = df.progress_apply(
        lambda row: augmenter.execute_perturbation_attacks(
            row,
            columns_to_use=["direct_prompt", human_key],
            
            # due to processing times sentence perturbations have been executed separately
            # attacks=["perturbation_character", "perturbation_word"]
            attacks=["perturbation_sent"]
        ),
        axis=1
    )

    # Convert the resulting series of dictionaries into a new DataFrame
    generated_df = pd.DataFrame(results_series.to_list())
    generated_df.to_parquet(
        f"{INTERMEDIATE_RESULTS}/{_domain}_text_attacks_sentences.parquet"
    )

Processing rows for arxiv:  88%|████████▊ | 2475/2800 [2:45:24<17:49,  3.29s/it]   Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.714 seconds.
Prefix dict has been built successfully.
Processing rows for arxiv: 100%|██████████| 2800/2800 [3:05:34<00:00,  3.98s/it]  
Processing rows for writing_prompt: 100%|██████████| 2800/2800 [5:57:52<00:00,  7.67s/it]   
Processing rows for yelp_review: 100%|██████████| 2800/2800 [3:34:50<00:00,  4.60s/it]   
Processing rows for xsum: 100%|██████████| 2800/2800 [13:18:43<00:00, 17.12s/it]  


# 3. Paraphrase back translation

- Translates the text to German and back to English.
- `en-US` was chosen, as the API no longer accepts plain `en`.
- The code was updated to use the Google `TranslationServiceClient` API instead of Google Translate. Re-executing this part requires an enabled translation service in the Google Cloud project as well as an authentication method.
- The data is split into chunks of 10 (or 15 for arxiv) human or LLM entries to avoid token limits of the translation API.
- Due to minor changes in the translation setup, both human and LLM-generated text is re-translated.
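
The chunking described above can be sketched independently of the translation service. This is a minimal sketch under stated assumptions: `fake_round_trip` is a hypothetical placeholder for the real round trip (English to German and back), and `batched` / `back_translate_in_batches` are illustrative names that do not appear in the notebook.

```python
from typing import Callable, Iterator, List, Optional


def batched(items: List[Optional[str]], size: int) -> Iterator[List[Optional[str]]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def back_translate_in_batches(texts: List[Optional[str]],
                              round_trip: Callable[[List[str]], List[str]],
                              size: int = 10) -> List[str]:
    """Send the texts through `round_trip` batch by batch, preserving order."""
    results: List[str] = []
    for batch in batched(texts, size):
        # Replace missing texts with a single space, as done in the notebook,
        # so that every batch keeps its length and row alignment.
        cleaned = [" " if t is None else t for t in batch]
        translated = round_trip(cleaned)
        if len(translated) != len(cleaned):
            raise ValueError("round_trip must return one result per input text")
        results.extend(translated)
    return results


def fake_round_trip(batch: List[str]) -> List[str]:
    # Hypothetical placeholder: a real implementation would call the API twice.
    return [t.upper() for t in batch]


docs = [f"text {i}" for i in range(23)] + [None]
out = back_translate_in_batches(docs, fake_round_trip, size=10)
print(len(out), out[:2])
```

Replacing `None` with a placeholder instead of dropping the row keeps the output aligned with the DataFrame index; the placeholder can be mapped back to `None` afterwards.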

In [7]:
def translate_text(text: list, project_id="linen-arch-469021-f5", target_language_code: str = "de",
                   source_language_code: str = "en-US"):
    """
    Translates text using a service account for authentication.
    """

    # The client automatically finds and uses the credentials from the
    # GOOGLE_APPLICATION_CREDENTIALS environment variable.
    client = translate.TranslationServiceClient()

    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": text,
            "mime_type": "text/plain",
            "source_language_code": source_language_code,
            "target_language_code": target_language_code,
        }
    )
    
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [item.translated_text for item in response.translations],
            "mime_type": "text/plain",
            "source_language_code": target_language_code,
            "target_language_code": source_language_code,
        }
    )

    # Return a list of the translated text strings
    return [item.translated_text for item in response.translations]


def process_and_translate_dataframes(domain_dfs: dict, documents_to_translate: int = 10):
    """
    Iterates through a dictionary of dataframes, translates specified columns
    in batches, and saves the results as pickle files.
    """
    # Iterate over each domain and its corresponding dataframe
    for domain, df in domain_dfs.items():
        print(f"--- Processing domain: {domain} ---")
        _, _, human_column = get_info_based_on_input_path(domain)
        # Process the dataframe in chunks of `documents_to_translate` rows (default: 10)
        for i in tqdm(range(0, len(df), documents_to_translate)):
            batch_df = df.iloc[i:i+documents_to_translate]
            batch_counter = i // documents_to_translate
            
            # print(f"  Translating batch {batch_counter} (rows {i} to {i+documents_to_translate-1})...")
            
            # Define the output file path
            filename = f"{domain}_translate_{batch_counter}.pkl"
            output_path = OUTPUT_DIR / filename
            
            if filename not in os.listdir(OUTPUT_DIR):
                
                # Extract text from the two columns to be translated
                prompts_to_translate = batch_df['direct_prompt'].tolist()
                prompts_to_translate = [" " if item is None else item for item in prompts_to_translate]
                # prompts_to_translate
                humans_to_translate = batch_df[human_column].tolist()
                humans_to_translate = [" " if item is None else item for item in humans_to_translate]
                # print(prompts_to_translate, humans_to_translate)
                # Call the translation API for each list of texts
                translated_prompts = translate_text(text=prompts_to_translate)
                translated_humans = translate_text(text=humans_to_translate)
                
                # Store results in a new dataframe
                result_df = pd.DataFrame({
                    'translated_direct_prompt': translated_prompts,
                    'translated_human': translated_humans
                })
                # Keep the original index to align with the source data
                result_df.index = batch_df.index
                output_path = str(output_path).replace(".pkl", "_v2.pkl")
                # Save the resulting dataframe to a pickle file
                result_df.to_pickle(output_path)
                
            else:
                result_df = pd.read_pickle(output_path)
                # Batch file already exists: only the direct_prompt column is re-translated
                prompts_to_translate = batch_df['direct_prompt'].tolist()
                prompts_to_translate = [" " if item is None else item for item in prompts_to_translate]

                # Call the translation API for the list of texts
                translated_prompts = translate_text(text=prompts_to_translate)
                
                # Overwrite the column in the previously stored dataframe
                result_df['translated_direct_prompt'] = translated_prompts
                # Keep the original index to align with the source data
                result_df.index = batch_df.index
                
                output_path = str(output_path).replace(".pkl", "_v2.pkl")
                # print(output_path)
                # Save the resulting dataframe to a pickle file
                result_df.to_pickle(output_path)
            
def combine_translated_text_files(domain_name: str, store_dir: str = OUTPUT_DIR, start_counter: int = 0, end_counter: int = 280,
                                  prefix: str = "_v2"):
    """
    Loads the batch files of one domain and returns them as a list of DataFrames.
    Note: despite its name, `prefix` is appended to the file stem (e.g. "_v2").
    """
    list_all_translations = []
    for _counter in range(start_counter, end_counter):
        output_path = f"{store_dir}/{domain_name}_translate_{_counter}.pkl"
        if prefix:
            # Use the passed value instead of a hard-coded "_v2"
            output_path = output_path.replace(".pkl", f"{prefix}.pkl")
        translated_content = pd.read_pickle(output_path)
        list_all_translations.append(translated_content)
    return list_all_translations
    

In [None]:
process_and_translate_dataframes(domain_dfs)

In [9]:
# combine intermediate files to one DataFrame per domain
for _domain in domain_dfs.keys():
    end_counter = 280   #  187 if _domain == "arxiv" else
    results_translated = pd.concat(combine_translated_text_files(_domain, end_counter=end_counter, prefix="_v2"))
    results_translated.to_parquet(f"{TRANSLATIONS_DIR}/{_domain}_translated_files.parquet")
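
Before concatenating, it can help to confirm that every expected batch file exists, since a missing file would otherwise surface only as a read error in the middle of the loop. The following is a minimal sketch assuming the `{domain}_translate_{i}_v2.pkl` naming used above; the helper names `expected_batch_files` and `missing_batch_files` are hypothetical and not part of the notebook.

```python
import math
import tempfile
from pathlib import Path
from typing import List


def expected_batch_files(store_dir: Path, domain: str, n_rows: int,
                         batch_size: int = 10, suffix: str = "_v2") -> List[Path]:
    """List the batch file paths that should exist for one domain."""
    # Ceiling division: a final, partially filled batch still gets its own file.
    n_batches = math.ceil(n_rows / batch_size)
    return [store_dir / f"{domain}_translate_{i}{suffix}.pkl" for i in range(n_batches)]


def missing_batch_files(store_dir: Path, domain: str, n_rows: int,
                        batch_size: int = 10, suffix: str = "_v2") -> List[Path]:
    """Return the expected batch files that are not present on disk."""
    expected = expected_batch_files(store_dir, domain, n_rows, batch_size, suffix)
    return [path for path in expected if not path.exists()]


# Demo in a temporary directory: 25 rows in batches of 10 need 3 files,
# of which only batch 0 and batch 2 are created here.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for i in (0, 2):
        (tmp_dir / f"demo_translate_{i}_v2.pkl").touch()
    missing_names = [p.name for p in missing_batch_files(tmp_dir, "demo", n_rows=25)]

print(missing_names)
```

With 2800 rows and a batch size of 10, the same calculation yields the 280 files that `end_counter` refers to.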

# 4. Dipper

In [None]:
dp = DipperParaphraser(model="kalpeshk2011/dipper-paraphraser-xxl")

for _domain, df in domain_dfs.items():
    print(_domain)
    _, _, human_key = get_info_based_on_input_path(_domain)

    data = df.copy(deep=True)
    
    human_error_list = []
    llm_error_list = []
    dipper_results = data.copy(deep=True)
    dipper_results["paraphrase_dipper_llm_new"] = None
    counter = 0
    last_stored = 0

    for index, article in tqdm(data.iterrows()):

        abstract = article[human_key]
        direct_prompt = article['direct_prompt']
        
        # dipper for human is not repeated, as we use the same model and parameters
        try:
            # if not (abstract is None or abstract == ""):
            #     prompt, input_text = spilt_paragraph(abstract)
            #     if len(input_text) >= 1024:
            #         input_text = input_text[:1024]
            #         input_text2 = input_text[1024:]
            #     else: input_text2 = None
            #     paraphrase_dipper_human = dp.paraphrase(input_text, lex_diversity=40, order_diversity=40, prefix=prompt,
            #                                             do_sample=False, max_length=1024)
            #     if input_text2 is not None:
            #         paraphrase_dipper_human += dp.paraphrase(input_text2, lex_diversity=40, order_diversity=40, prefix=prompt,
            #                                             do_sample=False, max_length=1024)
            #     if len(paraphrase_dipper_human) == 0:
            #         human_error_list.append(article)
            #         
            #     article['paraphrase_dipper_human'] = paraphrase_dipper_human
            # else:
            #     article['paraphrase_dipper_human'] = None
            #     # time.sleep(random.randint(1, 4))
    
            if not (direct_prompt is None or direct_prompt == ""):
                prompt, input_text = spilt_paragraph(direct_prompt)
                # if len(input_text) >= 1024:
                #     input_text = input_text[:1024]
                #     input_text2 = input_text[1024:]
                # else: input_text2 = None
                
                paraphrase_dipper_llm = dp.paraphrase(input_text, lex_diversity=40, order_diversity=40, prefix=prompt,
                                                        do_sample=False, max_length=2048)
                # if input_text2 is not None:
                #     paraphrase_dipper_llm += dp.paraphrase(input_text2, lex_diversity=40, order_diversity=40, prefix=prompt,
                #                                         do_sample=False, max_length=1024)
                    
            else:
                paraphrase_dipper_llm = None
        except Exception as e:
            print(e)
            paraphrase_dipper_llm, paraphrase_dipper_human = "", ""
                
        # Track rows without a usable paraphrase (missing input or failed generation)
        if paraphrase_dipper_llm is None or len(paraphrase_dipper_llm) == 0:
            llm_error_list.append(article)
        dipper_results.loc[index, "paraphrase_dipper_llm_new"] = paraphrase_dipper_llm
        
        counter += 1
        if counter % 100 == 0:
            dipper_results.iloc[last_stored:counter].to_parquet(f"{DIPPER_INTERMEDIATE_DIR}/dipper_temp_{_domain}_{counter}.parquet")
            last_stored = counter
    
    dipper_results.to_parquet(f"{DIPPER_INTERMEDIATE_DIR}/{_domain}_dipper_final_test_run.parquet")

    print(f"human error number:{len(human_error_list)}")
    print(f"llm error number:{len(llm_error_list)}")

    # Convert list of Series -> DataFrame
    human_error_df = pd.DataFrame(human_error_list)
    llm_error_df   = pd.DataFrame(llm_error_list)
    # save as parquet
    human_error_df.to_parquet(f"{DIPPER_INTERMEDIATE_DIR}/{_domain}_human_error.parquet", index=False)
    llm_error_df.to_parquet(f"{DIPPER_INTERMEDIATE_DIR}/{_domain}_llm_error.parquet", index=False)

Loading checkpoint shards:   0%|          | 0/5 [00:00<?, ?it/s]

kalpeshk2011/dipper-paraphraser-xxl model loaded in 1.5297589302062988
arxiv


53it [04:56,  4.42s/it]
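
The periodic saving in the loop above follows a general checkpointing pattern for long-running generation. A minimal, library-free sketch of that pattern is shown here; `run_with_checkpoints`, `process` and `save_chunk` are hypothetical names, and the in-memory `saved` dictionary merely stands in for writing parquet files.

```python
from typing import Any, Callable, Dict, Iterable, List


def run_with_checkpoints(items: Iterable[Any],
                         process: Callable[[Any], Any],
                         save_chunk: Callable[[int, List[Any]], None],
                         every: int = 100) -> List[Any]:
    """Process items one by one and hand finished chunks to `save_chunk`."""
    results: List[Any] = []
    last_saved = 0
    for item in items:
        try:
            results.append(process(item))
        except Exception as exc:
            # Record the failure as None so that positions stay aligned.
            print(f"failed on {item!r}: {exc}")
            results.append(None)
        if len(results) % every == 0:
            save_chunk(last_saved, results[last_saved:])
            last_saved = len(results)
    # Flush the remainder that did not fill a complete chunk.
    if last_saved < len(results):
        save_chunk(last_saved, results[last_saved:])
    return results


saved: Dict[int, List[Any]] = {}
results = run_with_checkpoints(
    range(7),
    process=lambda x: x * x if x != 3 else 1 / 0,  # item 3 fails on purpose
    save_chunk=lambda start, chunk: saved.__setitem__(start, list(chunk)),
    every=3,
)
print(results)
print(saved)
```

Saving only the new slice at each checkpoint keeps the intermediate files small, and the final flush ensures that the last, incomplete chunk is not lost if only the intermediate files are kept.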

In [8]:
for _, item in dipper_results.iloc[2100:2103].iterrows():
    print(item.paraphrase_dipper_llm, "\n")
    print(item.paraphrase_dipper_llm_new, "\n")
    print(item.direct_prompt, "\n\n")

In [15]:
domain_dfs["yelp_review"].paraphrase_dipper_llm.loc[4]

'The menu was a riot of mouth-watering choices, and we were spoiled for choice. We started our culinary journey with the chef’s special appetizer, a fusion of flavors that danced on our tongues. For the main course, we chose their signature dish, and it was beyond our expectations. The chicken was tender and juicy, and the accompanying sauce was tangy and delicious. The presentation alone was a work of art, and showed the care and attention that went into each dish. The portions were generous, and we left the table feeling completely satisfied. What really stood out was the impeccable service we received throughout our meal. The staff was attentive and knowledgeable, and they were able to accommodate our dietary restrictions. It was obvious that they took great pride in making sure that each guest had a memorable experience. To top it all off, the desserts were divine. We indulged in a rich chocolate cake that melted in our mouths, accompanied by a velvety scoop of homemade vanilla ice

In [9]:
for _, item in pd.read_parquet(f"{DIPPER_INTERMEDIATE_DIR}/yelp_review_dipper_final_test_run.parquet").iterrows():
    print(item["direct_prompt"], "\n")
    print(item["paraphrase_dipper_llm"], "\n\n")

I had the misfortune of becoming a patient at Dr. Goldberg's office recently, and it was an experience that left me utterly disappointed and frustrated. From the moment I stepped into the waiting room, I sensed an air of disorganization and chaos. The receptionist seemed overwhelmed and didn't bother to greet me or even acknowledge my presence for several minutes. When I finally got called in, my initial impression of Dr. Goldberg was far from positive. He was dismissive and seemed disinterested in listening to my concerns. It felt as if he was just rushing through the appointment without giving me the attention or care I deserved. Moreover, his diagnostic skills were questionable at best. He made a hasty diagnosis without even considering other possibilities or conducting the necessary tests. I left the office feeling more confused and unsure about my condition than when I arrived. The lack of follow-up from the doctor or his staff only worsened my frustration. Despite repeated attemp

# 5. Store the results

In [50]:
# 1. read the dataframes of the different attacks
# 2. replace the original attack based on the contaminated text with the regenerated one
# 3. save the dataframes in the dict again

for _domain, original_df in domain_dfs.items():
    dipper_df: pd.DataFrame = pd.read_parquet(f"{DIPPER_INTERMEDIATE_DIR}/{_domain}_dipper_final.parquet")
    text_attack_sentences_df: pd.DataFrame = pd.read_parquet(f"{INTERMEDIATE_RESULTS}/{_domain}_text_attacks_sentences.parquet")
    text_attack_word_and_character_df: pd.DataFrame = pd.read_parquet(f"{INTERMEDIATE_RESULTS}/{_domain}_text_attacks.parquet")
    paraphrase_back_translation_df: pd.DataFrame = pd.read_parquet(f"{TRANSLATIONS_DIR}/{_domain}_translated_files.parquet")
    dfs = [dipper_df, text_attack_sentences_df, text_attack_word_and_character_df, paraphrase_back_translation_df]
    dataframe_with_all_attacks = dipper_df.copy(deep=True)

    for _df in dfs:
        # replace all empty phrases with None
        #   the number of Nones should be the same for all dataframes for the human/ llm generated text
        #   --> if something has a None (e.g. due to rejections) for direct prompt, the other attacks have to be empty,
        #   if not, the other attacks must have a value
        _df.replace(["ERROR", "", " "], None, inplace=True)
    
    # Turn the index of paraphrase_back_translation_df into an 'id' column for merging
    paraphrase_back_translation_df = paraphrase_back_translation_df.reset_index()
    paraphrase_back_translation_df.columns = ['id' if col == paraphrase_back_translation_df.columns[0] else col 
                                              for col in paraphrase_back_translation_df.columns]
    
    # Discard the index of dipper_df (its 'id' column is kept) and drop the outdated attack columns
    dipper_df = dipper_df.reset_index(drop=True)
    dipper_df.drop(columns=["adversarial_character_human", "adversarial_character_llm", "adversarial_word_human", "adversarial_word_llm",
                            "adversarial_character_word_human", "adversarial_character_word_llm", 
                            "paraphrase_back_translation_human", "paraphrase_back_translation_llm",
                            "paraphrase_dipper_llm"
                    ], inplace=True)
    
    # Merge with dipper_df as base, keeping only dipper_df's rows
    merged_df: pd.DataFrame = (
        dipper_df
        .merge(text_attack_sentences_df, on="id", how="left")
        .merge(text_attack_word_and_character_df, on="id", how="left")
        .merge(paraphrase_back_translation_df, on="id", how="left")
    )
      
    _, _, human_key = get_info_based_on_input_path(_domain)
    merged_df.rename(columns={f"perturbation_character_{human_key}": "adversarial_character_human", 
                      "perturbation_character_direct_prompt": "adversarial_character_llm", 
                      f"perturbation_word_{human_key}": "adversarial_word_human", 
                      "perturbation_word_direct_prompt": "adversarial_word_llm",
                      f"perturbation_sent_{human_key}": "adversarial_character_word_human", 
                      "perturbation_sent_direct_prompt": "adversarial_character_word_llm", 
                      "translated_human": "paraphrase_back_translation_human", 
                      "translated_direct_prompt": "paraphrase_back_translation_llm",
                      "paraphrase_dipper_llm_new": "paraphrase_dipper_llm"
                    }, inplace=True)
    
    # Restore original index from 'id'
    merged_df = merged_df.set_index('id')
    # Get column order from original dataframe, dropping 'id'
    target_cols = [col for col in original_df.columns if col != "id"]
    # Keep only columns that exist in merged_df
    target_cols = [col for col in target_cols if col in merged_df.columns]
    # Reorder merged_df
    merged_df = merged_df[target_cols]
    
    # store the merged dataframe back in the dict of all dataframes
    domain_dfs[_domain] = merged_df.drop_duplicates()
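
The invariant stated in the comments above (an attack column should be empty exactly where its source text is empty) can be checked after merging. This is a minimal sketch on toy data, assuming pandas is available; `null_mismatches` is a hypothetical helper and the toy values are made up for illustration.

```python
import pandas as pd


def null_mismatches(df: pd.DataFrame, source_col: str, derived_cols: list) -> dict:
    """Count rows per derived column whose missingness differs from the source column."""
    source_missing = df[source_col].isna()
    return {
        col: int((df[col].isna() != source_missing).sum())
        for col in derived_cols
    }


toy = pd.DataFrame({
    "direct_prompt": ["a", None, "c"],
    "adversarial_character_llm": ["a1", None, "c1"],   # consistent with the source
    "paraphrase_dipper_llm": ["a2", "b2", None],       # two inconsistent rows
})

report = null_mismatches(
    toy, "direct_prompt", ["adversarial_character_llm", "paraphrase_dipper_llm"]
)
print(report)
```

A non-zero count points to rows where an attack was generated from a missing source text or where generation failed although a source text exists.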

In [53]:
path_writing = f'{CLEANED_FILES_DIR}/writing_prompt_2800_cleaned_final.parquet'
path_abstract = f'{CLEANED_FILES_DIR}/arxiv_2800_cleaned_final.parquet'
path_review = f'{CLEANED_FILES_DIR}/yelp_review_2800_cleaned_final.parquet'
path_xsum = f'{CLEANED_FILES_DIR}/xsum_2800_cleaned_final.parquet'

domain_dfs["writing_prompt"].to_parquet(path_writing)
domain_dfs["arxiv"].to_parquet(path_abstract)
domain_dfs["yelp_review"].to_parquet(path_review)
domain_dfs["xsum"].to_parquet(path_xsum)

In [54]:
for _, _df in domain_dfs.items():
    print(_df.info())

<class 'pandas.core.frame.DataFrame'>
Index: 2800 entries, 1 to 2800
Data columns (total 19 columns):
 #   Column                             Non-Null Count  Dtype 
---  ------                             --------------  ----- 
 0   title                              2800 non-null   object
 1   abstract                           2800 non-null   object
 2   direct_prompt                      2793 non-null   object
 3   llm_type                           2800 non-null   object
 4   domain                             2800 non-null   object
 5   prompt_few_shot                    2793 non-null   object
 6   prompt_SICO                        2798 non-null   object
 7   paraphrase_polish_human            2771 non-null   object
 8   paraphrase_polish_llm              2795 non-null   object
 9   adversarial_character_human        2800 non-null   object
 10  adversarial_character_llm          2793 non-null   object
 11  adversarial_word_human             2800 non-null   object
 12  adversarial