# Reverse Engineering Prompts with OpenAI API

**Authors:** Matías Arévalo, Pilar Guerrero, Moritz Goebbels, Tomás Lock, Allan Stalker  
**Date:** January – May 2025  

## Purpose
Generate prompts from existing spam messages using the OpenAI API, aiming to reverse engineer potential instructions that could have been given to create spam content.

## Import Libraries

In [1]:
import pandas as pd
from openai import OpenAI
from tqdm import tqdm

## Load Dataset

- **File name:** `preprocessed_spam_dataset.csv`  
- **Location:** `data/`

In [None]:
df = pd.read_csv("../../data/preprocessed_spam_dataset.csv")

In [3]:
df.head()

| | label | message | clean_message |
|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | go until jurong point, crazy.. available only ... |
| 1 | ham | Ok lar... Joking wif u oni... | ok lar... joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | free entry in 2 a wkly comp to win fa cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... | u dun say so early hor... u c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | nah i don [emoji] t think he goes to usf, he l... |


In [4]:
df.shape

(28479, 3)

## Dataset Processing  
For this notebook, we reduce the original dataset as follows:
- Keep only the messages labeled `spam`.
- Drop the original `message` column, keeping the `label` and `clean_message` columns.
- Remove the `[emoji]` placeholder that was previously inserted in place of emojis.
- Add a new, initially empty column called `prompt`.

The resulting dataset is stored in a variable called `df_spam`.


In [5]:
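# Keep only spam rows; .copy() avoids pandas' SettingWithCopyWarning on the assignments below
df_spam = df[df['label'] == 'spam'].copy()
# Drop the raw text; clean_message is the column sent to the API
df_spam = df_spam.drop('message', axis=1)
# Strip the [emoji] placeholder token (the spaces around it are left in place)
df_spam['clean_message'] = df_spam['clean_message'].str.replace(r'\[emoji\]', '', regex=True)
# Empty column, filled later with the generated prompts
df_spam['prompt'] = ''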
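Removing the placeholder leaves the spaces that surrounded it, so `don [emoji] t` becomes `don  t` (two spaces). The notebook does not normalize this. The following is a minimal sketch, assuming one also wants to collapse repeated whitespace. `clean_spam_messages` is a hypothetical helper that repeats the filtering above and adds that extra step, shown on a small made-up DataFrame.

```python
import pandas as pd

def clean_spam_messages(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: the notebook's filtering plus whitespace collapsing."""
    out = df.loc[df["label"] == "spam"].drop(columns="message").copy()
    out["clean_message"] = (
        out["clean_message"]
        .str.replace(r"\[emoji\]", "", regex=True)  # drop the placeholder token
        .str.replace(r"\s+", " ", regex=True)       # collapse the gap it leaves behind
        .str.strip()                                # trim leading/trailing spaces
    )
    out["prompt"] = ""  # still empty at this stage
    return out

# Made-up toy rows in the same column layout as the real dataset
toy = pd.DataFrame({
    "label": ["ham", "spam"],
    "message": ["Nah I don't think so", "Don't miss out!"],
    "clean_message": ["nah i don [emoji] t think so", "don [emoji] t miss out!"],
})
print(clean_spam_messages(toy)["clean_message"].tolist())  # ['don t miss out!']
```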

We save this spam-only dataset, whose `prompt` column is still empty, to a CSV file so that the steps above do not have to be rerun if the session is interrupted.

This new file:  
- Saved name: `spam_for_prompting.csv`  
- Location: `data/`

In [None]:
df_spam.to_csv('../../data/spam_for_prompting.csv', index=False)

With the file now saved, we can proceed with the following steps.
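If the session is restarted, the saved file can be reloaded instead of rerunning the cells above. By default `pd.read_csv` parses empty fields as `NaN`, so the empty `prompt` column would come back as floats rather than empty strings. A minimal sketch, assuming one wants to keep them as strings: passing `keep_default_na=False` disables the default NA parsing. Here `load_spam_for_prompting` is a hypothetical helper, and an in-memory CSV stands in for `../../data/spam_for_prompting.csv`.

```python
import io
import pandas as pd

def load_spam_for_prompting(path_or_buffer) -> pd.DataFrame:
    """Hypothetical helper: reload the spam-only file, keeping empty prompts as ''."""
    # keep_default_na=False stops read_csv from turning '' (and texts like "NA") into NaN
    return pd.read_csv(path_or_buffer, keep_default_na=False)

# In-memory stand-in for the saved CSV (one spam row, empty prompt)
csv_text = "label,clean_message,prompt\nspam,free entry in 2 a wkly comp,\n"
df_reloaded = load_spam_for_prompting(io.StringIO(csv_text))
print(repr(df_reloaded.loc[0, "prompt"]))  # ''
```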

## Prompt Generation with OpenAI
For this section, we use the OpenAI API to reverse engineer plausible prompts that could have been used to create each spam message. Running the following cells requires an **OpenAI API key**:  
- If you do not have one yet, create an OpenAI account and generate an API key on the OpenAI API keys page; otherwise, use your existing key.
- Paste the key into the `api_key` argument of `OpenAI(...)` in the cell that creates the client, replacing `"YOUR-API-KEY-GOES-HERE"`.
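You can also keep the key out of the notebook file entirely by reading it from an environment variable. The OpenAI Python client already falls back to the `OPENAI_API_KEY` variable when `api_key` is omitted. The sketch below only makes the failure message explicit. `load_api_key` and `make_client` are hypothetical helpers, not part of the OpenAI library.

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Hypothetical helper: read the API key from an environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set; set it to your OpenAI API key.")
    return key

def make_client(env_var: str = "OPENAI_API_KEY"):
    """Hypothetical helper: build the OpenAI client from the environment variable."""
    from openai import OpenAI  # imported here so load_api_key works without the openai package
    return OpenAI(api_key=load_api_key(env_var))

# In the notebook, this would replace the hard-coded key:
# client = make_client()
```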

### Create the Function  
**REMEMBER**: Replace `YOUR-API-KEY-GOES-HERE` with **YOUR** API Key

In [None]:
client = OpenAI(api_key="YOUR-API-KEY-GOES-HERE")

In [7]:
tqdm.pandas()  # enables .progress_apply() on pandas objects

def reverse_engineer_prompt(msg):
    """Ask the model for a generic prompt that could have produced `msg`.

    Returns the model's answer, or a string starting with "ERROR:" if the call fails.
    """
    prompt = f"""Given the following spam message, write a generic prompt that could have been used to generate it during language model fine-tuning. For example, 'generate a prompt that targets seniors' or 'generate a prompt for someone that is interested in football'. Make sure that the prompt is specific to the message but also that it can reproduce multiple unique messages as well.

Scam message:
"{msg}"

Prompt:"""
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
            temperature=0.8,  # sampling temperature; higher values give more varied outputs
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        # Record the failure in the row instead of stopping the whole run
        print("❌ ERROR:", e)
        return f"ERROR: {e}"
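On a long run, transient failures such as rate-limit or network errors can occur, and the function above stores them as `ERROR: ...` strings. The OpenAI Python client already retries some failures itself; this is configurable through its `max_retries` option. Below is a minimal sketch of an additional outer retry loop with exponential backoff and jitter. `with_retries` is a hypothetical helper. For simplicity it retries on any exception. A stricter version would retry only on rate-limit and connection errors. A fake flaky function stands in for the API call.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(call: Callable[[], T], max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Hypothetical helper: call `call()`, retrying with exponential backoff on any exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller record the error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise AssertionError("unreachable")

# Stand-in for the API call: fails twice, then succeeds
attempts = {"n": 0}
def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "Generate a prompt ..."

delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky, sleep=delays.append)
print(result, attempts["n"], len(delays))  # Generate a prompt ... 3 2
```

Inside `reverse_engineer_prompt`, the call could then be wrapped as `with_retries(lambda: client.chat.completions.create(...))`, keeping the existing `except` branch for calls that still fail.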

### Testing Function  
Before applying the function to the whole dataset, we test it on the first 10 spam messages to check that it works properly.

In [8]:
sample_df = df_spam.head(10).copy()
sample_df['prompt'] = sample_df['clean_message'].progress_apply(reverse_engineer_prompt)
sample_df[['prompt', 'clean_message']].head()

100%|██████████| 10/10 [00:07<00:00,  1.37it/s]


| | prompt | clean_message |
|---|---|---|
| 2 | Generate a prompt for someone looking to win t... | free entry in 2 a wkly comp to win fa cup fina... |
| 5 | Generate a prompt for individuals looking for ... | freemsg hey there darling it s been 3 week s... |
| 8 | Generate a prompt for a marketing message targ... | winner!! as a valued network customer you have... |
| 9 | Generate a prompt for individuals who have own... | had your mobile num months or more? u r enti... |
| 11 | Generate a prompt for a text message promotion... | six chances to win cash! from num to num ,... |


The function returns a prompt for every message in the sample, so in the next step we apply it to the whole `df_spam` dataset.
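The sequential run below processes about 1.35 messages per second (8,049 rows in roughly 1 h 40 min). Each call spends most of its time waiting on the network, so a thread pool could send several requests at once. This is a minimal sketch that assumes the client can be shared across threads and that the account's rate limits allow a few concurrent requests. `apply_concurrently` is a hypothetical helper, and `fake_generate` stands in for `reverse_engineer_prompt` so the sketch runs without an API key. `ThreadPoolExecutor.map` returns results in input order, so they line up with the DataFrame rows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

import pandas as pd

def apply_concurrently(texts: Iterable[str], fn: Callable[[str], str], max_workers: int = 4) -> List[str]:
    """Hypothetical helper: apply `fn` to each text on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, texts))

def fake_generate(msg: str) -> str:
    # Stand-in for reverse_engineer_prompt; no API call is made
    return f"Generate a prompt like: {msg[:20]}"

demo = pd.DataFrame({"clean_message": ["free entry in 2 a wkly comp", "winner!! as a valued network customer"]})
demo["prompt"] = apply_concurrently(demo["clean_message"], fake_generate)
print(demo["prompt"].tolist())
```

Keeping `max_workers` small limits the risk of hitting rate limits; combined with a retry helper, failed calls would still be recorded per row.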

### Apply Function to Complete Spam Dataset

In [9]:
final_df = df_spam.copy()
final_df['prompt'] = final_df['clean_message'].progress_apply(reverse_engineer_prompt)
final_df[['prompt', 'clean_message']].head()

100%|██████████| 8049/8049 [1:39:27<00:00,  1.35it/s]


| | prompt | clean_message |
|---|---|---|
| 2 | Generate a prompt for individuals interested i... | free entry in 2 a wkly comp to win fa cup fina... |
| 5 | Generate a prompt targeting individuals lookin... | freemsg hey there darling it s been 3 week s... |
| 8 | Generate a prompt targeting customers of a net... | winner!! as a valued network customer you have... |
| 9 | Generate a spam message targeting individuals ... | had your mobile num months or more? u r enti... |
| 11 | Generate a prompt targeting individuals who ar... | six chances to win cash! from num to num ,... |
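`reverse_engineer_prompt` records failures as strings starting with `ERROR:` instead of raising, so check for such rows before saving. For example, `final_df['prompt'].str.startswith('ERROR:').sum()` counts them. The following is a minimal sketch of regenerating only the failed or empty rows. `rerun_failed` is a hypothetical helper, the toy rows (including the error text) are made up, and in the notebook `fn` would be `reverse_engineer_prompt`.

```python
from typing import Callable

import pandas as pd

def rerun_failed(df: pd.DataFrame, fn: Callable[[str], str],
                 text_col: str = "clean_message", prompt_col: str = "prompt") -> pd.DataFrame:
    """Hypothetical helper: regenerate prompts only where they are empty or start with 'ERROR:'."""
    out = df.copy()
    out[prompt_col] = out[prompt_col].fillna("").astype(str)  # treat missing prompts as empty
    failed = out[prompt_col].str.startswith("ERROR:") | (out[prompt_col].str.strip() == "")
    out.loc[failed, prompt_col] = out.loc[failed, text_col].map(fn)  # aligned by row index
    return out

# Made-up toy rows: one good prompt, one failed call
demo = pd.DataFrame({
    "clean_message": ["free entry in 2 a wkly comp", "six chances to win cash!"],
    "prompt": ["Generate a prompt for ...", "ERROR: Rate limit reached"],
})
fixed = rerun_failed(demo, lambda msg: f"Generate a prompt about: {msg}")
print(fixed["prompt"].tolist())
# ['Generate a prompt for ...', 'Generate a prompt about: six chances to win cash!']
```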


### Save the New File
This new file:  
- Saved name: `final_scam_prompt_dataset.csv`  
- Location: `data/`

In [None]:
final_df.to_csv("../../data/final_scam_prompt_dataset.csv", index=False)