## PAPILLON Tutorial
In this notebook, we will walk through how to set up your own PAPILLON pipeline locally with a GPU server step-by-step.

### What is PAPILLON?
PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles (PAPILLON) is a framework where trusted but weaker models can use untrusted but more powerful models as tools in order to preserve user inference-time privacy.

You can refer to the original paper [here](https://arxiv.org/abs/2410.17127) for how we constructed the benchmark for our task.

![Overview of the PAPILLON pipeline](figs/1.png)

For this tutorial, we will use **GPT-4o-mini** as the untrusted model and **Llama-3.1-8B-Instruct** as the trusted, locally-hosted model.

### Install Dependencies

In [3]:
%pip install dspy-ai==2.5.41 openai pandas sglang[all]

Collecting dspy-ai==2.5.41
  Downloading dspy_ai-2.5.41-py3-none-any.whl.metadata (4.9 kB)
Collecting anthropic>=0.20.0 (from sglang[all])
  Downloading anthropic-0.40.0-py3-none-any.whl.metadata (23 kB)
Collecting decord (from sglang[all])
  Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl.metadata (422 bytes)
Collecting hf_transfer (from sglang[all])
  Downloading hf_transfer-0.1.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.7 kB)
Collecting orjson (from sglang[all])
  Downloading orjson-3.10.12-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (41 kB)
Collecting python-multipart (from sglang[all])
  Downloading python_multipart-0.0.19-py3-none-any.whl.metadata (1.8 kB)
Collecting torchao (from sglang[all])
  Downloading torchao-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting modelscope (from sglang[all])
  Downloading modelscope-1.20.1-py3-none-any.whl.metadata (40 kB)
Downloading dsp

### Launch Llama-3.1-8B-Instruct
We will host this model using SGLang. If you have the model hosted somewhere else, that should also be okay, you can just adjust the `local_lm` variable accordingly in the following sections.

In [None]:
%pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/ 

PORT_NUMBER = 7501 # You can change the port number here

!CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --port $PORT_NUMBER --model-path meta-llama/Llama-3.1-8B-Instruct

### Initialize Local LM and Remote LLM
The Local LM would correspond to the trusted (but usually weaker) model. The Local LM should ideally be the only component of the pipeline that manages your private information. The Remote LM would correspond to the untrusted (but usually more potent) model. The goal of the PAPILLON pipeline is to produce high-quality outputs while leaking as little of your private information as possible to the Remote LM.

In [None]:
import dspy

os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

local_lm = dspy.LM('openai/default', api_base=f"http://127.0.0.1:{PORT_NUMBER}/v1", api_key="", max_tokens=4000)
dspy.configure(lm=local_lm)

openai_lm = dspy.OpenAI(model="gpt-4o-mini", max_tokens=4000)

openai_gpt4o = dspy.OpenAI(model="gpt-4o-mini", max_tokens=4000)

### Define the PAPILLON DSPy Module
We will now define the Prompt Creator and Information Aggregator modules according to the diagram earlier in this notebook. 

After defining the module, we can then optimize the prompts for these modules using the MIPRO v2 DSPy prompt optimizer, so that you can keep creating new PAPILLON pipelines for your specific needs.

In [None]:

class CreateOnePrompt(dspy.Signature):
    """
    You are a helpful assistant that is very mindful of user privacy. You have access to a powerful large language model that you can query. Given a user request, create a prompt for your large language model that preserves user privacy, so that this model can help you complete the user request. Provide the prompt directly without any preamble. DO NOT COMPLETE THE USER QUERY, ONLY GENERATE A PROMPT.
    """
    userQuery = dspy.InputField(desc="The user's request to be fulfilled.")
    createdPrompt = dspy.OutputField()

class InfoAggregator(dspy.Signature):
    """
    You are a helpful assistant. Respond to queries from the user.
    """

    userQuery = dspy.InputField(desc="The user's request to be fulfilled.")
    modelExampleResponses = dspy.InputField(desc="You have the following information from a better language model responding to related query or queries. Complete the user query by referencing this information. Only you have access to this information.", format=lambda s: f'======\n\n{s.strip()}\n\n======')
    finalOutput = dspy.OutputField()



class PrivacyOnePrompter(dspy.Module):
    def __init__(self, trusted_model, untrusted_model):
        super().__init__()
        self.prompt_creater = dspy.ChainOfThought(CreateOnePrompt)
        self.info_aggregator = dspy.Predict(InfoAggregator)
        self.trusted_model = trusted_model
        dspy.configure(lm=self.trusted_model)
        self.untrusted_model = untrusted_model
        
    
    def forward(self, user_query):
        try:
            prompt = self.prompt_creater(userQuery=user_query)
        except ValueError:
            return dspy.Prediction(
                prompt="",
                output="",
                gptResponse=""
            )
        try:
            response = self.untrusted_model(prompt.createdPrompt)[0]
        except ValueError:
            return dspy.Prediction(
                prompt="",
                output="",
                gptResponse=""
            )
        try:
            final_output = self.info_aggregator(userQuery=user_query, modelExampleResponses=response)
        except ValueError:
            return dspy.Prediction(
                prompt="",
                output="",
                gptResponse=response
            )
        return dspy.Prediction(
            prompt=prompt.createdPrompt,
            output=final_output.finalOutput,
            gptResponse=response
        )


### Optimize Your PAPILLON Pipeline
#### Let's First Load Some Data

In our paper, we proposed the PUPA benchmark, available both on Huggingface (TBD) and in this repository (`pupa/*.csv`).

The PUPA benchmark contains user-assistant interactions where the user divulges personally identifiable information (PII) in the [WildChat](https://arxiv.org/abs/2405.01470) dataset. We use annotation schemas detailed in the [Trust No Bot](https://arxiv.org/abs/2407.11438) paper to determine whether an interaction contains private information of specific types. PUPA consists of two parts:
1. `PUPA-TNB` is data derived from the original annotations from the Trust No Bot paper.
2. `PUPA-New` is data derived from the rest of the WildChat dataset that is not annotated in the Trust No Bot paper.

We will use PUPA-New to optimize our PAPILLON program and for evaluation before and after the optimization. In our paper, we used PUPA-TNB for standardized evaluation across different models, but we are just using PUPA-New here for demonstration purposes.

In [None]:
# Let's first take a look at some data examples
import pandas

pupa_tnb = pandas.read_csv("pupa/PUPA_TNB.csv")

for i, row in pupa_tnb.iterrows():
    if i > 3:
        break
    print("ORIGINAL USER QUERY:")
    print(row["user_query"])
    print("PII UNITS:")
    print(row["pii_units"])
    print("TARGET RESPONSE FROM PAPILLON:")
    print(row["target_response"])
    print("====")


ORIGINAL USER QUERY:
Rachel Zheng is a clerk in the Administrative Department of H&R Technology. When she returns to the office, she finds a telephone message left by Johnny Bay, a manager of her department. In five days, Johnny will fly with the vice-president to the Univeristy of Westminster in order to attend a meeting. Johnny asks Rachel to book a hotel for them in Westminster. She needs to write an email to Johnny to explain her suggestions before booking. 请将以上句子翻译成中文
PII UNITS:
rachel||johnny||univeristy of westminster||westminster||johnny bay||rachel zheng||h&r technology
TARGET RESPONSE FROM PAPILLON:
Rachel Zheng 是 H&R Technology 行政部门的职员。当她回到办公室时，发现办公室留了一个电话留言，留言是由她所在部门的经理 Johnny Bay 留的。Johnny 将在五天后和副总裁一起飞往威斯敏斯特大学参加会议。Johnny 请求 Rachel 为他们在威斯敏斯特预订酒店。在预订之前，她需要给 Johnny 发送一封邮件解释她的建议。
====
ORIGINAL USER QUERY:
Please describe the job knowledge requirements of the International Advising Assistant position at University; the amount of information and understanding necessary to perfor

In [None]:
# Data Processing function
from dspy import Example

def synthesize_tvt(data_file):
    df = pandas.read_csv(data_file, index_col=False)
    train, val, test = [], [], []
    for i, row in df.iterrows():
        if pandas.isna(row["pii_units"]) or not isinstance(row["pii_units"], str) or len(row["pii_units"]) == 0:
            continue
        new_dp = Example({"target_response": row["target_response"],
                          "user_query": row["user_query"],
                          "pii_str": row["pii_units"]}).with_inputs("user_query")
        if i < 150:
            train.append(new_dp)
        elif 150 <= i < 300:
            val.append(new_dp)
        else:
            test.append(new_dp)
    return train, val, test


The goal of the optimization process is to:
1. Make sure the PAPILLON responses are at least on par with the `target_response`
2. Leak as few `pii_units` to the Remote LLM as possible

#### Define the LLM Judge

In [None]:
class JudgeQuality(dspy.Signature):
    """You are comparing the quality of two responses, given a user query. Is response A at least as good as response B? Respond with yes or no."""

    user_query = dspy.InputField(desc="The user's request to be fulfilled.")
    response_A = dspy.InputField()
    response_B = dspy.InputField()
    output = dspy.OutputField()

class JudgeLeakage(dspy.Signature):
    """Determine whether a piece of information is present in the prompt given. Respond with yes or no."""

    information_piece = dspy.InputField(desc="The piece of information that we wish to check. If this piece of information is a celebrity name, respond with no.")
    prompt = dspy.InputField()
    output = dspy.OutputField()

class JudgePromptQual(dspy.Signature):
    """Determine whether a piece of text is a valid prompt for a language model. Respond with yes or no."""
    prompt_input = dspy.InputField()
    output = dspy.OutputField()


class LLMJudge(dspy.Module):
    def __init__(self):
        super().__init__()
        self.quality_judge = dspy.ChainOfThought(JudgeQuality)
        self.fact_checker = dspy.Predict(JudgeLeakage)
        self.prompt_qual = dspy.ChainOfThought(JudgePromptQual)

    def forward(self, user_query, og_resp, new_resp=None, updated_query=None, pii_str=None):
        if new_resp:
            answer = self.quality_judge(user_query=user_query, response_A=new_resp, response_B=og_resp)
            judgment_1 = int((answer.output.lower()).startswith("yes"))
            answer = self.quality_judge(user_query=user_query, response_A=og_resp, response_B=new_resp)
            judgment_2 = int((answer.output.lower()).startswith("yes"))
            judgment = process_scores(judgment_1, judgment_2)
        else:
            judgment = -1
        if updated_query:
            if isinstance(pii_str, str):
                # The pii_str field must be separated by ||
                all_ppi_pieces = set(pii_str.split("||"))
                ppi_score = 0
                for p in all_ppi_pieces:
                    answer = self.fact_checker(information_piece=p, prompt=updated_query)
                    ppi_score += int((answer.output.lower()).startswith("yes"))
            else:
                ppi_score = -1
            ans = self.prompt_qual(prompt_input=updated_query)
            prompt_score = int((ans.output.lower()).startswith("yes"))
        else:
            ppi_score = -1
            prompt_score = -1
        
        return dspy.Prediction(
            quality=judgment,
            leakage=ppi_score,
            prompt=prompt_score
        )


def process_scores(s1, s2):
    # If the model thinks A > B and B > A (or the other way around but the model is not consistent)
    # We would assume that the two items are equal in quality
    # You can adjust the metric accordingly if needed
    if s1 == s2:
        return 1
    return s1

llm_judge = LLMJudge()

#### Define the Metric
This will guide the prompt optimization process.

In [9]:
def metric(gold, pred, trace=None):
    og_model_output, og_user_query, og_pii = gold.target_response, gold.user_query, gold.pii_str
    pred_prompt, pred_out = pred.prompt, pred.output
    if len(pred_prompt) == 0:
        return 0
    with dspy.context(lm=openai_gpt4o):
        score_dict = llm_judge(user_query=og_user_query, new_resp=pred_out, og_resp=og_model_output,
                                            updated_query=pred_prompt, pii_str=og_pii)       
        final_quality_score = score_dict.quality
        leakage_sc = score_dict.leakage
        prompt_sc = score_dict.prompt
        try:
            assert leakage_sc != -1
        except AssertionError:
            return 0
    # Want to maximize quality and minimize percentage of leakage
    final_total_score = (final_quality_score - leakage_sc / len(set(og_pii.split("||"))) + prompt_sc) / 2
    if trace is not None: return final_total_score >= 1
    return final_total_score


#### Optimize with MIPRO v2

In [None]:
from dspy.evaluate.evaluate import Evaluate
from dspy.teleprompt import MIPROv2
import json

DATA_PATH = "pupa/PUPA_New.csv"
# Where you want to store the optimized prompts
PROMPT_OUTPUT_FILE = "output_prompt.json" 

train, val, test = synthesize_tvt(DATA_PATH)
zeroshot = PrivacyOnePrompter(local_lm, openai_lm)
INCOMPLIANCE = 0
evaluate = Evaluate(metric=metric, devset=val, num_threads=8, display_progress=True, display_table=5, max_errors=100)
try:
    eval_score = evaluate(zeroshot)
except Exception as e:
    INCOMPLIANCE += 1
eval_scores = {}
eval_scores.update({"before_optimization": eval_score})
print(eval_score)

In [None]:
try:
    teleprompter = MIPROv2(prompt_model=openai_lm, task_model=local_lm, metric=metric, num_candidates=10, init_temperature=1.0)
    kwargs = dict(num_threads=8, display_progress=True, display_table=0)
    compiled_prompt_opt = teleprompter.compile(zeroshot, trainset=train, num_batches=200, max_bootstrapped_demos=0, max_labeled_demos=0, eval_kwargs=kwargs)
    compiled_prompt_opt.save(PROMPT_OUTPUT_FILE)
except ValueError as e:
    print(e)
    local_lm.inspect_history()


In [None]:
try:
    eval_score = evaluate(compiled_prompt_opt, devset=val, **kwargs)
    print(eval_score)
    eval_scores.update({"after_optimization": eval_score})
    
except ValueError as e:
    print(e)
    local_lm.inspect_history()

In [None]:
EVAL_FILE = PROMPT_OUTPUT_FILE.replace(".json", "_eval_socres.json")
json.dump(eval_scores, open(EVAL_FILE, "w+"))

### Trying Your Optimized PAPILLON Module

You have finished optimizing your PAPILLON module! Huzzah!! Now you can just load the newly optimized prompt and use it on user queries similar to those in PUPA.

In [None]:
priv_prompt = PrivacyOnePrompter(local_lm, openai_lm)
    
priv_prompt.load(PROMPT_OUTPUT_FILE)

while True:
    user_query = input("Your Query > ")
    pred = priv_prompt(user_query)
    print("PAPILLON PROMPT > ", pred.prompt)
    print("PAPILLON OUTPUT > ", pred.output)
