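# LLM Prompt Recovery Using a Pretrained LLM
This notebook explores using a pretrained LLM to recover the prompt that was used to rewrite a text. Each candidate prompt is combined with the original and rewritten texts, scored by the model's perplexity on that sequence, and the lowest-scoring candidate is kept as the prediction.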

### References:
- [h2oGPT Perplexity Ranking](https://www.kaggle.com/code/philippsinger/h2ogpt-perplexity-ranking)
- [Prompt Prediction w/ Mixtral/Mistral7B/Gemma/Llama](https://www.kaggle.com/code/aatiffraz/prompt-prediction-w-mixtral-mistral7b-gemma-llama/notebook)

### Change logs:
- [version 6] (LB=0.??) used phi-2
- [version 6] (LB=0.??) used Mixtral-8x7b
- [version 5] (LB=0.??) used Mistral-7b
- [version 4] (LB=0.61) used Gemma-7b

In [1]:
# Install bitsandbytes, accelerate, transformers, and optimum from local wheels (offline, via --no-index)
!pip install --no-index /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/bitsandbytes-0.42.0-py3-none-any.whl --find-links=/kaggle/input/making-wheels-of-necessary-packages-for-hf-llms
!pip install --no-index /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/accelerate-0.27.2-py3-none-any.whl --find-links=/kaggle/input/making-wheels-of-necessary-packages-for-hf-llms
!pip install --no-index /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/transformers-4.38.1-py3-none-any.whl --find-links=/kaggle/input/making-wheels-of-necessary-packages-for-hf-llms
!pip install --no-index /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/optimum-1.17.1-py3-none-any.whl --find-links=/kaggle/input/making-wheels-of-necessary-packages-for-hf-llms

Looking in links: /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms
Processing /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/bitsandbytes-0.42.0-py3-none-any.whl
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.42.0
Looking in links: /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms
Processing /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/accelerate-0.27.2-py3-none-any.whl
accelerate is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Looking in links: /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms
Processing /kaggle/input/making-wheels-of-necessary-packages-for-hf-llms/transformers-4.38.1-py3-none-any.whl
transformers is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
Looking in links: /kaggle/input/making-wheels-of-necessar

In [2]:
# Install sentence-transformers
!pip install -Uq /kaggle/input/sentence-transformers-2-4-0/sentence_transformers-2.4.0-py3-none-any.whl

In [3]:
import pandas as pd
from string import Template
from pathlib import Path
import numpy as np
import os, random
import torch
from torch import nn
# Transformer
import transformers
from transformers import (pipeline, AutoTokenizer, AutoModelForCausalLM, 
                          BitsAndBytesConfig, AutoConfig)
# For quantization
import bitsandbytes, accelerate, optimum
from tqdm.notebook import tqdm
# For sentence transformer
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.simplefilter("ignore")



In [4]:
import ctypes, gc
libc = ctypes.CDLL("libc.so.6")
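# Seed every random number generator for reproducibility
def seed_everything(seed=42):
    torch.backends.cudnn.deterministic = True
    # Benchmark mode can select different kernels between runs, so disable it
    torch.backends.cudnn.benchmark = False
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)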
    
def clear_memory():
    libc.malloc_trim(0)
    torch.cuda.empty_cache()
    gc.collect()

SEED = 42
seed_everything(SEED)
# Set the GPUs
DEVICE = torch.device("cuda")

# Load the model

Quantization reduces memory and computational costs by representing weights and activations with lower-precision data types such as 8-bit integers (int8). This notebook loads the model in 4-bit NF4 precision through `BitsAndBytesConfig`.
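
To make the idea concrete, here is a minimal sketch of symmetric (absmax) int8 quantization in plain Python. It is an illustration only, not what bitsandbytes does internally: the NF4 scheme used in this notebook is a different 4-bit format. The helper names `quantize_absmax_int8` and `dequantize` and the random weights are assumptions made for this example.

```python
import random

def quantize_absmax_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (hypothetical helper)."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    # Round to the nearest integer step and clip to the int8 range
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for model weights

quantized, scale = quantize_absmax_int8(weights)
restored = dequantize(quantized, scale)

# Rounding to the nearest step bounds the round-trip error by half a step
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.5f}  max round-trip error={max_error:.5f}")
```

Each value now fits in one byte instead of the four bytes of a float32, at the cost of a small rounding error that is bounded by half the scale.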

In [5]:
# Map each supported model name to its local Kaggle dataset path
MODEL_PATHS = {
    'gemma-7b': "/kaggle/input/gemma/transformers/7b-it/1",
    'gemma-2b': "/kaggle/input/gemma/transformers/2b-it/2",
    'mistral-7b': "/kaggle/input/mistral/pytorch/7b-instruct-v0.1-hf/1",
    'mistral-8x7b': "/kaggle/input/mixtral/pytorch/8x7b-instruct-v0.1-hf/1",
    'phi-2': "/kaggle/input/phi/transformers/2/1",
}

def load_model(model_name):
    # Fail early on an unknown name instead of hitting an undefined MODEL_PATH
    if model_name not in MODEL_PATHS:
        raise ValueError(f"Unknown model name: {model_name!r}")
    MODEL_PATH = MODEL_PATHS[model_name]

    # 4-bit NF4 quantization config to reduce GPU memory use
    quantization_config = BitsAndBytesConfig(
        load_in_4bit = True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
    )
    
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, model_max_length=3072)
    tokenizer.padding_side = "left"
    tokenizer.pad_token = tokenizer.eos_token
    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        device_map=DEVICE,
        trust_remote_code=True,
        quantization_config=quantization_config,
    )
    return model, tokenizer

In [6]:
model, tokenizer = load_model(model_name='phi-2')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



In [7]:
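# Perplexity measures how well a language model predicts a sequence (lower is better).
# It is the exponential of the mean cross-entropy loss. This module returns the loss
# itself (log-perplexity); exp() is monotonic, so the ranking of candidates is the same.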
class PerplexityNN(nn.Module):
    def __init__(self, reduce: bool = True):
        super().__init__()
        self.loss_fn = nn.CrossEntropyLoss()
        self.reduce = reduce

    def forward(self, logits, labels):
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()

        perplexity = []
        for i in range(labels.shape[0]):
            perplexity.append(self.loss_fn(shift_logits[i], shift_labels[i]))
        perplexity = torch.stack(perplexity, dim=0)  # Stack per-sample losses into one tensor
        # Left disabled: ranking by loss equals ranking by exp(loss)
        #perplexity = torch.exp(perplexity)
        if self.reduce:
            perplexity = torch.mean(perplexity)
        return perplexity 
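
The relationship between cross-entropy and perplexity can be checked with a small standalone sketch. The token probabilities below are made up for illustration, and `mean_cross_entropy` is a hypothetical helper, not part of the notebook's pipeline.

```python
import math

def mean_cross_entropy(token_probs):
    """Mean negative log-likelihood of the observed tokens (natural log)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical probabilities a model assigned to each actual next token
candidate_a = [0.50, 0.40, 0.60]  # model finds this sequence likely
candidate_b = [0.10, 0.20, 0.05]  # model finds this sequence unlikely

loss_a = mean_cross_entropy(candidate_a)
loss_b = mean_cross_entropy(candidate_b)

# Perplexity is exp(mean cross-entropy)
ppl_a = math.exp(loss_a)
ppl_b = math.exp(loss_b)

# A model that spreads probability evenly over 4 tokens has perplexity 4
ppl_uniform = math.exp(mean_cross_entropy([0.25] * 5))

print(f"A: loss={loss_a:.3f} ppl={ppl_a:.3f}")
print(f"B: loss={loss_b:.3f} ppl={ppl_b:.3f}")
print(f"uniform over 4 tokens: ppl={ppl_uniform:.3f}")
```

Because `exp` is strictly increasing, sorting candidates by loss or by perplexity gives the same order, which is why the class above can skip the exponent when it is only used for ranking.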

In [8]:
rewrite_prompts = [
    """Please improve the following text using the writing style of [insert desired style here],
       maintaining the original meaning but enhancing the clarity, elegance, and impact 
       by altering the tone, diction, and stylistic elements to match the specified style 
       while ensuring the core message remains intact."""
]

In [9]:
# Combine the rewrite prompt with the original and rewritten texts into one chat-formatted string.
# Note: the <start_of_turn>/<end_of_turn> markers follow Gemma's chat format.
def format_prompt(row, rw_prompt):
    return f"""<start_of_turn>user {rw_prompt} {row["original_text"]}<end_of_turn>
                  <start_of_turn>model{row["rewritten_text"]}<end_of_turn>"""

# Recover the prompt for the given texts (df)
def recover_prompt(df, model, tokenizer):
    perp_nn = PerplexityNN() # Compute the perplexity
    preds = []
    for idx in range(len(df)):
        row = df.iloc[idx]
        perps = []
        with torch.no_grad():
            formatted_prompts = []
            for rw_prompt in rewrite_prompts:
                formatted_prompts.append(format_prompt(row, rw_prompt))
            # Tokenize the prompts into input IDs and an attention mask
            inputs = tokenizer(formatted_prompts, return_tensors="pt",
                               add_special_tokens=False,
                               padding=True, truncation=True).to("cuda")
            # Get the output
            output = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
            output = output.logits
            
            labels = inputs["input_ids"]
            labels.masked_fill_(~inputs["attention_mask"].bool(), -100)
            
            # Compute the perplexity of model output (logits) and actual labels
            for j in range(len(rewrite_prompts)):
                p = perp_nn(output[j].unsqueeze(0), labels[j].unsqueeze(0))
                perps.append(p.detach().cpu())
            del inputs, labels, output, p
        # Convert 'perps' to a NumPy array
        perps = np.array(perps)
        # Keep the candidate prompt with the lowest perplexity
        final_pred = rewrite_prompts[int(np.argmin(perps))]
        preds.append(final_pred)
        clear_memory()
        print(preds)
    return preds
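
The selection logic in `recover_prompt` can be illustrated without a model. The sketch below assumes made-up per-token losses and labels for two hypothetical candidate prompts; `masked_mean_loss` and `pick_best_prompt` are names invented for this example. Positions labelled `-100` (the default `ignore_index` of `nn.CrossEntropyLoss`) stand for padding and are excluded from the average.

```python
IGNORE_INDEX = -100  # label value that nn.CrossEntropyLoss skips by default

def masked_mean_loss(token_losses, labels):
    """Average the per-token losses, skipping positions marked as padding."""
    kept = [loss for loss, label in zip(token_losses, labels) if label != IGNORE_INDEX]
    return sum(kept) / len(kept)

def pick_best_prompt(candidates, losses):
    """Return the candidate whose sequence has the lowest mean loss."""
    best_index = min(range(len(candidates)), key=lambda i: losses[i])
    return candidates[best_index]

# Hypothetical candidate prompts and made-up per-token losses
candidates = ["Rewrite this text as a sea shanty.", "Improve this text."]
token_losses = [
    [9.9, 1.2, 0.8, 1.0],  # first position is left padding
    [2.5, 2.0, 3.0, 2.5],
]
labels = [
    [IGNORE_INDEX, 11, 12, 13],
    [21, 22, 23, 24],
]

losses = [masked_mean_loss(tl, lb) for tl, lb in zip(token_losses, labels)]
best = pick_best_prompt(candidates, losses)
print(losses, "->", best)
```

Without the mask, the large loss on the padded position would inflate the first candidate's score, so masking matters whenever candidates of different lengths are padded into one batch.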

In [10]:
# Load the testing data
# train_df = pd.read_csv('/kaggle/input/llm-prompt-recovery/train.csv', index_col='id')
test_df = pd.read_csv('/kaggle/input/llm-prompt-recovery/test.csv', index_col='id')
test_df["rewrite_prompt"] = "" # Empty
test_df.head()

| id | original_text | rewritten_text | rewrite_prompt |
|---|---|---|---|
| -1 | The competition dataset comprises text passage... | Here is your shanty: (Verse 1) The text is rew... | |


In [11]:
preds = recover_prompt(test_df, model, tokenizer)

['Please improve the following text using the writing style of [insert desired style here],\n       maintaining the original meaning but enhancing the clarity, elegance, and impact \n       by altering the tone, diction, and stylistic elements to match the specified style \n       while ensuring the core message remains intact.']


In [12]:
submission = pd.read_csv('/kaggle/input/llm-prompt-recovery/sample_submission.csv')
submission["rewrite_prompt"] = preds
submission.to_csv('submission.csv', index=False)
display(submission)

| | id | rewrite_prompt |
|---|---|---|
| 0 | 9559194 | Please improve the following text using the wr... |
