## In-Context Learning, Chain-of-Thought, Reasoning

<a target="_blank" href="https://colab.research.google.com/github/microsoft/LLMLingua/blob/main/examples/CoT.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

**In-Context Learning (ICL)** is a distinctive capability of Large Language Models (LLMs) that lets them quickly learn a task from a few examples. ICL is commonly combined with Chain-of-Thought (CoT) prompting, in which the examples spell out the reasoning process in detail to strengthen the model's reasoning. For instance, Fu et al.'s Complexity-Based Prompting improved GSM8K performance from 74.9 to 78.85 with GPT-3.5-Turbo-0301. However, this approach also leads to increasingly long prompts: the GSM8K prompt, for example, contains **2,366** tokens.

<center><img width="800" src="../images/LLMLingua_framework.png"></center>

To address this, we propose [**LLMLingua**](https://arxiv.org/abs/2310.05736), which uses a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect unimportant tokens in the prompt and enable inference with the compressed prompt in black-box LLMs, achieving up to **20x** compression with minimal performance loss.
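To make the idea of dropping unimportant tokens concrete, here is a toy sketch. It is not LLMLingua's algorithm: the importance scores are made-up numbers standing in for what a small language model would provide, and `keep_top_tokens` is a hypothetical helper introduced only for illustration.

```python
# Toy illustration only. The scores are invented; in LLMLingua they would come
# from a small language model. keep_top_tokens is a hypothetical helper.
def keep_top_tokens(tokens, scores, budget):
    """Keep the `budget` highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    # Rank positions by descending score (ties broken by position).
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Restore reading order for the positions that fit in the budget.
    kept = sorted(ranked[:budget])
    return [tokens[i] for i in kept]


tokens = ["Sam", "bought", "a", "dozen", "boxes", "of", "pens", "for", "$10", "each"]
# Hypothetical importance scores (higher means more informative).
scores = [0.9, 0.6, 0.1, 0.8, 0.7, 0.1, 0.8, 0.2, 0.9, 0.5]

compressed = keep_top_tokens(tokens, scores, budget=6)
print(" ".join(compressed))
print(f"{len(tokens) / len(compressed):.1f}x compression")
```

Low-scoring function words such as "a" and "of" are dropped first, which is roughly why compressed prompts read as telegraphic text while keeping numbers and key nouns.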

### GSM8K

Next, we demonstrate LLMLingua on the GSM8K dataset, where it effectively alleviates the "lost in the middle" issue. The original prompt, available at https://github.com/FranxYao/chain-of-thought-hub/blob/main/gsm8k/lib_prompt/prompt_hardest.txt, is an 8-shot setup with 2,366 tokens.

In [1]:
# Install dependency.
!pip install llmlingua datasets


In [3]:
# Download the original prompt and dataset
from datasets import load_dataset

!wget https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt
prompt_complex = open("./prompt_hardest.txt").read()
gsm8k = load_dataset("gsm8k", "main")
gsm8k_test = gsm8k["test"]

--2023-10-30 09:15:31--  https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8464 (8.3K) [text/plain]
Saving to: ‘prompt_hardest.txt’


2023-10-30 09:15:31 (78.8 MB/s) - ‘prompt_hardest.txt’ saved [8464/8464]



In [8]:
# Using the OpenAI API
import openai

openai.api_key = "<insert_openai_key>"

In [4]:
# Or using Azure OpenAI (AOAI)
import openai

openai.api_key = "<insert_openai_key>"
openai.api_base = "https://xxxx.openai.azure.com/"
openai.api_type = "azure"
openai.api_version = "2023-05-15"
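As an alternative to pasting a key into the notebook, the settings above could be read from environment variables. The sketch below is one way to do that; `load_api_settings` is a hypothetical helper, and the variable names `AZURE_OPENAI_ENDPOINT` and `OPENAI_API_VERSION` are assumptions chosen for this example, not names the notebook or the library requires.

```python
import os


def load_api_settings(env=None):
    """Collect API settings from environment variables (names are illustrative)."""
    env = os.environ if env is None else env
    # Fall back to the notebook's placeholder when no key is exported.
    settings = {"api_key": env.get("OPENAI_API_KEY", "<insert_openai_key>")}
    azure_base = env.get("AZURE_OPENAI_ENDPOINT")  # assumed variable name
    if azure_base:
        settings["api_base"] = azure_base
        settings["api_type"] = "azure"
        settings["api_version"] = env.get("OPENAI_API_VERSION", "2023-05-15")
    return settings


settings = load_api_settings()
# Print only the setting names so the key itself never lands in the output.
print(sorted(settings))

# With the `openai` module imported as in the cells above, the values could be
# applied with:
#     for name, value in settings.items():
#         setattr(openai, name, value)
```

This keeps credentials out of the saved notebook and lets the same cell serve both the OpenAI and Azure OpenAI configurations.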

### Setup Data

In [8]:
# select an example from GSM8K
question, answer = [gsm8k_test[2][key] for key in ["question", "answer"]]

In [9]:
# Ground-truth Answer
print("Question:", question)
print("Answer:", answer)

Question: Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?
Answer: The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000
He increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000
So the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000
So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000
#### 70000
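The reference answer printed above has two parts: a step-by-step rationale with `<<expression=value>>` annotations, and a final line of the form `#### <number>`. The following sketch separates the two and re-checks the arithmetic in the annotations. The helper names are hypothetical, and the checker assumes every annotation is a single binary operation, which holds for this example but may not hold for every GSM8K item.

```python
import math
import operator
import re

# Reference answer copied from the example above.
answer = (
    "The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\n"
    "He increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\n"
    "So the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\n"
    "So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n"
    "#### 70000"
)

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
# Assumption: each annotation has the shape <<a op b=value>>.
ANNOTATION = re.compile(r"<<([\d.]+)([-+*/])([\d.]+)=([\d.]+)>>")


def split_gold_answer(text):
    """Return (rationale, final_answer) for a '#### <number>' terminated answer."""
    rationale, _, final = text.rpartition("####")
    return rationale.strip(), final.strip().replace(",", "")


def annotations_hold(rationale):
    """Recompute every annotated expression and compare it with the stated value."""
    return [
        math.isclose(OPS[op](float(a), float(b)), float(value))
        for a, op, b, value in ANNOTATION.findall(rationale)
    ]


rationale, final_answer = split_gold_answer(answer)
print(final_answer)
print(annotations_hold(rationale))
```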


#### Response with the Original Prompt

In [40]:
# Query the model with the original (uncompressed) prompt
import json

instruction = "Please reference the following examples to answer the math question,\n"
prompt = instruction + prompt_complex + "\n\nQuestion: " + question

request_data = {
    "prompt": prompt,
    "max_tokens": 400,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
    "stop": "\n\n",
}
response = openai.Completion.create(
    "gpt-3.5-turbo-0301",
    **request_data,
)
print(json.dumps(response, indent=4))

{
    "id": "cmpl-8FZvcX70FH7ck9c9MegWmnUocH0A0",
    "object": "text_completion",
    "created": 1698723720,
    "model": "gpt-35-turbo",
    "choices": [
        {
            "text": " \nLet's think step by step\nThe repairs increased the value of the house by 150% so that means it increased by 80,000*1.5=$<<80000*1.5=120000>>120,000\nSo the total value of the house is 80,000+120,000=$<<80000+120000=200000>>200,000\nHe spent 80,000+50,000=$<<80000+50000=130000>>130,000\nSo he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\nThe answer is 70,000",
            "index": 0,
            "finish_reason": "stop",
            "logprobs": null
        }
    ],
    "usage": {
        "prompt_tokens": 2428,
        "completion_tokens": 142,
        "total_tokens": 2570
    }
}


#### Response with the Compressed Prompt

In [12]:
# Setup LLMLingua
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()




In [43]:
# Compress the prompt to 174 tokens (13.6x compression)
compressed_prompt = llm_lingua.compress_prompt(
    prompt_complex.split("\n\n"),
    instruction="",
    question="",
    target_token=200,
    context_budget="*1.5",
    iterative_size=100,
)

instruction = "Please reference the following examples to answer the math question,\n"
prompt = (
    instruction + compressed_prompt["compressed_prompt"] + "\n\nQuestion: " + question
)

request_data = {
    "prompt": prompt,
    "max_tokens": 400,
    "temperature": 0,
    "top_p": 1,
    "n": 1,
    "stream": False,
    "stop": "\r\n",
}
response = openai.Completion.create(
    "gpt-3.5-turbo-0301",
    **request_data,
)
print(json.dumps(compressed_prompt, indent=4))
print("Response:", response)

{
    "compressed_prompt": "Question: Sam bought a dozen boxes, each with 30 highlighter pens inside, for $10 each. He reanged five of boxes into packages of sixlters each and sold them $3 per. He sold the rest theters separately at the of three pens $2. How much did make in total, dollars?\nLets think step step\nSam bought 1 boxes x00 oflters.\nHe bought 12 00ters in total\nSam then took5 boxes 6ters0ters\nHe sold these boxes for 5 *5\nAfterelling these  boxes there were 30330ters remaining\nese form 330 /30 of three\n sold each for2 each, so made * =0 from\n total, he0 $15\nSince his original1 he earned $120 = $115 in profit.\nThe answer is 115",
    "origin_tokens": 2365,
    "compressed_tokens": 174,
    "ratio": "13.6x",
    "saving": ", Saving $0.1 in GPT-4."
}
Response: {
  "id": "cmpl-8FZwYp1QIwiQs6pEhy2cRK6wnLnAO",
  "object": "text_completion",
  "created": 1698723778,
  "model": "gpt-35-turbo",
  "choices": [
    {
      "text": " \n\nThe repairs increased the value of the h

### Test on the GSM8K test set

In [44]:
import re


def extract_ans(ans_model):
    """Split a model response into the answer (up to the "answer is" line) and the rest."""
    ans_model = ans_model.split("\n")
    ans = []
    for li, al in enumerate(ans_model):
        ans.append(al)
        # Stop at the first line that states the final answer.
        if "answer is" in al:
            break
    residual = list(ans_model[li + 1 :])
    ans = "\n".join(ans)
    residual = "\n".join(residual)
    return ans, residual


def parse_pred_ans(filename):
    with open(filename) as fd:
        lines = fd.readlines()
    am, a = None, None
    num_q, acc = 0, 0
    current_mode = "none"
    questions = []
    ans_pred = []
    ans_gold = []
    for l in lines:
        l = l.replace(",", "")
        if l.startswith("Q: "):
            if am is not None and a is not None:
                questions.append(q)
                ans_pred.append(am)
                ans_gold.append(a)
                if test_answer(am, a):
                    acc += 1
            current_mode = "q"
            q = l
            num_q += 1
        elif l.startswith("A_model:"):
            current_mode = "am"
            am = l
        elif l.startswith("A:"):
            current_mode = "a"
            a = l
        else:
            if current_mode == "q":
                q += l
            elif current_mode == "am":
                am += l
            elif current_mode == "a":
                a += l
            else:
                raise ValueError(current_mode)

    questions.append(q)
    ans_pred.append(am)
    ans_gold.append(a)
    if test_answer(am, a):
        acc += 1
    print("num_q %d correct %d ratio %.4f" % (num_q, acc, float(acc / num_q)))
    return questions, ans_pred, ans_gold


def get_result(text: str):
    # Raw string avoids invalid-escape warnings; matches integers and decimals.
    pattern = r"\d*\.?\d+"
    res = re.findall(pattern, text)
    return res[-1] if res else ""


def test_answer(pred_str, ans_str):
    pred, gold = get_result(pred_str), get_result(ans_str)
    return pred == gold
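A small check shows why `parse_pred_ans` strips commas before comparing answers. The sketch below re-declares the number extractor so that it runs on its own. `answers_match` is a stand-in name for `test_answer`, and the two strings are taken from the model response and reference answer shown earlier.

```python
import re


def get_result(text: str) -> str:
    """Return the last integer or decimal found in the text, or an empty string."""
    res = re.findall(r"\d*\.?\d+", text)
    return res[-1] if res else ""


def answers_match(pred_str: str, ans_str: str) -> bool:
    """Same comparison as test_answer above: compare the last numbers as strings."""
    return get_result(pred_str) == get_result(ans_str)


pred = "The answer is 70,000"
gold = "#### 70000"

# The thousands separator splits the number, so only "000" is picked up.
print(repr(get_result(pred)))
print(answers_match(pred, gold))
# Removing commas first, as parse_pred_ans does, makes the comparison succeed.
print(answers_match(pred.replace(",", ""), gold))
```

Because the comparison is on strings, predictions such as `70000.0` would still count as wrong against `70000`, so the reported accuracy is, if anything, slightly conservative.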

In [66]:
# Test in GSM8K test set
from tqdm import tqdm
import os

os.makedirs("outputs", exist_ok=True)
i = 0

compressed_prompt = llm_lingua.compress_prompt(
    prompt_complex.split("\n\n"),
    instruction="",
    question="",
    target_token=200,
    context_budget="*1.5",
    iterative_size=100,
)

for q, a in tqdm(
    zip(gsm8k_test["question"], gsm8k_test["answer"]), total=len(gsm8k_test["question"])
):
    instruction = (
        "Please reference the following examples to answer the math question,\n"
    )
    prompt = (
        instruction
        + compressed_prompt["compressed_prompt"]
        + "\n\nQuestion: "
        + q
        + "\n"
    )

    request_data = {
        "prompt": prompt,
        "max_tokens": 400,
        "temperature": 0,
        "top_p": 1,
        "n": 1,
        "stream": False,
    }
    response = openai.Completion.create(
        "gpt-3.5-turbo-0301",
        **request_data,
    )
    ans_model = response["choices"][0]["text"]
    ans_, residual = extract_ans(ans_model)
    with open("outputs/test_gpt_3.5_turbo_LLMLingua_174.txt", "a") as fd:
        fd.write(
            "Q: %s\nA_model:\n%s\nA:\n%s\n\n"
            % (q, ans_.replace("Q:", "").replace("A:", ""), a)
        )
    i += 1

100%|██████████| 1319/1319 [47:55<00:00,  2.18s/it] 


In [67]:
_ = parse_pred_ans("outputs/test_gpt_3.5_turbo_LLMLingua_174.txt")

num_q 1319 correct 1032 ratio 0.7824
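The figures reported in this notebook can be put side by side with a few lines of arithmetic. The sketch below assumes that the 78.85 score quoted in the introduction for the full prompt is comparable to this run; all other numbers are copied from the outputs above.

```python
# Numbers copied from the outputs above.
num_q, correct = 1319, 1032
origin_tokens, compressed_tokens = 2365, 174
# Full-prompt GSM8K score quoted in the introduction (assumed comparable).
baseline_acc = 78.85

accuracy = 100 * correct / num_q
ratio = origin_tokens / compressed_tokens
gap = baseline_acc - accuracy

print(f"accuracy with compressed prompt: {accuracy:.2f}")  # 78.24
print(f"compression ratio: {ratio:.1f}x")                  # 13.6x
print(f"gap to full prompt: {gap:.2f} points")             # 0.61 points
```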
