<a href="https://colab.research.google.com/github/nirvanesque/2d-slice-set-networks/blob/master/CometLLM_Prompts.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<p align="center">
    <picture>
        <source alt="cometLLM" media="(prefers-color-scheme: dark)" srcset="https://github.com/comet-ml/comet-llm/raw/main/logo-dark.svg">
        <img alt="cometLLM" src="https://github.com/comet-ml/comet-llm/raw/main/logo.svg">
    </picture>
</p>
<p align="center">
    <a href="https://pypi.org/project/comet-llm">
        <img src="https://img.shields.io/pypi/v/comet-llm" alt="PyPI version"></a>
    <a rel="nofollow" href="https://opensource.org/license/mit/">
        <img alt="GitHub" src="https://img.shields.io/badge/License-MIT-blue.svg"></a>   
    <a href="https://www.comet.com/docs/v2/guides/large-language-models/overview/" rel="nofollow">
        <img src="https://img.shields.io/badge/cometLLM-Docs-blue.svg" alt="cometLLM Documentation"></a>
    <a rel="nofollow" href="https://pepy.tech/project/comet-llm">
        <img style="max-width: 100%;" src="https://static.pepy.tech/badge/comet-llm" alt="Downloads"></a>   
</p>
<p align="center">

CometLLM is a new suite of LLMOps tools designed to help you effortlessly track and visualize your LLM prompts and chains. Use CometLLM to identify effective prompt strategies, streamline your troubleshooting, and ensure reproducible workflows.  

CometLLM complements Comet experiment tracking and production model management tools to arm LLM practitioners with everything they need to interact with, manage, and optimize their models with ease.  

👉 The best part? [It's 100% free to get started!](https://www.comet.com/signup/?utm_source=comet_llm&utm_medium=referral&utm_content=intro_colab)

__________
This guide covers some of the basic features for logging prompts to CometLLM.

For a preview of what's possible with CometLLM, head over to one of our example projects in the [public Comet workspace](https://www.comet.com/signup/?utm_source=comet_llm&utm_medium=referral&utm_content=intro_colab)!

# 🚧 Setup

In [None]:
%pip install -q comet_llm torch torchdata transformers datasets

If you don't already have a Comet account, [create one here for free](https://www.comet.com/signup/?utm_source=comet_llm&utm_medium=referral&utm_content=intro_colab) and grab your API credentials.

In [None]:
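import comet_llm

# Replace the placeholder with your own Comet API key (must be a valid Python string)
COMET_API_KEY = "YOUR-COMET-API-KEY"
# Optionally set a default project for this session:
#comet_llm.init(project="comet-example-llmops")

COMET INFO: Valid Comet API Key saved in /root/.comet.config (set COMET_CONFIG to change where it is saved).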


In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoTokenizer
from transformers import GenerationConfig
import time

Now we're ready to start logging prompts to Comet! Let's do a quick test:

In [None]:
comet_llm.log_prompt(
    api_key=COMET_API_KEY,
    prompt="What is this conversation about?",
    output="A customer wants to return a purchase.",
)

LLMResult(id='4c8a4e841128495393fbc8f8aa460c37', project_url='https://www.comet.com/anmorgan24/cometllm-prompt1')

It's really that simple! To check out your logged prompt in the Comet UI, click on the link above.

In most real-world scenarios, however, we'll want to log a lot more information than just the input and output. In the following examples we'll cover how to log a prompt with:

- 🗺 Instructions

- 📅 Metadata

- 🎓 In-context learning:

    - 🎾 One-shot inference

    - ⚽ 🏀 Few-shot inference

- 🎛 [Hyperparameter](https://www.comet.com/production/site/lp/your-ultimate-guide-to-hyperparameter-tuning/) configurations

# 🤖 Our application

For this tutorial, **imagine you lead the Customer Support team at your company. It's the end of the quarter, and you want to summarize all of the support issues your team has dealt with to identify some possible areas of improvement.**

We'll be using the [dialogsum dataset](https://huggingface.co/datasets/knkarthick/dialogsum) from Hugging Face, which consists of 13,460 short conversations with corresponding manually labeled summaries and topics.

To summarize these conversations, we'll use Hugging Face's implementation of [FLAN-T5](https://huggingface.co/google/flan-t5-base).

In [None]:
DATASET_NAME = "knkarthick/dialogsum"
MODEL_NAME = "google/flan-t5-base"

In [None]:
dataset = load_dataset(DATASET_NAME, split="test")  # candidate example indices: [120, 255, 303, 321, 333, 348, 354]
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)


## 🗺 Prompt with instructions

One of the most basic ways to improve our prompt is with a set of simple instructions. You may want to play around with the format and wording of these instructions to determine the prompt that best helps your model understand the task. Sometimes even slight rephrasings of a prompt can significantly alter the output.

This form of prompt engineering is usually the first one practitioners try. It's cheap, because it needs no extra data or training, and it quickly shows whether you're on the right track.
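To test several phrasings systematically instead of one at a time by hand, you can render the same conversation under each candidate instruction. Below is a minimal sketch. The instruction wordings and the names `INSTRUCTION_VARIANTS`, `TEMPLATE`, and `build_prompt_variants` are illustrative assumptions and do not come from the original notebook. You could pass each rendered prompt to the `summarize_v1` function in the next cell, which logs every variant to Comet for side-by-side comparison.

```python
# Hypothetical instruction phrasings to compare (illustrative, not from the notebook)
INSTRUCTION_VARIANTS = [
    "Summarize the following conversation.",
    "Briefly summarize the following conversation in one sentence.",
    "What is the following conversation about?",
]

# Same layout as the notebook's prompt: instruction, dialogue, then "Summary:"
TEMPLATE = """
{instruction}

{dialogue}

Summary:
"""


def build_prompt_variants(dialogue, instructions=INSTRUCTION_VARIANTS):
    """Render one prompt per instruction phrasing, keyed by the instruction."""
    return {
        instruction: TEMPLATE.format(instruction=instruction, dialogue=dialogue)
        for instruction in instructions
    }


sample = "#Person1#: What's the matter with this computer?\n#Person2#: It keeps stopping."
for instruction, prompt in build_prompt_variants(sample).items():
    print(repr(instruction), "->", len(prompt), "characters")
```

Only `TEMPLATE` goes through `str.format` parsing. The dialogue is substituted as a value, so braces inside a conversation cannot break the template.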

In [None]:
def summarize_v1(user_prompt):
    # Tokenize the prompt into PyTorch tensors
    inputs = tokenizer(user_prompt, return_tensors="pt")
    # Generate up to 50 new tokens and decode the first (only) sequence
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=50,
        )[0],
        skip_special_tokens=True,
    )

    # Log the prompt/output pair to Comet
    comet_llm.log_prompt(api_key=COMET_API_KEY, prompt=user_prompt, output=output)
    return output

In [None]:
user_prompt = f"""
Summarize the following conversation.

{dataset['dialogue'][255]}

Summary:
    """

Let's take a look at what our final prompt will look like:

In [None]:
print(user_prompt)


Summarize the following conversation.

#Person1#: What's the matter with this computer?
#Person2#: I don't know, but it just doesn't work well. Whenever I start it, it stops running.
#Person1#: Have you asked Mr. Li for some advice?
#Person2#: Yes, I have, but he doesn't seem to be able to solve the problem, either. Can you help me?
#Person1#: Me? I know nothing more than playing computer games.
#Person2#: What shall I do? I have to finish this report this afternoon, but...
#Person1#: But why don't you ring up the repairmen? They will be able to settle the problem.
#Person2#: Yes, I'll ring them up.

Summary:
    


Now let's see how well our model summarizes the conversation.

In [None]:
summarize_v1(user_prompt)

Prompt logged to https://www.comet.com/anmorgan24/cometllm-prompt1



"#Person1#: I'm having a computer problem."

Finally, let's compare this with our ground-truth label:

In [None]:
dataset['summary'][255]

'#Person2# finds that the computer has stopped running. #Person1# suggests #Person2# ring up the repairmen.'
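Eyeballing one example works here, but comparing prompts across many conversations calls for a number. The block below is a minimal, simplified ROUGE-1-style score (unigram-overlap F1), written from scratch for illustration. It is an assumption rather than the official ROUGE implementation: it uses simple tokenization and applies no stemming. In the notebook, you could compute something like `unigram_f1(output, dataset['summary'][255])` and log it alongside each prompt's metadata.

```python
import re
from collections import Counter


def unigram_f1(prediction, reference):
    """Simplified ROUGE-1-style F1 between a generated and a reference summary.

    Hypothetical helper: lowercases, keeps runs of letters/digits as tokens,
    and applies no stemming or stopword removal.
    """
    pred_tokens = re.findall(r"[a-z0-9]+", prediction.lower())
    ref_tokens = re.findall(r"[a-z0-9]+", reference.lower())
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: each token counts at most min(count in pred, count in ref) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```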

[![SgRF2.gif](https://s11.gifyu.com/images/SgRF2.gif)](https://www.comet.com/examples/cometllm-prompt-example/prompts)


Not bad, but we can do better! Let's try a few more prompt engineering techniques.

## 📅 Prompt with metadata

As we begin to alter, or "engineer," our prompts, we might also want to log some important metadata. If we're comparing the output of several models, we'll want to log which models produced which results. We may also want to play around with different hyperparameter values, or "generation configurations" (more on that later).

Some other relevant pieces of information to log might include:
- ⏰ How long does each prompt take to process? (duration)
- 🗣 Which task is the model performing? (summarization, text-generation, translation, etc.)
- 🏷 Do we have ground truth labels? (usually human-generated responses)
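Here is a minimal sketch of collecting these fields. The helper names `timed_call` and `build_metadata` are hypothetical and are not part of CometLLM. The `task` and `ground_truth` keys are illustrative choices of metadata fields. The timer uses `time.perf_counter()` and reports milliseconds, which matches the `duration * 1000` value the notebook passes to `comet_llm.log_prompt`.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs); return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms


def build_metadata(model_name, task, ground_truth=None, **generation_kwargs):
    """Bundle model, task, optional ground-truth label, and generation settings."""
    metadata = {"model": model_name, "task": task, **generation_kwargs}
    if ground_truth is not None:
        metadata["ground_truth"] = ground_truth
    return metadata
```

Keeping the generation settings in the same dict as the model name means a single `metadata=` argument records everything needed to reproduce a run.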

In [None]:
def summarize_v2(user_prompt, prompt_template, tags, metadata):
    start = time.perf_counter()  # high-resolution timer for the duration

    # Fill the template with the conversation to summarize
    variables = {"user_prompt": user_prompt}
    final_prompt = prompt_template.format(**variables)

    inputs = tokenizer(final_prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=metadata["max_new_tokens"],
        )[0],
        skip_special_tokens=metadata["skip_special_tokens"],
    )

    duration = time.perf_counter() - start  # seconds

    # Log the filled prompt plus its template, variables, tags, and metadata
    comet_llm.log_prompt(
        api_key=COMET_API_KEY,
        prompt=final_prompt,
        prompt_template=prompt_template,
        prompt_template_variables=variables,
        output=output,
        tags=tags,
        duration=duration * 1000,  # seconds -> milliseconds
        metadata=metadata,
    )

    return output

In [None]:
METADATA = {
    "model": MODEL_NAME,
    "max_new_tokens": 50,
    "skip_special_tokens": True,
}

TAGS = ["prompt-with-instructions", "summarization"]

In [None]:
user_prompt = dataset['dialogue'][255]

prompt_template = """
Summarize the following conversation.

{user_prompt}

Summary:
    """

Here's the conversation that fills the template's `{user_prompt}` slot:

In [None]:
print(user_prompt)

#Person1#: What's the matter with this computer?
#Person2#: I don't know, but it just doesn't work well. Whenever I start it, it stops running.
#Person1#: Have you asked Mr. Li for some advice?
#Person2#: Yes, I have, but he doesn't seem to be able to solve the problem, either. Can you help me?
#Person1#: Me? I know nothing more than playing computer games.
#Person2#: What shall I do? I have to finish this report this afternoon, but...
#Person1#: But why don't you ring up the repairmen? They will be able to settle the problem.
#Person2#: Yes, I'll ring them up.


Our output:

In [None]:
summarize_v2(user_prompt, prompt_template, TAGS, METADATA)

"#Person1#: I'm having a computer problem."

[![SgRFz.gif](https://s11.gifyu.com/images/SgRFz.gif)](https://www.comet.com/examples/cometllm-prompt-example/prompts)

## 🎓 Prompt template with in-context learning

Once we've done everything we can to optimize the prompt instructions, we might choose to further improve performance by including examples in our prompt. This is called [in-context learning](https://towardsdatascience.com/all-you-need-to-know-about-in-context-learning-55bde1180610#:~:text=Now%2C%20we%20can%20give%20a,source).

In one-shot inference, we include a single example in the prompt. In few-shot inference, we include several. As a rule of thumb, if you need more than five or six examples to get the output you're looking for, consider fine-tuning your model or switching to a different model.

If our few-shot prompt doesn't perform much better than our one-shot prompt, we might prefer the one-shot version for its lower latency (good thing we're logging each prompt's duration!). We also need to stay within our model's context window (in this case, 512 tokens), which limits how many examples we can include.
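You can make this quality-versus-latency trade-off explicit. The following is a sketch under assumptions: each logged run is a dict with hypothetical `tag`, `score`, and `duration_ms` keys, the score comes from whatever evaluation metric you use, and the `tolerance` default is arbitrary. Among the runs whose score is within the tolerance of the best score, the function picks the fastest.

```python
def pick_prompt_strategy(runs, tolerance=0.02):
    """Return the fastest run whose score is within `tolerance` of the best score.

    Each run is a dict with hypothetical keys "tag", "score", and "duration_ms".
    """
    if not runs:
        raise ValueError("no runs to compare")
    best_score = max(run["score"] for run in runs)
    # Keep every run that is "good enough" relative to the best one
    candidates = [run for run in runs if run["score"] >= best_score - tolerance]
    # Among those, prefer the lowest latency
    return min(candidates, key=lambda run: run["duration_ms"])
```

With illustrative numbers and a tolerance of 0.02, a one-shot run scoring 0.40 in 900 ms would be chosen over a few-shot run scoring 0.41 in 2,100 ms. With a tolerance of 0, the few-shot run would win.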

For our use case, each example consists of a conversation to be summarized plus an accurate summary of it, taken from our dataset's ground-truth labels.
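Because each example is a (conversation, summary) pair, the one-shot and few-shot templates below follow one repeating pattern. Here is a minimal sketch of a hypothetical helper, `build_icl_prompt`, that generalizes that pattern to any number of examples; with zero examples it reduces to the plain instruction prompt. The layout is similar to the notebook's templates, but the spacing is not guaranteed to be identical. The prompt is assembled with f-strings, so braces inside the dialogue text can't be mistaken for template placeholders.

```python
INSTRUCTION = "Summarize the following conversation."


def build_icl_prompt(dialogue, examples, instruction=INSTRUCTION):
    """Build a zero-, one-, or few-shot summarization prompt.

    `examples` is a list of (dialogue, summary) pairs shown before the target.
    """
    blocks = [
        f"{instruction}\n\n{example_dialogue}\n\nSummary:\n{example_summary}\n"
        for example_dialogue, example_summary in examples
    ]
    # The final block leaves the summary empty for the model to complete
    blocks.append(f"{instruction}\n\n{dialogue}\n\nSummary:\n")
    return "\n\n".join(blocks)
```

In the notebook, a three-shot prompt could be built with `build_icl_prompt(dataset["dialogue"][255], [(dataset["dialogue"][i], dataset["summary"][i]) for i in (120, 303, 354)])`.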

### 🎾 One-shot inference

We provide a single example within our prompt:

In [None]:
def summarize_v3(user_prompt, prompt_template, tags, metadata):
    start = time.perf_counter()

    # example_1 and summary_1 are globals defined in the next cell
    variables = {
        "user_prompt": user_prompt,
        "example_1": example_1,
        "summary_1": summary_1,
    }
    final_prompt = prompt_template.format(**variables)

    inputs = tokenizer(final_prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=metadata["max_new_tokens"],
        )[0],
        skip_special_tokens=metadata["skip_special_tokens"],
    )

    duration = time.perf_counter() - start  # seconds

    comet_llm.log_prompt(
        api_key=COMET_API_KEY,
        prompt=final_prompt,
        prompt_template=prompt_template,
        prompt_template_variables=variables,
        output=output,
        tags=tags,
        duration=duration * 1000,  # seconds -> milliseconds
        metadata=metadata,
    )

    return output

In [None]:
user_prompt = dataset["dialogue"][255]
example_1 = dataset["dialogue"][120]
summary_1 = dataset["summary"][120]  # the example's ground-truth summary, not its dialogue

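# Plain (non-f) string: summarize_v3 fills every placeholder via
# prompt_template.format(**variables), so the logged template keeps its slots.
prompt_template = """
Summarize the following conversation.

{example_1}

Summary:
{summary_1}


Summarize the following conversation.

{user_prompt}

Summary:
"""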

In [None]:
METADATA = {
    "model": MODEL_NAME,
    "max_new_tokens": 50,
    "skip_special_tokens": True,
}

TAGS = ["one-shot-inference", "summarization"]

Here's the conversation that fills the template's `{user_prompt}` slot:

In [None]:
print(user_prompt)

#Person1#: What's the matter with this computer?
#Person2#: I don't know, but it just doesn't work well. Whenever I start it, it stops running.
#Person1#: Have you asked Mr. Li for some advice?
#Person2#: Yes, I have, but he doesn't seem to be able to solve the problem, either. Can you help me?
#Person1#: Me? I know nothing more than playing computer games.
#Person2#: What shall I do? I have to finish this report this afternoon, but...
#Person1#: But why don't you ring up the repairmen? They will be able to settle the problem.
#Person2#: Yes, I'll ring them up.


Our output:

In [None]:
summarize_v3(user_prompt, prompt_template, TAGS, METADATA)

Token indices sequence length is longer than the specified maximum sequence length for this model (518 > 512). Running this sequence through the model will result in indexing errors


'#Person1#: I have a problem with my computer.'

[![SgRFL.gif](https://s11.gifyu.com/images/SgRFL.gif)](https://www.comet.com/examples/cometllm-prompt-example/prompts)


### ⚽ 🏀 Few-shot inference
We provide multiple examples within our prompt:

In [None]:
def summarize_v4(user_prompt, prompt_template, tags, metadata):
    start = time.perf_counter()

    # example_* and summary_* are globals defined in the cells below
    variables = {
        "user_prompt": user_prompt,
        "example_1": example_1,
        "summary_1": summary_1,
        "example_2": example_2,
        "summary_2": summary_2,
        "example_3": example_3,
        "summary_3": summary_3,
    }
    final_prompt = prompt_template.format(**variables)

    inputs = tokenizer(final_prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            max_new_tokens=metadata["max_new_tokens"],
        )[0],
        skip_special_tokens=metadata["skip_special_tokens"],
    )

    duration = time.perf_counter() - start  # seconds

    comet_llm.log_prompt(
        api_key=COMET_API_KEY,
        prompt=final_prompt,
        prompt_template=prompt_template,
        prompt_template_variables=variables,
        output=output,
        tags=tags,
        duration=duration * 1000,  # seconds -> milliseconds
        metadata=metadata,
    )

    return output

In [None]:
METADATA = {
    "model": MODEL_NAME,
    "max_new_tokens": 50,
    "skip_special_tokens": True,
}

TAGS = ["few-shot-inference", "summarization"]

In [None]:
user_prompt = dataset["dialogue"][255]
example_1 = dataset["dialogue"][120]
summary_1 = dataset["summary"][120]
example_2 = dataset["dialogue"][303]
summary_2 = dataset["summary"][303]
example_3 = dataset["dialogue"][354]
summary_3 = dataset["summary"][354]


# Plain (non-f) string: summarize_v4 fills every placeholder via
# prompt_template.format(**variables), so the logged template keeps its slots.
prompt_template = """
Summarize the following conversation.

{example_1}

Summary:
{summary_1}


Summarize the following conversation.

{example_2}

Summary:
{summary_2}


Summarize the following conversation.

{example_3}

Summary:
{summary_3}


Summarize the following conversation.

{user_prompt}

Summary:
"""

Here's the conversation that fills the template's `{user_prompt}` slot:

In [None]:
print(user_prompt)

#Person1#: What's the matter with this computer?
#Person2#: I don't know, but it just doesn't work well. Whenever I start it, it stops running.
#Person1#: Have you asked Mr. Li for some advice?
#Person2#: Yes, I have, but he doesn't seem to be able to solve the problem, either. Can you help me?
#Person1#: Me? I know nothing more than playing computer games.
#Person2#: What shall I do? I have to finish this report this afternoon, but...
#Person1#: But why don't you ring up the repairmen? They will be able to settle the problem.
#Person2#: Yes, I'll ring them up.


Our output:

In [None]:
summarize_v4(user_prompt, prompt_template, TAGS, METADATA)

'#Person1#: I have a problem with my computer.'

**Note** that FLAN-T5 was trained on inputs of at most 512 tokens, and the tokenizer already warned that our one-shot prompt exceeded that limit (518 > 512). Adding more examples pushes the prompt even further past it, so we shouldn't expect few-shot inference to improve results here. To use few-shot learning for this task, we'll likely need a model with a longer context window, or shorter examples.
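One way to respect the budget is to add examples only while the prompt still fits. This is a sketch under assumptions. `max_examples_that_fit` is a hypothetical helper. `build_prompt` is any function that takes `(dialogue, examples)` and returns a prompt string. `count_tokens` is any function that maps a string to a token count. With the Hugging Face tokenizer loaded earlier, `count_tokens` could be `lambda text: len(tokenizer(text)["input_ids"])`, which also counts the end-of-sequence token that T5's tokenizer appends.

```python
def max_examples_that_fit(build_prompt, dialogue, examples, count_tokens, max_tokens=512):
    """Return the largest k such that build_prompt(dialogue, examples[:k]) fits.

    Returns None if even the zero-shot prompt exceeds max_tokens. Assumes each
    extra example makes the prompt longer, so the search stops at the first
    prompt that no longer fits.
    """
    best_k = None
    for k in range(len(examples) + 1):
        n_tokens = count_tokens(build_prompt(dialogue, examples[:k]))
        if n_tokens > max_tokens:
            break
        best_k = k
    return best_k
```

Checking the length up front is safer than passing `truncation=True` to the tokenizer. Truncation would silently cut off the end of the prompt, which is exactly where the conversation we want summarized sits.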

[![SgRFK.gif](https://s11.gifyu.com/images/SgRFK.gif)](https://www.comet.com/examples/cometllm-prompt-example/prompts)

## 🎛 Optimizing generation configurations

Just as we tune hyperparameters in traditional machine learning, in generative AI we can tune the model's "generation configuration."

Generation configuration parameters for this model include the decoding strategy (greedy decoding or sampling), the temperature, the maximum number of new tokens, and more. A full treatment of each parameter is beyond the scope of this tutorial, but in brief:
- `do_sample=True` tells the model to sample each next token from its predicted probability distribution instead of always choosing the most likely one.
- `temperature` reshapes that distribution. Lower temperatures concentrate probability on the most likely tokens, which gives more predictable output. Higher temperatures spread probability out, which gives more varied (or "creative") output.
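To see what temperature does numerically, here is a small sketch of temperature-scaled softmax in plain Python: the logits are divided by the temperature before normalizing. It illustrates the idea and is not taken from the `transformers` internals. The logits are made-up values. At `temperature=0.1`, nearly all of the probability lands on the top token, so sampling behaves almost like greedy decoding. That plausibly explains why the output below matches the few-shot output above.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by `temperature`.

    Lower temperatures sharpen the distribution toward the top token;
    higher temperatures flatten it. Temperature must be positive.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```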

In [None]:
def summarize_v5(user_prompt, prompt_template, tags, metadata):
    start = time.perf_counter()

    # example_* and summary_* are globals set in the few-shot cells above
    variables = {
        "user_prompt": user_prompt,
        "example_1": example_1,
        "summary_1": summary_1,
        "example_2": example_2,
        "summary_2": summary_2,
        "example_3": example_3,
        "summary_3": summary_3,
    }
    final_prompt = prompt_template.format(**variables)

    inputs = tokenizer(final_prompt, return_tensors="pt")
    output = tokenizer.decode(
        model.generate(
            inputs["input_ids"],
            # Decoding settings come from the metadata, so they get logged too
            generation_config=GenerationConfig(
                max_new_tokens=metadata["max_new_tokens"],
                do_sample=metadata["do_sample"],
                temperature=metadata["temperature"],
            ),
        )[0],
        skip_special_tokens=metadata["skip_special_tokens"],
    )

    duration = time.perf_counter() - start  # seconds

    comet_llm.log_prompt(
        api_key=COMET_API_KEY,
        prompt=final_prompt,
        prompt_template=prompt_template,
        prompt_template_variables=variables,
        output=output,
        tags=tags,
        duration=duration * 1000,  # seconds -> milliseconds
        metadata=metadata,
    )

    return output

In [None]:
METADATA = {
    "model": MODEL_NAME,
    "max_new_tokens": 50,
    "skip_special_tokens": True,
    "do_sample": True,
    "temperature": 0.1,
}

TAGS = ["optimizing-config", "summarization"]

Here's the conversation that fills the template's `{user_prompt}` slot:

In [None]:
print(user_prompt)

#Person1#: What's the matter with this computer?
#Person2#: I don't know, but it just doesn't work well. Whenever I start it, it stops running.
#Person1#: Have you asked Mr. Li for some advice?
#Person2#: Yes, I have, but he doesn't seem to be able to solve the problem, either. Can you help me?
#Person1#: Me? I know nothing more than playing computer games.
#Person2#: What shall I do? I have to finish this report this afternoon, but...
#Person1#: But why don't you ring up the repairmen? They will be able to settle the problem.
#Person2#: Yes, I'll ring them up.


Our output:

In [None]:
summarize_v5(user_prompt, prompt_template, TAGS, METADATA)

'#Person1#: I have a problem with my computer.'

In the Comet UI, we can also sort the logged prompts by any column, in ascending or descending order:

[![SgRFT.gif](https://s11.gifyu.com/images/SgRFT.gif)](https://www.comet.com/examples/cometllm-prompt-example/prompts)

# 🔎 Prompt search

Prompt engineering is highly iterative, so you're likely to repeat these steps many times. To make it easier to sift through all of your prompts, CometLLM has a search feature that lets you filter logged prompts by keyword.

Maybe we run our experiments a few dozen times (or more!) before realizing that one of our example prompts (for in-context learning) has been incorrectly labeled. We want to remove any runs containing that example, because the output isn't relevant anymore.

Simply select the prompt variable you'd like to search and the filtering operator you'd like to use. Then type in your keyword and Comet will do the rest! Now we've found our erroneous prompts and can remove them!

_____

[![SgRjL.gif](https://s11.gifyu.com/images/SgRjL.gif)](https://www.comet.com/examples/cometllm-prompt-example/prompts)
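The Comet UI runs this search for you. To illustrate only the filtering logic, here is a local sketch over a hypothetical list of records shaped like the arguments we pass to `comet_llm.log_prompt`. `find_prompts` is an illustrative helper and not a CometLLM API.

```python
def find_prompts(records, variable, keyword):
    """Return records whose template variable `variable` contains `keyword`.

    Each record is a plain dict with a "prompt_template_variables" mapping,
    mirroring what we log (hypothetical local copy, not Comet's API).
    """
    return [
        record
        for record in records
        if keyword in str(record.get("prompt_template_variables", {}).get(variable, ""))
    ]
```

Records that lack the requested variable, such as runs without in-context examples, are skipped rather than raising an error.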

____

Comet also integrates with many of your favorite ML frameworks and tools, including OpenAI and LangChain. Head over to our [Integrations Page](https://www.comet.com/docs/v2/integrations/overview/?utm_medium=referral&utm_source=comet_llm&utm_term=intro_colab) to learn more!

# 📓 Additional Resources

- [Read Comet's prompt engineering blog post](https://heartbeat.comet.ml/organize-your-prompt-engineering-with-cometllm-66e390ef6645)
- [Check out our GitHub repo and give us a star](https://github.com/comet-ml/comet-llm)
- [Connect with us on our Community Slack channel](https://cometml.slack.com/join/shared_invite/enQtMzM0OTMwNTQ0Mjc5LWE4NzcxMzdiMmFjYzEzM2E5OTczOTk1MDZmZDg2MGJmODUwYWI0YWQ0YWMyMjlmMjQ5YmVmNzEyYjNlNzFhNjQ#/shared-invite/email)