# Logging, Tracking, and Debugging Prompts using Comet

In this section, we demonstrate how to log, track, and debug prompts using the `comet-llm` library. `comet-llm` is an open-source repo managed by Comet. Please give the repo a star if you have a chance, and submit any feedback you have: https://github.com/comet-ml/comet-llm

Let's first load all the necessary libraries:


In [1]:
! pip install openai opik python-dotenv pandas numpy --quiet

In [2]:
from openai import OpenAI
import opik
import os
import IPython
import json
import pandas as pd
import numpy as np
import urllib.request

from dotenv import load_dotenv

#load the environment variables
load_dotenv()

# API configuration
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COMET_API_KEY = os.getenv("COMET_API_KEY")
COMET_WORKSPACE = os.getenv("COMET_WORKSPACE")

client = OpenAI(api_key=OPENAI_API_KEY)

# Configure opik
opik.configure()

OPIK: Configuration saved to file: /home/micha/.opik.config
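
A misspelled or missing environment variable makes `os.getenv` return `None` silently, and the problem then surfaces only at the first API call. The sketch below is one possible way to fail early; the helper name `missing_env_vars` is hypothetical and not part of `opik` or `openai`.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Variable names taken from the configuration cell above
required = ["OPENAI_API_KEY", "COMET_API_KEY", "COMET_WORKSPACE"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv()` shows which keys the `.env` file did not provide.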


The function below helps to generate the final results from the model after calling the OpenAI API:

In [None]:
# completion function: send chat messages to the OpenAI API and return the reply text
def get_completion(messages, model="gpt-4o", temperature=0, max_tokens=300):
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens
    )
    # the v1 client returns message objects, so use attribute access
    return response.choices[0].message.content

### Load the Data

The code below loads both the few-shot demonstrations and the validation dataset used for testing the model.

In [3]:

# print markdown
def print_markdown(text):
    """Prints text as markdown"""
    IPython.display.display(IPython.display.Markdown(text))

# load validation data from GitHub
f = urllib.request.urlopen("https://raw.githubusercontent.com/comet-ml/comet-llmops/main/data/article-tags.json")
val_data = json.load(f)

# load few shot data from GitHub
f = urllib.request.urlopen("https://raw.githubusercontent.com/comet-ml/comet-llmops/main/data/few_shot.json")
few_shot_data = json.load(f)

The following is a helper function to obtain the final predictions from the model given a prompt template (e.g., zero-shot or few-shot) and the provided input data.

In [None]:
def get_predictions(prompt_template, inputs):
    """Format the template with each input and collect the model responses."""
    responses = []

    for text in inputs:
        messages = [
            {
                "role": "system",
                "content": prompt_template.format(input=text)
            }
        ]
        response = get_completion(messages)
        responses.append(response)

    return responses
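
Before spending API calls, it can help to check how the template is filled in for each input. The sketch below mirrors the loop above but swaps the OpenAI call for a stub; `fake_completion`, `get_predictions_dry_run`, and `toy_template` are hypothetical names used only for illustration.

```python
def fake_completion(messages):
    """Stand-in for the API call: ignores the prompt and returns a canned answer."""
    return '["NA"]'

def get_predictions_dry_run(prompt_template, inputs, completion_fn=fake_completion):
    """Same loop as get_predictions, but also keeps the formatted prompts for inspection."""
    prompts, responses = [], []
    for text in inputs:
        messages = [{"role": "system", "content": prompt_template.format(input=text)}]
        prompts.append(messages[0]["content"])
        responses.append(completion_fn(messages))
    return prompts, responses

# Toy template and inputs (assumptions, not the notebook data)
toy_template = "Abstract: {input}\nTags:"
prompts, responses = get_predictions_dry_run(
    toy_template, ["We introduce FooNet.", "A survey of optimizers."]
)
print(prompts[0])
print(responses)
```

Passing the completion function as an argument keeps the loop itself unchanged, so the same code path can later be run with the real `get_completion`.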

### Few-Shot

First, we define a few-shot template which will leverage the few-shot demonstration data loaded previously.

In [4]:
# function to define the few-shot template
def get_few_shot_template(few_shot_prefix, few_shot_suffix, few_shot_examples):
    return few_shot_prefix + "\n\n" + "\n".join([ "Abstract: "+ ex["abstract"] + "\n" + "Tags: " + str(ex["tags"]) + "\n" for ex in few_shot_examples]) + "\n\n" + few_shot_suffix

# function to sample n items from the given data without replacement
def random_sample_data(data, n):
    return np.random.choice(data, n, replace=False)


# the few-shot prefix and suffix
few_shot_prefix = """Your task is to extract model names from machine learning paper abstracts. Your response is an an array of the model names in the format [\"model_name\"]. If you don't find model names in the abstract or you are not sure, return [\"NA\"]"""
few_shot_suffix = """Abstract: {input}\nTags:"""

# load 3 samples from few shot data
few_shot_template = get_few_shot_template(few_shot_prefix, few_shot_suffix, random_sample_data(few_shot_data, 3))

In [5]:
few_shot_template

'Your task is to extract model names from machine learning paper abstracts. Your response is an an array of the model names in the format ["model_name"]. If you don\'t find model names in the abstract or you are not sure, return ["NA"]\n\nAbstract: Children\'s drawings have a wonderful inventiveness, creativity, and variety to them. We present a system that automatically animates children\'s drawings of the human figure, is robust to the variance inherent in these depictions, and is simple and straightforward enough for anyone to use. We demonstrate the value and broad appeal of our approach by building and releasing the Animated Drawings Demo, a freely available public website that has been used by millions of people around the world. We present a set of experiments exploring the amount of training data needed for fine-tuning, as well as a perceptual study demonstrating the appeal of a novel twisted perspective retargeting technique. Finally, we introduce the Amateur Drawings Dataset,
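
To make the structure of the assembled prompt easier to see, here is a minimal sketch that builds the same kind of template from two toy demonstrations and then fills in the `{input}` placeholder. The function name `build_few_shot_template`, the shortened prefix, and the toy abstracts are assumptions for illustration, not the notebook data.

```python
def build_few_shot_template(prefix, suffix, examples):
    """Join the instruction, the demonstrations, and the suffix holding the {input} slot."""
    demos = "\n".join(
        f"Abstract: {ex['abstract']}\nTags: {ex['tags']}\n" for ex in examples
    )
    return prefix + "\n\n" + demos + "\n\n" + suffix

# Toy demonstrations (hypothetical)
toy_examples = [
    {"abstract": "We fine-tune FooNet on images.", "tags": ["FooNet"]},
    {"abstract": "A survey of optimizers.", "tags": ["NA"]},
]

template = build_few_shot_template(
    "Extract model names.", "Abstract: {input}\nTags:", toy_examples
)
prompt = template.format(input="We compare BarLM with FooNet.")
print(prompt)
```

One caveat with this design: because the final prompt is produced with `str.format`, a demonstration abstract that contains a literal `{` or `}` would break the later `.format(input=...)` call unless the braces are doubled first.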

### Zero-Shot Template

The code below defines the zero-shot template. Note that we use the same instruction as in the few-shot prompt template, but without the demonstrations.

In [6]:
zero_shot_template = """
Your task is to extract model names from machine learning paper abstracts. Your response is an an array of the model names in the format [\"model_name\"]. If you don't find model names in the abstract or you are not sure, return [\"NA\"]

Abstract: {input}
Tags:
"""

### Get Predictions

We then generate all the predictions using the validation data as inputs:

In [None]:
# get the predictions

abstracts = [item["abstract"] for item in val_data]
few_shot_predictions = get_predictions(few_shot_template, abstracts)
zero_shot_predictions = get_predictions(zero_shot_template, abstracts)
expected_tags = [str(item["tags"]) for item in val_data]

In [None]:
print("Few shot predictions")
print(few_shot_predictions)
print("\n\nZero shot predictions")
print(zero_shot_predictions)
print("\n\nExpected tags")
print(expected_tags)

Few shot predictions
["['LLM', 'ChatGPT', 'LLaMA', 'WizardLM', 'OpenAI ChatGPT']", "['FLAN-T5', 'AMR', 'UD', 'SRL', 'LoRA']", '["NA"]', "['ChatGPT', 'GPT-4', 'LLaMA', 'Alpaca', 'GMMSeg', 'GMMs', 'PAXQA']", "['ChatGPT']", "['ViT', 'OpenCLIP']", "['SAM', 'IA', 'AIGC', 'Stable Diffusion']", "['Anything-3D', 'BLIP', 'Segment-Anything']", "['Chameleon', 'LLMs', 'GPT-4', 'ScienceQA', 'TabMWP', 'ChatGPT']", "['NA']"]


Zero shot predictions
['["WizardLM", "Evol-Instruct", "LLaMA", "ChatGPT"]', '["FLAN-T5"]', '["large language models", "generative AI", "hypothesis machines"]', '["PAXQA"]', '["ChatGPT"]', '["ViT", "OpenCLIP"]', '["Segment-Anything Model (SAM)", "Inpaint Anything (IA)", "AIGC models", "Stable Diffusion", "Inpaint Anything (IA)"]', '["Anything-3D", "BLIP", "Segment-Anything", "text-to-image diffusion model"]', '["Chameleon", "GPT-4", "ScienceQA", "TabMWP", "ChatGPT"]', '["NA"]']


Expected tags
["['LLaMA', 'ChatGPT', 'WizardLM']", "['FLAN-T5', 'FLAN']", "['NA']", "['PAXQA']", "['
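
The predictions and the expected tags above are all stringified lists, so they need to be parsed before they can be compared. The sketch below shows one possible comparison, set-based precision and recall per example. This metric and the helper names `parse_tags` and `precision_recall` are assumptions, not part of the original notebook. The inputs are the first few-shot prediction, the first zero-shot prediction, and the first expected-tags entry from the outputs above.

```python
import ast

def parse_tags(raw):
    """Parse a stringified list such as "['A', 'B']" into a set of tag names."""
    try:
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return set()  # treat unparseable model output as an empty prediction
    if not isinstance(parsed, (list, tuple)):
        return set()
    return {str(tag).strip() for tag in parsed}

def precision_recall(predicted_raw, expected_raw):
    """Compute set-based precision and recall for one example."""
    predicted, expected = parse_tags(predicted_raw), parse_tags(expected_raw)
    overlap = len(predicted & expected)
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(expected) if expected else 0.0
    return precision, recall

expected = "['LLaMA', 'ChatGPT', 'WizardLM']"
few_shot = "['LLM', 'ChatGPT', 'LLaMA', 'WizardLM', 'OpenAI ChatGPT']"
zero_shot = '["WizardLM", "Evol-Instruct", "LLaMA", "ChatGPT"]'

print(precision_recall(few_shot, expected))   # (0.6, 1.0)
print(precision_recall(zero_shot, expected))  # (0.75, 1.0)
```

Exact string matching is strict: a prediction such as "Segment-Anything Model (SAM)" would not match an expected tag spelled differently, so these scores are best read as a lower bound.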

### Log Prompt Results

Finally, we log the prompts and results to Comet. Note that we log both the few-shot and zero-shot results, together with their metadata and tags.

In [None]:
# log the predictions in Comet along with the ground truth for comparison

# set up comet
# COMET_API_KEY = "COMET_API_KEY"
# COMET_WORKSPACE = "COMET_WORKSPACE"

# initialize comet
# experiment = comet_ml.start(api_key=COMET_API_KEY, workspace=COMET_WORKSPACE, project_name="ml-paper-tagger-prompts")

# Sample test
# few_shot_predictions = ["['LLM', 'ChatGPT', 'LLaMA', 'WizardLM', 'OpenAI ChatGPT']", "['FLAN-T5', 'AMR', 'UD', 'SRL', 'LoRA']", '["NA"]', "['ChatGPT', 'GPT-4', 'LLaMA', 'Alpaca', 'GMMSeg', 'GMMs', 'PAXQA']", "['ChatGPT']", "['ViT', 'OpenCLIP']", "['SAM', 'IA', 'AIGC', 'Stable Diffusion']", "['Anything-3D', 'BLIP', 'Segment-Anything']", "['Chameleon', 'LLMs', 'GPT-4', 'ScienceQA', 'TabMWP', 'ChatGPT']", "['NA']"]
# zero_shot_predictions = ['["WizardLM", "Evol-Instruct", "LLaMA", "ChatGPT"]', '["FLAN-T5"]', '["large language models", "generative AI", "hypothesis machines"]', '["PAXQA"]', '["ChatGPT"]', '["ViT", "OpenCLIP"]', '["Segment-Anything Model (SAM)", "Inpaint Anything (IA)", "AIGC models", "Stable Diffusion", "Inpaint Anything (IA)"]', '["Anything-3D", "BLIP", "Segment-Anything", "text-to-image diffusion model"]', '["Chameleon", "GPT-4", "ScienceQA", "TabMWP", "ChatGPT"]', '["NA"]']


# log the predictions
for i in range(len(expected_tags)):
    # log the few-shot predictions
    opik.Prompt(
        name='few shot template',
        prompt=few_shot_template.format(input=abstracts[i]),
        metadata={
            "expected_tags": expected_tags[i],
            "abstract": abstracts[i],
        }
    )
    # comet_ml.log_prompt(
    #     prompt=few_shot_template.format(input=abstracts[i]),
    #     prompt_template=few_shot_template,
    #     output=few_shot_predictions[i],
    #     tags = ["gpt-3.5-turbo", "few-shot"],
    #     metadata = {
    #         "expected_tags": expected_tags[i],
    #         "abstract": abstracts[i],
    #     }
    # )

    # log the zero-shot predictions
    opik.Prompt(
        name='zero shot template',
        prompt=zero_shot_template.format(input=abstracts[i]),
        metadata={
            "expected_tags": expected_tags[i],
            "abstract": abstracts[i],
        }
    )
    # comet_ml.log_prompt(
    #     prompt=zero_shot_template.format(input=abstracts[i]),
    #     prompt_template=zero_shot_template,
    #     output=zero_shot_predictions[i],
    #     tags = ["gpt-3.5-turbo", "zero-shot"],
    #     metadata = {
    #         "expected_tags": expected_tags[i],
    #         "abstract": abstracts[i],
    #     }
    # )
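
The active `opik.Prompt` calls above store the formatted prompt and its metadata, while the model outputs and tags appear only in the commented-out calls. To keep prompt, output, tags, and ground truth together for later inspection, one option is to bundle them into plain records first. The sketch below is library-independent: `build_log_records` is a hypothetical helper, not an Opik or Comet API, and the toy inputs are assumptions.

```python
def build_log_records(template_name, template, inputs, outputs, expected, tags):
    """Bundle each formatted prompt with its output, tags, and ground truth."""
    records = []
    for text, output, gold in zip(inputs, outputs, expected):
        records.append({
            "name": template_name,
            "prompt": template.format(input=text),
            "output": output,
            "tags": list(tags),
            "metadata": {"expected_tags": gold, "abstract": text},
        })
    return records

# Toy data (hypothetical) standing in for abstracts, predictions, and expected tags
records = build_log_records(
    "zero shot template",
    "Abstract: {input}\nTags:",
    ["We introduce FooNet."],
    ['["FooNet"]'],
    ["['FooNet']"],
    ["zero-shot"],
)
print(records[0]["prompt"])
print(records[0]["metadata"])
```

Each record can then be passed to whichever logging call is in use, or written to a JSON file for offline comparison.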