<a href="https://colab.research.google.com/github/rohithmsr/AI-practice/blob/main/Opik/3_create_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://raw.githubusercontent.com/comet-ml/opik/main/apps/opik-documentation/documentation/static/img/opik-logo.svg" width="250"/>

# Create an Evaluation Dataset With Opik

In this exercise, you'll create an evaluation dataset with Opik. Datasets can be used to track test cases you would like to evaluate your LLM on. Once a dataset has been created, you can run Experiments on it. Each Experiment will evaluate an LLM application based on the test cases in the dataset using an evaluation metric and report the results back to the dataset.

# Imports & Configuration

In [1]:
%pip install opik comet_ml --quiet

In [2]:
import os
import IPython
import ast
import csv
import opik
import getpass
from opik import Opik

In [3]:
# opik configs
if "OPIK_API_KEY" not in os.environ:
    os.environ["OPIK_API_KEY"] = getpass.getpass("Enter your Opik API key: ")

opik.configure()

Enter your Opik API key: ··········


OPIK: Opik is already configured. You can check the settings by viewing the config file at /root/.opik.config


# Dataset

The **`get_or_create_dataset`** method checks whether a dataset with the given name already exists. If it does, the existing dataset is returned; if not, a new dataset is created.

Opik also automatically deduplicates items that are inserted into a dataset when using the Python SDK. This means that you can insert the same item multiple times without duplicating it in the dataset.

Together, these two features mean that you can use the SDK to manage your datasets in a "fire and forget" manner.
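To make the deduplication idea concrete, here is a minimal sketch of content-based deduplication in plain Python. It is an illustration only: the `InMemoryDataset` class is hypothetical, and hashing the JSON-serialized item is an assumption for the sake of the example, not a description of how Opik implements deduplication internally.

```python
import hashlib
import json


class InMemoryDataset:
    """Hypothetical stand-in for a dataset that deduplicates by item content."""

    def __init__(self):
        self._items = {}  # maps content hash -> item

    def insert(self, items):
        for item in items:
            # Serialize with sorted keys so key order does not affect the hash.
            payload = json.dumps(item, sort_keys=True).encode("utf-8")
            key = hashlib.sha256(payload).hexdigest()
            # setdefault keeps the first copy and ignores later duplicates.
            self._items.setdefault(key, item)

    def __len__(self):
        return len(self._items)


demo = InMemoryDataset()
row = {"question": "Is tofu vegan?", "response": "Yes."}  # made-up sample item
demo.insert([row])
demo.insert([row])  # same content, so nothing new is stored
demo.insert([{"response": "Yes.", "question": "Is tofu vegan?"}])  # same content, different key order
print(len(demo))
```

Because repeated inserts of the same content are no-ops in this sketch, re-running a cell that inserts items leaves the collection unchanged, which is the property the "fire and forget" workflow relies on.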

In [4]:
# Create or get the dataset
client = Opik()
dataset = client.get_or_create_dataset(name="foodchatbot_eval")

## Optional: Download Dataset From Comet

If you have not previously created the `foodchatbot_eval` dataset in your Opik workspace, run the following code to download the dataset as a Comet Artifact and populate your Opik dataset.

If you have already created the `foodchatbot_eval` dataset, you can skip to the next section.

In [5]:
import comet_ml

comet_ml.login(api_key=os.environ["OPIK_API_KEY"])
experiment = comet_ml.start(project_name="foodchatbot_eval")

logged_artifact = experiment.get_artifact(artifact_name="foodchatbot_eval",
                                          workspace="examples")
local_artifact = logged_artifact.download("./")
experiment.end()

COMET INFO: Valid Comet API Key saved in /root/.comet.config (set COMET_CONFIG to change where it is saved).
COMET INFO: Experiment is live on comet.com https://www.comet.com/rohithmsr/foodchatbot-eval/9ce7f386107d42fd8fcb8e9e8465f3b3

COMET INFO: Couldn't find a Git repository in '/content' nor in any parent directory. Set `COMET_GIT_DIRECTORY` if your Git Repository is elsewhere.
COMET INFO: Artifact 'examples/foodchatbot_eval:2.0.0' download has been started asynchronously
COMET INFO: Still downloading 1 file(s), remaining 7.54 KB/7.54 KB
COMET INFO: Artifact 'examples/foodchatbot_eval:2.0.0' has been successfully downloaded
COMET INFO: ---------------------------------------------------------------------------------------
COMET INFO: Comet.ml Experiment Summary
COMET INFO: ---------------------------------------------------------------------------------------

In [6]:
# Read the CSV file and collect every row as a dataset item
with open('./foodchatbot_clean_eval_dataset.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row
    items = []
    for row in reader:
        _index, question, response = row  # the index column is not needed
        items.append({"question": question, "response": response})

# Insert all items with a single call instead of one call per row
dataset.insert(items)