<a href="https://colab.research.google.com/github/scorecard-ai/scorecard-cookbook/blob/main/Scorecard_Heuristic_Scoring_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demo: Scorecard Heuristic Scoring Example - Exact String Match

## 🧙‍♂️ Instructions

1. Create an account and [log in to Scorecard](https://app.getscorecard.ai/). Copy your [API key](https://app.getscorecard.ai/api-key).
1. Add your Scorecard and OpenAI API Keys below.
1. Go to `Runtime` -> `Run all`. Enjoy!

In [2]:
#@title 👉 API Keys

OPENAI_API_KEY = "" #@param { type: "string" }
SCORECARD_API_KEY = "" #@param { type: "string" }
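
If you would rather not paste keys into the notebook, one option is to read them from environment variables. The sketch below assumes variables named `OPENAI_API_KEY` and `SCORECARD_API_KEY` have been set in your environment; the `load_key` helper and the lowercase variable names are illustrative only and are not part of the Scorecard or OpenAI SDKs.

```python
import os


def load_key(name: str, fallback: str = "") -> str:
    """Hypothetical helper: return env var `name`, or `fallback` if it is unset or empty."""
    return os.environ.get(name) or fallback


# Prefer the environment; otherwise fall back to an empty string,
# in which case you would still fill in the form fields above.
openai_key = load_key("OPENAI_API_KEY")
scorecard_key = load_key("SCORECARD_API_KEY")

print("OpenAI key found:", bool(openai_key))
print("Scorecard key found:", bool(scorecard_key))
```

If either line prints `False`, fill in the corresponding form field instead.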

# Setup

In [None]:
#@title Install dependencies
#@markdown In order to keep the notebook working for all future users, we pin the dependency versions.

!pip install scorecard-ai=='v1.0.0-beta0'
!pip install openai==1.11.1

In [3]:
#@title Imports

from openai import OpenAI
from scorecard.client import Scorecard


# Build your LLM system

Now, let's define your system (aka the system-under-test)! For this demo, we'll set up an LLM call that generates the opening line of a story on a topic the user chooses.

In [4]:
#@title Define our multi-message prompt template

PROMPT_TEMPLATE_1 = "You are a helpful assistant." #@param { type:"string" }

PROMPT_TEMPLATE_2 = "Assist the user in crafting a story about {user_topic}." #@param { type:"string" }

PROMPT_TEMPLATE_3 = "I need a good opening line for my story. Please generate only the opening line." #@param { type:"string" }

In [5]:
#@title Call OpenAI to generate a story
#@markdown Here we'll define an example of a multi-message prompt sent to OpenAI.

def generate_story(user_topic: str) -> str:
  client = OpenAI(api_key=OPENAI_API_KEY)
  response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4" depending on your access and requirements
    messages=[
        {"role": "system", "content": PROMPT_TEMPLATE_1},
        {"role": "system", "content": PROMPT_TEMPLATE_2.format(user_topic=user_topic)},
        {"role": "user", "content": PROMPT_TEMPLATE_3}
    ]
  )

  return response.choices[0].message.content
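
To see what the model actually receives, it can help to build the message list without calling OpenAI. The following is a minimal sketch: `build_messages` is a hypothetical helper (not part of the notebook or any SDK) that mirrors the `messages` argument above, and the templates are copied here only so the block runs on its own.

```python
# Templates copied from the prompt-template cell so this block is self-contained.
PROMPT_TEMPLATE_1 = "You are a helpful assistant."
PROMPT_TEMPLATE_2 = "Assist the user in crafting a story about {user_topic}."
PROMPT_TEMPLATE_3 = "I need a good opening line for my story. Please generate only the opening line."


def build_messages(user_topic: str) -> list:
    """Hypothetical helper: mirror the messages list passed to the chat completion call."""
    return [
        {"role": "system", "content": PROMPT_TEMPLATE_1},
        # Only the second template has a placeholder to fill in.
        {"role": "system", "content": PROMPT_TEMPLATE_2.format(user_topic=user_topic)},
        {"role": "user", "content": PROMPT_TEMPLATE_3},
    ]


messages = build_messages("the story of two royal sisters")
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Printing the messages this way is a cheap check that the `{user_topic}` placeholder is filled as intended before spending API calls.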

# Evaluate your system

## Pre-req: Create Heuristic Metrics **[DO NOT SKIP]**

First, using the [Scoring Lab](https://app.getscorecard.ai/scoring-lab) in the Scorecard application, create your metrics and scoring config.

For this example, create a binary metric called **Exact String Match**, which checks that the model output exactly matches the ideal response. Then create a Scoring Config that includes the newly created metric.

Once you have created the metric and the Scoring Config, copy both IDs and enter them below:

In [6]:
#@title Configure Metrics
HEURISTIC_METRIC_ID = 293  #@param { type: "number" }
SCORING_CONFIG_ID = 187  #@param { type: "number" }

In [None]:
#@title 1. Create a basic Testset
#@markdown Here we'll create a basic Testset that gets stored in Scorecard.

client = Scorecard(
    api_key=SCORECARD_API_KEY
)

# Create a Testset
testset = client.testset.create(
    name="Story Opening Lines with Ideal",
    description="Demo of a testset created via Scorecard Python SDK",
    using_retrieval=False
)

# Add three testcases
client.testcase.create(
    testset_id=testset.id,
    user_query="magical powers to control ice and snow",
    ideal="sample ideal response",
)
client.testcase.create(
    testset_id=testset.id,
    user_query="a journey with a rugged iceman, his loyal reindeer, and a naive snowman",
    ideal="sample ideal response",
)
client.testcase.create(
    testset_id=testset.id,
    user_query="the story of two royal sisters",
    ideal="sample ideal response",
)

print("Visit the Scorecard app to view your Testset:")
print(f"https://app.getscorecard.ai/view-dataset/{testset.id}")

In [8]:
#@title 2. Define heuristic scoring function
#@markdown Do not use the "Run Scoring" button in the Scorecard app. Instead, implement the heuristic scoring function here and kick off scoring through the SDK.

def is_exact_string_match(model_response, ideal_response):
    return model_response == ideal_response
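
Because the comparison uses `==`, it is sensitive to case and whitespace. The short example below repeats the function so it runs on its own and shows which inputs count as a match. With the placeholder ideal `"sample ideal response"` used in the Testset above, a generated opening line will most likely score `False`.

```python
def is_exact_string_match(model_response, ideal_response):
    # Plain equality: every character, including case and whitespace, must match.
    return model_response == ideal_response


ideal = "sample ideal response"

print(is_exact_string_match("sample ideal response", ideal))    # True
print(is_exact_string_match("Sample ideal response", ideal))    # False (case differs)
print(is_exact_string_match("sample ideal response\n", ideal))  # False (trailing newline)
```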

In [None]:
#@title 3. Execute the Testset and run heuristic scoring
#@markdown Now we'll create a new Run, execute our LLM system on each Testcase, and record a heuristic score for each response.

run = client.run.create(
    testset_id=testset.id,
    scoring_config_id=SCORING_CONFIG_ID,
)
client.run.update_status(run_id=run.id, status="running_execution")

for testcase in client.testset.get_testcases(testset_id=testset.id).results:
    model_response = generate_story(user_topic=testcase.user_query)
    testrecord = client.testrecord.create(
        run_id=run.id,
        testset_id=testset.id,
        testcase_id=testcase.id,
        user_query=testcase.user_query,
        response=model_response,
    )
    client.score.create(
        run_id=run.id,
        testrecord_id=testrecord.id,
        metric_id=HEURISTIC_METRIC_ID,
        int_score=None,
        binary_score=is_exact_string_match(model_response, testcase.ideal),
        reasoning="SDK generated heuristic score",
    )

client.run.update_status(run_id=run.id, status="completed")

print("Visit the Scorecard app to view your Run:")
print(f"https://app.getscorecard.ai/view-grades/{run.id}")
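
For a quick local sanity check alongside the Scorecard app, you could also summarize the binary scores yourself. This is a minimal sketch under stated assumptions: `exact_match_rate` is a hypothetical helper, and the pairs below are made-up illustration data, not output from a real Run.

```python
def exact_match_rate(pairs):
    """Hypothetical helper: fraction of (response, ideal) pairs that match exactly."""
    if not pairs:
        return 0.0  # Avoid dividing by zero when there is nothing to score.
    matches = sum(1 for response, ideal in pairs if response == ideal)
    return matches / len(pairs)


# Made-up (model_response, ideal) pairs for illustration only.
example_pairs = [
    ("sample ideal response", "sample ideal response"),
    ("Once upon a time, in a frozen kingdom...", "sample ideal response"),
]

print(f"Exact match rate: {exact_match_rate(example_pairs):.0%}")
```

In the scoring loop above, you could collect `(model_response, testcase.ideal)` pairs in a list and pass them to a helper like this once the Run completes.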