<a href="https://colab.research.google.com/github/scorecard-ai/scorecard-cookbook/blob/main/Scorecard_Custom_Schema_Run_Example.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Demo: Scorecard Custom Schema Run Example
## 🧙‍♂️ Instructions

1. Create an account and [log in to Scorecard](https://app.getscorecard.ai/). Copy your [API key](https://app.getscorecard.ai/settings).
1. Add your Scorecard and OpenAI API keys below.
1. Go to `Runtime` -> `Run all`. Enjoy!

In [None]:
#@title 👉 API Keys

OPENAI_API_KEY = "" #@param { type: "string" }
SCORECARD_API_KEY = "" #@param { type: "string" }
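
If you would rather not paste keys into the notebook, one option is to fall back to environment variables. The sketch below is only an illustration: `resolve_api_key` is a hypothetical helper, not part of either SDK, and the environment variable names in the usage note are assumptions.

```python
import os


def resolve_api_key(form_value: str, env_var: str) -> str:
    """Return the form value if set, otherwise the environment variable (hypothetical helper)."""
    key = form_value.strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"Missing API key: fill in the form field or set {env_var}.")
    return key
```

It could be applied right after the form cell, for example `OPENAI_API_KEY = resolve_api_key(OPENAI_API_KEY, "OPENAI_API_KEY")`, and likewise for the Scorecard key with an environment variable name of your choosing.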

# Setup

In [None]:
#@title Install dependencies
#@markdown We pin the dependency versions so the notebook keeps working for future users.

!pip install scorecard-ai=='v1.0.0-beta0'
!pip install openai==1.11.1

In [None]:
#@title Imports

from openai import OpenAI
from scorecard.client import Scorecard


# Build your LLM system

Now, let's define your system (aka system-under-test)! For this demo, we'll set up an LLM call that generates the opening line of a story on a topic chosen by the user.

In [None]:
#@title Define our multi-message prompt template

PROMPT_TEMPLATE_1 = "You are a helpful assistant." #@param { type:"string" }

PROMPT_TEMPLATE_2 = "Assist the user in crafting a story about {user_query}." #@param { type:"string" }

PROMPT_TEMPLATE_3 = "I need a good opening line for my story containing the word '{keyword}', with a total of {max_chars} characters or fewer. Please generate only the opening line." #@param { type:"string" }

In [None]:
#@title Call OpenAI to generate a story
#@markdown Here we'll define an example of a multi-message prompt sent to OpenAI.

def generate_story(user_query: str, keyword: str, max_chars: int) -> str:
  client = OpenAI(api_key=OPENAI_API_KEY)
  response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # or "gpt-4" depending on your access and requirements
    messages=[
        {"role": "system", "content": PROMPT_TEMPLATE_1},
        {"role": "system", "content": PROMPT_TEMPLATE_2.format(user_query=user_query)},
        {"role": "user", "content": PROMPT_TEMPLATE_3.format(keyword=keyword, max_chars=max_chars)}
    ]
  )

  return response.choices[0].message.content
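
To inspect what is sent to the model without spending API credits, the message list can be rendered on its own. This is a minimal sketch: `build_messages` is a hypothetical helper, and the three templates are shortened stand-ins for the notebook's templates so that the cell runs by itself.

```python
from typing import Dict, List

# Shortened stand-ins for the notebook's three templates (assumption: same placeholders).
SYSTEM_TEMPLATE = "You are a helpful assistant."
TOPIC_TEMPLATE = "Assist the user in crafting a story about {user_query}."
REQUEST_TEMPLATE = "Opening line containing '{keyword}', at most {max_chars} characters."


def build_messages(user_query: str, keyword: str, max_chars: int) -> List[Dict[str, str]]:
    """Render the templates into a chat message list (hypothetical helper)."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE},
        {"role": "system", "content": TOPIC_TEMPLATE.format(user_query=user_query)},
        {"role": "user", "content": REQUEST_TEMPLATE.format(keyword=keyword, max_chars=max_chars)},
    ]


messages = build_messages("magical powers to control ice and snow", "ice", 100)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Printing the rendered messages makes it easy to spot a misspelled placeholder before any request is made, since `str.format` raises `KeyError` when a template names a variable that was not supplied.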

# Evaluate your system

## Prerequisite: Create Metrics

First, using the Scorecard application, create your metrics and scoring config. For this example,
we can use something simple like a Helpfulness metric that determines whether
the generation adheres to the user's request.
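
Two parts of the request in this demo are mechanically checkable: the keyword and the character limit. The sketch below is a local sanity check only, not how Scorecard computes its metrics; `check_constraints` is a hypothetical helper, and the sample response is made up for illustration.

```python
def check_constraints(response: str, keyword: str, max_chars: int) -> dict:
    """Check the two machine-verifiable requirements of the prompt (hypothetical helper)."""
    return {
        # Case-insensitive substring match, so "ice" also matches "iceman".
        "contains_keyword": keyword.lower() in response.lower(),
        "within_length": len(response) <= max_chars,
    }


result = check_constraints("The ice sang beneath her fingertips.", keyword="ice", max_chars=100)
print(result)
```

A check like this can complement a subjective metric such as Helpfulness, which still needs a human or model grader to judge whether the line suits the story.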

Once you have created your scoring config, copy the ID and enter it below:

In [None]:
#@title Configure Metrics
SCORING_CONFIG_ID = 1  #@param { type: "number" }

In [None]:
#@title 1. Create a Testset with custom variables
#@markdown Here we'll create a basic Testset that gets stored in Scorecard.

from scorecard.types import CustomSchema, CustomVariable
import json

client = Scorecard(
    api_key=SCORECARD_API_KEY
)

# Create a Testset
testset = client.testset.create(
    name="Story Opening Lines",
    description="Demo of a testset created via Scorecard Python SDK",
    using_retrieval=False,
    custom_schema=CustomSchema(
        variables=[
            CustomVariable(
                name="inputs",
                description="Custom Parameters for this Testset.",
                role="input",
                data_type="json_object",
            ),
        ]
    ),
)

# Add 2 testcases
client.testcase.create(
    testset_id=testset.id,
    user_query="magical powers to control ice and snow",
    custom_inputs={
        "inputs": json.dumps(
            {
                "keyword": "ice",
                "max_chars": 100
            }
        ),
    },
)
client.testcase.create(
    testset_id=testset.id,
    user_query="a journey with a rugged iceman, his loyal reindeer, and a naive snowman",
    custom_inputs={
        "inputs": json.dumps(
            {
                "keyword": "reindeer",
                "max_chars": 200
            }
        ),
    },
)

print("Visit the Scorecard app to view your Testset:")
print(f"https://app.getscorecard.ai/view-dataset/{testset.id}")
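
The testcases above store the `inputs` variable as a JSON string rather than a nested dictionary, so it has to be decoded again before use. A small round-trip sketch using only the standard library:

```python
import json

# Encode the custom parameters the same way the testcases above do.
custom_inputs = {"inputs": json.dumps({"keyword": "ice", "max_chars": 100})}

# The stored value is a plain string...
assert isinstance(custom_inputs["inputs"], str)

# ...and json.loads restores the original dictionary, including the integer type.
decoded = json.loads(custom_inputs["inputs"])
print(decoded["keyword"], decoded["max_chars"])
```

Because JSON keeps numbers as numbers, `max_chars` comes back as an `int` and can be passed to `generate_story` without conversion.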

In [None]:
#@title 2. Run Tests
#@markdown Now we'll create a new Run to execute our LLM system above.

from scorecard.types import RunStatus

run = client.run.create(testset_id=testset.id)
client.run.update_status(run_id=run.id, status="running_execution")

for testcase in client.testset.get_testcases(testset_id=testset.id).results:
  custom_variables = json.loads(testcase.custom_inputs["inputs"])
  model_response = generate_story(
      user_query=testcase.user_query,
      keyword=custom_variables["keyword"],
      max_chars=custom_variables["max_chars"],
  )

  client.testrecord.create(run_id=run.id,
                           testset_id=testset.id,
                           testcase_id=testcase.id,
                           user_query=testcase.user_query,
                           custom_inputs=testcase.custom_inputs,
                           response=model_response)

client.run.update_status(run_id=run.id, status="awaiting_scoring")

print("Visit the Scorecard app to view your Run:")
print(f"https://app.getscorecard.ai/view-grades/{run.id}")
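
The loop above stops at the first exception, for example a malformed `inputs` value or a failed model call. One possible way to keep going is to collect failures per testcase, sketched below under stated assumptions: plain dictionaries stand in for Scorecard testcase objects, and `run_all` and `fake_generate` are hypothetical names used only for this illustration.

```python
import json
from typing import Callable, Dict, List, Tuple


def run_all(
    testcases: List[Dict],
    generate: Callable[[str, str, int], str],
) -> Tuple[List[Dict], List[Dict]]:
    """Run `generate` on every testcase, collecting failures instead of stopping."""
    records, failures = [], []
    for testcase in testcases:
        try:
            variables = json.loads(testcase["custom_inputs"]["inputs"])
            response = generate(
                testcase["user_query"], variables["keyword"], variables["max_chars"]
            )
            records.append({"testcase_id": testcase["id"], "response": response})
        except Exception as error:  # broad on purpose: keep the remaining testcases running
            failures.append({"testcase_id": testcase["id"], "error": repr(error)})
    return records, failures


def fake_generate(user_query: str, keyword: str, max_chars: int) -> str:
    """Stand-in for the real model call, so the sketch runs offline."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    return f"A story about {user_query} featuring {keyword}."


demo_testcases = [
    {"id": 1, "user_query": "ice magic",
     "custom_inputs": {"inputs": json.dumps({"keyword": "ice", "max_chars": 100})}},
    {"id": 2, "user_query": "a snowman",
     "custom_inputs": {"inputs": json.dumps({"keyword": "", "max_chars": 50})}},
]
records, failures = run_all(demo_testcases, fake_generate)
print(len(records), "succeeded;", len(failures), "failed")
```

In the notebook, the successful records would be the ones passed on to `client.testrecord.create`, while the failures list shows which testcases need another look.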

In [None]:
#@title 3. Kick off Scoring
#@markdown Once the run above has finished executing, click the "Run Scoring" button to start scoring. When scoring is done, visit the Results page:

print("Visit the Scorecard app to view your Results:")
print(f"https://app.getscorecard.ai/view-grades/{run.id}")