# Run Experiments in Playground

### Setup

In [1]:
# Or you can use a .env file
from dotenv import load_dotenv
load_dotenv(dotenv_path="../../.env", override=True)

True

### Playground in LangSmith


![image.png](attachment:29905fde-6adb-4bb1-8093-daba38529dd7.png)

A prompt template in which the LLM plays a 15th-century pirate caught in an ongoing shipwreck:

![image.png](attachment:3503af61-ebdd-492d-9c18-cdc94fd532c0.png)



![image.png](attachment:5b48ff8c-43aa-421b-92aa-86cd5323334b.png)

We can compare different models side by side in the playground.

![image.png](attachment:04b25835-186a-4f58-b76f-ddcf491f11ba.png)


We can also set repetitions to double-check that we get a correct response every time.


![image.png](attachment:1f1ea39a-a30a-4d32-b833-98126e4d8201.png)
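As a rough illustration of what repetitions let us check, the sketch below measures how often repeated responses agree with each other. The `agreement_rate` helper and the sample responses are hypothetical and not part of LangSmith; exact string matching is a simplifying assumption, since real LLM outputs usually vary in wording.

```python
from collections import Counter


def agreement_rate(responses: list[str]) -> float:
    """Fraction of responses equal to the most common response (case/whitespace-insensitive)."""
    if not responses:
        raise ValueError("need at least one response")
    normalized = [r.strip().lower() for r in responses]
    _, top_count = Counter(normalized).most_common(1)[0]
    return top_count / len(normalized)


# Hypothetical outputs from three repetitions of the same prompt.
runs = ["The sky is blue", "the sky is blue ", "The sky is azure"]
rate = agreement_rate(runs)
print(f"Agreement across repetitions: {rate:.2f}")
```

A low rate suggests the prompt or the model settings (for example, temperature) produce unstable answers and deserve a closer look.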


We can add output schemas to make sure that our output follows a specific format.

![image.png](attachment:3c6614d9-92da-4e00-8518-8cee4c461db5.png)
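To make the idea of an output schema concrete, here is a minimal sketch of a JSON Schema and a small conformance check. The schema name `PirateAnswer`, its fields, and the `conforms` helper are assumptions made for illustration; they are not taken from the screenshot, and the check covers only required keys and simple types.

```python
import json

# Hypothetical JSON Schema for a structured answer (illustrative only).
answer_schema = {
    "title": "PirateAnswer",
    "type": "object",
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["answer", "confidence"],
}

# Map the JSON Schema type names used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float)}


def conforms(raw: str, schema: dict) -> bool:
    """Minimal check: valid JSON object, required keys present, simple types match."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    if any(key not in data for key in schema["required"]):
        return False
    for key, spec in schema["properties"].items():
        if key in data and not isinstance(data[key], _PY_TYPES[spec["type"]]):
            return False
    return True


print(conforms('{"answer": "Arr, the sky be blue", "confidence": 0.9}', answer_schema))
```

For anything beyond a quick check, a full validator such as the `jsonschema` package would be a more robust choice than this hand-rolled helper.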

We can add tools to see how the LLM would behave if it had access to them.
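A tool is usually described to the model as a name, a description, and a JSON Schema for its arguments. The sketch below shows one such definition in the function-calling style that many chat-model APIs accept, plus a stub that executes a proposed call. The tool `get_wind_speed`, its parameters, and the `{'name': ..., 'args': ...}` call shape are all hypothetical.

```python
# Hypothetical tool definition in the JSON-schema style many chat-model APIs accept.
get_wind_speed_tool = {
    "type": "function",
    "function": {
        "name": "get_wind_speed",
        "description": "Return the current wind speed in knots for a sea region.",
        "parameters": {
            "type": "object",
            "properties": {
                "region": {"type": "string", "description": "Name of the sea region"},
            },
            "required": ["region"],
        },
    },
}


def get_wind_speed(region: str) -> str:
    """Stub implementation; a real tool would call a weather service."""
    return f"Wind speed in {region}: 25 knots"


# Map tool names to the Python functions that implement them.
TOOL_REGISTRY = {"get_wind_speed": get_wind_speed}


def run_tool_call(tool_call: dict) -> str:
    """Execute a tool call shaped like {'name': ..., 'args': {...}} (assumed shape)."""
    func = TOOL_REGISTRY[tool_call["name"]]
    return func(**tool_call["args"])


print(run_tool_call({"name": "get_wind_speed", "args": {"region": "Caribbean"}}))
```

With a tool attached, the model's reply is typically a proposed tool call rather than plain text, which is what makes it useful for inspecting tool-selection behavior.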

### Create a Dataset

Let's create a toy dataset that we can run prompt experiments over.

In [3]:
from langsmith import Client

example_inputs = [
("What color is the sky?", "The sky is blue"),
("What color is grass?", "Grass is green"),
("What color is dirt?", "Dirt is brown"),
]

client = Client()
dataset_name = "Sample Questions"

# Create an empty dataset in LangSmith
dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Sample questions about color",
)

# Split the (question, answer) pairs into input and reference-output dicts
inputs = [{"question": input_prompt} for input_prompt, _ in example_inputs]
outputs = [{"output": output_answer} for _, output_answer in example_inputs]

# Upload the examples to the dataset
client.create_examples(
    inputs=inputs,
    outputs=outputs,
    dataset_id=dataset.id,
)

{'example_ids': ['17401f0d-7251-4ab8-b5ce-fbe480863468',
  'fc5aaec3-c9bf-45f0-894e-1dd18da6e7d7',
  'c6e0cf57-292e-41eb-a106-f18eac78443c'],
 'count': 3}
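Re-running the cell above will typically fail because a dataset with that name already exists. A minimal sketch of a guard is shown below; it assumes the `has_dataset` and `read_dataset` methods of the LangSmith `Client`, and the helper name `get_or_create_dataset` is made up for this example.

```python
def get_or_create_dataset(client, dataset_name: str, description: str):
    """Return the existing dataset with this name, or create it if it is missing.

    `client` is expected to behave like `langsmith.Client`.
    """
    if client.has_dataset(dataset_name=dataset_name):
        return client.read_dataset(dataset_name=dataset_name)
    return client.create_dataset(dataset_name=dataset_name, description=description)


# Usage with a real client (requires a LangSmith API key):
# from langsmith import Client
# dataset = get_or_create_dataset(Client(), "Sample Questions", "Sample questions about color")
```

Note that this only guards dataset creation; calling `create_examples` again would still add duplicate examples.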

### We can see this when we load the dataset we created into the playground
![image.png](attachment:d36ba261-53c5-492d-8a02-f3e2f2406048.png)

### Run it to see the outputs with the same prompt template

![image.png](attachment:717a9c76-c702-43a0-a186-d2cdca719db7.png)
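Conceptually, running a prompt over a dataset fills the template's variables from each example's inputs. The sketch below mimics that with plain string formatting; the template text is a hypothetical stand-in for the one in the screenshot, and it assumes the template variable is named `{question}` to match the dataset's input key.

```python
# Hypothetical prompt template; {question} is assumed to match the dataset's input key.
PROMPT_TEMPLATE = (
    "You are a 15th-century pirate in the middle of a shipwreck. "
    "Answer the question: {question}"
)

# Same input shape as the examples uploaded to the dataset above.
dataset_inputs = [
    {"question": "What color is the sky?"},
    {"question": "What color is grass?"},
    {"question": "What color is dirt?"},
]

# Render one prompt per dataset example.
rendered_prompts = [PROMPT_TEMPLATE.format(**row) for row in dataset_inputs]

for prompt in rendered_prompts:
    print(prompt)
```

If the variable name in the template does not match a key in the example inputs, there is nothing to substitute, so keeping the two aligned is worth checking first.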

### Modifying the prompt: in this case, I asked it to give shorter results

![image.png](attachment:d6dd707f-737f-47f3-bb63-1c8293e174f7.png)

## My Own Dataset

In [4]:
from langsmith import Client

# Create a dataset about space and astronomy facts
example_inputs = [
    ("What is the largest planet in our solar system?", "Jupiter is the largest planet"),
    ("How many moons does Mars have?", "Mars has two moons"),
    ("What is the closest star to Earth?", "The Sun is the closest star to Earth"),
    ("What galaxy is Earth located in?", "Earth is in the Milky Way galaxy"),
    ("What is the hottest planet in the solar system?", "Venus is the hottest planet"),
    ("How long does it take light from the Sun to reach Earth?", "It takes about 8 minutes"),
    ("What is the smallest planet in our solar system?", "Mercury is the smallest planet"),
    ("How many planets are in our solar system?", "There are eight planets"),
    ("What is the name of Earth's natural satellite?", "The Moon is Earth's natural satellite"),
    ("What causes a solar eclipse?", "The Moon passes between the Sun and Earth"),
]

client = Client()
dataset_name = "Space Astronomy Questions"

dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions and answers about space, planets, and astronomy",
)

inputs = [{"question": input_prompt} for input_prompt, _ in example_inputs]
outputs = [{"output": output_answer} for _, output_answer in example_inputs]

client.create_examples(
    inputs=inputs,
    outputs=outputs,
    dataset_id=dataset.id,
)

print(f"Created dataset '{dataset_name}' with {len(example_inputs)} examples")
print(f"Dataset ID: {dataset.id}")


Created dataset 'Space Astronomy Questions' with 10 examples
Dataset ID: bbd971ce-fff4-4ef6-8907-4cce21585621


### Loading my dataset into the playground and giving the LLM a custom prompt template

![image.png](attachment:858a936c-dddf-4550-bd40-2a94766dee3b.png)

### We can see the outputs for every example with the current prompt template, alongside the reference outputs we set up


![image.png](attachment:8636ce12-d97d-477c-a998-cb4c5437db2c.png)




### Modified the prompt template to give short answers

![image.png](attachment:6c906be5-f2ad-4289-9f0f-3c63e6343d68.png)



### Here I have set it up to compare the same prompt on the same dataset with two different LLMs:


![image.png](attachment:a2443cba-bcd7-40e4-adc6-2a1d83c44b96.png)

### We can add an evaluator here to individually assess the quality of the LLM's responses for each example in our dataset

![image.png](attachment:74dd1f23-0d75-49be-9a52-973118c25804.png)
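The evaluator in the screenshot is configured in the UI, but the same idea can be sketched in code. The function below is a simple, hypothetical heuristic that scores how many words of the reference answer appear in the model's response. It assumes the `(outputs, reference_outputs)` evaluator signature supported by recent versions of the LangSmith SDK, and that both dicts store their text under an `"output"` key, as in the dataset created above.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"\w+", text.lower()))


def reference_overlap(outputs: dict, reference_outputs: dict) -> dict:
    """Score = fraction of reference-answer words that appear in the model output."""
    reference = _words(reference_outputs["output"])
    predicted = _words(str(outputs.get("output", "")))
    score = len(reference & predicted) / len(reference) if reference else 0.0
    return {"key": "reference_overlap", "score": score}


result = reference_overlap(
    {"output": "Mars has two small moons, Phobos and Deimos."},
    {"output": "Mars has two moons"},
)
print(result)
```

Word overlap rewards responses that restate the reference, so it can penalize correct but terse answers; an LLM-as-judge evaluator is usually a better fit when phrasing varies a lot.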