# How to download feedback and examples from a test project
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langsmith-cookbook/blob/main/testing-examples/download-feedback-and-examples/download_example.ipynb)

When you test with LangSmith, all traces, examples, and evaluation feedback are saved, so you have a full audit trail of what happened.
You can view the aggregate metrics of a test run and compare results example by example. You can also download the run and evaluation results
to use in external reporting software.

In this walkthrough, we will export the feedback and examples from a LangSmith test project. The main steps are:

1. Create a dataset
2. Run testing
3. Export feedback and examples

## Setup

Install langchain and any other dependencies for your chain. The chain in this walkthrough talks to an OpenAI-compatible server, so it needs `langchain-openai`. We will also install pandas to put the retrieved data in a dataframe.

In [1]:
# %pip install -U langsmith langchain langchain-openai pandas --quiet

In [2]:
import getpass
import os

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_openai import ChatOpenAI

# OpenAI-compatible inference server hosting the model under test
INFERENCE_SERVER_URL = "http://localhost:8000"
MODEL_NAME = "ibm-granite/granite-3.1-8b-base"
# Prompt for the server key instead of hardcoding it in the notebook
API_KEY = getpass.getpass("Inference server API key: ")

llm = ChatOpenAI(
    openai_api_key=API_KEY,
    openai_api_base= f"{INFERENCE_SERVER_URL}/v1",
    model_name=MODEL_NAME,
    top_p=0.92,
    temperature=0.01,
    max_tokens=512,
    presence_penalty=1.03,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)


In [3]:
import getpass
import os

from langsmith import Client

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "download feedback"
# Prompt for the key rather than committing it to the notebook
if "LANGSMITH_API_KEY" not in os.environ:
    os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API key: ")
client = Client()

## 1. Create a dataset

We will create a simple key-value dataset in which each example has a poem topic and a constraint letter (which the model should not use).

In [4]:
from langsmith import Client
import uuid

client = Client()

examples = [
    ("roses", "o"),
    ("vikings", "v"),
    ("planet earth", "e"),
    ("Sirens of Titan", "t"),
]

dataset_name = f"Download Feedback and Examples {str(uuid.uuid4())}"
dataset = client.create_dataset(dataset_name)

for prompt, constraint in examples:
    client.create_example(
        {"input": prompt, "constraint": constraint},
        dataset_id=dataset.id,
        outputs={"constraint": constraint},
    )

## 2. Run testing

We will use a simple custom evaluator that checks whether the prediction contains the constraint letter. The check is case-sensitive, and a score of `True` means the letter is absent.

In [5]:
from typing import Any
from langchain.evaluation import StringEvaluator


class ConstraintEvaluator(StringEvaluator):
    @property
    def requires_reference(self):
        return True

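    def _evaluate_strings(self, prediction: str, reference: str, **kwargs: Any) -> dict:
        # Reference in this case is the letter that should not be present
        passed = reference not in prediction
        # Make the reasoning match the outcome instead of always reporting a violation
        verdict = "does not contain" if passed else "contains"
        return {
            "score": passed,
            "reasoning": f"prediction {verdict} the letter {reference}",
        }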
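
The evaluator's core check can be tried without LangChain or a LangSmith connection. The sketch below is a standalone illustration: `check_constraint` and its `ignore_case` flag are hypothetical names introduced here, not part of either library. It shows how a case-sensitive test (as in the evaluator above) and a case-insensitive one can disagree on the same text.

```python
def check_constraint(prediction: str, forbidden: str, ignore_case: bool = False) -> dict:
    """Score a forbidden-letter constraint.

    Hypothetical helper for illustration; not part of LangChain or LangSmith.
    """
    haystack = prediction.lower() if ignore_case else prediction
    needle = forbidden.lower() if ignore_case else forbidden
    passed = needle not in haystack
    verdict = "does not contain" if passed else "contains"
    return {
        "score": passed,
        "reasoning": f"prediction {verdict} the letter {forbidden}",
    }


# Case-sensitive by default: the capital "T" does not count as "t".
strict = check_constraint("Tall pines sway", "t")
# With ignore_case=True the capital "T" is treated as a violation.
relaxed = check_constraint("Tall pines sway", "t", ignore_case=True)

print(strict)
print(relaxed)
```

Whether capital letters should count is a choice for your own test criteria; the walkthrough's evaluator uses the case-sensitive form.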

In [6]:
from langchain import chat_models, prompts

from langchain.smith import RunEvalConfig
from langchain_core.output_parsers import StrOutputParser

chain = (
    prompts.PromptTemplate.from_template(
        "Write a poem about {input} without using the letter {constraint}. Respond directly with the poem with no explanation."
    )
    | llm
    | StrOutputParser()
)

eval_config = RunEvalConfig(
    custom_evaluators=[ConstraintEvaluator()],
    input_key="input",
)

test_results = client.run_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain_factory=chain,
    evaluation=eval_config,
)

View the evaluation results for project 'unique-shelf-53' at:
https://smith.langchain.com/o/ff46f31e-6d0a-41ed-924a-b3f2b89e9ec9/datasets/417e2168-ebbf-42fb-8a71-65565163c6f0/compare?selectedSessions=311b7c4c-d5ec-4bca-a794-0d150114a267

View all tests for Dataset Download Feedback and Examples 39f40fb9-1b9b-4f9e-8a83-f43f1e46cf6b at:
https://smith.langchain.com/o/ff46f31e-6d0a-41ed-924a-b3f2b89e9ec9/datasets/417e2168-ebbf-42fb-8a71-65565163c6f0

## 3. Review the feedback and examples

If you want to use the results directly, you can access them in tabular format by calling `to_dataframe()` on `test_results`.

In [7]:
test_results.to_dataframe()

| | inputs.input | inputs.constraint | output | reference.constraint | feedback.ConstraintEvaluator | error | execution_time | run_id |
|---|---|---|---|---|---|---|---|---|
| 12958d78-3a00-485c-adee-0a45b239ec95 | Sirens of Titan | t | `\n<\|start_of_text\|>\nSirens of Titan, a tale t...` | t | False | | 21.084417 | 439fc183-231e-4309-901c-9fe1396facc2 |
| 9f7694e9-5a6f-45ed-9782-94ff5e0b2687 | planet earth | e | `\n<\|start_of_text\|>\n<\|end_of_text\|>\n<\|start_...` | e | False | | 35.602585 | f48c5248-ba1d-4b4e-9631-cd907108398b |
| 1f9e75d7-1816-4e91-8420-9331fe101b1a | vikings | v | `\n<\|start_of_text\|>\n\nvikings, brave and bold...` | v | False | | 11.428632 | 23c810f4-bd34-42e4-a578-75c051e8fcf5 |
| 9246390d-32f7-4848-a88f-b4bc458be341 | roses | o | `\n<\|start_of_text\|>\n<\|end_of_text\|>\n<\|start_...` | o | False | | 35.601038 | cbdc13e6-0e51-4659-8a13-6b33d2f26c5a |
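
Aggregate metrics can be computed from these rows with a few lines of code. The sketch below uses only the standard library and hardcodes the values shown in the output above so that it runs without a LangSmith connection. The `summarize` helper is a hypothetical name introduced for illustration.

```python
from statistics import mean

# Values copied from the results above; only the columns used here are kept.
rows = [
    {"input": "Sirens of Titan", "feedback.ConstraintEvaluator": False, "execution_time": 21.084417},
    {"input": "planet earth", "feedback.ConstraintEvaluator": False, "execution_time": 35.602585},
    {"input": "vikings", "feedback.ConstraintEvaluator": False, "execution_time": 11.428632},
    {"input": "roses", "feedback.ConstraintEvaluator": False, "execution_time": 35.601038},
]


def summarize(rows, score_key="feedback.ConstraintEvaluator"):
    """Hypothetical helper: pass rate, mean latency, and failing inputs.

    Expects at least one row; statistics.mean raises on empty input.
    """
    return {
        "n": len(rows),
        # True/False scores average to the fraction of passing examples.
        "pass_rate": mean(float(r[score_key]) for r in rows),
        "mean_execution_time": mean(r["execution_time"] for r in rows),
        "failed_inputs": [r["input"] for r in rows if not r[score_key]],
    }


summary = summarize(rows)
print(summary)
```

With the dataframe itself, the same pass rate should be available as `test_results.to_dataframe()["feedback.ConstraintEvaluator"].mean()`, assuming the column holds boolean or numeric scores as it does here.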


To fetch the feedback and examples for an earlier test project, use the SDK:

In [8]:
# Can be the name of any previous test project
test_project = test_results["project_name"]

In [9]:
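import pandas as pd

# Fetch only the root run of each trace, one per example
# (older SDK versions expressed this as execution_order=1)
runs = client.list_runs(project_name=test_project, is_root=True)

df = pd.DataFrame(
    [
        {
            "example_id": r.reference_example_id,
            # Spread the run's inputs and outputs into their own columns
            **r.inputs,
            **(r.outputs or {}),
            # Expand each feedback record into "<key>.score" and "<key>.comment" columns
            **{
                k: v
                for f in client.list_feedback(run_ids=[r.id])
                for k, v in [
                    (f"{f.key}.score", f.score),
                    (f"{f.key}.comment", f.comment),
                ]
            },
            # Look up the reference outputs stored on the dataset example
            "reference": client.read_example(r.reference_example_id).outputs,
        }
        for r in runs
    ]
)
df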

| | example_id | input | constraint | output | constraintevaluator.score | constraintevaluator.comment | reference |
|---|---|---|---|---|---|---|---|
| 0 | 9246390d-32f7-4848-a88f-b4bc458be341 | roses | o | `\n<\|start_of_text\|>\n<\|end_of_text\|>\n<\|start_...` | 0.0 | prediction contains the letter o | `{'constraint': 'o'}` |
| 1 | 1f9e75d7-1816-4e91-8420-9331fe101b1a | vikings | v | `\n<\|start_of_text\|>\n\nvikings, brave and bold...` | 0.0 | prediction contains the letter v | `{'constraint': 'v'}` |
| 2 | 9f7694e9-5a6f-45ed-9782-94ff5e0b2687 | planet earth | e | `\n<\|start_of_text\|>\n<\|end_of_text\|>\n<\|start_...` | 0.0 | prediction contains the letter e | `{'constraint': 'e'}` |
| 3 | 12958d78-3a00-485c-adee-0a45b239ec95 | Sirens of Titan | t | `\n<\|start_of_text\|>\nSirens of Titan, a tale t...` | 0.0 | prediction contains the letter t | `{'constraint': 't'}` |
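
The nested comprehension in the cell above is compact but dense. As a rough illustration of what it does to a run's feedback, the sketch below flattens stand-in feedback objects into `<key>.score` and `<key>.comment` columns. `FakeFeedback`, `flatten_feedback`, and the `helpfulness` key are hypothetical and exist only so the example runs offline; real records come from `client.list_feedback`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeFeedback:
    """Stand-in exposing the three attributes the comprehension reads."""

    key: str
    score: Optional[float]
    comment: Optional[str]


def flatten_feedback(feedback_items):
    """Turn feedback records into flat "<key>.score" / "<key>.comment" columns."""
    columns = {}
    for f in feedback_items:
        # A later record with the same key overwrites the earlier one.
        columns[f"{f.key}.score"] = f.score
        columns[f"{f.key}.comment"] = f.comment
    return columns


items = [
    FakeFeedback("constraintevaluator", 0.0, "prediction contains the letter o"),
    FakeFeedback("helpfulness", 1.0, None),  # hypothetical second feedback key
]
row = {"input": "roses", "constraint": "o", **flatten_feedback(items)}
print(row)
```

Each feedback key contributes two columns, so a project with several evaluators produces a wider table with the same number of rows.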


## Conclusion

In this example, we showed how to download feedback and examples from a test project. You can use the result object returned by the run directly, or use the SDK to fetch the results and feedback later.
Either route lets you analyze the results further or add them programmatically to your existing reports.
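
As one possible way to hand the rows to external reporting software, the sketch below serializes a list of row dictionaries to CSV text with the standard library. The `rows_to_csv` helper is a hypothetical name, and the two rows are abbreviated copies of the results shown earlier.

```python
import csv
import io

# Abbreviated rows taken from the results shown earlier.
rows = [
    {"example_id": "9246390d-32f7-4848-a88f-b4bc458be341", "input": "roses", "constraintevaluator.score": 0.0},
    {"example_id": "1f9e75d7-1816-4e91-8420-9331fe101b1a", "input": "vikings", "constraintevaluator.score": 0.0},
]


def rows_to_csv(rows):
    """Hypothetical helper: serialize row dicts to CSV text."""
    # Collect column names in first-seen order so rows with extra keys still fit.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    # restval fills in columns that a given row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

If the data is already in a pandas dataframe, `df.to_csv("results.csv", index=False)` writes an equivalent file; the file name here is only an example.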