# Experiments

### Setup

In [60]:
# Or you can use a .env file
from dotenv import load_dotenv
load_dotenv(dotenv_path="../../.env", override=True)

True

Here is the RAG application that we've been working with throughout this course.

In [61]:
import os
import tempfile
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders.sitemap import SitemapLoader
from langchain_community.vectorstores import SKLearnVectorStore
from langchain_openai import OpenAIEmbeddings
from langsmith import traceable
from openai import OpenAI
from typing import List
import nest_asyncio

# TODO: Configure this model!
MODEL_NAME = "gpt-4o"
MODEL_PROVIDER = "openai"
APP_VERSION = 1.0
RAG_SYSTEM_PROMPT = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the latest question in the conversation. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
"""

openai_client = OpenAI()

def get_vector_db_retriever():
    persist_path = os.path.join(tempfile.gettempdir(), "union.parquet")
    embd = OpenAIEmbeddings()

    # If vector store exists, then load it
    if os.path.exists(persist_path):
        vectorstore = SKLearnVectorStore(
            embedding=embd,
            persist_path=persist_path,
            serializer="parquet"
        )
        return vectorstore.as_retriever(lambda_mult=0)

    # Otherwise, index LangSmith documents and create new vector store
    ls_docs_sitemap_loader = SitemapLoader(web_path="https://docs.smith.langchain.com/sitemap.xml", continue_on_failure=True)
    ls_docs = ls_docs_sitemap_loader.load()

    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=500, chunk_overlap=0
    )
    doc_splits = text_splitter.split_documents(ls_docs)

    vectorstore = SKLearnVectorStore.from_documents(
        documents=doc_splits,
        embedding=embd,
        persist_path=persist_path,
        serializer="parquet"
    )
    vectorstore.persist()
    return vectorstore.as_retriever(lambda_mult=0)

nest_asyncio.apply()
retriever = get_vector_db_retriever()

"""
retrieve_documents
- Returns documents fetched from a vectorstore based on the user's question
"""
@traceable(run_type="chain")
def retrieve_documents(question: str):
    return retriever.invoke(question)

"""
generate_response
- Calls `call_openai` to generate a model response after formatting inputs
"""
@traceable(run_type="chain")
def generate_response(question: str, documents):
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    messages = [
        {
            "role": "system",
            "content": RAG_SYSTEM_PROMPT
        },
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}"
        }
    ]
    return call_openai(messages)

"""
call_openai
- Returns the chat completion output from OpenAI
"""
@traceable(
    run_type="llm",
    metadata={
        "ls_provider": MODEL_PROVIDER,
        "ls_model_name": MODEL_NAME
    }
)
def call_openai(messages: List[dict]):  # returns the full chat completion object, not a str
    return openai_client.chat.completions.create(
        model=MODEL_NAME,
        messages=messages,
    )

"""
langsmith_rag
- Calls `retrieve_documents` to fetch documents
- Calls `generate_response` to generate a response based on the fetched documents
- Returns the model response
"""
@traceable(run_type="chain")
def langsmith_rag(question: str):
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    return response.choices[0].message.content


### Experiment

Here is a code snippet that should look similar to what you see from the starter code!

There are a few important components here.

1. We have defined an Evaluator
2. We pipe our dataset examples (dict) to the shape of input that our function `langsmith_rag` takes (str) using a target function
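
To see what the evaluator computes before running a full experiment, its length check can be exercised on hand-written dictionaries. This is a minimal sketch: the sample strings are made up for illustration, and the `{"output": ...}` shape assumes the string returned by the target function ends up under an `output` key, which is what the evaluator in this notebook reads.

```python
def is_concise_enough(reference_outputs: dict, outputs: dict) -> dict:
    # Pass when the answer is shorter than 1.5x the reference answer (in characters).
    score = len(outputs["output"]) < 1.5 * len(reference_outputs["output"])
    return {"key": "is_concise", "score": int(score)}

# Hypothetical sample data, not taken from the dataset.
reference = {"output": "Yes, LangSmith supports online evaluation."}
short_answer = {"output": "Yes, it does."}
long_answer = {"output": reference["output"] * 2}  # twice the reference length

print(is_concise_enough(reference, short_answer))  # {'key': 'is_concise', 'score': 1}
print(is_concise_enough(reference, long_answer))   # {'key': 'is_concise', 'score': 0}
```

Note that the check compares character counts only, so a short but wrong answer (such as "I don't know.") still scores 1.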

In [62]:
from langsmith import evaluate, Client

client = Client()
dataset_name = "RAG Application Golden Dataset"

def is_concise_enough(reference_outputs: dict, outputs: dict) -> dict:
    score = len(outputs["output"]) < 1.5 * len(reference_outputs["output"])
    return {"key": "is_concise", "score": int(score)}

def target_function(inputs: dict):
    return langsmith_rag(inputs["question"])

evaluate(
    target_function,
    data=dataset_name,
    evaluators=[is_concise_enough],
    experiment_prefix="gpt-4o"
)

View the evaluation results for experiment: 'gpt-4o-5a064b4d' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=d2978753-08e3-4331-8339-fd761a03567a




27it [01:15,  2.81s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,"No, LangSmith is not designed for finetuning a...",,"Yes, LangSmith can be used for fine-tuning and...",1,2.193444,61287e02-6887-407d-9abc-b7c2a4288c93,03c13c41-7d19-4724-96b4-76d9ce04c43d
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,2.68794,b4436f86-8cd3-44eb-bd06-1f68f1c94750,946ea2c3-5e9d-4fe9-9968-ccc0b00ffcd5
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. It ...",,"Yes, LangSmith supports online evaluation as a...",1,2.473419,b834b374-bcdd-4fa8-8545-196e4de07362,ab72e8d5-4c71-4be9-9cae-d30a3a973783
3,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,0,2.707106,1bf45575-6078-478b-a209-41f63d169af8,0b655a9b-95f7-45a9-b694-7c5261521baa
4,What testing capabilities does LangSmith have?,LangSmith allows for running multiple experime...,,LangSmith offers capabilities for creating dat...,1,4.847622,244cb784-9bb6-4adf-a064-b072a3e1c5c9,3d7e7e18-745e-4796-9259-e7835787e314
5,How do I pass metadata in with @traceable?,To pass metadata when using the `@traceable` d...,,You can pass metadata with the @traceable deco...,1,3.3926,3533a525-3638-4b78-8f71-b1e52765485e,1fc3d6ce-e4fb-4205-8af6-6e8e2a6c2e95
6,Does LangSmith support offline evaluation?,The provided context does not mention support ...,,"Yes, LangSmith supports offline evaluation thr...",1,2.114019,d0967bfc-73ae-45d2-904f-45a9f8532877,9d63f6ae-8357-449c-bb20-f5e00fa3ac3d
7,What is LangSmith used for in three sentences?,"LangSmith is a platform designed for building,...",,LangSmith is a platform designed for the devel...,1,2.320797,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,4d35cddb-7e3b-470a-a9bf-b59456e869bf
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,2.115625,f54fecb2-fb97-476d-abc6-adf9242dd9cf,efd59643-e5ad-41e4-9004-f5d673f18d40
9,How can I trace with the @traceable decorator?,To trace with the `@traceable` decorator in Py...,,To trace with the @traceable decorator in Pyth...,1,7.897746,f7309db8-9cbd-4564-9d68-fa4222c5b246,3187a9e3-ff71-4ca8-97ad-2be184cdb0c3


In [63]:
dataset_name2 = "MAT_496_1"

### Modifying your Application

Now, let's change our model to gpt-3.5-turbo (set `MODEL_NAME` in the setup cell) and see how it performs!

Make this change, and then run this code snippet!

In [64]:
from langsmith import evaluate, Client
from langsmith.schemas import Example, Run

def target_function(inputs: dict):
    return langsmith_rag(inputs["question"])

evaluate(
    target_function,
    data=dataset_name,
    evaluators=[is_concise_enough],
    experiment_prefix="gpt-3.5-turbo"
)

View the evaluation results for experiment: 'gpt-3.5-turbo-2e91dd96' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=76a524bf-ede1-4732-b3db-913eec03f2b8




27it [01:18,  2.90s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,"No, LangSmith is not designed for finetuning a...",,"Yes, LangSmith can be used for fine-tuning and...",1,2.253412,61287e02-6887-407d-9abc-b7c2a4288c93,b2b69ea7-ac1d-4505-96fa-692acd7b2667
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,2.256826,b4436f86-8cd3-44eb-bd06-1f68f1c94750,4f4a0582-e994-480a-838f-e5c19c48e8e7
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. It ...",,"Yes, LangSmith supports online evaluation as a...",1,4.196638,b834b374-bcdd-4fa8-8545-196e4de07362,50e878dd-4236-41f7-87fa-a35b8d35a94c
3,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,0,2.714089,1bf45575-6078-478b-a209-41f63d169af8,4257162d-5c59-40af-8ffe-9f3fa2b4a441
4,What testing capabilities does LangSmith have?,LangSmith allows you to run multiple experimen...,,LangSmith offers capabilities for creating dat...,1,1.805444,244cb784-9bb6-4adf-a064-b072a3e1c5c9,c7a4179a-7487-495d-8a22-4b070996a23c
5,How do I pass metadata in with @traceable?,"To pass metadata in with `@traceable`, you can...",,You can pass metadata with the @traceable deco...,1,3.21488,3533a525-3638-4b78-8f71-b1e52765485e,38940d00-d511-4f2d-9364-3864c47e1f2f
6,Does LangSmith support offline evaluation?,The provided context does not explicitly menti...,,"Yes, LangSmith supports offline evaluation thr...",1,2.542945,d0967bfc-73ae-45d2-904f-45a9f8532877,f817a17d-7e2a-4f8a-99e5-46c13464a752
7,What is LangSmith used for in three sentences?,LangSmith is a platform designed for building ...,,LangSmith is a platform designed for the devel...,1,2.17721,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,267c2c15-e362-4284-9baf-7362cdf1458b
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,10.541515,f54fecb2-fb97-476d-abc6-adf9242dd9cf,b5f6b3a0-445d-4a98-b086-53f40cbb06df
9,How can I trace with the @traceable decorator?,"To trace with the @traceable decorator, simply...",,To trace with the @traceable decorator in Pyth...,1,3.76136,f7309db8-9cbd-4564-9d68-fa4222c5b246,5e9c6c36-4d0d-4c7e-b305-2231a2297a3b


### Running on Another Custom Dataset

In [65]:
from langsmith import evaluate, Client
from langsmith.schemas import Example, Run

def target_function(inputs: dict):
    return langsmith_rag(inputs["question"])

evaluate(
    target_function,
    data=dataset_name2,
    evaluators=[is_concise_enough],
    experiment_prefix="gpt-3.5-turbo"
)

View the evaluation results for experiment: 'gpt-3.5-turbo-765e572e' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=33b530b7-0871-4c7f-b401-1afc2bf916a2




20it [00:55,  2.78s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Is knowledge justified true belief?,"The concept of knowledge as ""justified true be...",,The traditional definition of knowledge as jus...,1,3.699175,117529de-2f47-4ea5-8f75-fc006c6875d8,00726114-de76-4c9d-82d4-d412b37e7a8b
1,Can we ever truly know reality?,The question of whether we can truly know real...,,This epistemological question divides realists...,1,3.618399,67eee66d-43e6-4e17-b3aa-a18f815db3d1,adbc8f7f-d52d-479f-8ea6-a36266ba6e38
2,What is the meaning of life?,I don't know.,,The meaning of life is a fundamental philosoph...,1,1.270905,78b31e3c-7d35-4583-a123-43d311009fa4,59b54218-4471-4667-b2d3-85fa83e940f1
3,Is free will an illusion?,The question of whether free will is an illusi...,,The free will debate divides philosophers into...,0,3.478607,878ae5e4-43e5-4978-bb8c-5a135c268ab3,23821efd-be70-42a8-81a3-ecb2e18aa88a
4,What is justice?,Justice is the concept of fairness and moral r...,,Justice theories span from Plato's harmony and...,1,2.329779,9cc250b2-8e12-443c-a97f-080feec0c80d,4ea6c731-05f4-42bf-bfba-3a0848f7e71a
5,Is there an objective truth?,The concept of objective truth is a philosophi...,,Objectivists maintain truth exists independent...,1,1.975572,b1bfbfb3-c13a-40e5-96a5-eec6b808ea2e,818b3a16-4429-4bd1-af15-9dbbe959701b
6,Does God exist?,The existence of God is a philosophical and th...,,Classical arguments for God's existence includ...,1,2.383862,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,67e0935d-8278-4b02-ab2f-1824353e1db4
7,What makes an action morally right?,Determining whether an action is morally right...,,Different ethical frameworks offer competing a...,0,2.922428,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,2dd6c9e4-4fe8-4d45-8538-bd0ac5e8a3c1
8,What is consciousness?,Consciousness is the state of being aware of a...,,Consciousness remains one of philosophy's hard...,1,2.300732,cdbbdc85-94b0-486a-a385-71e23525cd32,9c736e1e-9d10-4084-bd77-615462eb703a
9,What is the nature of personal identity?,The nature of personal identity refers to the ...,,Personal identity theories address what makes ...,0,3.094348,ff329e5e-25ea-4906-a698-95aa680f61aa,c6d1299b-1860-4936-a1ef-e3e1f493b6d5


### Running over Different Pieces of Data

##### Dataset Version

You can execute an experiment on a specific version of a dataset in the SDK by using the `as_of` parameter in `list_examples`.

Let's try running on just our initial dataset.

In [66]:
evaluate(
    target_function,
    data=client.list_examples(dataset_name=dataset_name, as_of="initial"),   # We use as_of to specify a version
    evaluators=[is_concise_enough],
    experiment_prefix="initial dataset version"
)

View the evaluation results for experiment: 'initial dataset version-9ec99541' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=64d6f0fd-9779-48e0-9a84-e83eac35d0db




10it [00:25,  2.57s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,What is LangSmith used for in three sentences?,LangSmith is a platform designed for building ...,,LangSmith is a platform designed for the devel...,1,3.606532,09c8ddd5-5237-4119-a865-738aa44de6c8,6e9bc3a4-b561-41a4-ace0-dc7af056f418
1,Can LangSmith be used for finetuning and model...,LangSmith is primarily designed for monitoring...,,"Yes, LangSmith can be used for fine-tuning and...",1,2.210456,2a570477-3e73-47b1-8908-68a13dcb9f2d,678cc081-1468-4a23-b0c5-988a58735d09
2,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,2.053418,49aed8a1-9aaa-42a8-a5fe-5ed0a890ea16,19b286fe-9224-4cfb-8f9e-1f55f5ed8f43
3,Does LangSmith support offline evaluation?,The provided context mentions support for onli...,,"Yes, LangSmith supports offline evaluation thr...",1,1.760448,55314dea-1e4c-46ab-8b52-2a006198b473,7ad9bac1-0709-425a-9884-7092a6d37966
4,How can I trace with the @traceable decorator?,To trace with the `@traceable` decorator in Py...,,To trace with the @traceable decorator in Pyth...,1,2.333998,5a299649-e576-4b18-962f-ac0cad1ce780,a61d8ffa-d581-4734-8df4-5a26e073c940
5,How do I set up tracing to LangSmith if I'm us...,To set up tracing with LangSmith while using L...,,To set up tracing to LangSmith while using Lan...,0,2.788024,918f0aa0-b002-431b-8806-efd02cbf9f31,154bd897-2755-4fd5-a56c-33854ce1a869
6,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,3.663592,9e6d3fc8-7660-47b6-994c-28597b0467cf,3f7d145e-3d70-46a8-8c85-f07a88033114
7,How do I pass metadata in with @traceable?,"To pass metadata using `@traceable`, you can i...",,You can pass metadata with the @traceable deco...,1,3.409653,c0dd4647-38e6-42bd-bb6f-e9d729f40029,fa50202c-536d-40e7-95cf-027ce5451ff2
8,What testing capabilities does LangSmith have?,LangSmith allows users to run multiple experim...,,LangSmith offers capabilities for creating dat...,1,1.835775,e94d2cdd-02d3-487a-9f09-e8137e5d2df0,d1d89def-0a52-4cf5-89b4-9397ae3b08ec
9,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation to p...",,"Yes, LangSmith supports online evaluation as a...",1,1.534498,eecc6917-ad3f-44e6-8db3-85d72487a379,6d89e3e8-6ab8-410d-aab0-97e337d2c41b


##### Dataset Split

You can run an experiment on a specific split of your dataset. Let's try running on the `Crucial examples` split.

In [67]:
evaluate(
    target_function,
    data=client.list_examples(dataset_name=dataset_name, splits=["Crucial examples"]),  # We pass in a list of Splits
    evaluators=[is_concise_enough],
    experiment_prefix="Crucial Examples split"
)

View the evaluation results for experiment: 'Crucial Examples split-bb907f59' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=7e0d9dbe-9e45-4db9-af52-b8bb7fc0b380




5it [00:12,  2.50s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Is there a javascript LangSmith SDK,"Yes, there is a JavaScript/TypeScript SDK for ...",,Yes there is a Javascript Langsmith SDK,1,1.296332,031b2e39-0b72-4826-b68f-58f05ede9e86,9a75f6bb-1ac9-4f37-a217-346a2176f0ff
1,Can LangSmith be used for finetuning and model...,"LangSmith focuses on observability, evaluation...",,"Yes, LangSmith can be used for fine-tuning and...",1,2.036265,2a570477-3e73-47b1-8908-68a13dcb9f2d,1f25da31-f534-4fa1-8523-9251e6df53ef
2,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,1.995089,49aed8a1-9aaa-42a8-a5fe-5ed0a890ea16,5a4fc22f-face-4de9-99e6-f86512b51aaa
3,How does LangSmith handle data privacy?,LangSmith complies with the General Data Prote...,,LangSmith prioritizes data privacy by implemen...,1,2.545757,5bd3a574-4c5b-462b-a962-83d065d79b19,1cc8e331-b9fc-4373-90b3-14e9b85d3d29
4,How do I set up tracing to LangSmith if I'm us...,"To set up tracing to LangSmith with LangChain,...",,To set up tracing to LangSmith while using Lan...,0,4.14217,918f0aa0-b002-431b-8806-efd02cbf9f31,e15b434c-1657-4509-9e95-d13e9bfd8b3f


##### Running on Two Splits Together

In [68]:
evaluate(
    target_function,
    data=client.list_examples(dataset_name=dataset_name, splits=["Crucial examples", "mat496split"]),  # We pass in a list of Splits
    evaluators=[is_concise_enough],
    experiment_prefix="Crucial Examples split and mat496split"
)

View the evaluation results for experiment: 'Crucial Examples split and mat496split-b20f6999' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=eb7ccf90-0345-4f67-bdd7-40e84112e455




8it [00:21,  2.68s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,"No, LangSmith is not designed for fine-tuning ...",,"Yes, LangSmith can be used for fine-tuning and...",1,2.636241,61287e02-6887-407d-9abc-b7c2a4288c93,3cfb3a6b-9721-4815-9aa2-1a149a824f57
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,3.872404,b4436f86-8cd3-44eb-bd06-1f68f1c94750,2efea0a9-2565-408c-9d79-b4c0e10677bc
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluations, wh...",,"Yes, LangSmith supports online evaluation as a...",1,1.568879,b834b374-bcdd-4fa8-8545-196e4de07362,364e921f-6713-4de2-b5ca-0901b2124575
3,Is there a javascript LangSmith SDK,"Yes, there is a JavaScript (JS/TS) SDK for Lan...",,Yes there is a Javascript Langsmith SDK,1,1.225143,031b2e39-0b72-4826-b68f-58f05ede9e86,7b3c9450-9214-4c1a-9697-7b2be4275b03
4,Can LangSmith be used for finetuning and model...,LangSmith is primarily a platform for LLM obse...,,"Yes, LangSmith can be used for fine-tuning and...",1,2.558171,2a570477-3e73-47b1-8908-68a13dcb9f2d,6985da01-8bbc-436c-b6bb-a1af68b993dc
5,How do I create user feedback with the LangSmi...,To create user feedback using the LangSmith SD...,,To create user feedback with the LangSmith SDK...,1,2.504466,49aed8a1-9aaa-42a8-a5fe-5ed0a890ea16,8fc12dca-5d46-4f07-95ad-fd33c5a5d31e
6,How does LangSmith handle data privacy?,"LangSmith complies with GDPR, is SOC 2 Type 2 ...",,LangSmith prioritizes data privacy by implemen...,1,3.321501,5bd3a574-4c5b-462b-a962-83d065d79b19,e9b06627-6a3e-4acc-bee4-08ee39ab8cff
7,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,0,3.211899,918f0aa0-b002-431b-8806-efd02cbf9f31,402e203a-9a93-4446-917c-1dc5daa3f40f


##### Specific Data Points

You can also specify individual data points to run an experiment over.
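
One practical way to choose which IDs to pass is to pull them from an earlier experiment's results, for example the examples that failed an evaluator. The sketch below is an illustration only: `failing_example_ids` is a hypothetical helper, and the rows are hand-copied from the first results table in this notebook rather than fetched from LangSmith.

```python
from typing import List

# Rows copied by hand from the first results table (only the fields used here).
results = [
    {"example_id": "b834b374-bcdd-4fa8-8545-196e4de07362", "is_concise": 1},
    {"example_id": "1bf45575-6078-478b-a209-41f63d169af8", "is_concise": 0},
    {"example_id": "f54fecb2-fb97-476d-abc6-adf9242dd9cf", "is_concise": 1},
]

def failing_example_ids(rows: List[dict], key: str = "is_concise") -> List[str]:
    """Return the example IDs whose feedback score for `key` is 0 (hypothetical helper)."""
    return [row["example_id"] for row in rows if row[key] == 0]

ids_to_rerun = failing_example_ids(results)
print(ids_to_rerun)  # ['1bf45575-6078-478b-a209-41f63d169af8']
```

A list like `ids_to_rerun` could then be supplied as `example_ids` to `client.list_examples`.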

In [69]:
evaluate(
    target_function,
    data=client.list_examples(
        dataset_name=dataset_name, 
        example_ids=[ 
            "b834b374-bcdd-4fa8-8545-196e4de07362",
            "f54fecb2-fb97-476d-abc6-adf9242dd9cf"
        ]
    ),
    evaluators=[is_concise_enough],
    experiment_prefix="two specific example ids"
)

View the evaluation results for experiment: 'two specific example ids-b14c3520' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=a7788542-a9a1-4b2c-b2d0-6910280024d5




2it [00:04,  2.26s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. It ...",,"Yes, LangSmith supports online evaluation as a...",1,2.151511,b834b374-bcdd-4fa8-8545-196e4de07362,d64993de-2e9c-4d17-b770-cf4a0e90d5bb
1,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,1.882027,f54fecb2-fb97-476d-abc6-adf9242dd9cf,a685b1da-9c33-406b-89a6-7c10a7b38870



##### Specific Data Points from My Own Dataset

In [70]:
evaluate(
    target_function,
    data=client.list_examples(
        dataset_name=dataset_name2, 
        example_ids=[ 
            "878ae5e4-43e5-4978-bb8c-5a135c268ab3",
            "2c6b6d4d-610d-42ab-a3c2-5ddd4d76a9f6",
            "9cc250b2-8e12-443c-a97f-080feec0c80d",
            "c18c4cef-a74e-4cb3-9472-cdc87c7d2f76",
            "ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074"
        ]
    ),
    evaluators=[is_concise_enough],
    experiment_prefix="five specific example ids"
)

View the evaluation results for experiment: 'five specific example ids-0550c432' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=cea572c0-0300-4e0d-8cdc-d80683c72fa0




5it [00:16,  3.35s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,What makes an action morally right?,I don't have information on what makes an acti...,,Different ethical frameworks offer competing a...,1,2.02223,2c6b6d4d-610d-42ab-a3c2-5ddd4d76a9f6,126fa4ac-533a-4c31-8a81-d5eef6ab9128
1,Is free will an illusion?,The question of whether free will is an illusi...,,The free will debate divides philosophers into...,1,2.757566,878ae5e4-43e5-4978-bb8c-5a135c268ab3,b5e0b839-dc12-46eb-9471-2a649741bce3
2,What is justice?,"Justice is a concept related to fairness, equi...",,Justice theories span from Plato's harmony and...,1,5.625375,9cc250b2-8e12-443c-a97f-080feec0c80d,df30d4cb-41f9-434a-a961-c16cfca1f3a8
3,Does God exist?,The question of God's existence is a deeply ph...,,Classical arguments for God's existence includ...,1,2.561609,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,c2757056-ed03-4fb8-b292-a96ccf9fe23e
4,What makes an action morally right?,The concept of what makes an action morally ri...,,Different ethical frameworks offer competing a...,1,3.272274,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,08a32679-8591-411b-8ec7-8a9e7d8bfea3


### Other Parameters

##### Repetitions

You can run an experiment several times to make sure you have consistent results.
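
Once an experiment has been repeated, consistency can be checked by averaging each example's score across its repetitions. The following is a rough sketch on made-up data: the `runs` rows and the `mean_score_by_example` helper are hypothetical and only mimic the `example_id` and `is_concise` columns of the results tables.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-run scores: one row per (example, repetition).
runs = [
    {"example_id": "ex-1", "is_concise": 1},
    {"example_id": "ex-1", "is_concise": 1},
    {"example_id": "ex-2", "is_concise": 0},
    {"example_id": "ex-2", "is_concise": 1},
]

def mean_score_by_example(rows, key="is_concise"):
    """Group scores by example ID and average them across repetitions."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["example_id"]].append(row[key])
    return {example_id: mean(scores) for example_id, scores in grouped.items()}

averages = mean_score_by_example(runs)
# An average strictly between 0 and 1 means the score changed between repetitions.
unstable = [example_id for example_id, avg in averages.items() if 0 < avg < 1]
print(averages)
print(unstable)  # ['ex-2']
```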

In [71]:
evaluate(
    target_function,
    data=dataset_name,
    evaluators=[is_concise_enough],
    experiment_prefix="two repetitions",
    num_repetitions=2   # This field defaults to 1
)

View the evaluation results for experiment: 'two repetitions-d1178b6f' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=b163a719-7f30-41cb-b93a-6c649be8b916




54it [02:18,  2.57s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,"No, LangSmith is focused on observability and ...",,"Yes, LangSmith can be used for fine-tuning and...",1,3.661217,61287e02-6887-407d-9abc-b7c2a4288c93,024e4c7f-0477-4933-898f-3a3df7655426
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,3.026069,b4436f86-8cd3-44eb-bd06-1f68f1c94750,23a2c493-a267-4f56-9366-e64317770103
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. It ...",,"Yes, LangSmith supports online evaluation as a...",1,2.280719,b834b374-bcdd-4fa8-8545-196e4de07362,d28e311d-b610-4bfa-a11f-befb4e759296
3,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,0,3.104873,1bf45575-6078-478b-a209-41f63d169af8,cf6da5f7-4bcf-4ad5-8045-27181a4693fb
4,What testing capabilities does LangSmith have?,LangSmith allows users to run multiple experim...,,LangSmith offers capabilities for creating dat...,1,1.718883,244cb784-9bb6-4adf-a064-b072a3e1c5c9,99cd6483-f7b0-4409-88ca-0e070025b5f2
5,How do I pass metadata in with @traceable?,You can pass metadata to the `@traceable` deco...,,You can pass metadata with the @traceable deco...,1,3.614544,3533a525-3638-4b78-8f71-b1e52765485e,cf25aeac-4e6f-4989-9c13-32a63832f5bb
6,Does LangSmith support offline evaluation?,"LangSmith focuses on online evaluation, provid...",,"Yes, LangSmith supports offline evaluation thr...",1,2.110429,d0967bfc-73ae-45d2-904f-45a9f8532877,4c7c00d7-3661-43fb-9cd2-75b8464cb5e6
7,What is LangSmith used for in three sentences?,LangSmith is a platform designed for building ...,,LangSmith is a platform designed for the devel...,1,2.714655,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,d65c7df6-79ef-4660-87a1-8f3a35defa58
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,2.216813,f54fecb2-fb97-476d-abc6-adf9242dd9cf,eeab4944-78d1-4cd8-923a-12197380d93e
9,How can I trace with the @traceable decorator?,"To trace with the `@traceable` decorator, simp...",,To trace with the @traceable decorator in Pyth...,1,2.226589,f7309db8-9cbd-4564-9d68-fa4222c5b246,e7c4311b-c418-4921-8495-0104bc365645


##### Repetitions on My Own Dataset

In [72]:
evaluate(
    target_function,
    data=dataset_name2,
    evaluators=[is_concise_enough],
    experiment_prefix="three repetitions",
    num_repetitions=3   # This field defaults to 1
)

View the evaluation results for experiment: 'three repetitions-d7d9494b' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=e44b0c38-b9a6-4c44-9e52-049b30b9bc54




60it [02:49,  2.83s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Is knowledge justified true belief?,The traditional definition of knowledge in epi...,,The traditional definition of knowledge as jus...,1,2.81717,117529de-2f47-4ea5-8f75-fc006c6875d8,002f4599-335b-4158-b7eb-e11553780576
1,Can we ever truly know reality?,The question of whether we can truly know real...,,This epistemological question divides realists...,1,2.950104,67eee66d-43e6-4e17-b3aa-a18f815db3d1,985d230f-e9e6-4fa8-b9f3-fcaaaeea5136
2,What is the meaning of life?,I don't know the answer to that.,,The meaning of life is a fundamental philosoph...,1,1.673535,78b31e3c-7d35-4583-a123-43d311009fa4,b9df855d-450f-4a26-be4d-7581f7b6e97f
3,Is free will an illusion?,I don't know.,,The free will debate divides philosophers into...,1,1.057686,878ae5e4-43e5-4978-bb8c-5a135c268ab3,5a7527db-4b5f-4121-b72d-63bf3509b8dd
4,What is justice?,Justice is the concept of fairness and moral r...,,Justice theories span from Plato's harmony and...,1,2.20334,9cc250b2-8e12-443c-a97f-080feec0c80d,3ed22bfe-5c45-4ee9-82c3-fbab4773efe4
5,Is there an objective truth?,The concept of objective truth is a philosophi...,,Objectivists maintain truth exists independent...,1,2.793777,b1bfbfb3-c13a-40e5-96a5-eec6b808ea2e,4915371d-8ddf-48dd-a3cc-667f4d5123be
6,Does God exist?,I don't possess the capability to answer philo...,,Classical arguments for God's existence includ...,1,1.980048,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,c85309cf-5273-4928-9daa-a9cfe5fd7950
7,What makes an action morally right?,The concept of what makes an action morally ri...,,Different ethical frameworks offer competing a...,1,2.292788,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,8d51df87-3c1b-4723-bbbe-1aa5dd6e2776
8,What is consciousness?,Consciousness is the state of being aware of a...,,Consciousness remains one of philosophy's hard...,1,2.402442,cdbbdc85-94b0-486a-a385-71e23525cd32,d3102d35-22fa-49e8-a91b-071db94cbf76
9,What is the nature of personal identity?,The nature of personal identity typically invo...,,Personal identity theories address what makes ...,1,2.689828,ff329e5e-25ea-4906-a698-95aa680f61aa,9243d842-9bec-451b-9bb8-9738d116ef89


##### Concurrency
You can also kick off concurrent threads of execution to make your experiments finish faster!
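
The effect of a concurrency cap can be illustrated outside LangSmith with Python's standard thread pool. This is a conceptual sketch only, not a description of how `evaluate` works internally: `fake_target` is a hypothetical stand-in for a slow, I/O-bound call such as an LLM request, and `max_workers=3` plays the role that `max_concurrency=3` plays in the cell below.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_target(question: str) -> str:
    # Stand-in for a slow, I/O-bound call (hypothetical; no network access).
    time.sleep(0.05)
    return f"answer to: {question}"

questions = [f"question {i}" for i in range(6)]

# At most 3 calls are in flight at once; map() yields results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    answers = list(pool.map(fake_target, questions))

print(len(answers))  # 6
```

Because each call spends most of its time waiting, running three at a time finishes in roughly a third of the sequential wall-clock time, which is the kind of speedup visible in the progress bars of the runs in this notebook.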

In [73]:
evaluate(
    target_function,
    data=dataset_name,
    evaluators=[is_concise_enough],
    experiment_prefix="concurrency",
    max_concurrency=3,  # This defaults to None, so this is an improvement!
)

View the evaluation results for experiment: 'concurrency-c383f151' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=401d545b-1306-4fe2-bd6f-ca5d15419df6




27it [00:24,  1.09it/s]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,1.799366,b4436f86-8cd3-44eb-bd06-1f68f1c94750,d2695583-905a-49b3-9386-a3fe551b9b02
1,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation, all...",,"Yes, LangSmith supports online evaluation as a...",1,1.861802,b834b374-bcdd-4fa8-8545-196e4de07362,4dc1461f-cffe-4417-941c-1fdf4c01ed18
2,Can LangSmith be used for finetuning and model...,The provided context does not mention finetuni...,,"Yes, LangSmith can be used for fine-tuning and...",1,2.174315,61287e02-6887-407d-9abc-b7c2a4288c93,24019329-7664-4ae7-853d-19a8ba306870
3,What testing capabilities does LangSmith have?,LangSmith allows you to run multiple experimen...,,LangSmith offers capabilities for creating dat...,1,2.222254,244cb784-9bb6-4adf-a064-b072a3e1c5c9,07550ddb-23aa-42cb-971d-068e215715cc
4,How do I pass metadata in with @traceable?,"To pass metadata with `@traceable`, you can ut...",,You can pass metadata with the @traceable deco...,1,2.195048,3533a525-3638-4b78-8f71-b1e52765485e,87832f25-098e-47a1-b9eb-e825d64102b9
5,Does LangSmith support offline evaluation?,The provided context details LangSmith's suppo...,,"Yes, LangSmith supports offline evaluation thr...",1,2.563764,d0967bfc-73ae-45d2-904f-45a9f8532877,7ca39bef-c3f7-45a8-a289-5ee72244be7c
6,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,1,4.848499,1bf45575-6078-478b-a209-41f63d169af8,7a29b700-bcce-4f84-b894-39f7ffa49b77
7,What is LangSmith used for in three sentences?,LangSmith is a platform designed for building ...,,LangSmith is a platform designed for the devel...,1,3.158744,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,771d1387-1471-41e2-bbdf-ee0c3ba39d58
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,2.462154,f54fecb2-fb97-476d-abc6-adf9242dd9cf,be3ceb07-f89d-4d50-bca1-9599403fb15d
9,How can I trace with the @traceable decorator?,To trace with the @traceable decorator in Lang...,,To trace with the @traceable decorator in Pyth...,1,3.132986,f7309db8-9cbd-4564-9d68-fa4222c5b246,31d7f027-82f0-4ba1-8c7c-ee19d712aea9
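
To see why a concurrency limit above one shortens wall-clock time, here is a minimal, self-contained sketch. It does not call LangSmith and is not a description of how `evaluate` is implemented internally. `slow_target` and `run_all` are hypothetical stand-ins: a thread pool runs a sleep-based function that imitates a slow LLM call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_target(question: str) -> dict:
    """Hypothetical stand-in for an LLM call: sleep briefly, then echo the question."""
    time.sleep(0.1)
    return {"output": f"answer to: {question}"}


def run_all(questions, max_workers):
    """Run slow_target over every question; return (results, elapsed_seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the questions
        results = list(pool.map(slow_target, questions))
    return results, time.perf_counter() - start


questions = [f"question {i}" for i in range(6)]
_, sequential_time = run_all(questions, max_workers=1)
results, concurrent_time = run_all(questions, max_workers=3)
print(f"1 worker: {sequential_time:.2f}s, 3 workers: {concurrent_time:.2f}s")
```

Because the work is I/O-bound (waiting on a remote model), threads overlap the waiting time, which is the same reason the `max_concurrency=3` run above took less time than the sequential runs.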


### Example with my dataset



In [74]:
evaluate(
    target_function,
    data=dataset_name2,
    evaluators=[is_concise_enough],
    experiment_prefix="concurrency",
    max_concurrency=8,  # Allow up to 8 evaluations to run at the same time
)

View the evaluation results for experiment: 'concurrency-babfc354' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=35bc7d66-bbae-4790-9651-bb7f4092f3fd




20it [00:11,  1.71it/s]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Is free will an illusion?,I don't know.,,The free will debate divides philosophers into...,1,1.093875,878ae5e4-43e5-4978-bb8c-5a135c268ab3,55396d9e-3db7-4d3e-81d1-0f11d4eb2588
1,What is the meaning of life?,I don't know the answer to that question.,,The meaning of life is a fundamental philosoph...,1,1.412,78b31e3c-7d35-4583-a123-43d311009fa4,32c3ff67-c849-486a-b4a2-507163e40ed3
2,Does God exist?,I don't know.,,Classical arguments for God's existence includ...,1,1.962234,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,0635d9d4-c3f3-4937-9a44-e320079c559e
3,Can we ever truly know reality?,The question of whether we can truly know real...,,This epistemological question divides realists...,1,2.317825,67eee66d-43e6-4e17-b3aa-a18f815db3d1,b26248eb-ed92-4c1e-9656-6b4d5ebb93c6
4,What makes an action morally right?,Determining what makes an action morally right...,,Different ethical frameworks offer competing a...,1,2.38414,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,431b0052-3e35-4d20-9f05-5f74e759d27b
5,Is there an objective truth?,The existence of an objective truth is a widel...,,Objectivists maintain truth exists independent...,1,3.364156,b1bfbfb3-c13a-40e5-96a5-eec6b808ea2e,66c7027c-7c50-451e-a971-364f57f83022
6,What is justice?,Justice is the concept of fairness and moral r...,,Justice theories span from Plato's harmony and...,1,3.387505,9cc250b2-8e12-443c-a97f-080feec0c80d,23d1ccca-bd1a-4fb0-a538-8f2c7c4c3e79
7,What is consciousness?,Consciousness refers to the state of being awa...,,Consciousness remains one of philosophy's hard...,1,2.6146,cdbbdc85-94b0-486a-a385-71e23525cd32,892768f7-b1c3-41ed-9aaa-822a170408e1
8,Can we ever truly know reality?,Understanding reality is a complex philosophic...,,This epistemological question divides realists...,1,2.359438,54fdf696-a0da-4223-9399-e8fd6b5bca26,42ad9cf1-ce4c-4820-ae99-7e9b730a2f61
9,Is knowledge justified true belief?,"The concept of ""knowledge as justified true be...",,The traditional definition of knowledge as jus...,0,4.521923,117529de-2f47-4ea5-8f75-fc006c6875d8,5428856b-e008-4f55-b9e2-1beff82884b1


##### Metadata 

You can (and should) add metadata to your experiments so they are easier to find in the UI.

In [75]:
evaluate(
    target_function,
    data=dataset_name,
    evaluators=[is_concise_enough],
    experiment_prefix="metadata added",
    metadata={  # We can pass custom metadata for the experiment, such as the model name
        "model_name": MODEL_NAME
    }
)

View the evaluation results for experiment: 'metadata added-1fa66d51' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=9056d2a0-a1f7-40f3-a048-5ccaa53cebc8




27it [01:05,  2.41s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,LangSmith is primarily a platform for building...,,"Yes, LangSmith can be used for fine-tuning and...",1,2.555384,61287e02-6887-407d-9abc-b7c2a4288c93,b6a789b1-7d72-48ff-9200-c4a7fa7cb123
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,1,2.339793,b4436f86-8cd3-44eb-bd06-1f68f1c94750,4a07f014-1334-47ab-b6d1-8a49de76666c
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. It ...",,"Yes, LangSmith supports online evaluation as a...",1,2.168668,b834b374-bcdd-4fa8-8545-196e4de07362,1851dcda-e76f-4aeb-a0c0-cf0b812861e5
3,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,1,2.484966,1bf45575-6078-478b-a209-41f63d169af8,022ca88c-c697-4fbb-b775-dff746d355b9
4,What testing capabilities does LangSmith have?,LangSmith allows you to run multiple experimen...,,LangSmith offers capabilities for creating dat...,1,1.908239,244cb784-9bb6-4adf-a064-b072a3e1c5c9,0255a573-1774-467f-adef-9206b1c6c685
5,How do I pass metadata in with @traceable?,"To pass metadata with `@traceable`, you can sp...",,You can pass metadata with the @traceable deco...,0,3.026719,3533a525-3638-4b78-8f71-b1e52765485e,54585c89-976a-48ec-9782-02b47b214056
6,Does LangSmith support offline evaluation?,The provided context focuses on online evaluat...,,"Yes, LangSmith supports offline evaluation thr...",1,2.562117,d0967bfc-73ae-45d2-904f-45a9f8532877,0f974006-e444-4e32-b26e-8819795f94ce
7,What is LangSmith used for in three sentences?,LangSmith is a platform for building productio...,,LangSmith is a platform designed for the devel...,1,2.634535,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,9947c2a6-78fa-43be-a8a0-c0988dafc87f
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",1,1.787273,f54fecb2-fb97-476d-abc6-adf9242dd9cf,b5b03153-d289-4896-8e6a-b78a63ebad33
9,How can I trace with the @traceable decorator?,To trace using the @traceable decorator in Pyt...,,To trace with the @traceable decorator in Pyth...,1,2.160397,f7309db8-9cbd-4564-9d68-fa4222c5b246,5721b05a-62a3-465b-a925-bb2d099f4545
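
One way to keep metadata consistent across experiments is to build it from the constants defined in the Setup cell instead of retyping strings. The sketch below is only an illustration: `build_experiment_metadata` is a hypothetical helper, not part of LangSmith, and the constants are repeated here so the snippet runs on its own.

```python
# Constants copied from the Setup cell so this snippet is self-contained
MODEL_NAME = "gpt-4o"
MODEL_PROVIDER = "openai"
APP_VERSION = 1.0


def build_experiment_metadata(**extra) -> dict:
    """Hypothetical helper: merge app-level constants with per-experiment tags."""
    base = {
        "model_name": MODEL_NAME,
        "model_provider": MODEL_PROVIDER,
        "app_version": APP_VERSION,
    }
    # Per-experiment keys override the defaults if they collide
    return {**base, **extra}


metadata = build_experiment_metadata(test_type="conciseness")
print(metadata)
```

The resulting dictionary can be passed as the `metadata=` argument of `evaluate`, in the same way as the literal dictionary in the cell above.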


### Added custom metadata

In [76]:
evaluate(
    target_function,
    data=dataset_name2,
    evaluators=[is_concise_enough],
    experiment_prefix="metadata added 2",
    metadata={  # Custom key/value pair that tags this run as my own experiment
        "module2_video3": "Custom Experiment",
    }
)

View the evaluation results for experiment: 'metadata added 2-9e894ad7' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=4da4acc0-5109-425e-b70a-21776ea358c1




20it [00:59,  2.96s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.is_concise,execution_time,example_id,id
0,Is knowledge justified true belief?,The idea of knowledge as justified true belief...,,The traditional definition of knowledge as jus...,1,3.644404,117529de-2f47-4ea5-8f75-fc006c6875d8,0b6ec570-0ba3-49d2-86a7-560ac6760146
1,Can we ever truly know reality?,The question of whether we can truly know real...,,This epistemological question divides realists...,1,2.964635,67eee66d-43e6-4e17-b3aa-a18f815db3d1,0b4dde4c-c21c-4769-94ab-f4882c6dafdc
2,What is the meaning of life?,The meaning of life is a philosophical questio...,,The meaning of life is a fundamental philosoph...,1,2.395482,78b31e3c-7d35-4583-a123-43d311009fa4,21a60423-7cbf-4d46-b2bc-0bd2b84c604f
3,Is free will an illusion?,The question of whether free will is an illusi...,,The free will debate divides philosophers into...,1,2.974984,878ae5e4-43e5-4978-bb8c-5a135c268ab3,bee24817-427f-40e4-9109-e8d41c77226a
4,What is justice?,"Justice is a concept encompassing fairness, mo...",,Justice theories span from Plato's harmony and...,1,2.814656,9cc250b2-8e12-443c-a97f-080feec0c80d,70ac1889-6a99-42d3-846f-51f7a2ceae0f
5,Is there an objective truth?,The concept of objective truth refers to the i...,,Objectivists maintain truth exists independent...,1,4.480967,b1bfbfb3-c13a-40e5-96a5-eec6b808ea2e,bcdc3327-0bba-4f5b-b799-f8588770701c
6,Does God exist?,I don't know.,,Classical arguments for God's existence includ...,1,1.312854,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,1bc43182-c73b-451e-badd-8edfb5f6e033
7,What makes an action morally right?,The moral rightness of an action is often dete...,,Different ethical frameworks offer competing a...,0,3.455241,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,f82ea383-20df-443d-857e-18a090d3573a
8,What is consciousness?,Consciousness is the state or quality of aware...,,Consciousness remains one of philosophy's hard...,1,2.743593,cdbbdc85-94b0-486a-a385-71e23525cd32,ee27a6fd-e351-4cd0-8e44-9f779c0e2055
9,What is the nature of personal identity?,I don't know.,,Personal identity theories address what makes ...,1,1.510204,ff329e5e-25ea-4906-a698-95aa680f61aa,6d63dcb4-1e19-4aee-94be-b5f087164b69


# I created and ran my own experiment, which scores three things: 'contains key terms', 'has proper citations', and 'is not too short'. All three evaluators are defined in the code below

## First, I ran it on the dataset named "RAG Application Golden Dataset"

In [77]:
from langsmith import evaluate, Client

client = Client()
dataset_name = "RAG Application Golden Dataset"

def contains_key_terms(reference_outputs: dict, outputs: dict) -> dict:
    """Check if output contains important keywords from reference"""
    # Extract key terms from reference (words longer than 4 chars as a simple heuristic)
    reference_words = set(word.lower() for word in reference_outputs["output"].split() if len(word) > 4)
    output_words = set(word.lower() for word in outputs["output"].split())
    
    # Calculate how many key terms are present
    matching_terms = reference_words.intersection(output_words)
    score = len(matching_terms) / len(reference_words) if reference_words else 0
    
    return {"key": "key_terms_coverage", "score": score}

def has_proper_citations(reference_outputs: dict, outputs: dict) -> dict:
    """Check if output includes source citations or references"""
    output_text = outputs["output"].lower()
    citation_indicators = ["source:", "according to", "reference:", "[", "cited", "from"]
    
    has_citations = any(indicator in output_text for indicator in citation_indicators)
    
    return {"key": "has_citations", "score": int(has_citations)}

def is_not_too_short(reference_outputs: dict, outputs: dict) -> dict:
    """Ensure output isn't too brief compared to reference"""
    min_length_ratio = 0.5
    score = len(outputs["output"]) >= min_length_ratio * len(reference_outputs["output"])
    
    return {"key": "sufficient_length", "score": int(score)}

def target_function(inputs: dict):
    return langsmith_rag(inputs["question"])

evaluate(
    target_function,
    data=dataset_name,
    evaluators=[contains_key_terms, has_proper_citations, is_not_too_short],
    experiment_prefix="gpt-4o-with-citations",
    metadata={"model": "gpt-4o", "test_type": "citation_quality"}
)

View the evaluation results for experiment: 'gpt-4o-with-citations-e1bf77a5' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=87091c06-3adb-4790-8bcd-5059004df0f0




27it [01:04,  2.38s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.key_terms_coverage,feedback.has_citations,feedback.sufficient_length,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,LangSmith is designed for observability and ev...,,"Yes, LangSmith can be used for fine-tuning and...",0.15,0,1,2.588382,61287e02-6887-407d-9abc-b7c2a4288c93,b2b89116-e177-42a8-9a1a-0c43cac630b5
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,0.130435,1,1,2.453912,b4436f86-8cd3-44eb-bd06-1f68f1c94750,dd9d885f-a539-4845-9618-7595104b9af5
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. It ...",,"Yes, LangSmith supports online evaluation as a...",0.333333,0,1,1.815845,b834b374-bcdd-4fa8-8545-196e4de07362,9bf7229b-50c8-45b1-8721-beab7a76cb2d
3,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith using LangChain...,,To set up tracing to LangSmith while using Lan...,0.352941,0,1,2.752228,1bf45575-6078-478b-a209-41f63d169af8,c22ba45c-799b-484c-979e-462c908e85e4
4,What testing capabilities does LangSmith have?,LangSmith allows you to run multiple experimen...,,LangSmith offers capabilities for creating dat...,0.066667,0,1,2.162308,244cb784-9bb6-4adf-a064-b072a3e1c5c9,b53f25ab-6342-48c2-99fd-3b78008fb27f
5,How do I pass metadata in with @traceable?,To pass metadata when using the `@traceable` d...,,You can pass metadata with the @traceable deco...,0.217391,0,1,2.370336,3533a525-3638-4b78-8f71-b1e52765485e,d1e2500e-4fc8-4d21-b601-bc3adc327caa
6,Does LangSmith support offline evaluation?,LangSmith currently supports online evaluation...,,"Yes, LangSmith supports offline evaluation thr...",0.208333,0,1,1.674732,d0967bfc-73ae-45d2-904f-45a9f8532877,fd672641-d592-4fbb-b5c7-4be71df64bbe
7,What is LangSmith used for in three sentences?,LangSmith is a platform designed for building ...,,LangSmith is a platform designed for the devel...,0.172414,0,1,2.459788,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,f4af5041-94ee-44d7-95de-622df6cdf188
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",0.347826,0,1,2.080166,f54fecb2-fb97-476d-abc6-adf9242dd9cf,37ee877e-aa97-4ff9-bd93-7fc492fccba0
9,How can I trace with the @traceable decorator?,To trace with the @traceable decorator in Pyth...,,To trace with the @traceable decorator in Pyth...,0.464286,1,1,2.447378,f7309db8-9cbd-4564-9d68-fa4222c5b246,b71d72e0-3245-4b6c-937a-cd66dde051e2
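
The `key_terms_coverage` scores above are fairly low, and one likely contributor is that `contains_key_terms` splits on whitespace, so a word with trailing punctuation (for example `datasets.`) never matches `datasets`. The sketch below compares the whitespace heuristic with a punctuation-insensitive variant. The function names and the sample sentences are made up for illustration, and the variant was not used in the experiments reported here.

```python
import re


def naive_coverage(reference: str, output: str) -> float:
    """Same heuristic as contains_key_terms: whitespace split, words longer than 4 chars."""
    ref_words = {w.lower() for w in reference.split() if len(w) > 4}
    out_words = {w.lower() for w in output.split()}
    return len(ref_words & out_words) / len(ref_words) if ref_words else 0.0


def key_terms(text: str, min_len: int = 5) -> set:
    """Lowercase alphanumeric tokens of at least min_len characters (punctuation dropped)."""
    return {tok for tok in re.findall(r"[a-z0-9_]+", text.lower()) if len(tok) >= min_len}


def normalized_coverage(reference: str, output: str) -> float:
    """Fraction of reference key terms that also appear in the output."""
    ref_terms = key_terms(reference)
    if not ref_terms:
        return 0.0
    return len(ref_terms & key_terms(output)) / len(ref_terms)


# Illustrative sentences, not taken from the dataset
reference = "LangSmith supports offline evaluation through datasets."
output = "Offline evaluation in LangSmith runs over datasets"

print(naive_coverage(reference, output))       # "datasets." does not match "datasets"
print(normalized_coverage(reference, output))  # punctuation no longer blocks the match
```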


## I then ran my custom experiment on the second dataset, which I created earlier in module 2, video 1

In [78]:
# The evaluators (contains_key_terms, has_proper_citations, is_not_too_short)
# and target_function defined in the previous cell are reused here unchanged
dataset_name2 = "MAT_496_1"

evaluate(
    target_function,
    data=dataset_name2,
    evaluators=[contains_key_terms, has_proper_citations, is_not_too_short],
    experiment_prefix="gpt-4o-with-citations",
    metadata={"model": "gpt-4o", "test_type": "citation_quality"}
)

View the evaluation results for experiment: 'gpt-4o-with-citations-299fd527' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=a5654dd6-8b31-4255-aaab-46c037f93d96




20it [00:49,  2.46s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.key_terms_coverage,feedback.has_citations,feedback.sufficient_length,execution_time,example_id,id
0,Is knowledge justified true belief?,"The concept of knowledge as ""justified true be...",,The traditional definition of knowledge as jus...,0.296296,1,1,3.078593,117529de-2f47-4ea5-8f75-fc006c6875d8,33d42b49-9276-4fd7-be46-c45303b3228e
1,Can we ever truly know reality?,The question of whether we can truly know real...,,This epistemological question divides realists...,0.111111,0,1,2.861663,67eee66d-43e6-4e17-b3aa-a18f815db3d1,c2b2a041-a44c-4dfe-b3d2-069f4942182e
2,What is the meaning of life?,I don't know.,,The meaning of life is a fundamental philosoph...,0.0,0,0,1.168776,78b31e3c-7d35-4583-a123-43d311009fa4,19eeaf6d-de8d-404d-8305-ed8b53a613b5
3,Is free will an illusion?,I don't know.,,The free will debate divides philosophers into...,0.0,0,0,1.15207,878ae5e4-43e5-4978-bb8c-5a135c268ab3,a8cd71d3-b8a4-4d70-a540-470cf272e253
4,What is justice?,Justice is a concept of moral rightness based ...,,Justice theories span from Plato's harmony and...,0.035714,0,1,2.240879,9cc250b2-8e12-443c-a97f-080feec0c80d,d6032b39-a08d-4765-9da1-66036de58aea
5,Is there an objective truth?,The question of whether there is an objective ...,,Objectivists maintain truth exists independent...,0.166667,0,1,2.209639,b1bfbfb3-c13a-40e5-96a5-eec6b808ea2e,bd7adb59-c3aa-4fa2-815e-ade2490f9242
6,Does God exist?,The existence of God is a deeply philosophical...,,Classical arguments for God's existence includ...,0.142857,0,1,2.095427,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,e1b06c1e-85f8-44f4-a59c-ece25f936875
7,What makes an action morally right?,Determining what makes an action morally right...,,Different ethical frameworks offer competing a...,0.25,0,1,2.73112,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,f980e634-24a2-47c2-8e68-7761429e95ad
8,What is consciousness?,Consciousness is commonly understood as the st...,,Consciousness remains one of philosophy's hard...,0.04,0,1,2.556373,cdbbdc85-94b0-486a-a385-71e23525cd32,6ab1af82-178c-473b-92c3-cc84f59db767
9,What is the nature of personal identity?,The nature of personal identity involves under...,,Personal identity theories address what makes ...,0.375,0,1,4.870287,ff329e5e-25ea-4906-a698-95aa680f61aa,97dab958-c476-4fae-87ea-60ec855d201d
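
To turn the per-example feedback into a single number per evaluator, the scores can be averaged. The sketch below does this for the ten rows displayed above only (the run covered more examples than the preview shows), so the averages describe the preview, not the whole experiment. The `rows` list is copied by hand from the table.

```python
from statistics import mean

# (key_terms_coverage, has_citations, sufficient_length) for the ten displayed rows
rows = [
    (0.296296, 1, 1),
    (0.111111, 0, 1),
    (0.0, 0, 0),
    (0.0, 0, 0),
    (0.035714, 0, 1),
    (0.166667, 0, 1),
    (0.142857, 0, 1),
    (0.25, 0, 1),
    (0.04, 0, 1),
    (0.375, 0, 1),
]

names = ["key_terms_coverage", "has_citations", "sufficient_length"]
# zip(*rows) regroups the rows into one column of scores per evaluator
summary = {name: mean(column) for name, column in zip(names, zip(*rows))}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The two rows that score 0 on `sufficient_length` are the "I don't know." answers, which also score 0 on key-term coverage.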


## Added custom metadata to my own experiment

In [79]:
evaluate(
    target_function,
    data="RAG Application Golden Dataset",  # Using the original dataset name
    evaluators=[contains_key_terms, has_proper_citations, is_not_too_short],  # All three evaluators
    experiment_prefix="gpt-4o-with-citations",
    metadata={  # Custom metadata for filtering/organizing experiments
        "model": "gpt-4o",
        "test_type": "citation_quality",
        "evaluator_count": 3,
        "includes_citation_check": True
    }
)

View the evaluation results for experiment: 'gpt-4o-with-citations-26a909de' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/5089da53-878c-4884-be32-ae08af9099ee/compare?selectedSessions=a4346041-6ba7-4e8e-8ce1-db91d07127a9




27it [01:13,  2.73s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.key_terms_coverage,feedback.has_citations,feedback.sufficient_length,execution_time,example_id,id
0,Can LangSmith be used for finetuning and model...,"No, LangSmith is primarily for observability a...",,"Yes, LangSmith can be used for fine-tuning and...",0.2,0,1,2.183506,61287e02-6887-407d-9abc-b7c2a4288c93,7993eb71-9608-497b-8293-28bb8dc3ede0
1,How do I create user feedback with the LangSmi...,To create user feedback with the LangSmith SDK...,,To create user feedback with the LangSmith SDK...,0.130435,0,1,2.325114,b4436f86-8cd3-44eb-bd06-1f68f1c94750,b92adf8a-8768-4355-9e15-35c0b19c90b6
2,Does LangSmith support online evaluation?,"Yes, LangSmith supports online evaluation. You...",,"Yes, LangSmith supports online evaluation as a...",0.333333,0,1,2.340469,b834b374-bcdd-4fa8-8545-196e4de07362,45cd309d-44c6-4df5-aef0-ee109ce95321
3,How do I set up tracing to LangSmith if I'm us...,To set up tracing to LangSmith while using Lan...,,To set up tracing to LangSmith while using Lan...,0.352941,0,1,2.280083,1bf45575-6078-478b-a209-41f63d169af8,fc11d646-b4ef-45ac-b1e6-14fd91dca1b0
4,What testing capabilities does LangSmith have?,LangSmith allows you to run multiple experimen...,,LangSmith offers capabilities for creating dat...,0.1,0,1,2.050552,244cb784-9bb6-4adf-a064-b072a3e1c5c9,fb71660b-59ee-47ac-a1f0-2651aaee1ca3
5,How do I pass metadata in with @traceable?,You can pass metadata in with `@traceable` by ...,,You can pass metadata with the @traceable deco...,0.130435,0,1,2.837263,3533a525-3638-4b78-8f71-b1e52765485e,ce937ab0-19b2-4675-9d3a-ca4bcb7b03c5
6,Does LangSmith support offline evaluation?,The provided context does not explicitly menti...,,"Yes, LangSmith supports offline evaluation thr...",0.208333,0,1,2.198561,d0967bfc-73ae-45d2-904f-45a9f8532877,3df42ef6-c3bb-4b4f-8ed6-94fdd8fd7c28
7,What is LangSmith used for in three sentences?,LangSmith is a platform designed for building ...,,LangSmith is a platform designed for the devel...,0.172414,0,1,2.078909,dc60f4f0-c31e-4a69-9d5e-df2176401eb1,f368f1cd-3e38-4a02-a453-4ac4c2389ee6
8,Can LangSmith be used to evaluate agents?,"Yes, LangSmith can be used to evaluate agents....",,"Yes, LangSmith can be used to evaluate agents....",0.173913,0,1,2.173619,f54fecb2-fb97-476d-abc6-adf9242dd9cf,9870988f-174f-4220-8990-050a2e554a74
9,How can I trace with the @traceable decorator?,To trace with the @traceable decorator in Pyth...,,To trace with the @traceable decorator in Pyth...,0.464286,0,1,15.963412,f7309db8-9cbd-4564-9d68-fa4222c5b246,b0bc5075-72f8-4bfa-bcdf-70af9317462a
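
A caveat about `has_citations`: the heuristic uses plain substring matching, so any answer that contains the word "from" or a "[" character is counted as cited. The sketch below contrasts that behavior with a stricter pattern. `STRICT_PATTERN`, both function names, and the sample sentences are assumptions made for illustration; the stricter check was not used in the experiments above.

```python
import re

# Same indicator list as has_proper_citations
LOOSE_INDICATORS = ["source:", "according to", "reference:", "[", "cited", "from"]


def loose_has_citation(text: str) -> bool:
    """Substring check, mirroring has_proper_citations."""
    lowered = text.lower()
    return any(indicator in lowered for indicator in LOOSE_INDICATORS)


# Stricter assumption: numbered brackets, explicit labels, attribution phrases, or URLs
STRICT_PATTERN = re.compile(
    r"\[\d+\]|\bsource:|\breference:|\baccording to\b|https?://",
    re.IGNORECASE,
)


def strict_has_citation(text: str) -> bool:
    """Return True only when an explicit citation marker is present."""
    return bool(STRICT_PATTERN.search(text))


plain = "You can run experiments from the LangSmith UI."
cited = "According to the LangSmith docs [1], experiments can be run from the UI."

print(loose_has_citation(plain), strict_has_citation(plain))
print(loose_has_citation(cited), strict_has_citation(cited))
```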


## Running my own experiment on a specific set of examples

In [80]:
evaluate(
    target_function,
    data=client.list_examples(
        dataset_name=dataset_name2,
        example_ids=[
            "878ae5e4-43e5-4978-bb8c-5a135c268ab3",
            "78b31e3c-7d35-4583-a123-43d311009fa4",
            "2c6b6d4d-610d-42ab-a3c2-5ddd4d76a9f6",
            "c18c4cef-a74e-4cb3-9472-cdc87c7d2f76",
            "ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074",
        ],
    ),
    evaluators=[contains_key_terms, has_proper_citations, is_not_too_short],
    experiment_prefix="five specific example ids",
)

View the evaluation results for experiment: 'five specific example ids-b5d8d80e' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=eded0992-eeb6-4479-91cf-58cf40ecbdcc




5it [00:11,  2.27s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.key_terms_coverage,feedback.has_citations,feedback.sufficient_length,execution_time,example_id,id
0,What makes an action morally right?,Determining whether an action is morally right...,,Different ethical frameworks offer competing a...,0.09375,1,1,3.674037,2c6b6d4d-610d-42ab-a3c2-5ddd4d76a9f6,084bacc4-ec50-45cf-b2ff-3ccb01c1731e
1,What is the meaning of life?,I don't know.,,The meaning of life is a fundamental philosoph...,0.0,0,0,1.043771,78b31e3c-7d35-4583-a123-43d311009fa4,03b399c4-f373-4276-8496-03f3678d879c
2,Is free will an illusion?,The question of whether free will is an illusi...,,The free will debate divides philosophers into...,0.16,0,1,2.000097,878ae5e4-43e5-4978-bb8c-5a135c268ab3,24108eaa-6789-4658-98fa-a12c135ae05b
3,Does God exist?,I don't know. The question of God's existence ...,,Classical arguments for God's existence includ...,0.107143,0,1,1.688201,c18c4cef-a74e-4cb3-9472-cdc87c7d2f76,a75d2cb0-9674-497c-94a6-27a91fb169de
4,What makes an action morally right?,Determining what makes an action morally right...,,Different ethical frameworks offer competing a...,0.15625,1,1,2.432187,ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074,3e28f27f-8a65-47f3-b76d-ac673b4155a8
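
The table above shows the same question under two different example IDs, which suggests the dataset contains a duplicated example. A quick way to spot this before choosing IDs is to count questions, as in the sketch below. The `examples` list is copied by hand from the output above; nothing here calls LangSmith.

```python
from collections import Counter

# (example_id, question) pairs copied from the results table above
examples = [
    ("2c6b6d4d-610d-42ab-a3c2-5ddd4d76a9f6", "What makes an action morally right?"),
    ("78b31e3c-7d35-4583-a123-43d311009fa4", "What is the meaning of life?"),
    ("878ae5e4-43e5-4978-bb8c-5a135c268ab3", "Is free will an illusion?"),
    ("c18c4cef-a74e-4cb3-9472-cdc87c7d2f76", "Does God exist?"),
    ("ccd7bc96-bbd3-4f5f-9ac4-4f68b26cd074", "What makes an action morally right?"),
]

counts = Counter(question for _, question in examples)

# Map each repeated question to the example IDs that share it
duplicates = {
    question: [eid for eid, q in examples if q == question]
    for question, n in counts.items()
    if n > 1
}

for question, ids in duplicates.items():
    print(f"{question!r} appears {len(ids)} times: {ids}")
```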


## Running my own experiment on a custom split

In [81]:
evaluate(
    target_function,
    data=client.list_examples(dataset_name=dataset_name2, splits=["Crucial examples"]),  # We pass in a list of Splits
    evaluators=[contains_key_terms, has_proper_citations, is_not_too_short],
    experiment_prefix="Crucial Examples split for 2nd experiment"
)

View the evaluation results for experiment: 'Crucial Examples split for 2nd experiment-f3c1f9e3' at:
https://smith.langchain.com/o/58237f5e-f0c5-4c78-b71c-186c54d72106/datasets/53be0cd4-34f9-453f-b4af-90dd9188017c/compare?selectedSessions=f05777e1-3de9-42c9-9bd9-ffbad766c2e6




3it [00:22,  7.65s/it]


Unnamed: 0,inputs.question,outputs.output,error,reference.output,feedback.key_terms_coverage,feedback.has_citations,feedback.sufficient_length,execution_time,example_id,id
0,Is free will an illusion?,The question of whether free will is an illusi...,,The free will debate divides philosophers into...,0.2,0,1,2.727635,1675fa4d-4879-49b7-88bf-0e7e76b54dd1,b566ef39-3767-4ea7-a887-47b58730644d
1,What is the nature of personal identity?,I don't know.,,Personal identity theories address what makes ...,0.0,0,0,1.142104,4e3a24ac-822d-4a94-8eed-0dc9c8407371,09f919e9-7413-4fc3-8c8d-4f87d803dc75
2,What is justice?,Justice is the concept of moral righteousness ...,,Justice theories span from Plato's harmony and...,0.071429,0,1,18.557804,8c01e154-e8a4-4108-8ea3-775756d94a0b,73269fd1-c34a-4ada-93d1-1ed3d571acf4
