# Loading Retriever Runs and Feedback

For any Q&A application or RAG pipeline, the content and structure of the retrieved documents strongly influence your application's end-to-end quality. Filtering on user feedback is a good way to focus on the cases you should improve.

This walkthrough shows how to load retriever runs within traces that received negative user scores. We will read them into a pandas dataframe to easily interact with the data locally. 

This mirrors the filtering shown in the LangSmith app clip below, but it lets you handle batches of runs programmatically with your own code.

![LangSmith app filter by feedback then view](./img/filter_then_view.gif)

Before we start, ensure you have a LangChain project with some logged traces that include retriever results. If you do not have a project prepared, you can generate some traces by running the first section of the [Testing QA Correctness](../../testing-examples/qa-correctness/qa-correctness.ipynb) notebook.


#### Setup

First, install `langsmith` and `pandas`, then set your LangSmith API key to connect to your project.

In [None]:
# %pip install -U langsmith pandas --quiet

In [None]:
# %env LANGCHAIN_API_KEY=""
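
Because `Client()` is created here without arguments, it relies on the key being present in the environment. A small fail-fast check can surface a missing key before any request is made. The sketch below is only an illustration: `require_env` is a hypothetical helper, not part of the `langsmith` package, and it assumes the `LANGCHAIN_API_KEY` variable name used above.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; set it before creating the client.")
    return value
```

Calling `require_env("LANGCHAIN_API_KEY")` before constructing the client would raise immediately if the key were empty or missing.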

In [1]:
from langsmith import Client

client = Client()

## 1. Query Runs

With the environment ready, load the run data from LangSmith. In a typical RAG application, the retriever is called within a chain or runnable sequence. However, user feedback is typically assigned to the _root run_ of a trace, since the user normally responds to end-to-end behavior.

To fetch retriever runs within these traces, make two calls:

1. Fetch root runs with feedback.
2. Fetch retriever runs for those traces.
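
The two steps above amount to a join between child runs and root-run feedback. The following sketch shows that join on stand-in data, with no network calls. `FakeRun` and `match_to_root_feedback` are hypothetical names introduced for illustration, and the sketch assumes, as the rest of this walkthrough does, that the last entry of `parent_run_ids` is the root run of the trace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeRun:
    """Hypothetical stand-in for a LangSmith run; only the fields used here."""
    id: str
    parent_run_ids: List[str] = field(default_factory=list)


def match_to_root_feedback(child_runs, root_scores):
    """Pair each child run ID with the feedback score of its root run."""
    matched = []
    for run in child_runs:
        if not run.parent_run_ids:
            continue  # a run without parents is itself a root run
        root_id = run.parent_run_ids[-1]  # assumption: last parent is the root
        if root_id in root_scores:
            matched.append((run.id, root_scores[root_id]))
    return matched


# Step 1 (stand-in): root run IDs mapped to their average feedback score.
root_scores = {"root-a": 0.0, "root-b": 0.25}

# Step 2 (stand-in): retriever runs, one of which belongs to an unscored trace.
fake_retriever_runs = [
    FakeRun("ret-1", ["chain-1", "root-a"]),
    FakeRun("ret-2", ["root-c"]),  # root-c is not in root_scores, so it is dropped
    FakeRun("ret-3", ["root-b"]),
]

matched = match_to_root_feedback(fake_retriever_runs, root_scores)
```

Only `ret-1` and `ret-3` survive the join, each paired with the score of its root run.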

Please reference the [docs](https://docs.smith.langchain.com/tracing/use-cases/export-runs/local) for guidance on more complex filters (using metadata, tags, dates, and other attributes).

Below, start by fetching the root runs in your project whose "user_score" feedback is less than or equal to 0.25.

In [2]:
from langsmith import Client

client = Client()

project_name = "chat-langchain" # Set to your project
feedback_key = "user_score"

root_runs_with_feedback = client.list_runs(
    project_name=project_name,
    execution_order=1,
    filter=f'and(eq(feedback_key, "{feedback_key}"), lte(feedback_score, 0.25))',
)

ids_with_feedback = {
    r.id: r.feedback_stats[feedback_key]['avg']
    for r in root_runs_with_feedback
}
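
If you want to try several feedback keys or thresholds, the filter string can be assembled by a small function. This is a minimal sketch: `build_feedback_filter` is a hypothetical helper, not a LangSmith API, and it only reproduces the filter form used in the query above.

```python
def build_feedback_filter(feedback_key, max_score):
    """Build a filter string in the same form as the query above.

    Hypothetical helper for illustration; it does not validate its inputs.
    """
    return f'and(eq(feedback_key, "{feedback_key}"), lte(feedback_score, {max_score}))'


filter_string = build_feedback_filter("user_score", 0.25)
```

The resulting `filter_string` could then be passed as the `filter` argument in place of the inline f-string.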

Next, fetch retriever runs in the same project. This call returns every retriever run in the project; the runs belonging to the traces filtered above are selected in the following step.

In [8]:
retriever_runs = client.list_runs(
    project_name=project_name,
    run_type="retriever",
)

Finally, load the runs into a dataframe to display locally. Only retriever runs whose root run appears in the `ids_with_feedback` dictionary are kept, and the feedback score from that dictionary is added to each row.

In [9]:
import pandas as pd

df = pd.DataFrame(
    [
        {
            **r.inputs,
            **(r.outputs or {}),
            feedback_key: ids_with_feedback[r.parent_run_ids[-1]],
        } for r in retriever_runs
        if r.parent_run_ids and r.parent_run_ids[-1] in ids_with_feedback
    ]
)

df.head(5)

|   | query | documents | user_score |
|---|---|---|---|
| 0 | What model is being used? | [{'metadata': {'title': 'Model I/O \| 🦜️🔗 Langc... | 0.0 |
| 1 | How can you call initialize_agent with a custo... | [{'metadata': {'title': 'Prompt pipelining \| 🦜... | 0.0 |
| 2 | What is the error message encountered in the c... | [{'metadata': {'title': 'Conversation Summary ... | 0.0 |
| 3 | How do I use a RecursiveUrlLoader to load cont... | [{'metadata': {'title': 'GitBook \| 🦜️🔗 Langcha... | 0.0 |
| 4 | How can I create a prompt template? | [{'metadata': {'title': 'Prompt templates \| 🦜️... | 0.0 |


With the data stored locally, you can add more columns. Let's check how many documents were retrieved for each query and display the page source of the first retrieved document.

In [15]:
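# Runs without outputs have no document list, so guard against missing or empty values.
df['num_retrieved_docs'] = df['documents'].apply(lambda docs: len(docs) if isinstance(docs, list) else 0)
df['first_source'] = df['documents'].apply(
    lambda docs: docs[0]['metadata'].get('source') if isinstance(docs, list) and docs else None
)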
df.head(5)

|   | query | documents | user_score | num_retrieved_docs | first_source |
|---|---|---|---|---|---|
| 0 | What model is being used? | [{'metadata': {'title': 'Model I/O \| 🦜️🔗 Langc... | 0.0 | 3 | https://python.langchain.com/docs/modules/mode... |
| 1 | How can you call initialize_agent with a custo... | [{'metadata': {'title': 'Prompt pipelining \| 🦜... | 0.0 | 3 | https://python.langchain.com/docs/modules/mode... |
| 2 | What is the error message encountered in the c... | [{'metadata': {'title': 'Conversation Summary ... | 0.0 | 3 | https://python.langchain.com/docs/modules/memo... |
| 3 | How do I use a RecursiveUrlLoader to load cont... | [{'metadata': {'title': 'GitBook \| 🦜️🔗 Langcha... | 0.0 | 3 | https://python.langchain.com/docs/integrations... |
| 4 | How can I create a prompt template? | [{'metadata': {'title': 'Prompt templates \| 🦜️... | 0.0 | 3 | https://python.langchain.com/docs/modules/mode... |


## Conclusion

Congratulations! In this walkthrough, you loaded retriever runs for traces that received negative user scores. You can use this approach to better understand common failure modes of your RAG application.
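
One simple way to start looking for failure modes is to count which sources most often appear first among the low-scoring traces. The sketch below uses only the standard library and made-up rows; the `example.com` URLs and the `top_sources` helper are hypothetical and stand in for the `first_source` column built earlier.

```python
from collections import Counter

# Hypothetical rows standing in for the low-score dataframe built above.
rows = [
    {"query": "q1", "first_source": "https://example.com/docs/a"},
    {"query": "q2", "first_source": "https://example.com/docs/b"},
    {"query": "q3", "first_source": "https://example.com/docs/a"},
    {"query": "q4", "first_source": None},  # no document was retrieved
]


def top_sources(rows, n=3):
    """Return the n most frequent first sources, skipping rows without one."""
    counts = Counter(row["first_source"] for row in rows if row.get("first_source"))
    return counts.most_common(n)


most_common = top_sources(rows, n=1)
```

With the dataframe from this walkthrough, `df['first_source'].value_counts()` would give a comparable tally. A source that dominates the list may point to a page that is retrieved often but answers poorly.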