# Few Shot Datasets Quickstart

Thank you for agreeing to beta test our new product for dynamic few-shot example selection!

This notebook is designed to help you get started. Before you begin, please do the following:
* Set `LANGCHAIN_API_KEY` in your environment
* Set `LANGCHAIN_TRACING_V2` to `true` in your environment
* Set `LANGCHAIN_PROJECT` to a unique project for debugging. Viewing the LangSmith traces helps you understand how the few-shot examples are retrieved
* Set `OPENAI_API_KEY` in your environment (or replace the OpenAI calls with another chat model)
* Confirm with Jake Rachleff (by email at jake@langchain.dev or via Slack) that you have been allowlisted for this feature
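
The environment prerequisites above can be checked with a few lines before running the rest of the notebook. This is a minimal sketch: the variable names come from the list above, while the helper `missing_env_vars` is a hypothetical name introduced only for illustration.

```python
import os
from typing import Mapping

# Variable names are taken from the prerequisite list; the helper itself is illustrative.
REQUIRED_ENV_VARS = (
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
    "OPENAI_API_KEY",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If you swap OpenAI for another chat model, drop `OPENAI_API_KEY` from the tuple.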


## Problem Setup
A few-shot prompt has to match the input and output schemas of the data that has already flowed through your application. In the example code below, our application answers questions about LangChain, and the dataset is structured as follows:

### Inputs
```
{ "question": <a question about langchain> }
```

### Outputs
```
{ "answer": <an answer to the question> }
```
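
Put together, one record pairs an `inputs` object with an `outputs` object; the dataset creation cell later in this notebook reads `e['inputs']` and `e['outputs']` from each record. The sketch below shows that shape with placeholder text and a small validity check. The record contents and the helper name `is_valid_example` are hypothetical, not taken from `data.json`.

```python
from typing import Any

# Hypothetical record: the question and answer text are placeholders.
sample_example: dict[str, Any] = {
    "inputs": {"question": "<a question about LangChain>"},
    "outputs": {"answer": "<an answer to the question>"},
}


def is_valid_example(example: dict[str, Any]) -> bool:
    """Check that a record has a string question and a string answer."""
    inputs = example.get("inputs")
    outputs = example.get("outputs")
    if not isinstance(inputs, dict) or not isinstance(outputs, dict):
        return False
    return isinstance(inputs.get("question"), str) and isinstance(
        outputs.get("answer"), str
    )


print(is_valid_example(sample_example))
```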

## Before Few Shot Prompting
First, we ask a question that the LLM cannot answer well on its own. This gives us a baseline for showing how past examples improve the LLM's ability to answer questions correctly.

In [None]:
from openai import Client

openai = Client()

incoming_question = {
    "question": "Why would I use a runnable lambda in LangChain? What does it do?"
}

system_prompt = """
You are an expert on the LLM framework LangChain. Your job is to answer questions provided by users of the framework
as succinctly and accurately as possible.
"""

messages = [
    {
        "role": "system",
        "content": system_prompt
    },
    {"role": "user", "content": incoming_question["question"]},
]
result = openai.chat.completions.create(
    messages=messages, model="gpt-3.5-turbo", temperature=0
)
result.choices[0].message.content

## Create Dataset
Make sure the `data.json` file that was distributed with this notebook is in your working directory! The cell below opens it by relative path.

In [None]:
from langsmith import Client
import json 

langsmith_client = Client()

with open('data.json', 'r') as f:
    examples = json.load(f)

test_dataset_name = "LangSmith Few Shot Datasets Notebook"

has_dataset = langsmith_client.has_dataset(dataset_name=test_dataset_name)
print(has_dataset)
if not has_dataset:
    # First run: create the dataset and upload the examples from data.json.
    dataset_id = langsmith_client.create_dataset(dataset_name=test_dataset_name).id
    langsmith_client.create_examples(
        dataset_name=test_dataset_name,
        inputs=[e['inputs'] for e in examples],
        outputs=[e['outputs'] for e in examples],
    )
else:
    # Later runs: reuse the existing dataset instead of creating a duplicate.
    dataset_id = list(langsmith_client.list_datasets(dataset_name=test_dataset_name))[0].id

## Indexing your Dataset for Dynamic Example Selection

<div class="alert alert-block alert-info">
⚠️ The following steps require the LangSmith UI! They are also available via the API,
    but this walkthrough assumes you do them in the UI.
</div>

1. **Navigate to the Datasets page in LangSmith**
2. **Click on the Few-Shot Index button on the top navigation**
3. **Wait until the indexing process completes**
   Note: you can toggle the button off and on to reload the current indexing state. For small datasets, indexing should finish almost instantly.
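
If you would rather script the wait in step 3 than watch the UI, a generic polling loop is enough. This is only a sketch: `wait_until` and the `is_ready` callable are hypothetical names, and how you detect that indexing has finished is left to you, since this notebook does not document an endpoint for it.

```python
import time
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between checks so we do not hammer the service.
        time.sleep(interval_s)
```

The function returns `True` as soon as the check passes and `False` on timeout, so the caller decides whether a timeout is fatal.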

## Helper code - feel free to skip
This section sets up code for querying LangSmith for few-shot examples. A polished version of this will end up in the LangSmith SDK, so you can skip to the next section. The code:
1. creates a LangSmith API client for our REST API
2. creates a small helper for searching our newly indexed dataset via the REST API. We decorate this to be traced into LangSmith so you can see the example selection process.
3. creates a new LangChain example selector that uses this REST API

In [None]:
import os

import httpx

langsmith_api_client = httpx.Client(
    transport=httpx.HTTPTransport(
        retries=5,  # this applies only to ConnectError, ConnectTimeout
        limits=httpx.Limits(
            max_keepalive_connections=20,
            keepalive_expiry=120.0,
        ),
    ),
    timeout=httpx.Timeout(
        connect=5.0,
        read=10.0,
        write=30.0,
        pool=30.0,
    ),
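    headers={
        # Index directly so a missing key fails here with a KeyError,
        # rather than later with a less obvious header error.
        "x-api-key": os.environ["LANGCHAIN_API_KEY"]
    },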
    base_url="https://api.smith.langchain.com",
)

In [None]:
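from typing import Any

from langsmith import traceable


@traceable  # traced so the example selection shows up in LangSmith
def search_similar_examples(
    dataset_id: str,
    inputs_dict: dict[str, Any],
    limit: int = 5,
) -> dict[str, Any]:
    """Search the indexed dataset and return the parsed JSON response.

    Callers in this notebook read the matches from the "examples" key.
    """
    few_shot_resp = langsmith_api_client.post(
        f"datasets/{dataset_id}/search",
        json={
            "inputs": inputs_dict,
            "limit": limit
        }
    )
    few_shot_resp.raise_for_status()
    return few_shot_resp.json()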

In [None]:
from typing import Any, Dict, List

from langchain_core.example_selectors import BaseExampleSelector
from langsmith import traceable

class LangSmithDatasetExampleSelector(BaseExampleSelector):

    def __init__(
        self,
        dataset_id,
        num_examples: int = 5):
        self.dataset_id = dataset_id
        self.num_examples = num_examples

    def add_example(self, example: Dict[str, str]) -> Any:
        raise NotImplementedError()

    @traceable
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        return search_similar_examples(
            dataset_id=self.dataset_id,
            inputs_dict=input_variables,
            limit=self.num_examples
        )["examples"]
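
The helpers above read the search results from an `examples` key, and the prompts later in this notebook read `inputs.question` and `outputs.answer` from each result. The sketch below makes that assumed response shape explicit against a hand-written response. `fake_response` and `to_question_answer_pairs` are hypothetical, and a real response may carry additional fields.

```python
from typing import Any


def to_question_answer_pairs(response: dict[str, Any]) -> list[tuple[str, str]]:
    """Flatten a search response into (question, answer) tuples."""
    return [
        (example["inputs"]["question"], example["outputs"]["answer"])
        for example in response.get("examples", [])
    ]


# Hypothetical response, shaped the way this notebook reads it.
fake_response = {
    "examples": [
        {"inputs": {"question": "Q1"}, "outputs": {"answer": "A1"}},
        {"inputs": {"question": "Q2"}, "outputs": {"answer": "A2"}},
    ]
}

for question, answer in to_question_answer_pairs(fake_response):
    print(question, "->", answer)
```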

## Setup Shared Parameters
Below are two ways to use few-shot examples:
1. with LangChain
2. without LangChain

Use whichever path you prefer.

In [None]:
num_examples = 3
example_selector = LangSmithDatasetExampleSelector(dataset_id, num_examples)

## Few Shot Examples with LangChain

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate, SystemMessagePromptTemplate
from langchain_core.prompts.few_shot import FewShotPromptTemplate
from langchain_openai.chat_models import ChatOpenAI


example_prompt = PromptTemplate.from_template(
    """
        <example>
            <question> {{inputs.question}} </question>
            <answer> {{outputs.answer}} </answer>
        </example>
    """,
    template_format="mustache"
)

prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="You are an expert on LangChain, an LLM development framework. Answer the question succinctly and correctly.",
    suffix="",
    input_variables=["question"]
)
final_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate(prompt=prompt),
        ("human", "{question}"),
    ]
)

chain = final_prompt | ChatOpenAI() | StrOutputParser()


In [None]:
for chunk in chain.stream(incoming_question):
    print(chunk, end="")
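
To reason about what the system message looks like once examples are selected, it helps to see how the pieces are laid out. The sketch below is a plain-Python approximation, not LangChain's code: to our understanding, `FewShotPromptTemplate` joins the prefix, the rendered examples, and the suffix with a separator that defaults to a blank line, skipping empty pieces. Treat both the default and the helper name `assemble_few_shot_text` as assumptions.

```python
def assemble_few_shot_text(
    prefix: str,
    example_strings: list[str],
    suffix: str,
    example_separator: str = "\n\n",  # assumed default separator
) -> str:
    """Join the non-empty pieces in order: prefix, examples, suffix."""
    pieces = [prefix, *example_strings, suffix]
    return example_separator.join(piece for piece in pieces if piece)


print(
    assemble_few_shot_text(
        prefix="You are an expert on LangChain.",
        example_strings=["<example>one</example>", "<example>two</example>"],
        suffix="",
    )
)
```

With an empty suffix, as in the cell above, the text ends with the last example. The LangSmith trace for the chain shows the prompt that was actually sent.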

## Few Shot Examples without LangChain

In [None]:
# Please overwrite the prompts as you see fit

from typing import Any, Dict

from langsmith import traceable

@traceable
def generate_example_prompt(example: Dict[str, Any]) -> str:
    # TODO: FILL IN WITH YOUR FEW SHOT PROMPT
    return f"""
        <example>
            <question> {example['inputs']['question']} </question>
            <answer> {example['outputs']['answer']} </answer>
        </example>
        """

# TODO: FILL IN WITH YOUR BASE PROMPT
base_prompt = """
    You are an expert on the LLM framework LangChain. Your job is to answer questions provided by users of the 
    framework as succinctly and accurately as possible.
"""

In [None]:
from openai import Client
from langsmith import traceable, wrappers

openai = wrappers.wrap_openai(Client())


@traceable
def run_few_shot_prompt(inputs: Dict[str, Any]):
    examples = search_similar_examples(
        dataset_id=dataset_id,
        inputs_dict=inputs,
        limit=num_examples
    )["examples"]

    example_prompt = "\n".join(generate_example_prompt(example) for example in examples)
    
    messages = [
        {
            "role": "system",
            # Reuse base_prompt so that edits to it in the cell above take effect here.
            "content": f"{base_prompt}\n{example_prompt}",
        },
        {"role": "user", "content": inputs["question"]},
    ]
    result = openai.chat.completions.create(
        messages=messages, model="gpt-3.5-turbo", temperature=0
    )
    return result.choices[0].message.content

In [None]:
run_few_shot_prompt(incoming_question)
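
The triple-quoted prompts in this path carry their source-code indentation into the text sent to the model. If you want to avoid that, one option is to build the messages in a small pure function and strip the indentation with `textwrap.dedent`. This is a sketch under that assumption: `render_example` and `build_messages` are hypothetical names, and the function makes no network calls, so it can be inspected or unit tested offline.

```python
import textwrap
from typing import Any


def render_example(example: dict[str, Any]) -> str:
    """Render one example without leading indentation."""
    return (
        "<example>\n"
        f"  <question> {example['inputs']['question']} </question>\n"
        f"  <answer> {example['outputs']['answer']} </answer>\n"
        "</example>"
    )


def build_messages(
    base_prompt: str, examples: list[dict[str, Any]], question: str
) -> list[dict[str, str]]:
    """Build chat messages: dedented base prompt, then examples, then the question."""
    system = textwrap.dedent(base_prompt).strip()
    rendered = "\n".join(render_example(example) for example in examples)
    if rendered:
        system = f"{system}\n\n{rendered}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list can be passed as `messages` to the same `chat.completions.create` call used above.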