## TODOs
* Metadata up with examples
* Version datasets
* Take advantage of session-level feedback

First, some setup. Create a LangSmith API key from the settings page in LangSmith, then set the following environment variables:
```
OPENAI_API_KEY=<YOUR OPENAI API KEY>
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=<YOUR PROJECT NAME>
LANGCHAIN_API_KEY=<YOUR LANGSMITH API KEY>
```
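
Before creating the client, it can help to confirm that these variables are actually visible to the Python process. The following is a minimal sketch using only the standard library; `REQUIRED_VARS` and `missing_env_vars` are hypothetical helper names, not part of LangSmith.

```python
import os

# Variable names taken from the setup list above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_PROJECT",
    "LANGCHAIN_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.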

In [2]:
from langsmith import Client

client = Client()

In [3]:
toxic_examples = [
    ("Shut up, idiot", "Toxic"),
    ("You're a wonderful person", "Not toxic"),
    ("This is the worst thing ever", "Toxic"),
    ("I had a great day today", "Not toxic"),
    ("Nobody likes you", "Toxic"),
    ("This movie is a masterpiece", "Not toxic"),
    ("Go away and never come back", "Toxic"),
    ("Thank you for your help", "Not toxic"),
    ("This is so dumb", "Toxic"),
    ("I appreciate your efforts", "Not toxic"),
    ("This is a waste of time", "Toxic"),
    ("This movie blows", "Toxic"),
    ("This is unacceptable. I want to speak to the manager.", "Toxic")
]

toxic_dataset_name = "Toxic Queries Dataset"
toxic_dataset = client.create_dataset(dataset_name=toxic_dataset_name)
for query, label in toxic_examples:
    client.create_example(inputs={"text": query}, outputs={"label": label}, dataset_id=toxic_dataset.id)

In [4]:
# Define multi-turn examples
multi_turn_examples = [
    (["Recommend some family-friendly movies for tonight", "Do any of these have an educational theme?", "Which one has the highest ratings?"],
     ["Some family-friendly movies available are 'The Lion King', 'Finding Nemo', and 'The Incredibles'", "'The Lion King' and 'Finding Nemo' have educational themes about the circle of life and the importance of family", "'The Incredibles' has the highest ratings among them with a 94% on Rotten Tomatoes"]),
    (["What are the top sci-fi movies on your service?", "Any recent ones?", "Can you suggest one that involves time travel?"],
     ["Top sci-fi movies include 'Blade Runner 2049', 'Interstellar', and 'The Martian'", "A recent hit is 'Tenet', released in 2020", "'Interstellar' involves complex time travel themes and is highly recommended"]),
    (["I'm looking for movies directed by Christopher Nolan", "Which one would you recommend for a movie night?", "What's the plot of 'Inception'?"],
     ["Christopher Nolan movies available include 'Inception', 'Dunkirk', and 'Interstellar'", "'Inception' is a great pick for a movie night, offering a mix of action, drama, and mind-bending storytelling", "'Inception' is about a thief who steals corporate secrets through dream-sharing technology and is given the inverse task of planting an idea into the mind of a CEO"]),
    (["Show me some popular romantic comedies", "Any classics in the list?", "Tell me more about 'When Harry Met Sally'"],
     ["Popular romantic comedies include 'Crazy Rich Asians', 'The Big Sick', and 'When Harry Met Sally'", "'When Harry Met Sally' is considered a classic in the romantic comedy genre", "'When Harry Met Sally' explores the question of whether men and women can just be friends, through the story of its titular characters over the years"]),
    (["Do you have documentaries on nature?", "Which one focuses on marine life?", "How long is 'Blue Planet II'?"],
     ["Yes, we have 'Planet Earth II', 'Blue Planet II', and 'Our Planet'", "'Blue Planet II' focuses extensively on marine life, exploring the deep ocean, coral reefs, and the open sea", "'Blue Planet II' is approximately 7 hours long, spread across 7 episodes"])
]

multi_turn_dataset_name = "Multi-Turn Queries"
multi_turn_dataset = client.create_dataset(dataset_name=multi_turn_dataset_name)
for queries, answers in multi_turn_examples:
    client.create_example(inputs={"queries": queries}, outputs={"answers": answers}, dataset_id=multi_turn_dataset.id)

In [5]:
structured_input_examples = [
    ({
        "user_preferences": ["Sci-Fi", "Action"],
        "watch_history": ["The Matrix", "Inception"],
        "search_query": "What to watch next?"
     }, 
     "Based on your love for Sci-Fi and Action movies, and considering you've recently watched 'The Matrix' and 'Inception', you might enjoy 'Blade Runner 2049' for its deep narrative and stunning visuals."),
    
    ({
        "user_preferences": ["Drama", "Historical"],
        "watch_history": ["The Crown", "Downton Abbey"],
        "search_query": "Looking for a movie with a strong storyline"
     }, 
     "Given your interest in Drama and Historical themes, and your watch history, 'The King's Speech' offers a compelling storyline with remarkable performances."),
    
    ({
        "user_preferences": ["Comedy", "Romance"],
        "watch_history": ["Friends", "The Big Bang Theory"],
        "search_query": "Need a light-hearted movie"
     }, 
     "Considering your preference for Comedy and Romance, along with enjoying shows like 'Friends', you'd likely enjoy 'Crazy Rich Asians' for its humor and heartwarming romance."),
    
    ({
        "user_preferences": ["Thriller", "Mystery"],
        "watch_history": ["Sherlock", "Mindhunter"],
        "search_query": "Suggest a suspenseful movie"
     }, 
     "With your taste leaning towards Thriller and Mystery, and considering you've watched 'Sherlock' and 'Mindhunter', 'Gone Girl' would be an excellent choice for its suspense and plot twists."),
    
    ({
        "user_preferences": ["Documentary", "Nature"],
        "watch_history": ["Planet Earth", "Blue Planet II"],
        "search_query": "Want to watch something about wildlife"
     }, 
     "Your interest in Documentaries and Nature, along with watching 'Planet Earth' and 'Blue Planet II', suggests you would enjoy 'The Serengeti Rules', which beautifully captures wildlife and ecosystems."),
    
    ({
        "user_preferences": ["Fantasy", "Adventure"],
        "watch_history": ["Harry Potter series", "The Hobbit"],
        "search_query": "Fantasy movies for the weekend?"
     }, 
     "Given your love for Fantasy and Adventure, having watched the 'Harry Potter series' and 'The Hobbit', 'The Witcher' series would be a fantastic choice for your weekend binge."),
    
    ({
        "user_preferences": ["Animation", "Family"],
        "watch_history": ["Finding Nemo", "Toy Story"],
        "search_query": "Animated movies that are fun for all ages?"
     }, 
     "With a preference for Animation and Family-friendly content, and given your history with 'Finding Nemo' and 'Toy Story', 'Coco' is highly recommended for its fun story and universal appeal."),
    
    ({
        "user_preferences": ["Horror", "Supernatural"],
        "watch_history": ["The Haunting of Hill House", "Stranger Things"],
        "search_query": "Scary movies that aren’t too gory?"
     }, 
     "As a fan of Horror and Supernatural genres, and having enjoyed 'The Haunting of Hill House' and 'Stranger Things', 'A Quiet Place' offers suspense without relying on gore."),
    
    ({
        "user_preferences": ["Musical", "Drama"],
        "watch_history": ["La La Land", "The Greatest Showman"],
        "search_query": "Musicals with a strong emotional core?"
     }, 
     "Your enjoyment of Musicals and Drama, seen in 'La La Land' and 'The Greatest Showman', means you might find 'Les Misérables' to be a powerful experience with its deep emotional resonance."),
    
    ({
        "user_preferences": ["Crime", "Legal Drama"],
        "watch_history": ["Breaking Bad", "Better Call Saul"],
        "search_query": "Engaging legal dramas?"
     }, 
     "Considering your interest in Crime and Legal Drama, with 'Breaking Bad' and 'Better Call Saul' in your watch history, 'The Trial of the Chicago 7' is recommended for its engaging narrative and historical significance.")
]

structured_input_dataset_name = "Structured Inputs"
structured_input_dataset = client.create_dataset(dataset_name=structured_input_dataset_name)
for inputs, answer in structured_input_examples:
    client.create_example(inputs=inputs, outputs={"answer": answer}, dataset_id=structured_input_dataset.id)

## Toxic Queries

In [6]:
from openai import OpenAI
from langsmith import wrappers, traceable

openai = wrappers.wrap_openai(OpenAI())

@traceable
def label_query(query):
    messages = [
        {"role": "system", "content": "Given the following user query, assess whether it contains toxic content. Please provide a simple 'Toxic' or 'Not toxic' response based on your assessment."},
        {"role": "user", "content": query}
    ]
    result = openai.chat.completions.create(messages=messages, model="gpt-3.5-turbo", temperature=0)
    return result.choices[0].message.content

@traceable
def label_query_alternate_prompt(query):
    messages = [
        {"role": "system", "content": "Please review the user query below and determine if it contains any form of toxic behavior, such as insults, threats, or highly negative comments. Respond with 'Toxic' if it does, and 'Not toxic' if it doesn't."},
        {"role": "user", "content": query}
    ]
    result = openai.chat.completions.create(messages=messages, model="gpt-3.5-turbo", temperature=0)
    return result.choices[0].message.content

In [10]:
from langchain.smith import RunEvalConfig
from langsmith.evaluation import EvaluationResult, run_evaluator

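@run_evaluator
def correct_label(run, example) -> EvaluationResult:
    # Exact string match between the model's label and the reference label.
    # A failed run has no outputs, so fall back to an empty dict.
    predicted = (run.outputs or {}).get("output")
    score = predicted == example.outputs.get("label")
    return EvaluationResult(key="correct_label", score=int(score))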

eval_config = RunEvalConfig(
    custom_evaluators=[correct_label]
)

results_1 = client.run_on_dataset(
    dataset_name=toxic_dataset_name,
    llm_or_chain_factory=lambda input: label_query(input["text"]),
    evaluation=eval_config,
    project_name="toxic queries prompt 001",
)

results_2 = client.run_on_dataset(
    dataset_name=toxic_dataset_name,
    llm_or_chain_factory=lambda input: label_query_alternate_prompt(input["text"]),
    evaluation=eval_config,
    project_name="toxic queries prompt 002",
)

View the evaluation results for project 'toxic queries prompt 001' at:
https://smith.langchain.com/o/02564c5f-72ba-4057-8130-2b43b785bbd8/datasets/deb8f54b-c67c-4634-ac89-55d6c78d56e7/compare?selectedSessions=1011711d-4963-4d2c-a8ae-73d8fb2d7c46

View all tests for Dataset Toxic Queries Dataset at:
https://smith.langchain.com/o/02564c5f-72ba-4057-8130-2b43b785bbd8/datasets/deb8f54b-c67c-4634-ac89-55d6c78d56e7
[------------------------------------------------->] 13/13
View the evaluation results for project 'toxic queries prompt 002' at:
https://smith.langchain.com/o/02564c5f-72ba-4057-8130-2b43b785bbd8/datasets/deb8f54b-c67c-4634-ac89-55d6c78d56e7/compare?selectedSessions=ac65c588-d5f1-4eaa-80cf-a2cbd9963651

View all tests for Dataset Toxic Queries Dataset at:
https://smith.langchain.com/o/02564c5f-72ba-4057-8130-2b43b785bbd8/datasets/deb8f54b-c67c-4634-ac89-55d6c78d56e7
[------------------------------------------------->] 13/13
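
The `correct_label` evaluator above compares strings exactly, so a response such as `Toxic.` or `not toxic` would be scored as incorrect even though the label is right. Below is a minimal sketch of a more lenient comparison. `normalize_label` and `labels_match` are hypothetical helpers, not part of LangSmith, and whether this leniency is desirable depends on how strictly the prompt's output format should be enforced.

```python
import string


def normalize_label(text):
    """Lowercase the label and trim whitespace, quotes, and punctuation at the ends."""
    if text is None:
        return ""
    return text.strip().strip(string.punctuation + " ").lower()


def labels_match(predicted, expected):
    """Compare two labels after normalization."""
    return normalize_label(predicted) == normalize_label(expected)


print(labels_match("Toxic.", "Toxic"))
print(labels_match("Toxic", "Not toxic"))
```

Inside the evaluator, the `==` comparison could then be swapped for a call to `labels_match`, leaving the rest of the evaluation config unchanged.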

## Multi-Turn Queries

In [84]:
import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_movies",
            "description": "Retrieve a list of relevant movies and their metadata from a movie database.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The query used to retrieve movies from the movie database, for example 'Christopher Nolan films'"
                    },
                },
                "required": ["query"],
            },
        }
    },
]

system_prompt = """
Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous.
Note that if the question does not require additional search and can be answered using the chat history, simply respond with the answer.
Don't make up content that's not supplied in chat history.
"""

@traceable
def generate_movie_search(chat_history, query):
    messages = [
        {"role": "system", "content": system_prompt},
    ] + chat_history + [{"role": "user", "content": query}]
    result = openai.chat.completions.create(
        messages=messages, 
        model="gpt-3.5-turbo-0613",
        tools=tools
    )
    return result.choices[0].message

def _convert_docs(results):
    return [
        {
            "page_content": r,
            "type": "Document",
        } for r in results
    ]

@traceable(run_type="retriever")
def retrieve_movies(query):
    # Stub retriever with hard-coded results. In production, this would query a real movie database.
    if "family-friendly" in query.lower():
        return _convert_docs(["Lion King", "Finding Nemo", "The Incredibles"])
    elif "sci-fi" in query.lower():
        return _convert_docs(["Blade Runner 2049", "Interstellar", "The Martian"])
    elif "nature" in query.lower():
        return _convert_docs(['Planet Earth II', 'Blue Planet II', 'Our Planet'])
    elif "christopher nolan" in query.lower():
        return _convert_docs(['Inception', 'Dunkirk', 'Interstellar'])
    else:
        return _convert_docs(['Crazy Rich Asians', 'The Big Sick', 'When Harry Met Sally'])

@traceable
def execute_function_call(message):
    if message.tool_calls[0].function.name == "retrieve_movies":
        query = json.loads(message.tool_calls[0].function.arguments)["query"]
        results = retrieve_movies(query)
    else:
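        # Wrap the error in the same document shape so callers can still read "page_content"
        results = _convert_docs([f"Error: function {message.tool_calls[0].function.name} does not exist"])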
    return results

@traceable
def generate_answer(question, context):
    messages = [
        {"role": "system", "content": f"Answer the user's question based only on the content below:\n\n{context}"},
        {"role": "user", "content": question}
    ]
    result = openai.chat.completions.create(messages=messages, model="gpt-3.5-turbo", temperature=0)
    return result.choices[0].message.content

@traceable
def rag_pipeline(chat_history, question):
    message = generate_movie_search(chat_history, question)
    if not message.tool_calls:  # None or an empty list: the model answered directly
        return message.content
    else:
        docs = execute_function_call(message)
        context = "\n".join([doc["page_content"] for doc in docs])
        return generate_answer(question, context)

In [85]:
for turns, _ in multi_turn_examples[:1]:
    chat_history = []
    for turn in turns:
        output = rag_pipeline(chat_history, turn)
        print(output)
        chat_history.append({"role": "user", "content": turn})
        chat_history.append({"role": "assistant", "content": output})

I recommend watching "Lion King," "Finding Nemo," or "The Incredibles." These are all great family-friendly movies that everyone can enjoy together.
Yes, "Finding Nemo" has an educational theme. The movie explores the underwater world and teaches about different types of marine life and their habitats. It also highlights the importance of family, friendship, and perseverance.
The Big Sick has a 98% rating on Rotten Tomatoes, making it the highest rated film among the three mentioned.
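
The loop above can be wrapped in a small helper that replays one conversation and keeps each output next to its reference answer, which makes mismatches like the last turn easier to spot. This is a sketch only: `run_conversation` and `echo_pipeline` are hypothetical names, and the stub stands in for `rag_pipeline` so the example runs without any API calls.

```python
def run_conversation(pipeline, turns, references):
    """Replay `turns` through `pipeline`, threading the chat history.

    Returns a list of (question, output, reference) tuples.
    """
    chat_history = []
    rows = []
    for turn, reference in zip(turns, references):
        # Pass a copy so the pipeline cannot mutate the running history.
        output = pipeline(list(chat_history), turn)
        rows.append((turn, output, reference))
        chat_history.append({"role": "user", "content": turn})
        chat_history.append({"role": "assistant", "content": output})
    return rows


def echo_pipeline(chat_history, question):
    """Stub pipeline: reports how many prior messages it received."""
    return f"{len(chat_history)} prior messages; question: {question}"


rows = run_conversation(
    echo_pipeline,
    [
        "Recommend some family-friendly movies for tonight",
        "Do any of these have an educational theme?",
    ],
    [
        "Some family-friendly movies available are 'The Lion King', 'Finding Nemo', and 'The Incredibles'",
        "'The Lion King' and 'Finding Nemo' have educational themes about the circle of life and the importance of family",
    ],
)
for question, output, reference in rows:
    print(question, "->", output, "| expected:", reference)
```

With the real pipeline, the call would pass `rag_pipeline` in place of `echo_pipeline` and the `queries` and `answers` lists from one dataset example.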


In [70]:
multi_turn_examples[:1]

[(['Recommend some family-friendly movies for tonight',
   'Do any of these have an educational theme?',
   'Which one has the highest ratings?'],
  ["Some family-friendly movies available are 'The Lion King', 'Finding Nemo', and 'The Incredibles'",
   "'The Lion King' and 'Finding Nemo' have educational themes about the circle of life and the importance of family",
   "'The Incredibles' has the highest ratings among them with a 94% on Rotten Tomatoes"])]

## Structured Inputs