# Hands-on Session Chatbot
This is the hands-on session accompanying the workshop on LangChain fundamentals. It is inspired by the more extensive LangChain Cookbook Part 1.

Copyright (c) 2023 Michael Neumayr

## Setup

### 0. Set up the Colab in your drive

- Load this Colab from GitHub
- Run the first cell to install all required packages (this takes a moment)
- While the installation runs, jump to the section "OpenAI API key" and replace "PUT_YOUR_KEY_HERE" with the key we provide

### 1. Required Python packages

In [None]:
# install required packages; this may take a few minutes. Dependency warnings can be ignored; it should work anyway.
%pip install openai
%pip install langchain
%pip install pypdf
%pip install tiktoken

### 2. Load the workshop GitHub repository

In [None]:
!git clone https://github.com/michaelnoi/venture_labs_build.git

In [None]:
!cd venture_labs_build && git checkout only_static_files

### 3. OpenAI API key

In [None]:
import os

openai_api_key = os.getenv('OPENAI_API_KEY', 'PUT_YOUR_KEY_HERE')
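A forgotten placeholder key only shows up later as an authentication error. The following is a small optional guard, sketched under the assumption that the placeholder string is exactly `PUT_YOUR_KEY_HERE`; the helper name `key_is_configured` is made up for this illustration.

```python
import os

PLACEHOLDER = "PUT_YOUR_KEY_HERE"  # same placeholder as in the cell above

def key_is_configured(key):
    """Return True if the key is non-empty and not the placeholder."""
    return bool(key) and key != PLACEHOLDER

# Same lookup as above: environment variable first, placeholder as fallback.
openai_api_key = os.getenv("OPENAI_API_KEY", PLACEHOLDER)
if not key_is_configured(openai_api_key):
    print("No API key set: replace PUT_YOUR_KEY_HERE or set OPENAI_API_KEY.")
```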

## Project: Interactive Chatbot

### 0. Remember the list of messages

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage, AIMessage

chat = ChatOpenAI(openai_api_key=openai_api_key)

In [None]:
chat(
    [
        SystemMessage(content="Answer in Chinese."),
        HumanMessage(content="When is the Oktoberfest in Munich usually?"),
        AIMessage(content='The Oktoberfest in Munich usually begins in late September and lasts for 16-18 days, ending on the first Sunday in October or on October 3rd, German Unity Day, if it falls on a Monday.'),
        HumanMessage(content="And do you have recommendations what to wear?"),
    ]
)

Let's set up a proper interactive chatbot that stores the messages dynamically.
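Storing messages dynamically can be pictured without any LLM call. The following is a minimal sketch, not LangChain's implementation: `fake_model` is a hypothetical stand-in for the chat model, and messages are plain `(role, content)` tuples instead of `SystemMessage`, `HumanMessage`, and `AIMessage` objects.

```python
# Minimal sketch: keep the conversation in a list that grows with every turn.
# `fake_model` is a hypothetical placeholder for a real chat model call.

def fake_model(messages):
    """Return a canned reply that reports how many messages it received."""
    return f"(reply based on {len(messages)} messages)"

def ask(history, user_input):
    """Append the user message, call the model on the full history, store the reply."""
    history.append(("human", user_input))
    reply = fake_model(history)
    history.append(("ai", reply))
    return reply

history = [("system", "Answer in Chinese.")]
ask(history, "When is the Oktoberfest in Munich usually?")
ask(history, "And do you have recommendations what to wear?")

for role, content in history:
    print(f"{role}: {content}")
```

Every call sees the full list, which is why the model can resolve follow-up questions that refer to earlier turns.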

### 1. Conversation chain for a chatbot

<div class="alert alert-info">
    <b>The ConversationChain includes</b>
    <ol>
        <li>a pre-defined prompt template for a conversation with the LLM. The template is filled with the user input and the chat history. See the template below. This saves you the prompt engineering part for a good conversation. If you like more flexibility, however, you can also use your own prompt template.</li>
        <li>by default, a memory that stores the conversation history and appends it to the prompt.</li>
    </ol>
</div>

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain

chat = ChatOpenAI(openai_api_key=openai_api_key)
chain = ConversationChain(llm=chat) #, verbose=True)

exit_conditions = ("quit", "exit")

The memory is empty at the beginning. The conversation chain will fill it automatically with the conversation history.

In [None]:
print(type(chain.memory.buffer))

Recall prompt templates. Here, the user input is automatically prefixed with the "Human: " keyword.

In [None]:
print(chain.prompt.template)
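How such a template is filled can be sketched in plain Python. The template below is a simplified stand-in and an assumption, not the exact text the chain prints; only the idea of a `{history}` slot and an `{input}` slot is taken from the notebook.

```python
# Simplified stand-in for a conversation prompt template (assumption: the real
# template is longer; only the {history}/{input} placeholders are illustrated).
TEMPLATE = (
    "Current conversation:\n"
    "{history}\n"
    "Human: {input}\n"
    "AI:"
)

def build_prompt(history, user_input):
    """Fill the template with the stored history and the new user input."""
    return TEMPLATE.format(history=history, input=user_input)

history = "Human: Hi, I'm Anna.\nAI: Hello Anna!"
prompt = build_prompt(history, "What is my name?")
print(prompt)
```

The prompt ends with `AI:` so that the model continues the text as the assistant's next turn.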

Now let's set up a loop for a conversation with the LLM. The loop should ask the user for input, send the input to the LLM, and print the response. The loop should stop if the user enters "quit" or "exit".

In [None]:
while True:
    query = input("Human: ")
    if query in exit_conditions:
        print()
        print("AI: Goodbye!")
        break
    else:
        response = chain.predict(input=query)
        print()
        print(f"AI: {response}")
        print()

Now check the memory again:

<div class="alert alert-warning">
  <p>Check the memory of the chain to see whether it stored your conversation correctly. You get a more readable format if you use <code style="color:red">buffer_as_messages</code>.</p>
</div>

In [None]:
### TODO: print the stored memory of your conversation



### 2. Longer conversations

<div class="alert alert-info">
    <b>Hitting the token limit</b>
    <p>As with large documents, you can also hit the token limit in conversations. This happens not only when a single input is too large but also when the conversation gets too long: the whole history is added to every prompt, so the prompt grows linearly with the number of interactions. This is bad for long conversations, and there are multiple ways to fix it.</p>
</div>
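The growth can be made concrete with a rough estimate. This sketch counts whitespace-separated words as a crude proxy for tokens (an assumption; a real tokenizer such as tiktoken gives different numbers) and resends the full history at every turn. The function names are made up for this illustration.

```python
def count_tokens(text):
    """Crude token estimate: whitespace-separated words (assumption, not tiktoken)."""
    return len(text.split())

def prompt_sizes(turns):
    """Return the estimated prompt size at each turn when the full history is resent."""
    history = []
    sizes = []
    for human, ai in turns:
        # The prompt contains everything said so far plus the new user input.
        prompt = "\n".join(history + [f"Human: {human}", "AI:"])
        sizes.append(count_tokens(prompt))
        history += [f"Human: {human}", f"AI: {ai}"]
    return sizes

turns = [("one two three", "four five six")] * 4
sizes = prompt_sizes(turns)
print(sizes)       # size of each individual prompt
print(sum(sizes))  # total tokens sent over the whole conversation
```

Each prompt grows by a constant amount per turn, so the total number of tokens sent over the whole conversation grows roughly quadratically.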

In [None]:
from langchain import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationSummaryMemory, ConversationBufferWindowMemory

llm = OpenAI(openai_api_key=openai_api_key)
chat = ChatOpenAI(openai_api_key=openai_api_key)
chain = ConversationChain(llm=chat) #, verbose=True)

exit_conditions = ("quit", "exit")

#### i) Cut the conversation off after a specified limit

This is the simplest version: keep only the last few interactions in the prompt and drop everything older. The history part of the prompt stays bounded, so it no longer grows toward the token limit, but the model forgets anything that was said before the window.

<div class="alert alert-warning">
  <p>The keyword <code style="color:red">k</code> is the window size, i.e. how many of the most recent interactions are stored. Play around with k: tell the model something in a conversation and ask about it again at some later point.</p>
</div>

In [None]:
conversation = ConversationChain(
    llm=chat,
    memory=ConversationBufferWindowMemory(k=1)
)
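The windowing idea itself fits in a few lines. The class below is a toy illustration, not how `ConversationBufferWindowMemory` is implemented; the name `WindowMemory` and its methods are made up for this sketch.

```python
from collections import deque

class WindowMemory:
    """Toy window memory: keeps only the last k (human, ai) exchanges."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save(self, human, ai):
        self.turns.append((human, ai))

    def as_text(self):
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = WindowMemory(k=1)
memory.save("My name is Anna.", "Nice to meet you, Anna!")
memory.save("What is the capital of France?", "Paris.")
print(memory.as_text())
```

With `k=1` the name from the first exchange is already gone after the second one, which is the forgetting you can observe in the chat loop.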

In [None]:
while True:
    query = input("Human: ")
    if query in exit_conditions:
        print()
        print("AI: Goodbye!")
        break
    else:
        response = conversation.predict(input=query)
        print()
        print(f"AI: {response}")
        print()

#### ii) Summarize the conversation and append the summary to the prompt

Another method is to summarize the conversation so far and append the summary, instead of the whole raw history, to every prompt. This uses more tokens at the beginning, because producing the summary needs additional LLM calls, but it scales better for longer conversations.

<div class="alert alert-warning">
  <p>Now use <code style="color:red">ConversationSummaryMemory</code> instead of the window memory. You can set it up in the same way; the only difference is that the summary memory needs an LLM (for the summary) as input instead of a window size. The keyword is <code style="color:red">llm=</code>.</p>
</div>
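The mechanism behind a summary memory can be pictured with a toy class. This is a sketch under strong assumptions: `toy_summarize` is a hypothetical stand-in that merely truncates text, whereas a real summary memory asks an LLM to write the new summary. Neither name below comes from LangChain.

```python
class SummaryMemory:
    """Toy summary memory: stores one running summary instead of the raw history."""

    def __init__(self, summarize):
        # summarize is a callable(old_summary, new_lines) -> new summary
        self.summarize = summarize
        self.summary = ""

    def save(self, human, ai):
        new_lines = f"Human: {human}\nAI: {ai}"
        self.summary = self.summarize(self.summary, new_lines)

def toy_summarize(old_summary, new_lines, max_words=12):
    """Hypothetical stand-in for an LLM summary call: keep only the last few words."""
    words = (old_summary + " " + new_lines).split()
    return " ".join(words[-max_words:])

memory = SummaryMemory(toy_summarize)
for i in range(10):
    memory.save(f"question {i}", f"answer {i}")
print(memory.summary)
```

The point is the shape of the update: the stored text is replaced at every turn by a bounded summary of the old summary plus the new exchange, so its size does not grow with the conversation.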

In [None]:
### TODO: set up the conversation chain with a ConversationSummaryMemory here



Now let's try it out and print the intermediate summaries.

In [None]:
while True:
    query = input("Human: ")
    if query in exit_conditions:
        print()
        print("AI: Goodbye!")
        break
    else:
        response = conversation.predict(input=query)
        print()
        print(f"AI: {response}")
        print()

In [None]:
print(conversation.memory.buffer)

#### iii) Compare token usage

There are also combinations of these techniques and more advanced methods. The figure below shows how the token usage scales with the number of interactions.

<img src="static/token_usage.png" width="1000"/>

Source: https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/

### 3. Streaming output, the real ChatGPT experience

<div class="alert alert-info">
    <b>Streaming output</b>
    <p>Previously, you always had to wait for the full answer before the result was printed. Now, we want the real ChatGPT experience and stream the output token by token as soon as it is ready.</p>
</div>

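We use the same conversation model; we only change how the chat model is set up. To stream the output, we set the <code style="color:gray">streaming</code> keyword to <code style="color:gray">True</code> and pass a <code style="color:gray">StreamingStdOutCallbackHandler</code> as a callback. The handler prints each token to standard output as soon as it is generated, while the call itself still returns the complete answer once generation has finished.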

In [None]:
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.schema import SystemMessage, HumanMessage, AIMessage
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], openai_api_key=openai_api_key)
chain = ConversationChain(llm=chat)

In [None]:
answer = chat(
    [
        HumanMessage(content="Give me a poem about natural language processing"),
    ]
)
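What a streaming callback does can be sketched without a model. Everything here is illustrative: `fake_streaming_model` is a hypothetical stand-in that emits a canned answer, and `PrintTokenHandler.on_new_token` is a made-up method name, not LangChain's handler interface.

```python
import sys

class PrintTokenHandler:
    """Toy handler that writes each token to stdout as soon as it arrives."""

    def on_new_token(self, token):
        sys.stdout.write(token)
        sys.stdout.flush()  # make the token visible immediately

def fake_streaming_model(prompt, handlers):
    """Hypothetical model: emits a canned answer token by token, then returns it whole."""
    tokens = ["Words ", "become ", "vectors, ", "vectors ", "become ", "meaning."]
    for token in tokens:
        for handler in handlers:
            handler.on_new_token(token)
    return "".join(tokens)

answer = fake_streaming_model("Give me a poem", [PrintTokenHandler()])
print()  # finish the streamed line
```

The handler takes care of printing during generation, and the caller still receives the complete text in `answer` at the end.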

### 4. Chatbot with streaming output

<div class="alert alert-warning">
  <p>Put together all the parts from above to create an interactive chatbot that has some memory and streams the output like you know from ChatGPT.</p>
</div>

In [None]:
### TODO: put your imports here



In [None]:
### TODO: put your initializations here



In [None]:
### TODO: put your loop here, you don't need to add different arguments to predict for streaming to work as it is already set up in the chat model



## More resources

- Documentation: https://python.langchain.com/docs/get_started/introduction
- Really comprehensive tutorials: https://github.com/gkamradt/langchain-tutorials
- Deep dive conversational memory: https://www.pinecone.io/learn/series/langchain/langchain-conversational-memory/