## Manage the conversation history

One important concept to understand when building chatbots is how to manage conversation history. If left unmanaged, the list of messages grows without bound and can overflow the context window of the LLM. It is therefore important to add a step that limits the size of the messages you pass in.
Here we use the `trim_messages` helper to reduce how many messages we send to the model. The trimmer lets us specify how many tokens to keep, along with other parameters such as whether to always keep the system message and whether to allow partial messages.
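
To make the idea concrete, here is a minimal plain-Python sketch of a "keep the last messages that fit" strategy. It is not LangChain's implementation: the `trim_last` function, the `(role, content)` tuples, and the word-based `count_words` counter are all assumptions made for illustration, and they only loosely mirror the `max_tokens`, `include_system`, and `start_on` parameters.

```python
from typing import Callable

# (role, content) pairs, a stand-in for LangChain message objects
Message = tuple[str, str]


def count_words(message: Message) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(message[1].split())


def trim_last(
    messages: list[Message],
    max_tokens: int,
    token_counter: Callable[[Message], int] = count_words,
    include_system: bool = True,
    start_on: str = "human",
) -> list[Message]:
    """Keep the most recent whole messages that fit within max_tokens."""
    system: list[Message] = []
    rest = list(messages)
    budget = max_tokens

    # Optionally reserve room for a leading system message.
    if include_system and rest and rest[0][0] == "system":
        system = [rest.pop(0)]
        budget -= token_counter(system[0])

    # Walk backwards from the newest message; never split a message.
    kept: list[Message] = []
    for message in reversed(rest):
        cost = token_counter(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()

    # Drop leading messages until the window starts on the requested role.
    while kept and kept[0][0] != start_on:
        kept.pop(0)

    return system + kept


history = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "nice"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
]

trimmed = trim_last(history, max_tokens=10)
print(trimmed)
# [('system', "you're a good assistant"), ('human', 'whats 2 + 2'), ('ai', '4')]
```

With a budget of 10 word-tokens, the system message takes 4, the newest three messages take the remaining 6, and the leading `ai` message is then dropped so the window starts on a human turn. Older turns, including the ice cream preference, fall outside the window.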

In [1]:
from langchain_groq import ChatGroq
from langchain.messages import SystemMessage, HumanMessage, AIMessage, trim_messages
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

from dotenv import load_dotenv

load_dotenv()

True

In [2]:
model = ChatGroq(model="llama-3.1-8b-instant")

In [3]:
# history store
store = {}

In [4]:
# create a function to get the chat history based on session id

def get_chat_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

In [5]:
# setup config
config = {'configurable': {'session_id': 'chat_1'}}

In [6]:
# adding more complexity to the chain
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant. Answer all the question to the best of your ability in {language} Language."),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model

In [7]:
chain_with_history = RunnableWithMessageHistory(chain, get_session_history=get_chat_history, input_messages_key="messages")

In [8]:
response = chain_with_history.invoke(
    {
        "messages": [
            HumanMessage(content="Hi, My name is srini!"),
        ],
        "language": "Hindi",
    },
    config=config
)

response.content

'नमस्ते स्रीनि! मैं आपकी मदद करने के लिए तैयार हूँ। क्या आप किसी विशिष्ट विषय या समस्या पर चर्चा करना चाहते हैं?'

Let us set up `trim_messages`.

In [18]:
trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human"
)
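
The cell above passes the model itself as `token_counter`. As far as the LangChain documentation describes it, `token_counter` may also be a callable that takes a list of messages and returns an integer, so a rough estimate can stand in when exact counts are not needed. The sketch below is one such estimate. The `approx_token_count` name, the 4-characters-per-token ratio, the per-message overhead, and the `FakeMessage` stand-in class are all assumptions for illustration, not tokenizer facts.

```python
import math
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Stand-in for a LangChain message; only the .content attribute is used."""
    content: str


def approx_token_count(messages) -> int:
    """Rough estimate of the tokens in a list of messages.

    Assumes about 4 characters per token plus a fixed overhead per message.
    """
    chars_per_token = 4        # assumption, varies by tokenizer and language
    overhead_per_message = 3   # assumption, covers role and formatting tokens
    return sum(
        math.ceil(len(str(m.content)) / chars_per_token) + overhead_per_message
        for m in messages
    )


sample = [FakeMessage("hi! I'm bob"), FakeMessage("hi!")]
print(approx_token_count(sample))  # 10
```

A function like this could be tried as `token_counter=approx_token_count` in place of `token_counter=model`. The message window it produces will generally differ from the one shown in this notebook, because the counts differ.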

In [19]:
messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

In [22]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

# let's build a chain that trims the messages before they reach the prompt
chain = RunnablePassthrough.assign(messages=itemgetter("messages") | trimmer) | prompt | model
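
The data flow of that first step can be sketched in plain Python: take the input dict, replace the `messages` key with its trimmed version, and pass every other key (here `language`) through unchanged. The `assign` and `keep_last_two` helpers below are hypothetical stand-ins written for this illustration, not LangChain code.

```python
from operator import itemgetter


def assign(**transforms):
    """Plain-Python analogy of RunnablePassthrough.assign.

    Copies the input dict, then sets each named key to the result of
    calling its transform on the original input.
    """
    def run(inputs: dict) -> dict:
        output = dict(inputs)
        for key, transform in transforms.items():
            output[key] = transform(inputs)
        return output
    return run


def keep_last_two(messages: list) -> list:
    """Hypothetical trimmer: keep only the two most recent messages."""
    return messages[-2:]


get_messages = itemgetter("messages")

# Equivalent in spirit to: assign(messages=itemgetter("messages") | trimmer)
step = assign(messages=lambda inputs: keep_last_two(get_messages(inputs)))

result = step({"messages": ["m1", "m2", "m3", "m4"], "language": "English"})
print(result)
# {'messages': ['m3', 'm4'], 'language': 'English'}
```

The prompt template then receives both keys, so `{language}` is still available even though only `messages` was transformed.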

In [25]:
chain.invoke(
    {
        "messages": messages + [HumanMessage(content="Which ice cream do I like?")], 
        "language": "English"
    }
)

AIMessage(content="I don't have any information about your preferences. You haven't mentioned it before, so I'm not sure which ice cream you like. If you'd like to share, I'd be happy to chat about it!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 45, 'prompt_tokens': 114, 'total_tokens': 159, 'completion_time': 0.066053254, 'completion_tokens_details': None, 'prompt_time': 0.007197251, 'prompt_tokens_details': None, 'queue_time': 0.005520229, 'total_time': 0.073250505}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_9ca2574dca', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--42e8f89c-ad15-496a-ab83-222c24709602-0', usage_metadata={'input_tokens': 114, 'output_tokens': 45, 'total_tokens': 159})

In [26]:
chain.invoke(
    {
        "messages": messages + [HumanMessage(content="What is the math problem, did I ask?")], 
        "language": "English"
    }
)

AIMessage(content='You asked me a simple math problem: 2 + 2.', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 117, 'total_tokens': 132, 'completion_time': 0.019149302, 'completion_tokens_details': None, 'prompt_time': 0.006373331, 'prompt_tokens_details': None, 'queue_time': 0.005516247, 'total_time': 0.025522633}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_4dea31877a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--1eed44db-f5b4-4711-9063-de5efcacaec9-0', usage_metadata={'input_tokens': 117, 'output_tokens': 15, 'total_tokens': 132})

In [27]:
# now let's wrap it with message history

chain_with_history_v3 = RunnableWithMessageHistory(chain, get_session_history=get_chat_history, input_messages_key="messages")

In [28]:
config2 = {'configurable': {'session_id': 'chat_2'}}

In [29]:
chain_with_history_v3.invoke(
    {
        "messages": messages + [HumanMessage(content="What is my name?")], 
        "language": "English"
    },
    config=config2
)

AIMessage(content="I don't know your name. You haven't told me yet.", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 112, 'total_tokens': 127, 'completion_time': 0.028131973, 'completion_tokens_details': None, 'prompt_time': 0.006053253, 'prompt_tokens_details': None, 'queue_time': 0.005164167, 'total_time': 0.034185226}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_4dea31877a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--4d9994e7-5220-409d-8804-5a8d2c33fab3-0', usage_metadata={'input_tokens': 112, 'output_tokens': 15, 'total_tokens': 127})

In [30]:
chain_with_history_v3.invoke(
    {
        "messages": messages + [HumanMessage(content="What is the math problem, did I ask?")], 
        "language": "English"
    },
    config=config2
)

AIMessage(content='You asked me for the result of the math problem "2 + 2".', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 117, 'total_tokens': 134, 'completion_time': 0.020218324, 'completion_tokens_details': None, 'prompt_time': 0.006469782, 'prompt_tokens_details': None, 'queue_time': 0.005558881, 'total_time': 0.026688106}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_4dea31877a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--048ce99e-5556-4946-8002-7ec75497e160-0', usage_metadata={'input_tokens': 117, 'output_tokens': 17, 'total_tokens': 134})

In [31]:
chain_with_history_v3.invoke(
    {
        "messages": messages + [HumanMessage(content="Which ice cream do I like?")], 
        "language": "English"
    },
    config=config2
)

AIMessage(content="I'm a large language model, I don't have any information about your personal preferences. We just started our conversation, and I don't have any knowledge about your likes or dislikes. Would you like to share what kind of ice cream you like?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 51, 'prompt_tokens': 114, 'total_tokens': 165, 'completion_time': 0.072506609, 'completion_tokens_details': None, 'prompt_time': 0.006159653, 'prompt_tokens_details': None, 'queue_time': 0.005163103, 'total_time': 0.078666262}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_4dea31877a', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--ab862813-966a-4e2a-9485-17f2041a51b2-0', usage_metadata={'input_tokens': 114, 'output_tokens': 51, 'total_tokens': 165})
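
The ice cream answer is consistent with that message having fallen outside the trimmed window. Note that trimming only limits what is sent to the model. The stored history itself keeps growing, because the history wrapper saves the input and output of every call. The following plain-Python sketch shows that distinction under simplifying assumptions: `get_history`, `trim`, `fake_model`, and `invoke` are hypothetical stand-ins, and trimming is done by message count rather than tokens.

```python
# session_id -> full list of stored messages
store: dict[str, list[str]] = {}


def get_history(session_id: str) -> list[str]:
    """Return the stored history for a session, creating it on first use."""
    return store.setdefault(session_id, [])


def trim(messages: list[str], max_messages: int = 4) -> list[str]:
    """Hypothetical trimmer: keep only the most recent messages."""
    return messages[-max_messages:]


def fake_model(window: list[str]) -> str:
    """Stand-in for the LLM call; reports how many messages it saw."""
    return f"reply to {window[-1]!r} (saw {len(window)} messages)"


def invoke(session_id: str, new_messages: list[str]) -> tuple[str, int]:
    history = get_history(session_id)
    # Stored history is prepended to the new input, then trimmed for the model.
    window = trim(history + new_messages)
    reply = fake_model(window)
    # The untrimmed input and the reply are both saved.
    history.extend(new_messages + [reply])
    return reply, len(window)


window_sizes = [invoke("chat_2", [f"question {i}"])[1] for i in range(1, 4)]
print(window_sizes)          # [1, 3, 4]
print(len(store["chat_2"]))  # 6
```

The window sent to the stand-in model is capped at four messages, while the store holds all six. In the cells above, each call also passes the full `messages` list again, so the stored history for `chat_2` likely contains repeated copies of it. Passing only the new message on each turn would avoid that.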