# Managing the Conversation History
One important concept to understand when building chatbots is how to manage conversation history. 

If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM. Therefore, it is important to add a step that limits the size of the messages you are passing in.

`trim_messages` controls how much of the chat history is passed to the model. This matters most in long conversations, where the full history could exceed the model's token limit.
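The core idea can be sketched in plain Python before bringing in a library helper: keep the system prompt and only the most recent messages. This is a hypothetical, simplified sketch. The `Msg` tuple and the `keep_last_n` helper are illustrative names, not LangChain APIs, and this version bounds history by message count rather than by tokens.

```python
from typing import List, NamedTuple

class Msg(NamedTuple):
    role: str      # "system", "human", or "ai" (hypothetical stand-in for message classes)
    content: str

def keep_last_n(history: List[Msg], n: int) -> List[Msg]:
    """Keep a leading system message (if present) plus the last n other messages."""
    system = [m for m in history[:1] if m.role == "system"]
    rest = history[len(system):]
    # rest[-0:] would return everything, so n == 0 is handled explicitly.
    return system + rest[-n:] if n > 0 else system
```

A count-based window is easy to reason about but ignores how long each message is; `trim_messages` instead budgets by tokens.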



## Load the Groq API key

In [1]:
from dotenv import load_dotenv
load_dotenv()

import os
groq_api_key = os.getenv('GROQ_API_KEY')
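`os.getenv` silently returns `None` when the variable is missing, and the error then surfaces later as an authentication failure. A small guard can fail early instead. The `require_env` helper below is a hypothetical addition, not part of the original notebook.

```python
import os

def require_env(name: str) -> str:
    # os.getenv returns None for a missing variable; raise a clear error instead.
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value
```

Used after `load_dotenv()`, this becomes `groq_api_key = require_env("GROQ_API_KEY")`.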

## Initialize model

In [2]:
from langchain_groq import ChatGroq

model = ChatGroq(model="gemma2-9b-it", groq_api_key=groq_api_key)
model

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x0000026524F92C20>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x0000026524FC40D0>, model_name='gemma2-9b-it', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [4]:
from langchain_core.messages import trim_messages
trimmer = trim_messages(
    max_tokens=45,          # Token budget for the trimmed message list
    strategy="last",        # Keep the most recent messages; drop the oldest first
    token_counter=model,    # Use the model's tokenizer to count tokens
    include_system=True,    # Always keep the leading SystemMessage
    allow_partial=False,    # Keep whole messages only; never split a message
    start_on="human"        # Kept history (after the system message) must start with a HumanMessage
)

In [6]:
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm sati"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]
trimmed = trimmer.invoke(messages)
trimmed

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

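With `strategy="last"`, the trimmer starts from the most recent message and walks backward, keeping messages while the running total stays at or below 45 tokens.

Only whole messages are kept (`allow_partial=False`), and the kept history must begin with a human message (`start_on="human"`).

The `SystemMessage` is always kept because `include_system=True`. In this run, the oldest exchange ("hi! I'm sati" and "hi!") was dropped.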
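To make these rules concrete, here is a simplified pure-Python approximation of `strategy="last"` combined with `include_system=True`, `allow_partial=False`, and `start_on="human"`. It is a sketch, not LangChain's implementation: the `Msg` tuple, the `trim_last` function, and the whitespace-based `count_tokens` are hypothetical stand-ins for the real message classes and the model's tokenizer.

```python
from typing import Callable, List, NamedTuple

class Msg(NamedTuple):
    role: str      # "system", "human", or "ai"
    content: str

def count_tokens(msg: Msg) -> int:
    # Toy tokenizer (assumption): one token per whitespace-separated word.
    return len(msg.content.split())

def trim_last(messages: List[Msg], max_tokens: int,
              counter: Callable[[Msg], int] = count_tokens) -> List[Msg]:
    # include_system=True: reserve the leading system message and its tokens.
    system = [m for m in messages[:1] if m.role == "system"]
    budget = max_tokens - sum(counter(m) for m in system)
    kept: List[Msg] = []
    # strategy="last": walk backward from the newest message.
    for msg in reversed(messages[len(system):]):
        cost = counter(msg)
        if cost > budget:
            break  # allow_partial=False: never split a message, so stop here
        kept.append(msg)
        budget -= cost
    kept.reverse()
    # start_on="human": drop leading messages until the history starts with a human turn.
    while kept and kept[0].role != "human":
        kept.pop(0)
    return system + kept
```

In this sketch a leading `ai` turn is discarded by the `start_on` rule even when it fits the budget, which is why the trimmed output above begins with a human message right after the system message.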

In [7]:
# Token counting helper: counts tokens in each message's content only
# (role labels and message formatting are not included)
def print_token_count(messages, model):
    print("Token Count Per Message:")
    total = 0
    for msg in messages:
        tokens = model.get_num_tokens(msg.content)
        total += tokens
        print(f"{msg.type:>6}: {tokens:>3} tokens | {msg.content}")
    print(f"\nTotal tokens: {total}")

In [8]:
print_token_count(messages, model)

Token Count Per Message:
system:   5 tokens | you're a good assistant
 human:   6 tokens | hi! I'm sati
    ai:   2 tokens | hi!
 human:   5 tokens | I like vanilla ice cream
    ai:   1 tokens | nice
 human:   5 tokens | whats 2 + 2
    ai:   1 tokens | 4
 human:   1 tokens | thanks
    ai:   3 tokens | no problem!
 human:   3 tokens | having fun?
    ai:   2 tokens | yes!

Total tokens: 34


In [9]:
print_token_count(trimmed, model)

Token Count Per Message:
system:   5 tokens | you're a good assistant
 human:   5 tokens | I like vanilla ice cream
    ai:   1 tokens | nice
 human:   5 tokens | whats 2 + 2
    ai:   1 tokens | 4
 human:   1 tokens | thanks
    ai:   3 tokens | no problem!
 human:   3 tokens | having fun?
    ai:   2 tokens | yes!

Total tokens: 26
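`print_token_count` reports only 34 tokens for the full history, below the 45-token limit, yet the trimmer still dropped two messages. A likely explanation is that `token_counter=model` counts each message in a formatted form: LangChain's default `get_num_tokens_from_messages` renders a message with its role prefix (for example `Human: ...`) before counting, assuming `ChatGroq` does not override it, while `print_token_count` counts only `msg.content`. The sketch below illustrates that gap with a toy whitespace tokenizer; all helper names are hypothetical and real tokenizer counts will differ.

```python
from typing import List, NamedTuple

class Msg(NamedTuple):
    role: str      # "System", "Human", or "AI" (the prefix used when formatting)
    content: str

def toy_tokens(text: str) -> int:
    # Toy tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())

def content_only(messages: List[Msg]) -> int:
    # What print_token_count measures: message text only.
    return sum(toy_tokens(m.content) for m in messages)

def with_role_prefix(messages: List[Msg]) -> int:
    # Approximates counting each message rendered as "Role: content".
    return sum(toy_tokens(f"{m.role}: {m.content}") for m in messages)
```

Each message pays a per-message overhead, so the trimmer's running total grows faster than the helper's, and trimming starts earlier than the helper's totals suggest.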


### Pass the trimmed messages to a chain

In [10]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser

# Passthrough step: keeps all input keys but replaces "messages" with the trimmed list
preprocessor = RunnablePassthrough.assign(
    messages=lambda x: trimmer.invoke(x["messages"])
)

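# Prompt with a messages placeholder. The trimmed history keeps its own SystemMessage
# (include_system=True), so the model receives two system messages.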
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer in {language}."),
    MessagesPlaceholder(variable_name="messages")
])


# Final chain: trim -> fill prompt -> call model -> parse the output to a string
chat_chain = preprocessor | prompt | model | StrOutputParser()
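The data flow of `preprocessor | prompt | model | StrOutputParser()` can be sketched with plain functions. This is a hypothetical illustration: `preprocess`, `fill_prompt`, and `run_chain` are stand-ins for the runnables, messages are `(role, content)` tuples, and the model is any function that takes the prompt messages and returns a string, so no API call is made.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content), a stand-in for LangChain message objects

def preprocess(inputs: Dict, trim: Callable[[List[Message]], List[Message]]) -> Dict:
    # Like RunnablePassthrough.assign: keep every input key, overwrite "messages".
    return {**inputs, "messages": trim(inputs["messages"])}

def fill_prompt(inputs: Dict) -> List[Message]:
    # Like the ChatPromptTemplate: system instruction first, then the (trimmed) history.
    system = ("system", f"You are a helpful assistant. Answer in {inputs['language']}.")
    return [system] + inputs["messages"]

def run_chain(inputs: Dict, trim, model: Callable[[List[Message]], str]) -> str:
    # preprocessor | prompt | model | StrOutputParser, as plain function calls.
    return model(fill_prompt(preprocess(inputs, trim)))
```

Because trimming happens inside the chain, callers can keep passing the full, untrimmed history and let the chain bound what reaches the model.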

In [12]:
response = chat_chain.invoke({
    "language": "Telugu",
    "messages": messages  + [HumanMessage(content="What ice cream do i like")] # untrimmed messages
})
response

"I don't know what your favorite ice cream is! I'm just a language model, I can't remember past conversations or personal information about you.  \n\nTell me, what's your favorite flavor? 🍦\n"

In [14]:
chat_chain.invoke(
    {
        "messages": messages + [HumanMessage(content="what math problem did i ask")],
        "language": "English",
    }
)

'You asked "what\'s 2 + 2".  🧮  \n\n\n\nLet me know if you want to try another one!\n'
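The two answers plausibly differ because the new question is appended before trimming. The trimmer sees the full history plus the question, so the 45-token window slides forward: the oldest turns, including "I like vanilla ice cream", can fall out while "whats 2 + 2" survives. The sketch below shows that sliding effect with a hypothetical `last_within_budget` helper and a word-count budget; it uses no LangChain code and the numbers are illustrative only.

```python
from typing import List

def last_within_budget(turns: List[str], budget: int) -> List[str]:
    # Keep the longest suffix of turns whose total word count fits the budget
    # (a toy stand-in for a token budget).
    kept: List[str] = []
    for turn in reversed(turns):
        cost = len(turn.split())
        if cost > budget:
            break
        kept.insert(0, turn)
        budget -= cost
    return kept

history = ["I like vanilla ice cream", "nice", "whats 2 + 2", "4"]
```

With a budget of 11, the history alone fits completely; appending "What ice cream do i like" pushes the vanilla turn out of the window while the math exchange stays.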

In [15]:
# Wrap the chain with per-session message history

from langchain_community.chat_message_histories import ChatMessageHistory  # tracks messages in memory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# In-memory session store
store = {}

# Get or create the chat history for a session
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()  # each session ID gets its own message history
    return store[session_id]

# Wrap the chain (not just the model) with message history, so every invocation keeps context per session
with_message_history = RunnableWithMessageHistory(
    chat_chain,
    get_session_history,
    input_messages_key="messages",
)
config = {"configurable": {"session_id": "chat1"}}
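The `store` plus `get_session_history` pattern is a dictionary keyed by session ID that lazily creates an empty history on first use. A stripped-down, hypothetical version using plain lists (`demo_store` and `get_demo_history` are illustrative names, with no LangChain classes) shows why two session IDs never see each other's messages.

```python
from typing import Dict, List

demo_store: Dict[str, List[str]] = {}

def get_demo_history(session_id: str) -> List[str]:
    # Create an empty history the first time a session ID is seen, then reuse the same list.
    if session_id not in demo_store:
        demo_store[session_id] = []
    return demo_store[session_id]
```

Because the same list object is returned on every call, appending to it updates that session's stored history in place.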

In [16]:
with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="whats my name?")],
        "language": "English",
    },
    config=config,
)


"As an AI, I have no memory of past conversations and don't know your name.  \n\nWhat's your name? 😊\n"