### ***Building A Chatbot***

***How to design and implement an LLM-powered chatbot. This chatbot will be able to have a conversation and remember previous interactions.***

***Note that the chatbot we build here uses only the language model to hold a conversation. Related concepts you may be looking for, both covered later:***
- *Conversational RAG: enable a chatbot experience over an external source of data*
- *Agents: build a chatbot that can take actions*

#### ***Load Env Variables and Keys***

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()  # load all environment variables from the .env file

groq_api_key = os.getenv("GROQ_API_KEY")
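`os.getenv` returns `None` when a variable is missing, so a typo in `.env` would only surface later as an authentication error from Groq. A minimal guard that fails fast is sketched below; `require_env` is a hypothetical helper, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value


# Usage (after load_dotenv() has run):
# groq_api_key = require_env("GROQ_API_KEY")
```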

#### ***Import `ChatGroq` - LangChain wrapper***

***`ChatGroq` is an integration that lets you use Groq's fast inference platform (powered by their custom LPU chips, or Language Processing Units) as a chat model inside LangChain. Groq's hardware is designed for deterministic, high-throughput inference, so ChatGroq can deliver lower-latency responses than many GPU-based providers.***

***https://docs.langchain.com/oss/python/integrations/chat/groq***

In [2]:
from langchain_groq import ChatGroq
model=ChatGroq(model="llama-3.1-8b-instant",groq_api_key=groq_api_key)
model

ChatGroq(profile={'max_input_tokens': 131072, 'max_output_tokens': 8192, 'image_inputs': False, 'audio_inputs': False, 'video_inputs': False, 'image_outputs': False, 'audio_outputs': False, 'video_outputs': False, 'reasoning_output': False, 'tool_calling': True}, client=<groq.resources.chat.completions.Completions object at 0x11d8c1030>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x11d8c0f40>, model_name='llama-3.1-8b-instant', model_kwargs={}, groq_api_key=SecretStr('**********'))

***Model Parameters***

- `max_input_tokens` - accepts up to 131,072 (about 131K) tokens in a single request
- `max_output_tokens` - can generate up to 8,192 tokens in a response
- `image_inputs/audio_inputs/video_inputs`: False - the model is not multimodal; it accepts only text
- `image_outputs/audio_outputs/video_outputs`: False - it cannot generate images, audio, or video, only text
- `reasoning_output`: False - the model does not return a separate structured reasoning output
- `tool_calling`: True - supports structured tool calling (like function calling in OpenAI)
- `async_client` - asynchronous version of the client
- `model_kwargs={}` - extra model parameters passed through to the API; currently empty
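To see what these limits mean in practice, here is a small pre-flight check using the two token limits from the profile above. The `check_request` helper and `PROFILE` dict are hypothetical, and treating the input and output limits independently is an assumption; some providers count generated tokens against the same context window.

```python
# Limits copied from the ChatGroq profile printed above (llama-3.1-8b-instant).
PROFILE = {"max_input_tokens": 131072, "max_output_tokens": 8192}


def check_request(prompt_tokens: int, max_new_tokens: int, profile: dict = PROFILE) -> list[str]:
    """Return a list of problems with a planned request (an empty list means it fits)."""
    problems = []
    if prompt_tokens > profile["max_input_tokens"]:
        problems.append(f"prompt has {prompt_tokens} tokens, limit is {profile['max_input_tokens']}")
    if max_new_tokens > profile["max_output_tokens"]:
        problems.append(f"requested {max_new_tokens} output tokens, limit is {profile['max_output_tokens']}")
    return problems


print(check_request(prompt_tokens=109, max_new_tokens=1024))      # []
print(check_request(prompt_tokens=150_000, max_new_tokens=10_000))
```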

#### ***Importing `HumanMessage` class***

***When invoking the model, we wrap each piece of text in one of these message classes to tell the model who is speaking:***

- `SystemMessage` - Sets the assistant's role and instructions
- `HumanMessage` - Contains the user's input, e.g. the actual question
- `AIMessage` - A message produced by the AI model, e.g. an earlier reply
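Under the hood, a chat-completions API such as Groq's receives a list of role/content pairs. The sketch below is a rough, hypothetical model of that translation (the `Message` dataclass and `to_api_payload` are illustrative, not LangChain internals): messages of type `system` keep the `system` role, `human` becomes `user`, and `ai` becomes `assistant`.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's message classes."""
    type: str      # "system", "human" or "ai", like SystemMessage / HumanMessage / AIMessage
    content: str


# Assumed mapping onto the OpenAI-style roles used by chat-completion APIs.
ROLE_FOR_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


def to_api_payload(messages: list[Message]) -> list[dict]:
    """Convert typed messages into the role/content dicts a chat API expects."""
    return [{"role": ROLE_FOR_TYPE[m.type], "content": m.content} for m in messages]


payload = to_api_payload([
    Message("system", "You are a helpful assistant."),
    Message("human", "Hi I am NK and learning how to create AI Agents!"),
])
print(payload)
```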


In [3]:
from langchain_core.messages import HumanMessage
model.invoke([HumanMessage(content="Hi I am NK and learning how to create AI Agents!")])

AIMessage(content="Hello NK, it's great to meet you. Learning to create AI Agents can be a fascinating and rewarding experience. AI Agents are a fundamental concept in Artificial Intelligence, and understanding them can open doors to a wide range of applications, from robotics and game playing to expert systems and autonomous vehicles.\n\nTo get started, what level of experience do you have in programming and AI-related concepts? Are you familiar with any programming languages or AI frameworks, such as Python, TensorFlow, or PyTorch?\n\nAlso, are you interested in a specific type of AI Agent, such as:\n\n1. Rule-based Agents (e.g., decision trees, expert systems)\n2. Machine Learning Agents (e.g., neural networks, reinforcement learning)\n3. Hybrid Agents (combining rule-based and machine learning approaches)\n4. Other (please specify)\n\nLet me know, and I'll be happy to guide you through the process!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 182,

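***Output Breakdown*** (the numbers below are illustrative and differ from the truncated output above; token counts and timings vary from run to run)

- `AIMessage` - The assistant's reply
    - `content` - the generated text
    - `additional_kwargs={}` - empty here, but sometimes used for extra provider data such as raw tool-call payloads
    - `response_metadata` - provider-specific details, including `model_name`, `finish_reason`, and:
        - `token_usage`
            - completion_tokens=282 → The model generated 282 tokens in its answer.
            - prompt_tokens=47 → Your input messages consumed 47 tokens.
            - total_tokens=329 → Combined usage (47 + 282).
            - completion_time=0.37s → Time taken to generate the response.
            - queue_time=0.006s → Time spent waiting in Groq's inference queue.
            - total_time=0.374s → Prompt processing plus generation time (`prompt_time + completion_time`); queue time is reported separately.
    - `usage_metadata` - the same token counts in LangChain's provider-independent format
        - 'input_tokens': 47
        - 'output_tokens': 282
        - 'total_tokens': 329
    - `id` - a unique identifier for this run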
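The fields in `token_usage` can be combined into a few handy numbers. This sketch uses the values from the `response_metadata` of the "What's my name" call in the next cell; `summarize_usage` is a hypothetical helper. In every response shown in this notebook, `total_time` equals `prompt_time + completion_time`, so it does not include queue time.

```python
# token_usage values copied from the "What's my name" response in the next cell.
token_usage = {
    "completion_tokens": 16,
    "prompt_tokens": 109,
    "total_tokens": 125,
    "completion_time": 0.01732335,
    "prompt_time": 0.007533376,
    "queue_time": 0.005171025,
    "total_time": 0.024856726,
}


def summarize_usage(usage: dict) -> dict:
    """Derive a few handy numbers from a Groq token_usage dict."""
    return {
        # total_tokens is simply prompt + completion
        "tokens_add_up": usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"],
        # prompt processing plus generation (queue_time is reported separately)
        "processing_time": usage["prompt_time"] + usage["completion_time"],
        # generation speed in output tokens per second
        "tokens_per_second": usage["completion_tokens"] / usage["completion_time"],
    }


print(summarize_usage(token_usage))
```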

#### ***Passing Multiple Messages When Invoking the Model***

In [4]:
from langchain_core.messages import AIMessage
model.invoke(
    [
        HumanMessage(content="Hi I am NK and learning how to create AI Agents!"),
        AIMessage(content="Nice to meet you, NK. Learning to create AI agents can be a fascinating and rewarding experience. There are several types of AI agents, including rule-based agents, machine learning agents, and hybrid agents.\n"),
        HumanMessage(content="Hey What's my name and what do I do?")
    ]
)

AIMessage(content="Your name is NK, and you're learning how to create AI agents.", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 109, 'total_tokens': 125, 'completion_time': 0.01732335, 'completion_tokens_details': None, 'prompt_time': 0.007533376, 'prompt_tokens_details': None, 'queue_time': 0.005171025, 'total_time': 0.024856726}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_9ca2574dca', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b2880-ce5a-7520-8007-59a925df7679-0', usage_metadata={'input_tokens': 109, 'output_tokens': 16, 'total_tokens': 125})

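***Important thing to notice***

***The model answered correctly only because we passed the earlier turns ourselves: the model is stateless and sees nothing beyond the messages in the current call. Also, each call returns an `AIMessage` with a different `id`, since every invocation is a separate run.***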
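The same idea without LangChain: a stateless function can only answer from what it receives, so the caller must resend earlier turns. `fake_model` is a hypothetical stand-in for the LLM with a hard-coded rule, not a real model.

```python
def fake_model(messages: list[tuple[str, str]]) -> str:
    """Hypothetical stand-in for a chat model: it can only use the messages passed in."""
    question = messages[-1][1].lower()
    if "my name" in question:
        for role, text in messages[:-1]:
            if role == "human" and "I am " in text:
                name = text.split("I am ", 1)[1].split()[0]
                return f"Your name is {name}."
        return "I don't know your name."
    return "Nice to meet you!"


# 1) Stateless: each call sees only its own argument.
print(fake_model([("human", "Hi I am NK")]))        # Nice to meet you!
print(fake_model([("human", "What's my name?")]))   # I don't know your name.

# 2) Manual history: resend earlier turns, as in the cell above.
history: list[tuple[str, str]] = []
for text in ["Hi I am NK", "What's my name?"]:
    history.append(("human", text))
    reply = fake_model(history)
    history.append(("ai", reply))
print(reply)                                        # Your name is NK.
```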

#### ***Message History***

***We can use a Message History class to wrap our model and make it stateful. This will keep track of inputs and outputs of the model, and store them in some datastore. Future interactions will then load those messages and pass them into the chain as part of the input. Let's see how to use this!***

***Info on Imports***

- `ChatMessageHistory` - An in-memory chat history store. The `langchain_community.chat_message_histories` module it comes from also provides other storage backends (files, databases, Streamlit session state)
    - The other backends are useful when you want to persist conversations across sessions or store them externally
    - `FileChatMessageHistory` stores messages in a JSON file
    - `StreamlitChatMessageHistory` stores them in Streamlit session state
- `BaseChatMessageHistory` - Defines the interface for chat history. Any custom history implementation must subclass it
- `RunnableWithMessageHistory` - Wrapper that takes any runnable (like a chain or chat model) and automatically loads and saves chat history for it
    - Useful for building chatbots that keep conversation state without manually passing the history on every call
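To make the `BaseChatMessageHistory` contract concrete, here is a simplified, hypothetical version of it with two backends: an in-memory list (like `ChatMessageHistory`) and a JSON file (similar in spirit to `FileChatMessageHistory`). The class names are made up, and the real interface works with LangChain message objects rather than plain dicts.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class SimpleChatHistory(ABC):
    """Simplified, hypothetical version of the BaseChatMessageHistory contract."""

    @property
    @abstractmethod
    def messages(self) -> list[dict]: ...

    @abstractmethod
    def add_messages(self, messages: list[dict]) -> None: ...

    @abstractmethod
    def clear(self) -> None: ...


class InMemoryHistory(SimpleChatHistory):
    """Keeps messages in a Python list (lost when the process exits)."""

    def __init__(self) -> None:
        self._messages: list[dict] = []

    @property
    def messages(self) -> list[dict]:
        return list(self._messages)

    def add_messages(self, messages: list[dict]) -> None:
        self._messages.extend(messages)

    def clear(self) -> None:
        self._messages.clear()


class JsonFileHistory(SimpleChatHistory):
    """Persists messages to a JSON file so they survive restarts."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    @property
    def messages(self) -> list[dict]:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def add_messages(self, messages: list[dict]) -> None:
        self.path.write_text(json.dumps(self.messages + messages))

    def clear(self) -> None:
        self.path.write_text("[]")
```

Because both classes satisfy the same interface, code that only calls `messages`, `add_messages`, and `clear` can switch storage backends without changes.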

In [5]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# Setting up per-session chat history store
store = {}

# Function to get or create chat history for a session
def get_session_history(session_id:str)->BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

with_message_history = RunnableWithMessageHistory(model, get_session_history)
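Conceptually, on every call the wrapper looks up the session's history with `get_session_history`, prepends it to the new input, runs the wrapped model, and saves both the new input and the output. The sketch below models that loop in plain Python; `invoke_with_history`, `echo_model`, and `session_store` are hypothetical names, not LangChain's implementation.

```python
from collections import defaultdict

# Messages are (role, text) tuples for brevity; one list per session_id.
session_store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def echo_model(messages):
    """Stand-in model that reports how many messages it was given."""
    return ("ai", f"I can see {len(messages)} message(s)")


def invoke_with_history(runnable, new_messages, session_id):
    history = session_store[session_id]          # 1. look up (or create) this session's history
    output = runnable(history + new_messages)    # 2. run with history + new input
    history.extend(new_messages + [output])      # 3. save the new input and the output
    return output


print(invoke_with_history(echo_model, [("human", "Hi I am NK")], "chat1"))      # ('ai', 'I can see 1 message(s)')
print(invoke_with_history(echo_model, [("human", "What's my name?")], "chat1"))  # ('ai', 'I can see 3 message(s)')
print(invoke_with_history(echo_model, [("human", "What's my name?")], "chat2"))  # ('ai', 'I can see 1 message(s)')
```

A new `session_id` starts from an empty list, which is why the cells below forget the user's name when switching from `chat1` to `chat2`.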

In [7]:
config = {"configurable":{"session_id":"chat1"}}

response = with_message_history.invoke(
    [HumanMessage(content = "Hi I am NK and learning how to create AI Agents!")],
    config = config
)

In [8]:
response.content

"Hello NK.  I'd be happy to help you learn about creating AI Agents. Creating AI Agents involves several key components, including decision-making processes, knowledge representation, and interaction with the environment.\n\nThere are several types of AI Agents, including:\n\n1. **Simple Reflex Agents**: These agents make decisions based only on the current state of the environment. They do not have memory or the ability to learn from past experiences.\n2. **Model-Based Agents**: These agents maintain an internal model of the environment, which they use to make predictions and decisions.\n3. **Goal-Based Agents**: These agents have clear goals and use knowledge and reasoning to achieve those goals.\n4. **Utility-Based Agents**: These agents make decisions based on a utility function that measures the expected outcome of each action.\n5. **Learning Agents**: These agents can learn from their experiences and adapt to new situations.\n6. **Multi-Agent Systems**: These agents interact with

In [10]:
with_message_history.invoke(
    [HumanMessage(content="What's my name and what do I do?")],
    config=config,
)

AIMessage(content="Your name is NK, and you're learning how to create AI Agents!", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 581, 'total_tokens': 598, 'completion_time': 0.025463836, 'completion_tokens_details': None, 'prompt_time': 0.045098767, 'prompt_tokens_details': None, 'queue_time': 0.010463668, 'total_time': 0.070562603}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_6c980774ec', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b2882-454e-76d0-865d-ce70059f7720-0', usage_metadata={'input_tokens': 581, 'output_tokens': 17, 'total_tokens': 598})

In [12]:
store.keys()

dict_keys(['chat1'])

In [13]:
# a different session_id starts a separate, empty conversation
config1 = {"configurable":{"session_id":"chat2"}}
response = with_message_history.invoke(
    [HumanMessage(content="Whats my name")],
    config=config1
)
response.content

"I don't have any information about you, so I don't know your name. I'm a conversational AI, and our conversation just started. If you'd like to share your name with me, I'd be happy to chat with you."

In [14]:
response=with_message_history.invoke(
    [HumanMessage(content="Hey My name is John")],
    config=config1
)
response.content

"It's nice to meet you, John. How's your day going so far?"

In [15]:
response=with_message_history.invoke(
    [HumanMessage(content="Whats my name")],
    config=config1
)
response.content

'Your name is John.'

In [19]:
# print the dictionary store
for key, value in store.items():
    print(f"Session ID: {key}")
    print("Chat History:")
    for message in value.messages:
        print(f"- {message}")
    print()  # Add an empty line between sessions   

Session ID: chat1
Chat History:
- content='Hi I am NK and learning how to create AI Agents!' additional_kwargs={} response_metadata={}
- content="Hello NK.  I'd be happy to help you learn about creating AI Agents. Creating AI Agents involves several key components, including decision-making processes, knowledge representation, and interaction with the environment.\n\nThere are several types of AI Agents, including:\n\n1. **Simple Reflex Agents**: These agents make decisions based only on the current state of the environment. They do not have memory or the ability to learn from past experiences.\n2. **Model-Based Agents**: These agents maintain an internal model of the environment, which they use to make predictions and decisions.\n3. **Goal-Based Agents**: These agents have clear goals and use knowledge and reasoning to achieve those goals.\n4. **Utility-Based Agents**: These agents make decisions based on a utility function that measures the expected outcome of each action.\n5. **Lear

#### ***Prompt templates***

***Prompt templates turn raw user input into a format that the LLM can work with. So far, the raw user input has just been a list of messages that we pass straight to the LLM. Let's make that a bit more complicated: first, add a system message with custom instructions (still taking messages as input); next, add more inputs besides the messages.***

- `ChatPromptTemplate` - Template for constructing prompts made up of multiple chat messages (System, Human, AI)
- `MessagesPlaceholder` - Special Placeholder that lets you inject a list of messages (like past conversation history) into the prompt dynamically
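Conceptually, invoking such a template fills any `{placeholders}` in the system text and splices the list passed under the placeholder's key into the final message list. A plain-Python approximation (the `format_prompt` function is hypothetical, and messages are `(role, text)` tuples):

```python
def format_prompt(system_template: str, inputs: dict) -> list[tuple[str, str]]:
    """Fill {variables} in the system text, then append the messages from inputs["messages"]."""
    variables = {k: v for k, v in inputs.items() if k != "messages"}
    system = ("system", system_template.format(**variables))
    return [system] + list(inputs["messages"])


prompt_messages = format_prompt(
    "You are a helpful assistant. Answer all questions to the best of your ability in {language}.",
    {"messages": [("human", "Hi My name is NK")], "language": "Hindi"},
)
print(prompt_messages)
```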

In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
prompt = ChatPromptTemplate.from_messages(
    [
        ("system","You are a helpful assistant.Answer all the question to the best of your ability"),
        MessagesPlaceholder(variable_name="messages")
    ]
)

# MessagesPlaceholder lets you pass the chat history in dynamically at runtime:
# invoke the chain with a dict whose "messages" key holds a list of HumanMessage / AIMessage objects

chain = prompt | model

In [21]:
chain.invoke({"messages":[HumanMessage(content="Hi My name is NK")]})

AIMessage(content="Hello NK. It's nice to meet you. Is there something I can help you with or would you like to chat?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 26, 'prompt_tokens': 56, 'total_tokens': 82, 'completion_time': 0.026865983, 'completion_tokens_details': None, 'prompt_time': 0.004668943, 'prompt_tokens_details': None, 'queue_time': 0.06880162, 'total_time': 0.031534926}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_6b5c123dd9', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b28da-1801-7923-b5aa-b73c752e61d5-0', usage_metadata={'input_tokens': 56, 'output_tokens': 26, 'total_tokens': 82})

In [22]:
with_message_history = RunnableWithMessageHistory(chain, get_session_history)

In [None]:
config = {"configurable": {"session_id": "chat3"}}
response = with_message_history.invoke(
    [HumanMessage(content="Hi My name is NK")],
    config = config
)

response

AIMessage(content='Nice to meet you, NK. How are you today? Is there anything I can help you with?', additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 22, 'prompt_tokens': 56, 'total_tokens': 78, 'completion_time': 0.034498413, 'completion_tokens_details': None, 'prompt_time': 0.003009483, 'prompt_tokens_details': None, 'queue_time': 0.005380644, 'total_time': 0.037507896}, 'model_name': 'llama-3.1-8b-instant', 'system_fingerprint': 'fp_29e590f0c0', 'service_tier': 'on_demand', 'finish_reason': 'stop', 'logprobs': None, 'model_provider': 'groq'}, id='lc_run--019b28da-f7e3-7091-8734-e6787c7695fc-0', usage_metadata={'input_tokens': 56, 'output_tokens': 22, 'total_tokens': 78})

In [24]:
response = with_message_history.invoke(
    [HumanMessage(content="What's my name?")],
    config=config,
)

response.content

'Your name is NK.'

In [25]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system","You are a helpful assistant.Answer all the question to the best of your ability in {language}"),
        MessagesPlaceholder(variable_name="messages")
    ]
)

chain = prompt | model

In [26]:
response = chain.invoke({"messages":[HumanMessage(content="Hi My name is NK")],"language":"Hindi"})
response.content

'नमस्ते NK, मैं आपकी सहायता के लिए यहाँ हूँ। क्या मुझे कुछ बताना है या आपके पास कोई प्रश्न है?'

Let's now wrap this more complicated chain in a Message History class. This time, because there are multiple keys in the input, we need to specify the correct key to use to save the chat history.
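When the input is a dict, the wrapper needs `input_messages_key` to know which entry holds the new messages: those are combined with the stored history and saved afterwards, while keys such as `language` pass through unchanged. A hypothetical sketch of that step (`prepare_input` is not a LangChain function):

```python
def prepare_input(inputs: dict, history: list, input_messages_key: str = "messages") -> tuple[dict, list]:
    """Prepend stored history under input_messages_key; return (model input, messages to save)."""
    new_messages = list(inputs[input_messages_key])
    merged = dict(inputs)                                  # keep "language" and any other keys
    merged[input_messages_key] = history + new_messages   # history first, then the new turn
    return merged, new_messages                           # only new_messages are saved afterwards


history = [("human", "Hi, I am NK"), ("ai", "Namaste NK")]
merged, to_save = prepare_input({"messages": [("human", "whats my name?")], "language": "Hindi"}, history)
print(merged["language"])       # Hindi
print(len(merged["messages"]))  # 3
```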

In [27]:
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages"
)

In [28]:
config = {"configurable": {"session_id": "chat4"}}
response = with_message_history.invoke(
    {'messages': [HumanMessage(content="Hi,I am NK")],"language":"Hindi"},
    config=config
)
response.content

'नमस्ते NK, मैं आपकी मदद करने के लिए तैयार हूँ। क्या मैं आपकी किसी विशिष्ट समस्या या प्रश्न का समाधान करने में आपकी सहायता कर सकता हूँ?'

In [29]:
response = with_message_history.invoke(
    {"messages": [HumanMessage(content="whats my name?")], "language": "Hindi"},
    config=config,
)

In [30]:
response.content

'तुम्हारा नाम NK है।'

#### ***Managing the Conversation History***

- One important concept to understand when building chatbots is how to manage conversation history. If left unmanaged, the list of messages will grow unbounded and potentially overflow the context window of the LLM. Therefore, it is important to add a step that limits the size of the messages you are passing in.
- We can use the `trim_messages` helper to reduce how many messages we send to the model. The trimmer lets us specify how many tokens to keep, along with options such as whether to always keep the system message and whether to allow partial messages. `strategy="last"` keeps the most recent messages, and `start_on="human"` makes the kept slice begin with a human message.
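The core of the `strategy="last"` behaviour can be sketched in plain Python. This is a simplified, hypothetical re-implementation, not LangChain's code: it approximates tokens by word count (the real cell uses `token_counter=model`) and ignores options such as `end_on`. With this made-up counter and `max_tokens=21`, it happens to keep the same messages as the real trimmer output in the next cell.

```python
def count_tokens(message: tuple[str, str]) -> int:
    """Crude stand-in token counter: number of whitespace-separated words."""
    return len(message[1].split())


def trim_last(messages, max_tokens, token_counter=count_tokens):
    """Keep the system message plus the newest messages that fit within max_tokens."""
    system = [messages[0]] if messages and messages[0][0] == "system" else []  # include_system=True
    rest = messages[len(system):]
    budget = max_tokens - sum(token_counter(m) for m in system)

    kept = []
    for message in reversed(rest):             # walk backwards from the newest message
        cost = token_counter(message)
        if cost > budget:                      # allow_partial=False: never cut a message in half
            break
        kept.append(message)
        budget -= cost
    kept.reverse()

    while kept and kept[0][0] != "human":      # start_on="human": drop leading AI messages
        kept.pop(0)
    return system + kept


messages = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"), ("ai", "hi!"),
    ("human", "I like vanilla ice cream"), ("ai", "nice"),
    ("human", "whats 2 + 2"), ("ai", "4"),
    ("human", "thanks"), ("ai", "no problem!"),
    ("human", "having fun?"), ("ai", "yes!"),
]

for m in trim_last(messages, max_tokens=21):
    print(m)
```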

In [31]:
from langchain_core.messages import SystemMessage, trim_messages
trimmer = trim_messages(
    max_tokens=45,
    strategy="last",
    token_counter=model,
    include_system=True,
    allow_partial=False,
    start_on="human"
)

messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

trimmer.invoke(messages)


[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='I like vanilla ice cream', additional_kwargs={}, response_metadata={}),
 AIMessage(content='nice', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='whats 2 + 2', additional_kwargs={}, response_metadata={}),
 AIMessage(content='4', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

In [32]:
from operator import itemgetter

from langchain_core.runnables import RunnablePassthrough

chain = (
    RunnablePassthrough.assign(messages=itemgetter("messages") | trimmer)
    | prompt
    | model
    
)

response = chain.invoke(
    {
    "messages":messages + [HumanMessage(content="What ice cream do i like")],
    "language":"English"
    }
)
response.content

"I'm not aware of your preferences, but I can help you explore different types of ice cream if you'd like. Do you have a favorite flavor or type of ice cream?"

In [33]:
response = chain.invoke(
    {
        "messages": messages + [HumanMessage(content="what math problem did i ask")],
        "language": "English",
    }
)
response.content

'You asked the math problem 2 + 2.'

In [34]:
# Let's wrap this chain in the message history
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="messages",
)
config={"configurable":{"session_id":"chat5"}}
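Because the trimmer sits inside the wrapped chain, the session store keeps the full, untrimmed conversation while only the trimmed window reaches the model. Expect the next two cells to "forget" older turns even though the store still holds them. The same window most likely explains the earlier ice-cream answer: the new question is appended before trimming, which can push "I like vanilla ice cream" out of the 45-token budget. A hypothetical simulation, using a window of the last few messages as a stand-in for the token-based trimmer (`send` and `full_history` are made-up names):

```python
full_history: list[tuple[str, str]] = []


def send(user_text: str, window: int = 4) -> list[tuple[str, str]]:
    """Return the messages the model would actually see for this turn."""
    request = full_history + [("human", user_text)]
    visible = request[-window:]                          # trimmed window sent to the model
    reply = ("ai", f"(reply to {user_text!r})")
    full_history.extend([("human", user_text), reply])   # the store still grows unbounded
    return visible


send("whats 2 + 2")
send("thanks")
visible = send("what math problem did i ask?")
print(len(full_history))                      # 6
print(("human", "whats 2 + 2") in visible)    # False
```

If the stored history itself should stay bounded, one option is to trim or clear it inside `get_session_history` as well; that is a design choice, not something the cells here do.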

In [35]:
response = with_message_history.invoke(
    {
        "messages": messages + [HumanMessage(content="whats my name?")],
        "language": "English",
    },
    config=config,
)

response.content

"You didn't tell me your name, so I don't know what it is. Would you like to share?"

In [36]:
response = with_message_history.invoke(
    {
        "messages": [HumanMessage(content="what math problem did i ask?")],
        "language": "English",
    },
    config=config,
)

response.content

"You didn't ask a math problem yet. Our conversation has just started. What would you like to ask or discuss?"