<a href="https://colab.research.google.com/github/jchen8000/DemystifyingLLMs/blob/main/6_Deployment/Chatbot_LangChain_Groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6. Deployment of LLMs

## 6.11 Chatbot, Example of LLM-Powered Application

### **Building a Simple Chatbot with LangChain and a Groq Model**

This notebook demonstrates how to use **LangChain** to build a simple, interactive chatbot powered by a **GroqCloud-hosted large language model**.  
The example shows how to structure conversational prompts, maintain chat history, and generate context-aware responses using modern LangChain patterns.

### Model and API Requirements

- The chatbot uses a language model served by **GroqCloud**
- A valid **`GROQ_API_KEY`** is required to authenticate requests to the Groq API (see https://console.groq.com/docs/quickstart)
- The API key is expected to be provided via your runtime environment (for example, as an environment variable or through notebook secrets)

### What this example covers

- Initializing a Groq-backed chat model with LangChain
- Structuring conversations using system, human, and assistant messages
- Maintaining short-term chat history for coherent multi-turn conversations
- Invoking the model to generate responses in an interactive loop

### Compatibility Note

> **Important**  
> This code has been tested and confirmed to work as of **January 2026**.  
> However, both **LangChain** and **GroqCloud models / SDKs** are actively evolving.  
> Future changes to package versions, APIs, or available models may require updates to this notebook for continued compatibility.

This example is intended as a **learning and reference implementation** to illustrate current best practices rather than a permanently stable API contract.


### 1. Install required packages

In [None]:
%pip install -q \
  groq==0.37.1 \
  langchain==1.2.3 \
  langchain-core==1.2.6 \
  langchain_groq==1.1.1
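
Because these packages change quickly, it can help to confirm which versions a runtime actually has installed. The sketch below uses only the standard library; the `PINNED` dictionary and the `installed_versions` helper are illustrative names, not part of the original notebook, and the version numbers are simply copied from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install command above (illustrative dictionary).
PINNED = {
    "groq": "0.37.1",
    "langchain": "1.2.3",
    "langchain-core": "1.2.6",
    "langchain-groq": "1.1.1",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, installed in installed_versions(PINNED).items():
    status = "ok" if installed == PINNED[name] else "differs"
    print(f"{name}: pinned {PINNED[name]}, installed {installed} ({status})")
```

A `differs` line does not necessarily mean the notebook will fail; it only signals that the runtime is not the configuration the notebook was tested with.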

### 2. API Key from GroqCloud

GroqCloud is a cloud-based platform developed by Groq, designed to provide high-speed AI inference capabilities. It supports a variety of large language models (LLMs) from different developers.

You will need to sign up for an account on the platform to obtain an API key. At the time this example was created, an API key could be obtained for free.

This example uses **Colab Secrets**, a Google Colab feature for securely managing sensitive information such as API keys. In the left sidebar of Google Colab, click the key icon to open the Secrets section, add a new secret, name it **GROQ_API_KEY**, and enter the API key you obtained from GroqCloud as its value.

The code snippet below shows how to retrieve it.

Note: if you are not using Google Colab, manage the key another way, and make sure it stays secure and is not exposed to others.
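
Outside Colab, one common option is to read the key from an environment variable and prompt for it only when it is missing. The following is a minimal sketch under that assumption; `load_groq_api_key` is a hypothetical helper name, not part of the original notebook or of the Groq SDK.

```python
import os
from getpass import getpass


def load_groq_api_key(env_var="GROQ_API_KEY"):
    """Return the key from the environment, prompting only if it is missing.

    Hypothetical helper for non-Colab runtimes.
    """
    key = os.environ.get(env_var)
    if not key:
        # getpass hides the typed value so it does not appear in the output.
        key = getpass(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key


# Example usage (commented out because it may prompt for input):
# client = Groq(api_key=load_groq_api_key())
```

Keeping the key in the environment rather than in the notebook source means it is not saved with the file or committed to version control.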

In [None]:
from google.colab import userdata
from groq import Groq

# Create the Groq client with the key stored in Colab Secrets.
client = Groq(
    api_key=userdata.get('GROQ_API_KEY')
)

GROQ_MODEL = "llama-3.1-8b-instant"

### 3. A Sample Call to Groq

Reference: https://console.groq.com/docs/api-reference#chat-create

Note: you can specify optional parameters such as *temperature*, *max_tokens*, and *top_p*.
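
To build intuition for what *temperature* and *top_p* do, the toy sketch below applies them to a hand-written list of scores. It illustrates the general sampling ideas only and is not Groq's implementation; the function names and the example logits are assumptions made for this illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their combined mass reaches top_p,
    then renormalize the kept probabilities so they sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])

print(top_p_filter([0.5, 0.3, 0.2], top_p=0.7))
```

Lower temperatures concentrate probability on the highest-scoring token, while a smaller `top_p` removes low-probability candidates before sampling.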

In [None]:
model = GROQ_MODEL
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Who are you and please introduce yourself."
        }
    ],

    # Controls randomness: lowering results in less random completions.
    # As the temperature approaches zero, the model will become deterministic
    # and repetitive.
    temperature=1,

    # The maximum number of tokens to generate in the completion. The prompt
    # and completion together must also fit within the model's context window.
    max_tokens=2048,

    # Controls diversity via nucleus sampling: 0.5 means half of all
    # likelihood-weighted options are considered.
    top_p=1,

    # If set, partial message deltas will be sent.
    stream=True,

    # A stop sequence is a predefined or user-specified text string that
    # signals an AI to stop generating content, ensuring its responses
    # remain focused and concise. Examples include punctuation marks and
    # markers like "[end]".
    stop=None,
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")



I'm LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but a computer program designed to simulate conversation and answer questions to the best of my knowledge based on the data I've been trained on.

I'm a large language model, which means I've been trained on a massive dataset of text from various sources, including books, articles, research papers, and more. This training allows me to understand and generate human-like language, making me seem intelligent and conversational.

When you interact with me, I can:

1. Answer questions: I can provide information on a wide range of topics, from science and history to entertainment and culture.
2. Generate text: I can create text based on a prompt or topic, which can be useful for writing, proofreading, or even generating language samples.
3. Chat: I can engage in conversations, responding to questions and statements in a conversational manner.
4. Translat

### 4. Create a function to interact with Groq

In [None]:
def ask_groq(input_text,
             model=GROQ_MODEL,
             temperature=1,
             max_tokens=2048,
             top_p=1,
             stream=True,
             stop=None):
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": input_text
            }],

        temperature=temperature,
        max_tokens=max_tokens,
        top_p=top_p,
        stream=stream,
        stop=stop
        )
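    # Without streaming, the API returns the complete message in one object.
    if not stream:
        return completion.choices[0].message.content

    # With streaming, concatenate the text deltas from each chunk.
    response = ""
    for chunk in completion:
        response += chunk.choices[0].delta.content or ""

    return response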

response = ask_groq("Tell a story about a mermaid flying on the sky.")
print(response)

What a fantastical tale I have for you! In a land far, far away, where the ocean meets the sky, there lived a mermaid named Luna. She was a curious and adventurous soul, with shimmering scales that shone like the moon and sparkling hair that rippled like the seaweed in the ocean currents.

One day, while swimming near the surface of the sea, Luna gazed longingly at the sky above. She had always dreamed of flying, just like the birds she had seen soaring overhead. The sea breeze rustled her hair as she watched a majestic swan glide effortlessly across the horizon.

Driven by her curiosity, Luna sought out the wisest sea sage in the land, an ancient octopus named Aethon. She asked him for secret knowledge on how to defy gravity and touch the sky. Aethon, with his twinkling eyes and wiggly tentacles, revealed that only a select few mermaids possessed the gift of aeromancy, the magic of air and wind.

Aethon handed Luna a delicate, intricately carved seashell, imbued with the essence of a 

### 5. Example of Chatbot using LangChain with chat history

This example shows how to build a simple chatbot with LangChain to maintain user chat history, allowing for more context-aware and coherent conversations.

---

#### Objects and concepts used in this example

* **ChatGroq**  
  Initializes the connection to the Groq API using the provided API key and model name.  
  It is responsible for sending a list of structured chat messages to the Groq-hosted model and returning the model’s response.

* **SystemMessage / HumanMessage / AIMessage**  
  These message objects define the conversational structure:
  - `SystemMessage` sets persistent behavior and instructions for the assistant
  - `HumanMessage` represents user input
  - `AIMessage` represents the model’s response  

  Together, they form the conversational context passed to the model on each turn.

* **Manual chat history (application-managed memory)**  
  Instead of `ConversationBufferWindowMemory`, the conversation history is stored in a Python list containing `HumanMessage` and `AIMessage` objects.  
  A rolling window is enforced to keep only the most recent interactions (e.g. the last 5 turns), ensuring:
  - Controlled token usage
  - Relevant short-term context
  - Predictable behavior across versions

* **ChatPromptTemplate (optional)**  
  When used, this object defines how system messages, historical messages, and the current user input are combined into a prompt.  
  In simpler cases, messages can be passed directly to the model without a template.

* **Model invocation (`invoke`)**  
  The chatbot sends the full conversational context—including the system message, chat history, and current user input—to the model using `invoke()`.  
  The returned response is then printed and appended to the chat history for subsequent turns.

---

This example follows a LangChain pattern that, as of January 2026, maintains conversational context without relying on deprecated abstractions.
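
The rolling window described above can also be sketched with the standard library alone. In this illustration, plain `(role, text)` tuples stand in for `HumanMessage` and `AIMessage` objects, and `MAX_TURNS` and `record_turn` are hypothetical names. A `deque` with `maxlen` drops the oldest entries automatically, which is an alternative to slicing a list after every turn.

```python
from collections import deque

MAX_TURNS = 5  # hypothetical window size, matching "the last 5 turns"

# Each turn adds two messages (human + assistant), so keep 2 * MAX_TURNS.
history = deque(maxlen=2 * MAX_TURNS)


def record_turn(history, user_text, assistant_text):
    """Append one full turn; the deque discards the oldest messages itself."""
    history.append(("human", user_text))
    history.append(("ai", assistant_text))


# Simulate 7 turns; only the most recent 5 should remain.
for turn in range(1, 8):
    record_turn(history, f"question {turn}", f"answer {turn}")

print(len(history))  # 10
print(history[0])    # ('human', 'question 3')
```

Because messages are always added in pairs and the window size is even, the window never starts with an orphaned assistant message.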

In [None]:
from google.colab import userdata
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_groq import ChatGroq

def chat_with_groq(model):
    groq_api_key = userdata.get("GROQ_API_KEY")

    llm = ChatGroq(
        groq_api_key=groq_api_key,
        model_name=model,
    )

    # The application manages memory: a plain list of
    # HumanMessage and AIMessage objects, passed to the model on every turn.
    chat_history = []

    print("Chatbot: How can I help you? (type 'quit' to stop)\n")

    while True:
        user_input = input("You: ")
        if user_input.lower() == "quit":
            break

        response = llm.invoke([
            SystemMessage(content="You are a helpful AI assistant."),
            *chat_history,
            HumanMessage(content=user_input)
        ])

        chat_history.append(HumanMessage(content=user_input))
        chat_history.append(AIMessage(content=response.content))

        # Keep the last 5 turns (10 messages: one human and one AI message per turn).
        chat_history[:] = chat_history[-10:]

        print("Chatbot:", response.content)
        print()

model = GROQ_MODEL
chat_with_groq(model)

Chatbot: How can I help you? (type 'quit' to stop)!

You: How are you?


Chatbot: I'm functioning properly, thank you for asking. I'm a computer program designed to assist and communicate with users, so I don't have feelings or emotions like humans do. However, I'm here to help answer any questions or provide information on a wide range of topics, so please feel free to ask me anything. How can I assist you today?



You: what is the distance from Tokyo to New York?
Chatbot: The distance from Tokyo, Japan to New York City, USA is approximately 6,760 miles (10,880 kilometers). This is the straight-line distance, also known as the "as the crow flies" distance.

However, if you're referring to the distance by air, which is more relevant for flights, the approximate distance is:

* 6,742 miles (10,846 kilometers) for a direct flight from Tokyo's Narita International Airport (NRT) to New York's John F. Kennedy International Airport (JFK)
* 6,784 miles (10,916 kilometers) for a direct flight from Tokyo's Haneda Airport (HND) to New York's John F. Kennedy Internat

### 6. Use RunnableSequence instead of LLMChain

The following example demonstrates how to use **RunnableSequence** (LCEL) instead of the deprecated `LLMChain`.  

RunnableSequence is created using the `|` operator to chain a prompt template and a language model into a single executable pipeline.
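
To see why `|` can build a pipeline at all, the toy sketch below mimics the idea with a small class that overloads Python's `__or__` method. It is not LangChain's implementation: `Step`, `format_prompt`, and `fake_model` are hypothetical names, plain tuples stand in for message objects, and the "model" is a stub that makes no API call.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b returns a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


def format_prompt(inputs):
    """Combine the system message, prior history, and the new user input."""
    return [
        ("system", "You are a helpful AI assistant."),
        *inputs["chat_history"],
        ("human", inputs["human_input"]),
    ]


def fake_model(messages):
    """Stub model: reports what it received instead of calling an API."""
    return f"received {len(messages)} messages; last: {messages[-1][1]}"


toy_conversation = Step(format_prompt) | Step(fake_model)

result = toy_conversation.invoke({
    "human_input": "Hi",
    "chat_history": [("human", "Hello"), ("ai", "Hello! How can I help?")],
})
print(result)  # received 4 messages; last: Hi
```

The input dictionary has the same shape as the one passed to `invoke()` in the real chain: the history is supplied at call time, and the prompt step decides where it is inserted.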

---

#### Objects used in this example

* **ChatGroq**  
  Initializes the connection to the Groq API using the provided API key and model name. It is responsible for sending the formatted messages to the Groq-hosted LLM and returning the model’s response.

* **ChatPromptTemplate**  
  Defines how messages are structured before being sent to the model.  
  It includes:
  - A persistent system message
  - A placeholder for chat history
  - A template for the current user input

* **MessagesPlaceholder**  
  Acts as an insertion point for historical messages. The application supplies the current `chat_history` list at runtime.

* **RunnableSequence**  
  Created using the `|` operator:
  ```python
  conversation = prompt | groq_chat
  ```



In [None]:
from google.colab import userdata
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
)
from langchain_groq import ChatGroq


def chat_with_groq2(model):
    groq_api_key = userdata.get("GROQ_API_KEY")

    groq_chat = ChatGroq(
        groq_api_key=groq_api_key,
        model_name=model,
    )

    print("Chatbot: How can I help you? (type 'quit' to stop)!\n")

    chat_history = []

    prompt = ChatPromptTemplate.from_messages([
        SystemMessage(content="You are a helpful AI assistant."),
        MessagesPlaceholder(variable_name="chat_history"),
        ("human", "{human_input}"),
    ])

    conversation = prompt | groq_chat  # <-- RunnableSequence

    MAX_TURNS = 5
    MAX_MESSAGES = MAX_TURNS * 2

    while True:
        user_question = input("You: ")
        if user_question.lower() == "quit":
            break

        response = conversation.invoke({
            "human_input": user_question,
            "chat_history": chat_history,
        })

        print("Chatbot:", response.content)
        print()

        # Update manual memory
        chat_history.append(HumanMessage(content=user_question))
        chat_history.append(AIMessage(content=response.content))

        # Keep only the last MAX_TURNS turns (MAX_MESSAGES messages)
        chat_history[:] = chat_history[-MAX_MESSAGES:]


chat_with_groq2(model=GROQ_MODEL)


Chatbot: How can I help you? (type 'quit' to stop)!

You: What's your name, and please introduce yourself.
Chatbot: Nice to meet you! My name is Lumin, and I'm a friendly AI assistant designed to provide helpful and informative responses to your questions and engage in conversations. I'm here to assist you with any topics you'd like to discuss, from science and technology to entertainment and culture.

I'm a large language model, which means I've been trained on a massive dataset of text from the internet and can generate human-like responses. I'm constantly learning and improving, so please bear with me if I make any mistakes.

My goal is to provide accurate and helpful information, answer your questions to the best of my abilities, and even offer suggestions or ideas when needed. I'm a great listener, so feel free to share your thoughts, ask for advice, or simply chat with me about your day.

I'm excited to get to know you better and assist you in any way I can. What's on your mind t