<a href="https://colab.research.google.com/github/jchen8000/DemystifyingLLMs/blob/main/6_Deployment/Chatbot_LangChain_Groq.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6. Deployment of LLMs

## 6.11 Chatbot, Example of LLM-Powered Application


**A LangChain chatbot that calls the GroqCloud API and keeps chat history. Playground: https://console.groq.com/playground**

### 1. Install required packages

In [None]:
!pip install groq
!pip install langchain
!pip install langchain_groq

### 2. API Key From Groq Cloud

GroqCloud is a cloud-based platform developed by Groq, designed to provide high-speed AI inference capabilities. It supports a variety of large language models (LLMs) from different developers.

You will need to sign up for an account on the platform to obtain an API key. At the time this example was created, API keys were available for free.

This example uses **Colab Secrets**, a Google Colab feature for securely managing sensitive information such as API keys and environment variables. In the left sidebar of Google Colab, click the key icon to open the Secrets section, then add a new secret: give it a name (in this case, **GROQ_API_KEY**) and enter the API key you got from GroqCloud as its value.

The code snippet below shows how to retrieve it.

Note: if you are not using Google Colab, manage the key another way, and make sure it stays secure and is not exposed to others.

In [2]:
from google.colab import userdata
from groq import Groq

client = Groq(
    api_key = userdata.get('GROQ_API_KEY')
)
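
Outside Colab, one common option is to read the key from an environment variable. The following is a minimal sketch under that assumption; the helper name `load_groq_api_key` is hypothetical and not part of the Groq SDK, and it assumes the variable is called `GROQ_API_KEY`.

```python
import os


def load_groq_api_key(env=None, name="GROQ_API_KEY"):
    """Return the API key from the environment, failing loudly if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting Python.")
    return key


# Hypothetical usage, replacing the Colab-specific userdata.get(...) call:
# client = Groq(api_key=load_groq_api_key())
```

Keeping the key out of the notebook itself means it is not saved or shared along with the code.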

### 3. A Sample Call to Groq

Reference: https://console.groq.com/docs/api-reference#chat-create

Note: you can specify optional parameters such as *temperature*, *max_tokens*, and *top_p*.

In [3]:
model = "llama3-8b-8192"
completion = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "Who are you and please introduce yourself."
        }
    ],

    # Controls randomness: lowering results in less random completions.
    # As the temperature approaches zero, the model will become deterministic
    # and repetitive.
    temperature=1,

    # The maximum number of tokens to generate. Requests can use up to
    # 2048 tokens shared between prompt and completion.
    max_tokens=2048,

    # Controls diversity via nucleus sampling: 0.5 means half of all
    # likelihood-weighted options are considered.
    top_p=1,

    # If set, partial message deltas will be sent.
    stream=True,

    # A stop sequence is a predefined or user-specified text string that
    # signals an AI to stop generating content, ensuring its responses
    # remain focused and concise. Examples include punctuation marks and
    # markers like "[end]".
    stop=None,
)

for chunk in completion:
    print(chunk.choices[0].delta.content or "", end="")



I'm LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I'm not a human, but a computer program designed to simulate conversation and answer questions to the best of my knowledge based on the data I've been trained on.

I'm a large language model, which means I've been trained on a massive dataset of text from various sources, including books, articles, research papers, and more. This training allows me to understand and generate human-like language, making me seem intelligent and conversational.

When you interact with me, I can:

1. Answer questions: I can provide information on a wide range of topics, from science and history to entertainment and culture.
2. Generate text: I can create text based on a prompt or topic, which can be useful for writing, proofreading, or even generating language samples.
3. Chat: I can engage in conversations, responding to questions and statements in a conversational manner.
4. Translat
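
To make the effect of *temperature* and *top_p* in the call above more concrete, here is a minimal pure-Python sketch of the two ideas applied to a toy set of scores. The function names and the toy values are made up for illustration, and this is not a description of how Groq or any particular model implements sampling.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
sharp = apply_temperature(toy_logits, 0.2)  # low temperature: mass concentrates on the top token
flat = apply_temperature(toy_logits, 1.0)   # temperature 1: scores are used unchanged
nucleus = nucleus_filter(flat, 0.5)         # only tokens covering 50% of the mass survive
```

In this toy case the lower temperature pushes almost all probability onto the highest-scoring token, while `top_p=0.5` discards the unlikely tail before a token would be sampled.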

### 4. Create a function to interact with Groq

In [4]:
def ask_groq(input_text,
             model=model,
             temperature=1,
             max_tokens=2048,
             top_p=1,
             stream=True,
             stop=None):
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": input_text
            }],

        temperature=temperature,
        max_tokens=max_tokens,
        top_p=top_p,
        stream=stream,
        stop=stop
        )
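    if not stream:
        # A non-streaming call returns the whole reply in a single message.
        return completion.choices[0].message.content

    # A streaming call yields chunks; concatenate their text deltas.
    response = ""
    for chunk in completion:
        response += chunk.choices[0].delta.content or ""

    return response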


In [5]:
response = ask_groq("Tell a story about a mermaid flying on the sky.")
print(response)

What a fantastical tale I have for you! In a land far, far away, where the ocean meets the sky, there lived a mermaid named Luna. She was a curious and adventurous soul, with shimmering scales that shone like the moon and sparkling hair that rippled like the seaweed in the ocean currents.

One day, while swimming near the surface of the sea, Luna gazed longingly at the sky above. She had always dreamed of flying, just like the birds she had seen soaring overhead. The sea breeze rustled her hair as she watched a majestic swan glide effortlessly across the horizon.

Driven by her curiosity, Luna sought out the wisest sea sage in the land, an ancient octopus named Aethon. She asked him for secret knowledge on how to defy gravity and touch the sky. Aethon, with his twinkling eyes and wiggly tentacles, revealed that only a select few mermaids possessed the gift of aeromancy, the magic of air and wind.

Aethon handed Luna a delicate, intricately carved seashell, imbued with the essence of a 
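
The `or ""` in the loop of `ask_groq` matters because a streamed chunk may carry no text, which is typically the case for the final chunk. The sketch below illustrates this with hand-built stand-in objects rather than real API responses; `make_chunk` and `collect_stream` are hypothetical helpers, and the chunk shape is assumed to mirror the `chunk.choices[0].delta.content` access used above.

```python
from types import SimpleNamespace


def make_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_stream(chunks):
    """Join the text deltas of a stream, treating missing text as an empty string."""
    parts = []
    for chunk in chunks:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


# The last chunk has no text, as an end-of-stream chunk might.
fake_stream = [make_chunk("Once "), make_chunk("upon a time"), make_chunk(None)]
story = collect_stream(fake_stream)
```

Without the `or ""` guard, concatenating `None` to a string would raise a `TypeError`.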

### 5. Example of Chatbot using LangChain with chat history

This example shows how to build a simple chatbot with LangChain to maintain user chat history, allowing for more context-aware and coherent conversations.

The following objects are used in the example:


*   **ChatGroq**: This object initializes the connection to the Groq API using the provided API key and model name.
*   **ConversationBufferWindowMemory**: This object manages the chat history. It keeps the last 5 exchanges (k=5) of the conversation, where one exchange is a user message plus the AI reply, so that the AI can reference recent interactions to provide more contextually relevant responses.
*   **ChatPromptTemplate**: This object constructs the prompt template for the conversation.
*   **LLMChain**: This object creates a conversation chain using the specified LLM (groq_chat), the constructed prompt template (prompt), and the memory object (memory). It orchestrates the interaction between these components to generate responses.
*   **conversation.predict()**: This method sends the full prompt, including the user input and chat history, to the LLM to generate the chatbot’s response. The response is then printed to the console.
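
The idea behind a windowed chat history can be sketched without LangChain. The class below is a hypothetical stand-in, not LangChain's implementation; it assumes that `k` counts exchanges (one user message plus one reply) and uses plain role/content dictionaries like those in the Groq call earlier.

```python
from collections import deque


class WindowedHistory:
    """Hypothetical stand-in that remembers only the last k exchanges."""

    def __init__(self, k=5):
        # A deque with maxlen drops the oldest exchange automatically.
        self.turns = deque(maxlen=k)

    def save(self, user_text, assistant_text):
        self.turns.append((user_text, assistant_text))

    def as_messages(self):
        messages = []
        for user_text, assistant_text in self.turns:
            messages.append({"role": "user", "content": user_text})
            messages.append({"role": "assistant", "content": assistant_text})
        return messages


def build_messages(history, user_input, system_prompt="You are a helpful AI assistant."):
    """Assemble system prompt, remembered history, and the new user input."""
    return [
        {"role": "system", "content": system_prompt},
        *history.as_messages(),
        {"role": "user", "content": user_input},
    ]


history = WindowedHistory(k=2)
history.save("q1", "a1")
history.save("q2", "a2")
history.save("q3", "a3")  # pushes the first exchange out of the window
prompt_messages = build_messages(history, "q4")
```

A bounded window keeps the prompt size predictable, at the cost of forgetting anything older than the last k exchanges.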



In [6]:
from langchain.chains import LLMChain
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain_core.messages import SystemMessage
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain_groq import ChatGroq

def chat_with_groq(model):

    groq_api_key = userdata.get('GROQ_API_KEY')

    # Initialize Groq Langchain chat object and conversation
    groq_chat = ChatGroq(
            groq_api_key=groq_api_key,
            model_name=model
    )

    print("Chatbot: How can I help you? (type 'quit' to stop)!\n")

    # Manage the chat history: keep the last k=5 exchanges (human and AI message pairs).
    memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)

    while True:
        user_question = input("You: ")

        if user_question.lower() == "quit":
            break

        # Construct a chat prompt template using various components
        prompt = ChatPromptTemplate.from_messages(
            [
                # This is the persistent system prompt, sets the initial context for the AI.
                SystemMessage( content='You are a helpful AI assistant.'  ),
                # This placeholder will take care of chat history.
                MessagesPlaceholder( variable_name="chat_history" ),
                # This template is where the user's current input will be injected into the prompt.
                HumanMessagePromptTemplate.from_template("{human_input}" ),
            ]
        )

        # Create a conversation chain with LangChain
        conversation = LLMChain(
            # The LLM that the chain calls
            llm=groq_chat,
            # The prompt template constructed above
            prompt=prompt,
            # Set to True for verbose output when debugging
            verbose=False,
            # The memory object that holds the chat history
            memory=memory,
        )
        # The chatbot's answer is generated by sending the full prompt to the Groq API.
        response = conversation.predict(human_input=user_question)
        print("Chatbot:", response)
        print("\n\n")


if __name__ == "__main__":
    model = "llama-3.1-70b-versatile"
    chat_with_groq(model)


Chatbot: How can I help you? (type 'quit' to stop)!

You: How are you?


Chatbot: I'm functioning properly, thank you for asking. I'm a computer program designed to assist and communicate with users, so I don't have feelings or emotions like humans do. However, I'm here to help answer any questions or provide information on a wide range of topics, so please feel free to ask me anything. How can I assist you today?



You: what is the distance from Tokyo to New York?
Chatbot: The distance from Tokyo, Japan to New York City, USA is approximately 6,760 miles (10,880 kilometers). This is the straight-line distance, also known as the "as the crow flies" distance.

However, if you're referring to the distance by air, which is more relevant for flights, the approximate distance is:

* 6,742 miles (10,846 kilometers) for a direct flight from Tokyo's Narita International Airport (NRT) to New York's John F. Kennedy International Airport (JFK)
* 6,784 miles (10,916 kilometers) for a direct flight from Tokyo's Haneda Airport (HND) to New York's John F. Kennedy Internat

### 6. Use RunnableSequence instead of LLMChain (deprecated)

The example below uses a RunnableSequence instead of LLMChain, which is deprecated. The RunnableSequence is created by chaining the prompt template and the language model with the | operator.

The following objects are used in the example:

*   **ChatGroq**: Same as above, it initializes the connection to the Groq API using the provided API key and model name.
*   **ConversationBufferWindowMemory**: Same as above, it manages the chat history.
*   **ChatPromptTemplate**: Same as above, it constructs the prompt template for the conversation.
*   **RunnableSequence**: This creates a sequence of operations using the | operator. Here, ```conversation = prompt | groq_chat``` chains the prompt template (prompt) with the language model (groq_chat). This sequence is used to generate responses based on the user’s input and the chat history.
*   **memory.load_memory_variables()**: Retrieves the chat history from the memory object. It returns a dictionary, and we access the chat_history key to get the list of historical messages.
*   **conversation.invoke()**: Sends the full prompt, including the user’s input and chat history, to the language model to generate the chatbot’s response. The response is then printed to the console.
*   **memory.save_context()**: Updates the memory with the new interaction. It takes two dictionaries, one for the user’s input and one for the chatbot’s output, so that the conversation history is maintained for future interactions. To inspect the history, run ```print(memory.load_memory_variables({})["chat_history"])```.
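
How the | operator can chain steps is easy to see in a small sketch. The `Step` class below is hypothetical and only illustrates the idea of piping one step's output into the next; it is not LangChain's RunnableSequence, and the "model" here is a plain function standing in for an LLM call.

```python
class Step:
    """Hypothetical minimal runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that feeds a's output into b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# First step fills a prompt from an input dictionary.
fill_prompt = Step(lambda inputs: f"User asked: {inputs['human_input']}")
# Second step stands in for a language model; it just upper-cases the prompt.
fake_model = Step(lambda prompt_text: prompt_text.upper())

pipeline = fill_prompt | fake_model
result = pipeline.invoke({"human_input": "hello"})
```

Because the composed object exposes the same `invoke` method as its parts, further steps (for example, an output parser) could be appended with another `|`.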



In [19]:
def chat_with_groq2(model):
    groq_api_key = userdata.get('GROQ_API_KEY')

    # Initialize Groq Langchain chat object and conversation
    groq_chat = ChatGroq(
            groq_api_key=groq_api_key,
            model_name=model
    )

    print("Chatbot: How can I help you? (type 'quit' to stop)!\n")

    # Manage the chat history: keep the last k=5 exchanges (human and AI message pairs).
    memory = ConversationBufferWindowMemory(k=5, memory_key="chat_history", return_messages=True)

    while True:
        user_question = input("You: ")

        if user_question.lower() == "quit":
            break

        # Construct a chat prompt template using various components
        prompt = ChatPromptTemplate.from_messages(
            [
                # This is the persistent system prompt, sets the initial context for the AI.
                SystemMessage(content='You are a helpful AI assistant.'),
                # This placeholder will take care of chat history.
                MessagesPlaceholder(variable_name="chat_history"),
                # This template is where the user's current input will be injected into the prompt.
                HumanMessagePromptTemplate.from_template("{human_input}"),
            ]
        )

        # Create a conversation sequence using RunnableSequence
        conversation = prompt | groq_chat

        # Load chat_history
        chat_history = memory.load_memory_variables({})["chat_history"]

        # The chatbot's answer is generated by sending the full prompt to the LLM
        response = conversation.invoke({"human_input": user_question, "chat_history": chat_history})
        print("Chatbot:", response.content)
        print("\n\n")

        # Update the memory with the new interaction
        memory.save_context({"input": user_question}, {"output": response.content})


chat_with_groq2(model="llama3-70b-8192")


Chatbot: How can I help you? (type 'quit' to stop)!

You: What's your name, and please introduce yourself.
Chatbot: Nice to meet you! My name is Lumin, and I'm a friendly AI assistant designed to provide helpful and informative responses to your questions and engage in conversations. I'm here to assist you with any topics you'd like to discuss, from science and technology to entertainment and culture.

I'm a large language model, which means I've been trained on a massive dataset of text from the internet and can generate human-like responses. I'm constantly learning and improving, so please bear with me if I make any mistakes.

My goal is to provide accurate and helpful information, answer your questions to the best of my abilities, and even offer suggestions or ideas when needed. I'm a great listener, so feel free to share your thoughts, ask for advice, or simply chat with me about your day.

I'm excited to get to know you better and assist you in any way I can. What's on your mind t