# Conversational Interface - Chatbot with Claude LLM

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

In this notebook, we build a chatbot using the Foundation Models (FMs) in Amazon Bedrock. We use Claude as the FM, OpenSearch as the vector database for Retrieval Augmented Generation (RAG), and LangChain as the framework that ties it all together.

## Overview

This demo shows how using RAG context from the CSA v4.0 security guide for cloud computing provides more relevant responses than using an LLM directly.

This solution demonstrates using a vector database with RAG to reduce hallucination and provide context to prompts for better results. First, we call the LLM with no prompt engineering or RAG and see that the results are generic and not what we are looking for. Next, we add a prompt template using LangChain to improve our prompt. Then we look up results in a vector database to provide better context with our prompt before calling the LLM, so that the results relate to our intended topic. Finally, we string prompts together as a chat history to give the model even more context from previous questions.

## Initialization

Set up our environment and create the boto3 Bedrock client.


In [31]:
import warnings
warnings.filterwarnings('ignore')

In [32]:
import json
import os
import sys

import boto3

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww


# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."


boto3_bedrock = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None)
)

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-west-2.amazonaws.com)


Set up the LangChain objects: a text LLM with Bedrock and Claude, and an embeddings model with Bedrock and Amazon Titan Embeddings.

In [None]:
# We will be using the Titan Embeddings Model to generate our Embeddings.
from langchain.embeddings import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock
from langchain.load.dump import dumps

# - create the Anthropic Model
llm = Bedrock(
    model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={"max_tokens_to_sample": 1000}
)
bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)

## Q&A

#### Q&A (Basic - without context)

We use [ConversationChain](https://python.langchain.com/en/latest/modules/models/llms/integrations/bedrock.html?highlight=ConversationChain#using-in-a-conversation-chain) from LangChain to start the conversation. We also use [ConversationBufferMemory](https://python.langchain.com/en/latest/modules/memory/types/buffer.html) to store the messages. The history can also be retrieved as a list of messages, which is very useful with a chat model.

Chatbots need to remember previous interactions, and conversational memory allows us to do that. There are several ways to implement conversational memory. In the context of LangChain, they are all built on top of the ConversationChain.
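
To make the idea of buffer memory concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `SimpleBufferMemory` and `build_prompt` are hypothetical names used only for illustration. The sketch stores every turn verbatim and prepends the stored history to the next question, which is roughly what a buffer-style memory does for a conversation chain.

```python
class SimpleBufferMemory:
    """Toy stand-in for a conversation buffer: stores every turn verbatim."""

    def __init__(self):
        self.turns = []  # list of (speaker, text) tuples, oldest first

    def add_turn(self, speaker, text):
        self.turns.append((speaker, text))

    def as_text(self):
        # Render the history as one string, ready to paste into a prompt.
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)

    def as_messages(self):
        # Return the history as a list of messages instead of one string.
        return [{"role": speaker, "content": text} for speaker, text in self.turns]


def build_prompt(memory, user_input):
    """Prepend the stored history to the new question (hypothetical helper)."""
    return f"Current conversation:\n{memory.as_text()}\nHuman: {user_input}\nAI:"


memory = SimpleBufferMemory()
memory.add_turn("Human", "What are security best practices?")
memory.add_turn("AI", "Use strong passwords and enable MFA.")
prompt = build_prompt(memory, "How does this work in AWS?")
print(prompt)
```

Because the whole history is resent on every call, the prompt grows with each turn. That is the trade-off of a plain buffer compared with summarizing or windowed memories.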

In [33]:
#question="What is resource pooling?"
question="What are security best practices?"

In [34]:
from langchain.chains import ConversationChain
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory
modelId = "anthropic.claude-v2"
cl_llm = Bedrock(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs={"max_tokens_to_sample": 1000},
)
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=cl_llm, verbose=True, memory=memory
)

try:
    
    print_ww(conversation.predict(input = question))

except ValueError as error:
    if  "AccessDeniedException" in str(error):
        print(f"\x1b[41m{error}\
        \nTo troubleshoot this issue please refer to the following resources.\
         \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
         \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")      
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass
        raise StopExecution        
    else:
        raise error



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: What are security best practices?
AI:

> Finished chain.
 Here are some cybersecurity best practices for organizations and individuals:

- Keep software and systems updated - Install the latest patches, updates, and software versions as
soon as they are released to fix vulnerabilities and improve security.

- Use strong passwords - Create long, complex passwords and passphrases for all accounts and
systems. Avoid reusing passwords across accounts. Use a password manager to generate and store
unique passwords.

- Enable two-factor authentication - Add an extra layer of security beyond just a password by
requiring a second step of

What happened here? We asked about security best practices and received a generic response. The default prompt used by LangChain's ConversationChain involves no prompt engineering and is not well designed for Claude. An [effective Claude prompt](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design) should contain `\n\nHuman:` at the beginning and `\n\nAssistant:` somewhere after the `\n\nHuman:` (optionally followed by other text that you want to [put in Claude's mouth](https://docs.anthropic.com/claude/docs/human-and-assistant-formatting#use-human-and-assistant-to-put-words-in-claudes-mouth)). Let's fix this.

To learn more about how to write prompts for Claude, check [Anthropic documentation](https://docs.anthropic.com/claude/docs/introduction-to-prompt-design).
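
The formatting rule above can be checked mechanically. The following is a small sketch, assuming only the rule as stated in this notebook (the prompt starts with `\n\nHuman:` and contains `\n\nAssistant:` somewhere later). `follows_claude_format` is a hypothetical helper, not part of LangChain or the Anthropic SDK.

```python
HUMAN_TAG = "\n\nHuman:"
ASSISTANT_TAG = "\n\nAssistant:"


def follows_claude_format(prompt: str) -> bool:
    """Return True if the prompt starts with the Human tag and the
    Assistant tag appears somewhere after it."""
    if not prompt.startswith(HUMAN_TAG):
        return False
    # Search for the Assistant tag only after the leading Human tag.
    return prompt.find(ASSISTANT_TAG, len(HUMAN_TAG)) != -1


# Shaped like the default ConversationChain prompt shown in the output above.
default_style = (
    "The following is a friendly conversation between a human and an AI.\n\n"
    "Current conversation:\n\nHuman: What are security best practices?\nAI:"
)
# Shaped like the Claude-specific format described above.
claude_style = "\n\nHuman: What are security best practices?\n\nAssistant:"

print(follows_claude_format(default_style))
print(follows_claude_format(claude_style))
```

A check like this can be run on a formatted prompt before it is sent to the model, to catch templates that were written for a different model family.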

#### Q&A using prompt template (Langchain)

LangChain provides several classes and functions to make constructing and working with prompts easy. We are going to use the [PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/getting_started.html) class to construct the prompt from an f-string template. The prompt also uses the chat history to provide context for follow-up questions.

In [35]:
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# turn verbose to true to see the full logs and documents
conversation= ConversationChain(
    llm=cl_llm, verbose=False, memory=ConversationBufferMemory() #memory_chain
)

# langchain prompts do not always work with all the models. This prompt is tuned for Claude
claude_prompt = PromptTemplate.from_template("""

Human: The following is a friendly conversation between a human and an AI.
The AI is talkative and provides lots of specific details from its context. If the AI does not know
the answer to a question, it truthfully says it does not know.

Current conversation:
<conversation_history>
{history}
</conversation_history>

Here is the human's next reply:
<human_reply>
{input}
</human_reply>

Assistant:
""")

conversation.prompt = claude_prompt

print_ww(conversation.predict(input=question))

 Here are some security best practices:

- Use strong and unique passwords - Passwords should be long, complex, and different for each
account. Consider using a password manager.

- Enable two-factor authentication - Adding an extra layer of security beyond just a password is
important for protecting accounts.

- Keep software updated - Maintaining the latest security patches is critical to fix
vulnerabilities. Enable automatic updates when possible.

- Be wary of phishing attempts - Look out for suspicious emails, links, and requests for
information. Verify legitimacy before providing info.

- Use a VPN for public WiFi - Public networks can be insecure. A VPN encrypts your connection to
keep browsing more private.

- Backup your data - Having backups protects against data loss from malware or hardware failure. Use
external hard drives or cloud backup services.

- Use antivirus and anti-malware software - Install this software to scan for and remove malicious
programs. Schedule regular

#### Build on the questions

Let's ask a follow-up question about AWS to see whether the model can use the previous conversation and provide a response related to security best practices.

In [36]:
print_ww(conversation.predict(input="How does this work in AWS?"))

 Here are some security best practices specifically for using AWS (Amazon Web Services):

- Use IAM policies to grant least privilege - Configure IAM policies to give users and systems the
minimum permissions they need to perform their functions. Avoid the use of root accounts.

- Enable MFA on all accounts - Add multi-factor authentication to all IAM user accounts and root
accounts for extra security. Use virtual MFA devices where possible.

- Use roles for EC2 instances - Provide permissions to EC2 instances by assigning IAM roles instead
of storing access keys on the instances.

- Use security groups to restrict access - Use security groups as firewalls to control inbound and
outbound traffic to your EC2 instances and other resources.

- Encrypt EBS volumes and S3 objects - Leverage encryption features to protect data at rest in
Elastic Block Store volumes and S3 buckets.

- Enable AWS CloudTrail logging - CloudTrail provides a record of all API calls made to your AWS
account for au

### Interactive session using ipywidgets

The following utility class allows us to interact with Claude in a more natural way. We write out the question in an input box, and get Claude's answer. We can then continue our conversation.

In [None]:
import ipywidgets as ipw
from IPython.display import display, clear_output

class ChatUX:
    """ A chat UX using IPWidgets
    """
    def __init__(self, qa, retrievalChain = False):
        self.qa = qa
        self.name = None
        self.b=None
        self.retrievalChain = retrievalChain
        self.out = ipw.Output()


    def start_chat(self):
        print("Starting chat bot")
        display(self.out)
        self.chat(None)


    def chat(self, _):
        if self.name is None:
            prompt = ""
        else: 
            prompt = self.name.value
        if 'q' == prompt or 'quit' == prompt or 'Q' == prompt:
            print("Thank you, that was a nice chat!!")
            return
        elif len(prompt) > 0:
            with self.out:
                thinking = ipw.Label(value="Thinking...")
                display(thinking)
                try:
                    if self.retrievalChain:
                        result = self.qa.run({'question': prompt })
                    else:
                        result = self.qa.run({'input': prompt }) #, 'history':chat_history})
                except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
                    result = "No answer"
                thinking.value=""
                print_ww(f"AI:{result}")
                self.name.disabled = True
                self.b.disabled = True
                self.name = None

        if self.name is None:
            with self.out:
                self.name = ipw.Text(description="You:", placeholder='q to quit')
                self.b = ipw.Button(description="Send")
                self.b.on_click(self.chat)
                display(ipw.Box(children=(self.name, self.b)))

Test how this works in a longer conversation

In [None]:
chat = ChatUX(conversation)
chat.start_chat()

## Chatbot with persona

The AI assistant will play the role of a career coach. Role-play dialogue requires the user message to be set before starting the chat. ConversationBufferMemory is used to pre-populate the dialog.

In [None]:
# store previous interactions using ConversationBufferMemory and add custom prompts to the chat.
memory = ConversationBufferMemory()
memory.chat_memory.add_user_message("You will be acting as a career coach. Your goal is to give career advice to users")
memory.chat_memory.add_ai_message("I am a career coach and give career advice")
cl_llm = Bedrock(model_id="anthropic.claude-v2",client=boto3_bedrock)
conversation = ConversationChain(
     llm=cl_llm, verbose=True, memory=memory
)

conversation.prompt = claude_prompt

print_ww(conversation.predict(input="What are the career options in AI?"))

In [None]:
print_ww(conversation.predict(input="What these people really do? Is it fun?"))

##### Let's ask a question outside this persona's specialty. The model shouldn't answer it and should give a reason why

In [None]:
conversation.verbose = False
print_ww(conversation.predict(input="How to fix my car?"))

## Chatbot with Context 
In this use case we ask the chatbot to answer questions from an external corpus it has likely never seen before. To do this we apply a pattern called RAG (Retrieval Augmented Generation): the idea is to index the corpus in chunks, then look up which sections of the corpus might be relevant to an answer by using semantic similarity between the chunks and the question. Finally, the most relevant chunks are aggregated and passed as context to the ConversationChain, similar to providing a history.

We will take a PDF file and use the **Titan Embeddings Model** to create a vector for each chunk of the PDF. These vectors are then stored in OpenSearch Serverless (OSS). When the chatbot is asked a question, we query OSS with the question and retrieve the text that is semantically closest. This retrieved text provides the context from which the answer is generated.

#### Titan embeddings Model

Embeddings are a way to represent words, phrases or any other discrete items as vectors in a continuous vector space. This allows machine learning models to perform mathematical operations on these representations and capture semantic relationships between them.

Embeddings are used, for example, for the RAG [document search capability](https://labelbox.com/blog/how-vector-similarity-search-works/).
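
As a rough illustration of how vector search ranks text, the sketch below scores chunks by cosine similarity and returns the closest ones. The three-dimensional vectors are made up for the example (real Titan embeddings have far more dimensions), and `cosine_similarity` and `top_k` are illustrative helpers rather than the OpenSearch or LangChain API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k chunks whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up embeddings: the first axis loosely stands for "identity and access",
# the second for "availability".
indexed_chunks = [
    ("Use strong authentication and MFA.", [0.9, 0.1, 0.0]),
    ("Design for high availability.", [0.1, 0.9, 0.1]),
    ("Maintain tight control of root credentials.", [0.8, 0.2, 0.1]),
]
query_vector = [1.0, 0.0, 0.0]  # pretend embedding of an access-control question

closest = top_k(query_vector, indexed_chunks, k=2)
print(closest)
```

In the notebook itself the embedding step is handled by `bedrock_embeddings` and the ranking by the vector store. The sketch only shows the kind of comparison that happens underneath.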


## RAG document processing
You can skip this section if you have already created a vector database.

Copy the PDF(s) from S3 to a local directory

In [None]:
#make a local directory and download the CSA v4.0

os.makedirs("data", exist_ok=True)
boto3_s3 = boto3.client('s3')
boto3_s3.download_file('372228100697-ea-risk-eval-chatbot-input',
                        'Security-Guidance-v4.0-9-20-21.pdf',
                        'data/Security-Guidance-v4.0-9-20-21.pdf'
                      )

Read the files and chunk the text

In [None]:
import numpy as np
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("data/")

documents = loader.load()
# - in our testing Character split works better with this PDF data set
text_splitter = RecursiveCharacterTextSplitter(
    # Chunk size and overlap are measured in characters.
    chunk_size=5000,
    chunk_overlap=200,
)
docs = text_splitter.split_documents(documents)
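
To see what `chunk_size` and `chunk_overlap` mean, here is a deliberately simplified sketch using a fixed sliding window. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and sentences. `split_with_overlap` is a hypothetical helper for illustration only.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, where each window
    repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# → ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means that a sentence cut at a chunk boundary still appears whole in at least one neighboring chunk, at the cost of storing and embedding some text twice.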

Quick test to check the first document chunk before loading into vector database

In [None]:
try:
    
    sample_embedding = np.array(bedrock_embeddings.embed_query(docs[0].page_content))
    modelId = bedrock_embeddings.model_id
    print("Embedding model Id :", modelId)
    print("Sample embedding of a document chunk: ", sample_embedding)
    print("Size of the embedding: ", sample_embedding.shape)

except ValueError as error:
    if  "AccessDeniedException" in str(error):
        print(f"\x1b[41m{error}\
        \nTo troubleshoot this issue please refer to the following resources.\
         \nhttps://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_access-denied.html\
         \nhttps://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html\x1b[0m\n")      
        class StopExecution(ValueError):
            def _render_traceback_(self):
                pass
        raise StopExecution        
    else:
        raise error

In [None]:
%pip install -U opensearch-py==2.3.1 langchain==0.0.309 "pypdf>=3.8,<4" \
    apache-beam \
    datasets \
    tiktoken

## Create the OpenSearch vector store
This only needs to be done once. If you have an existing OpenSearch vector store, it can be reused.

`host` can simply be set to your OpenSearch endpoint DNS name with port 443 (do not include `https`).
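
If you only have the full endpoint URL at hand, a small helper can convert it to the `hostname:443` form described above. This is a sketch using the standard library. `to_host_and_port` is a hypothetical name, and the endpoint in the example is a placeholder rather than a real collection.

```python
from urllib.parse import urlparse


def to_host_and_port(endpoint, port=443):
    """Strip any scheme and path from an endpoint and append the port."""
    # urlparse only recognizes the host part when a scheme separator is present,
    # so prefix bare hostnames with "//".
    parsed = urlparse(endpoint if "://" in endpoint else "//" + endpoint)
    return f"{parsed.hostname}:{parsed.port or port}"


# Placeholder endpoint for illustration only.
print(to_host_and_port("https://example123.us-west-2.aoss.amazonaws.com"))
print(to_host_and_port("example123.us-west-2.aoss.amazonaws.com:443"))
```

Both calls print the same `hostname:443` string, so the helper is safe to apply whether or not the value already has the expected shape.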

In [None]:
import boto3
import time
vector_store_name = 'bedrock-workshop-rag'
index_name = "bedrock-workshop-rag-index"
encryption_policy_name = "bedrock-workshop-rag-sp"
network_policy_name = "bedrock-workshop-rag-np"
access_policy_name = 'bedrock-workshop-rag-ap'
identity = boto3.client('sts').get_caller_identity()['Arn']

aoss_client = boto3.client('opensearchserverless')

security_policy = aoss_client.create_security_policy(
    name = encryption_policy_name,
    policy = json.dumps(
        {
            'Rules': [{'Resource': ['collection/' + vector_store_name],
            'ResourceType': 'collection'}],
            'AWSOwnedKey': True
        }),
    type = 'encryption'
)

network_policy = aoss_client.create_security_policy(
    name = network_policy_name,
    policy = json.dumps(
        [
            {'Rules': [{'Resource': ['collection/' + vector_store_name],
            'ResourceType': 'collection'}],
            'AllowFromPublic': True}
        ]),
    type = 'network'
)

collection = aoss_client.create_collection(name=vector_store_name,type='VECTORSEARCH')

while True:
    status = aoss_client.list_collections(collectionFilters={'name':vector_store_name})['collectionSummaries'][0]['status']
    if status in ('ACTIVE', 'FAILED'): break
    time.sleep(10)

access_policy = aoss_client.create_access_policy(
    name = access_policy_name,
    policy = json.dumps(
        [
            {
                'Rules': [
                    {
                        'Resource': ['collection/' + vector_store_name],
                        'Permission': [
                            'aoss:CreateCollectionItems',
                            'aoss:DeleteCollectionItems',
                            'aoss:UpdateCollectionItems',
                            'aoss:DescribeCollectionItems'],
                        'ResourceType': 'collection'
                    },
                    {
                        'Resource': ['index/' + vector_store_name + '/*'],
                        'Permission': [
                            'aoss:CreateIndex',
                            'aoss:DeleteIndex',
                            'aoss:UpdateIndex',
                            'aoss:DescribeIndex',
                            'aoss:ReadDocument',
                            'aoss:WriteDocument'],
                        'ResourceType': 'index'
                    }],
                'Principal': [identity],
                'Description': 'Easy data policy'}
        ]),
    type = 'data'
)

# Use the client's own region so this does not fail when AWS_DEFAULT_REGION is unset
region = aoss_client.meta.region_name
host = collection['createCollectionDetail']['id'] + '.' + region + '.aoss.amazonaws.com:443'
print(host)

Create the document search vector object. Set `host` here if you have previously created and populated the vector database.

In [37]:
host = "https://9yw7n4vom05uo27uhhw7.us-west-2.aoss.amazonaws.com"
vector_store_name = 'bedrock-workshop-rag'
index_name = "bedrock-workshop-rag-index"
encryption_policy_name = "bedrock-workshop-rag-sp"
network_policy_name = "bedrock-workshop-rag-np"
access_policy_name = 'bedrock-workshop-rag-ap'

In [38]:
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from langchain.vectorstores import OpenSearchVectorSearch

service = 'aoss'
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, os.environ.get("AWS_DEFAULT_REGION", None), service)

docsearch = OpenSearchVectorSearch.from_documents(
    docs,
    bedrock_embeddings,
    opensearch_url=host,
    http_auth=auth,
    timeout = 100,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection,
    index_name=index_name,
    engine="faiss",
)

Test querying directly against the vector database without using LLM

In [39]:
query = question

results = docsearch.similarity_search(query, k=3)  # return the 3 most relevant docs for our search query
print(dumps(results, pretty=True))

[
  {
    "lc": 1,
    "type": "constructor",
    "id": [
      "langchain",
      "schema",
      "document",
      "Document"
    ],
    "kwargs": {
      "page_content": "Security Guidance v4.0 \u00a9 Copyright 2021, Cloud Security Alliance. All rights reserved76\n6.2 Recommendations\n \u2022Management plane (metastructure) security\n\u2022 Ensure there is strong perimeter security for API gateways and web consoles.\n\u2022 Use strong authentication and MFA.\n\u2022 Maintain tight control of primary account holder/root account credentials and consider \ndual-authority to access them.\n\u2022 Establishing multiple accounts with your provider will help with account granularity \nand to limit blast radius (with IaaS and PaaS).\n\u2022 Use separate super administrator and day-to-day administrator accounts instead of root/\nprimary account holder credentials.\n\u2022 Consistently implement least privilege accounts for metastructure access.\n\u2022 This is why you separate development and

#### Query LLM with vector search context and get response with vector context references

In [40]:
from langchain.chains import RetrievalQA
from langchain.chains import RetrievalQAWithSourcesChain

qa_with_sources = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(search_kwargs={'k': 3}),return_source_documents=True)
print(dumps(qa_with_sources(query), pretty=True))

{
  "query": "What are security best practices?",
  "result": " Based on the given context, some security best practices include:\n\n- Use strong authentication and multi-factor authentication (MFA) for management plane access. \n\n- Maintain tight control of root account credentials and consider dual authority access.\n\n- Establish multiple accounts to limit blast radius. \n\n- Use separate administrator accounts instead of root credentials.\n\n- Implement least privilege access controls.\n\n- Separate development, test and production accounts.\n\n- Ensure strong perimeter security for APIs and web consoles.\n\n- Take a risk-based approach to availability and business continuity. \n\n- Design for high availability within the cloud provider.\n\n- Leverage cloud provider specific features for availability and resilience.\n\n- Consider cross-region or cross-cloud options for redundancy.\n\n- Plan for graceful failure and recovery from cloud outages.\n\nThe key security best practices fo

#### OCR the CSA v4.0 guide to S3 using Textract (optional example, not required for this notebook)

In [None]:
boto3_textract = boto3.client('textract')

#try:
    
response = boto3_textract.start_document_text_detection(
    DocumentLocation={
        'S3Object': {
            'Bucket': '372228100697-ea-risk-eval-chatbot-input',
            'Name': 'Security-Guidance-v4.0-9-20-21.pdf'
        }
    },
    NotificationChannel={
        'SNSTopicArn': 'arn:aws:sns:us-west-2:372228100697:AmazonTextract-372228100697-usw2-chat',
        'RoleArn': 'arn:aws:iam::372228100697:role/textract-notifications'
    },
    OutputConfig={
        'S3Bucket': '372228100697-ea-risk-eval-chatbot-output'
    }
)

print(response)
#except Exception as error:

#### Semantic search

We can use a wrapper class provided by LangChain to query the vector database and return the relevant documents. Behind the scenes, this simply runs a RetrievalQA chain.

#### Memory
Any chatbot needs a QA chain with options customized for the use case. A chatbot also always needs to keep the history of the conversation so the model can take it into consideration when answering. In this example we use the [ConversationalRetrievalChain](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db) from LangChain, together with a ConversationBufferMemory to keep the history of the conversation.

Source: https://python.langchain.com/docs/modules/chains/popular/chat_vector_db

Set `verbose` to `True` to see what is going on behind the scenes.

In [None]:
from langchain.chains.conversational_retrieval.prompts import CONDENSE_QUESTION_PROMPT

print_ww(CONDENSE_QUESTION_PROMPT.template)

#### Parameters used for ConversationalRetrievalChain
* **retriever**: We used `VectorStoreRetriever`, which is backed by a `VectorStore`. To retrieve text, there are two search types you can choose: `"similarity"` or `"mmr"`. `search_type="similarity"` uses similarity search in the retriever object, selecting the text chunk vectors that are most similar to the question vector.

* **memory**: Memory Chain to store the history

* **condense_question_prompt**: Given a question from the user, we use the previous conversation and that question to make up a standalone question

* **chain_type**: Determines how the retrieved documents are combined into the prompt, which matters when the content is too long to fit the context. The options are `stuff`, `refine`, `map_reduce`, `map_rerank`

If the question asked is outside the scope of the context, the model will reply that it doesn't know the answer.

**Note**: if you are curious how the chain works, uncomment the `verbose=True` line.
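
The parameters above map onto a two-step flow: first condense the follow-up into a standalone question, then retrieve documents and answer. The sketch below imitates that flow with stub functions so it runs without Bedrock or OpenSearch. All names (`condense_question`, `stuff_documents`, `answer`, `fake_llm`, `fake_retriever`) are hypothetical, and the real chain's prompts and internals differ.

```python
def condense_question(chat_history, question, llm):
    """Step 1: rewrite a follow-up as a standalone question using the history."""
    if not chat_history:
        return question  # nothing to condense on the first turn
    history_text = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    prompt = f"{history_text}\n\nRephrase the follow-up as a standalone question: {question}"
    return llm(prompt)


def stuff_documents(docs):
    """The 'stuff' strategy: put all retrieved documents into one context."""
    return "\n\n".join(docs)


def answer(chat_history, question, retriever, llm):
    """Step 2: retrieve with the standalone question, then answer from context."""
    standalone = condense_question(chat_history, question, llm)
    context = stuff_documents(retriever(standalone))
    reply = llm(f"{context}\n\nQuestion: {standalone}")
    chat_history.append((question, reply))  # the memory keeps the original question
    return standalone, reply


# Stubs standing in for the LLM and the vector store retriever.
def fake_llm(prompt):
    if "Rephrase" in prompt:
        return "How do security best practices apply to AWS?"
    return "ANSWER based on: " + prompt.split("\n\n")[0]


def fake_retriever(query):
    return [f"chunk about '{query}'"]


history = [("What are security best practices?", "Use MFA and least privilege.")]
standalone, reply = answer(history, "How does that apply to AWS", fake_retriever, fake_llm)
print(standalone)
print(reply)
```

The condensing step is what lets a vague follow-up such as "How does that apply to AWS" retrieve relevant chunks, since the retriever only ever sees the rewritten, self-contained question.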

In [41]:
# turn verbose to true to see the full logs and documents
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory_chain = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=docsearch.as_retriever(search_kwargs={'k': 3}), 
    memory=memory_chain,
    condense_question_prompt=CONDENSE_QUESTION_PROMPT,
    #verbose=True, 
    chain_type='stuff', # 'refine',
    #max_tokens_limit=300
)

In [42]:
query = question
print(dumps(qa(query), pretty=True))

{
  "question": "What are security best practices?",
  "chat_history": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "messages",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "What are security best practices?"
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "messages",
        "AIMessage"
      ],
      "kwargs": {
        "content": " Based on the context provided, some security best practices mentioned are:\n\n- Use strong authentication and multi-factor authentication (MFA) for management plane/metastructure access. \n\n- Maintain tight control of root account credentials and consider dual authority access.\n\n- Establish multiple accounts with granular privileges to limit blast radius. \n\n- Separate administrator accounts instead of using root credentials.\n\n- Implement least privilege access controls.\n\n- Separa

In [43]:
query = "How does that apply to AWS"
print(dumps(qa(query), pretty=True))

{
  "question": "How does that apply to AWS",
  "chat_history": [
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "messages",
        "HumanMessage"
      ],
      "kwargs": {
        "content": "What are security best practices?"
      }
    },
    {
      "lc": 1,
      "type": "constructor",
      "id": [
        "langchain",
        "schema",
        "messages",
        "AIMessage"
      ],
      "kwargs": {
        "content": " Based on the context provided, some security best practices mentioned are:\n\n- Use strong authentication and multi-factor authentication (MFA) for management plane/metastructure access. \n\n- Maintain tight control of root account credentials and consider dual authority access.\n\n- Establish multiple accounts with granular privileges to limit blast radius. \n\n- Separate administrator accounts instead of using root credentials.\n\n- Implement least privilege access controls.\n\n- Separate deve

In [None]:
chat = ChatUX(qa, retrievalChain=True)
chat.start_chat()

Your mileage may vary, but after 2 or 3 questions you will start to get some weird answers, in some cases even in other languages.
This happens for the same reason outlined at the beginning of this notebook: the default LangChain prompts are not optimal for Claude.
In the following cell we set two new prompts: one for rephrasing the question, and one for getting the answer to that rephrased question.

In [44]:
# turn verbose to true to see the full logs and documents
from langchain.chains import ConversationalRetrievalChain
from langchain.schema import BaseMessage


# We are also providing a different chat history retriever which outputs the history as a Claude chat (ie including the \n\n)
_ROLE_MAP = {"human": "\n\nHuman: ", "ai": "\n\nAssistant: "}
def _get_chat_history(chat_history):
    buffer = ""
    for dialogue_turn in chat_history:
        if isinstance(dialogue_turn, BaseMessage):
            role_prefix = _ROLE_MAP.get(dialogue_turn.type, f"{dialogue_turn.type}: ")
            buffer += f"\n{role_prefix}{dialogue_turn.content}"
        elif isinstance(dialogue_turn, tuple):
            human = "\n\nHuman: " + dialogue_turn[0]
            ai = "\n\nAssistant: " + dialogue_turn[1]
            buffer += "\n" + "\n".join([human, ai])
        else:
            raise ValueError(
                f"Unsupported chat history format: {type(dialogue_turn)}."
                f" Full chat history: {chat_history} "
            )
    return buffer

# the condense prompt for Claude
condense_prompt_claude = PromptTemplate.from_template("""{chat_history}

Answer only with the new question.


Human: How would you ask the question considering the previous conversation: {question}


Assistant: Question:""")

# recreate the Claude LLM with more tokens to sample - this provides longer responses but introduces some latency
#cl_llm = Bedrock(model_id="anthropic.claude-v2", client=boto3_bedrock, model_kwargs={"max_tokens_to_sample": 1000})
memory_chain = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    llm=llm, 
    retriever=docsearch.as_retriever(search_kwargs={'k': 3}),
    #retriever=vectorstore_faiss_aws.as_retriever(search_type='similarity', search_kwargs={"k": 8}),
    memory=memory_chain,
    get_chat_history=_get_chat_history,
    #return_source_documents=True,
    #verbose=True,
    condense_question_prompt=condense_prompt_claude, 
    chain_type='stuff', # 'refine',
    #max_tokens_limit=300
)

# the LLMChain prompt to get the answer. The ConversationalRetrievalChain does not expose this parameter in the constructor
qa.combine_docs_chain.llm_chain.prompt = PromptTemplate.from_template("""
{context}

Human: Use at maximum 5 sentences to answer the question inside the <q></q> XML tags. 

<q>{question}</q>

Do not use any XML tags in the answer. If the answer is not in the context say "Sorry, I don't know as the answer was not found in the context"

Assistant:""")

In [45]:
chat = ChatUX(qa, retrievalChain=True)
chat.start_chat()

Starting chat bot


Output()

#### Do some prompt engineering

You can "tune" your prompt to get more or less verbose answers. For example, try changing the number of sentences, or remove that instruction altogether. You might also need to change the value of `max_tokens_to_sample` (e.g. 1000 or 2000) to get the full answer.
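
One way to experiment with this is to make the sentence limit a parameter of the prompt text. The sketch below builds a prompt modeled on the answer prompt used earlier in this notebook. `build_answer_prompt` is a hypothetical helper, and the wording without a limit is an assumption rather than a tested prompt.

```python
def build_answer_prompt(context, question, max_sentences=None):
    """Build a Claude-style answer prompt, optionally limiting the answer length."""
    if max_sentences is None:
        instruction = "Answer the question inside the <q></q> XML tags."
    else:
        instruction = (
            f"Use at maximum {max_sentences} sentences to answer "
            "the question inside the <q></q> XML tags."
        )
    return f"\n{context}\n\nHuman: {instruction}\n\n<q>{question}</q>\n\nAssistant:"


short_prompt = build_answer_prompt("<retrieved chunks>", "What is MFA?", max_sentences=2)
open_prompt = build_answer_prompt("<retrieved chunks>", "What is MFA?")
print(short_prompt)
print(open_prompt)
```

When the limit is removed, answers tend to get longer, so a larger `max_tokens_to_sample` may be needed to avoid truncated output.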

### In this demo we used the Claude LLM to create a conversational interface with the following patterns:

1. Chatbot (Basic - without context)

2. Chatbot using prompt template (LangChain)

3. Chatbot with personas

4. Chatbot with context