### This notebook contains solutions for the choose-your-own-adventure notebook.

In [None]:
import vertexai
from google.cloud import aiplatform

print(f"Vertex AI SDK version: {aiplatform.__version__}")

import langchain

print(f"LangChain version: {langchain.__version__}")

from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.chat_models import ChatVertexAI
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.chains import LLMChain
from langchain.chains import RetrievalQA
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from pydantic import BaseModel

from utils.matching_engine import MatchingEngine
from utils.matching_engine_utils import MatchingEngineUtils

### Project and matching engine settings
Note: copy these from notebook 01 for maximum efficiency.

In [None]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
# PROJECT_ID = "YOUR_PROJECT_HERE"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}
ME_REGION = "us-central1"
ME_INDEX_NAME = f"{PROJECT_ID}-me-index"  # @param {type:"string"}
ME_EMBEDDING_DIR = f"{PROJECT_ID}-me-bucket"  # @param {type:"string"}
ME_DIMENSIONS = 768  # when using Vertex PaLM Embedding

In [None]:
mengine = MatchingEngineUtils(PROJECT_ID, ME_REGION, ME_INDEX_NAME)

In [None]:
ME_INDEX_ID, ME_INDEX_ENDPOINT_ID = mengine.get_index_and_endpoint()
print(f"ME_INDEX_ID={ME_INDEX_ID}")
print(f"ME_INDEX_ENDPOINT_ID={ME_INDEX_ENDPOINT_ID}")

In [None]:
# create embeddings object
embeddings = VertexAIEmbeddings()
# initialize vector store
me = MatchingEngine.from_components(
    project_id=PROJECT_ID,
    region=ME_REGION,
    gcs_bucket_name=f"gs://{ME_EMBEDDING_DIR}".split("/")[2],
    embedding=embeddings,
    index_id=ME_INDEX_ID,
    endpoint_id=ME_INDEX_ENDPOINT_ID,
)

### Vector lookup
Perform a direct lookup of the question "What are video localized narratives?" using k=2.  Take a look at https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.matching_engine.MatchingEngine.html#langchain.vectorstores.matching_engine.MatchingEngine.similarity_search for reference!
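
To build intuition for what a k=2 lookup returns, here is a minimal pure-Python sketch of a brute-force nearest-neighbor search. It is not how Vertex Vector Search works internally (that service uses approximate search over a deployed index); the helper names `cosine_similarity` and `top_k`, and the 3-dimensional toy vectors, are assumptions made only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings (the real ones in this notebook have 768 dimensions).
toy_docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.1, 0.0]

nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```

The two documents pointing in roughly the same direction as the query come back first; the orthogonal one is left out because k=2.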

In [None]:
#Your code goes here

### Building a retriever
Next, build a retriever from your Vertex Vector Search index. Take a look at https://python.langchain.com/docs/modules/data_connection/retrievers/vectorstore for reference.  Hint: your vector store object is "me". Configure the retriever to use similarity search with 10 results and a score of 0.6, so that it can look up the same question, "What are video localized narratives?"
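
A retriever adds a policy on top of the raw vector lookup: how many results to keep and which ones are relevant enough. The sketch below shows that idea on hypothetical scores, assuming a higher score means a closer match. The function name `retrieve` and the parameter `min_score` are made up for this example; how a real vector store interprets its threshold (similarity versus distance) depends on the implementation.

```python
def retrieve(scored_docs, k=10, min_score=0.6):
    """Keep at most k documents whose score is at least min_score.

    scored_docs: list of (doc_id, score) pairs, where a higher score
    means the document is more similar to the query.
    """
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [(doc_id, score) for doc_id, score in ranked[:k] if score >= min_score]


# Hypothetical scores for illustration only.
toy_scores = [("doc_a", 0.91), ("doc_b", 0.72), ("doc_c", 0.55), ("doc_d", 0.30)]

kept = retrieve(toy_scores, k=10, min_score=0.6)
print(kept)
```

Even though up to 10 results are allowed, only the two documents above the threshold survive, which is one reason a retriever lookup can return a different set than a plain k-nearest search.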

In [None]:
#Your code goes here

### Do a lookup against the retriever

Use the same question, "What are video localized narratives?"

What is different from the direct search against the vector store?  Why?

In [None]:
#Your code goes here

### Create model objects
Create a model object named "text_bison" using text-bison, with a temperature of 0.2, max output tokens of 1024, top_k of 40, and top_p of 0.95.  
Hint 1: https://api.python.langchain.com/en/latest/llms/langchain.llms.vertexai.VertexAI.html, https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm
Hint 2: You can pass arguments into VertexAI()


In [None]:
#Your code goes here

### Build a basic LLMChain
Using https://python.langchain.com/docs/modules/chains/foundational/llm_chain and the model object you created, ask text-bison what year Google went public and print the response.

Hint: use a simple prompt template, don't overthink it!

In [None]:
#Your code goes here

### Build a better prompt template
https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/

Build a prompt template with two input variables, {question} and {context}.  {question} will be the question you're asking the model, and {context} will be the documents retrieved from the vector store.  Remember everything you know about good prompt writing!

It can be helpful to offset the context with delimiters like "==========" or to bound the context with pseudo-markdown.  Write the prompt template so that you're instructing the model to ONLY respond from the provided context to minimize hallucinations.
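
As one possible starting point, the sketch below fills a delimiter-bounded template using plain `str.format`. The wording of `example_template` and the placeholder context are assumptions for illustration, not the reference solution; LangChain's PromptTemplate uses the same `{variable}` placeholder style by default, so the text can be reused there.

```python
# Hypothetical template text; adjust the wording for your own use case.
example_template = """Answer the question using ONLY the context between the delimiters.
If the context does not contain the answer, reply "I don't know."

==========
{context}
==========

Question: {question}
Answer:"""

# Fill both input variables the way a chain would before calling the model.
filled = example_template.format(
    context="Placeholder text standing in for retrieved documents.",
    question="What are video localized narratives?",
)
print(filled)
```

Giving the model an explicit fallback answer ("I don't know.") tends to make the grounding instruction easier to follow than a bare "only use the context".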

In [None]:
#Your code goes here:

prompt_template = """ """

### Build a QA chain

Using the model object for text-bison and the prompt_template you created previously, create a RetrievalQA chain that combines the prompt, model, and retriever you've already created.
https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa.  

Hint: Use the following parameters after setting the llm parameter:

chain_type="stuff",
retriever=retriever,
return_source_documents=True,
verbose=False,
chain_type_kwargs={
    "prompt": PromptTemplate(
        template=prompt_template,
        input_variables=["context", "question"],
    ),
},


In [None]:
#Your code goes here

### Ask the QA chain 
Ask the question "What are video localized narratives?"

Hint: use syntax similar to what you used when calling the LLMChain above.

In [None]:
#Your code goes here

### Parse the response
Print only the response to the question.  

Hint: the dict key is "result".

In [None]:
# Your code goes here

### Try and break the context

Try different question combinations to see if you can break out of the prompt context.  Ask things like "what is apple pie" and "how do I make a peanut butter and jelly sandwich?"

What happened, and why?

If you weren't able to break out of your prompt, return to the prompt creation cell and edit the prompt until it tells you how to make a delicious peanut butter and jelly sandwich.

In [None]:
# Your code goes here

### Implementing memory

Create a ConversationBufferMemory object https://python.langchain.com/docs/modules/memory/types/buffer named "memory" with memory_key='chat_history', input_key='question', and output_key='answer'.

Set return_messages to False.
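
To see what a buffer memory does conceptually, here is a simplified stand-in written in plain Python. `ToyBufferMemory` is a hypothetical class for illustration and is not the LangChain implementation; it only mimics the idea that, with return_messages set to False, the history is handed to the prompt as a single string rather than as a list of message objects.

```python
class ToyBufferMemory:
    """Simplified stand-in for a conversation buffer (not the LangChain class)."""

    def __init__(self):
        self.turns = []  # list of (question, answer) pairs, oldest first

    def save(self, question, answer):
        """Record one completed exchange."""
        self.turns.append((question, answer))

    def as_string(self):
        """Flatten the whole history into one newline-separated string."""
        lines = []
        for question, answer in self.turns:
            lines.append(f"Human: {question}")
            lines.append(f"AI: {answer}")
        return "\n".join(lines)

    def clear(self):
        """Forget all previous turns."""
        self.turns = []


toy_memory = ToyBufferMemory()
toy_memory.save("What are video localized narratives?", "(model answer goes here)")
history = toy_memory.as_string()
print(history)
```

Because the buffer keeps every turn verbatim, the prompt grows with each exchange, which is worth keeping in mind for long conversations.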

In [None]:
# Your code goes here

### Creating a new prompt

This one is a little challenging - create a new prompt using a ChatPromptTemplate.  Specify the context, chat_history, and question as variables in the prompt as the system template.

Create another prompt as the user template with the variable question.

Try to create a prompt maximally grounded in the context provided from the vector lookup.

https://api.python.langchain.com/en/latest/prompts/langchain.prompts.chat.ChatPromptTemplate.html

In [None]:
general_system_template = """ """
general_user_template = """ """
messages = [
    SystemMessagePromptTemplate.from_template(general_system_template),
    HumanMessagePromptTemplate.from_template(general_user_template),
]
qa_prompt = ChatPromptTemplate.from_messages(messages)

### Create a ConversationalRetrievalChain

Create a ConversationalRetrievalChain named "conv_qa_chain" using the text_bison object as the llm and the following parameters:

retriever=retriever,
verbose=False, 
chain_type="stuff",
memory=memory,
get_chat_history=lambda h : h,
return_source_documents=True,
combine_docs_chain_kwargs={'prompt': qa_prompt}

https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db



In [None]:
#Your code goes here

### Ask a question

Ask conv_qa_chain the question, "What are video localized narratives?"

In [None]:
#Your code goes here

### Follow-up questions

Ask a follow-up question, "What do they empower?"

Make sure you take a look at the chat_history!

In [None]:
# Your code goes here

### Rebuild the conv_qa_chain and clear the memory
Set verbose=True

To clear memory, take a look at https://github.com/langchain-ai/langchain/issues/6585#issuecomment-1602935899

In [None]:
#Your code goes here

### Ask the same question

"What are video localized narratives?"

Read through the (extremely verbose) response.  What is happening?

In [None]:
#Your code goes here

### Ask the same follow-up:

Ask the follow-up question, "What do they empower?"

Look very closely at the end of the chain where it says "Human: Question:" - was the actual question submitted what you wrote?  Why or why not?  Hint: https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db#using-a-different-model-for-condensing-the-question
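
The sketch below illustrates the question-condensing step that the hint points to: before retrieval, the follow-up and the chat history are combined into a prompt that asks a model to produce a standalone question. The template wording and the helper `build_condense_prompt` are assumptions for illustration; the chain's actual default prompt may differ.

```python
# Hypothetical prompt wording; the chain's real default prompt may differ.
CONDENSE_TEMPLATE = """Given the conversation below and a follow-up question,
rewrite the follow-up as a standalone question.

Chat history:
{chat_history}

Follow-up: {question}
Standalone question:"""


def build_condense_prompt(chat_history, question):
    """Combine the history and the follow-up into one prompt for the model."""
    return CONDENSE_TEMPLATE.format(chat_history=chat_history, question=question)


condense_prompt = build_condense_prompt(
    chat_history="Human: What are video localized narratives?\nAI: (previous answer)",
    question="What do they empower?",
)
print(condense_prompt)
```

A follow-up such as "What do they empower?" is ambiguous on its own, so the model's rewritten question (which resolves "they" from the history) is what gets used for the vector lookup.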


In [None]:
# Your code goes here

### Future/bonus content

Consider how you might use https://www.gradio.app/ or https://streamlit.io/ to build demos for customers.  

Note: using them in a Vertex notebook is not a great experience; you're usually better off developing locally.
