<a href="https://colab.research.google.com/github/sharmaprateek/scripts/blob/master/Praxa_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Lesson 3: Vector Database


##Getting Started
If you're new to Google Colab, download and review the [Getting Started with Colab](https://uploads.smart.ly/assets/49f329a834468c6f6e9010cbf337a2753b22d35c245e49fc00d4b89e4ceb10fa/original/49f329a834468c6f6e9010cbf337a2753b22d35c245e49fc00d4b89e4ceb10fa.pdf) guide.

Your code and data will run in the `/content` directory. Create a subdirectory in `/content` called `context_data` and upload the [context documents for the course](https://uploads.smart.ly/assets/9af2030979d9b37119354aa47b0ee7e7746e124406400f75a1588f40379b43a1/original/9af2030979d9b37119354aa47b0ee7e7746e124406400f75a1588f40379b43a1.zip) into `context_data`.

You'll also need an API key from Hugging Face. Visit their [signup page](https://huggingface.co/join), enter your email and a password, then complete your profile. Once you have an account and are signed in, go to [Settings | Access Tokens](https://huggingface.co/settings/tokens) and select "New token." Write tokens allow you to post to Hugging Face, which you won't be doing here, so you only need a read-type token.

Once you have your token, enter it below and run the cell by clicking the play button on its left. Note that shell commands, such as `pip` below, must be preceded by a bang (`!`).

In [None]:
import os
os.environ['HUGGINGFACEHUB_API_TOKEN'] = "your-API-token"

LangChain touches all aspects of this app, so let's go ahead and install it now.

In [None]:
!pip install langchain==0.1.13 langchain-community==0.0.29 langchain-core==0.1.40 pypdf==4.1.0 langchain-text-splitters==0.0.1 sentence_transformers==2.6.1 langchain-chroma==0.1.0 huggingface_hub==0.23.5 transformers==4.38.2 streamlit

##Loading Context Documents
The first step in building the vector database is to load the context documents. Load them into a variable named `context_data`.

Now let's verify that the documents loaded by printing the content of each page. Scroll to the end of a line to see what metadata the document loader includes.

In [None]:
for page in context_data:
  print(page)

##Chunking
Now it's time to split the documents into chunks that will work with the LLM's context window. Store them in a variable named `chunks`.

Verify it worked by exploring how the documents were chunked.

In [None]:
print(f"Total Document Chunks: {len(chunks)}")
print("-----")
print("Length of each chunk:")

for num, chunk in enumerate(chunks):
  print(f"Chunk {num} (from page {chunk.metadata['page'] + 1}): {len(chunk.page_content)} characters")

print("-----")
print("Chunk 0")
print(chunks[0].metadata)
print(chunks[0].page_content)
print("-----")
print("Chunk 1")
print(chunks[1].metadata)
print(chunks[1].page_content)
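
For intuition about what a character-based splitter does, here is a toy version in plain Python. It is only a sketch: the function name `split_with_overlap` and its parameters are made up for illustration, it works on a bare string rather than on `Document` objects, and it cuts at fixed offsets, whereas a real splitter may prefer to break on paragraph or sentence boundaries.

```python
def split_with_overlap(text, chunk_size=200, overlap=40):
    """Split text into pieces of at most chunk_size characters.

    Consecutive pieces share `overlap` characters so that a sentence
    cut at a boundary still appears whole in one of the two pieces.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        # Stop once a piece has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return pieces


sample = "abcdefghij" * 5  # 50 characters of stand-in text
pieces = split_with_overlap(sample, chunk_size=20, overlap=5)
for num, piece in enumerate(pieces):
    print(f"Piece {num}: {len(piece)} characters")
```

Larger chunks carry more context per retrieval but use up more of the LLM's context window, so both values are worth experimenting with.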


##Embedding

Now it's time to set up the embedding function. Assign it to a variable named `embedding_function`.

Make sure your model works by finding the embedding for a test sentence.

In [None]:
embedding = embedding_function.embed_query("This is a test sentence.")
print(f"Embedding length: {len(embedding)}")
embedding = embedding_function.embed_query("This is a longer test sentence.")
print(f"Embedding length: {len(embedding)}")

##Persisting

Now it's time for the vector store. Assign it the name `chromadb`.

Now test it by executing a similarity search.

In [None]:
retrieved_chunks = chromadb.similarity_search("Two people who take a vacation together.")
print(f"Query retrieved {len(retrieved_chunks)} chunks.")
for chunk in retrieved_chunks:
  print(f"Chunk content: {chunk.page_content}")
  print(f"Chunk metadata: {chunk.metadata}")
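
To build intuition for what a similarity search does, here is a toy version that needs no model or database. It is a rough sketch under loud assumptions: `toy_embed` counts words instead of using a neural embedding model, and `toy_similarity_search` ranks by cosine similarity, which may differ from the distance metric your vector store is configured to use. All names here are made up for illustration.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Count how often each vocabulary word occurs in text (a bag-of-words vector)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_similarity_search(query, texts, k=2):
    """Return the k texts whose toy embeddings are closest to the query's."""
    vocabulary = sorted({word for text in texts + [query] for word in text.lower().split()})
    query_vector = toy_embed(query, vocabulary)
    ranked = sorted(
        texts,
        key=lambda text: cosine_similarity(query_vector, toy_embed(text, vocabulary)),
        reverse=True,
    )
    return ranked[:k]


texts = [
    "two friends travel to the coast",
    "a king loses his crown",
    "friends take a vacation together",
]
top = toy_similarity_search("two people who take a vacation together", texts, k=2)
print(top[0])
```

Every vector has the same length (the vocabulary size) no matter how long the text is, which mirrors what you saw when printing embedding lengths earlier. Unlike this word-counting toy, a real embedding model can also match texts that share meaning but no words.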

#Lesson 4: LangChain and Language Models

###Getting the LLM
We use the `HuggingFaceHub` class to instantiate the LLM. Assign it to a variable named `llm`.

Let's invoke the LLM with a few prompts it should be able to handle. Take note of the answers, which are based solely on the model's training data.

In [None]:
response = llm.invoke("List Tawfiq al-Hakim's plays by title as a comma-separated list.")
print(response)
response = llm.invoke("List Jez Butterworth's plays.")
print(response)
response = llm.invoke("What Broadway plays have had over 10,000 performances?")
print(response)

###Setting up a Prompt Template
We'll now build a simple prompt template, stored in a variable named `prompt`, that takes a `playwright` input so the same request can be reused for any playwright.

Let's test it out!

In [None]:
print(prompt)
response = llm.invoke(prompt.format(playwright="Jez Butterworth"))
print(response)

###Output Parsers
While we're exploring the Model I/O module, let's take a quick look at how the output parser in the Quickstart works.

In [None]:
from langchain.output_parsers import CommaSeparatedListOutputParser
output_parser = CommaSeparatedListOutputParser()
response = output_parser.parse(llm.invoke(prompt.format(playwright="Jez Butterworth")))
print(response)

## LangChain Expression Language (LCEL)
The "Chain" in "LangChain" refers to the ability to chain several actions into one invocation. A chain replaces the nested calls to `output_parser.parse()`, `llm.invoke()`, and `prompt.format()` above. Try building a chain from the pieces you have here.
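
If the `|` syntax feels like magic, the toy below shows the idea in plain Python. It is a sketch, not LangChain's implementation: the `Step` class is hypothetical, and the "model" is a function that returns canned text so the example runs without an API token.

```python
class Step:
    """A toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


format_prompt = Step(lambda playwright: f"List {playwright}'s plays as a comma-separated list.")
fake_llm = Step(lambda prompt_text: "Mojo, Jerusalem, The Ferryman")  # canned reply, no model call
parse_list = Step(lambda text: [item.strip() for item in text.split(",")])

toy_chain = format_prompt | fake_llm | parse_list
print(toy_chain.invoke("Jez Butterworth"))
```

One `invoke` call now runs all three steps in order, which is the same shape your real chain of prompt, LLM, and output parser should have.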

#Lesson 5: RAG Using LangChain

##Build a Prompt Template
We'll start with a prompt template that combines the context and original question and provides instructions to the model on how to use both.

To get the context, we'll use a *retriever*. It takes a string as the input query and returns a `list` of `Document` objects.

Run it to see what it outputs.

In [None]:
docs = retriever.get_relevant_documents("List Jez Butterworth's plays.")
print(f"Found {len(docs)} documents:")

for doc in docs:
  print(doc)

The final form we're going for is `chain.invoke(user_question)`. We'll need the `user_question` for two things in this prompt: the question itself and finding the context from the vector database. Doing multiple things to one input is the job of a `RunnableParallel`. Let's create one that does that.

Let's see what that looks like.

In [None]:
context_and_question.invoke("List Jez Butterworth's plays.")
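
As a mental model of the result's shape, a parallel step behaves roughly like applying every function in a dictionary to the same input. The sketch below uses plain Python only; `run_parallel` and `fake_retriever` are hypothetical stand-ins, and the real retriever returns `Document` objects rather than strings.

```python
def run_parallel(branches, value):
    """Apply every branch to the same input and collect the results by key."""
    return {key: branch(value) for key, branch in branches.items()}


def fake_retriever(question):
    # Stand-in for a vector store lookup.
    return ["(retrieved chunk 1)", "(retrieved chunk 2)"]


result = run_parallel(
    {"context_docs": fake_retriever, "question": lambda question: question},
    "List Jez Butterworth's plays.",
)
print(result)
```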

To use the context docs in a prompt, we need to convert them to a string. We'll use `RunnablePassthrough.assign` to store that string under the `context` key the prompt needs. Note that the `question` key from `context_and_question` is passed through unchanged.

In [None]:
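from langchain.schema.runnable import RunnablePassthrough

def convert_context_docs(to_convert):
    # Take the page_content attribute of each Document object
    # and join them into one string, separated by two newlines.
    return "\n\n".join(doc.page_content for doc in to_convert["context_docs"])

# Add a "context" key holding the joined string; existing keys pass through.
convert_context = RunnablePassthrough.assign(context=convert_context_docs)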
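
To see what "assign" means here without running LangChain, the following plain-Python sketch mimics the behavior: it copies the incoming dictionary and adds one computed key. The `assign` and `join_chunks` helpers are hypothetical, and plain strings stand in for `Document` objects.

```python
def assign(mapping, **computed):
    """Return a copy of mapping plus one new key per keyword argument.

    Each new value is computed from the original mapping.
    """
    extra = {key: func(mapping) for key, func in computed.items()}
    return {**mapping, **extra}


def join_chunks(data):
    return "\n\n".join(data["context_docs"])


state = {
    "context_docs": ["First chunk.", "Second chunk."],
    "question": "List Jez Butterworth's plays.",
}
with_context = assign(state, context=join_chunks)
print(sorted(with_context))
```

The original keys survive, which is why the prompt can still read `question` after this step.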



Let's see how all this works with our prompt.

In [None]:
complete_prompt_chain = context_and_question | convert_context | prompt
complete_prompt_chain.invoke("List Jez Butterworth's plays.")

Now we'll build the final chain for our app.

Now run it to see the results. "The Hills of California" should appear in the list of plays, even though it premiered after the model's training cutoff.

If you don't see "The Hills of California" at the bottom, try starting a new Colab runtime and running the code in the notebook again. This resets the model.

In [None]:
result = chain.invoke("List Jez Butterworth's plays.")
print(result)

Now we'll build a chain that passes the source citations, which were in the metadata field of the `list` of `Document` objects returned from the retriever. We'll use `RunnableParallel` to pass the `list` to the end of the chain while also passing it to a chain that builds the prompt and invokes the model.

Now run it to see what we got. When we asked this question without context, the model told us that "The Phantom of the Opera" was the only play that had more than 10,000 performances. When the context data is included, we add "Chicago," "The Lion King," and "Wicked."

In [None]:
result = chain_with_sources.invoke("What Broadway plays have had over 10,000 performances?")
print("The docs used in this answer:")
print("\n".join(repr(doc.metadata) for doc in result["context_docs"]))
print("\nThe answer:")
print(result["answer"])

#Lesson 6: User Interface
While not directly related to LLMs or AI in general, user interfaces are essential to making an app approachable. We'll use Streamlit to build a basic front end for our app.

##Getting Started
First we need to install localtunnel, an npm package that lets us expose the Colab runtime to IP traffic.

Now we'll create a simple "Hello World" app. This illustrates how simple Streamlit is to use.

Now we need to run the app and view it in a browser. There are a few steps to this:
* Start the Streamlit server using the *app.py* script.
* Set up a local tunnel to get a URL that connects to the Colab runtime.
* Get the public IP of the Colab runtime to gain access to the localtunnel-created URL.

We'll do all of this in one command line. The command displays the public IP of the Colab runtime, then a link to the Streamlit server. When you click the link, you'll be asked for the tunnel password, which is the Colab runtime's public IP.

Note that this cell will continue running the server until you manually stop it. No other cells in the notebook can run while this cell is running. Stop the cell by selecting the stop button to its left.

In [None]:
!streamlit run app.py &>/content/logs.txt & npx localtunnel --port 8501 & curl ipv4.icanhazip.com

##Building the Backend
Up to this point we've been running the Python instructions in interactive mode in the Colab notebook. For our app to work as a backend, we need to make the code available in a module that the front end can import. Let's go back through our code and copy the needed elements into a single Python file.

In [None]:
%%writefile backend.py
from langchain.prompts import PromptTemplate
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.schema.runnable import RunnablePassthrough, RunnableParallel
from langchain_community.llms import HuggingFaceHub

prompt = PromptTemplate.from_template("""
You are an assistant providing answers to questions about
the theater. In addition to your training data, you are to
use the additional context provided below to provide
up-to-date information.
Question: {question}
Context: {context}
Answer:
""")

embedding_function = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectordb = Chroma(persist_directory="./chromadb",
                  embedding_function=embedding_function)
retriever = vectordb.as_retriever()

context_and_question = RunnableParallel(
    {"context_docs": retriever, "question": RunnablePassthrough()}
)

def convert_context_docs(to_convert):
    # Take the page_content attribute of each Document object
    # and join them into one string, separated by two newlines.
    return "\n\n".join(doc.page_content for doc in to_convert["context_docs"])

convert_context = RunnablePassthrough.assign(context=convert_context_docs)

llm = HuggingFaceHub(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    task="text-generation",
    model_kwargs={
        "max_new_tokens": 512,
        "top_k": 30,
        "temperature": 0.1,
        "repetition_penalty": 1.03,
    },
)

answer_chain = convert_context | prompt | llm
chain_with_sources = context_and_question.assign(answer=answer_chain)

def answer_and_sources(question):
    result = chain_with_sources.invoke(question)
    response_text = result["answer"]
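    # The model may echo the prompt, so keep only the text after the last "Answer:".
    # rpartition returns the whole string as its last element if the marker is absent.
    answer_text = response_text.rpartition("Answer:")[2].strip()
    # Page numbers in the metadata are zero-based, so add 1 for display.
    sources = "\n\n".join(f"{doc.metadata['source']}, page {doc.metadata['page'] + 1}" for doc in result["context_docs"])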
    return {"answer": answer_text,
            "sources": sources
            }


Now test the back end manually to make sure it works.

In [None]:
import backend, importlib
importlib.reload(backend)
print(backend.answer_and_sources("List Jez Butterworth's plays."))
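
The answer extraction in `answer_and_sources` assumes the model's response contains the text "Answer:". The sketch below checks that step on its own, without calling the model. The `extract_answer` helper and both sample strings are made up for illustration; it also handles a response with no marker at all, a case where slicing from the result of `rfind` would silently drop the first few characters.

```python
def extract_answer(response_text, marker="Answer:"):
    """Return the text after the last marker, or the whole text if the marker is missing."""
    # rpartition yields ('', '', text) when the marker is absent,
    # so element [2] is always safe to use.
    return response_text.rpartition(marker)[2].strip()


echoed = "Question: List the plays.\nContext: (chunks here)\nAnswer:\nMojo, Jerusalem"
print(extract_answer(echoed))
print(extract_answer("Mojo, Jerusalem"))
```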

##Building the Interface
Now let's use Streamlit's example chat app to build the interface for Praxa.

Now we run the whole app.

In [None]:
!streamlit run praxa.py &>/content/logs.txt & npx localtunnel --port 8501 & curl ipv4.icanhazip.com