# ReAct Zero-Shot from YouTube Transcript Chat Bot with LangChain and Gradio UI

This notebook shows how to use the YouTube transcript API to turn a video's transcript into embeddings, store those embeddings in a persistent local database, and query the database directly for relevant pieces of text. It then uses the database as a source for an LLM hooked into LangChain to create a zero-shot ReAct agent that you can ask questions.

- https://python.langchain.com/docs/integrations/providers/youtube
- https://python.langchain.com/docs/integrations/document_loaders/youtube_transcript

## ChromaDB Persistence

Each notebook that uses ChromaDB follows the same pattern for persistence.

If the directory that ChromaDB writes its data to already exists, the existing database is loaded. If the directory does not exist, a new database is created.

If you change parameters that affect the embeddings generation (like swapping in a new YouTube URL), you'll need to delete the database directory to force a new database to be created.

Delete the directory from the root of the repository. For example, if the ChromaDB directory is `data/chromadb/youtube_transcripts`, run:

```sh
rm -rf data/chromadb/youtube_transcripts
```

Or, if you run into permission issues:

```sh
sudo rm -rf data/chromadb/youtube_transcripts
```
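
If you'd rather reset the database from Python (for example, from a notebook cell), the same cleanup can be sketched with the standard library. The `reset_chroma_dir` helper name is hypothetical and not part of this repository, and the path in the usage comment is just the example directory from above.

```python
import shutil
from pathlib import Path


def reset_chroma_dir(path):
    """Delete a ChromaDB persistence directory if it exists.

    Returns True if a directory was deleted, False if there was nothing to delete.
    """
    db_dir = Path(path)
    if not db_dir.is_dir():
        return False
    shutil.rmtree(db_dir)
    return True


# hypothetical usage: force the next run to rebuild the embeddings
# reset_chroma_dir("data/chromadb/youtube_transcripts")
```

Like `rm -rf`, this is irreversible, so double-check the path before running it.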

In [1]:
import os

# ****************** [START] Google Cloud project settings ****************** #
project = os.getenv('GCP_PROJECT')
location = os.environ.get('GCP_REGION', 'us-central1')
# ******************* [END] Google Cloud project settings ******************* #


# *********************** [START] Embeddings config ************************* #
# set rate limiting options for Vertex AI embeddings
embeddings_requests_per_minute = 100
embeddings_num_instances_per_batch = 5
# *********************** [END] Embeddings config *************************** #


# ********************** [START] data directory config ********************** #
from helpers.files import get_data_dir
data_dir = get_data_dir()
chroma_db_dir = f'{data_dir}/chromadb'
chroma_db_youtube_transcript_dir = f'{chroma_db_dir}/youtube_transcripts'
# *********************** [END] data directory config *********************** #


# ********************** [START] LLM data config **************************** #
from helpers.files import file_exists

collection_name = 'youtube-transcript'
load_documents = True
if file_exists(chroma_db_youtube_transcript_dir):
    load_documents = False
# *********************** [END] LLM data config ***************************** #


# *********************** [START] LLM parameter config ********************** #
# Vertex AI model to use for the LLM
model_name = 'chat-bison@002'

# maximum number of model responses generated per prompt
candidate_count = 3

# determines the maximum amount of text output from one prompt.
# a token is approximately four characters.
max_output_tokens = 2048

# temperature controls the degree of randomness in token selection.
# lower temperatures are good for prompts that expect a true or
# correct response, while higher temperatures can lead to more
# diverse or unexpected results. With a temperature of 0 the highest
# probability token is always selected. for most use cases, try
# starting with a temperature of 0.2.
temperature = 0.2

# top-p changes how the model selects tokens for output. Tokens are
# selected from most probable to least until the sum of their
# probabilities equals the top-p value. For example, if tokens A, B, and C
# have a probability of .3, .2, and .1 and the top-p value is .5, then the
# model will select either A or B as the next token (using temperature).
# the default top-p value is .8.
top_p = 0.8

# top-k changes how the model selects tokens for output.
# a top-k of 1 means the selected token is the most probable among
# all tokens in the model’s vocabulary (also called greedy decoding),
# while a top-k of 3 means that the next token is selected from among
# the 3 most probable tokens (using temperature).
top_k = 40

# how verbose the llm and langchain agent is when thinking
# through a prompt. you're going to want this set to True
# for development so you can debug its thought process
verbose = True
# *********************** [END] LLM parameter config ************************ #


# ********************** [START] Configuration Checks *********************** #
if not project:
    raise Exception('GCP_PROJECT environment variable not set')
# *********************** [END] Configuration Checks ************************ #
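
The top-k and top-p comments in the config above describe how the pool of candidate tokens is narrowed before one is sampled. As a rough illustration only (not Vertex AI's actual implementation), the filtering can be sketched in plain Python. The `candidate_tokens` function and the toy probabilities are hypothetical; the A/B/C numbers mirror the example in the top-p comment.

```python
def candidate_tokens(probs, top_k, top_p):
    """Return the tokens still eligible after top-k, then top-p filtering.

    probs maps token -> probability. Simplified illustration only.
    """
    # rank tokens from most to least probable
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # top-k: keep only the k most probable tokens
    ranked = ranked[:top_k]

    # top-p: keep adding tokens until their cumulative probability reaches top_p
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= top_p - 1e-9:  # small tolerance for float rounding
            break
    return kept


probs = {"A": 0.3, "B": 0.2, "C": 0.1}
print(candidate_tokens(probs, top_k=40, top_p=0.5))  # ['A', 'B']
print(candidate_tokens(probs, top_k=1, top_p=0.5))   # ['A']
```

The final token would then be sampled from the surviving candidates, with temperature controlling how strongly the most probable ones are favored.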

## Import and Initialize Vertex AI Client

This will complain about missing CUDA drivers and the GPU not being used. You can safely ignore that. Using the GPU is possible on Linux with Docker, but on macOS you'll need to set up a non-containerized development environment to use GPUs.

In [2]:
from google.cloud import aiplatform
import vertexai

vertexai.init(project=project, location=location)

print(f"Vertex AI SDK version: {aiplatform.__version__}")


2023-12-17 03:36:46.773910: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-17 03:36:46.775536: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-17 03:36:46.793813: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-17 03:36:46.793849: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-17 03:36:46.793861: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to regi

Vertex AI SDK version: 1.38.1


## Import LangChain

This doesn't actually initialize anything; it just lets us print the version.

In [3]:
import langchain

print(f"LangChain version: {langchain.__version__}")


LangChain version: 0.0.350


## Configure LLM with Vertex AI

In [4]:
from langchain.chat_models import ChatVertexAI

llm = ChatVertexAI(
    model_name=model_name,
    max_output_tokens=max_output_tokens,
    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    verbose=verbose,
    n=candidate_count,
)


## Initialize Embeddings Function with Vertex AI

There are other options for creating embeddings. I was interested in sticking with Google products here.

In [5]:
from langchain.embeddings import VertexAIEmbeddings

# https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.vertexai.VertexAIEmbeddings.html
embeddings = VertexAIEmbeddings(
    requests_per_minute=embeddings_requests_per_minute,
    num_instances_per_batch=embeddings_num_instances_per_batch,
    model_name="textembedding-gecko@latest",
)

## Get YouTube Documents

In [6]:
from langchain.document_loaders import YoutubeLoader

if load_documents:
    loader = YoutubeLoader.from_youtube_url(
        "https://www.youtube.com/watch?v=cTjQp_TQlXo",
        add_video_info=True,
    )

    documents = loader.load()

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

def split_docs(documents, chunk_size=1500, chunk_overlap=0):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    docs = text_splitter.split_documents(documents)
    return docs

if load_documents:
    transformed_docs = split_docs(documents)


## Create ChromaDB Database

In [8]:
from langchain.vectorstores import Chroma

# build a new vector db from the documents, or, if one was already
# persisted to disk, load it instead of creating a new one
if load_documents:
  # https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.chroma.Chroma.html#langchain.vectorstores.chroma.Chroma.from_documents
  db = Chroma.from_documents(
    transformed_docs,
    embeddings,
    collection_name=collection_name,
    persist_directory=chroma_db_youtube_transcript_dir
  )
else:
  db = Chroma(
    persist_directory=chroma_db_youtube_transcript_dir,
    embedding_function=embeddings,
    collection_name=collection_name
  )

## Query the vector database directly

- https://python.langchain.com/docs/modules/data_connection/vectorstores/

In [9]:
def print_db_docs(search_type, docs):
    print('---')
    print(f"Matching documents ({search_type}): {len(docs)}")

    # print out the first 5 results
    for doc in docs[:5]:
        print(doc)


query = "Who was the Greatest Knight?"
docs = db.similarity_search(query)
print_db_docs("similarity", docs)

docs = db.max_marginal_relevance_search(query)
print_db_docs("max marginal relevance", docs)


---
Matching documents (similarity): 4
page_content="away with his family around him foreign was taken to London where a large crowd of Barons escorted him to Westminster Abbey for a vigil and a mass before he was finally laid to rest in Temple Church in his funeral oration Archbishop Stephen Langton described William Marshall as the greatest Knight to be found in all the world he left his household in a magnificent position and he would have been sure that his lion would have gone on for Generations as one of the most powerful families in England unfortunately none of his five Sons were able to Father an heir so the male lion was extinguished after just one generation however his son William engaged a writer to record The Narrative of his father's extraordinary life based on written evidence and the stories of the men who knew him this is the first known biography of a medieval Knight and this remarkable Legacy provides a unique insight into the period and into one of the most remarka
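
The two searches above rank results differently: `similarity_search` returns the chunks closest to the query, while `max_marginal_relevance_search` also penalizes chunks that are near-duplicates of ones already picked. A minimal pure-Python sketch of the idea on toy 2-D vectors follows. The vectors, function names, and the `lambda_mult=0.5` weighting are illustrative assumptions, not LangChain's or ChromaDB's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_search(query_vec, doc_vecs, k):
    """Indices of the k documents most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


def mmr_search(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedy maximal marginal relevance: balance relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        # similarity to the closest already-selected document (0 if none yet)
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 1.0]
doc_vecs = [
    [1.0, 0.20],  # doc 0
    [1.0, 0.25],  # doc 1: near-duplicate of doc 0
    [0.1, 1.00],  # doc 2: relevant, but points in a different direction
]

print(similarity_search(query_vec, doc_vecs, k=2))  # [1, 0]
print(mmr_search(query_vec, doc_vecs, k=2))         # [1, 2]
```

Plain similarity returns both near-duplicates, while MMR swaps the second one for the more distinct document. For a transcript, that can help surface different parts of the video rather than several overlapping passages.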

## Make Retrieval QA Chain

In [10]:
retriever = db.as_retriever()
retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever
)

## Ask the Retrieval QA Chain Some Questions

In [11]:
def print_retrieval_qa_results(result):
    print('---')
    print(f"Query: {result['query']}")
    print(f"Result: {result['result']}")


query = "Who was the Greatest Knight?"
result = retrieval_qa({'query': query})
print_retrieval_qa_results(result)


---
Query: Who was the Greatest Knight?
Result:  Archbishop Stephen Langton called William Marshall the greatest Knight in the world.


## Build Chat Bot Chain

This will provide knowledge about the YouTube video to the chat bot.

In [14]:
from langchain.schema import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain.prompts import ChatPromptTemplate

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

def format_docs(docs):
    return "\n\n".join([d.page_content for d in docs])

# https://python.langchain.com/docs/modules/data_connection/retrievers/
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
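
To make the data flow in the chain concrete, here is a sketch, in plain Python without LangChain, of the text that reaches the LLM: the retrieved chunks are joined by `format_docs`, then substituted into the template together with the question. `FakeDoc` and `build_prompt` are hypothetical stand-ins for illustration; in the real chain the retriever supplies the documents and `ChatPromptTemplate` does the formatting.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


template = """Answer the question based only on the following context:

{context}

Question: {question}
"""


def format_docs(docs):
    # join the retrieved chunks into one context string
    return "\n\n".join(d.page_content for d in docs)


def build_prompt(question, retrieved_docs):
    # mirrors the first two stages of the chain: gather inputs, fill the template
    inputs = {"context": format_docs(retrieved_docs), "question": question}
    return template.format(**inputs)


docs = [FakeDoc("first chunk"), FakeDoc("second chunk")]
print(build_prompt("Who was the Greatest Knight?", docs))
```

Because the whole context is pasted into a single prompt, the number and size of retrieved chunks directly affect how much of the model's input budget is used.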

In [15]:
import gradio as gr

def chat_response(message, history):
  return chain.invoke(message)

demo = gr.ChatInterface(chat_response)

demo.launch(
  server_name="0.0.0.0",
  server_port=5000,
)

Running on local URL:  http://0.0.0.0:5000

To create a public link, set `share=True` in `launch()`.


