# ReAct Zero-Shot from URL Sources with LangChain and Selenium

## ChromaDB Persistence

Each notebook that uses ChromaDB follows the same pattern for persistence.

If the directory ChromaDB writes its data to already exists, the notebook loads the existing database. If the directory does not exist, it creates a new database.

If you change anything that affects embeddings generation (like swapping in a new PDF file or changing the list of URLs), you'll need to delete the database directory to force a new database to be created.

To delete it, run the following from the root of the repository. For example, if the ChromaDB directory is `data/chromadb/url_source`:

```sh
rm -rf data/chromadb/url_source
```

Or, if you run into permission issues:

```sh
sudo rm -rf data/chromadb/url_source
```
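If you'd rather reset the database from inside the notebook, the sketch below is a rough Python equivalent of the `rm -rf` command. `reset_chroma_dir` is a hypothetical helper, not something this repository provides. Pass it the same path the notebook uses for persistence, for example `chroma_db_url_source_dir` from the configuration cell. Run it before the configuration cell, because that cell checks whether the directory exists to decide whether to re-embed the documents.

```python
import shutil
from pathlib import Path


def reset_chroma_dir(persist_dir: str) -> bool:
    """Delete a ChromaDB persistence directory so the next run rebuilds it.

    Hypothetical helper. Returns True if a directory was deleted and
    False if there was nothing to delete.
    """
    path = Path(persist_dir)
    if not path.exists():
        return False
    shutil.rmtree(path)  # Python equivalent of `rm -rf <dir>`
    return True


# hypothetical usage, run from the repository root:
# reset_chroma_dir("data/chromadb/url_source")
```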

In [1]:
```python
import os

# ****************** [START] Google Cloud project settings ****************** #
project = os.getenv('GCP_PROJECT')
location = os.environ.get('GCP_REGION', 'us-central1')
# ******************* [END] Google Cloud project settings ******************* #


# *********************** [START] Embeddings config ************************* #
# set rate limiting options for Vertex AI embeddings
embeddings_requests_per_minute = 100
embeddings_num_instances_per_batch = 5
# *********************** [END] Embeddings config *************************** #


# ********************** [START] data directory config ********************** #
# local directory for ChromaDB persistence and for source
# files (like PDFs) used to create embeddings
from helpers.files import get_data_dir
data_dir = get_data_dir()

chroma_db_dir = f'{data_dir}/chromadb'
chroma_db_url_source_dir = f'{chroma_db_dir}/url_source'
# *********************** [END] data directory config *********************** #


# ********************** [START] LLM data config **************************** #
from helpers.files import file_exists

collection_name = 'url-source'
# only load and embed documents when no persisted database exists yet
load_documents = True
if file_exists(chroma_db_url_source_dir):
    load_documents = False
# *********************** [END] LLM data config ***************************** #


# *********************** [START] URL tool config *************************** #
urls = [
    "https://www.imup.io/",
]

# tool config for the LangChain ReAct agent
tool_name = "imUp.io marketing document reader"
tool_description = "Use this to read imUp.io's mission, how it works, testimonials, plans, FAQ, and more."
# *********************** [END] URL tool config ***************************** #


# ******************* [START] vector db retriever config ******************** #
# experiment with:
# - mmr
# - similarity
db_search_type = "similarity"
# k is capped at the number of stored chunks (3 here), which is what
# triggers the "Number of requested results" warning in the agent output
db_search_kwargs = {"k": 5}
# ********************* [END] vector db retriever config ******************** #


# *********************** [START] LLM parameter config ********************** #
# Vertex AI model to use for the LLM
model_name = 'text-bison@002'

# maximum number of model responses generated per prompt
# (note: not passed to the VertexAI wrapper configured later)
candidate_count = 1

# determines the maximum amount of text output from one prompt.
# a token is approximately four characters.
max_output_tokens = 1024

# temperature controls the degree of randomness in token selection.
# lower temperatures are good for prompts that expect a true or
# correct response, while higher temperatures can lead to more
# diverse or unexpected results. With a temperature of 0 the highest
# probability token is always selected. For most use cases, try
# starting with a temperature of 0.2.
temperature = 0.2

# top-p changes how the model selects tokens for output. Tokens are
# selected from most probable to least until the sum of their
# probabilities equals the top-p value. For example, if tokens A, B, and C
# have a probability of .3, .2, and .1 and the top-p value is .5, then the
# model will select either A or B as the next token (using temperature).
# the default top-p value is .8.
top_p = 0.8

# top-k changes how the model selects tokens for output.
# a top-k of 1 means the selected token is the most probable among
# all tokens in the model’s vocabulary (also called greedy decoding),
# while a top-k of 3 means that the next token is selected from among
# the 3 most probable tokens (using temperature).
top_k = 40

# how verbose the llm and langchain agent is when thinking
# through a prompt. you're going to want this set to True
# for development so you can debug its thought process
verbose = True
# *********************** [END] LLM parameter config ************************ #


# ********************** [START] Configuration Checks *********************** #
if not project:
    raise Exception('GCP_PROJECT environment variable not set')
# *********************** [END] Configuration Checks ************************ #
```
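The top-k, top-p, and temperature comments in the configuration cell describe a filtering pipeline. The sketch below runs it on the comment's own example: tokens A, B, and C with probabilities .3, .2, and .1. It assumes the order top-k, then top-p, then temperature sampling, which is how Google's Vertex AI documentation describes it. It's a toy model of the idea, not how the model server is implemented, and the helper names are hypothetical.

```python
import random


def filter_top_k(probs: dict[str, float], top_k: int) -> dict[str, float]:
    """Keep only the top_k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_k])


def filter_top_p(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep tokens, most probable first, until their cumulative probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p - 1e-9:  # small tolerance for float round-off
            break
    return kept


def sample_token(probs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick the next token; temperature 0 always takes the most probable one."""
    if temperature == 0:
        return max(probs, key=probs.get)  # greedy decoding
    tokens = list(probs)
    # p ** (1 / T) is equivalent to dividing the logits by T before the softmax
    weights = [probs[t] ** (1.0 / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# the example from the top-p comment: A, B, C with probabilities .3, .2, .1
probs = {"A": 0.3, "B": 0.2, "C": 0.1}
candidates = filter_top_p(filter_top_k(probs, top_k=40), top_p=0.5)
print(sorted(candidates))  # ['A', 'B']
print(sample_token(candidates, temperature=0.2, rng=random.Random(0)))  # prints A or B
```

At a low temperature like 0.2, the rescaled weights strongly favor A. A larger temperature flattens them, so B gets picked more often.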


## Import and Initialize Vertex AI Client

This cell logs TensorFlow warnings about missing CUDA drivers and the GPU not being used. You can safely ignore them. GPU support is possible on Linux with Docker, but on macOS you'll need a non-containerized development environment to use the GPU.

In [2]:
```python
from google.cloud import aiplatform
import vertexai

vertexai.init(project=project, location=location)

print(f"Vertex AI SDK version: {aiplatform.__version__}")
```


```text
2023-12-16 21:52:31.408295: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-16 21:52:31.409877: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-16 21:52:31.428254: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-16 21:52:31.428279: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-16 21:52:31.428292: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to regi

Vertex AI SDK version: 1.38.1
```


## Import LangChain

This cell doesn't initialize anything; it just prints the installed LangChain version.

In [3]:
```python
import langchain

print(f"LangChain version: {langchain.__version__}")
```


```text
LangChain version: 0.0.350
```


## Configure LLM with Vertex AI

- LangChain `VertexAI` wrapper source: https://github.com/langchain-ai/langchain/blob/34e6f3ff72067af3265341bcea7983c106f15a74/libs/community/langchain_community/llms/vertexai.py#L203

In [4]:
```python
from langchain.llms import VertexAI

llm = VertexAI(
    model_name=model_name,
    max_output_tokens=max_output_tokens,
    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    verbose=verbose,
)
```


## Initialize Embeddings Function with Vertex AI

LangChain supports other embedding providers, but this notebook sticks with Google products.

In [5]:
```python
from langchain.embeddings import VertexAIEmbeddings

# https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.vertexai.VertexAIEmbeddings.html
embeddings = VertexAIEmbeddings(
    requests_per_minute=embeddings_requests_per_minute,
    num_instances_per_batch=embeddings_num_instances_per_batch,
    model_name="textembedding-gecko@latest",
)
```
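The two embeddings settings from the configuration cell read as a per-request batch size (`embeddings_num_instances_per_batch = 5`) and a request-rate cap (`embeddings_requests_per_minute = 100`). Together they would allow at most 5 * 100 = 500 chunks per minute. Assuming that reading, the sketch below shows one way client-side rate limiting like this could work. It is not LangChain's or Vertex AI's implementation, and `fake_embed_batch` is a hypothetical stand-in for the real embeddings call.

```python
import time
from typing import Callable, Iterator, Sequence


def batched(texts: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most batch_size texts."""
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


def embed_with_rate_limit(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    requests_per_minute: int,
    batch_size: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list[list[float]]:
    """Embed texts in batches, spacing requests at least 60 / requests_per_minute seconds apart."""
    min_interval = 60.0 / requests_per_minute
    vectors: list[list[float]] = []
    last_request = None
    for batch in batched(texts, batch_size):
        if last_request is not None:
            wait = min_interval - (time.monotonic() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = time.monotonic()
        vectors.extend(embed_batch(batch))  # one "request" per batch
    return vectors


def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    """Hypothetical embeddings call: a 1-d vector (the text length) per text."""
    return [[float(len(text))] for text in batch]


vectors = embed_with_rate_limit(
    [f"chunk {i}" for i in range(12)],
    fake_embed_batch,
    requests_per_minute=100,
    batch_size=5,
)
print(len(vectors))  # 12 vectors from 3 requests
```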

## Load Sites as LangChain Documents

The URL list isn't limited to regular HTML sites or single-page apps (SPAs); it can also include YouTube videos and Google Maps.

Reference: https://python.langchain.com/docs/integrations/document_loaders/url

In [7]:
```python
from langchain.document_loaders import SeleniumURLLoader

loader = SeleniumURLLoader(
  urls=urls,
  browser='firefox',
  headless=True,
)

# https://api.python.langchain.com/en/latest/document_loaders/langchain.document_loaders.url_selenium.SeleniumURLLoader.html#langchain.document_loaders.url_selenium.SeleniumURLLoader.load_and_split
documents = loader.load_and_split()

print(f"Loaded {len(documents)} documents")
```


```text
Loaded 3 documents
```
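`load_and_split()` splits each loaded page into smaller chunks, which is how one URL turned into three documents here. By default, LangChain uses its `RecursiveCharacterTextSplitter` for this. That splitter prefers to break on paragraphs, then lines, then words. The sketch below is a much simpler fixed-size splitter with overlap. It only illustrates why neighboring chunks share a little text at their edges. The function name and parameter values are illustrative.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk reached the end of the text
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```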


In [8]:
```python
print(documents[0])
print(documents[1])
```

```text
page_content="Home\n\nOur mission\n\nHow it works\n\nTestimonials\n\nPlans\n\nFAQ\n\nSupport\n\nOrgs\n\nLogin\n\nBusiness\n\nMeasure your d*mn internet\n\nFind the best internet available\n\nFind the best internet available\n\nthen measure it.\n\nFind it\n\nMeasure it\n\nOur Mission\n\nCrowdsourcing internet information and taking the finger pointing out of problems.\n\nWe're here to provide insight into the internet at large. We gather data about providers, their plans, and their performance from your perspective -- and give it to you, for free.\n\nCheck our internet providers page to make sure you're getting a good deal in your area. Then use our app to measure it 24/7 and make sure you're getting what you pay for. Catch any disconnections and slow speeds and export the data as proof to help track down where issues may be coming from.\n\nNow that most of our lives revolve around being online, we want to make sure you have the best possible connection you can get.\n\nBrad Miller\n\nCo
```

## Create Embeddings Database

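This cell follows the persistence pattern described at the top of the notebook. If the persistence directory already exists, it loads the existing database instead of re-creating it and re-embedding the documents.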

In [9]:
```python
from langchain.vectorstores import Chroma

if load_documents:
  # https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.chroma.Chroma.html#langchain.vectorstores.chroma.Chroma.from_documents
  db = Chroma.from_documents(
    documents,
    embeddings,
    collection_name=collection_name,
    persist_directory=chroma_db_url_source_dir,
  )
else:
  db = Chroma(
    persist_directory=chroma_db_url_source_dir,
    embedding_function=embeddings,
    collection_name=collection_name,
  )
```


## Persist the Embeddings Database

In [10]:
```python
# db.persist() would probably be safe to call every time, but there's
# no point writing to disk when no new documents were added
if load_documents:
  db.persist()
```

## Initialize Retrieval QA Chain

In [11]:
```python
from langchain.chains import RetrievalQA

retrieval_qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(
      search_type=db_search_type,
      search_kwargs=db_search_kwargs,
    ),
)
```
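With `search_type="similarity"` and `k=5`, the retriever returns the stored chunks closest to the question's embedding. The `"stuff"` chain type then puts all of them into a single prompt for the LLM. The sketch below imitates both steps with plain Python lists, under these assumptions:

- It ranks by cosine similarity for simplicity, whereas Chroma's default distance is L2.
- It caps `k` at the index size, which is what the `Number of requested results 5 is greater than number of elements in index 3` warning in the agent output suggests.
- The vectors are made up, and the prompt wording is hypothetical, not LangChain's template.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query: list[float], index: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    k = min(k, len(index))  # can't return more chunks than the index holds
    ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def stuff_prompt(question: str, chunks: list[str]) -> str:
    """'Stuff' every retrieved chunk into one prompt (hypothetical wording)."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# toy 2-d "embeddings" for three chunks (made-up values)
index = [
    ("Plans and pricing", [1.0, 0.0]),
    ("FAQ", [0.0, 1.0]),
    ("Our mission", [0.7, 0.7]),
]
chunks = similarity_search([1.0, 0.1], index, k=5)  # k is capped at 3
print(stuff_prompt("How much does imUp.io cost?", chunks))
```

Because "stuff" sends every retrieved chunk in one prompt, a larger `k` means a longer prompt. The alternatives, like map-reduce, trade that for extra LLM calls.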

## Configure Retrieval QA Chain Tool

In [12]:
```python
from langchain.agents import initialize_agent, Tool, AgentExecutor

tools = [
  Tool(
    name=tool_name,
    func=retrieval_qa.run,
    description=tool_description,
  ),
]
```
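With `ZERO_SHOT_REACT_DESCRIPTION`, the agent picks a tool based only on its name and description, and both are written into the agent's prompt. That's why `tool_description` spells out what the marketing pages cover. The sketch below renders a tool section in that spirit. `ToolSpec` is a hypothetical stand-in for LangChain's `Tool`, and the wording is a simplified approximation, not LangChain's exact prompt.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    """Hypothetical stand-in for LangChain's Tool: just a name and a description."""
    name: str
    description: str


def render_tool_section(tools: list[ToolSpec]) -> str:
    """List tool names and descriptions the way a zero-shot ReAct prompt might (simplified)."""
    tool_lines = "\n".join(f"{t.name}: {t.description}" for t in tools)
    tool_names = ", ".join(t.name for t in tools)
    return (
        "You have access to the following tools:\n\n"
        f"{tool_lines}\n\n"
        f"Action: the action to take, should be one of [{tool_names}]"
    )


print(render_tool_section([
    ToolSpec(
        name="imUp.io marketing document reader",
        description="Use this to read imUp.io's mission, how it works, testimonials, plans, FAQ, and more.",
    ),
]))
```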


## Initialize ReAct Agent


In [14]:
```python
from langchain.agents import AgentType

# initialize ReAct agent
react = initialize_agent(
  tools,
  llm,
  agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
  verbose=verbose,
  # https://python.langchain.com/docs/modules/agents/how_to/max_time_limit
  max_execution_time=60,
  # By default, early stopping uses the "force" method, which
  # just returns a fixed "agent stopped" message. Alternatively,
  # the "generate" method does one FINAL pass through the LLM
  # to generate an output.
  early_stopping_method="generate",
)

# initialize_agent already returns an AgentExecutor. When rebuilding one
# from react.agent, pass the executor settings again; otherwise
# max_execution_time and early_stopping_method fall back to their defaults.
agent_executor = AgentExecutor.from_agent_and_tools(
  agent=react.agent,
  tools=tools,
  verbose=verbose,
  max_execution_time=60,
  early_stopping_method="generate",
)
```
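Each step of the agent trace printed at the end of this notebook follows the ReAct text format. The LLM writes a thought, then either an `Action` / `Action Input` pair or a `Final Answer`. For an action, the executor runs the tool and appends an `Observation`. The sketch below parses one LLM step in that format with regular expressions, using the first step of the trace as input. LangChain ships its own output parser for this agent; this one is a hypothetical, simplified version for illustration.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    tool: str        # name of the tool to call
    tool_input: str  # text passed to the tool


@dataclass
class Finish:
    output: str      # the agent's final answer


# "Action: <tool>" on one line, "Action Input: <input>" on the next
ACTION_RE = re.compile(r"Action\s*:\s*(.*)\nAction\s*Input\s*:\s*(.*)")


def parse_react_step(text: str) -> Union[Action, Finish]:
    """Parse one ReAct-formatted LLM step (hypothetical, simplified parser)."""
    # this sketch simply prefers a final answer if one is present
    if "Final Answer:" in text:
        return Finish(text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f"Could not parse LLM output: {text!r}")
    return Action(tool=match.group(1).strip(), tool_input=match.group(2).strip())


step = (
    "imUp.io offers different plans, so I need to check the pricing section "
    "of their marketing document.\n"
    "Action: imUp.io marketing document reader\n"
    "Action Input: pricing"
)
print(parse_react_step(step))
# Action(tool='imUp.io marketing document reader', tool_input='pricing')
```

When the parser returns an `Action`, the executor calls the matching tool with `tool_input` and feeds the observation back to the LLM. When it returns a `Finish`, the run ends.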


## Ask Something About the URL Content

In [19]:
```python
question = "How much does imUp.io cost?"
agent_executor.run(question)

question = "What features are included in imUp.io's lite plan?"
agent_executor.run(question)
```




[1m> Entering new AgentExecutor chain...[0m


Number of requested results 5 is greater than number of elements in index 3, updating n_results = 3


[32;1m[1;3m imUp.io offers different plans, so I need to check the pricing section of their marketing document.
Action: imUp.io marketing document reader
Action Input: pricing[0m
Observation: [36;1m[1;3m The pricing for imUp's services varies depending on the plan you choose. They offer a free plan with basic features, as well as paid plans with additional features and benefits. The paid plans range from $10 per user per year for the Lite plan, to $30 per user per year for the Integrity plan.[0m
Thought:[32;1m[1;3m I now know the pricing for imUp.io's services.
Final Answer: imUp.io offers a free plan with basic features, as well as paid plans with additional features and benefits. The paid plans range from $10 per user per year for the Lite plan, to $30 per user per year for the Integrity plan.[0m

[1m> Finished chain.[0m


[1m> Entering new AgentExecutor chain...[0m


Number of requested results 5 is greater than number of elements in index 3, updating n_results = 3


[32;1m[1;3m The lite plan should have some basic features.
Action: imUp.io marketing document reader
Action Input: Plans[0m
Observation: [36;1m[1;3m imUp offers three pricing plans: Lite, Integrity, and Reporting. The Lite plan is $1 per user per month and includes downtime monitoring, speed testing, remote speed testing, downtime notifications, slow speed notifications, and data export. The Integrity plan is $3 per user per month and includes all of the features of the Lite plan, plus 1 month of metrics storage and weekday email support. The Reporting plan is $30 per user per year and includes all of the features of the Integrity plan, plus 2 days of metrics storage and weekday email support.[0m
Thought:[32;1m[1;3m The Lite plan includes downtime monitoring, speed testing, remote speed testing, downtime notifications, slow speed notifications, and data export.
Final Answer: Downtime monitoring, speed testing, remote speed testing, downtime notifications, slow speed notificatio

'Downtime monitoring, speed testing, remote speed testing, downtime notifications, slow speed notifications, and data export.'