# ReAct Wikipedia Source with LangChain

This notebook shows how to give a Vertex AI LLM access to Wikipedia through a LangChain ReAct agent.

In [1]:
import os

# ****************** [START] Google Cloud project settings ****************** #
project = os.getenv('GCP_PROJECT')
location = os.environ.get('GCP_REGION', 'us-central1')
# ******************* [END] Google Cloud project settings ******************* #


# *********************** [START] Embeddings config ************************* #
# set rate limiting options for Vertex AI embeddings
embeddings_requests_per_minute = 100
embeddings_num_instances_per_batch = 5
# *********************** [END] Embeddings config *************************** #


# *********************** [START] LLM parameter config ********************** #
# Vertex AI model to use for the LLM
model_name = 'text-bison@002'

# determines the maximum amount of text output from one prompt.
# a token is approximately four characters.
max_output_tokens = 1024

# temperature controls the degree of randomness in token selection.
# lower temperatures are good for prompts that expect a true or
# correct response, while higher temperatures can lead to more
# diverse or unexpected results. With a temperature of 0 the highest
# probability token is always selected. for most use cases, try
# starting with a temperature of 0.2.
temperature = 0.2

# top-p changes how the model selects tokens for output. Tokens are
# selected from most probable to least until the sum of their
# probabilities equals the top-p value. For example, if tokens A, B, and C
# have a probability of .3, .2, and .1 and the top-p value is .5, then the
# model will select either A or B as the next token (using temperature).
# the default top-p value is .8.
top_p = 0.8

# top-k changes how the model selects tokens for output.
# a top-k of 1 means the selected token is the most probable among
# all tokens in the model’s vocabulary (also called greedy decoding),
# while a top-k of 3 means that the next token is selected from among
# the 3 most probable tokens (using temperature).
top_k = 40

# how verbose the LLM and the LangChain agent are while working
# through a prompt. set this to True during development so you
# can debug the agent's reasoning
verbose = True
# *********************** [END] LLM parameter config ************************ #


# ********************** [START] Configuration Checks *********************** #
if not project:
    raise RuntimeError('GCP_PROJECT environment variable not set')
# *********************** [END] Configuration Checks ************************ #
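
The top-k and top-p comments in the config above describe how the candidate tokens are narrowed before one is sampled. As a rough illustration only (this is not how Vertex AI implements it), the sketch below applies both filters to the A/B/C probabilities from the top-p comment. The helper names `top_k_filter` and `top_p_filter` are hypothetical, and the probabilities do not sum to 1 because the remaining mass is assumed to belong to tokens that are not shown.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [token for token, _ in ranked[:k]]


def top_p_filter(probs, p, eps=1e-9):
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append(token)
        cumulative += prob
        if cumulative >= p - eps:  # small tolerance guards against float round-off
            break
    return kept


# Probabilities taken from the top-p comment in the config cell.
probs = {"A": 0.3, "B": 0.2, "C": 0.1}

greedy_candidates = top_k_filter(probs, 1)    # top-k of 1: greedy decoding
top3_candidates = top_k_filter(probs, 3)      # top-k of 3: three candidates
nucleus_candidates = top_p_filter(probs, 0.5) # top-p of .5: A and B survive
print(greedy_candidates, top3_candidates, nucleus_candidates)
```

The model would then pick one of the surviving candidates using temperature; that sampling step is left out of the sketch.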

## Import and Initialize Vertex AI Client

Importing the SDK prints TensorFlow warnings about missing CUDA drivers and the GPU not being used. You can safely ignore them. If you want to use the GPU, that's possible on Linux with Docker, but on macOS you'll need to set up a non-containerized development environment.

In [2]:
from google.cloud import aiplatform
import vertexai

vertexai.init(project=project, location=location)

print(f"Vertex AI SDK version: {aiplatform.__version__}")


2023-12-13 06:32:08.376992: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-13 06:32:08.378811: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-13 06:32:08.400249: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-13 06:32:08.400287: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-13 06:32:08.400307: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to regi

Vertex AI SDK version: 1.36.0


## Import LangChain

This doesn't initialize anything; it just lets us print the installed version.

In [3]:
import langchain

print(f"LangChain version: {langchain.__version__}")


LangChain version: 0.0.330


## Configure LLM with Vertex AI

In [4]:
from langchain.llms import VertexAI

llm = VertexAI(
    model_name=model_name,
    max_output_tokens=max_output_tokens,
    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    verbose=verbose,
)

## Initialize Embeddings Function with Vertex AI

Other embedding providers would work here too; this notebook sticks with Google products.

In [5]:
from langchain.embeddings import VertexAIEmbeddings

# https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.vertexai.VertexAIEmbeddings.html
embeddings = VertexAIEmbeddings(
    requests_per_minute=embeddings_requests_per_minute,
    num_instances_per_batch=embeddings_num_instances_per_batch,
    model_name="textembedding-gecko@latest",
)

## Load Wikipedia Docstore Explorer and Tool

In [6]:
from langchain.docstore.wikipedia import Wikipedia
from langchain.agents import initialize_agent, Tool, AgentExecutor
from langchain.agents.react.base import DocstoreExplorer


# load the Wikipedia docstore with the ReAct docstore explorer.
# this provides two functions as the interface: ".search" and ".lookup"
# - https://python.langchain.com/docs/integrations/document_loaders/wikipedia
# - https://python.langchain.com/docs/modules/agents/agent_types/react_docstore
docstore = DocstoreExplorer(Wikipedia())

wikipedia_search_tool = Tool(
    name="Search",
    func=docstore.search,
    description="Search for a term in Wikipedia. Always use this before a lookup.",
)

wikipedia_lookup_tool = Tool(
    name="Lookup",
    func=docstore.lookup,
    description="Lookup a term in Wikipedia.",
)
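
The Search tool description says to always search before a lookup, because a lookup operates on the page found by the most recent search. The sketch below mimics that contract with a small in-memory stand-in so it runs without network access. `TinyDocstoreExplorer` and its `pages` dictionary are hypothetical, the sample page text is made up for illustration, and the return strings only approximate what the real `DocstoreExplorer` produces.

```python
class TinyDocstoreExplorer:
    """Hypothetical in-memory stand-in for the search-then-lookup contract."""

    def __init__(self, pages):
        self.pages = pages    # maps page title -> full page text
        self.current = None   # text of the most recently found page

    def search(self, term):
        """Find a page by title, remember it, and return its first paragraph."""
        text = self.pages.get(term)
        if text is None:
            return f"Could not find [{term}]."
        self.current = text
        return text.split("\n\n")[0]

    def lookup(self, term):
        """Return the first paragraph of the remembered page that mentions term."""
        if self.current is None:
            raise ValueError("Cannot lookup without a successful search first")
        paragraphs = self.current.split("\n\n")
        matches = [p for p in paragraphs if term.lower() in p.lower()]
        if not matches:
            return "No Results"
        return f"(Result 1/{len(matches)}) {matches[0]}"


# Made-up page content, used only to exercise the two methods.
pages = {"Gemini": "Gemini is a family of models.\n\nGemini Nano runs on device."}
explorer = TinyDocstoreExplorer(pages)

summary = explorer.search("Gemini")
detail = explorer.lookup("nano")
print(summary)
print(detail)
```

Calling `lookup` on a fresh explorer raises an error, which is the failure the "always search first" instruction in the tool description is meant to steer the agent away from.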


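## Build Tool Chain

The order of the tools should not matter, since the agent picks a tool by name based on its reasoning about the prompt. As far as I can tell, the `REACT_DOCSTORE` agent type does expect exactly two tools, named `Search` and `Lookup`.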

In [7]:
tools = [
  wikipedia_search_tool,
  wikipedia_lookup_tool,
]


## Initialize Agent

In [8]:
from langchain.agents import AgentType

# initialize ReAct agent
react = initialize_agent(
  tools,
  llm,
  agent=AgentType.REACT_DOCSTORE,
  verbose=verbose,
)

# Build the executor that actually runs the agent. The limits are set
# here because an executor built from react.agent does not inherit
# options that were passed to initialize_agent.
agent_executor = AgentExecutor.from_agent_and_tools(
  agent=react.agent,
  tools=tools,
  verbose=verbose,
  # https://python.langchain.com/docs/modules/agents/how_to/max_time_limit
  max_execution_time=60,
  # By default, early stopping uses the "force" method, which just
  # returns a constant string. The "generate" method instead does one
  # final pass through the LLM to generate an output.
  early_stopping_method="generate",
)


## Ask the LLM Questions the Stock Model Wouldn't Know

In [9]:
question = "What is Google's Gemini product?"
agent_executor.run(question)




> Entering new AgentExecutor chain...
Thought: I need to search Google Gemini product and find what it is.
Action: Search[Google Gemini product]
Observation: Gemini is a family of multimodal large language models developed by Google DeepMind, serving as the successor to LaMDA and PaLM 2. Comprising Gemini Ultra, Gemini Pro, and Gemini Nano, it was announced on December 6, 2023, positioned as a contender to OpenAI's GPT-4.
Thought: Google's Gemini product is a family of multimodal large language models.
Action: Finish[a family of multimodal large language models]

> Finished chain.


'a family of multimodal large language models'
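
In the trace above, each step the model takes is written as `Action: Name[input]`, and `Finish[...]` carries the final answer. The sketch below shows one way such a line could be split into a tool name and its input. `parse_action` and `ACTION_PATTERN` are hypothetical names for this illustration and are not LangChain's own output parser.

```python
import re

# Matches lines such as "Action: Search[Google Gemini product]".
ACTION_PATTERN = re.compile(r"Action:\s*(\w+)\[(.*)\]\s*$")


def parse_action(line):
    """Split an 'Action: Name[input]' line into (name, input)."""
    match = ACTION_PATTERN.search(line.strip())
    if match is None:
        raise ValueError(f"Could not parse action from: {line!r}")
    return match.group(1), match.group(2)


search_step = parse_action("Action: Search[Google Gemini product]")
finish_step = parse_action(
    "Action: Finish[a family of multimodal large language models]"
)
print(search_step)
print(finish_step)
```

An agent loop would route a `Search` or `Lookup` action to the matching tool and stop when it sees `Finish`, returning the bracketed text as the answer.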