# ReAct Zero-Shot from Slack Chat History with LangChain

This example reads in a Slack history export `.zip` file, creates embeddings, and creates a question-answering agent that can search the history for context.

## ChromaDB Persistence

Each notebook that uses ChromaDB follows the same pattern for persistence.

If the directory that ChromaDB writes its data to already exists, the notebook loads the existing database. If the directory does not exist, it creates a new database.

If you change parameters that affect the embeddings generation (like swapping in a new `.zip` file), you'll need to delete the database directory to force a new database to be created.

To delete it, run the following from the root of the repository. For example, if the ChromaDB directory is `data/chromadb/slack_export`:

```sh
rm -rf data/chromadb/slack_export
```

Or, if you run into permission issues:

```sh
sudo rm -rf data/chromadb/slack_export
```
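
The load-or-create decision described above can also be sketched in plain Python. This is only an illustration: `should_load_documents` and `reset_vector_store` are hypothetical names, not helpers from this repository, and the sketch assumes the database directory's existence is the only signal used. It runs against a temporary directory so nothing real is deleted.

```python
import os
import shutil
import tempfile


def should_load_documents(persist_dir: str) -> bool:
    """Return True when no database directory exists yet, so documents must be embedded."""
    return not os.path.isdir(persist_dir)


def reset_vector_store(persist_dir: str) -> None:
    """Delete the persisted database so the next run rebuilds it from scratch."""
    shutil.rmtree(persist_dir, ignore_errors=True)


# Demonstration against a throwaway directory rather than the real data path.
scratch = tempfile.mkdtemp()
demo_dir = os.path.join(scratch, "chromadb", "slack_export")

first_run = should_load_documents(demo_dir)    # nothing persisted yet
os.makedirs(demo_dir)                          # stand-in for ChromaDB writing its files
second_run = should_load_documents(demo_dir)   # an existing database would be reused
reset_vector_store(demo_dir)                   # same effect as `rm -rf` on the directory
after_reset = should_load_documents(demo_dir)  # a rebuild is forced again

shutil.rmtree(scratch, ignore_errors=True)
print(first_run, second_run, after_reset)
```

Note that the existence check cannot detect a changed `.zip` file or changed chunking settings, which is why the directory has to be deleted by hand after such changes.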

## Setup

First, run a Slack export from Slack's web UI to get a `.zip` of your workspace's history. Move that `.zip` file to the `data/slack` directory.

Then configure the `slack_file_path` and `slack_workspace_url` variables to match your Slack export.

In [1]:
import os

# ****************** [START] Google Cloud project settings ****************** #
project =  os.getenv('GCP_PROJECT')
location = os.environ.get('GCP_REGION', 'us-central1')
# ******************* [END] Google Cloud project settings ******************* #


# *********************** [START] Embeddings config ************************* #
# set rate limiting options for Vertex AI embeddings
embeddings_requests_per_minute = 100
embeddings_num_instances_per_batch = 5
# *********************** [END] Embeddings config *************************** #


# ********************** [START] data directory config ********************** #
from helpers.files import get_data_dir
data_dir = get_data_dir()

chroma_db_dir = f'{data_dir}/chromadb'
chroma_db_slack_source_dir = f'{chroma_db_dir}/slack_export'

slack_dir = os.path.join(data_dir, 'slack')
# *********************** [END] data directory config *********************** #


# ********************** [START] LLM data config **************************** #
from helpers.files import file_exists

collection_name = 'slack-source'
load_documents = True
if file_exists(chroma_db_slack_source_dir):
    load_documents = False

chunk_size = 500
chunk_overlap = 200
# *********************** [END] LLM data config ***************************** #


# *********************** [START] RAG parameter config ********************** #
# experiment with:
# - mmr
# - similarity
db_search_type = "similarity"
db_search_kwargs = {"k": 20}
# *********************** [END] RAG parameter config ************************ #


# *********************** [START] Slack tool config ************************* #
slack_file_path = os.path.join(slack_dir, 'eng_slack_channels_on_call.zip')
slack_workspace_url = 'https://example-company.slack.com'
# *********************** [END] Slack tool config *************************** #


# *********************** [START] LLM parameter config ********************** #
# Vertex AI model to use for the LLM
model_name='text-bison@002'

# maximum number of model responses generated per prompt
candidate_count = 1

# determines the maximum amount of text output from one prompt.
# a token is approximately four characters.
max_output_tokens = 2048

# temperature controls the degree of randomness in token selection.
# lower temperatures are good for prompts that expect a true or
# correct response, while higher temperatures can lead to more
# diverse or unexpected results. With a temperature of 0 the highest
# probability token is always selected. for most use cases, try
# starting with a temperature of 0.2.
temperature = 0.2

# top-p changes how the model selects tokens for output. Tokens are
# selected from most probable to least until the sum of their
# probabilities equals the top-p value. For example, if tokens A, B, and C
# have a probability of .3, .2, and .1 and the top-p value is .5, then the
# model will select either A or B as the next token (using temperature).
# the default top-p value is .8.
top_p = 0.8

# top-k changes how the model selects tokens for output.
# a top-k of 1 means the selected token is the most probable among
# all tokens in the model’s vocabulary (also called greedy decoding),
# while a top-k of 3 means that the next token is selected from among
# the 3 most probable tokens (using temperature).
top_k = 40

# how verbose the llm and langchain agent is when thinking
# through a prompt. you're going to want this set to True
# for development so you can debug its thought process
verbose = True
# *********************** [END] LLM parameter config ************************ #


# ********************** [START] Configuration Checks *********************** #
if not project:
    raise Exception('GCP_PROJECT environment variable not set')
# *********************** [END] Configuration Checks ************************ #
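
The `top_k` and `top_p` comments in the configuration cell describe how the candidate token set is narrowed before sampling. The sketch below reproduces that filtering step on the A/B/C example from the comments. It is a simplified illustration, not the Vertex AI implementation; `top_k_filter` and `top_p_filter` are hypothetical names, and the small tolerance is an assumption added to avoid floating-point surprises.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return [token for token, _ in ranked[:k]]


def top_p_filter(probs, top_p, tolerance=1e-9):
    """Keep the most probable tokens until their cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, probability in ranked:
        kept.append(token)
        cumulative += probability
        if cumulative >= top_p - tolerance:
            break
    return kept


# Probabilities from the comment above; the remaining mass belongs to other tokens.
probs = {"A": 0.3, "B": 0.2, "C": 0.1}

print(top_p_filter(probs, 0.5))  # ['A', 'B']
print(top_k_filter(probs, 1))    # ['A'] (greedy decoding)
```

Temperature then controls how the model samples among the tokens that survive these filters.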

## Import and Initialize Vertex AI Client

This will complain about missing CUDA drivers and the GPU not being used. You can safely ignore that. Using the GPU is possible on Linux with Docker, but on macOS you'll need to set up a non-containerized development environment to use GPUs.

In [2]:
from google.cloud import aiplatform
import vertexai

vertexai.init(project=project, location=location)

print(f"Vertex AI SDK version: {aiplatform.__version__}")


2023-12-17 01:36:06.203543: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-17 01:36:06.205188: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-17 01:36:06.223318: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-17 01:36:06.223352: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-17 01:36:06.223367: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to regi

Vertex AI SDK version: 1.38.1


## Import LangChain

This doesn't initialize anything; it just lets us print the version.

In [3]:
import langchain

print(f"LangChain version: {langchain.__version__}")


LangChain version: 0.0.350


## Configure LLM with Vertex AI

In [4]:
from langchain.llms import VertexAI

llm = VertexAI(
    model_name=model_name,
    max_output_tokens=max_output_tokens,
    temperature=temperature,
    top_p=top_p,
    top_k=top_k,
    verbose=verbose,
)


## Initialize Embeddings Function with Vertex AI

There are other options for creating embeddings. I was interested in sticking with Google products here.

In [5]:
from langchain.embeddings import VertexAIEmbeddings

# https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.vertexai.VertexAIEmbeddings.html
embeddings = VertexAIEmbeddings(
    requests_per_minute=embeddings_requests_per_minute,
    num_instances_per_batch=embeddings_num_instances_per_batch,
    model_name="textembedding-gecko@latest",
)
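
As a rough planning aid, the two embeddings settings from the configuration cell can be turned into a lower bound on embedding time. This sketch assumes the settings mean what their names suggest (at most `embeddings_requests_per_minute` requests per minute, each carrying up to `embeddings_num_instances_per_batch` texts) and that the rate limit is the only bottleneck. `batches` and `estimate_minutes` are hypothetical helpers, not part of LangChain or Vertex AI.

```python
import math


def batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def estimate_minutes(num_texts, requests_per_minute, instances_per_batch):
    """Lower bound on embedding time if the rate limit is the only constraint."""
    requests = math.ceil(num_texts / instances_per_batch)
    return requests / requests_per_minute


# Values from the configuration cell: 100 requests per minute, 5 texts per request.
print([len(batch) for batch in batches(list(range(12)), 5)])  # [5, 5, 2]
print(estimate_minutes(1000, 100, 5))                         # 2.0
```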

## Load Slack History into Local Vector Store (ChromaDB)

In [6]:
from langchain.document_loaders import SlackDirectoryLoader

if load_documents:
  loader = SlackDirectoryLoader(slack_file_path, slack_workspace_url)

## Load and Chunk the Slack history into smaller pieces

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs(documents, chunk_size=1000, chunk_overlap=100):
  text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
  docs = text_splitter.split_documents(documents)
  return docs

if load_documents:
  documents = loader.load()

  transformed_docs = split_docs(
    documents,
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
  )

  print(f'Document count: {len(documents)}')
  print(f'Transformed document count: {len(transformed_docs)}')
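
To see how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and newlines); `sliding_window_chunks` is a hypothetical helper that only shows the overlap arithmetic.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With the notebook's settings (`chunk_size = 500`, `chunk_overlap = 200`), each window in this simplified model would advance by 300 characters, so neighboring chunks repeat a fair amount of text. That costs extra embeddings but makes it less likely that a message is cut off from its surrounding context.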

## Create Embeddings Database

This is written with persistence and will not re-create the database if it already exists.

In [8]:
from langchain.vectorstores import Chroma

if load_documents:
  # https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.chroma.Chroma.html#langchain.vectorstores.chroma.Chroma.from_documents
  db = Chroma.from_documents(
    transformed_docs,
    embeddings,
    collection_name=collection_name,
    persist_directory=chroma_db_slack_source_dir,
  )
else:
  db = Chroma(
    persist_directory=chroma_db_slack_source_dir,
    embedding_function=embeddings,
    collection_name=collection_name,
  )


## Persist the Embeddings Database

In [9]:
# I think this would be safe to run in all circumstances but
# it feels weird to try writing if there are no changes anyway
if load_documents:
  db.persist()


## Ask the Database Some Things Directly

I removed the results from this cell because I don't want to expose real Slack history.

- https://python.langchain.com/docs/modules/data_connection/vectorstores/

In [25]:
def print_db_docs(search_type, docs):
    """
    Output looks like this:
    ---
    Matching documents (similarity): 4
    page_content="slack message content here" metadata={'channel': 'channel-name', 'source': 'https://example-company.slack.com/archives//p1701...', 'timestamp': '1701310050.508689', 'user': 'USER_ID'}
    ...
    """
    print('---')
    print(f"Matching documents ({search_type}): {len(docs)}")

    # print out the first 5 results
    for doc in docs[:5]:
        print(doc)

# this will print out the raw slack message content and metadata about its source
query = "What should I do if an anomaly monitor resolves?"
docs = db.similarity_search(query)
print_db_docs("similarity", docs)

docs = db.max_marginal_relevance_search(query)
print_db_docs("max marginal relevance", docs)
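
The two searches above can return different documents for the same query. Plain similarity search ranks purely by closeness to the query, while maximal marginal relevance (MMR) also penalizes candidates that resemble results already selected. The sketch below shows the idea on toy two-dimensional vectors. It is an illustration of the general technique, not the LangChain or ChromaDB implementation, and the function names and the `lambda_mult=0.5` weighting are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_top_k(query, docs, k):
    """Indices of the k documents most similar to the query."""
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]


def mmr_top_k(query, docs, k, lambda_mult=0.5):
    """Greedy MMR: balance relevance to the query against redundancy with earlier picks."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [
    [1.0, 0.9],  # 0: very close to the query
    [1.0, 0.8],  # 1: near-duplicate of document 0
    [0.0, 1.0],  # 2: less similar, but covers a different direction
]

print(similarity_top_k(query, docs, 2))  # [0, 1]
print(mmr_top_k(query, docs, 2))         # [0, 2]
```

In Slack history, where the same answer is often repeated across threads, this kind of diversity can keep near-identical messages from crowding out other context.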


## Create Retrievers

One will be used to ask directly, and one will be used with a LangChain ReAct agent.

The one we ask directly will be able to support returning source documents.

In [11]:
from langchain.chains import RetrievalQA

def create_retrieval_qa_chain(return_source_documents=True):
    retrieval_qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(
            search_type=db_search_type,
            search_kwargs=db_search_kwargs,
        ),
        # returning source documents is not supported by the zero-shot
        # ReAct agent, but can be enabled when querying the retrieval
        # qa chain directly
        return_source_documents=return_source_documents,
    )

    return retrieval_qa

retrieval_qa = create_retrieval_qa_chain()
retrieval_qa_react = create_retrieval_qa_chain(return_source_documents=False)
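
The `"stuff"` chain type places every retrieved document into a single prompt. The sketch below shows the general shape of that step. The template wording here is made up for illustration and is not LangChain's actual prompt, and `build_stuff_prompt` is a hypothetical helper.

```python
def build_stuff_prompt(question, documents):
    """Concatenate all retrieved documents into one context block ahead of the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


retrieved = [
    "first retrieved slack message",
    "second retrieved slack message",
]
prompt = build_stuff_prompt("What should I do if an anomaly monitor resolves?", retrieved)
print(prompt)
```

Because everything is sent at once, the prompt grows with `k`. With `k = 20` and chunks of up to roughly 500 characters, the context alone can approach 10,000 characters, so it is worth checking that against the model's input limit when tuning `db_search_kwargs`.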

## Ask the Retrieval QA Chain Some Questions

In [12]:
def print_retrieval_qa_results(result):
    print('---')
    print(f"Query: {result['query']}")
    print(f"Result: {result['result']}")


query = "What should I do if an anomaly monitor resolves?"
result = retrieval_qa({'query': query})
print_retrieval_qa_results(result)


---
Query: What should I do if an anomaly monitor resolves?
Result:  If an anomaly monitor resolves, you should first check the details of the alert in Datadog to see what triggered it and whether or not it's truly resolved or just reached a new normal where it no longer considered it an anomaly. You can also try to tune the monitor to alert when you want it, and see if you can stretch it out so you're not alerted on routine maintenance at the same time. If you're still getting false alarms, you can modify the notification settings to try to stop them.


## Configure Retrieval Augmented Generation Prompt Template

In [13]:

from langchain.prompts import PromptTemplate

# note: create_prompt below builds the prompt with a plain f-string
# and does not use PromptTemplate
def create_prompt(question):
    """Wrap the question in an instruction that points the agent at the Slack history tool."""
    prompt_rag_statement = f"""
Refer to TeamSnap Slack Message History tool for the following question regarding TeamSnap services.

Question:
{question}

Response:
"""

    return prompt_rag_statement

## Build tool chain

This will provide knowledge to the ReAct agent from the Slack history embeddings.

In [14]:
from langchain.agents import initialize_agent, Tool, AgentExecutor

tools = [
  Tool(
    name="Read TeamSnap Slack Message History",
    func=retrieval_qa_react.run,
    description="Useful for looking up context related to TeamSnap and TeamSnap systems.",
  ),
]


## Initialize Agent

In [15]:
from langchain.agents import AgentType

# initialize ReAct agent
react = initialize_agent(
  tools,
  llm,
  agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
  verbose=True,
  # https://python.langchain.com/docs/modules/agents/how_to/max_time_limit
  max_execution_time=60,
  # By default, the early stopping uses the force method which
  # just returns that constant string. Alternatively, you could
  # specify the generate method which then does one FINAL pass
  # through the LLM to generate an output.
  early_stopping_method="generate",
)

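# initialize_agent already returns an AgentExecutor. A second executor
# built from react.agent does not inherit the limits set above, so
# they are passed again here.
agent_executor = AgentExecutor.from_agent_and_tools(
  agent=react.agent,
  tools=tools,
  verbose=True,
  max_execution_time=60,
  early_stopping_method="generate",
)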


## Ask something that requires context from Slack history

In [23]:
query = "What should I do if an anomaly monitor resolves?"

question = create_prompt(query)
agent_executor.run(question)



> Entering new AgentExecutor chain...
I should check TeamSnap Slack Message History to see if there are any relevant messages about what to do if an anomaly monitor resolves.
Action: Read TeamSnap Slack Message History
Action Input: anomaly monitor resolves
Observation: If you go to datadog and click on "Monitors" or "manage monitors" in the drop down, it'll take you to a big list of all our monitors.
if you copy and paste from the pagerduty alert so you're sure you're looking at the right one (there's tons of rabbit monitors)
when you get to the specific monitor's page, you can see the details of what triggered it and whether or not it's truly resolved or just reached a new normal where it no longer considered it an anomaly
Thought: I now know the final answer.
Final Answer: Go to Datadog, click on "Monitors" or "manage monitors" in the drop down, and find the specific monitor that triggered the alert. Check the details of wha

'Go to Datadog, click on "Monitors" or "manage monitors" in the drop down, and find the specific monitor that triggered the alert. Check the details of what triggered it and whether or not it\'s truly resolved or just reached a new normal where it no longer considered it an anomaly.'
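
The trace above follows the ReAct text format: the model writes a thought, then either an `Action` / `Action Input` pair naming a tool call, or a `Final Answer`. A minimal sketch of parsing one such step is shown below. It is an illustration only, not LangChain's output parser; `parse_react_step` and its return shape are assumptions made for the example.

```python
import re

# Matches "Action: <tool name>" followed on the next line by "Action Input: <input>".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)",
    re.DOTALL,
)


def parse_react_step(text):
    """Classify one block of model output as a tool call or a final answer."""
    if "Final Answer:" in text:
        answer = text.split("Final Answer:", 1)[1].strip()
        return {"type": "final", "output": answer}

    match = ACTION_PATTERN.search(text)
    if match is None:
        raise ValueError("could not find an Action / Action Input pair")
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip(),
    }


step = (
    "I should check TeamSnap Slack Message History to see if there are any relevant messages.\n"
    "Action: Read TeamSnap Slack Message History\n"
    "Action Input: anomaly monitor resolves"
)
print(parse_react_step(step))
# {'type': 'action', 'tool': 'Read TeamSnap Slack Message History', 'tool_input': 'anomaly monitor resolves'}

final = "I now know the final answer.\nFinal Answer: Check the monitor's page in Datadog."
print(parse_react_step(final))
# {'type': 'final', 'output': "Check the monitor's page in Datadog."}
```

The parsed tool name has to match a `Tool.name` exactly, which is why the tool's `name` and `description` act as part of the prompt and are worth wording carefully.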