# Build a Knowledge-Based System with Vertex AI Vector Search, LangChain, and Gemini

## Objectives

1. Generate embeddings for a dataset
2. Upload the embeddings to Cloud Storage
3. Create an index in Vertex AI Vector Search
4. Use similarity metrics to evaluate and retrieve the most relevant knowledge base results
5. Use LangChain to query Vertex AI Vector Search and provide context to prompts submitted to Gemini

In [1]:
%%capture --no-stderr
!pip3 install -q --upgrade pip
!pip3 install -q google-cloud-aiplatform
!pip3 install -q langchain
!pip3 install -q langchain-community
!pip3 install -q lxml
!pip3 install -q requests
!pip3 install -q beautifulsoup4
!pip3 install -q unstructured
!pip3 install -q langchain-google-genai
!pip3 install -q google-generativeai
!pip3 install -q tqdm

In [2]:
# Restart the kernel so the newly installed packages are picked up
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)


{'status': 'ok', 'restart': True}

# Initial Setup

In [1]:
from IPython.display import display
from IPython.display import Markdown
import textwrap

def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))


In [2]:
# Retrieve the Gemini API key from the GCP project and configure the genai client
import os
import google.generativeai as genai

key_name = !gcloud services api-keys list --filter="gemini-api-key" --format="value(name)"
key_name = key_name[0]

api_key = !gcloud services api-keys get-key-string $key_name --location="us-central1" --format="value(keyString)"
api_key = api_key[0]

os.environ["GOOGLE_API_KEY"] = api_key

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])


In [3]:
# Define project information
import subprocess

PROJECT_ID = subprocess.check_output(["gcloud", "config", "get-value", "project"], text=True).strip()
REGION = "us-central1"  # @param {type:"string"}

print(f"Your project ID is: {PROJECT_ID}")

Your project ID is: qwiklabs-gcp-02-2cb5b7451de0


In [4]:
# Set configuration constants used throughout the notebook
BUCKET = f"gs://{PROJECT_ID}/embeddings"
DIMENSIONS = 768
DISPLAY_NAME = 'vertex_docs_qa'
ENDPOINT = f"{REGION}-aiplatform.googleapis.com"
TEXT_GENERATION_MODEL = 'gemini-pro'
SITEMAP = 'https://docs.anthropic.com/sitemap.xml'


In [5]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# Create Documents from the Anthropic Documentation Site

## Load and parse sitemap.xml

In [6]:
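# Fetch the sitemap XML and extract every page URL listed in its <loc> elements
import requests
from bs4 import BeautifulSoup

def parse_sitemap(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.content, "xml")  # the "xml" parser requires lxml
    urls = [element.text for element in soup.find_all("loc")]
    return urls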

sites = parse_sitemap(SITEMAP)
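`parse_sitemap` treats every `<loc>` element as a page URL. That holds for a flat `<urlset>` sitemap, but some sites publish a sitemap index (`<sitemapindex>`) whose `<loc>` entries point at further sitemaps rather than pages. The sketch below, a hypothetical helper `split_sitemap_locs` built on the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, tells the two cases apart; it parses XML strings so it can be tried offline, and the example.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def split_sitemap_locs(xml_text):
    """Return (page_urls, child_sitemap_urls) from one sitemap document.

    Hypothetical helper: a <urlset> lists pages, while a <sitemapindex>
    lists further sitemaps that would each need to be fetched and parsed.
    """
    root = ET.fromstring(xml_text)
    locs = [(loc.text or "").strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)]
    locs = [loc for loc in locs if loc]  # skip empty <loc> elements
    if root.tag.endswith("sitemapindex"):
        return [], locs
    return locs, []

urlset = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/en/docs/intro</loc></url>
  <url><loc>https://example.com/fr/docs/intro</loc></url>
</urlset>"""
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-docs.xml</loc></sitemap>
</sitemapindex>"""

pages, children = split_sitemap_locs(urlset)
print(pages)     # ['https://example.com/en/docs/intro', 'https://example.com/fr/docs/intro']
print(children)  # []
print(split_sitemap_locs(index_xml))  # ([], ['https://example.com/sitemap-docs.xml'])
```

In the notebook it could be fed the raw response body, for example `split_sitemap_locs(requests.get(SITEMAP, timeout=30).content)`, with any child sitemaps fetched in turn.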

In [7]:
# Keep only the English documentation pages (URLs containing '/en/docs')
sites_filtered = [url for url in sites if '/en/docs' in url]
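The substring test above keeps any URL that contains `/en/docs` anywhere, including inside a query string. Assuming the English documentation lives under the `/en/docs` path, a slightly stricter filter can compare the URL path prefix with `urllib.parse.urlparse`. The helper `filter_by_path_prefix` is hypothetical and the example.com URLs are placeholders.

```python
from urllib.parse import urlparse

def filter_by_path_prefix(urls, prefix="/en/docs"):
    """Keep URLs whose path is exactly `prefix` or starts with `prefix/`."""
    base = prefix.rstrip("/")
    kept = []
    for url in urls:
        path = urlparse(url).path  # ignores scheme, host, query string and fragment
        if path == base or path.startswith(base + "/"):
            kept.append(url)
    return kept

urls = [
    "https://example.com/en/docs/welcome",
    "https://example.com/en/api/messages",
    "https://example.com/en/release-notes?from=/en/docs",  # matches the substring test only
]
print(filter_by_path_prefix(urls))  # ['https://example.com/en/docs/welcome']
```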

In [8]:
len(sites_filtered)

37

## Load documentation pages using the LangChain UnstructuredURLLoader

In [9]:
# This step takes a few minutes to complete;
# download messages appear below the cell while it runs
from langchain_community.document_loaders import UnstructuredURLLoader

loader = UnstructuredURLLoader(urls=sites_filtered)
documents = loader.load()

In [10]:
to_markdown(documents[1].page_content + "\n\nSource: " + documents[1].metadata["source"])

> Anthropic home page
> 
> Use cases
> 
> Content moderation
> 
> WelcomeUser GuidesAPI ReferencePrompt LibraryRelease NotesDeveloper Newsletter
> 
> Developer Console
> 
> Developer Discord
> 
> Support
> 
> Get started
> 
> Overview
> 
> Initial setup
> 
> Intro to Claude
> 
> Learn about Claude
> 
> Use cases
> 
> Overview
> 
> Ticket routing
> 
> Customer support agent
> 
> Content moderation
> 
> Legal summarization
> 
> Models
> 
> Security and compliance
> 
> Build with Claude
> 
> Define success criteria
> 
> Develop test cases
> 
> Prompt engineering
> 
> Text generation
> 
> Embeddings
> 
> Google Sheets add-on
> 
> Vision
> 
> Tool use (function calling)
> 
> Prompt Caching (beta)
> 
> Message Batches (beta)
> 
> Test and evaluate
> 
> Strengthen guardrails
> 
> Using the Evaluation Tool
> 
> Resources
> 
> Glossary
> 
> Model Deprecations
> 
> System status
> 
> Claude 3 model card
> 
> Anthropic Cookbook
> 
> Anthropic Courses
> 
> Legal center
> 
> Anthropic Privacy Policy
> 
> Use cases
> 
> Content moderation
> 
> Content moderation is a critical aspect of maintaining a safe, respectful, and productive environment in digital applications. In this guide, we’ll discuss how Claude can be used to moderate content within your digital application.
> 
> Visit our content moderation cookbook to see an example content moderation implementation using Claude.
> 
> This guide is focused on moderating user-generated content within your application. If you’re looking for guidance on moderating interactions with Claude, please refer to our guardrails guide.
> 
> Before building with Claude
> 
> Decide whether to use Claude for content moderation
> 
> Here are some key indicators that you should use an LLM like Claude instead of a traditional ML or rules-based approach for content moderation:
> 
> Traditional ML methods require significant engineering resources, ML expertise, and infrastructure costs. Human moderation systems incur even higher costs. With Claude, you can have a sophisticated moderation system up and running in a fraction of the time for a fraction of the price.
> 
> Traditional ML approaches, such as bag-of-words models or simple pattern matching, often struggle to understand the tone, intent, and context of the content. While human moderation systems excel at understanding semantic meaning, they require time for content to be reviewed. Claude bridges the gap by combining semantic understanding with the ability to deliver moderation decisions quickly.
> 
> By leveraging its advanced reasoning capabilities, Claude can interpret and apply complex moderation guidelines uniformly. This consistency helps ensure fair treatment of all content, reducing the risk of inconsistent or biased moderation decisions that can undermine user trust.
> 
> Once a traditional ML approach has been established, changing it is a laborious and data-intensive undertaking. On the other hand, as your product or customer needs evolve, Claude can easily adapt to changes or additions to moderation policies without extensive relabeling of training data.
> 
> If you wish to provide users or regulators with clear explanations behind moderation decisions, Claude can generate detailed and coherent justifications. This transparency is important for building trust and ensuring accountability in content moderation practices.
> 
> Traditional ML approaches typically require separate models or extensive translation processes for each supported language. Human moderation requires hiring a workforce fluent in each supported language. Claude’s multilingual capabilities allow it to classify tickets in various languages without the need for separate models or extensive translation processes, streamlining moderation for global customer bases.
> 
> Claude’s multimodal capabilities allow it to analyze and interpret content across both text and images. This makes it a versatile tool for comprehensive content moderation in environments where different media types need to be evaluated together.
> 
> Anthropic has trained all Claude models to be honest, helpful and harmless. This may result in Claude moderating content deemed particularly dangerous (in line with our Acceptable Use Policy), regardless of the prompt used. For example, an adult website that wants to allow users to post explicit sexual content may find that Claude still flags explicit content as requiring moderation, even if they specify in their prompt not to moderate explicit sexual content. We recommend reviewing our AUP in advance of building a moderation solution.
> 
> Generate examples of content to moderate
> 
> Before developing a content moderation solution, first create examples of content that should be flagged and content that should not be flagged. Ensure that you include edge cases and challenging scenarios that may be difficult for a content moderation system to handle effectively. Afterwards, review your examples to create a well-defined list of moderation categories. For instance, the examples generated by a social media platform might include the following:
> 
> allowed_user_comments = [
>     'This movie was great, I really enjoyed it. The main actor really killed it!',
>     'I hate Mondays.',
>     'It is a great time to invest in gold!'
> ]
> 
> disallowed_user_comments = [
>     'Delete this post now or you better hide. I am coming after you and your family.',
>     'Stay away from the 5G cellphones!! They are using 5G to control you.',
>     'Congratulations! You have won a $1,000 gift card. Click here to claim your prize!'
> ]
> 
> # Sample user comments to test the content moderation
> user_comments = allowed_user_comments + disallowed_user_comments
> 
> # List of categories considered unsafe for content moderation
> unsafe_categories = [
>     'Child Exploitation',
>     'Conspiracy Theories',
>     'Hate',
>     'Indiscriminate Weapons', 
>     'Intellectual Property',
>     'Non-Violent Crimes', 
>     'Privacy',
>     'Self-Harm',
>     'Sex Crimes',
>     'Sexual Content',
>     'Specialized Advice',
>     'Violent Crimes'
> ]
> 
> Effectively moderating these examples requires a nuanced understanding of language. In the comment, This movie was great, I really enjoyed it. The main actor really killed it!, the content moderation system needs to recognize that “killed it” is a metaphor, not an indication of actual violence. Conversely, despite the lack of explicit mentions of violence, the comment Delete this post now or you better hide. I am coming after you and your family. should be flagged by the content moderation system.
> 
> The unsafe_categories list can be customized to fit your specific needs. For example, if you wish to prevent minors from creating content on your website, you could append “Underage Posting” to the list.
> 
> How to moderate content using Claude
> 
> Select the right Claude model
> 
> When selecting a model, it’s important to consider the size of your data. If costs are a concern, a smaller model like Claude 3 Haiku is an excellent choice due to its cost-effectiveness. Below is an estimate of the cost to moderate text for a social media platform that receives one billion posts per month:
> 
> Content size
> 
> Posts per month: 1bn
> 
> Characters per post: 100
> 
> Total characters: 100bn
> 
> Estimated tokens
> 
> Input tokens: 28.6bn (assuming 1 token per 3.5 characters)
> 
> Percentage of messages flagged: 3%
> 
> Output tokens per flagged message: 50
> 
> Total output tokens: 1.5bn
> 
> Claude 3 Haiku estimated cost
> 
> Input token cost: 2,860 MTok * $0.25/MTok = $715
> 
> Output token cost: 1,500 MTok * $1.25/MTok = $1,875
> 
> Monthly cost: $715 + $1,875 = $2,590
> 
> Claude 3.5 Sonnet estimated cost
> 
> Input token cost: 2,860 MTok * $3.00/MTok = $8,580
> 
> Output token cost: 1,500 MTok * $15.00/MTok = $22,500
> 
> Monthly cost: $8,580 + $22,500 = $31,080
> 
> Actual costs may differ from these estimates. These estimates are based on the prompt highlighted in the section on batch processing. Output tokens can be reduced even further by removing the explanation field from the response.
> 
> Build a strong prompt
> 
> In order to use Claude for content moderation, Claude must understand the moderation requirements of your application. Let’s start by writing a prompt that allows you to define your moderation needs:
> 
> import anthropic
> import json
> 
> # Initialize the Anthropic client
> client = anthropic.Anthropic()
> 
> def moderate_message(message, unsafe_categories):
>     # Convert the list of unsafe categories into a string, with each category on a new line
>     unsafe_category_str = '\n'.join(unsafe_categories)
>     
>     # Construct the prompt for Claude, including the message and unsafe categories
>     assessment_prompt = f"""
>     Determine whether the following message warrants moderation, 
>     based on the unsafe categories outlined below.
>     
>     Message:
>     <message>{message}</message>
> 
>     Unsafe Categories:
>     <categories>
>     {unsafe_category_str}
>     </categories>
> 
>     Respond with ONLY a JSON object, using the format below:
>     {{
>     "violation": <Boolean field denoting whether the message should be moderated>,
>     "categories": [Comma-separated list of violated categories],
>     "explanation": [Optional. Only include if there is a violation.]
>     }}"""
> 
>     # Send the request to Claude for content moderation
>     response = client.messages.create(
>         model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
>         max_tokens=200,
>         temperature=0,   # Use 0 temperature for increased consistency
>         messages=[
>             {"role": "user", "content": assessment_prompt}
>         ]
>     )
>     
>     # Parse the JSON response from Claude
>     assessment = json.loads(response.content[0].text)
>     
>     # Extract the violation status from the assessment
>     contains_violation = assessment['violation']
>     
>     # If there's a violation, get the categories and explanation; otherwise, use empty defaults
>     violated_categories = assessment.get('categories', []) if contains_violation else []
>     explanation = assessment.get('explanation') if contains_violation else None
>     
>     return contains_violation, violated_categories, explanation
> 
> # Process each comment and print the results
> for comment in user_comments:
>     print(f"\nComment: {comment}")
>     violation, violated_categories, explanation = moderate_message(comment, unsafe_categories)
>     
>     if violation:
>         print(f"Violated Categories: {', '.join(violated_categories)}")
>         print(f"Explanation: {explanation}")
>     else:
>         print("No issues detected.")
> 
> In this example, the moderate_message function contains an assessment prompt that includes the unsafe content categories and the message we wish to evaluate. The prompt asks Claude to assess whether the message should be moderated, based on the unsafe categories we defined.
> 
> The model’s assessment is then parsed to determine if there is a violation. If there is a violation, Claude also returns a list of violated categories, as well as an explanation as to why the message is unsafe.
> 
> Evaluate your prompt
> 
> Content moderation is a classification problem. Thus, you can use the same techniques outlined in our classification cookbook to determine the accuracy of your content moderation system.
> 
> One additional consideration is that instead of treating content moderation as a binary classification problem, you may instead create multiple categories to represent various risk levels. Creating multiple risk levels allows you to adjust the aggressiveness of your moderation. For example, you might want to automatically block user queries that are deemed high risk, while users with many medium risk queries are flagged for human review.
> 
> import anthropic
> import json
> 
> # Initialize the Anthropic client
> client = anthropic.Anthropic()
> 
> def assess_risk_level(message, unsafe_categories):
>     # Convert the list of unsafe categories into a string, with each category on a new line
>     unsafe_category_str = '\n'.join(unsafe_categories)
>     
>     # Construct the prompt for Claude, including the message, unsafe categories, and risk level definitions
>     assessment_prompt = f"""
>     Assess the risk level of the following message warrants moderation, 
>     based on the unsafe categories listed below.
>     
> Message:
> <message>{message}</message>
> 
> Unsafe Categories:
> <categories>
> {unsafe_category_str}
> </categories>
> 
> Assign a risk level based on your confidence that the user's message should be moderated 
> based on the defined unsafe categories:
> 
> 0 - No risk
> 1 - Low risk
> 2 - Medium risk
> 3 - High risk
> 
> Respond with ONLY a JSON object, using the format below:
> {{
>   "risk_level": <Numerical field denoting the risk level>,
>   "categories": [Comma-separated list of violated categories],
>   "explanation": <Optional. Only include if risk level is greater than 0>
> }}"""
> 
>     # Send the request to Claude for risk assessment
>     response = client.messages.create(
>         model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
>         max_tokens=200,
>         temperature=0,   # Use 0 temperature for increased consistency
>         messages=[
>             {"role": "user", "content": assessment_prompt}
>         ]
>     )
>     
>     # Parse the JSON response from Claude
>     assessment = json.loads(response.content[0].text)
>     
>     # Extract the risk level, violated categories, and explanation from the assessment
>     risk_level = assessment["risk_level"]
>     violated_categories = assessment["categories"]
>     explanation = assessment.get("explanation")
>     
>     return risk_level, violated_categories, explanation
> 
> # Process each comment and print the results
> for comment in user_comments:
>     print(f"\nComment: {comment}")
>     risk_level, violated_categories, explanation = assess_risk_level(comment, unsafe_categories)
>     
>     print(f"Risk Level: {risk_level}")
>     if violated_categories:
>         print(f"Violated Categories: {', '.join(violated_categories)}")
>     if explanation:
>         print(f"Explanation: {explanation}")
> 
> This code implements an assess_risk_level function that uses Claude to evaluate the risk level of a message. The function accepts a message and a list of unsafe categories as inputs.
> 
> Within the function, a prompt is generated for Claude, including the message to be assessed, the unsafe categories, and specific instructions for evaluating the risk level. The prompt instructs Claude to respond with a JSON object that includes the risk level, the violated categories, and an optional explanation.
> 
> This approach enables flexible content moderation by assigning risk levels. It can be seamlessly integrated into a larger system to automate content filtering or flag comments for human review based on their assessed risk level. For instance, when executing this code, the comment Delete this post now or you better hide. I am coming after you and your family. is identified as high risk due to its dangerous threat. Conversely, the comment Stay away from the 5G cellphones!! They are using 5G to control you. is categorized as medium risk.
> 
> Deploy your prompt
> 
> Once you are confident in the quality of your solution, it’s time to deploy it to production. Here are some best practices to follow when using content moderation in production:
> 
> Provide clear feedback to users: When user input is blocked or a response is flagged due to content moderation, provide informative and constructive feedback to help users understand why their message was flagged and how they can rephrase it appropriately. In the coding examples above, this is done through the explanation tag in the Claude response.
> 
> Analyze moderated content: Keep track of the types of content being flagged by your moderation system to identify trends and potential areas for improvement.
> 
> Continuously evaluate and improve: Regularly assess the performance of your content moderation system using metrics such as precision and recall tracking. Use this data to iteratively refine your moderation prompts, keywords, and assessment criteria.
> 
> Improve performance
> 
> In complex scenarios, it may be helpful to consider additional strategies to improve performance beyond standard prompt engineering techniques. Here are some advanced strategies:
> 
> Define topics and provide examples
> 
> In addition to listing the unsafe categories in the prompt, further improvements can be made by providing definitions and phrases related to each category.
> 
> import anthropic
> import json
> 
> # Initialize the Anthropic client
> client = anthropic.Anthropic()
> 
> # Dictionary of categories considered unsafe for content moderation, with their definitions
> unsafe_category_definitions = {
>     'Child Exploitation': 'Content that depicts child nudity or that enables, encourages, excuses, or depicts the sexual abuse of children.',
>     'Conspiracy Theories': """Content that promotes or endorses unfounded, false, or misleading theories about events, situations, or individuals, 
>         which can incite harm or undermine public trust in institutions or facts.""",
>     'Hate': """Content that is hateful toward people on the basis of their protected characteristics 
>         (race, color, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease), 
>         or content that perpetuates negative stereotypes.""",
>     'Indiscriminate Weapons': 'Content that enables the creation of weapons of mass destruction.', 
>     'Intellectual Property': 'Content that violates the intellectual property rights of third parties.',
>     'Non-Violent Crimes': 'Content that enables, encourages, or excuses the commission of non-violent crimes.', 
>     'Privacy': 'Content that contains sensitive, personal information about private individuals.',
>     'Self-Harm': 'Content that enables, encourages, or excuses acts of intentional self-harm.',
>     'Sex Crimes': 'Content that enables, encourages, or excuses the commission of sex crimes.',
>     'Sexual Content': 'Content that contains erotic descriptions or explicit depictions of sex acts.',
>     'Specialized Advice': """Content that contains financial, medical, or legal advice. 
>         Financial advice includes guidance on investments, stocks, bonds, or any financial planning.""",
>     'Violent Crimes': 'Content that enables, encourages, or excuses the commission of violent crimes.', 
> }
> 
> 
> def moderate_message_with_definitions(message, unsafe_category_definitions):
>     # Format unsafe categories string, with each category and its definition on a new line
>     unsafe_category_str = '\n'.join(f'{category}: {definition}' 
>         for category, definition in unsafe_category_definitions.items()
>     )
>     
>     # Construct the prompt for Claude, including the message and unsafe categories
>     assessment_prompt = f"""Determine whether the following message warrants moderation, based on the unsafe categories outlined below.
> 
> Message:
> <message>{message}</message>
> 
> Unsafe Categories and Their Definitions:
> <categories>
> {unsafe_category_str}
> </categories>
> 
> It's important that you remember all unsafe categories and their definitions.
> 
> Respond with ONLY a JSON object, using the format below:
> {{
>   "violation": <Boolean field denoting whether the message should be moderated>,
>   "categories": [Comma-separated list of violated categories],
>   "explanation": [Optional. Only include if there is a violation.]
> }}"""
> 
>     # Send the request to Claude for content moderation
>     response = client.messages.create(
>         model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
>         max_tokens=200,
>         temperature=0,   # Use 0 temperature for increased consistency
>         messages=[
>             {"role": "user", "content": assessment_prompt}
>         ]
>     )
>     
>     # Parse the JSON response from Claude
>     assessment = json.loads(response.content[0].text)
>     
>     # Extract the violation status from the assessment
>     contains_violation = assessment['violation']
>     
>     # If there's a violation, get the categories and explanation; otherwise, use empty defaults
>     violated_categories = assessment.get('categories', []) if contains_violation else []
>     explanation = assessment.get('explanation') if contains_violation else None
>     
>     return contains_violation, violated_categories, explanation
> 
> 
> # Process each comment and print the results
> for comment in user_comments:
>     print(f"\nComment: {comment}")
>     violation, violated_categories, explanation = moderate_message_with_definitions(comment, unsafe_category_definitions)
>     
>     if violation:
>         print(f"Violated Categories: {', '.join(violated_categories)}")
>         print(f"Explanation: {explanation}")
>     else:
>         print("No issues detected.")
> 
> The moderate_message_with_definitions function expands upon the earlier moderate_message function by allowing each unsafe category to be paired with a detailed definition. This occurs in the code by replacing the unsafe_categories list from the original function with an unsafe_category_definitions dictionary. This dictionary maps each unsafe category to its corresponding definition. Both the category names and their definitions are included in the prompt.
> 
> Notably, the definition for the Specialized Advice category now specifies the types of financial advice that should be prohibited. As a result, the comment It's a great time to invest in gold!, which previously passed the moderate_message assessment, now triggers a violation.
> 
> Consider batch processing
> 
> To reduce costs in situations where real-time moderation isn’t necessary, consider moderating messages in batches. Include multiple messages within the prompt’s context, and ask Claude to assess which messages should be moderated.
> 
> import anthropic
> import json
> 
> # Initialize the Anthropic client
> client = anthropic.Anthropic()
> 
> def batch_moderate_messages(messages, unsafe_categories):
>     # Convert the list of unsafe categories into a string, with each category on a new line
>     unsafe_category_str = '\n'.join(unsafe_categories)
>     
>     # Format messages string, with each message wrapped in XML-like tags and given an ID
>     messages_str = '\n'.join([f'<message id={idx}>{msg}</message>' for idx, msg in enumerate(messages)])
>     
>     # Construct the prompt for Claude, including the messages and unsafe categories
>     assessment_prompt = f"""Determine the messages to moderate, based on the unsafe categories outlined below.
> 
> Messages:
> <messages>
> {messages_str}
> </messages>
> 
> Unsafe categories and their definitions:
> <categories>
> {unsafe_category_str}
> </categories>
> 
> Respond with ONLY a JSON object, using the format below:
> {{
>   "violations": [
>     {{
>       "id": <message id>,
>       "categories": [list of violated categories],
>       "explanation": <Explanation of why there's a violation>
>     }},
>     ...
>   ]
> }}
> 
> Important Notes:
> - Remember to analyze every message for a violation.
> - Select any number of violations that reasonably apply."""
> 
>     # Send the request to Claude for content moderation
>     response = client.messages.create(
>         model="claude-3-haiku-20240307",  # Using the Haiku model for lower costs
>         max_tokens=2048,  # Increased max token count to handle batches
>         temperature=0,    # Use 0 temperature for increased consistency
>         messages=[
>             {"role": "user", "content": assessment_prompt}
>         ]
>     )
>     
>     # Parse the JSON response from Claude
>     assessment = json.loads(response.content[0].text)
>     return assessment
> 
> 
> # Process the batch of comments and get the response
> response_obj = batch_moderate_messages(user_comments, unsafe_categories)
> 
> # Print the results for each detected violation
> for violation in response_obj['violations']:
>     print(f"""Comment: {user_comments[violation['id']]}
> Violated Categories: {', '.join(violation['categories'])}
> Explanation: {violation['explanation']}
> """)
> 
> In this example, the batch_moderate_messages function handles the moderation of an entire batch of messages with a single Claude API call. Inside the function, a prompt is created that includes the list of messages to evaluate, the defined unsafe content categories, and their descriptions. The prompt directs Claude to return a JSON object listing all messages that contain violations. Each message in the response is identified by its id, which corresponds to the message’s position in the input list. Keep in mind that finding the optimal batch size for your specific needs may require some experimentation. While larger batch sizes can lower costs, they might also lead to a slight decrease in quality. Additionally, you may need to increase the max_tokens parameter in the Claude API call to accommodate longer responses. For details on the maximum number of tokens your chosen model can output, refer to the model comparison page.
> 
> Content moderation cookbook
> 
> View a fully implemented code-based example of how to use Claude for content moderation.
> 
> Guardrails guide
> 
> Explore our guardrails guide for techniques to moderate interactions with Claude.
> 
> Customer support agentLegal summarization
> 
> xlinkedin
> 
> On this page
> 
> Before building with Claude
> 
> Decide whether to use Claude for content moderation
> 
> Generate examples of content to moderate
> 
> How to moderate content using Claude
> 
> Select the right Claude model
> 
> Build a strong prompt
> 
> Evaluate your prompt
> 
> Deploy your prompt
> 
> Improve performance
> 
> Define topics and provide examples
> 
> Consider batch processing
> 
> Source: https://docs.anthropic.com/en/docs/about-claude/use-case-guides/content-moderation

In [11]:
len(documents)

37
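As the `documents[1]` output above shows, `UnstructuredURLLoader` keeps the site's navigation menu and footer, so lines such as "Anthropic home page" and the sidebar entries recur in every document and end up in the chunks and their embeddings. One heuristic, not part of the original lab, is to drop lines that appear in a large fraction of documents before chunking. The sketch below works on plain strings; the helper name and the 0.5 threshold are assumptions to tune, and a threshold that is too low could also remove legitimate headings shared by many pages.

```python
from collections import Counter

def strip_common_lines(texts, max_doc_fraction=0.5):
    """Drop non-blank lines that occur in more than max_doc_fraction of the texts.

    Hypothetical boilerplate filter: each distinct line is counted at most once
    per document, so repetition inside a single page does not count against it.
    """
    doc_freq = Counter()
    for text in texts:
        doc_freq.update({line.strip() for line in text.splitlines() if line.strip()})
    limit = max_doc_fraction * len(texts)
    cleaned = []
    for text in texts:
        kept = [line for line in text.splitlines() if doc_freq[line.strip()] <= limit]
        cleaned.append("\n".join(kept))
    return cleaned

pages = [
    "Home\nGuides\nBody of page one.",
    "Home\nGuides\nBody of page two.",
    "Home\nGuides\nBody of page three.",
]
print(strip_common_lines(pages))
# ['Body of page one.', 'Body of page two.', 'Body of page three.']

# Applied to the loaded LangChain documents (not run here):
# for doc, text in zip(documents, strip_common_lines([d.page_content for d in documents])):
#     doc.page_content = text
```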

## Create Document chunks 

In [12]:
# Recursively split each page into overlapping chunks for embedding
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,     # target maximum number of characters per chunk
    chunk_overlap=100,   # characters shared between consecutive chunks
)

document_chunks = text_splitter.split_documents(documents)

print(f"Number documents {len(documents)}")
print(f"Number chunks {len(document_chunks)}")

# Flatten each chunk into a single string that also carries its source URL
document_chunks = [f"content: {chunk.page_content}, source: {chunk.metadata['source']}" for chunk in document_chunks]

Number documents 37
Number chunks 238
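Here `chunk_size` and `chunk_overlap` are measured in characters, since the splitter's default `length_function` is `len`. `RecursiveCharacterTextSplitter` first tries to break on blank lines, then newlines, then spaces, and only then between arbitrary characters. The simplified sketch below ignores those separators and slides a fixed-width window, purely to show how the overlap makes neighbouring chunks share text; it is an illustration, not LangChain's algorithm, and `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration: each new chunk starts chunk_size - chunk_overlap
    characters after the previous one, so neighbours share chunk_overlap chars.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

With the notebook's settings (2000 and 100), each window would advance 1900 characters, so roughly 5% of each chunk repeats the end of the previous one; the overlap helps keep a sentence that straddles a boundary retrievable from either side.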


# Generate embeddings from Document chunks

In [13]:
# (Re)create a local directory for the chunk text files
!rm -rf ./documents
!mkdir ./documents

In [14]:
# view the document chunks in a dataframe
import pandas as pd

df = pd.DataFrame(document_chunks, columns=['text'])
df

,text
0,content: Anthropic home page\n\nLearn about Cl...
1,content: Model Anthropic API AWS Bedrock GCP V...
2,content: Prompt and output performance\n\nThe ...
3,content: Claude 2.1 Claude 2 Claude Instant 1....
4,content: xlinkedin\n\nOn this page\n\nModel na...
...,...
233,content: Set appropriate output limits: Use th...
234,content: Anthropic home page\n\nStrengthen gua...
235,content: Strategies to reduce prompt leak\n\nS...
236,content: Anthropic home page\n\nGet started\n\...


In [15]:
# Generate an embedding for each chunk, save each chunk's text to documents/<id>.txt,
# and collect the embeddings as JSON Lines for later upload to Cloud Storage
from tqdm import tqdm
import json

index_embeddings = []
model = "models/embedding-001"

for index, doc in tqdm(df.iterrows(), total=len(df), position=0):
    # Corpus chunks are embedded as documents; user questions use "retrieval_query"
    response = genai.embed_content(model=model, content=doc['text'], task_type="retrieval_document")

    doc_id = f"{index}.txt"
    embedding_dict = {
        "id": doc_id,
        "embedding": response["embedding"],
    }
    index_embeddings.append(json.dumps(embedding_dict) + "\n")

    with open(f"documents/{doc_id}", "w") as document:
        document.write(doc['text'])

with open("embeddings.json", "w") as f:
    f.writelines(index_embeddings)

100%|██████████| 238/238 [00:32<00:00,  7.40it/s]
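Vector Search reads the index data from Cloud Storage; in JSON format each line must be a standalone JSON object with an `id` and an `embedding` array whose length matches the index `dimensions` (768 here). A quick local check before uploading can catch malformed records early. The sketch below is a hypothetical validator that checks only the two fields this notebook writes; Vector Search also accepts optional fields such as `restricts`, which it does not inspect.

```python
import json

def validate_embedding_lines(lines, dimensions=768):
    """Return human-readable problems found in JSON Lines embedding records.

    Hypothetical pre-upload check covering only the "id" and "embedding"
    fields that this notebook writes.
    """
    problems = []
    seen_ids = set()
    for line_no, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {line_no}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {line_no}: expected a JSON object")
            continue
        record_id = record.get("id")
        if not isinstance(record_id, str) or not record_id:
            problems.append(f"line {line_no}: missing or non-string id")
        elif record_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {record_id!r}")
        else:
            seen_ids.add(record_id)
        embedding = record.get("embedding")
        if not isinstance(embedding, list) or len(embedding) != dimensions:
            problems.append(f"line {line_no}: embedding must be a list of length {dimensions}")
    return problems

sample = [
    json.dumps({"id": "0.txt", "embedding": [0.1, 0.2, 0.3]}) + "\n",
    json.dumps({"id": "0.txt", "embedding": [0.1, 0.2]}) + "\n",
    "not json\n",
]
for problem in validate_embedding_lines(sample, dimensions=3):
    print(problem)
```

In the notebook, `validate_embedding_lines(index_embeddings, dimensions=DIMENSIONS)` would be expected to return an empty list.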


In [16]:
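# Upload embeddings.json to gs://<PROJECT_ID>/embeddings/, the folder the index reads from
from google.cloud import storage

source_file = 'embeddings.json'  # written to the working directory by the previous cell
destination_blob_name = 'embeddings/embeddings.json'

client = storage.Client(project=PROJECT_ID)
bucket = client.bucket(PROJECT_ID)  # assumes a bucket named after the project already exists
blob = bucket.blob(destination_blob_name)
blob.upload_from_filename(source_file)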

In [17]:
# Upload the document chunk text files to Cloud Storage
# This step will take a few minutes to complete
import subprocess

subprocess.run(['gsutil', '-q', 'cp', '-r', './documents', f'gs://{PROJECT_ID}/documents'], check=True)

CompletedProcess(args=['gsutil', '-q', 'cp', '-r', './documents', 'gs://qwiklabs-gcp-02-2cb5b7451de0/documents'], returncode=0)
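As far as we can tell from the LangChain source, the `MatchingEngine` vector store used later reads the text for each search hit from `documents/<id>` in the bucket, so every `id` in `embeddings.json` should have a file of the same name in `./documents`. A minimal local check under that assumption, using a hypothetical helper `find_missing_documents`:

```python
import json
import tempfile
from pathlib import Path


def find_missing_documents(embedding_lines, documents_dir):
    """Return ids from embeddings JSON Lines that have no same-named file in documents_dir."""
    available = {p.name for p in Path(documents_dir).iterdir() if p.is_file()}
    ids = [json.loads(line)["id"] for line in embedding_lines if line.strip()]
    return [doc_id for doc_id in ids if doc_id not in available]


# Tiny self-contained example: one id has a file, the other does not
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "0.txt").write_text("content: example chunk, source: https://example.com")
    lines = [
        json.dumps({"id": "0.txt", "embedding": [0.1]}) + "\n",
        json.dumps({"id": "1.txt", "embedding": [0.2]}) + "\n",
    ]
    missing = find_missing_documents(lines, tmp)

print(missing)  # ['1.txt']
```

In the notebook, `with open("embeddings.json") as f: print(find_missing_documents(f, "./documents"))` would be expected to print an empty list.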

# Create a Vertex AI Vector Search index

In [18]:
# Create the Vertex AI Vector Search index
# This step will take several minutes to complete
# Wait for this cell to complete before proceeding
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="vertex_docs",
    contents_delta_uri=f"gs://{PROJECT_ID}/embeddings",
    dimensions=768,  # must match the length of the embedding-001 vectors
    approximate_neighbors_count=150,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
)

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/201377065835/locations/us-central1/indexes/4250953845540126720/operations/6531253722171834368
MatchingEngineIndex created. Resource name: projects/201377065835/locations/us-central1/indexes/4250953845540126720
To use this MatchingEngineIndex in another session:
index = aiplatform.MatchingEngineIndex('projects/201377065835/locations/us-central1/indexes/4250953845540126720')
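The index uses `DOT_PRODUCT_DISTANCE`, so neighbors are ranked by the dot product of the query and document vectors (larger means more similar). When both vectors have unit length, the dot product equals cosine similarity; whether an embedding model returns normalized vectors is worth checking rather than assuming. A small pure-Python illustration:

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by both lengths, so only the angle matters."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


q = [3.0, 4.0]
d = [4.0, 3.0]
print(dot(q, d))                        # 24.0, grows with vector length
print(cosine_similarity(q, d))          # 0.96, independent of length
print(dot(normalize(q), normalize(d)))  # about 0.96, matches cosine after normalization
```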


In [19]:
# Create a public endpoint that the index will be deployed to for serving queries
index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="vertex_docs",
    description="Embeddings for the documentation curated from the sitemap.",
    public_endpoint_enabled=True,
)

Creating MatchingEngineIndexEndpoint
Create MatchingEngineIndexEndpoint backing LRO: projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800/operations/7358227203747741696
MatchingEngineIndexEndpoint created. Resource name: projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800
To use this MatchingEngineIndexEndpoint in another session:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint('projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800')


In [20]:
# This step will take up to 20 minutes to complete
# You can view the deployment in the Vertex AI console on the "Vector Search" tab
# Wait for this cell to complete before proceeding
index_endpoint = index_endpoint.deploy_index(
    index=index, deployed_index_id="vertex_index_deployment"
)

Deploying index MatchingEngineIndexEndpoint index_endpoint: projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800
Deploy index MatchingEngineIndexEndpoint index_endpoint backing LRO: projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800/operations/1568286932809547776
MatchingEngineIndexEndpoint index_endpoint Deployed index. Resource name: projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800


In [21]:
# Look up the index by resource name and list the endpoints it is deployed to
INDEX_RESOURCE_NAME = index.resource_name
index = aiplatform.MatchingEngineIndex(index_name=INDEX_RESOURCE_NAME)

deployed_index = index.deployed_indexes
deployed_index

[index_endpoint: "projects/201377065835/locations/us-central1/indexEndpoints/4627074783169740800"
deployed_index_id: "vertex_index_deployment"
]
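Tree-AH is an approximate nearest-neighbor (ANN) method. According to the SDK documentation, `approximate_neighbors_count=150` is the default number of neighbors found by the approximate search before exact reordering, which trades a little recall for speed. The exact answer it approximates is a brute-force scan over every stored vector; as a reference point only (not how Vector Search is implemented), that scan can be sketched with a hypothetical `exact_top_k` helper:

```python
import heapq


def exact_top_k(query, vectors, k):
    """Return the k (id, score) pairs with the highest dot product against query.

    vectors maps id -> embedding. Every vector is scored, so the cost grows
    linearly with the number of stored vectors; ANN indexes avoid this.
    """
    scores = (
        (doc_id, sum(q * v for q, v in zip(query, vec)))
        for doc_id, vec in vectors.items()
    )
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


vectors = {
    "0.txt": [1.0, 0.0],
    "1.txt": [0.6, 0.8],
    "2.txt": [0.0, 1.0],
}
print(exact_top_k([1.0, 0.0], vectors, k=2))  # [('0.txt', 1.0), ('1.txt', 0.6)]
```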

# Search the vector store and add the results as context to a query (without a LangChain chain)

In [22]:
# In the next cells you will retrieve context with LangChain's MatchingEngine vector store
# and query Gemini directly with the Vertex AI Python SDK
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import MatchingEngine

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

def search_vector_store(question):
    # connect LangChain to the deployed index; chunk text is read back from gs://<bucket>/documents
    vector_store = MatchingEngine.from_components(
        index_id=INDEX_RESOURCE_NAME,
        region=REGION,
        embedding=embeddings,
        project_id=PROJECT_ID,
        endpoint_id=deployed_index[0].index_endpoint,
        gcs_bucket_name=PROJECT_ID,
    )

    # retrieve the 8 nearest chunks and cap the combined context at 10,000 characters
    relevant_documentation = vector_store.similarity_search(question, k=8)
    context = "\n".join([doc.page_content for doc in relevant_documentation])[:10000]
    return context
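`search_vector_store` joins the retrieved chunks and then slices the result to 10,000 characters, which can cut the last chunk mid-sentence and drop its trailing `source:` (the part the prompt asks the model to cite). One alternative, sketched here as a hypothetical `build_context` helper, is to add whole chunks until the next one would exceed the budget:

```python
def build_context(chunks, max_chars=10_000, separator="\n"):
    """Join whole chunks in order, stopping before the total would exceed max_chars.

    Returns an empty string if even the first chunk is longer than max_chars.
    """
    selected = []
    total = 0
    for chunk in chunks:
        extra = len(chunk) + (len(separator) if selected else 0)
        if total + extra > max_chars:
            break
        selected.append(chunk)
        total += extra
    return separator.join(selected)


sample_chunks = ["content: aaaa, source: s1", "content: bbbb, source: s2"]
print(build_context(sample_chunks, max_chars=30))  # content: aaaa, source: s1
```

Inside `search_vector_store` this would replace the slice, e.g. `build_context([doc.page_content for doc in relevant_documentation])`.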

In [23]:
from vertexai.generative_models import GenerativeModel
import warnings

# suppress deprecation warnings raised by the LangChain and Vertex AI libraries
warnings.filterwarnings('ignore')

def ask_question(question):
    context = search_vector_store(question)

    prompt=f"""
        Follow exactly those 3 steps:
        1. Read the context below and aggregrate this data
        Context : {context}
        2. Answer the question using only this context
        3. Show the source for your answers
        User Question: {question}


        If you don't have any context and are unsure of the answer, reply that you don't know about this topic.
        """

    model = GenerativeModel("gemini-pro")
    response = model.generate_content(prompt)

    return to_markdown(f"Question: \n{question} \n\n Response: \n {response.text}")

In [24]:
ask_question("How do I reduce prompt leaks?")

> Question: 
> How do I reduce prompt leaks? 
> 
>  Response: 
>  You can try using system prompts to isolate key information and context from user queries. You can emphasize key instructions in the User turn, then reemphasize those instructions by prefilling the Assistant turn.
> 
> Notice that this system prompt is still predominantly a role prompt, which is the most effective way to use system prompts.

In [25]:
ask_question("What use cases and capabilities does Anthropic support?")

> Question: 
> What use cases and capabilities does Anthropic support? 
> 
>  Response: 
>  ## Anthropic's Use Cases and Capabilities:
> 
> **Use Cases:**
> 
> * **Customer Support Agent:** Claude can handle customer inquiries in real time, 24/7, reducing wait times and managing high volumes of support queries.
> * **Content Moderation:** Claude can identify and remove harmful or inappropriate content from online platforms.
> * **Legal Summarization:** Claude can summarize legal documents and contracts, saving time and effort for lawyers and legal professionals.
> * **Ticket Routing:** Claude can automatically route customer support tickets to the appropriate agent based on the content of the message.
> 
> **Capabilities:**
> 
> * **Text Generation:** Claude can generate human-quality text in response to prompts and questions.
> * **Reasoning and Math:** Claude can solve complex reasoning and math problems.
> * **Vision:** Claude can understand and analyze images.
> * **Tool Use (Function Calling):** Claude can be integrated with other applications and tools to automate tasks.
> * **Embeddings:** Claude can generate numerical representations of text that can be used for tasks like similarity search and classification.
> * **Coding:** Claude can generate and debug code in multiple programming languages.
> * **Large Language Model (LLM):** Claude is a large language model that has been trained on a massive dataset of text and code. This allows Claude to perform a wide range of tasks, including language translation, writing different kinds of creative content, and answering open ended, challenging, or strange questions.
> 
> **Source:**
> 
> This information is aggregated from the provided context:
> 
> * https://docs.anthropic.com/en/docs/intro-to-claude
> * https://docs.anthropic.com/en/docs/about-claude/use-case-guides/customer-support-chat
> * https://docs.anthropic.com/en/docs/build-with-claude/vision
> * https://docs.anthropic.com/en/docs/welcome-to-claude
> * https://docs.anthropic.com/en/docs/intro-to-claude/overview

# Create a Retrieval-Augmented Generation (RAG) application using LangChain

In [26]:
# To chain together the prompt, vector search, retrieved context and model input, use a LangChain "chain"
# Here you will use the RetrievalQA chain, which is commonly used for question-answering applications
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA

# initialize model using chat
model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.0, convert_system_message_to_human=True)

In [27]:
from langchain.prompts import PromptTemplate

template = """
    Follow exactly those 3 steps:
    1. Read the context below and aggregrate this data
    Context : {context}
    
    2. Answer the question using only this context
    3. Show the source for your answers
    User Question: {question}

    If you don't have any context and are unsure of the answer, reply that you don't know about this topic.
    """

prompt = PromptTemplate(input_variables=["context",  "question"], template=template)
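By default `PromptTemplate` uses Python `str.format`-style placeholders, so the names in `input_variables` should match the `{...}` fields in the template. The standard library can list those fields, which gives a quick consistency check (the helper name `template_fields` is illustrative):

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the names of the {placeholders} in a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text
    return {field for _, field, _, _ in Formatter().parse(template) if field}


example = "Context : {context}\nUser Question: {question}"
print(sorted(template_fields(example)))  # ['context', 'question']
```

Applied to the template above, `template_fields(template) == {"context", "question"}` would be expected to hold.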

In [28]:
from langchain_community.vectorstores import MatchingEngine

vector_store = MatchingEngine.from_components(
    index_id=INDEX_RESOURCE_NAME,
    region=REGION,
    embedding=embeddings,
    project_id=PROJECT_ID,
    endpoint_id=deployed_index[0].index_endpoint,
    gcs_bucket_name=PROJECT_ID,
)

retriever = vector_store.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 1}
)

# Test the retriever with a simple search
to_markdown(retriever.invoke("How do I get started with Anthropic?")[0].page_content)

> content: Use Workbench to create evals, draft prompts, and iteratively refine based on test results.
> 
> Deploy polished prompts and monitor real-world performance for further refinement.
> 
> Implement Claude
> 
> Set up your environment, integrate Claude with your systems (APIs, databases, UIs), and define human-in-the-loop requirements.
> 
> Test your system
> 
> Conduct red teaming for potential misuse and A/B test improvements.
> 
> Deploy to production
> 
> Once your application runs smoothly end-to-end, deploy to production.
> 
> Monitor and improve
> 
> Monitor performance and effectiveness to make ongoing improvements.
> 
> Start building with Claude
> 
> When you’re ready, start building with Claude:
> 
> Follow the Quickstart to make your first API call
> 
> Check out the API Reference
> 
> Explore the Prompt Library for example prompts
> 
> Experiment and start building with the Workbench
> 
> Check out the Anthropic Cookbook for working code examples
> 
> Initial setupOverview
> 
> xlinkedin
> 
> On this page
> 
> What you can do with Claude
> 
> Model options
> 
> Claude 3.5 Family
> 
> Claude 3 Family
> 
> Enterprise considerations
> 
> Implementing Claude
> 
> Start building with Claude, source: https://docs.anthropic.com/en/docs/intro-to-claude

In [29]:
chain_type_kwargs = {"prompt": prompt}
qa = RetrievalQA.from_chain_type(
    llm=model,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs=chain_type_kwargs,
    return_source_documents=True
)
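With `chain_type="stuff"`, the chain places all retrieved documents into the prompt's `{context}` slot and makes a single LLM call. Conceptually, the prompt that reaches Gemini is built roughly as below; this is a simplified sketch rather than LangChain's code, `Doc` is a minimal stand-in for a LangChain `Document`, and the `"\n\n"` separator is an assumption about the chain's default:

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def stuff_prompt(template: str, docs, question: str, separator: str = "\n\n") -> str:
    """Concatenate every document into {context} and fill in {question}."""
    context = separator.join(doc.page_content for doc in docs)
    return template.format(context=context, question=question)


docs = [Doc("content: step one, source: a"), Doc("content: step two, source: b")]
prompt_text = stuff_prompt("Context: {context}\nQ: {question}", docs, "How do I start?")
print(prompt_text)
```

Because everything goes into one prompt, "stuff" suits a small `k` like the `k=1` used here; with many or long documents the prompt can outgrow the model's context window, which is where other chain types such as `map_reduce` come in.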

In [30]:
def ask_question(question: str):
    # return_source_documents=True also places the retrieved documents in response['source_documents'];
    # since k is set to 1, that list holds at most one document
    response = qa.invoke({"query": question})

    return to_markdown(f"Response: \n\n {response['result']}")
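Each chunk was stored as a string of the form `content: ..., source: <url>`, so the source URL of a retrieved document can be recovered from its `page_content`. A minimal sketch, assuming that format and taking the text after the last `, source: ` marker (`extract_source` is a hypothetical helper):

```python
def extract_source(page_content: str):
    """Return the URL after the last ', source: ' marker, or None if absent."""
    marker = ", source: "
    if marker not in page_content:
        return None
    return page_content.rsplit(marker, 1)[1].strip()


url = extract_source(
    "content: Use Workbench to create evals, source: https://docs.anthropic.com/en/docs/intro-to-claude"
)
print(url)  # https://docs.anthropic.com/en/docs/intro-to-claude
```

Inside `ask_question`, `[extract_source(d.page_content) for d in response["source_documents"]]` would list the retrieved sources.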

In [31]:
# Note: You may see a library warning when running this step
ask_question("How do I get started with Anthropic?")

> Response: 
> 
>  1. Follow the Quickstart to make your first API call
> 2. Check out the API Reference
> 3. Explore the Prompt Library for example prompts
> 4. Experiment and start building with the Workbench
> 5. Check out the Anthropic Cookbook for working code examples
> 
> Source: https://docs.anthropic.com/en/docs/intro-to-claude