# Rerank Document Compressor

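This notebook shows how to use a rerank document compressor with Amazon Bedrock. The first cell looks up the ARN of the `amazon.rerank-v1:0` reranking model, which the later cells pass to `BedrockRerank`.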



In [3]:
import boto3

session = boto3.Session()
client = session.client('bedrock')
foundation_model = client.get_foundation_model(modelIdentifier="amazon.rerank-v1:0")

model_arn = foundation_model["modelDetails"]["modelArn"]
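
If the lookup call is not convenient, the ARN can also be written out by hand. The sketch below assumes the usual foundation-model ARN layout, `arn:aws:bedrock:<region>::foundation-model/<model-id>`, in the standard `aws` partition. `foundation_model_arn` is a hypothetical helper and `us-west-2` is only an example region, so compare the result with the value returned by `get_foundation_model` before relying on it.

```python
def foundation_model_arn(region: str, model_id: str) -> str:
    """Build a foundation-model ARN string (hypothetical helper, assumes the `aws` partition)."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


# "us-west-2" is an example region; use one where the rerank model is available
example_arn = foundation_model_arn("us-west-2", "amazon.rerank-v1:0")
print(example_arn)
```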

The code below ranks a list of documents by their relevance to a query using Amazon Bedrock's reranking capabilities. It initializes a `BedrockRerank` instance with the model ARN, then passes the documents and the query to `compress_documents`. The method returns the documents ordered from most to least relevant and stores each score under `relevance_score` in the document's metadata.

In [4]:
from langchain_core.documents import Document
from langchain_aws import BedrockRerank

# Initialize the class
reranker = BedrockRerank(model_arn=model_arn)

# List of documents to rerank
documents = [
    Document(page_content="LangChain is a powerful library for LLMs."),
    Document(page_content="AWS Bedrock enables access to AI models."),
    Document(page_content="Artificial intelligence is transforming the world."),
]

# Query for reranking
query = "What is AWS Bedrock?"

# Rerank the documents with compress_documents
results = reranker.compress_documents(documents, query)

# Display the most relevant documents
for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Score: {doc.metadata['relevance_score']}")

Content: AWS Bedrock enables access to AI models.
Score: 0.07081620395183563
Content: Artificial intelligence is transforming the world.
Score: 2.8350802949717036e-06
Content: LangChain is a powerful library for LLMs.
Score: 1.5903378880466335e-06
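
The scores above drop off sharply after the first document, so a score cutoff is one way to keep only the clearly relevant results. The sketch below is a minimal illustration, assuming each result exposes `metadata["relevance_score"]` as in the output above. `ScoredDoc`, `filter_by_score`, and the `0.01` threshold are hypothetical names and values chosen for the example; they are not part of `langchain_aws`.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a reranked Document (not a LangChain class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(docs, min_score):
    """Keep docs whose relevance_score is at least min_score, preserving order."""
    return [d for d in docs if d.metadata.get("relevance_score", 0.0) >= min_score]


# Contents and scores copied from the output above
reranked = [
    ScoredDoc("AWS Bedrock enables access to AI models.",
              {"relevance_score": 0.07081620395183563}),
    ScoredDoc("Artificial intelligence is transforming the world.",
              {"relevance_score": 2.8350802949717036e-06}),
    ScoredDoc("LangChain is a powerful library for LLMs.",
              {"relevance_score": 1.5903378880466335e-06}),
]

# 0.01 is an arbitrary example threshold, not a recommended value
kept = filter_by_score(reranked, min_score=0.01)
for doc in kept:
    print(doc.page_content)
```

A suitable threshold depends on the model and the data, so it is worth inspecting real score distributions before fixing one.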


Next, we wrap a base retriever in a `ContextualCompressionRetriever` and use `BedrockRerank` as its compressor, so that Amazon Bedrock reranks whatever the base retriever returns.

When a query is executed, the retriever first fetches candidate documents from the FAISS vector store and then reranks them by relevance to the query, so the most relevant documents come first.

In [5]:
from langchain_aws import BedrockEmbeddings
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_community.vectorstores import FAISS  # current import path; langchain.vectorstores is deprecated
from langchain_core.documents import Document
from langchain_aws import BedrockRerank

# Create a vector store using FAISS with Bedrock embeddings
documents = [
    Document(page_content="LangChain integrates LLM models."),
    Document(page_content="AWS Bedrock provides cloud-based AI models."),
    Document(page_content="Machine learning can be used for predictions."),
]
embeddings = BedrockEmbeddings()
vectorstore = FAISS.from_documents(documents, embeddings)

# Create the document compressor using BedrockRerank
reranker = BedrockRerank(model_arn=model_arn)

# Create the retriever with contextual compression
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(),
)

# Execute a query
query = "How does AWS Bedrock work?"
retrieved_docs = retriever.invoke(query)

# Display the most relevant documents
for doc in retrieved_docs:
    print(f"Content: {doc.page_content}")
    print(f"Score: {doc.metadata.get('relevance_score', 'N/A')}")

Content: AWS Bedrock provides cloud-based AI models.
Score: 0.07585818320512772
Content: Machine learning can be used for predictions.
Score: 2.8573158488143235e-06
Content: LangChain integrates LLM models.
Score: 1.640820528336917e-06
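
The retrieve-then-rerank flow described above can be sketched without any AWS calls. In this toy version, word overlap stands in for the FAISS similarity search and Jaccard similarity stands in for the Bedrock rerank model. All function names here are hypothetical, and the scores bear no relation to the ones Bedrock returns; only the two-stage control flow is the point.

```python
import re


def tokens(text):
    """Lowercase word set; a toy stand-in for real text processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_words(query, text):
    """Stage-1 toy score: number of words the query and the text share."""
    return len(tokens(query) & tokens(text))


def jaccard(query, text):
    """Stage-2 toy score: Jaccard similarity of the two word sets."""
    q, t = tokens(query), tokens(text)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def retrieve_then_rerank(query, corpus, k):
    # Stage 1: cheap retrieval keeps the top-k candidates
    candidates = sorted(corpus, key=lambda d: shared_words(query, d), reverse=True)[:k]
    # Stage 2: rescore only those candidates and sort by the new score
    scored = [(d, jaccard(query, d)) for d in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


corpus = [
    "LangChain integrates LLM models.",
    "AWS Bedrock provides cloud-based AI models.",
    "Machine learning can be used for predictions.",
]
results = retrieve_then_rerank("How does AWS Bedrock work?", corpus, k=2)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

Splitting the work this way lets the first stage stay cheap over the whole corpus while the more expensive scorer only sees `k` candidates.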


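Unlike `compress_documents`, which works with structured `Document` objects, the `rerank` method also accepts plain text strings. It returns a list of dictionaries, each holding the `index` of a document in the input list and its `relevance_score`, so the original text is recovered by indexing back into the list.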

In [6]:
from langchain_aws import BedrockRerank

# Initialize BedrockRerank
reranker = BedrockRerank(model_arn=model_arn)

# Unstructured documents
documents = [
    "LangChain is used to integrate LLM models.",
    "AWS Bedrock provides access to cloud-based models.",
    "Machine learning is revolutionizing the world.",
]

# Query
query = "What is the role of AWS Bedrock?"

# Rerank the documents
results = reranker.rerank(query=query, documents=documents)

# Display the results
for res in results:
    print(f"Index: {res['index']}, Score: {res['relevance_score']}")
    print(f"Document: {documents[res['index']]}")

Index: 1, Score: 0.07159119844436646
Document: AWS Bedrock provides access to cloud-based models.
Index: 2, Score: 9.666109690442681e-06
Document: Machine learning is revolutionizing the world.
Index: 0, Score: 8.25057043130073e-07
Document: LangChain is used to integrate LLM models.
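
Because `rerank` returns indices rather than text, a small helper can join the results back to the input strings. The sketch below assumes the result shape shown in the output above (dictionaries with `index` and `relevance_score` keys); `attach_texts` is a hypothetical helper, not part of `langchain_aws`.

```python
def attach_texts(results, documents):
    """Pair each rerank result with the original string it refers to."""
    return [
        {"text": documents[r["index"]], "relevance_score": r["relevance_score"]}
        for r in results
    ]


documents = [
    "LangChain is used to integrate LLM models.",
    "AWS Bedrock provides access to cloud-based models.",
    "Machine learning is revolutionizing the world.",
]

# Result dictionaries copied from the output above
results = [
    {"index": 1, "relevance_score": 0.07159119844436646},
    {"index": 2, "relevance_score": 9.666109690442681e-06},
    {"index": 0, "relevance_score": 8.25057043130073e-07},
]

ranked = attach_texts(results, documents)
for item in ranked:
    print(f"{item['relevance_score']:.2e}  {item['text']}")
```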
