# Demo of retrieval IO processor with MTRAG benchmark data

This notebook shows how to use the retrieval IO processor to implement the retrieval 
phase of Retrieval-Augmented Generation (RAG) on top of Granite 3.3.

This notebook can run its own vLLM server to perform inference, or you can host the 
model on your own server. 

To use your own server, set the `run_server` variable below
to `False` and set appropriate values for the constants in the cell marked
`# Constants go here`.
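If you would rather not edit the notebook each time you switch servers, one option is to read these settings from environment variables. Below is a minimal sketch. The environment-variable names and the `openai_backend_config` helper are hypothetical, not part of granite_io. The dictionary keys match the ones this notebook passes to `make_backend("openai", ...)`, and the defaults are the notebook's own constants.

```python
import os
from collections.abc import Mapping

# Hypothetical environment-variable names; the defaults are the notebook's constants.
DEFAULTS = {
    "GRANITE_MODEL_NAME": "ibm-granite/granite-3.3-8b-instruct",
    "GRANITE_OPENAI_BASE_URL": "http://localhost:36101/v1",
    "GRANITE_OPENAI_API_KEY": "granite_intrinsics_1234",
}


def openai_backend_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the config dict for make_backend("openai", ...).

    Each value comes from `env` when set, otherwise from DEFAULTS.
    """

    def get(key: str) -> str:
        return env.get(key, DEFAULTS[key])

    return {
        "model_name": get("GRANITE_MODEL_NAME"),
        "openai_base_url": get("GRANITE_OPENAI_BASE_URL"),
        "openai_api_key": get("GRANITE_OPENAI_API_KEY"),
    }


# Override only the base URL; the other values fall back to the defaults.
config = openai_backend_config({"GRANITE_OPENAI_BASE_URL": "http://my-host:8000/v1"})
print(config["openai_base_url"])  # http://my-host:8000/v1
print(config["model_name"])  # ibm-granite/granite-3.3-8b-instruct
```

With a helper like this, the `else` branch of the server cell could call `make_backend("openai", openai_backend_config())` instead of spelling out the dictionary inline.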

In [None]:
from granite_io.io.granite_3_3.input_processors.granite_3_3_input_processor import (
    Granite3Point3Inputs,
)
from granite_io import make_io_processor, make_backend
from granite_io.io.retrieval import ElasticsearchRetriever, RetrievalRequestProcessor
from granite_io.backend.vllm_server import LocalVLLMServer
from IPython.display import display, Markdown
import pandas as pd

In [None]:
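# Constants go here
host = ""  # URL of the Elasticsearch server that hosts the corpus (must be set)
corpus_name = ""  # Name of the MTRAG corpus to retrieve from (must be set)
model_name = "ibm-granite/granite-3.3-8b-instruct"

# Set to False to connect to a server you host yourself instead
run_server = True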

In [None]:
if run_server:
    # Start by firing up a local vLLM server and connecting a backend instance to it.
    server = LocalVLLMServer(model_name)
    server.wait_for_startup(200)
    backend = server.make_backend()
else:  # if not run_server
    # Use an existing server.
    # The constants here are for the server that local_vllm_server.ipynb starts.
    # Modify as needed.
    openai_base_url = "http://localhost:36101/v1"
    openai_api_key = "granite_intrinsics_1234"
    backend = make_backend(
        "openai",
        {
            "model_name": model_name,
            "openai_base_url": openai_base_url,
            "openai_api_key": openai_api_key,
        },
    )
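When `run_server` is `False`, the cell above does not check that your own server is reachable before the first request. vLLM's OpenAI-compatible server exposes a `GET /health` endpoint at the server root (not under `/v1`), so a simple poll-with-timeout loop can wait for it. The sketch below uses only the standard library. The helper names are hypothetical, and the idea that `LocalVLLMServer.wait_for_startup(200)` does something similar internally is an assumption.

```python
import time
import urllib.request
from typing import Callable


def health_url(openai_base_url: str) -> str:
    """Map an OpenAI-style base URL (ending in /v1) to vLLM's /health endpoint."""
    return openai_base_url.rstrip("/").removesuffix("/v1") + "/health"


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


def wait_until(ready: Callable[[], bool], timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll `ready()` until it returns True or `timeout_s` seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


url = health_url("http://localhost:36101/v1")
print(url)  # http://localhost:36101/health
# Uncomment to block for up to 200 seconds until the server is healthy:
# wait_until(lambda: http_ok(url), timeout_s=200)
```

Keeping the readiness check (`ready`) separate from the polling loop makes the loop easy to reuse for other probes, such as checking that the Elasticsearch host responds.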

In [4]:
# Spin up an IO processor for the base model
io_proc = make_io_processor(model_name, backend=backend)
io_proc

<granite_io.io.granite_3_3.granite_3_3.Granite3Point3InputOutputProcessor at 0x7f5d49af94b0>

In [5]:
# Create an example chat completions request
chat_input = Granite3Point3Inputs.model_validate(
    {
        "messages": [
            {
                "role": "assistant",
                "content": "Welcome to the California Appellate Courts help desk.",
            },
            {
                "role": "user",
                "content": "I need to do some legal research to be prepared for my "
                "oral argument. Can I visit the law library?",
            },
        ],
        "generate_inputs": {
            "temperature": 0.0,
            "max_tokens": 4096,
        },
    }
)
chat_input

Granite3Point3Inputs(messages=[AssistantMessage(content='Welcome to the California Appellate Courts help desk.', role='assistant', tool_calls=[], reasoning_content=None, citations=None, documents=None, hallucinations=None, stop_reason=None), UserMessage(content='I need to do some legal research to be prepared for my oral argument. Can I visit the law library?', role='user')], tools=[], generate_inputs=GenerateInputs(prompt=None, model=None, best_of=None, echo=None, frequency_penalty=None, logit_bias=None, logprobs=None, max_tokens=4096, n=None, presence_penalty=None, stop=None, stream=None, stream_options=None, suffix=None, temperature=0.0, top_p=None, user=None, extra_headers=None, extra_body={}), documents=[], controls=None, thinking=False, sanitize=None)

In [6]:
# Run the chat completion request through the base model without RAG.
# Without retrieved documents, the result should be a generic answer that opens
# with a disclaimer such as "As an AI, I don't have real-time access to...".
non_rag_result = io_proc.create_chat_completion(chat_input)
display(Markdown(non_rag_result.results[0].next_message.content))

As an AI, I don't have real-time access to physical locations or their current operating conditions. However, I can guide you on how to proceed with your legal research. 

Many appellate courts, including those in California, provide access to their law libraries for attorneys and the public. You would typically need to contact the specific court where your case is being heard to inquire about visiting their law library. 

Due to the COVID-19 pandemic, many courts have adjusted their operations and might offer remote access or curbside pickup for legal materials. It's advisable to check the court's website or contact them directly for the most current information.

Additionally, you can conduct legal research remotely using online legal databases such as Westlaw, LexisNexis, or a free resource like Justia or the California Courts' self-help website, which provides free access to California appellate court opinions and other legal resources.

Remember, it's crucial to stay updated with the latest rules and procedures of the court due to potential changes resulting from the pandemic or other factors.

In [7]:
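# Connect to the Elasticsearch server that holds the corpus.
# Certificate verification and SSL warnings are disabled here for convenience.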
retriever = ElasticsearchRetriever(
    corpus_name=corpus_name, host=host, verify_certs=False, ssl_show_warn=False
)

In [None]:
# Use a RetrievalRequestProcessor to augment the chat completion request with
# documents retrieved from the corpus. process() returns a list of requests;
# this demo uses the first one.
rag_processor = RetrievalRequestProcessor(retriever)
rag_chat_input = rag_processor.process(chat_input)[0]
pd.set_option("display.max_colwidth", 200)
print("Documents:")
pd.DataFrame.from_records([d.model_dump() for d in rag_chat_input.documents])
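Conceptually, the cell above does two things. It turns the conversation into a search query, and it returns a request with the retrieved passages attached as `documents`. The sketch below mimics that flow with a toy keyword-overlap scorer over a small in-memory corpus. `Message`, `Request`, `retrieve` and `augment` are hypothetical names. The scoring is only a stand-in and is not how `ElasticsearchRetriever` ranks passages. Using the last user turn as the query is also an assumption about `RetrievalRequestProcessor`.

```python
import re
from dataclasses import dataclass, replace

# Tiny illustrative stopword list so that words like "the" don't count as matches.
STOPWORDS = {"a", "an", "the", "to", "i", "can", "do", "is", "in", "for", "on"}


@dataclass(frozen=True)
class Message:
    role: str
    content: str


@dataclass(frozen=True)
class Request:
    messages: tuple[Message, ...]
    documents: tuple[str, ...] = ()


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric terms, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Rank passages by the number of query terms they share; drop non-matches."""
    q = tokens(query)
    scored = [(len(q & tokens(doc)), i, doc) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by corpus order
    return [doc for score, _, doc in scored[:top_k] if score > 0]


def augment(request: Request, corpus: list[str], top_k: int = 2) -> Request:
    """Return a copy of `request` with documents retrieved for the last user turn."""
    query = next(m.content for m in reversed(request.messages) if m.role == "user")
    return replace(request, documents=tuple(retrieve(query, corpus, top_k)))


corpus = [
    "The law library is open to the public for legal research.",
    "Parking is available in the garage on the east side.",
    "Oral argument calendars are posted two weeks in advance.",
]
req = Request(
    messages=(
        Message("assistant", "Welcome to the California Appellate Courts help desk."),
        Message("user", "Can I visit the law library to do legal research?"),
    )
)
rag_req = augment(req, corpus)
print(rag_req.documents)
# ('The law library is open to the public for legal research.',)
```

Returning a new request rather than mutating the original keeps the non-RAG request available for comparison. The notebook does the same by holding `chat_input` and `rag_chat_input` as separate objects.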

In [9]:
# Run the same request through the base model with RAG documents
rag_result = io_proc.create_chat_completion(rag_chat_input)
display(Markdown(rag_result.results[0].next_message.content))

Yes, you can visit the law library to conduct your legal research. Law libraries contain case-law reports, legal periodicals, and legislation, which are essential resources for determining the current state of the law.

In [10]:
# Free up GPU resources
if "server" in locals():
    server.shutdown()