# GenAI ChatBot for Enterprise Data Using Retrieval Augmented Generation (RAG)
### Built on Google Cloud's Vertex AI platform, using Gemini models
### Uses GCS (for document storage), LangChain (for building the RAG system), and Redis (as the vector database)

## Installation & Authentication

Install LangChain, Vertex AI LLM SDK, and related libraries.


In [None]:
%pip install google-cloud-aiplatform langchain unstructured unstructured[pdf] --upgrade --user

In [None]:
!pip install -U langchain-community langchain-google-community langchain-google-vertexai

In [8]:
# Restart kernel after installs so that your environment can access the new packages

import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


## Get Libraries & Classes


In [6]:
from google.cloud import aiplatform
from langchain_google_community import GCSDirectoryLoader
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_google_vertexai import VertexAI

# Using Vertex AI
import vertexai

# print(f"Vertex AI SDK version: {aiplatform.__version__}")

## Initialize Vertex AI

**Vertex AI needs a project ID and a location (region) in which the model and embedding calls will run.**


In [7]:
import os

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    # Fall back to the environment; stays None (not the string "None") if unset
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

vertexai.init(project=PROJECT_ID, location=LOCATION)

## Ingest the Documents to build the context for the LLM

_Load all documents from a Google Cloud Storage (GCS) bucket_


In [10]:
loader = GCSDirectoryLoader(
    project_name=PROJECT_ID, bucket="rag-langchain-demo"
)
documents = loader.load()
print(f"No. of documents = {len(documents)}")

No. of documents = 2


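_Split the documents into chunks small enough to fit within the LLM's context limit, with a small overlap between consecutive chunks so that text cut at a boundary still appears intact in one of them. Note that `chunk_size` and `chunk_overlap` are measured in characters, not tokens._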


In [11]:
# split the documents into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=100)
doc_chunks = text_splitter.split_documents(documents)
print(f"No. of document chunks = {len(doc_chunks)}")

No. of document chunks = 125
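
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: `sliding_window_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` is more sophisticated because it prefers to break on separators such as paragraphs and sentences rather than at arbitrary character positions.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the same idea as the 100-character overlap used above.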


## Structuring the ingested documents in a vector space using Redis as Vector Database


_Create the embedding model that turns each chunk of ingested text into a vector_


In [17]:
# Define Text Embeddings model
embeddings_model = VertexAIEmbeddings(project=PROJECT_ID, location=LOCATION, model_name="text-embedding-005")

_Create a vector store and save the embeddings in it_


In [None]:
import os

# Replace values below with your own if using Redis Cloud instance
REDIS_HOST = os.getenv("REDIS_HOST", "127.0.0.1") 
REDIS_PORT = os.getenv("REDIS_PORT", "6379")      
REDIS_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}"
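
Hosted Redis instances often require a password and TLS, which the plain `redis://host:port` URL above does not carry. The sketch below shows one way to build such a URL; `build_redis_url`, `password`, and `use_tls` are hypothetical names introduced for illustration, and the host and credentials are placeholders. It relies on the `rediss://` scheme that the Redis Python client accepts for TLS connections.

```python
from urllib.parse import quote


def build_redis_url(host, port, password=None, use_tls=False):
    """Build a Redis connection URL, optionally with a password and TLS."""
    scheme = "rediss" if use_tls else "redis"
    # Percent-encode the password so characters like "@" do not break the URL
    auth = f":{quote(password, safe='')}@" if password else ""
    return f"{scheme}://{auth}{host}:{port}"


print(build_redis_url("127.0.0.1", "6379"))
print(build_redis_url("redis.example.com", "6380", password="p@ss", use_tls=True))
```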

In [19]:
import redis

# Connect with the Redis Python Client
client = redis.Redis.from_url(REDIS_URL)
client.ping()

True

In [None]:
! pip install langchain-google-memorystore-redis

In [21]:
import re
from langchain_google_memorystore_redis import (
    DistanceStrategy,
    HNSWConfig,
    RedisVectorStore,
)

index_config = HNSWConfig(
    name="public_rag_demo", distance_strategy=DistanceStrategy.COSINE, vector_size=768
)

try:
    RedisVectorStore.init_index(client=client, index_config=index_config)
except redis.exceptions.ResponseError as e:
    if re.match(r".*already exists", str(e)):
        print("Index already exists, skipping creation.")
    else:
        raise

In [23]:
redis_vector_db = RedisVectorStore(
    index_name="public_rag_demo",
    embeddings=embeddings_model,
    client=client
)

In [24]:
redis_vector_db.add_documents(doc_chunks)

['public_rag_demo72b0d9b0-0976-43db-9ad9-66da2328bd82',
 'public_rag_demob4445ca9-6342-4e45-8463-9c2e0eabb8ca',
 'public_rag_demo9b8d59aa-0b3f-496c-be7d-5db9ba5f8ebb',
 'public_rag_demo05340896-f976-4661-8518-f5171fc28f76',
 'public_rag_demoe1915047-0288-48d1-8e5d-74682fa7bd24',
 'public_rag_demoe3f32cbf-534f-4455-921f-9cf6322e923a',
 'public_rag_demo5f2d2f66-0e03-4fec-8930-8e321eca1458',
 'public_rag_demoad3f1ed3-5bde-450d-a9c6-9b10af14e629',
 'public_rag_demo69bab5b0-fd37-4b29-9704-396487208778',
 'public_rag_demo3d405d45-a088-4a86-8eff-01698041508c',
 'public_rag_demo44680270-52e4-41b5-a522-af5ce591de86',
 'public_rag_demo31e97098-d47c-4773-aead-db6756bee8bb',
 'public_rag_democ7f78586-05a8-453f-b2cf-4bb22c178a7c',
 'public_rag_demo94e626ce-52db-4d99-8c75-89e2248ef3be',
 'public_rag_democacc4099-bcfa-48b6-8e7d-e3c68ffd41e0',
 'public_rag_demo365cce0f-6bf3-4bb7-b3dd-0b62ccef531f',
 'public_rag_demof76c4699-d06b-479d-8e5a-f09cb45300b8',
 'public_rag_demo675b507f-fc42-4527-b17a-3633a64

## Obtain handle to the retriever

We use the vector store's built-in retriever to run a similarity search over the document chunks. For each incoming user query, it returns the `k` chunks (here, 3) whose embeddings have the smallest vector distance to the query embedding.


In [25]:
# Expose index to the retriever
retriever = redis_vector_db.as_retriever(
    search_type="similarity", search_kwargs={"k": 3}
)
retriever

VectorStoreRetriever(tags=['RedisVectorStore'], vectorstore=<langchain_google_memorystore_redis.vectorstore.RedisVectorStore object at 0x7fdfc14ccfa0>, search_kwargs={'k': 3})
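
The similarity search can be pictured with a small brute-force sketch. This is not how the Redis index works internally (HNSW is an approximate nearest-neighbour structure), and the three-dimensional vectors and the names `cosine_distance`, `top_k`, and `toy_vectors` are made up for illustration; real embeddings here have 768 dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k):
    """Rank every chunk by distance to the query and keep the k closest."""
    ranked = sorted(
        chunk_vecs.items(), key=lambda item: cosine_distance(query_vec, item[1])
    )
    return [name for name, _ in ranked[:k]]


toy_vectors = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.05, 0.0]
nearest = top_k(query_vec, toy_vectors, k=3)
print(nearest)
```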

## Define a Retrieval QA Chain to use retriever


In [27]:
# We use Vertex AI Gemini Flash for LLM
llm = VertexAI(
    project=PROJECT_ID, 
    location=LOCATION, 
    model_name="gemini-2.0-flash",
    max_output_tokens=1000,
    temperature=0.05,
    top_p=0.8,
    top_k=40,
    verbose=True
)
#llm

In [28]:
# Create chain to answer questions
from langchain.chains import RetrievalQA

# Uses LLM to synthesize results from the search index.
qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True
)
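
The `"stuff"` chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea only; `build_stuff_prompt` and its prompt wording are assumptions made for illustration and are not the template LangChain actually uses.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate all retrieved chunks into one context block, then ask the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example_chunks = [
    "First retrieved chunk.",
    "Second retrieved chunk.",
    "Third retrieved chunk.",
]
prompt = build_stuff_prompt("What does the second chunk say?", example_chunks)
print(prompt)
```

Because every chunk goes into one prompt, `k` multiplied by the chunk size has to stay within the model's context window.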

## Ask questions through the Retrieval QA chain


_Example:_


In [33]:
query = "What is the profit of Uber in 2023? Give revenue, profit details"
results = qa.invoke({"query": query})
print("Query:", query)
print("Response:\n", results['result'])

Query: What is the profit of Uber in 2023? Give revenue, profit details
Response:
 Based on the provided text:

*   **Revenue:** $37.3 billion
*   **Net income attributable to Uber Technologies, Inc.:** $1.9 billion
*   **Adjusted EBITDA:** $4.1 billion


In [35]:
query = "What is the increase in profit of Google in 2024 compared to 2023 ?"
results = qa.invoke({"query": query})
print("Query:", query)
print("Response:\n", results['result'])

Query: What is the increase in profit of Google in 2024 compared to 2023 ?
Response:
 Google Services operating income increased $25.4 billion from 2023 to 2024 and Google Cloud operating income increased $4.4 billion from 2023 to 2024. The combined increase in profit of Google Services and Google Cloud in 2024 compared to 2023 is $29.8 billion.


In [37]:
query = "What is the profit of Ola in 2024?"
results = qa.invoke({"query": query})
print("Query:", query)
print("Response:\n", results['result'])

Query: What is the profit of Ola in 2024?
Response:
 I'm sorry, but I cannot answer the question about Ola's profit in 2024. The provided text does not contain any information about Ola's financial performance. It focuses on Uber's performance and investments.



In [42]:
query = "How many employees are there in Uber?"
results = qa.invoke({"query": query})
print("Query:", query)
print("Response:\n", results['result'])

Query: How many employees are there in Uber?
Response:
 As of December 31, 2023, Uber and its subsidiaries had approximately 30,400 employees globally.



In [51]:
query = "What is the revenue of Alphabet in 2024? What is the increase compared to 2023"
results = qa.invoke({"query": query})
print("Query:", query)
print("Response:\n", results['result'])

Query: What is the revenue of Alphabet in 2024? What is the increase compared to 2023
Response:
 The revenue of Alphabet in 2024 was $350.018 billion. The increase compared to 2023 was $42.624 billion.



## Build a Front End

Build a simple front end so that users can query the documents and get answers grounded in them, together with a reference to the source document that was used to answer the query.
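
Since the chain was created with `return_source_documents=True`, each result carries the retrieved chunks, and several of them may come from the same file. The sketch below collects the distinct source URIs from a result. `unique_sources` is a hypothetical helper, and `FakeDocument` with its bucket paths is a stand-in for LangChain's document objects so that the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(result):
    """Return the distinct 'source' values of the retrieved chunks, in rank order."""
    seen = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source")
        if source and source not in seen:
            seen.append(source)
    return seen


fake_result = {
    "result": "example answer",
    "source_documents": [
        FakeDocument("chunk 1", {"source": "gs://example-bucket/a.pdf"}),
        FakeDocument("chunk 2", {"source": "gs://example-bucket/a.pdf"}),
        FakeDocument("chunk 3", {"source": "gs://example-bucket/b.pdf"}),
    ],
}
print(unique_sources(fake_result))
```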


In [38]:
%pip install -q gradio

Note: you may need to restart the kernel to use updated packages.


In [47]:
from google.cloud import storage
import gradio as gr


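def chatbot(input_text):
    result = qa.invoke({"query": input_text})
    sources = result["source_documents"]
    if not sources:
        # No chunk was retrieved, so there is nothing to cite
        return result["result"], "", ""

    # Cite the source file of the top-ranked chunk
    source_uri = sources[0].metadata["source"]
    return result["result"], get_public_url(source_uri), source_uri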


def get_public_url(uri):
    """Returns the public URL for a file in Google Cloud Storage."""
    # Split "gs://bucket/path/to/file" into its components
    components = uri.split("/")

    # Get the bucket name
    bucket_name = components[2]

    # Get the object name, keeping any nested "folder" prefixes
    file_name = "/".join(components[3:])

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(file_name)
    return blob.public_url


iface = gr.Interface(
    fn=chatbot,
    inputs=[gr.Textbox(label="Query")],
    title="Enterprise Data Search ChatBot",
    outputs=[
        gr.Textbox(label="Response"),
        gr.Textbox(label="URL"),
        gr.Textbox(label="Cloud Storage URI"),
    ],
    theme=gr.themes.Soft(),
)
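
As an alternative to splitting the URI on `/` by hand, the `gs://bucket/object` form can be parsed with the standard library. This is a sketch under assumptions: `split_gcs_uri` and `public_url_for` are hypothetical helpers, the example path is made up, and the URL shape `https://storage.googleapis.com/<bucket>/<object>` only resolves for objects that are publicly readable. Object names containing `?` or `#` would need extra handling.

```python
from urllib.parse import quote, urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def public_url_for(uri):
    """Build the public HTTPS URL without calling the storage API."""
    bucket_name, object_name = split_gcs_uri(uri)
    # Percent-encode the object name but keep '/' between path segments
    return f"https://storage.googleapis.com/{bucket_name}/{quote(object_name)}"


example_uri = "gs://rag-langchain-demo/reports/2023/uber 10k.pdf"
print(split_gcs_uri(example_uri))
print(public_url_for(example_uri))
```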



In [52]:
print("Launching Gradio")

iface.launch(share=False)

Launching Gradio
Rerunning server... use `close()` to stop if you need to change `launch()` parameters.
----

To create a public link, set `share=True` in `launch()`.


