## <b><font color='darkblue'>Preface</font></b>
([source](https://cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-rag-quickstart)) Before trying this sample, follow the Python setup instructions in the [Vertex AI quickstart using client libraries](https://cloud.google.com/vertex-ai/docs/start/client-libraries). For more information, see the [Vertex AI Python API reference documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest).

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see [Set up authentication for a local development environment](https://cloud.google.com/docs/authentication/set-up-adc-local-dev-environment).

In [1]:
from IPython.display import display, Markdown, Latex

## <b><font color='darkblue'>Use the quickstart to get familiar with RAG</font></b>
<b><font size='3ptx'>Retrieval-Augmented Generation (RAG) is an AI framework that enhances large language models (LLMs) by grounding them in external knowledge sources</font>.</b>

<b>Instead of solely relying on their pre-trained knowledge, RAG models first retrieve relevant information from a knowledge base</b> (like a document database or the internet) <b>based on the user's query. This retrieved information is then fed into the LLM along with the original prompt, allowing the LLM to generate more accurate, informed, and up-to-date responses</b>. This mitigates issues like hallucination and knowledge gaps, making LLMs more reliable and adaptable to specific domains.
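The retrieve-then-generate flow can be made concrete with a small, self-contained toy. It is not how Vertex AI RAG Engine works internally. It scores documents with bag-of-words cosine similarity where a real system would use learned embeddings. `build_augmented_prompt` is a hypothetical helper that only assembles the text an LLM would receive, and nothing here calls Vertex AI.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a stand-in for a real tokenizer/embedding model.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    # Step 1 (retrieval): rank documents by similarity to the query, keep the best top_k.
    q = Counter(tokenize(query))
    scored = [(cosine_similarity(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_augmented_prompt(query: str, contexts: list[str]) -> str:
    # Step 2 (augmentation): ground the question in the retrieved passages.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    docs = [
        "Cockatoo.AI is a language tutor focused on listening and speaking.",
        "Kilimanjaro is the highest mountain in Africa.",
        "Vertex AI RAG Engine manages corpora of private documents.",
    ]
    question = "What is Cockatoo.AI?"
    # Step 3 (generation) would send this prompt to an LLM.
    print(build_augmented_prompt(question, retrieve(question, docs, top_k=1)))
```

Swapping the toy similarity for real embeddings and the printed prompt for a model call gives the shape of a production RAG pipeline.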

### <b><font color='darkgreen'>Vertex AI RAG Engine overview</font></b>
([source](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-overview)) <b>Vertex AI RAG Engine, a component of the Vertex AI Platform, facilitates Retrieval-Augmented Generation (RAG)</b>.

Vertex AI RAG Engine is also a data framework for developing context-augmented large language model (LLM) applications. Context augmentation occurs when you apply an LLM to your data. This implements retrieval-augmented generation (RAG).

A common problem with LLMs is that they don't understand private knowledge, that is, your organization's data. <b>With Vertex AI RAG Engine, you can enrich the LLM context with additional private information, so that the model hallucinates less and answers questions more accurately</b>.

Combining additional knowledge sources with the knowledge an LLM already has yields a better context. <b>The improved context, together with the query, enhances the quality of the LLM's response</b>.

The following image illustrates the key concepts for understanding Vertex AI RAG Engine.

![flow](https://cloud.google.com/static/vertex-ai/images/Vertex-RAG-Diagram.png)

### <b><font color='darkgreen'>Get started</font></b>
Install the Vertex AI SDK and other required packages:

In [3]:
# %pip install --quiet google-cloud-aiplatform
!pip freeze | grep 'google-cloud-aiplatform'

google-cloud-aiplatform==1.81.0


### <b><font color='darkgreen'>Set Google Cloud project information and initialize Vertex AI SDK</font></b>
To get started using Vertex AI, you must have an existing Google Cloud project and [**enable the Vertex AI API**](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [7]:
from dotenv import load_dotenv, find_dotenv
import os
import vertexai
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
from google.oauth2 import service_account

os.environ['ALLOW_RESET'] = 'TRUE'
_ = load_dotenv(find_dotenv(os.path.expanduser('~/.env')))
# https://cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-rag-quickstart
# https://cloud.google.com/vertex-ai/docs/start/install-sdk
# https://stackoverflow.com/questions/76802606/google-vertex-ai-prediction-api-authentication
key_path = os.environ['GOOGLE_APPLICATION_CREDENTIALS']

# Create a credentials object using your service account key file
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=[
        'https://www.googleapis.com/auth/cloud-platform'])

# Deployed Endpoint model configs
project_id = "llm-demo-407300"
endpoint_id = "endpoint_id"
location = 'us-central1'

# RAG settings
CORPUS_ISPLAY_NAME = "test_corpus"

vertexai.init(
    project=project_id,
    location=location,
    credentials=credentials)

#### <b>Restart runtime</b>
To use newly installed packages in this Jupyter runtime, you must restart the runtime. To do so, uncomment the two lines in the cell below and run it; this shuts down and restarts the current kernel.

The restart might take a minute or longer. Because restarting clears all variables, re-run the setup cell above before continuing to the next step.

In [9]:
import IPython
#app = IPython.Application.instance()
#app.kernel.do_shutdown(True)

### <b><font color='darkgreen'>Create Corpus</font></b>
Create a RAG corpus and import files into it for later queries.

In [10]:
corpa_page_iter = rag.list_corpora()

In [11]:
is_test_corpus_created = False
created_corpus_name = None

for corpa in corpa_page_iter:
    if corpa.display_name == CORPUS_ISPLAY_NAME:
        is_test_corpus_created = True
        created_corpus_name = corpa.name
        break
        
print(f'Is testing corpus created? {is_test_corpus_created} with name as "{created_corpus_name}"')

Is testing corpus created? False with name as "None"


You can import files from Google Drive into the corpus. Eligible paths have one of these formats:
* https://drive.google.com/drive/folders/{folder_id}
* https://drive.google.com/file/d/{file_id}

Remember to grant "Viewer" access to the "Vertex RAG Data Service Agent" (with the format of `service-{project_number}@gcp-sa-vertex-rag.iam.gserviceaccount.com`) for your Drive folder/files.
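The two Drive path shapes and the service-agent email format above are easy to get subtly wrong, so here is a small validation sketch. `parse_drive_url` and `rag_service_agent` are hypothetical helpers, not part of the Vertex AI SDK. The regular expressions encode only the formats listed above, and they tolerate trailing text such as the `/view?usp=drive_link` suffix used in the import cell below; whether the service accepts every such suffix is not something this sketch checks. Note that the email needs the numeric project number, not the project ID.

```python
import re

# Eligible shapes, per the list above (trailing text after the ID is tolerated):
#   https://drive.google.com/drive/folders/{folder_id}
#   https://drive.google.com/file/d/{file_id}
_DRIVE_PATTERNS = {
    "folder": re.compile(r"^https://drive\.google\.com/drive/folders/([A-Za-z0-9_-]+)"),
    "file": re.compile(r"^https://drive\.google\.com/file/d/([A-Za-z0-9_-]+)"),
}


def parse_drive_url(url: str) -> tuple[str, str]:
    """Return ("folder" | "file", id), or raise ValueError for other URLs."""
    for kind, pattern in _DRIVE_PATTERNS.items():
        match = pattern.match(url)
        if match:
            return kind, match.group(1)
    raise ValueError(f"Not an eligible Google Drive path: {url}")


def rag_service_agent(project_number: str) -> str:
    # Email of the Vertex RAG Data Service Agent, in the format quoted above.
    if not project_number.isdigit():
        raise ValueError("Expected a numeric project number, not a project ID")
    return f"service-{project_number}@gcp-sa-vertex-rag.iam.gserviceaccount.com"


print(parse_drive_url(
    "https://drive.google.com/file/d/17CGIWzv1f40QBzxuoc_09wFSBKgECjpp/view?usp=drive_link"
))
```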

In [12]:
%%time
rag_corpus = None
if not is_test_corpus_created:
    # Create RagCorpus
    # Configure embedding model, for example "text-embedding-004".
    embedding_model_config = rag.EmbeddingModelConfig(
        publisher_model="publishers/google/models/text-embedding-004")

    rag_corpus = rag.create_corpus(
        display_name=CORPUS_ISPLAY_NAME,
        embedding_model_config=embedding_model_config,
    )
else:
    rag_corpus = rag.get_corpus(created_corpus_name)

CPU times: user 55.9 ms, sys: 20 ms, total: 75.9 ms
Wall time: 10.7 s


In [13]:
# Import Files to the RagCorpus
# Supports Google Cloud Storage and Google Drive Links
if not is_test_corpus_created:
    paths = [
        'https://drive.google.com/file/d/17CGIWzv1f40QBzxuoc_09wFSBKgECjpp/view?usp=drive_link',
    ]

    rag.import_files(
        rag_corpus.name,
        paths,
        chunk_size=512,  # Optional
        chunk_overlap=100,  # Optional
        max_embedding_requests_per_min=900,  # Optional
    )

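Upload a local file directly to the corpus with `rag.upload_file` ([more](https://cloud.google.com/vertex-ai/generative-ai/docs/samples/generativeaionvertexai-rag-upload-file)). The upload attempt further down is kept inside a string and disabled because it does not work yet: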

In [14]:
rag_corpus.name

'projects/504697874977/locations/us-central1/ragCorpora/3379951520341557248'
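The resource name printed above carries the numeric project number (here `504697874977`, distinct from the project ID `llm-demo-407300` passed to `vertexai.init`), which is the number the Drive service-agent email needs. A small parser follows, assuming the name always has the `projects/.../locations/.../ragCorpora/...` shape seen in this output; `parse_corpus_name` is a hypothetical helper, not an SDK function.

```python
import re

# Assumed shape, taken from the output above:
#   projects/{project_number}/locations/{location}/ragCorpora/{corpus_id}
_CORPUS_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/ragCorpora/(?P<corpus_id>[^/]+)$"
)


def parse_corpus_name(name: str) -> dict[str, str]:
    """Split a RagCorpus resource name into project, location and corpus ID."""
    match = _CORPUS_NAME.match(name)
    if match is None:
        raise ValueError(f"Unexpected RagCorpus resource name: {name}")
    return match.groupdict()


parts = parse_corpus_name(
    "projects/504697874977/locations/us-central1/ragCorpora/3379951520341557248"
)
print(parts["project"], parts["location"], parts["corpus_id"])
# → 504697874977 us-central1 3379951520341557248
```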

In [15]:
import os
print(os.getcwd())

/usr/local/google/home/johnkclee/Github/ml_articles/google/vertex_ai


In [24]:
'''
import google.auth.exceptions
from google.cloud import storage
#if not is_test_corpus_created:

try:
    rag_file = rag.upload_file(
        corpus_name=rag_corpus.name,
        path="test_local_file.txt",
        display_name="test_local_file.txt",
        description="About Cockatoo.AI",
    )
    print(f"File uploaded successfully. RagFile name: {rag_file.name}")  # Print the RagFile name
except google.auth.exceptions.GoogleAuthError as e:
    print(f"Authentication error: {e}")
    print("Please ensure you are properly authenticated with Google Cloud.")
    print("Use 'gcloud auth login' and 'gcloud config set project YOUR_PROJECT_ID'.")
    print("Or, if using a service account, set the GOOGLE_APPLICATION_CREDENTIALS environment variable.")
#except storage.exceptions.Forbidden as e:
#    print(f"Permission error: {e}")
#    print("Ensure your account/service account has the 'Storage Object Admin' or 'Storage Object Creator' role.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
'''
print('Not working yet!')

Not working yet!


Below we implement a helper function that counts the tokens in a file's contents. First, a quick check of the `count_tokens` API:

In [41]:
from google import genai
from google.genai.types import HttpOptions

gemini_model_name = 'gemini-2.0-flash-001'

genai_client = genai.Client(http_options=HttpOptions(api_version="v1"))
response = genai_client.models.count_tokens(
    model=gemini_model_name,
    contents="What's the highest mountain in Africa?",
)
print(response.total_tokens)

10


In [42]:
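def count_token(genai_client, content: str, model: str = gemini_model_name) -> int:
    """Return the number of tokens `content` occupies for the given Gemini model."""
    # Use the client passed in (the original referenced an undefined `client`).
    response = genai_client.models.count_tokens(
        model=model,
        contents=content,
    )
    return response.total_tokens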

Import files from Google Cloud Storage. First, upload the local file to a bucket and record its token count:

In [43]:
from google.cloud import storage

INPUT_GCS_BUCKET = 'gs://llm-agent-storage'

storage_client = storage.Client.from_service_account_json(key_path)
bucket = storage_client.get_bucket('llm-agent-storage')
blob_name = path_to_file = 'test_local_file.txt'

# Count the file's tokens (closing the file afterwards) and add a small margin of 5.
with open(path_to_file, 'r') as f:
    blob_token_size = count_token(genai_client, f.read()) + 5

# Upload the local file to the bucket under the same name.
blob = bucket.blob(blob_name)
blob.upload_from_filename(path_to_file)
print(f'blob_token_size: {blob_token_size}')

blob_token_size: 9
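The cell above hard-codes the bucket twice: once as the `gs://` URI in `INPUT_GCS_BUCKET` and once as the bare name passed to `get_bucket`. The sketch below derives the bucket name from the URI so the two cannot drift apart, and builds a per-object URI. `split_gcs_uri` is a hypothetical helper, not a `google-cloud-storage` API. Passing a single object URI such as `gs://llm-agent-storage/test_local_file.txt` to `rag.import_files` should import just that file instead of the whole bucket; treat that as an assumption to verify against the Vertex AI documentation.

```python
def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a gs:// URI: {uri}")
    # Everything before the first '/' is the bucket; the rest is the object path.
    bucket, _, blob_path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in: {uri}")
    return bucket, blob_path


bucket_name, _ = split_gcs_uri("gs://llm-agent-storage")
object_uri = f"gs://{bucket_name}/test_local_file.txt"
print(bucket_name, object_uri)
```

In the notebook, `bucket_name` could then be passed to `storage_client.get_bucket` in place of the repeated literal.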


[**Fine-tune RAG transformations**](https://cloud.google.com/vertex-ai/generative-ai/docs/fine-tune-rag-transformations): 
* **`chunk_size`**: When documents are ingested into an index, they are split into chunks. The `chunk_size` parameter specifies the size of each chunk, in tokens. The default chunk size is 1,024 tokens.
* **`chunk_overlap`**: By default, consecutive chunks share a certain number of tokens (the overlap) to improve relevance and retrieval quality. The default chunk overlap is 200 tokens.
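To see how the two parameters interact, here is a sliding-window sketch. It is an illustration only, not the service's actual splitter (which may, for example, respect sentence boundaries), and it uses whitespace-separated words as a stand-in for model tokens. Each window holds `chunk_size` tokens and starts `chunk_size - chunk_overlap` tokens after the previous one.

```python
def chunk_tokens(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


words = [f"w{i}" for i in range(10)]
for chunk in chunk_tokens(words, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

With `chunk_size=4` and `chunk_overlap=1`, the ten words yield three windows, each starting with the last word of the previous one. With `chunk_overlap=0`, as in the import cell below, windows do not share any tokens.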

In [33]:
response = rag.import_files(
    corpus_name=rag_corpus.name,
    paths=[INPUT_GCS_BUCKET],  # Imports every file in the bucket
    chunk_size=max(blob_token_size, 1024),  # Optional: never below the 1,024-token default
    chunk_overlap=0,  # Optional
    max_embedding_requests_per_min=900,  # Optional
)

In [48]:
rag_file_iter = rag.list_files(rag_corpus.name)

In [49]:
for rag_file in rag_file_iter:
    print(rag_file.display_name)

John_K_Lee_20250124.pdf
test.txt
test_local_file.txt


### <b><font color='darkgreen'>Direct context retrieval</font></b>

In [36]:
# Direct context retrieval
response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=rag_corpus.name,
            # Optional: supply IDs from `rag.list_files()`.
            # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text="What is Cockatoo.AI?",
    similarity_top_k=10,  # Optional
    vector_distance_threshold=0.5,  # Optional
)
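The two optional knobs in this call work together: as we understand the RAG Engine documentation, `vector_distance_threshold` keeps only contexts whose vector distance to the query is smaller than the threshold, and `similarity_top_k` caps how many are returned. The toy below imitates that selection with hand-made 3-dimensional vectors and cosine distance. The real service uses the corpus's embedding model and its configured distance measure, so the vectors, names, and numbers here are illustrative assumptions only.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # 1 - cosine similarity: 0 means same direction, 2 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # treat zero vectors as unrelated
    return 1.0 - dot / norm


def select_contexts(query_vec, chunk_vecs, similarity_top_k, vector_distance_threshold):
    """Keep chunks closer than the threshold, nearest first, at most top_k of them."""
    scored = [(cosine_distance(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    kept = sorted(pair for pair in scored if pair[0] < vector_distance_threshold)
    return [name for _, name in kept[:similarity_top_k]]


chunks = {
    "about_cockatoo": [0.9, 0.1, 0.0],
    "resume": [0.6, 0.8, 0.0],
    "unrelated": [0.0, 0.0, 1.0],
}
print(select_contexts([1.0, 0.0, 0.0], chunks, similarity_top_k=10, vector_distance_threshold=0.5))
# → ['about_cockatoo', 'resume']
```

Because the threshold is applied to distances sorted in ascending order, filtering first and then taking the top k gives the same result as the reverse order. Lowering the threshold (to 0.3 here) drops `resume` even though `similarity_top_k` would allow it.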

In [37]:
response

contexts {
  contexts {
    source_uri: "gs://llm-agent-storage/test.txt"
    text: "Project `Cockatoo.AI` as a language tutor (focused on listening & speaking)\r\nIn the Open issues, please put your name at the end of the task that you are interested in.\r\n\r\nGoal towards users:\r\nSpeak as native as possible to users so users can practice listening as well as speaking.\r\nCan adjust the language tutor’s teaching attitude (e.g., Direct vs Plato)\r\nCan adjust the language tutor’s speaking/work usage difficulties (e.g., A1, A2, B1, B2, C1 levels) to make learning easier for users.\r\nCan discuss the topic the user uploaded/raised and can correspondingly ask appropriate questions to the user.\r\nCan practice listening to various tones from AI (e.g., expressing the sentence with happiness, sadness, …, from elderly/adult/woman/man …)\r\nGoal towards team members:\r\nImprove LLM, voice models, DL, and other framework knowledge/skills.\r\nCan use the language tutor to benefit him/herself.

In [38]:
print(response.contexts.contexts[0].text)

Project `Cockatoo.AI` as a language tutor (focused on listening & speaking)
In the Open issues, please put your name at the end of the task that you are interested in.

Goal towards users:
Speak as native as possible to users so users can practice listening as well as speaking.
Can adjust the language tutor’s teaching attitude (e.g., Direct vs Plato)
Can adjust the language tutor’s speaking/work usage difficulties (e.g., A1, A2, B1, B2, C1 levels) to make learning easier for users.
Can discuss the topic the user uploaded/raised and can correspondingly ask appropriate questions to the user.
Can practice listening to various tones from AI (e.g., expressing the sentence with happiness, sadness, …, from elderly/adult/woman/man …)
Goal towards team members:
Improve LLM, voice models, DL, and other framework knowledge/skills.
Can use the language tutor to benefit him/herself.
Learn CI/CD and do research on improving potential product biasedness and relevant skills.
Learn the balance between 

### <b><font color='darkgreen'>Enhance generation</font></b>

In [39]:
# Create a RAG retrieval tool
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=rag_corpus.name,  # Currently only 1 corpus is allowed.
                    # Optional: supply IDs from `rag.list_files()`.
                    # rag_file_ids=["rag-file-1", "rag-file-2", ...],
                )
            ],
            similarity_top_k=3,  # Optional
            vector_distance_threshold=0.5,  # Optional
        ),
    )
)

In [44]:
# Create a Gemini model instance that uses the RAG retrieval tool
rag_model = GenerativeModel(
    model_name=gemini_model_name, tools=[rag_retrieval_tool]
)

In [45]:
# Generate response
response = rag_model.generate_content("Show me the academic background of John.")

In [46]:
Markdown(response.candidates[0].content.parts[0].text)

John K. Lee has a M.S. in Computer Science & Information Engineering from National Taiwan University and a B.S. in Electronic Engineering from National Taiwan University of Science and Technology (NTUST).


### <b><font color='darkgreen'>Delete corpus</font></b>
Use `rag.delete_corpus` to delete the corpus when you no longer need it:

In [50]:
rag.delete_corpus(rag_corpus.name)

Successfully deleted the RagCorpus.


## <b><font color='darkblue'>Supplement</font></b>
* [Intro to Building a Scalable and Modular RAG System with RAG Engine in Vertex AI](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/rag-engine/intro_rag_engine.ipynb)