<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FGenerate%2FLong%2520Context%2520Retrieval%2520With%2520The%2520Vertex%2520AI%2520Gemini%2520API.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Generate/Long%20Context%20Retrieval%20With%20The%20Vertex%20AI%20Gemini%20API.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Long Context Retrieval With The Vertex AI Gemini API

**Retrieval is the task of gathering information to use as context for an LLM**, such as the Gemini family on Vertex AI. Retrieval augmented generation (RAG) retrieves relevant context and then provides it to the LLM along with the prompt.

[Long context](https://cloud.google.com/vertex-ai/generative-ai/docs/long-context) is a way of providing full-length sources to the LLM, which can then perform its own retrieval. Gemini 1.5 Flash and Gemini 1.5 Pro have input context windows of 1M and 2M tokens respectively and have shown [near-perfect retrieval of > 99%](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf).

For a complete overview of the Gemini API, check out the companion workflow [Vertex AI Gemini API](./Vertex%20AI%20Gemini%20API.ipynb).

**Use Case Exploration**

Buying a home usually involves borrowing money from a lending institution, typically through a mortgage secured by the home's value. But how do these institutions manage the risks associated with such large loans, and how are lending standards established?

In the United States, two government-sponsored enterprises (GSEs) play a vital role in the housing market:
- Federal National Mortgage Association ([Fannie Mae](https://www.fanniemae.com/))
- Federal Home Loan Mortgage Corporation ([Freddie Mac](https://www.freddiemac.com/))

These GSEs purchase mortgages from lenders, enabling those lenders to offer more loans. This process also allows Fannie Mae and Freddie Mac to set standards for mortgages, ensuring they are responsible and borrowers are more likely to repay them. This system makes homeownership more affordable and stabilizes the housing market by maintaining a steady flow of liquidity for lenders and keeping interest rates controlled.

However, navigating the complexities of these GSEs and their extensive servicing guides can be challenging. What if you could directly query these guides and get precise answers without needing to design a complex RAG architecture?

This workflow leverages the long context capabilities of Vertex AI Gemini models and the efficiency of context caching to provide low-latency and cost-effective access to these comprehensive documents. Explore the implementation below!

**References**
- [Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach](https://arxiv.org/pdf/2407.16833)


---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

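The list `packages` contains tuples of package import names, install names, and an optional minimum version. If the import name is not found, the install name is used to install quietly for the current user. If a minimum version is given and the installed version is older, the package is upgraded.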

In [3]:
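# tuples of (import name, install name, optional minimum version)
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform', '1.69.0'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('fitz', 'pymupdf'),
    ('requests', 'requests')
]

# import the submodules explicitly: `import importlib` alone does not guarantee they are loaded
import importlib.util, importlib.metadata

def as_tuple(version):
    # compare versions numerically; assumes plain dotted versions like '1.69.0'
    return tuple(int(x) for x in version.split('.') if x.isdigit())

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by the distribution (install) name
        if as_tuple(importlib.metadata.version(package[1])) < as_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user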
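
A minimum-version check like the one in this cell depends on how versions are compared. The sketch below shows why plain string comparison can mislead and how a numeric comparison behaves instead. The helper `parse_version` is a hypothetical name introduced for illustration, and it assumes simple dotted numeric versions.

```python
def parse_version(version: str) -> tuple:
    """Convert a simple dotted version such as '1.69.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))

# Strings compare character by character, so '1.100.0' sorts before '1.69.0'.
print('1.100.0' < '1.69.0')                                # True (misleading)

# Tuples of ints compare numerically, part by part.
print(parse_version('1.100.0') < parse_version('1.69.0'))  # False
```

Versions with pre-release suffixes (for example `1.70.0rc1`) would not parse this way. The `packaging.version.Version` class handles those cases if the `packaging` package is available.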

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue with the cell that follows this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

inputs:

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'long-context'

GCS_BUCKET = PROJECT_ID # change to Bucket name if not the same as the Project ID

packages:

In [75]:
# Python standard library imports:
import os, io, base64, json, datetime

# package imports
from IPython.display import Markdown
import IPython.display
import fitz #pymupdf
import requests

# vertex ai imports
from google.cloud import aiplatform
import vertexai
import vertexai.generative_models # for Gemini Models

# preview imports for vertex ai api:
from vertexai.preview import caching
import vertexai.preview.generative_models
import vertexai.preview.batch_prediction

# google cloud imports
from google.cloud import storage

In [9]:
aiplatform.__version__

'1.69.0'

clients:

In [10]:
vertexai.init(project = PROJECT_ID, location = REGION)
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

---
## Gemini Models

Select one of the [supported Gemini models](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported-models) and read more about the characteristics of each [here](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).


### Setup Model

Here the [Gemini 1.5 Flash model with version 002](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-1.5-flash) is selected. It has these characteristics (to name a few):
- Max Input Tokens: 1,048,576
- Max Output Tokens: 8,192
- Max image:
    - raw size 20MB
    - base64 encoded size 7MB
    - number per prompt 3000
- Max video:
    - length 1 hour
    - number per prompt 10
- Max audio:
    - length 8.4 hours
    - number per prompt 1
- Max PDF:
    - size 30 MB
- 102 [Languages](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#languages-gemini) for understanding and responding


In [42]:
gemini = vertexai.generative_models.GenerativeModel("gemini-1.5-flash-002")

### Prompt With Text

In [12]:
response = gemini.generate_content('How do I get a mortgage?')
Markdown(response.text)

Getting a mortgage is a multi-step process. Here's a general outline of how it works:

**1. Check Your Credit Score and History:**

* **Obtain your credit report:** Get free copies from AnnualCreditReport.com (the only federally authorized site). Review for errors and dispute any inaccuracies.
* **Understand your credit score:**  A higher score generally means better interest rates.  Aim for a score above 620, though higher is always better.

**2. Determine How Much You Can Afford:**

* **Use online mortgage calculators:** These tools can give you a preliminary estimate based on income, debt, and desired down payment.
* **Consider your debt-to-income ratio (DTI):** Lenders look at this closely.  A lower DTI (generally below 43%) is preferred.
* **Factor in all housing costs:** Don't just think about the mortgage payment. Include property taxes, homeowner's insurance, and potential HOA fees.

**3. Get Pre-Approved for a Mortgage:**

* **Shop around with multiple lenders:** Compare interest rates, fees, and loan terms. Don't be afraid to negotiate.
* **Provide necessary documentation:** Be prepared to share pay stubs, tax returns, bank statements, and other financial information.
* **Receive a pre-approval letter:** This shows sellers you're a serious buyer and strengthens your offer.

**4. Find a Real Estate Agent (Optional but Recommended):**

* **An agent can help you navigate the home-buying process:** They can help you find properties, negotiate offers, and understand the local market.

**5. Shop for a Home:**

* **Consider your needs and wants:**  Think about location, size, amenities, and your long-term goals.
* **Attend open houses and schedule private showings:** Get a feel for the properties and the neighborhoods.

**6. Make an Offer on a Home:**

* **Work with your agent to craft a competitive offer:**  Consider the asking price, market conditions, and any contingencies (e.g., home inspection).

**7. Get a Home Appraisal:**

* **The lender will order an appraisal to determine the fair market value of the property:** This protects both you and the lender.

**8. Finalize the Mortgage:**

* **The lender will finalize the loan terms and prepare the closing documents:** Review everything carefully before signing.
* **Purchase homeowner's insurance:**  This is required by lenders.
* **Close on the home:** This is when you sign the final paperwork, pay closing costs, and receive the keys to your new home.


**Key Terms to Know:**

* **Down payment:** The upfront portion of the home's purchase price that you pay.  A larger down payment typically means a lower interest rate and monthly payment.
* **Interest rate:** The cost of borrowing money.
* **Loan term:** The length of time you have to repay the loan (e.g., 15 years, 30 years).
* **Closing costs:** Fees associated with the mortgage process, including appraisal fees, title insurance, and lender fees.
* **Private Mortgage Insurance (PMI):**  If your down payment is less than 20%, you'll likely be required to pay PMI, which protects the lender if you default on the loan.


**Tips for a Smoother Process:**

* **Save for a down payment:** The more you can save, the better.
* **Pay down debt:**  Lowering your DTI can improve your chances of getting approved and securing a favorable interest rate.
* **Organize your financial documents:**  Having everything readily available will streamline the application process.
* **Ask questions:** Don't hesitate to ask your lender or real estate agent if you're unsure about anything.

Getting a mortgage can seem daunting, but by understanding the process and preparing in advance, you can make it a smoother experience. Good luck!


### Retrieve Documents

In [13]:
freddie_url = 'https://guide.freddiemac.com/ci/okcsFattach/get/1002095_2'
fannie_url = 'https://singlefamily.fanniemae.com/media/39861/display'

In [14]:
freddie_retrieve = requests.get(freddie_url).content
fannie_retrieve = requests.get(fannie_url).content

In [15]:
freddie_doc = fitz.open(stream = freddie_retrieve, filetype = 'pdf')
fannie_doc = fitz.open(stream = fannie_retrieve, filetype = 'pdf')

In [16]:
freddie_doc.page_count, fannie_doc.page_count

(2641, 1180)

### Split Documents

The models have constraints on the size of individual files.  Here we want to split the PDFs into parts of no more than 1000 pages and verify that they are under 30MB in size.

In [55]:
def doc_parts(doc):
    # split a PDF into a list of new documents with at most max_pages pages each
    start_page = 0
    max_pages = 1000
    n_pages = doc.page_count
    
    doc_list = []
    while start_page < n_pages:
        # from_page and to_page are zero-based and inclusive, so the last valid index is n_pages - 1
        end_page = min(start_page + max_pages - 1, n_pages - 1)
        new_doc = fitz.open()
        new_doc.insert_pdf(doc, from_page = start_page, to_page = end_page)
        doc_list.append(new_doc)
        start_page = end_page + 1
    
    print(f"The document has {n_pages} pages and has been split into parts with page counts: {[p.page_count for p in doc_list]}")
    
    return doc_list

In [57]:
freddie_parts = doc_parts(freddie_doc)

The document has 2641 pages and has been split into parts with page counts: [1000, 1000, 641]


In [58]:
fannie_parts = doc_parts(fannie_doc)

The document has 1180 pages and has been split into parts with page counts: [1000, 180]
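
The splitting logic can be checked without opening any PDF. This is a minimal sketch of the page-range arithmetic only. The function name `page_ranges` is hypothetical, and it assumes zero-based, inclusive page indexes, as used by `insert_pdf`.

```python
def page_ranges(n_pages: int, max_pages: int = 1000) -> list:
    """Return inclusive, zero-based (first, last) page index pairs of at most max_pages pages."""
    ranges = []
    start = 0
    while start < n_pages:
        end = min(start + max_pages, n_pages) - 1  # last page index of this part
        ranges.append((start, end))
        start = end + 1
    return ranges

print(page_ranges(2641))  # [(0, 999), (1000, 1999), (2000, 2640)]
print(page_ranges(1180))  # [(0, 999), (1000, 1179)]
```

The lengths of these ranges match the page counts printed for the two guides: 1000, 1000, 641 and 1000, 180.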


### Save Documents To GCS Files

In [63]:
def doc_to_gcs(document, name):
    buffer = io.BytesIO()
    document.save(buffer)
    buffer.seek(0) # reset the position to the beginning
    blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/{name}.pdf")
    blob.upload_from_file(buffer, content_type = 'application/pdf')
    print(f"The file 'gs://{bucket.name}/{blob.name}' is {(blob.size / (1024*1024)):.2f} MB")
    return blob

In [64]:
freddie_blob = doc_to_gcs(freddie_doc, 'freddie_full')

The file 'gs://statmike-mlops-349915/applied-genai/long-context/freddie_full.pdf' is 21.44 MB


In [65]:
fannie_blob = doc_to_gcs(fannie_doc, 'fannie_full')

The file 'gs://statmike-mlops-349915/applied-genai/long-context/fannie_full.pdf' is 4.55 MB


### Save Document Parts To GCS Files

In [66]:
freddie_blobs = [doc_to_gcs(doc, f'freddie_part_{d}') for d, doc in enumerate(freddie_parts)]

The file 'gs://statmike-mlops-349915/applied-genai/long-context/freddie_part_0.pdf' is 8.45 MB
The file 'gs://statmike-mlops-349915/applied-genai/long-context/freddie_part_1.pdf' is 7.67 MB
The file 'gs://statmike-mlops-349915/applied-genai/long-context/freddie_part_2.pdf' is 4.59 MB


In [67]:
fannie_blobs = [doc_to_gcs(doc, f'fannie_part_{d}') for d, doc in enumerate(fannie_parts)]

The file 'gs://statmike-mlops-349915/applied-genai/long-context/fannie_part_0.pdf' is 3.49 MB
The file 'gs://statmike-mlops-349915/applied-genai/long-context/fannie_part_1.pdf' is 0.61 MB
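
The goal stated earlier was to verify that each part is under the 30 MB PDF limit. A minimal sketch of that check follows, using the sizes printed in the upload output above. The helper `oversized` and the constant `MAX_PDF_MB` are hypothetical names introduced for illustration.

```python
MAX_PDF_MB = 30  # per-file PDF limit listed in the model characteristics

def oversized(sizes_mb: dict, limit_mb: float = MAX_PDF_MB) -> list:
    """Return the names of files whose size in MB is not under the limit."""
    return [name for name, size in sizes_mb.items() if size >= limit_mb]

# sizes in MB copied from the upload output
part_sizes = {
    'freddie_part_0': 8.45, 'freddie_part_1': 7.67, 'freddie_part_2': 4.59,
    'fannie_part_0': 3.49, 'fannie_part_1': 0.61,
}
print(oversized(part_sizes))  # []
```

In the notebook itself, the same dictionary could be built from the uploaded blobs, for example `{b.name: b.size / (1024*1024) for b in freddie_blobs + fannie_blobs}`.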


### Gemini Multimodal Context Parts

In [70]:
freddie_contexts = [
    vertexai.generative_models.Part.from_uri(
        uri = f"gs://{bucket.name}/{b.name}",
        mime_type = b.content_type
    ) for b in freddie_blobs
]

In [71]:
fannie_contexts = [
    vertexai.generative_models.Part.from_uri(
        uri = f"gs://{bucket.name}/{b.name}",
        mime_type = b.content_type
    ) for b in fannie_blobs
]

### Create Context Cache(s)

Rather than sending the document parts along with each call to the Gemini API, it can be helpful to first load the documents as a context cache. This makes subsequent calls to the API faster and possibly cheaper, because the cached documents are charged at a caching rate (based on size and time stored) rather than at the per-token/character input rate on every call.

For more information on [Context Caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview) check out the companion workflow: [Vertex AI Gemini API](./Vertex%20AI%20Gemini%20API.ipynb).
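
Whether caching is actually cheaper depends on how many calls reuse the cache and how long it is kept. The sketch below is a rough worked comparison under stated assumptions: it assumes a billing structure of one full-rate pass to create the cache, a reduced rate for cached tokens on each call, and a storage charge per token-hour. All rates are placeholder values in arbitrary units, not actual prices, and both function names are hypothetical.

```python
def cost_without_cache(tokens_m: float, calls: int, input_rate: float) -> float:
    """Every call pays the full input rate for the document tokens (in millions)."""
    return tokens_m * input_rate * calls

def cost_with_cache(tokens_m: float, calls: int, input_rate: float,
                    cached_rate: float, storage_rate: float, hours: float) -> float:
    """Assumed structure: one full-rate pass, a reduced rate per call, storage per hour."""
    return tokens_m * (input_rate + cached_rate * calls + storage_rate * hours)

# placeholder rates in arbitrary units per million tokens (NOT real pricing)
rates = dict(input_rate=1.0, cached_rate=0.25, storage_rate=0.5, hours=0.5)

for calls in (1, 10):
    plain = cost_without_cache(1.0, calls, rates['input_rate'])
    cached = cost_with_cache(1.0, calls, **rates)
    print(calls, plain, cached)
```

With these placeholder numbers a single call costs more with a cache than without, while ten calls cost less, which is why the saving is only a possibility. Substitute the published rates for the chosen model before drawing conclusions.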

In [76]:
freddie_cache = vertexai.preview.caching.CachedContent.create(
    model_name = 'gemini-1.5-flash-002',
    contents = freddie_contexts,
    ttl = datetime.timedelta(minutes = 30)
)

In [77]:
fannie_cache = vertexai.preview.caching.CachedContent.create(
    model_name = 'gemini-1.5-flash-002',
    contents = fannie_contexts,
    ttl = datetime.timedelta(minutes = 30)
)

In [79]:
combined_cache = vertexai.preview.caching.CachedContent.create(
    model_name = 'gemini-1.5-flash-002',
    contents = ['The Freddie Mac documents:'] + freddie_contexts + ['The Fannie Mae documents:'] + fannie_contexts,
    system_instruction = "You are incredibly knowledgeable about GSEs (Freddie Mac and Fannie Mae) that purchase mortgages from lenders.  You answer questions about the selling process from the point of view of each GSE and then you compare/contrast each of them relative to the user's question.",
    ttl = datetime.timedelta(minutes = 30)
)

### Generate Responses With Gemini Multimodal Prompts

Register the Gemini 1.5 Flash model separately for each of the three caches. Then prompt each one to see answers specific to Freddie Mac, specific to Fannie Mae, and finally a combined, comparative answer.

In [80]:
prompt = 'Does a lender have to perform servicing functions directly?'

In [81]:
freddie_model = vertexai.preview.generative_models.GenerativeModel.from_cached_content(cached_content = freddie_cache)
fannie_model = vertexai.preview.generative_models.GenerativeModel.from_cached_content(cached_content = fannie_cache)
combined_model = vertexai.preview.generative_models.GenerativeModel.from_cached_content(cached_content = combined_cache)

In [82]:
freddie_response = freddie_model.generate_content(
    contents = [prompt]
)
Markdown(freddie_response.text)

No.  A lender may contract with a servicer to perform servicing functions on its behalf.  Freddie Mac's Seller/Servicer Guide outlines the responsibilities of both lenders and servicers.  The lender retains ultimate responsibility for the mortgage, but the servicer performs the day-to-day servicing activities.


In [83]:
fannie_response = fannie_model.generate_content(
    contents = [prompt]
)
Markdown(fannie_response.text)

No.  A lender may use other organizations to perform some or all of its servicing functions (Subpart A3, Getting Started With Fannie Mae).  The Selling Guide refers to this as "subservicing," meaning that a servicer (the "subservicer") other than the contractually responsible servicer (the "master" servicer) is performing the servicing functions.  However, the lender remains fully responsible to Fannie Mae for all functions that are outsourced to third parties.

In [84]:
combined_response = combined_model.generate_content(
    contents = [prompt]
)
Markdown(combined_response.text)

Here's a comparison of Freddie Mac and Fannie Mae's perspectives on whether a lender must perform servicing functions directly, followed by a summary comparison:

**Freddie Mac's Perspective:**

Freddie Mac does *not* require lenders to perform servicing functions directly.  The *Seller/Servicer Guide* extensively covers the roles and responsibilities of both Sellers (the originating lender) and Servicers (the entity responsible for ongoing loan administration after Freddie Mac purchases the mortgage).  

The guide clearly outlines scenarios where servicing is transferred to a third-party servicer, either concurrently with the sale to Freddie Mac or subsequently.  This highlights that direct servicing by the originating lender is not a mandatory requirement.  Furthermore, Freddie Mac’s *Seller/Servicer Guide* details the processes, responsibilities, and requirements related to transfers of servicing.

**Fannie Mae's Perspective:**

Similar to Freddie Mac, Fannie Mae also *does not* require direct servicing by the originating lender.  The *Selling Guide* outlines the roles of Sellers and Servicers, allowing for the possibility of transferring servicing rights to another Fannie Mae-approved servicer.  The guide describes the requirements for transfers of servicing in detail, emphasizing that  the lender is not required to continue servicing the loan after its sale.

**Comparison:**

Both Freddie Mac and Fannie Mae operate under the same fundamental principle regarding servicing:  they permit lenders to either service the loans themselves or transfer servicing rights to a third-party servicer.  Their respective guides provide comprehensive details on the procedures and requirements for both direct servicing and servicing transfers, indicating that neither GSE mandates direct servicing as a requirement for selling mortgages.  The specific requirements and processes outlined in each GSE's guide might have minor variations.
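
Sending one prompt to several cache-backed models follows the same pattern each time, so it can be wrapped in a small helper. This is a minimal sketch: `ask_all` is a hypothetical helper, and `FakeModel` and `FakeResponse` are stand-ins that mimic only the `generate_content(...).text` shape used above so the example runs without calling the API.

```python
def ask_all(models: dict, prompt: str) -> dict:
    """Send the same prompt to every model and collect the response text by name."""
    return {
        name: model.generate_content(contents=[prompt]).text
        for name, model in models.items()
    }

# Stand-ins for illustration only; they are not part of the Vertex AI SDK.
class FakeResponse:
    def __init__(self, text):
        self.text = text

class FakeModel:
    def __init__(self, label):
        self.label = label

    def generate_content(self, contents):
        return FakeResponse(f"[{self.label}] {contents[0]}")

answers = ask_all(
    {'freddie': FakeModel('freddie'), 'fannie': FakeModel('fannie')},
    'Does a lender have to perform servicing functions directly?',
)
for name, text in answers.items():
    print(name, '->', text)
```

In the notebook, the dictionary would hold the real models instead, for example `{'freddie': freddie_model, 'fannie': fannie_model, 'combined': combined_model}`.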


### Check And Remove The Context Cache(s)

Check the remaining time for each cache.  This time can be extended as needed with `.update()`.  In this case the caches are deleted to eliminate any further costs now that this workflow is complete.

In [158]:
def time_left(cache):
    # print the expiration time, the current time, and the difference as HH:MM:SS
    expire = cache.expire_time
    now = datetime.datetime.now(tz=expire.tzinfo)
    print(f"Expiration Time: {expire.strftime('%B %d, %Y at %I:%M:%S %p')}")
    print(f"   Current Time: {now.strftime('%B %d, %Y at %I:%M:%S %p')}")
    diff = (expire - now).total_seconds()
    # divmod keeps minutes below 60, including when exactly one hour remains
    hours, remainder = divmod(int(abs(diff)), 3600)
    minutes, seconds = divmod(remainder, 60)
    if diff >= 0:
        print(f"{hours:02d}:{minutes:02d}:{seconds:02d} until expiration")
    else:
        print(f"Expired {hours:02d}:{minutes:02d}:{seconds:02d} ago")

In [159]:
time_left(freddie_cache)

Expiration Time: October 12, 2024 at 05:01:54 PM
   Current Time: October 12, 2024 at 05:12:31 PM
Expired 00:10:37 ago


In [160]:
time_left(fannie_cache)

Expiration Time: October 12, 2024 at 05:04:18 PM
   Current Time: October 12, 2024 at 05:12:35 PM
Expired 00:08:16 ago


In [161]:
time_left(combined_cache)

Expiration Time: October 12, 2024 at 05:06:10 PM
   Current Time: October 12, 2024 at 05:12:36 PM
Expired 00:06:25 ago
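
All three caches had already expired by the time they were checked here. One way to avoid that is to decide ahead of time when a cache should be extended. The sketch below covers only that decision. The helper `needs_extension` and its five-minute threshold are hypothetical, and the timestamps are example values that assume UTC.

```python
import datetime

def needs_extension(expire_time, now, min_remaining=datetime.timedelta(minutes=5)):
    """True when the cache has not expired yet but has less than min_remaining left."""
    remaining = expire_time - now
    return datetime.timedelta(0) <= remaining < min_remaining

utc = datetime.timezone.utc
expire = datetime.datetime(2024, 10, 12, 17, 1, 54, tzinfo=utc)

# a few minutes before expiration: worth extending
print(needs_extension(expire, datetime.datetime(2024, 10, 12, 16, 58, 0, tzinfo=utc)))   # True
# after expiration: too late to extend
print(needs_extension(expire, datetime.datetime(2024, 10, 12, 17, 12, 31, tzinfo=utc)))  # False
```

When the check returns `True`, the cache could then be extended with a call along the lines of `freddie_cache.update(ttl = datetime.timedelta(minutes = 30))`, assuming the `ttl` keyword is accepted by `.update()` in the installed SDK version.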


In [162]:
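# delete only if this cache still exists; one that has already expired should no longer be listed
if freddie_cache.resource_name in [c.resource_name for c in vertexai.preview.caching.CachedContent.list()]:
    freddie_cache.delete()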

In [163]:
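# delete only if this cache still exists; one that has already expired should no longer be listed
if fannie_cache.resource_name in [c.resource_name for c in vertexai.preview.caching.CachedContent.list()]:
    fannie_cache.delete()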

In [164]:
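# delete only if this cache still exists; one that has already expired should no longer be listed
if combined_cache.resource_name in [c.resource_name for c in vertexai.preview.caching.CachedContent.list()]:
    combined_cache.delete()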