<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Grounding%20Overview%20-%20Vertex%20AI%20Search.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FApplied%2520GenAI%2FGrounding%2520Overview%2520-%2520Vertex%2520AI%2520Search.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Applied%20GenAI/Grounding%20Overview%20-%20Vertex%20AI%20Search.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Applied%20GenAI/Grounding%20Overview%20-%20Vertex%20AI%20Search.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---
This is part of a [series of notebook-based workflows](./readme.md) for Applied GenAI using Vertex AI.
Specifically, these are related to grounding methods for LLMs:

||Notebook Workflow|Description|
|---|---|---|
||[Grounding Overview](./Grounding%20Overview.ipynb)|Overview of grounding methods with comparison and evaluation|
|**This Notebook**|[Grounding Overview - Vertex AI Search](./Grounding%20Overview%20-%20Vertex%20AI%20Search.ipynb)|Setting up and using Vertex AI Search.|
||[Grounding Overview - RAG With BigQuery](./Grounding%20Overview%20-%20RAG%20With%20BigQuery.ipynb)|A complete workflow (process, parse, embed, index, retrieve, generate) with BigQuery Vector Search.|
||[Grounding Overview - RAG With Vertex AI Feature Store](./Grounding%20Overview%20-%20RAG%20With%20Vertex%20AI%20Feature%20Store.ipynb)|A complete workflow (process, parse, embed, index, retrieve, generate) with Vertex AI Feature Store as an online retrieval system over BigQuery data.|
||Grounding Overview - RAG With Vertex AI Vector Search|A complete workflow (process, parse, embed, index, retrieve, generate) with Vertex AI Vector Search as an online retrieval system.|
||Grounding Overview - RAG With LlamaIndex on Vertex AI|A complete retrieval workflow (process, parse, embed, index, retrieve, generate) with LlamaIndex on Vertex AI as a retrieval system.|

---

# Grounding Overview - Vertex AI Search

This workflow uses [Vertex AI Agent Builder](https://cloud.google.com/generative-ai-app-builder/docs/introduction) to build a search experience with [Vertex AI Search](https://cloud.google.com/generative-ai-app-builder/docs/enterprise-search-introduction).

Why Vertex AI Search? Building a search application, or a retrieval augmented generation (RAG) application for context and grounding, requires steps like:
- processing, annotating, and breaking content into chunks
- generating embeddings of chunks
- storing, indexing, and retrieving chunks
- ranking and reranking retrieved chunks based on the query/question/prompt
- generating an answer with grounding
- verifying groundedness and identifying contradictions

Vertex AI Search does all of this, and much more, as a managed service within Vertex AI Agent Builder, and it provides APIs and clients for interaction, as used in this workflow.
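
To make the steps listed above concrete, here is a deliberately tiny, dependency-free sketch of the chunk, retrieve, and rank stages. It is not how Vertex AI Search works internally: word overlap stands in for embedding similarity, and every function name (`chunk_text`, `score`, `retrieve`) is hypothetical. The sample text is taken from the extractive answers returned later in this notebook.

```python
import re

def chunk_text(text):
    """Split text into sentence-sized chunks (a stand-in for real chunking)."""
    return re.split(r'(?<=\.)\s+', text.strip())

def tokenize(text):
    """Lowercase word and number tokens as a set."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))

def score(query, chunk):
    """Fraction of query tokens found in the chunk (a stand-in for embedding similarity)."""
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokenize(chunk)) / len(query_tokens)

def retrieve(query, chunks, top_k=1):
    """Rank chunks by score and keep the best top_k."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]

document = (
    "From home base, measure 90 feet toward first base; from second base, "
    "measure 90 feet toward first base; the intersection of these lines establishes first base. "
    "Home base shall be marked by a five-sided slab of whitened rubber."
)
chunks = chunk_text(document)
best = retrieve("how is home base marked?", chunks)
print(best[0])
```

A managed service replaces each of these toy functions with a production component (parsers, embedding models, vector indexes, rerankers), which is the point of using Vertex AI Search here.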

Alternatively, a search or retrieval application can be built from many other Vertex AI and GCP offerings, and there are easy-to-use APIs that make this fast and simple.  Read more about these offerings [here](https://cloud.google.com/generative-ai-app-builder/docs/builder-apis#build-rag).

This workflow will walk through:
- Preparing unstructured data for input
    - This workflow shows how to make a metadata file so that any number of files can be stored in any number of GCS locations (buckets and folders).
- How to create and work with a **data store**
    - retrieve/create data stores
    - list documents in data stores
    - import documents
- How to create a **search app**
    - retrieve/create a search app
- Get **Search** results
    - results = related documents
    - snippets = brief extracts that preview a document's relationship to the search
    - extractive answers = short verbatim text extracts meant to answer the search
    - extractive segments = longer verbatim text extracts meant to answer the search or be used as context for LLMs
    - summaries = automatic LLM summaries of the search results
- Get **Answers**
    - control the query and answer phase - multiple examples
    - create a session and get answers with follow-up


>**NOTE:**
>If Vertex AI Agent Builder has not been previously used then part of the setup may need to be completed in the console prior to running this notebook which primarily uses the Python SDK. See [Before you begin](https://cloud.google.com/generative-ai-app-builder/docs/before-you-begin).


**Extensions Of This Work**

Vertex AI Search has clients with many ways of retrieving search results, including generated answers, that are grounded on information in data stores.  The search apps are also directly usable by other APIs:
- Generative AI On Vertex AI [Grounding Gemini Responses on your data](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#private-ground-gemini)

To see this and many more grounding approaches side-by-side check out the accompanying workflow in this repository that uses the data store created here:
- [Grounding Overview](./Grounding%20Overview.ipynb)



---
## Colab Setup

When running this notebook in [Colab](https://colab.google/) or [Colab Enterprise](https://cloud.google.com/colab/docs/introduction), this section will authenticate to GCP (follow prompts in the popup) and set the current project for the session.

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs and API Enablement

The client packages may need to be installed in this environment.

### Installs (If Needed)

In [3]:
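# tuples of (import name, install name, optional min_version)
packages = [
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.discoveryengine', 'google-cloud-discoveryengine')
]

import importlib.util
import importlib.metadata

def version_tuple(version):
    # numeric comparison, so that '0.11.10' sorts after '0.11.9'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError: # raised when a parent package is missing
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # the installed version is looked up by distribution (install) name
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user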

### API Enablement

In [4]:
!gcloud services enable discoveryengine.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart the code submission can start with the next cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
SERIES = 'applied-genai'
EXPERIMENT = 'grounding-overview'

# make this the gcs bucket for storing files
GCS_BUCKET = PROJECT_ID 

# vertex search location
VS_LOCATION = 'global'

Packages

In [8]:
import requests
import base64
import json
import hashlib

from google.cloud import storage

import google.cloud.discoveryengine_v1 as discoveryengine
import google.cloud.discoveryengine_v1alpha as discoveryengine_alpha

Clients

In [9]:
# vertex ai agent builder
API_ENDPOINT = dict(api_endpoint = (f'{VS_LOCATION}-' if VS_LOCATION != 'global' else '') + 'discoveryengine.googleapis.com')
datastore_client = discoveryengine.DataStoreServiceClient(client_options = API_ENDPOINT)
document_client = discoveryengine.DocumentServiceClient(client_options = API_ENDPOINT)
engine_client = discoveryengine.EngineServiceClient(client_options = API_ENDPOINT)
search_client = discoveryengine.SearchServiceClient(client_options = API_ENDPOINT)

# gcs client: assumes bucket already exists
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)
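
The `API_ENDPOINT` expression above encodes one rule: the `global` location uses the bare host name, and any other location is prefixed to it. A small sketch of that rule as a function follows; the name `discoveryengine_endpoint` is hypothetical, and `us` and `eu` are used only as example non-global locations.

```python
def discoveryengine_endpoint(location: str) -> str:
    """Return the API host for a location, mirroring the API_ENDPOINT expression above."""
    prefix = '' if location == 'global' else f'{location}-'
    return f'{prefix}discoveryengine.googleapis.com'

for location in ('global', 'us', 'eu'):
    print(f'{location} -> {discoveryengine_endpoint(location)}')
```

The result would be passed to each client as `client_options = dict(api_endpoint = ...)`, exactly as the cell above does.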

---
## Prompt And Context

The context document is the [official rules of baseball](https://img.mlbstatic.com/mlb-images/image/upload/mlb/wqn5ah4c3qtivwx3jatm.pdf), a PDF published by MLB and updated annually with the latest changes to the game.


In [10]:
prompt = "what are the dimensions of first base in baseball?"

In [11]:
url = 'https://img.mlbstatic.com/mlb-images/image/upload/mlb/wqn5ah4c3qtivwx3jatm.pdf'
# get the pdf
context_bytes = requests.get(url).content
context_base64 = base64.b64encode(context_bytes).decode('utf-8')

---
## Store Document(s) In GCS

In [12]:
# store pdf in gcs
file_blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/{url.split('/')[-1]}")
file_blob.upload_from_string(context_bytes, content_type = 'application/pdf')

---
## Prepare Document(s) For The Vertex Agent Builder Data Store

There are multiple ways to [prepare data for ingesting](https://cloud.google.com/generative-ai-app-builder/docs/prepare-data) depending on location, volume, how often it will change, and type (website, unstructured, structured, media, third-party (Slack, ServiceNow, ...), Healthcare FHIR, ...).

Here, the data will be prepared as **Unstructured data in GCS storage**.  Files can be imported as:
- **single file** at a GCS URI
- **multiple files** in a GCS 'folder'.  Note that import is not recursive, so subfolders will not be imported.  For that case, or for files spread across multiple buckets, see the next option:
- **any number of files** in one or more folders and buckets, imported with a **metadata** file: a JSON Lines file with one line per file that includes the document id and URI as well as optional metadata.


In [13]:
for blob in bucket.list_blobs(prefix = f"{SERIES}/{EXPERIMENT}/{url.split('/')[-1]}"):
    print(blob.name)

applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf


### Document ID's

The name of a file could make a great id.  But what happens if the same file name exists in different folders?  To keep ids manageable across files, folders, and buckets, a hash of the full file path is a good practice.  This function converts the full file path into the first 63 characters of its SHA-256 hex digest.

In [14]:
def generate_id(name):
    hasher = hashlib.sha256()
    hasher.update(name.encode('utf-8'))
    return hasher.hexdigest()[0:63]

In [15]:
generate_id(bucket.name+'/'+blob.name)

'b48b4704ced27a1e9bdee5b1d90479abdb488b2df9fa0b93d554ea250039e18'
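
As a quick check of the idea, the sketch below hashes three paths that share a file name but differ by folder or bucket. The function is the same `generate_id` as above, repeated so the block is self-contained; the bucket and folder names are hypothetical.

```python
import hashlib

def generate_id(name):
    hasher = hashlib.sha256()
    hasher.update(name.encode('utf-8'))
    return hasher.hexdigest()[0:63]

# hypothetical paths: same file name, different folder or bucket
paths = [
    'bucket-a/reports/2024/summary.pdf',
    'bucket-a/reports/2023/summary.pdf',
    'bucket-b/reports/2024/summary.pdf',
]
ids = [generate_id(path) for path in paths]

for path, doc_id in zip(paths, ids):
    print(f'{doc_id}  <-  {path}')
```

Each path gets its own 63-character id, and hashing the same path again always returns the same id, so re-importing a file updates its document rather than creating a new one.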

### Create Metadata

At a minimum, the document `id` and `content` need to be provided; optional metadata can also be provided, as shown here with `structData`:

In [16]:
metadata = []
file_types = ['pdf', 'docx', 'txt', 'html', 'pptx']
content_type = dict(
  pdf = 'application/pdf',
  txt = 'text/plain',
  html = 'text/html',
  docx = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  pptx = 'application/vnd.openxmlformats-officedocument.presentationml.presentation'
)
for blob in bucket.list_blobs(prefix = f"{SERIES}/{EXPERIMENT}/{url.split('/')[-1]}"):
    folder = blob.name.split('/')[0]
    filename = blob.name.split('/')[-1]
    filetype = blob.name.split('.')[-1].lower()
    filepath = '/'.join(blob.name.split('/')[0:-1])
    if filetype in file_types:
        json_data = dict(
            id = generate_id(blob.name),
            structData = dict(
                title= filename,
                path = filepath,
                location = bucket.name
            ),
            content = dict(
                mimeType = content_type[filetype],
                uri = f'gs://{blob.bucket.name}/{blob.name}'
            )
        )
        metadata.append(json.dumps(json_data))   


In [17]:
metadata

['{"id": "73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67", "structData": {"title": "wqn5ah4c3qtivwx3jatm.pdf", "path": "applied-genai/grounding-overview", "location": "statmike-mlops-349915"}, "content": {"mimeType": "application/pdf", "uri": "gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf"}}']
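
The record layout above can also be built from a `gs://` URI alone, which is handy for testing the format without a storage client. This is a minimal sketch under that assumption: `metadata_line` is a hypothetical helper, the bucket and file names are made up, and only three of the content types from the cell above are included.

```python
import json

CONTENT_TYPES = {
    'pdf': 'application/pdf',
    'txt': 'text/plain',
    'html': 'text/html',
}

def metadata_line(doc_id, gcs_uri):
    """Build one JSON Lines record for a gs:// URI, or None for unsupported file types."""
    bucket_name, _, blob_name = gcs_uri[len('gs://'):].partition('/')
    folder, _, filename = blob_name.rpartition('/')
    filetype = filename.rsplit('.', 1)[-1].lower()
    if filetype not in CONTENT_TYPES:
        return None
    record = {
        'id': doc_id,
        'structData': {'title': filename, 'path': folder, 'location': bucket_name},
        'content': {'mimeType': CONTENT_TYPES[filetype], 'uri': gcs_uri},
    }
    return json.dumps(record)

# hypothetical document id and URI
line = metadata_line('doc-0001', 'gs://example-bucket/docs/rules/handbook.PDF')
print(line)
```

Each returned string is one line of the metadata file; parsing it back with `json.loads` is an easy way to validate a file before importing it.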

### Write Metadata File To GCS

In [18]:
metadata_blob = bucket.blob(f'{SERIES}/{EXPERIMENT}/{SERIES}-{EXPERIMENT}.json')
with metadata_blob.open('w') as f:
    for m in metadata:
        f.write(m + '\n')

In [19]:
f"gs://{bucket.name}/{blob.name}"

'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf'

---
## Create A Search Data Store

[Creating a search data store](https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es) depends on the type of data, in this case an [import from cloud storage](https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#cloud-storage) using a metadata file in JSON lines format. 

- [Discoveryengine Python Data Store Client](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.services.data_store_service.DataStoreServiceClient)

In [20]:
VS_DATASTORE_ID = f"{SERIES}-{EXPERIMENT}"

### Check For Existing Data Store (and retrieve)

In [21]:
try:
    datastore = datastore_client.get_data_store(
        name = datastore_client.collection_path(
            project = PROJECT_ID,
            location = VS_LOCATION,
            collection = 'default_collection'
        ) + f'/dataStores/{VS_DATASTORE_ID}'
    )
    ds_exist = True
except Exception as err:
    ds_exist = False
    
ds_exist

True

### Create New Data Store (if needed)

In [22]:
if not ds_exist:
    ds_create = datastore_client.create_data_store(
        parent = datastore_client.collection_path(
            project = PROJECT_ID,
            location = VS_LOCATION,
            collection = 'default_collection'
        ),
        data_store = discoveryengine.DataStore(
            display_name = f"{SERIES}-{EXPERIMENT}",
            industry_vertical = discoveryengine.IndustryVertical.GENERIC,
            solution_types = [discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
            content_config = discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
            #document_processing_config = discoveryengine.DocumentProcessingConfig(
            #    #chunking_config = ,
            #    default_parsing_config = discoveryengine.DocumentProcessingConfig.ParsingConfig.DigitalParsingConfig,
            #    parsing_config_overrides = [
            #        {'pdf' :discoveryengine.DocumentProcessingConfig.ParsingConfig.LayoutParsingConfig}
            #    ]
            #)
        ),
        data_store_id = VS_DATASTORE_ID
    )
    response = ds_create.result()
    print(ds_create.operation.name)
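
The two cells above form a get-or-create pattern. Because they catch every `Exception`, a permissions or network error would also be read as "the data store does not exist". The sketch below shows a narrower version of the pattern using an in-memory stand-in for the service. `NotFoundError`, `get_or_create`, `fake_get`, and `fake_create` are all hypothetical; with the Google Cloud client libraries the matching exception is likely `google.api_core.exceptions.NotFound`, but treat that as an assumption to verify.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for a client library's 'resource not found' error."""

def get_or_create(get_fn, create_fn, not_found=NotFoundError):
    """Return (resource, created). Only a not-found error triggers creation."""
    try:
        return get_fn(), False
    except not_found:
        return create_fn(), True

# in-memory stand-in for the data store service
store = {}

def fake_get():
    if 'demo' not in store:
        raise NotFoundError('demo')
    return store['demo']

def fake_create():
    store['demo'] = {'display_name': 'demo'}
    return store['demo']

first = get_or_create(fake_get, fake_create)   # nothing exists yet, so it is created
second = get_or_create(fake_get, fake_create)  # now it exists, so it is retrieved
print(first, second)
```

Any other exception raised by `get_fn` propagates to the caller instead of silently triggering a create.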

### Get The Data Store ID

In [23]:
datastore = datastore_client.get_data_store(
    name = datastore_client.collection_path(
        project = PROJECT_ID,
        location = VS_LOCATION,
        collection = 'default_collection'
    ) + f'/dataStores/{VS_DATASTORE_ID}'
)
datastore

name: "projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview"
display_name: "applied-genai-grounding-overview"
industry_vertical: GENERIC
solution_types: SOLUTION_TYPE_SEARCH
default_schema_id: "default_schema"
content_config: CONTENT_REQUIRED
create_time {
  seconds: 1724189228
  nanos: 173562000
}

In [24]:
datastore.name

'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview'

In [25]:
datastore_id = datastore_client.parse_data_store_path(datastore.name)
datastore_id

{'project': '1026793852137',
 'location': 'global/collections/default_collection',
 'data_store': 'applied-genai-grounding-overview'}
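
Note that in the parsed output above the `location` value is `global/collections/default_collection`, so only the `data_store` key is used in the rest of this workflow. If the location and collection are needed separately, a regular expression over the full resource name is one option. This is a sketch, not part of the client library; `parse_data_store_name` is a hypothetical helper.

```python
import re

DATA_STORE_NAME = re.compile(
    r'^projects/(?P<project>[^/]+)'
    r'/locations/(?P<location>[^/]+)'
    r'/collections/(?P<collection>[^/]+)'
    r'/dataStores/(?P<data_store>[^/]+)$'
)

def parse_data_store_name(name):
    """Split a full data store resource name into its parts, or return {} if it does not match."""
    match = DATA_STORE_NAME.match(name)
    return match.groupdict() if match else {}

name = 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview'
parts = parse_data_store_name(name)
print(parts)
```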

### List Documents If Prior Data Store

In [26]:
doc_exist = False # defined up front so later cells work when the data store is new
if ds_exist:
    for doc in document_client.list_documents(
        parent = document_client.branch_path(
            project = PROJECT_ID,
            location = VS_LOCATION,
            data_store = datastore_id['data_store'],
            branch = 'default_branch'
        )
    ):
        print(doc.content.uri)
        if doc.content.uri == f"gs://{bucket.name}/{file_blob.name}":
            doc_exist = True
            break

doc_exist

gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf


True

### Import Documents (if missing)

- [Discoveryengine Python Document Service Client](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.services.document_service.DocumentServiceClient)

In [27]:
if not doc_exist:
    doc_import = document_client.import_documents(
        request = discoveryengine.ImportDocumentsRequest(
            parent = document_client.branch_path(
                project = PROJECT_ID,
                location = VS_LOCATION,
                data_store = datastore_id['data_store'],
                branch = 'default_branch'
            ),
            gcs_source = discoveryengine.GcsSource(
                input_uris = [f"gs://{bucket.name}/{metadata_blob.name}"],
                data_schema = 'document'
            ),
            reconciliation_mode = discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL
        )
    )
    response = doc_import.result()
    operation_metadata = discoveryengine.ImportDocumentsMetadata(doc_import.metadata)
    print(operation_metadata)

### Console View Of Data Store

In [28]:
print(f"Review the data store in the console:\n\nhttps://console.cloud.google.com/gen-app-builder/locations/{VS_LOCATION}/collections/default_collection/data-stores/{datastore_id['data_store']}/data/documents?project={PROJECT_ID}")

Review the data store in the console:

https://console.cloud.google.com/gen-app-builder/locations/global/collections/default_collection/data-stores/applied-genai-grounding-overview/data/documents?project=statmike-mlops-349915


<p align="center">
    <img src="./resources/images/screenshots/grounding/vs_datastore.png" width="75%">
</p>

---
## Create A Search App

The [relationship between apps and data stores](https://cloud.google.com/generative-ai-app-builder/docs/create-datastore-ingest#app-store-relationship) allows one or more apps to search and retrieve from one or more data stores (blended search).

Here we [create an app](https://cloud.google.com/generative-ai-app-builder/docs/create-engine-es) that uses the data store created above.

- [Discoveryengine Python Engine Service Client](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.services.engine_service.EngineServiceClient)

In [29]:
VS_APP_ID = f"{SERIES}-{EXPERIMENT}"

### Check For Existing Search App

In [30]:
try:
    app = engine_client.get_engine(
        name = engine_client.engine_path(
            project = PROJECT_ID,
            location = VS_LOCATION,
            collection = 'default_collection',
            engine = VS_APP_ID
        )
    )
    app_exist = True
except Exception as err:
    app_exist = False
    
app_exist

True

### Create New Search App (if needed)

In [31]:
if not app_exist:
    app_create = engine_client.create_engine(
        parent = engine_client.collection_path(
            project = PROJECT_ID,
            location = VS_LOCATION,
            collection = 'default_collection'
        ),
        engine = discoveryengine.Engine(
            display_name = VS_APP_ID,
            industry_vertical = discoveryengine.IndustryVertical.GENERIC,
            solution_type = discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH,
            search_engine_config = discoveryengine.Engine.SearchEngineConfig(
                search_tier = discoveryengine.SearchTier.SEARCH_TIER_ENTERPRISE,
                search_add_ons = [discoveryengine.SearchAddOn.SEARCH_ADD_ON_LLM],
            ),
            data_store_ids = [datastore_id['data_store']],
        ),
        engine_id = VS_APP_ID
    )
    response = app_create.result()
    operation_metadata = discoveryengine.CreateEngineMetadata(app_create.metadata)
    print(operation_metadata)
    print(app_create.operation.name)

### Console View Of Search App

Opens to the preview tab for easy testing:

In [32]:
print(f"Review the search app in the console:\n\nhttps://console.cloud.google.com/gen-app-builder/locations/{VS_LOCATION}/engines/{VS_APP_ID}/preview/search?project={PROJECT_ID}")

Review the search app in the console:

https://console.cloud.google.com/gen-app-builder/locations/global/engines/applied-genai-grounding-overview/preview/search?project=statmike-mlops-349915


<p align="center">
    <img src="./resources/images/screenshots/grounding/vs_app_preview.png" width="75%">
</p>

---
## Search: Multiple Methods

The search app can be used to retrieve layers of increasingly detailed information, including LLM-generated summaries that answer the input question.  This section shows how to use the SDK for each type of search.

- [Discoveryengine Python Search Service Client](https://cloud.google.com/python/docs/reference/discoveryengine/latest/google.cloud.discoveryengine_v1.services.search_service)

In [33]:
serving_config = search_client.serving_config_path(
    project = PROJECT_ID,
    location = VS_LOCATION,
    data_store = VS_DATASTORE_ID,
    serving_config = 'default_config'
)

---
### Get Search Results 

[Get search results](https://cloud.google.com/generative-ai-app-builder/docs/preview-search-results) using the API.  This approach returns a list of documents that match the search/query, and nothing more.

In [34]:
search_response = search_client.search(
    request = discoveryengine.SearchRequest(
        query = prompt, # pass the user search/question
        page_size = 10, # max documents to return
        serving_config = serving_config,
        # search behavior:
        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
            snippet_spec = discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
                return_snippet = False
            ),
        ),
        # how queries and spelling are handled:
        query_expansion_spec = discoveryengine.SearchRequest.QueryExpansionSpec(
            condition = discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO
        ),
        spell_correction_spec = discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode = discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)

In [35]:
#search_response

Gather results:

In [36]:
search_results = []
for result in search_response.results:
    document = dict(result.document.derived_struct_data) |  dict(result.document.struct_data)
    search_results.append(document)

In [37]:
#search_results

Format and Print results:

In [38]:
for r, result in enumerate(search_results):
    print(f"\nSource Document {r+1}:", end = "")
    print(f"\n\tName: {result['title']}", end = "")
    print(f"\n\tLink: {result['link']}", end = "")


Source Document 1:
	Name: wqn5ah4c3qtivwx3jatm.pdf
	Link: gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf
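
The "Gather results" cell merges two dictionaries with the `|` operator (Python 3.9+). When both sides share a key, the right-hand side wins, so values from `struct_data` (the metadata supplied at import) override service-derived values with the same name. A minimal sketch with hypothetical values:

```python
# hypothetical values; the key names follow the results printed above
derived_struct_data = {
    'title': 'auto-detected title',
    'link': 'gs://example-bucket/docs/rules.pdf',
}
struct_data = {
    'title': 'rules.pdf',
    'path': 'docs',
    'location': 'example-bucket',
}

document = dict(derived_struct_data) | dict(struct_data)
print(document)
```

Swapping the operands would let the derived values take precedence instead.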

---
### Get Search Results - Including Snippets

In addition to search results you can also request **snippets**, brief extracts from the matched documents that serve as a preview of the content. Read more here: [Get Snippets And Extractive Segments](https://cloud.google.com/generative-ai-app-builder/docs/snippets).

In [40]:
search_response = search_client.search(
    request = discoveryengine.SearchRequest(
        query = prompt, # pass the user search/question
        page_size = 10, # max documents to return
        serving_config = serving_config,
        # search behavior:
        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
            snippet_spec = discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
                return_snippet = True
            ),
        ),
        # how queries and spelling are handled:
        query_expansion_spec = discoveryengine.SearchRequest.QueryExpansionSpec(
            condition = discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO
        ),
        spell_correction_spec = discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode = discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)

In [41]:
#search_response

Gather results:

In [42]:
search_results = []
for result in search_response.results:
    document = dict(result.document.derived_struct_data) |  dict(result.document.struct_data)
    if 'snippets' in document.keys():
        document['snippets'] = [dict(snippet) for snippet in document['snippets'] if snippet['snippet_status'] == 'SUCCESS']
    search_results.append(document)

In [43]:
#search_results

Format and Print results:

In [44]:
for r, result in enumerate(search_results):
    print(f"\nSource Document {r+1}:", end = "")
    print(f"\n\tName: {result['title']}", end = "")
    print(f"\n\tLink: {result['link']}", end = "")
    if 'snippets' in result.keys():
        for s, snippet in enumerate(result['snippets']):
            print(f"\n\tSnippet {s+1}: {snippet['snippet']}", end = "")


Source Document 1:
	Name: wqn5ah4c3qtivwx3jatm.pdf
	Link: gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf
	Snippet 1: All <b>measurements</b> from home <b>base</b> shall be taken from the point where the <b>first</b> and third <b>base</b> lines intersect. The catcher&#39;s box, the <b>batters</b>&#39; boxes, the coaches&nbsp;...
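
As the output above shows, snippets arrive with `<b>` highlight tags and HTML entities such as `&#39;` and `&nbsp;`. That is convenient for rendering in a web page, but for plain-text use (logs, prompts) they can be stripped. A small sketch using only the standard library follows; `clean_snippet` is a hypothetical helper.

```python
import html
import re

def clean_snippet(snippet):
    """Remove <b> highlight tags and decode HTML entities in a snippet."""
    text = re.sub(r'</?b>', '', snippet)      # drop highlight tags
    text = html.unescape(text)                # decode entities such as &#39; and &nbsp;
    return text.replace('\xa0', ' ').strip()  # non-breaking spaces become regular spaces

raw = (
    "All <b>measurements</b> from home <b>base</b> shall be taken from the point where the "
    "<b>first</b> and third <b>base</b> lines intersect. The catcher&#39;s box, the "
    "<b>batters</b>&#39; boxes, the coaches&nbsp;..."
)
print(clean_snippet(raw))
```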

---
### Get Search Results - Including Extractive Answers

In addition to search results you can also request **Extractive Answers**, a short verbatim text extracted from the document for use as a brief answer. Read more here: [Get Snippets And Extractive Segments](https://cloud.google.com/generative-ai-app-builder/docs/snippets).

In [45]:
search_response = search_client.search(
    request = discoveryengine.SearchRequest(
        query = prompt, # pass the user search/question
        page_size = 10, # max documents to return
        serving_config = serving_config,
        # search behavior:
        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
            snippet_spec = discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
                return_snippet = False
            ),
            extractive_content_spec = discoveryengine.SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
                max_extractive_answer_count = 2
            )
        ),
        # how queries and spelling are handled:
        query_expansion_spec = discoveryengine.SearchRequest.QueryExpansionSpec(
            condition = discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO
        ),
        spell_correction_spec = discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode = discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)

In [46]:
#search_response

Gather results:

In [47]:
search_results = []
for result in search_response.results:
    document = dict(result.document.derived_struct_data) |  dict(result.document.struct_data)
    if 'extractive_answers' in document.keys():
        document['extractive_answers'] = [dict(answer) for answer in document['extractive_answers']]
    search_results.append(document)

In [48]:
#search_results

Format and Print results:

In [49]:
for r, result in enumerate(search_results):
    print(f"\nSource Document {r+1}:", end = "")
    print(f"\n\tName: {result['title']}", end = "")
    print(f"\n\tLink: {result['link']}", end = "")
    if 'extractive_answers' in result.keys():
        for a, answer in enumerate(result['extractive_answers']):
            print(f"\n\tAnswer {a+1} (page = {answer['pageNumber']}): {answer['content']}", end = "")


Source Document 1:
	Name: wqn5ah4c3qtivwx3jatm.pdf
	Link: gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf
	Answer 1 (page = 14): See Appendix 1. When location of home base is determined, with a steel tape measure 127 feet, 33 ⁄8 inches in desired direction to establish second base. From home base, measure 90 feet toward first base; from second base, measure 90 feet toward first base; the intersection of these lines establishes first base.
	Answer 2 (page = 15): 2.02 Home Base Home base shall be marked by a five-sided slab of whitened rubber. It shall be a <b>17-inch square with two of the corners removed so that one edge is 17 inches long, two adjacent sides are 8½ inches and the remaining two sides are 12 inches</b> and set at an angle to make a point.

---
### Get Search Results - Including Extractive Segments

In addition to search results you can also request **Extractive Segments**, longer verbatim text (compared to extractive answers) extracted from the document for use as an answer or in post-processing, such as context for an LLM. Read more here: [Get Snippets And Extractive Segments](https://cloud.google.com/generative-ai-app-builder/docs/snippets).

In [50]:
search_response = search_client.search(
    request = discoveryengine.SearchRequest(
        query = prompt, # pass the user search/question
        page_size = 10, # max documents to return
        serving_config = serving_config,
        # search behavior:
        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
            snippet_spec = discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
                return_snippet = False
            ),
            extractive_content_spec = discoveryengine.SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
                max_extractive_segment_count = 2,
                return_extractive_segment_score = True,
                #num_previous_segments = 1,
                #num_next_segments = 1
            )
        ),
        # how queries and spelling are handled:
        query_expansion_spec = discoveryengine.SearchRequest.QueryExpansionSpec(
            condition = discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO
        ),
        spell_correction_spec = discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode = discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)

In [51]:
#search_response

Gather results:

In [52]:
search_results = []
for result in search_response.results:
    document = dict(result.document.derived_struct_data) |  dict(result.document.struct_data)
    if 'extractive_segments' in document.keys():
        document['extractive_segments'] = [dict(segment) for segment in document['extractive_segments']]
    search_results.append(document)

In [53]:
#search_results

Format and Print results:

In [54]:
for r, result in enumerate(search_results):
    print(f"\nSource Document {r+1}:", end = "")
    print(f"\n\tName: {result['title']}", end = "")
    print(f"\n\tLink: {result['link']}", end = "")
    if 'extractive_segments' in result.keys():
        for s, segment in enumerate(result['extractive_segments']):
            len_content = len(segment['content'])
            content = segment['content'][0:min(50, len_content)]
            if len_content > 50:
                content += f' ... ({len_content - 50} more characters)'
            content = content.replace("\n", " ")
            print(f"\n\tSegment {s+1} (page = {segment['pageNumber']}, relevance score = {segment['relevanceScore']:.3f}): {content}", end = "")


Source Document 1:
	Name: wqn5ah4c3qtivwx3jatm.pdf
	Link: gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf
	Segment 1 (page = 14, relevance score = 0.831): 2  Rule 2.01  2.00–THE PLAYING FIELD  2.01 Layout  ... (2120 more characters)
	Segment 2 (page = 15, relevance score = 0.814): 3  Rule 2.01 to 2.02  The foul lines and all other ... (1994 more characters)
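
The truncation used when printing segments can be factored into a small helper. This is a minimal sketch: the name `preview` and its `limit` parameter are hypothetical, and the logic mirrors the print loop above (keep the first 50 characters, note how many were cut, flatten newlines).

```python
def preview(text: str, limit: int = 50) -> str:
    """Return the first `limit` characters of `text` on a single line.

    Hypothetical helper that mirrors the inline truncation in the print loop.
    """
    shown = text[:limit]
    if len(text) > limit:
        # Report how many characters were left out of the preview
        shown += f" ... ({len(text) - limit} more characters)"
    # Flatten newlines so each segment prints on one line
    return shown.replace("\n", " ")
```

With this helper, the body of the segment loop could reduce to a single `print` call that passes `segment['content']` through `preview`.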

---
### Get Search Results - Including Summaries

A search summary goes a step further than search results.  The request asks that part of the results be interpreted by an LLM and summarized as an answer.

- [Get Search Summaries](https://cloud.google.com/generative-ai-app-builder/docs/get-search-summaries)
- [Available Model Versions and Lifecycle](https://cloud.google.com/generative-ai-app-builder/docs/answer-generation-models)

In [55]:
search_response = search_client.search(
    request = discoveryengine.SearchRequest(
        query = prompt, # pass the user search/question
        page_size = 10, # max documents to return
        serving_config = serving_config,
        # search behavior:
        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
            snippet_spec = discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
                return_snippet = False
            ),
            summary_spec = discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec(
                summary_result_count = 5, # how many of page_size to use
                include_citations = True,
                ignore_adversarial_query = True,
                ignore_non_summary_seeking_query = False,
                model_spec = discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec.ModelSpec(
                    version = 'stable' # see link above for all options, including specific models/versions
                )
            ),            
        ),
        # how queries and spelling are handled:
        query_expansion_spec = discoveryengine.SearchRequest.QueryExpansionSpec(
            condition = discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO
        ),
        spell_correction_spec = discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode = discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)

In [56]:
#search_response

In [57]:
search_response.summary.summary_text

'First base in baseball is located 90 feet from home plate and 90 feet from second base. The intersection of these lines establishes first base. Home base is marked by a five-sided slab of whitened rubber. The slab is a 17-inch square with two corners removed. The remaining sides are 8.5 inches and 12 inches. \n'

---
### Get Search Results - Including combinations of snippets, answers, segments, and summaries

While the sections above introduced the individual search result components, this section shows that multiple types, or even all types, can be combined in a single request.

In [58]:
search_response = search_client.search(
    request = discoveryengine.SearchRequest(
        query = prompt, # pass the user search/question
        page_size = 10, # max documents to return
        serving_config = serving_config,
        # search behavior:
        content_search_spec = discoveryengine.SearchRequest.ContentSearchSpec(
            snippet_spec = discoveryengine.SearchRequest.ContentSearchSpec.SnippetSpec(
                return_snippet = True
            ),
            extractive_content_spec = discoveryengine.SearchRequest.ContentSearchSpec.ExtractiveContentSpec(
                max_extractive_answer_count = 2,
                max_extractive_segment_count = 2,
                return_extractive_segment_score = True,
                #num_previous_segments = 1,
                #num_next_segments = 1
            ),
            summary_spec = discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec(
                summary_result_count = 5, # how many of page_size to use
                include_citations = True,
                ignore_adversarial_query = True,
                ignore_non_summary_seeking_query = False,
                model_spec = discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec.ModelSpec(
                    version = 'stable' # see link above for all options, including specific models/versions
                )
            ), 
        ),
        # how queries and spelling are handled:
        query_expansion_spec = discoveryengine.SearchRequest.QueryExpansionSpec(
            condition = discoveryengine.SearchRequest.QueryExpansionSpec.Condition.AUTO
        ),
        spell_correction_spec = discoveryengine.SearchRequest.SpellCorrectionSpec(
            mode = discoveryengine.SearchRequest.SpellCorrectionSpec.Mode.AUTO
        )
    )
)

In [59]:
#search_response

Gather results:

In [60]:
search_results = []
for result in search_response.results:
    document = dict(result.document.derived_struct_data) |  dict(result.document.struct_data)
    if 'snippets' in document.keys():
        document['snippets'] = [dict(snippet) for snippet in document['snippets'] if snippet['snippet_status'] == 'SUCCESS']
    if 'extractive_answers' in document.keys():
        document['extractive_answers'] = [dict(answer) for answer in document['extractive_answers']]
    if 'extractive_segments' in document.keys():
        document['extractive_segments'] = [dict(segment) for segment in document['extractive_segments']]
    search_results.append(document)

In [61]:
#search_results

Format and Print results:

In [62]:
for r, result in enumerate(search_results):
    print(f"\nSource Document {r+1}:", end = "")
    print(f"\n\tName: {result['title']}", end = "")
    print(f"\n\tLink: {result['link']}", end = "")
    if 'snippets' in result.keys():
        for s, snippet in enumerate(result['snippets']):
            print(f"\n\tSnippet {s+1}: {snippet['snippet']}", end = "")
    if 'extractive_answers' in result.keys():
        for a, answer in enumerate(result['extractive_answers']):
            print(f"\n\tAnswer {a+1} (page = {answer['pageNumber']}): {answer['content']}", end = "")
    if 'extractive_segments' in result.keys():
        for s, segment in enumerate(result['extractive_segments']):
            len_content = len(segment['content'])
            content = segment['content'][0:min(50, len_content)]
            if len_content > 50:
                content += f' ... ({len_content - 50} more characters)'
            content = content.replace("\n", " ")
            print(f"\n\tSegment {s+1} (page = {segment['pageNumber']}, relevance score = {segment['relevanceScore']:.3f}): {content}", end = "")
            
print(f"\n\nSummary:\n\t{search_response.summary.summary_text}")


Source Document 1:
	Name: wqn5ah4c3qtivwx3jatm.pdf
	Link: gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf
	Snippet 1: All <b>measurements</b> from home <b>base</b> shall be taken from the point where the <b>first</b> and third <b>base</b> lines intersect. The catcher&#39;s box, the <b>batters</b>&#39; boxes, the coaches&nbsp;...
	Answer 1 (page = 14): See Appendix 1. When location of home base is determined, with a steel tape measure 127 feet, 33 ⁄8 inches in desired direction to establish second base. From home base, measure 90 feet toward first base; from second base, measure 90 feet toward first base; the intersection of these lines establishes first base.
	Answer 2 (page = 15): 2.02 Home Base Home base shall be marked by a five-sided slab of whitened rubber. It shall be a <b>17-inch square with two of the corners removed so that one edge is 17 inches long, two adjacent sides are 8½ inches and the remaining two sides are 12 inches</b> and set at

---
## Answers: Get Answers With Follow-Ups

Another search method is `answer`. This section covers the `answer` method using the [Answer API](https://cloud.google.com/generative-ai-app-builder/docs/reference/rest/v1/projects.locations.dataStores.servingConfigs/answer) directly via REST.

The answer method gives control over the query phase and the answer phase, and makes it possible to configure follow-up questions.  For a breakdown of the methods and guidance on when to use, or not use, each one, see [Get answers and follow-ups](https://cloud.google.com/generative-ai-app-builder/docs/answer).

In [82]:
prompt

'what are the dimensions of first base in baseball?'

### REST Call Function

A simple function to make REST calls to the Answer API.  It takes `data` (a JSON string with the request for the API) and a datastore ID (created previously in this workflow).

In [44]:
def answer_api(data, datastore = VS_DATASTORE_ID):
    # IPython shell magic: returns the command output as a list of lines
    token = !gcloud auth application-default print-access-token
    headers = {
      "content-type": "application/json",
      "Authorization": f'Bearer {token[0]}'
    }
    # POST the JSON request to the datastore's default serving config
    response = requests.post(
      f'https://discoveryengine.googleapis.com/v1/projects/{PROJECT_ID}/locations/{VS_LOCATION}/collections/default_collection/dataStores/{datastore}/servingConfigs/default_search:answer',
      data = data,
      headers = headers
    )
    if response.status_code == 200:
        answer = json.loads(response.text)
    else:
        # placeholder so that answer['answer']['answerText'] still resolves
        answer = dict(answer = dict(answerText = 'Error'))
    return answer
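
The function above hides the failure reason when a call does not return status 200. As a hedged sketch, the URL construction and response handling can be split into two pure functions that are easy to check without calling the API. The names `answer_url` and `parse_answer`, and the extra `error` key, are hypothetical and not part of the API.

```python
import json

API_ROOT = "https://discoveryengine.googleapis.com/v1"


def answer_url(project_id: str, location: str, datastore: str) -> str:
    """Build the `answer` endpoint URL for a datastore's default serving config."""
    return (
        f"{API_ROOT}/projects/{project_id}/locations/{location}"
        f"/collections/default_collection/dataStores/{datastore}"
        "/servingConfigs/default_search:answer"
    )


def parse_answer(status_code: int, text: str) -> dict:
    """Return the parsed response, or a placeholder that keeps the failure details."""
    if status_code == 200:
        return json.loads(text)
    # Same shape as the placeholder in answer_api, plus the status and raw body
    return {
        "answer": {"answerText": f"Error (HTTP {status_code})"},
        "error": text,
    }
```

Keeping the status code and response body in the placeholder makes it easier to tell an authentication failure from a malformed request.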

---
### Get Answers: Basic

Make a request that returns an answer and search results (with links): [Search and answer (basic)](https://cloud.google.com/generative-ai-app-builder/docs/answer#search-answer-basic)

Prepare data:

In [148]:
request = dict(
    query = dict(text = prompt)
)
data = json.dumps(request)

Make Request:

In [149]:
answer = answer_api(data)

Review Answer:

In [150]:
answer['answer']['answerText']

'First base in baseball is marked by a white canvas or rubber-covered bag that is 18 inches square. The bag must be between 3 and 5 inches thick and filled with soft material. The distance from home plate to first base is 90 feet. To determine the location of first base, measure 90 feet from home plate toward first base and 90 feet from second base toward first base. The intersection of these lines establishes the location of first base. \n'

In [151]:
answer['answer']

{'state': 'SUCCEEDED',
 'answerText': 'First base in baseball is marked by a white canvas or rubber-covered bag that is 18 inches square. The bag must be between 3 and 5 inches thick and filled with soft material. The distance from home plate to first base is 90 feet. To determine the location of first base, measure 90 feet from home plate toward first base and 90 feet from second base toward first base. The intersection of these lines establishes the location of first base. \n',
 'steps': [{'state': 'SUCCEEDED',
   'description': 'Rephrase the query and search.',
   'actions': [{'searchAction': {'query': 'What are the dimensions of first base in baseball?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/w
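
The full response is deeply nested, so a small helper can pull out just the source documents. This is a sketch that assumes the layout visible in the output above (`steps`, then `actions`, then `observation.searchResults` with `title` and `uri`). The name `collect_sources` and the sample values are hypothetical.

```python
def collect_sources(answer: dict) -> list[tuple[str, str]]:
    """Return unique (title, uri) pairs from every search result in the response."""
    sources = []
    for step in answer.get("answer", {}).get("steps", []):
        for action in step.get("actions", []):
            results = action.get("observation", {}).get("searchResults", [])
            for result in results:
                pair = (result.get("title", ""), result.get("uri", ""))
                if pair not in sources:  # keep first occurrence, preserve order
                    sources.append(pair)
    return sources


# Hypothetical response fragment with the same layout as the output above
sample = {
    "answer": {
        "steps": [
            {
                "actions": [
                    {
                        "searchAction": {"query": "example query"},
                        "observation": {
                            "searchResults": [
                                {"title": "rules.pdf", "uri": "gs://example-bucket/rules.pdf"},
                                {"title": "rules.pdf", "uri": "gs://example-bucket/rules.pdf"},
                            ]
                        },
                    }
                ]
            }
        ]
    }
}

for title, uri in collect_sources(sample):
    print(f"{title}: {uri}")
```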

---
### Get Answers: Query Phase Commands - Disable Query Rephrasing

Make a request with query rephrasing disabled: [Search and answer (rephrasing disabled)](https://cloud.google.com/generative-ai-app-builder/docs/answer#search-answer-no-rephrase)

Prepare data:

In [152]:
request = dict(
    query = dict(text = prompt),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            disable = True
        )
    )
)
data = json.dumps(request)

Make Request:

In [153]:
answer = answer_api(data)

Review Answer:

In [154]:
answer['answer']['answerText']

'First base in baseball is marked by a white canvas or rubber-covered bag. The bag is 18 inches square and between 3 and 5 inches thick. It is filled with soft material. The first and third base bags must be entirely within the infield. The second base bag is centered on second base. \n'

In [155]:
answer['answer']

{'state': 'SUCCEEDED',
 'answerText': 'First base in baseball is marked by a white canvas or rubber-covered bag. The bag is 18 inches square and between 3 and 5 inches thick. It is filled with soft material. The first and third base bags must be entirely within the infield. The second base bag is centered on second base. \n',
 'steps': [{'state': 'SUCCEEDED',
   'description': 'Rephrase the query and search.',
   'actions': [{'searchAction': {'query': 'what are the dimensions of first base in baseball?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> from home <b>base</b> sha

---
### Get Answers: Query Phase Commands - Specifying Query Rephrasing Steps

Make a request that sets the maximum number of query rephrasing steps: [Search and answer (specify maximum steps)](https://cloud.google.com/generative-ai-app-builder/docs/answer#search-answer-max-steps)

Prepare data:

In [156]:
request = dict(
    query = dict(text = prompt),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        )
    )
)
data = json.dumps(request)

Make Request:

In [157]:
answer = answer_api(data)

Review Answer:

In [158]:
answer['answer']['answerText']

'First base in baseball is marked by a white canvas or rubber-covered bag. The bag is 18 inches square and between 3 and 5 inches thick. It is filled with soft material. First base is located 90 feet from home plate and 90 feet from second base. The distance between first base and third base is 127 feet, 3 3/8 inches. \n'

In [159]:
answer['answer']

{'answerText': 'First base in baseball is marked by a white canvas or rubber-covered bag. The bag is 18 inches square and between 3 and 5 inches thick. It is filled with soft material. First base is located 90 feet from home plate and 90 feet from second base. The distance between first base and third base is 127 feet, 3 3/8 inches. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of a baseball base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> from home <b>base</b> shall be taken from the point where the first and third <b>base</b> lines i

#### Direct Example Showcase

Reword the prompt to lower its quality, then see how the API rephrases the question.

In [160]:
question = "How big is first base?"

In [161]:
request = dict(
    query = dict(text = question),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        )
    )
)
data = json.dumps(request)
answer = answer_api(data)
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [175]:
for step in answer['answer']['steps']:
    print(step['actions'][0]['searchAction'])

{'query': 'What are the dimensions of first base?'}


Notice that the API rephrased the `question` from "How big is first base?" to "What are the dimensions of first base?".
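
The loop above prints only the first action of each step. With `maxRephraseSteps` greater than 1, a response may contain more than one search action, so a sketch that walks every action can be useful. The helper name `search_queries` and the sample data are hypothetical. The keys follow the response layout shown in the outputs above.

```python
def search_queries(answer: dict) -> list[str]:
    """Return every query the API searched with, in step and action order."""
    queries = []
    for step in answer.get("answer", {}).get("steps", []):
        for action in step.get("actions", []):
            query = action.get("searchAction", {}).get("query")
            if query is not None:
                queries.append(query)
    return queries


# Hypothetical response fragment with two steps
sample = {
    "answer": {
        "steps": [
            {"actions": [{"searchAction": {"query": "first rephrasing"}}]},
            {"actions": [{"searchAction": {"query": "second rephrasing"}}]},
        ]
    }
}
print(search_queries(sample))
```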

---
### Get Answers: Query Phase Commands - Search and answer with query classification

Make a request that also classifies the query as adversarial or non-answer-seeking: [Search and answer with query classification](https://cloud.google.com/generative-ai-app-builder/docs/answer#search-query-classify)

Prepare data:

In [101]:
request = dict(
    query = dict(text = question),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    )
)
data = json.dumps(request)

Make Request:

In [102]:
answer = answer_api(data)

Review Answer:

In [103]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [104]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> f

#### Direct Example Showcase

Replace the question with a statement that does not seek an answer, then see how the API rephrases and responds to it.

In [177]:
question = "Baseball if fun."

In [180]:
request = dict(
    query = dict(text = question),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    )
)
data = json.dumps(request)
answer = answer_api(data)
answer['answer']['answerText']

'A summary could not be generated for your search query. Here are some search results.'

In [181]:
for step in answer['answer']['steps']:
    print(step['actions'][0]['searchAction'])

{'query': 'Why is baseball fun?'}


In [183]:
answer

{'answer': {'answerText': 'A summary could not be generated for your search query. Here are some search results.',
  'steps': [{'actions': [{'searchAction': {'query': 'Why is baseball fun?'},
      'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
         'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
         'title': 'wqn5ah4c3qtivwx3jatm.pdf',
         'snippetInfo': [{'snippet': '... <b>baseball</b>. Do not allow criticism to keep you from studying out bad situations. Carry your rule book. It is better to consult the rules and hold up the game&nbsp;...',
           'snippetStatus': 'SUCCESS'}],
         'structData': {'location': 'statmike-mlops-349915',
          'path': 'applied-genai/grounding-overview',
          'title': 'wqn5ah4c3q
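
The classification results themselves are not visible in the truncated output above. As a hedged sketch, they can be read from the response under the assumption that the answer carries a `queryUnderstandingInfo.queryClassificationInfo` list with `type` and `positive` entries. These field names are an assumption here and should be checked against the full `answer['answer']` output. The helper name `classification_flags` and the sample values are hypothetical.

```python
def classification_flags(answer: dict) -> dict[str, bool]:
    """Map each classification type to whether the query was flagged.

    Assumes answer['answer']['queryUnderstandingInfo']['queryClassificationInfo']
    exists in the response; returns an empty dict when it does not.
    """
    info = answer.get("answer", {}).get("queryUnderstandingInfo", {})
    return {
        item["type"]: item.get("positive", False)  # false values may be omitted in JSON
        for item in info.get("queryClassificationInfo", [])
        if "type" in item
    }


# Hypothetical response fragment, not captured from the API
sample = {
    "answer": {
        "queryUnderstandingInfo": {
            "queryClassificationInfo": [
                {"type": "NON_ANSWER_SEEKING_QUERY", "positive": True},
                {"type": "ADVERSARIAL_QUERY"},
            ]
        }
    }
}
print(classification_flags(sample))
```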

---
### Get Answers: Search phase commands - Search and answer with search result options

Make a request that sets search options, here the maximum number of search results to return: [Search phase commands: Search and answer with search result options](https://cloud.google.com/generative-ai-app-builder/docs/answer#search-options)

Prepare data:

In [108]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    )
)
data = json.dumps(request)

Make Request:

In [109]:
answer = answer_api(data)

Review Answer:

In [110]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [111]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> f

---
### Get Answers: Answer phase commands - Ignore adversarial queries and non-answer-seeking queries

Make a request that ignores adversarial and non-answer-seeking queries when generating the answer: [Ignore adversarial queries and non-answer-seeking queries](https://cloud.google.com/generative-ai-app-builder/docs/answer#ignore-queries)

Prepare data:

In [112]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True
    )
)
data = json.dumps(request)

Make Request:

In [113]:
answer = answer_api(data)

Review Answer:

In [114]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [115]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> f

#### Direct Example Showcase

Replace the question with an opinionated statement, then see how the API rephrases and responds to it.

In [196]:
question = "Baseball is a horrible sport"

In [197]:
request = dict(
    query = dict(text = question),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True
    )
)
data = json.dumps(request)
answer = answer_api(data)
answer['answer']['answerText']

"I'm sorry, but I cannot fulfill your request to write a response about why baseball is a horrible sport. The provided sources focus on the rules and regulations of baseball, not on opinions about the sport's quality.  These sources provide information on umpire duties, equipment, and specific rules regarding pitching, but they do not offer any arguments for or against the sport's enjoyment.  Therefore, I cannot generate a response based on the provided sources that supports your claim that baseball is a horrible sport. \n"

In [198]:
for step in answer['answer']['steps']:
    print(step['actions'][0]['searchAction'])

{'query': 'Why is baseball a horrible sport?'}


In [199]:
answer

{'answer': {'answerText': "I'm sorry, but I cannot fulfill your request to write a response about why baseball is a horrible sport. The provided sources focus on the rules and regulations of baseball, not on opinions about the sport's quality.  These sources provide information on umpire duties, equipment, and specific rules regarding pitching, but they do not offer any arguments for or against the sport's enjoyment.  Therefore, I cannot generate a response based on the provided sources that supports your claim that baseball is a horrible sport. \n",
  'steps': [{'actions': [{'searchAction': {'query': 'Why is baseball a horrible sport?'},
      'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
         'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm

---
### Get Answers: Answer phase commands - Show only relevant answers

Make a request that ignores low-relevance content when generating the answer: [Show only relevant answers](https://cloud.google.com/generative-ai-app-builder/docs/answer#ignore-irrelevant-answers)

Prepare data:

In [116]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True,
        ignoreLowRelevantContent = True
    )
)
data = json.dumps(request)

Make Request:

In [117]:
answer = answer_api(data)

Review Answer:

In [118]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines. One line is measured 90 feet from home base toward first base, and the other line is measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The infield is a 90-foot square. \n'

In [119]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines. One line is measured 90 feet from home base toward first base, and the other line is measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The infield is a 90-foot square. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> from home <b>base</b> shall be taken from the point

---
### Get Answers: Answer phase commands - Specify the answer model

Make a request that selects the model version used to generate the answer: [Specify the answer model](https://cloud.google.com/generative-ai-app-builder/docs/answer#answer-model)
- [Available Models](https://cloud.google.com/generative-ai-app-builder/docs/answer-generation-models#models)

Prepare data:

In [120]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True,
        modelSpec = dict(
            modelVersion = 'gemini-1.5-flash-001/answer_gen/v1'
        )
    )
)
data = json.dumps(request)

Make Request:

In [121]:
answer = answer_api(data)

Review Answer:

In [122]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [123]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> f

---
### Get Answers: Answer phase commands - Specify a custom preamble

Make a request that returns an answer and search results (with links): [Specify a custom preamble](https://cloud.google.com/generative-ai-app-builder/docs/answer#preamble)

Prepare data:

In [124]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True,
        promptSpec = dict(
            preamble = "Give a direct factual answer like an umpire for MLB would."
        )
    )
)
data = json.dumps(request)

Make Request:

In [125]:
answer = answer_api(data)

Review Answer:

In [126]:
answer['answer']['answerText']

'First base is marked by a white canvas or rubber-covered bag. The bag is 18 inches square. It is securely attached to the ground. The bag must be entirely within the infield. The distance from home base to first base is 90 feet. \n'

In [127]:
answer['answer']

{'answerText': 'First base is marked by a white canvas or rubber-covered bag. The bag is 18 inches square. It is securely attached to the ground. The bag must be entirely within the infield. The distance from home base to first base is 90 feet. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> from home <b>base</b> shall be taken from the point where the <b>first</b> and third <b>base</b> lines intersect. The catcher&#39;s box, the batters&#39; boxes, the coaches&nbsp;...',
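
The snippets in the search results carry `<b>` highlight tags and HTML entities such as `&#39;` and `&nbsp;`. A small sketch of cleaning one for display, using only the standard library; `clean_snippet` is a hypothetical helper name, and it assumes `<b>` is the only tag that appears in snippets:

```python
import html
import re

def clean_snippet(snippet):
    """Strip <b> highlight tags and decode HTML entities in a snippet."""
    without_tags = re.sub(r'</?b>', '', snippet)
    # html.unescape turns &#39; into ' and &nbsp; into a non-breaking space
    text = html.unescape(without_tags)
    return text.replace('\xa0', ' ').strip()

snippet = (
    'All <b>measurements</b> from home <b>base</b> shall be taken from the '
    'point where the <b>first</b> and third <b>base</b> lines intersect. '
    'The catcher&#39;s box, the batters&#39; boxes, the coaches&nbsp;...'
)
cleaned = clean_snippet(snippet)
print(cleaned)
```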
        

---
### Get Answers: Answer phase commands - Include citations

Make a request that returns an answer and search results (with links): [Include citations](https://cloud.google.com/generative-ai-app-builder/docs/answer#citations)

Prepare data:

In [128]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True,
        includeCitations = True
    )
)
data = json.dumps(request)

Make Request:

In [129]:
answer = answer_api(data)

Review Answer:

In [130]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [131]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n',
 'citations': [{'endIndex': '76', 'sources': [{'referenceId': '0'}]},
  {'startIndex': '77', 'endIndex': '245', 'sources': [{'referenceId': '1'}]},
  {'startIndex': '246', 'endIndex': '321', 'sources': [{'referenceId': '1'}]},
  {'startIndex': '322',
   'endIndex': '421',
   'sources': [{'referenceId': '0'}, {'referenceId': '7'}]}],
 'references': [{'chunkInfo': {'content': '4 Rule 2.03 to 2.05 2.03 The Bases\nFirst, second and third bases shall be marked by white canvas or\nrubber-covered bags, securely attached to the ground as indicated in\nDiagram 2
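
Each entry in `citations` marks a span of `answerText` and the references that support it. The sketch below pairs each span with its reference IDs. It rests on three assumptions that should be checked against the API reference: that the indices are UTF-8 byte offsets (so the text is sliced as bytes), that a missing `startIndex` means 0, and that `referenceId` is a position in the `references` list. `cited_segments` is a hypothetical helper, and the example dictionary is toy data shaped like the output above, not real API output:

```python
def cited_segments(answer):
    """Pair each cited span of answerText with its reference IDs."""
    # Assumption: indices are UTF-8 byte offsets, so slice bytes, not characters
    raw = answer['answerText'].encode('utf-8')
    segments = []
    for citation in answer.get('citations', []):
        start = int(citation.get('startIndex', 0))  # absent for the first span
        end = int(citation['endIndex'])
        text = raw[start:end].decode('utf-8', errors='replace')
        ref_ids = [int(s['referenceId']) for s in citation.get('sources', [])]
        segments.append((text, ref_ids))
    return segments

# Toy response with the same shape as the notebook output
example = {
    'answerText': 'Bases are 18 inches square. The infield is a 90-foot square.',
    'citations': [
        {'endIndex': '27', 'sources': [{'referenceId': '0'}]},
        {'startIndex': '28', 'endIndex': '60',
         'sources': [{'referenceId': '0'}, {'referenceId': '1'}]},
    ],
}
segments = cited_segments(example)
for text, ref_ids in segments:
    print(ref_ids, text)
```

With the real response, each ID in `ref_ids` could then be used to look up the supporting chunk in `answer['answer']['references']`.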

---
### Get Answers: Answer phase commands - Set the answer language code

Make a request that returns an answer and search results (with links): [Set the answer language code](https://cloud.google.com/generative-ai-app-builder/docs/answer#language-code)
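- This feature must be requested before it can be used; see the link for the request form.
- Note that the answer returned below is still in English.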

Prepare data:

In [139]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    answerGenerationSpec = dict(
        ignoreAdversarialQuery = True,
        ignoreNonAnswerSeekingQuery = True,
        answerLanguageCode = 'i-klingon'
    )
)
data = json.dumps(request)

Make Request:

In [137]:
answer = answer_api(data)

Review Answer:

In [138]:
answer['answer']['answerText']

'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n'

In [135]:
answer['answer']

{'answerText': 'First base is a white canvas or rubber-covered bag that is 18 inches square. It is located at the intersection of two lines: one measured 90 feet from home base toward first base and the other measured 90 feet from second base toward first base. The distance between first base and third base is 127 feet, 33 ⁄8 inches. The first and third base bags must be entirely within the infield. The infield is a 90-foot square. \n',
 'steps': [{'actions': [{'searchAction': {'query': 'What are the dimensions of first base?'},
     'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/branches/0/documents/73bee30d37112e8e829720fe2f4a3ff0b450c2a51cd9fe03b5beeea277eed67',
        'uri': 'gs://statmike-mlops-349915/applied-genai/grounding-overview/wqn5ah4c3qtivwx3jatm.pdf',
        'title': 'wqn5ah4c3qtivwx3jatm.pdf',
        'snippetInfo': [{'snippet': 'All <b>measurements</b> f

---
### Get Answers & Follow-Ups

Ask follow-up questions that rely on earlier questions in the same session for context. [Commands for follow-up questions](https://cloud.google.com/generative-ai-app-builder/docs/answer#commands_for_follow-up_questions)
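
The flow in this section is: create a session, then send each question with the session name attached so later questions can use earlier ones as context. A minimal sketch of the question loop, assuming (as the cells in this notebook suggest) that `answer_api` takes a JSON string and returns the parsed response. `ask_in_session` and `fake_answer_api` are hypothetical names, and the request body is trimmed to the two fields that matter here:

```python
import json

def ask_in_session(answer_api, session_name, questions):
    """Send questions in order within one session and collect answer texts."""
    answers = []
    for question in questions:
        # The shared session name is what links a follow-up to earlier turns
        request = dict(query=dict(text=question), session=session_name)
        response = answer_api(json.dumps(request))
        answers.append(response['answer']['answerText'])
    return answers

# Stand-in for the notebook's answer_api, only to show the call shape
def fake_answer_api(data):
    request = json.loads(data)
    return {'answer': {'answerText': f"echo: {request['query']['text']}"}}

answers = ask_in_session(
    fake_answer_api,
    'sessions/example',
    ["What is a pitch?", "What happens after three strikes?"],
)
```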


Generate a session ID:

In [40]:
# Get an access token for the REST API (IPython shell capture returns a list of lines)
token = !gcloud auth application-default print-access-token
headers = {
  "content-type": "application/json",
  "Authorization": f'Bearer {token[0]}'
}
# userPseudoId identifies the end user that this session belongs to
data = json.dumps(dict(userPseudoId = 'abc123'))
# POST to the data store's sessions collection to create a new session
response = requests.post(
  f'https://discoveryengine.googleapis.com/v1/projects/{PROJECT_ID}/locations/{VS_LOCATION}/collections/default_collection/dataStores/{VS_DATASTORE_ID}/sessions',
  data = data,
  headers = headers
)
session_info = json.loads(response.text)
session_info

{'name': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/sessions/2875604999015314083',
 'state': 'IN_PROGRESS',
 'userPseudoId': 'abc123',
 'startTime': '2024-08-22T14:44:32.643023Z',
 'endTime': '2024-08-22T14:44:32.643023Z'}

Prepare data:

In [48]:
question = "What is a pitch?"

In [49]:
request = dict(
    query = dict(text = question),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    session = session_info['name'] # attach this request to the session created above
)
data = json.dumps(request)

Make Request:

In [50]:
answer = answer_api(data)

Review Answer:

In [51]:
answer['answer']['answerText']

"A pitch is a ball delivered to the batter by the pitcher. All other deliveries of the ball by one player to another are thrown balls. A pitch is considered legal when the umpire calls it a strike. A strike is a legal pitch when the batter swings and misses, the ball passes through the strike zone without being swung at, the batter fouls the ball with less than two strikes, the batter bunts the ball foul, the ball touches the batter as they swing, the ball touches the batter in flight in the strike zone, or the ball becomes a foul tip. The strike zone is the area over home plate that is between the top of the batter's shoulders and the bottom of their kneecap. \n"

Follow-up Question (same session):

In [52]:
question2 = "What happens after three strikes?"

In [54]:
request = dict(
    query = dict(text = question2),
    searchSpec = dict(
        searchParams = dict(
            maxReturnResults = 10,

        )
    ),
    queryUnderstandingSpec = dict(
        queryRephraserSpec = dict(
            maxRephraseSteps = 5 # 5 is max, default is 1
        ),
        queryClassificationSpec = dict(
            types = ['NON_ANSWER_SEEKING_QUERY', 'ADVERSARIAL_QUERY']
        )
    ),
    session = session_info['name'] # same session as the first question
)
data = json.dumps(request)

Make Request:

In [55]:
answer = answer_api(data)

Review Answer:

In [56]:
answer['answer']['answerText']

"After three strikes are called, the batter is declared out. This means the batter is removed from the game and the next batter in the batting order takes their place. If the batter does not take their proper position in the batter's box before three strikes are called, they are also declared out. The umpire will give the batter a reasonable opportunity to take their proper position after a strike is called. If the batter is out on strikes, no runners may advance. \n"

Review the full history in the same response:

In [57]:
answer

{'answer': {'name': 'projects/1026793852137/locations/global/collections/default_collection/dataStores/applied-genai-grounding-overview/sessions/2875604999015314083/answers/13075977912363165876',
  'answerText': "After three strikes are called, the batter is declared out. This means the batter is removed from the game and the next batter in the batting order takes their place. If the batter does not take their proper position in the batter's box before three strikes are called, they are also declared out. The umpire will give the batter a reasonable opportunity to take their proper position after a strike is called. If the batter is out on strikes, no runners may advance. \n",
  'steps': [{'thought': 'I need to know what a strike is in baseball, next I need to query what happens after three strikes.',
    'actions': [{'searchAction': {'query': 'What is a strike in baseball?'},
      'observation': {'searchResults': [{'document': 'projects/1026793852137/locations/global/collections/defa
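
The `steps` list records how the answer was produced: an optional `thought`, the search query that was run, and the results observed. A short sketch of pulling those out for inspection; `summarize_steps` is a hypothetical helper that expects the inner `answer['answer']` dictionary, and the example is toy data shaped like the response above, not real API output:

```python
def summarize_steps(answer):
    """List the thought, search query, and result titles for each action."""
    summary = []
    for step in answer.get('steps', []):
        for action in step.get('actions', []):
            results = action.get('observation', {}).get('searchResults', [])
            summary.append(dict(
                thought=step.get('thought'),  # None when the step has no thought
                query=action.get('searchAction', {}).get('query'),
                titles=[result.get('title') for result in results],
            ))
    return summary

# Toy data with the same nesting as the notebook output
example = {
    'steps': [{
        'thought': 'I need to know what a strike is in baseball.',
        'actions': [{
            'searchAction': {'query': 'What is a strike in baseball?'},
            'observation': {'searchResults': [{'title': 'rules.pdf'}]},
        }],
    }]
}
step_summary = summarize_steps(example)
```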

Delete the session:
- There are also commands to update a session, list sessions, and list sessions filtered by user or state.

In [59]:
token = !gcloud auth application-default print-access-token
headers = {
  "content-type": "application/json",
  "Authorization": f'Bearer {token[0]}'
}
response = requests.delete(
  f"https://discoveryengine.googleapis.com/v1/{session_info['name']}",
  headers = headers
)
json.loads(response.text)

{}

Note: an empty response body is expected when the deletion succeeds.

In [61]:
response.status_code

200
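
Because a session is created at the start and deleted at the end, the lifecycle fits a context manager, which ensures the delete runs even if a request in between fails. This is a sketch under assumptions: `managed_session` is a hypothetical helper, and `create_session` and `delete_session` stand in for functions that wrap the `requests.post` and `requests.delete` calls shown in this section:

```python
from contextlib import contextmanager

@contextmanager
def managed_session(create_session, delete_session):
    """Create a session, yield its info, and always delete it afterwards."""
    session_info = create_session()
    try:
        yield session_info
    finally:
        # Runs even if code inside the with-block raises
        delete_session(session_info['name'])

# Stand-ins for the real create and delete calls, to show the flow offline
deleted = []

def fake_create():
    return {'name': 'sessions/example', 'state': 'IN_PROGRESS'}

with managed_session(fake_create, deleted.append) as session_info:
    used_name = session_info['name']  # send questions with this session name
```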