# Showcase - Google Cloud Enterprise Search - Unstructured Search Engine
_Unstructured data from PDFs is stored in Google Cloud Enterprise Search along with document metadata_


---


* Authors: Nutchanon Leelapornudom (nutchanon@google.com)
* Created: 03/08/2023
* Last Updated: 03/08/2023

---

## Objective

This notebook is an example of using **Google Cloud Enterprise Search API** with **Unstructured Search Engine** of **Generative AI App Builder**.

We will walk through the Google Cloud Enterprise Search features that are exposed through the API, which you can call via the REST API, the RPC API, or the Cloud Client Libraries (SDK).

With these capabilities, you can integrate Google Cloud Enterprise Search into your application to add intelligent search to your systems.

**Google Cloud Enterprise Search - Unstructured Search Engine contains the following features**

Search/Query features: (covered in this notebook)
1. Semantic Search
2. Native built-in LLM Summarization
3. Controlling snippet size and number of results
4. Metadata Filtering (Post Filtering)
5. Spell Correction
6. Query Expansion
7. Metadata Boost and Bury
8. Metadata Dynamic Facets
9. Search Autocomplete

Engine configuration and data management features: (not covered in this notebook)
1. Search Engine Management (Provisioning, Delete)
2. Schema Management (Metadata, Data Type, Field Semantic Meaning)
3. Data Management (Import, List, Delete, Purge)
4. Field Metadata Management (Retrievable, Searchable, Indexable, Completable)

Some features exist only on the console and widget: (not covered in this notebook)
* Metrics Analytics
* User events
* Result Feedback

Other features for architects: (not covered in this notebook)
* Security control
* Serverless operations

In order to run this notebook you must have access to Google Cloud Enterprise Search.

In this example, we use publicly available documents on **Google Cloud Databases** topics, which we uploaded to Google Cloud Storage.

This notebook is tested on Google Colab.

Users may wish to:
1. Search or ask questions across one or more documents
2. Get relevant documents based on the question
3. Get a summary of the results in natural language
4. Use Enterprise Search as grounding data for LLM integration in a RAG pattern
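
To make the RAG use case concrete, here is a minimal sketch of how retrieved passages could be assembled into a grounded prompt for an external LLM. It assumes each result is a dict with a `link` and a list of `extractive_answers`, the shape this notebook's retriever returns. The function name `build_grounded_prompt`, the prompt wording, and the sample data are hypothetical.

```python
from typing import Dict, List


def build_grounded_prompt(question: str, documents: List[Dict], max_docs: int = 3) -> str:
    """Assemble a prompt that asks an LLM to answer only from retrieved passages."""
    context_lines = []
    for doc in documents[:max_docs]:
        # Each document is assumed to carry a 'link' and a list of
        # 'extractive_answers', each with a 'content' string.
        for answer in doc.get("extractive_answers", []):
            content = answer.get("content", "").strip()
            if content:
                context_lines.append(
                    "- {} (source: {})".format(content, doc.get("link", "unknown"))
                )

    context = "\n".join(context_lines) if context_lines else "(no passages retrieved)"
    return (
        "Answer the question using only the passages below. "
        "If the passages do not contain the answer, say so.\n\n"
        "Passages:\n{}\n\nQuestion: {}\nAnswer:".format(context, question)
    )


# Hypothetical sample data shaped like the search results used in this notebook.
sample_documents = [
    {
        "link": "gs://example-bucket/cloud-sql-report.pdf",
        "extractive_answers": [
            {"content": "Cloud SQL is a managed database service.", "pageNumber": "4"}
        ],
    }
]
prompt = build_grounded_prompt("what is cloud sql?", sample_documents)
print(prompt)
```

The resulting string can be sent to any LLM endpoint. Keeping the source link next to each passage makes it possible to cite where an answer came from.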

---

This notebook works through the following example:

- ✅ Searching unstructured data on Enterprise Search with the Python SDK

---

**References:**

- [Google Cloud Enterprise Search Documentation](https://cloud.google.com/generative-ai-app-builder/docs/enterprise-search-introduction)

## Download and install required SDKs
Additional packages required to run this notebook:
* langchain
* google-cloud-discoveryengine

**Note:** Do not forget to restart the kernel once you have installed the packages.
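
As a complement to the install cell, the sketch below checks whether modules are available without importing them, using only the standard library. The helper name `missing_packages` is hypothetical. Note that the names checked are import names (for example `google.cloud.discoveryengine_v1beta`), not pip package names.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(module_names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        try:
            # find_spec returns None when a top-level module is not installed.
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names when a parent package is itself missing.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names of the two packages listed above.
wanted = ["langchain", "google.cloud.discoveryengine_v1beta"]
print("Missing modules: {}".format(missing_packages(wanted)))
```

An empty list means both packages are already importable, so no install or restart should be needed.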

In [None]:
required_restart = False

# Install langchain (pinned version)
try:
  import langchain
except ImportError:
  ! pip install langchain==0.0.236
  required_restart = True

# Install the Enterprise Search (Discovery Engine) SDK
try:
    from google.cloud import discoveryengine_v1beta
except ImportError:
    ! pip install google-cloud-discoveryengine
    required_restart = True

# A restart is needed after installing packages
print("Do I need to restart the runtime ?: {}".format(required_restart))

Do I need to restart the runtime ?: False


---

#### ⚠️ Do not forget to click the "RESTART RUNTIME" button above.

---

## Authentication to the platform

If running in Colab, authenticate with `google.colab.auth`; otherwise the notebook assumes it is running on Vertex AI Workbench.

In [None]:
import sys

if 'google.colab' in sys.modules:
  from google.colab import auth as google_auth
  google_auth.authenticate_user()

## Configure Google Cloud project
Configure your Google Cloud project to use Gen App Builder and Cloud Storage.

You need the following role to run this notebook. If you reuse any of this code in a production system, apply fine-grained access control to reduce security risk.
* Discovery Engine Admin

**Note:** At the time this notebook was published, your Google Cloud project may need to be allowlisted to access the Gen App Builder platform. Please contact your Google Cloud account team to request access.

In [None]:
GOOGLE_CLOUD_PROJECT = '<google_cloud_project_id>' #@param {"type": "string"}
GOOGLE_CLOUD_REGION = 'global' #@param {"type": "string"}

## Extended GoogleCloudEnterpriseSearchRetriever class from the official "langchain.retrievers"
The main Enterprise Search class, which wraps the search functionality.

In [None]:
import json
from typing import List, Any
from langchain.retrievers import GoogleCloudEnterpriseSearchRetriever
from google.cloud import discoveryengine_v1beta
from google.cloud.discoveryengine_v1beta.types import SearchResponse, Document
from google.protobuf.json_format import MessageToDict
from google.cloud.discoveryengine_v1beta import CompletionServiceClient

class ExtendedGoogleCloudEnterpriseSearchRetriever(GoogleCloudEnterpriseSearchRetriever):
    """Extension of GoogleCloudEnterpriseSearchRetriever from langchain.retrievers."""
    # variables from GoogleCloudEnterpriseSearchRetriever
    _client = None
    _serving_config = None
    _project_id = None
    _search_engine_id = None
    def __init__(self, **data: Any) -> None:
        super().__init__(**data)
        self._client = self._client # get from GoogleCloudEnterpriseSearchRetriever
        self._serving_config = self._serving_config # get from GoogleCloudEnterpriseSearchRetriever
        self._project_id = self.project_id # get from GoogleCloudEnterpriseSearchRetriever
        self._search_engine_id = self.search_engine_id # get from GoogleCloudEnterpriseSearchRetriever
        self._complete_client = CompletionServiceClient()

    # additional variables
    _spell_correction: str = None
    _facets: dict = None
    _summary: str = None

    # auto complete
    _complete_client: CompletionServiceClient = None

    # get relevant documents from the unstructured search engine
    def get_relevant_documents(self,
                               query: str,
                               filter: str = None,
                               order_by: str = None,
                               spell_correction: int = 2,
                               query_expansion: int = 1,
                               boost_spec: dict = None,
                               facet_specs: dict = None,
                               content_search_spec: dict = None,) -> List[Document]:
        # search on the engine
        request = discoveryengine_v1beta.SearchRequest(
            serving_config=self._serving_config,
            query=query,
            filter=filter,
            order_by=order_by,
            spell_correction_spec={'mode': spell_correction},
            query_expansion_spec={'condition': query_expansion},
            boost_spec=boost_spec,
            facet_specs=facet_specs,
            content_search_spec=content_search_spec,
        )
        res = self._client.search(request)

        # collect other metadata from the response
        self._set_spell_correction(res.corrected_query)

        # set facets
        self._set_facets(res.facets)

        # set summary
        self._set_summary(res.summary)

        # return the results
        return self._unstructured_documents_formatting(res)

    def _unstructured_documents_formatting(self, res: SearchResponse) -> list:
        # formatting the response
        documents = []
        for result in res.results:
            document = {}
            data = MessageToDict(result.document._pb)

            # process common information
            document['name'] = data['name']
            document['id'] = data['id']
            document['link'] = data['derivedStructData']['link']

            if 'extractive_answers' in data['derivedStructData']:
              document['extractive_answers'] = data['derivedStructData']['extractive_answers']

            if 'snippets' in data['derivedStructData']:
              snippets = []
              for snippet in data['derivedStructData']['snippets']:
                snippet_obj = {}
                if snippet['snippet_status'] == 'SUCCESS':
                  snippet_obj['snippet'] = snippet['snippet']
                else:
                  snippet_obj['snippet'] = ""
                snippets.append(snippet_obj)
              document['snippets'] = snippets

            if 'extractive_segments' in data['derivedStructData']:
              document['extractive_segments'] = data['derivedStructData']['extractive_segments']

            if 'structData' in data:
              document['metadata'] = data['structData']

            # combine into output
            documents.append(document)
        return documents

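    def pretty_results(self, results: list, top: int = 20, show_metadata: bool = False) -> str:
        # format the top results as an indented JSON string
        top_results = []

        for doc in results[:top]:
            if show_metadata:
                top_results.append(doc)
            else:
                # copy without 'metadata' so the caller's documents are not modified
                # and documents without metadata do not raise KeyError
                top_results.append({key: value for key, value in doc.items() if key != 'metadata'})

        return json.dumps(top_results, indent=2)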

    # metadata accessors for the unstructured search engine
    def _set_spell_correction(self, corrected_query: str) -> None:
        self._spell_correction = corrected_query

    def get_spell_correction(self) -> str:
        return self._spell_correction

    def _set_facets(self, facets: list) -> None:
        facets_dict = []

        for facet in facets:
          facet_dict = {}
          facet_values = []
          facet_dict['key'] = facet.key
          for doc in facet.values:
              facet_values.append(doc.value)
          facet_dict['values'] = facet_values
          facets_dict.append(facet_dict)

        self._facets = facets_dict

    def get_facets(self) -> list:
        return self._facets

    def get_autocompletes(self, search_complete: str, query_model: str = "document", user_pseudo_id: str = "user-0001"):
        data_store="projects/{}/locations/global/collections/default_collection/dataStores/{}".format(self._project_id, self._search_engine_id)
        request = discoveryengine_v1beta.CompleteQueryRequest(
            data_store=data_store,
            query=search_complete,
            query_model=query_model,
            user_pseudo_id=user_pseudo_id,
        )
        res = self._complete_client.complete_query(request)

        # format the response
        suggestions = []
        for suggestion in res.query_suggestions:
          suggestions.append(suggestion.suggestion)

        return suggestions

    def _set_summary(self, summary: Any) -> None:
        # summary is the summary message of the search response; keep only its text
        self._summary = summary.summary_text

    def get_summary(self) -> str:
        return self._summary

## Gen App Builder - Enterprise Search - Unstructured Engine Configuration
Configure your Enterprise Search engine here. You need the full "Engine ID".
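
To show where the engine ID sits, the sketch below builds and parses the data store resource name, following the same path pattern the retriever class above uses for autocomplete requests. The helper names and the sample values (`my-project`, `my-engine_123`) are hypothetical, and the parser handles only this one pattern.

```python
import re
from typing import Dict

DATA_STORE_TEMPLATE = (
    "projects/{project}/locations/{location}"
    "/collections/default_collection/dataStores/{data_store}"
)
# No trailing anchor: longer names (for example document names) share this prefix.
DATA_STORE_PATTERN = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/collections/default_collection/dataStores/(?P<data_store>[^/]+)"
)


def build_data_store_name(project: str, data_store: str, location: str = "global") -> str:
    """Format the data store resource name from its parts."""
    return DATA_STORE_TEMPLATE.format(
        project=project, location=location, data_store=data_store
    )


def parse_data_store_name(resource_name: str) -> Dict[str, str]:
    """Extract project, location and data store (engine) ID from a resource name."""
    match = DATA_STORE_PATTERN.match(resource_name)
    if match is None:
        raise ValueError("not a data store resource name: {}".format(resource_name))
    return match.groupdict()


name = build_data_store_name("my-project", "my-engine_123")
print(name)
print(parse_data_store_name(name + "/branches/0/documents/001"))
```

The `data_store` value recovered by the parser is what this notebook passes as `search_engine_id`.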

In [None]:
ENTERPRISE_SEARCH_ENGINE = "<enterprise_search_engine_id>" #@param {type: "string"}
retriever = ExtendedGoogleCloudEnterpriseSearchRetriever(
    project_id=GOOGLE_CLOUD_PROJECT, search_engine_id=ENTERPRISE_SEARCH_ENGINE
)

## Test: Running a search on Enterprise Search - Unstructured Engine
Let's run a search and look at the results returned by the search engine.

In [None]:
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY)
print(res)

[{'name': 'projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001', 'id': '001', 'link': 'gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf', 'extractive_answers': [{'content': 'Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization.', 'pageNumber': '4'}], 'metadata': {'metadata_com': 'google-cloud', 'metadata_group': 'cloudsql', 'metadata_owner': 'nutchanon'}}, {'name': 'projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010', 'id': '010', 'link': 'gs://nutchanon-genapp-storage-01/databases-pdfs/microservices_on_cloudsql_whitepaper.pdf', 'extractive_answers': [{'pageNumber': '17

## Walkthrough Search features
Search/Query features: (covered in this notebook)
1. Semantic Search
2. Native built-in LLM Summarization
3. Controlling snippet size and number of results
4. Metadata Filtering (Post Filtering)
5. Spell Correction
6. Query Expansion
7. Metadata Boost and Bury
8. Metadata Dynamic Facets
9. Search Autocomplete

## 1. Semantic Search
[Semantic Search](https://en.wikipedia.org/wiki/Semantic_search) is a similarity search based on the meaning of the words and sentences in the query, rather than on exact keyword matches.
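
As a toy illustration of ranking by meaning, the sketch below scores documents by cosine similarity between vectors. The three-dimensional vectors are hand-made, hypothetical stand-ins for learned embeddings, and this is not a description of how Enterprise Search computes relevance internally.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float], doc_vectors: Dict[str, Sequence[float]]
) -> List[str]:
    """Return document IDs ordered from most to least similar to the query."""
    return sorted(
        doc_vectors,
        key=lambda doc_id: cosine_similarity(query_vector, doc_vectors[doc_id]),
        reverse=True,
    )


# Hypothetical toy "embeddings": [databases, networking, billing]
toy_documents = {
    "managed-relational-database.pdf": [0.9, 0.1, 0.2],
    "vpc-peering-guide.pdf": [0.1, 0.9, 0.1],
    "invoice-faq.pdf": [0.1, 0.1, 0.9],
}
toy_query = [0.8, 0.0, 0.3]  # stands in for "what is cloud sql?"
ranking = rank_by_similarity(toy_query, toy_documents)
print(ranking)
```

The database document ranks first even though no keywords are compared. Only the direction of the vectors matters.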

In [None]:
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY)
res_json = retriever.pretty_results(res, 5)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "4",
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization."
      }
    ]
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/microservices_on_cloudsql_whitepaper.pdf",
    "extractive_answers": [
      {
        "pageNumber": "17",
        "c

In [None]:
#@title Let's try another example
SEARCH_QUERY = "how cloud spanner can help my organization?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY)
res_json = retriever.pretty_results(res)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "17",
        "content": "IDC&#39;s research demonstrates the significant benefits that organizations can achieve through use of fully managed relational databases with Cloud SQL. Study participants reported capturing significant efficiencies in terms of DBA time requirements and direct database-related costs while also enabling business activities through much enhanced database agility, reliability, and performance."
      }
    ]
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/005",
    "id": "005",
    "link": "gs://nutc

## 2. Native built-in LLM Summarization
The Google Cloud Enterprise Search Unstructured Search Engine has a built-in generative AI large language model (LLM) that returns, as part of the response, a summary of the results in natural language.

You do not need to integrate an external LLM if you only want the answer directly.

You can also integrate an external LLM if you prefer.

In [None]:
#@title Get summary result from the search query
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
NUMBER_OF_TOP_RESULTS_FOR_SUMMARY = 5 #@param {type: "integer"}
content_search_spec = {'summary_spec': {'summary_result_count': NUMBER_OF_TOP_RESULTS_FOR_SUMMARY}}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
summary = retriever.get_summary()
print(summary)

Cloud SQL is a fully managed relational database service that runs on the same infrastructure that Google uses for its end-user products, such as Google Search and YouTube. It offers a simple, scalable, and cost-effective way to host your databases.


In [None]:
#@title Adjust the number of top results used for summarization
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
NUMBER_OF_TOP_RESULTS_FOR_SUMMARY = 1 #@param {type: "integer"}
content_search_spec = {'summary_spec': {'summary_result_count': NUMBER_OF_TOP_RESULTS_FOR_SUMMARY}}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
summary = retriever.get_summary()
print(summary)

Cloud SQL is a fully managed database service that makes it easy to deploy, manage, and scale MySQL and PostgreSQL databases in the cloud. It provides a simple and cost-effective way to host your databases, and it includes all the features you need to keep your databases running smoothly, including high availability, scalability, and security.


## 3. Controlling snippet size and number of results
Google Cloud Enterprise Search returns search results that contain the document ID, a URL link to the document, and a document snippet (a brief extract of text). This helps you understand what a document contains before you open the source.

You can control the size of the extracted text as well as how many extracts are returned for each document. There are three options that you can configure.
* Snippet - A brief extract of text that provides a preview of a search result's content.
* Extractive answers - An extractive answer is verbatim text that is returned with each search result.
* Extractive segments - An extractive segment is usually more verbose than an extractive answer.
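
The three options, together with the summary setting from the previous section, are passed as one nested `content_search_spec` dict. The sketch below is a small hypothetical helper (`build_content_search_spec` is not part of any SDK) that builds this dict using the same keys as the cells in this notebook and leaves out any option that is not requested.

```python
from typing import Dict, Optional


def build_content_search_spec(
    summary_result_count: Optional[int] = None,
    max_snippet_count: Optional[int] = None,
    max_extractive_answer_count: Optional[int] = None,
    max_extractive_segment_count: Optional[int] = None,
) -> Dict[str, Dict[str, int]]:
    """Build the nested content_search_spec dict, skipping unset options."""
    spec: Dict[str, Dict[str, int]] = {}
    if summary_result_count is not None:
        spec["summary_spec"] = {"summary_result_count": summary_result_count}
    if max_snippet_count is not None:
        spec["snippet_spec"] = {"max_snippet_count": max_snippet_count}

    # Answers and segments share one sub-spec.
    extractive: Dict[str, int] = {}
    if max_extractive_answer_count is not None:
        extractive["max_extractive_answer_count"] = max_extractive_answer_count
    if max_extractive_segment_count is not None:
        extractive["max_extractive_segment_count"] = max_extractive_segment_count
    if extractive:
        spec["extractive_content_spec"] = extractive
    return spec


example_spec = build_content_search_spec(max_snippet_count=1, max_extractive_answer_count=3)
print(example_spec)
```

The returned dict can be passed as the `content_search_spec` argument of `get_relevant_documents`.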

In [None]:
#@title Show Snippet for each document
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
content_search_spec = {'snippet_spec': {'max_snippet_count': 1}}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
res_json = retriever.pretty_results(res, 3)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "snippets": [
      {
        "snippet": "#US49250622 The Business Value of <b>Cloud SQL</b>: Google Cloud&#39;s Relational Database Service for MySQL, PostgreSQL, and SQL Server Lower Overall Cost of Database&nbsp;..."
      }
    ]
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/microservices_on_cloudsql_whitepaper.pdf",
    "snippets": [
      {
        "snippet": "<b>Cloud SQL</b> (PostgreSQL, MySQL, SQL Server) can have multiple databases per instance. This potentially maximizes the densit

In [None]:
#@title Show Extractive Answers for each document
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
NUMBER_OF_ANSWERS = 1 #@param {type: "integer"}
content_search_spec = {'extractive_content_spec': {'max_extractive_answer_count': NUMBER_OF_ANSWERS}}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
res_json = retriever.pretty_results(res, 3)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "4",
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization."
      }
    ]
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/microservices_on_cloudsql_whitepaper.pdf",
    "extractive_answers": [
      {
        "pageNumber": "17",
        "c

In [None]:
#@title Adjust Extractive Answers to 3
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
NUMBER_OF_ANSWERS = 3 #@param {type: "integer"}
content_search_spec = {'extractive_content_spec': {'max_extractive_answer_count': NUMBER_OF_ANSWERS}}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
res_json = retriever.pretty_results(res, 2)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "4",
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization."
      },
      {
        "content": "Table of Contents 12 A Business Value White Paper, sponsored by Google Cloud June 2022 | Doc. #US49250622 The Business Value of Cloud SQL: Google Cloud&#39;s Relational Database Service for MySQL, PostgreSQL, and SQL Server Lower Overall Cost of Database Operations Study participants explained that Cloud SQL also provides direct cost savings

In [None]:
#@title Show Extractive Segment
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
content_search_spec = {'extractive_content_spec': {'max_extractive_segment_count': 1}}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
res_json = retriever.pretty_results(res, 2)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_segments": [
      {
        "content": "Lower Overall Cost of Database Operations\nStudy participants explained that Cloud SQL also provides direct cost savings for their databases\nand related infrastructure. They especially appreciated having increased clarity about\ndatabase-related costs, as well as the ability to add databases with less need to consider\ninfrastructure provisioning requirements. One study participant explained: \u201cOne huge benefit of\nCloud SQL is the overall clarity on how much the infrastructure costs. With Google, you have a\nclear idea on that piece of information so you can plan accordingly .... .Another huge business\nimpact is that if we need to e

In [None]:
#@title Combine everything together
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
NUMBER_OF_TOP_RESULTS_FOR_SUMMARY = 5 #@param {type: "integer"}
NUMBER_OF_ANSWERS = 1 #@param {type: "integer"}
content_search_spec = {
    'summary_spec': {'summary_result_count': NUMBER_OF_TOP_RESULTS_FOR_SUMMARY},
    'snippet_spec': {'max_snippet_count': 1},
    'extractive_content_spec': {
        'max_extractive_answer_count': NUMBER_OF_ANSWERS,
        'max_extractive_segment_count': 1,
    },
}
res = retriever.get_relevant_documents(query=SEARCH_QUERY,content_search_spec=content_search_spec)
res_json = retriever.pretty_results(res, 3)
summary = retriever.get_summary()
print(res_json)
print("Summary: {}".format(summary))

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization.",
        "pageNumber": "4"
      }
    ],
    "snippets": [
      {
        "snippet": "#US49250622 The Business Value of <b>Cloud SQL</b>: Google Cloud&#39;s Relational Database Service for MySQL, PostgreSQL, and SQL Server Situation Overview Database&nbsp;..."
      }
    ],
    "extractive_segments": [
      {
        "content": "Table of Contents\n\n4\n\nA Business Value White Paper, sponsored by Goo

## 4. Metadata Filtering (Post Filtering)
Metadata Filtering (Post Filtering) lets you restrict the results to those that match your business requirements.

Think of it as the WHERE clause of a SQL statement, which controls which rows (here, documents) are returned to your application.

The filtering syntax is described in [Google Cloud Retail Filter](https://cloud.google.com/retail/docs/filter-and-order).
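
Filter expressions in this notebook take the form `field: ANY("value", ...)`. The sketch below is a small hypothetical helper (`build_any_filter` is not an SDK function) that builds such a string from a field name and a list of values. Because the escaping rules for quotes inside values are not covered here, it rejects values containing a double quote instead of guessing.

```python
from typing import Iterable


def build_any_filter(field: str, values: Iterable[str]) -> str:
    """Build a 'field: ANY("v1", "v2")' filter expression."""
    quoted = []
    for value in values:
        if '"' in value:
            # Escaping rules are not assumed here, so refuse rather than guess.
            raise ValueError("value must not contain a double quote: {!r}".format(value))
        quoted.append('"{}"'.format(value))
    if not quoted:
        raise ValueError("at least one value is required")
    return "{}: ANY({})".format(field, ", ".join(quoted))


filter_expression = build_any_filter("metadata_group", ["cloudsql"])
print(filter_expression)
```

The resulting string can be passed as the `filter` argument of `get_relevant_documents`.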

In [None]:
#@title Show metadata for each document
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY)
res_json = retriever.pretty_results(res, 3, True)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "4",
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization."
      }
    ],
    "metadata": {
      "metadata_group": "cloudsql",
      "metadata_com": "google-cloud",
      "metadata_owner": "nutchanon"
    }
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-

In [None]:
#@title Keep only results whose "metadata_group" is "cloudsql"
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
FILTER_RESULT = "metadata_group: ANY(\"cloudsql\")" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY, filter=FILTER_RESULT)
res_json = retriever.pretty_results(res, 2, True)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "4",
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization."
      }
    ],
    "metadata": {
      "metadata_group": "cloudsql",
      "metadata_owner": "nutchanon",
      "metadata_com": "google-cloud"
    }
  }
]


## 5. Spell Correction
Spell Correction is the search engine's ability to infer what you intended to search for and correct typos before running the query.

(0) MODE_UNSPECIFIED - Use the engine's default. In this case, the default is AUTO.

(1) SUGGESTION_ONLY - Try to find a spelling suggestion, but do not use it as the search query.

(2) AUTO (default) - Automatic spell correction.
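
To avoid passing bare integers around, the sketch below mirrors the three modes listed above in a local `IntEnum` and models, in plain Python, which query ends up being searched under each mode. The class and function names are hypothetical, and the model only restates the behavior described above (including the assumption that no correction leaves the original query in use). It does not call the API.

```python
from enum import IntEnum


class SpellCorrectionMode(IntEnum):
    """Local mirror of the mode values listed above."""
    MODE_UNSPECIFIED = 0
    SUGGESTION_ONLY = 1
    AUTO = 2


def spell_correction_spec(mode: int) -> dict:
    """Build the dict passed as spell_correction_spec by the retriever class."""
    return {"mode": int(SpellCorrectionMode(mode))}


def query_used_for_search(original_query: str, corrected_query: str, mode: int) -> str:
    """Model which query drives the search under a given mode."""
    if SpellCorrectionMode(mode) is SpellCorrectionMode.SUGGESTION_ONLY:
        # The suggestion is reported but not used.
        return original_query
    # AUTO (and MODE_UNSPECIFIED, which defaults to AUTO) use the correction if any.
    return corrected_query or original_query


print(spell_correction_spec(SpellCorrectionMode.AUTO))
print(query_used_for_search("what is could sql?", "what is cloud sql?", SpellCorrectionMode.AUTO))
```

Validating through the enum means an unsupported value such as 3 raises `ValueError` locally instead of being sent to the service.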

In [None]:
SEARCH_QUERY = "what is could sql?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY, spell_correction=2)
res_json = retriever.pretty_results(res, 2)
print(res_json)
print("Search Spell Correction: {}".format(retriever.get_spell_correction()))

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization.",
        "pageNumber": "4"
      }
    ]
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/microservices_on_cloudsql_whitepaper.pdf",
    "extractive_answers": [
      {
        "pageNumber": "17",
        "c

## 6. Query Expansion
Query Expansion broadens the search query with synonyms and related words so that more results are returned.

(0) CONDITION_UNSPECIFIED - Use the engine's default. In this case, the default is DISABLED.

(1) DISABLED (default) - Query expansion is disabled. Only the exact search query is used.

(2) AUTO - Automatic query expansion.
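
To illustrate the idea only, the toy sketch below expands a query using a hand-written synonym table. The table and the function `expand_query` are hypothetical, and the service's actual expansion logic is not described in this notebook.

```python
import re
from typing import Dict, List

# Hypothetical synonym table, for illustration only.
TOY_SYNONYMS: Dict[str, List[str]] = {
    "structured query language": ["sql"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the lowercased query plus variants with known phrases swapped for synonyms."""
    normalized = query.lower()
    variants = [normalized]
    for phrase, alternatives in synonyms.items():
        # Word boundaries avoid replacing text inside longer words.
        pattern = re.compile(r"\b{}\b".format(re.escape(phrase)))
        if pattern.search(normalized):
            for alternative in alternatives:
                variant = pattern.sub(lambda _match: alternative, normalized)
                if variant not in variants:
                    variants.append(variant)
    return variants


variants = expand_query("what is cloud structured query language?", TOY_SYNONYMS)
print(variants)
```

Searching with every variant instead of only the original is what lets an expanded query match documents that use the shorter term.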

In [None]:
SEARCH_QUERY = "what is cloud structured query language?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY, query_expansion=1)
res_json = retriever.pretty_results(res, 5)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/009",
    "id": "009",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/hbr_analytic_services_cloud_databases_report.pdf",
    "extractive_answers": [
      {
        "pageNumber": "6",
        "content": "Key Attributes of Modern Cloud Databases As companies look to improve efficiencies, spur innovation, and create compelling customer experiences, they should carefully determine whether their operational databases provide the scaling, availability, security, data life cycle integration, freedom, and flexibility they require."
      }
    ]
  }
]


In [None]:
SEARCH_QUERY = "what is cloud structured query language?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY, query_expansion=2)
res_json = retriever.pretty_results(res, 5)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/009",
    "id": "009",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/hbr_analytic_services_cloud_databases_report.pdf",
    "extractive_answers": [
      {
        "content": "Key Attributes of Modern Cloud Databases As companies look to improve efficiencies, spur innovation, and create compelling customer experiences, they should carefully determine whether their operational databases provide the scaling, availability, security, data life cycle integration, freedom, and flexibility they require.",
        "pageNumber": "6"
      }
    ]
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-a

## 7. Metadata Boost and Bury
Boosting lets you control the sort order of the result set. Instead of relying only on the default ordering by relevance score, you can boost (promote) or bury (demote) results that match conditions you define.

The `boost` parameter is a floating-point value in the range [-1.0, 1.0]: positive values promote matching documents (1.0 is the strongest promotion) and negative values demote them (-1.0 is the strongest demotion).

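As a minimal sketch of how such a spec might be assembled, the helper below builds the same dictionary layout (`condition_boost_specs`, `condition`, `boost`) that the boosting cell in this section uses, and rejects boost values outside [-1.0, 1.0]. The name `make_boost_spec` is hypothetical and not part of the notebook or the API, and the 0.5 / -0.5 values are illustrative only.

```python
def make_boost_spec(rules):
    """Build a boost spec dict from (condition, boost) pairs.

    Hypothetical helper for illustration; only the dictionary layout
    ('condition_boost_specs', 'condition', 'boost') comes from this notebook.
    """
    specs = []
    for condition, boost in rules:
        # Boost must stay within [-1.0, 1.0]: positive promotes, negative buries.
        if not -1.0 <= boost <= 1.0:
            raise ValueError(f"boost {boost} is outside [-1.0, 1.0]")
        specs.append({"condition": condition, "boost": float(boost)})
    return {"condition_boost_specs": specs}


# Promote documents in the "cloudsql" group and bury those in "databases".
boost_spec = make_boost_spec([
    ('metadata_group: ANY("cloudsql")', 0.5),
    ('metadata_group: ANY("databases")', -0.5),
])
print(boost_spec)
```

The resulting dictionary could then be passed as `boost_spec=boost_spec` to `retriever.get_relevant_documents`, in the same way as the boosting cell in this section.
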
In [None]:
#@title Without Boosting
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
res = retriever.get_relevant_documents(SEARCH_QUERY)
res_json = retriever.pretty_results(res, 2, True)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization.",
        "pageNumber": "4"
      }
    ],
    "metadata": {
      "metadata_com": "google-cloud",
      "metadata_owner": "nutchanon",
      "metadata_group": "cloudsql"
    }
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-

In [None]:
#@title With Boosting (bury the "cloudsql" group)
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
BOOST_CONDITIONS = "metadata_group: ANY(\"cloudsql\")" #@param {type: "string"}
# A boost of -1.0 demotes (buries) documents that match the condition.
boost_spec = {'condition_boost_specs': [
    {'condition': BOOST_CONDITIONS, 'boost': -1.0}
]}
res = retriever.get_relevant_documents(SEARCH_QUERY, boost_spec=boost_spec)
res_json = retriever.pretty_results(res, 2, True)
print(res_json)

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/microservices_on_cloudsql_whitepaper.pdf",
    "extractive_answers": [
      {
        "pageNumber": "17",
        "content": "Database - Each microservice is deployed in a separate database within the same instance Microservices in this architecture can share a Cloud SQL instance but are deployed within a single isolated database. Cloud SQL (PostgreSQL, MySQL, SQL Server) can have multiple databases per instance."
      }
    ],
    "metadata": {
      "metadata_com": "google-cloud",
      "metadata_owner": "nutchanon",
      "metadata_group": "databases"
    }
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/002",
    "id": "002

## 8. Metadata Dynamic Facets
Facets can be displayed next to the search results to help users narrow the results down to the ones that are relevant to them.

Dynamic facets are returned based on your search query, so different queries can return different facet values.
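
One way facets might feed back into filtering is sketched below: the values a user selects are turned into a filter expression. This is an illustration under assumptions. The function name `facets_to_filter` is hypothetical, the sample facet list is made up (it only mirrors the shape printed by `retriever.get_facets()` in this notebook), and the use of `ANY(...)` with several values and of `AND` between clauses is assumed from the filter syntax seen in the boost condition earlier.

```python
def facets_to_filter(facets, selected):
    """Turn user-selected facet values into a filter expression.

    facets:   list shaped like the notebook's get_facets() output,
              e.g. [{'key': 'metadata_group', 'values': ['databases']}]
    selected: dict mapping a facet key to the values the user picked.
    """
    clauses = []
    for facet in facets:
        key = facet["key"]
        # Keep only selections that were actually returned as facet values.
        chosen = [v for v in selected.get(key, []) if v in facet["values"]]
        if chosen:
            quoted = ", ".join(f'"{v}"' for v in chosen)
            clauses.append(f"{key}: ANY({quoted})")
    # Assumption: clauses for different keys are combined with AND.
    return " AND ".join(clauses)


# Made-up sample data in the same shape as the get_facets() output.
facets = [{"key": "metadata_group", "values": ["cloudsql", "databases"]}]
print(facets_to_filter(facets, {"metadata_group": ["cloudsql"]}))
```
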

In [None]:
#@title Return dynamic facets on metadata "group"
SEARCH_QUERY = "what is cloud structured query language?" #@param {type: "string"}
facets = [{'facet_key': {'key': 'metadata_group'}, 'limit': 10}]
res = retriever.get_relevant_documents(query=SEARCH_QUERY, facet_specs=facets)
res_json = retriever.pretty_results(res, 2, True)
print(res_json)
print(retriever.get_facets())

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/009",
    "id": "009",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/hbr_analytic_services_cloud_databases_report.pdf",
    "extractive_answers": [
      {
        "content": "Key Attributes of Modern Cloud Databases As companies look to improve efficiencies, spur innovation, and create compelling customer experiences, they should carefully determine whether their operational databases provide the scaling, availability, security, data life cycle integration, freedom, and flexibility they require.",
        "pageNumber": "6"
      }
    ],
    "metadata": {
      "metadata_com": "google-cloud",
      "metadata_owner": "nutchanon",
      "metadata_group": "databases"
    }
  }
]
[{'key': 'metadata_group', 'values': ['databases']}]


In [None]:
#@title Try on another search query, and see how dynamic facets work
SEARCH_QUERY = "what is cloud sql?" #@param {type: "string"}
facets = [{'facet_key': {'key': 'metadata_group'}, 'limit': 10}]
res = retriever.get_relevant_documents(query=SEARCH_QUERY, facet_specs=facets)
res_json = retriever.pretty_results(res, 2, True)
print(res_json)
print(retriever.get_facets())

[
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/001",
    "id": "001",
    "link": "gs://nutchanon-genapp-storage-01/databases-pdfs/idc-business-value-cloud-sql-analyst-report.pdf",
    "extractive_answers": [
      {
        "content": "Among the many services that Google Cloud offers is Cloud SQL, which can deploy and manage database servers on behalf of subscription customers. This service includes all maintenance activities, as well as database tuning and optimization.",
        "pageNumber": "4"
      }
    ],
    "metadata": {
      "metadata_com": "google-cloud",
      "metadata_group": "cloudsql",
      "metadata_owner": "nutchanon"
    }
  },
  {
    "name": "projects/939158786986/locations/global/collections/default_collection/dataStores/nutchanon-google-databases_1687922379274/branches/0/documents/010",
    "id": "010",
    "link": "gs://nutchanon-genapp-storage-

## 9. Search Autocomplete
Enterprise Search can provide autocompletion for the search bar, which gives users a better search experience.

You can configure the autocompletion function to draw suggestions from different sources:
* **document** - Suggestions generated from user-imported documents
* **search-history (default)** - Suggestions generated from past search history
* **user-event** - Suggestions generated from user-imported search events
* **document-completable** - Suggestions taken directly from user-imported document fields marked as completable
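
The sketch below shows one possible way to use these options together: try several query models in order and return the first non-empty list of suggestions. The fallback strategy, the function name `suggest_with_fallback`, and the canned data in `fake_complete` are all assumptions made for illustration; none of them comes from the notebook or the API.

```python
QUERY_MODELS = ("document", "search-history", "user-event", "document-completable")


def suggest_with_fallback(complete_fn, prefix, models=("search-history", "document")):
    """Return (model, suggestions) from the first model that yields any.

    complete_fn is any callable taking (prefix, query_model) and returning
    a list of strings.
    """
    for model in models:
        if model not in QUERY_MODELS:
            raise ValueError(f"unknown query model: {model}")
        suggestions = complete_fn(prefix, model)
        if suggestions:
            return model, suggestions
    return None, []


# Stand-in for a real call, so the sketch runs without a data store.
def fake_complete(prefix, query_model):
    canned = {"document": ["what is cloud sql?"]}
    return [s for s in canned.get(query_model, []) if s.startswith(prefix)]


print(suggest_with_fallback(fake_complete, "wh"))
```

With the notebook's retriever, `complete_fn` could be a thin wrapper such as `lambda p, m: retriever.get_autocompletes(search_complete=p, query_model=m)`.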

In [None]:
#@title Perform autocomplete for "wh"
SEARCH_COMPLETE = "wh" #@param {type: "string"}
res = retriever.get_autocompletes(search_complete=SEARCH_COMPLETE, query_model='search-history')
print(res)

['what is cloud sql?']


# Congratulations!

## I hope you've learned a lot from this notebook.

Please stay tuned for new features of Enterprise Search.

I'm Nutchanon, a Googler, and I want to say "Thank you" to all of you.