## Vertex AI Search > Data Source Access Control



Refs:

https://cloud.google.com/generative-ai-app-builder/docs/data-source-access-control#acl-storage-unstructured



## Pre-requisites 

* Setup GCP




## Setup
inputs:

In [29]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
print(PROJECT_ID)

demos-vertex-ai


In [30]:
from google.cloud import storage


import pandas as pd
from sklearn import datasets

parameters:

In [31]:
# PROJECT_ID = '' # set above
REGION = 'us-central1'
EXPERIMENT = 'search-alphabet-investor-pdfs'
SERIES = "generative-ai"

In [32]:
BUCKET = SERIES + EXPERIMENT 
BUCKET_URI = f"gs://{BUCKET}"

### Create Storage Bucket

In [33]:
gcs = storage.Client(project = PROJECT_ID)

In [34]:
if not gcs.lookup_bucket(BUCKET):
    bucketDef = gcs.bucket(BUCKET)
    bucket = gcs.create_bucket(bucketDef, project=PROJECT_ID, location=REGION)
    print(bucket)
else:
    print(gcs.lookup_bucket(BUCKET))

<Bucket: generative-aisearch-alphabet-investor-pdfs>


## ingest data into GCS



### PDFs 

TODO - copy from public gcs folder to one we created

gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs

#### Upload

In [36]:
! gsutil -m cp gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/* $BUCKET_URI

Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/20040630_google_10Q.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/20040930_google_10Q.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2004Q3_earnings.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2004_google_annual_report.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2004Q4_earnings_google.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/2005Q3_earnings_google.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/20050331_google_10Q.pdf [Content-Type=application/pdf]...
Copying gs://cloud-samples-data/gen-app-

### metadata 

#### format

following format

https://cloud.google.com/generative-ai-app-builder/docs/data-source-access-control#acl-storage-unstructured

```json
metadata = {
   "id": "",
   "jsonData": "",
   "content": {
     "mimeType": "<application/pdf>",
     "uri": "gs://generative-aisearch-alphabet-investor-pdfs/20040630_google_10Q.pdf"
   },
   "acl_info": {
     "readers": [
       {
         "principals": [
           { "group_id": "group_1" },
           { "user_id": "user_1" }
         ]
       }
     ]
   }
 }
```


#### Create JSON meta data 

Create JSON file of metadata for setting acl rules.

To start, we simply specify ACLs for a single file.

https://cloud.google.com/generative-ai-app-builder/docs/data-source-access-control#acl-storage-unstructured

In [50]:
# TODO - add ACL for all files 
## get list of files from GCS 
## pick 5 files to be "secret"
## add bruce to all except "secret"
## save file
## upload file
## create new datastore and search App

In [45]:
import json

metadata_filename = "metadata.jsonl"

metadata = {
   "id": "",
   "jsonData": "",
   "content": {
     "mimeType": "<application/pdf>",
     "uri": "gs://generative-aisearch-alphabet-investor-pdfs/20040630_google_10Q.pdf"
   },
   "acl_info": {
     "readers": [
       {
         "principals": [
           { "user_id": "bruce@justinjm.altostrat.com"}
         ]
       }
     ]
   }
 }

# Convert the dictionary to a JSON string
json_string = json.dumps(metadata)

# Write to a .jsonl file
with open(metadata_filename, 'w') as file:
    file.write(json_string + '\n')
    # file.write(json_string)

#### upload

upload metadata file just created

In [48]:
! gsutil -m cp $metadata_filename $BUCKET_URI/$metadata_filename

Copying file://metadata.jsonl [Content-Type=application/octet-stream]...
/ [1/1 files][  245.0 B/  245.0 B] 100% Done                                    
Operation completed over 1 objects/245.0 B.                                      


## Create Vertex AI Search Datastore


TODO - ???



### Ingest data from Cloud Storage 

Create datastore via UI

* https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#cloud-storage
* https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#discoveryengine_v1_generated_DocumentService_ImportDocuments_sync-python

TODO - update SCRIPT

In [None]:
# TODO  API:When creating data store, include the flag "aclEnabled": "true" in your JSON payload.
# https://cloud.google.com/generative-ai-app-builder/docs/data-source-access-control#acl-storage-unstructured

from typing import Optional

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Must specify either `gcs_uri` or (`bigquery_dataset` and `bigquery_table`)
# Format: `gs://bucket/directory/object.json` or `gs://bucket/directory/*.json`
# gcs_uri = "YOUR_GCS_PATH"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"


# def import_documents_sample(
#     project_id: str,
#     location: str,
#     data_store_id: str,
#     gcs_uri: Optional[str] = None,
#     bigquery_dataset: Optional[str] = None,
#     bigquery_table: Optional[str] = None,
# ) -> str:
#     #  For more information, refer to:
#     # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
#     client_options = (
#         ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
#         if location != "global"
#         else None
#     )

#     # Create a client
#     client = discoveryengine.DocumentServiceClient(client_options=client_options)

#     # The full resource name of the search engine branch.
#     # e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
#     parent = client.branch_path(
#         project=project_id,
#         location=location,
#         data_store=data_store_id,
#         branch="default_branch",
#     )

#     if gcs_uri:
#         request = discoveryengine.ImportDocumentsRequest(
#             parent=parent,
#             gcs_source=discoveryengine.GcsSource(
#                 input_uris=[gcs_uri], data_schema="custom"
#             ),
#             # Options: `FULL`, `INCREMENTAL`
#             reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
#         )
#     else:
#         request = discoveryengine.ImportDocumentsRequest(
#             parent=parent,
#             bigquery_source=discoveryengine.BigQuerySource(
#                 project_id=project_id,
#                 dataset_id=bigquery_dataset,
#                 table_id=bigquery_table,
#                 data_schema="custom",
#             ),
#             # Options: `FULL`, `INCREMENTAL`
#             reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
#         )

#     # Make the request
#     operation = client.import_documents(request=request)

#     print(f"Waiting for operation to complete: {operation.operation.name}")
#     response = operation.result()

#     # Once the operation is complete,
#     # get information from operation metadata
#     metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

#     # Handle the response
#     print(response)
#     print(metadata)

#     return operation.operation.name


## Create Vertex AI Search App 

TODO - console


* https://cloud.google.com/generative-ai-app-builder/docs/create-engine-es

