# Exercise 3.2: Run data pipeline to vectorize documents

Instead of performing every step yourself, as shown in the previous exercise, you can use the Pipeline API.

The pipeline collects documents and segments the data into chunks. It generates embeddings, which are multidimensional representations of textual information, and stores them efficiently in the vector database.
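To make the "segments the data into chunks" step more concrete, here is a minimal sketch of overlapping, fixed-size chunking. It is only an illustration: the function name `chunk_text` and the parameters `chunk_size` and `overlap` are assumptions made for this example, and the chunking strategy that the Pipeline API actually applies is not described in this exercise.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper for illustration only; not part of the Pipeline API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence that straddles a boundary is still partly visible in both chunks.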

In this exercise you will perform the following steps:
* Perform initial one-time admin tasks
    1. Create a generic secret
    2. Upload data to your object store
* Prepare the vector knowledge base
    1. Configure the Pipeline API to read files from the object store and store them in the vector database.


## Create a generic secret for Object Store 

Before you can prepare your data for the Pipeline API, you must create a generic secret at the resource group level. Secrets are a means of allowing and controlling connections across directories and tools without compromising your credentials.

The grounding module in AI Core currently supports the following data repositories:
* Microsoft SharePoint
* AWS S3
* SFTP
* SAP Build Work Zone
* SAP Document Management Service

For this hands-on session we will use the AWS S3 object store as the data repository.

In [1]:

import init_env
import variables

init_env.set_environment_variables()

In [2]:
from ai_core_sdk.ai_core_v2_client import AICoreV2Client
import os, json
import base64

# Get the service key and instantiate the AI API client
# aic_config_file = "/Users/d054176/Downloads/aicore_service_key.json"
# with open(aic_config_file, 'rb') as config:
#     aic_config_file = json.load(config)



client = AICoreV2Client(
    base_url=os.environ["AICORE_BASE_URL"] + '/v2',
    auth_url=os.environ["AICORE_AUTH_URL"],
    client_id=os.environ["AICORE_CLIENT_ID"],
    client_secret=os.environ["AICORE_CLIENT_SECRET"],
    resource_group=os.environ["AICORE_RESOURCE_GROUP"]
)

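# Load the object store service key (adjust the path to your own download location)
os_config_path = "/Users/d054176/Downloads/objectstore_skey.json"
with open(os_config_path, 'rb') as config:
    os_config_file = json.load(config)  # the with-block closes the file automatically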

access_key = os_config_file['access_key_id']
secret = os_config_file['secret_access_key']
bucket = os_config_file['bucket']
uri = os_config_file['uri']
user = os_config_file['username']

To create the generic secret, we send a POST request to the URL {{apiurl}}/v2/admin/secrets.

**Note**:
* Every value in the *data* dictionary needs to be base64-encoded.
* The labels need to contain the keys *"ext.ai.sap.com/document-grounding"* (value true) and *"ext.ai.sap.com/documentRepositoryType"* (value S3). They are needed to enable grounding and to declare S3 as the repository source.
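The two requirements in the note can be checked locally before the request is sent. The following is a minimal sketch under stated assumptions: `is_base64` and `check_secret_body` are hypothetical helper names introduced here, not part of any SAP SDK, and the expected label values are taken from the request body used in this exercise. The base64 test is only a heuristic, because a plain value that happens to consist of valid base64 characters cannot be detected.

```python
import base64
import binascii

# Label keys and values as used in this exercise
REQUIRED_LABELS = {
    "ext.ai.sap.com/document-grounding": "true",
    "ext.ai.sap.com/documentRepositoryType": "S3",
}


def is_base64(value: str) -> bool:
    """Return True if the string decodes as strict base64 (heuristic check)."""
    try:
        base64.b64decode(value, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


def check_secret_body(body: dict) -> list:
    """Collect human-readable problems found in a generic secret request body."""
    problems = []
    for key, value in body.get("data", {}).items():
        if not isinstance(value, str) or not is_base64(value):
            problems.append(f"data[{key!r}] is not base64-encoded")

    labels = {label.get("key"): label.get("value") for label in body.get("labels", [])}
    for key, expected in REQUIRED_LABELS.items():
        if labels.get(key) != expected:
            problems.append(f"label {key!r} should have value {expected!r}")
    return problems
```

An empty result list means that both requirements from the note are met, for example when calling `check_secret_body(body)` on the request body right before the POST request.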

In [None]:

def b64(val):
    """Base64-encode a string and return the result as a string."""
    return base64.b64encode(val.encode("utf-8")).decode("utf-8")

def secret_dict():
    return {
        "name": "aws3-secret-3",
        "data": {
            "url": b64("https://s3-eu-central-1.amazonaws.com"),
            "authentication": b64("NoAuthentication"),
            "description": b64("For Grounding"),
            "access_key_id": b64(access_key),
            "bucket": b64(bucket),
            "host": b64("s3-eu-central-1.amazonaws.com"),
            "region": b64("eu-central-1"),
            "secret_access_key": b64(secret),
            "username": b64(user)
        },
        "labels": [
            {
                "key": "ext.ai.sap.com/document-grounding",
                "value": "true"
            },
            {
                "key": "ext.ai.sap.com/documentRepositoryType",
                "value": "S3"
            }
        ]
    }

# Request body: name, data and labels of the generic secret
body = secret_dict()

import requests

print(client.rest_client.base_url)

# Create the generic secret at resource group level
response_dict = requests.post(
    url=f"{client.rest_client.base_url}/admin/secrets",
    headers={
        "Content-Type": "application/json",
        "AI-Tenant-Scope": "false",
        "Authorization": client.rest_client.get_token(),
        "AI-Resource-Group": "AI167"  # replace with your own resource group
    },
    json=body
)
print(response_dict)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2
<Response [200]>


## Upload files to your S3 bucket

Upload the files via Bruno. The setup was already completed in the getting-started material.

## Create Data Pipeline

In [4]:
from gen_ai_hub.proxy import get_proxy_client
from gen_ai_hub.document_grounding.client import PipelineAPIClient
from gen_ai_hub.document_grounding.models.pipeline import S3PipelineCreateRequest, CommonConfiguration

aicore_client = get_proxy_client()
print(aicore_client.resource_group)
print(aicore_client.base_url)

pipeline_api_client = PipelineAPIClient(aicore_client)

# Name of the generic secret created above
generic_secret_s3_bucket = "aws3-secret-3"
s3_config = S3PipelineCreateRequest(configuration=CommonConfiguration(destination=generic_secret_s3_bucket))
response = pipeline_api_client.create_pipeline(s3_config)

print(f"Reference the Vector knowledge base using the pipeline ID: {response.pipelineId}")

None
None
Reference the Vector knowledge base using the pipeline ID: 4fa94134-4d5d-4cfc-90ef-047767d2ddf6


In [9]:
# check the status of the vectorization pipeline until it is completed
print(pipeline_api_client.get_pipeline_status(response.pipelineId))

lastStarted='' status='NEW'


In [5]:
print(pipeline_api_client.get_pipeline_status('c3b0d2bf-d6f6-46aa-9924-69546e9d0454'))

lastStarted='2025-09-29T15:22:10.000Z' status='FINISHED'
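Instead of re-running the status cell by hand, the check can be wrapped in a small polling loop. This is a sketch under stated assumptions: `wait_for_pipeline` and its parameters are hypothetical names introduced here, and only the states NEW and FINISHED appear in this exercise, so the loop relies on a timeout rather than on any assumed failure state. The status function is passed in as a callable so that the helper works with any object that exposes a `status` attribute, as the output above does.

```python
import time
from typing import Callable, Iterable


def wait_for_pipeline(
    get_status: Callable[[str], object],
    pipeline_id: str,
    done_states: Iterable[str] = ("FINISHED",),
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(pipeline_id).status until it reaches one of done_states.

    Hypothetical helper; raises TimeoutError if no done state is reached in time.
    """
    done = set(done_states)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status(pipeline_id).status
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Pipeline {pipeline_id} still in state {status!r} after {timeout_seconds} s"
            )
        sleep(poll_seconds)


# Possible usage with the client from this exercise (not executed here):
# wait_for_pipeline(pipeline_api_client.get_pipeline_status, response.pipelineId)
```

Injecting `sleep` as a parameter keeps the helper easy to try out without actually waiting between polls.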


* Retrieval 
* Grounding Orchestration Layer

> 🟨 **TODO:**  
> _Add Prompt_