# Exercise 3.3: Run data pipeline to vectorize documents

Instead of performing every step yourself, as shown in the previous exercise, you can use the Pipeline API.

The pipeline collects documents and segments the data into chunks. It generates embeddings, which are multidimensional representations of textual information, and stores them efficiently in the vector database.
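
To make the chunk-and-embed idea more concrete, here is a minimal local sketch. It is not how the Pipeline API works internally: the chunk size, the overlap, and the `toy_embedding` function (normalized letter counts) are assumptions chosen purely for illustration, and a real pipeline uses a trained embedding model and a vector database.

```python
import math
from collections import Counter

def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping character windows (illustrative sizes)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def toy_embedding(chunk):
    """Toy stand-in for an embedding model: unit-length vector of letter counts (26 dims)."""
    counts = Counter(c for c in chunk.lower() if "a" <= c <= "z")
    vec = [counts.get(chr(ord("a") + i), 0) for i in range(26)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

document = "Kasimir is the mascot of SAP TechEd. " * 3
chunks = chunk_text(document)

# A plain list plays the role of the vector database here.
vector_store = [(chunk, toy_embedding(chunk)) for chunk in chunks]
print(len(chunks), "chunks,", len(vector_store[0][1]), "dimensions each")
```

The overlap means that neighbouring chunks share some text, so a sentence cut at a chunk boundary still appears whole in at least one chunk.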

In this exercise you will:
* Perform the initial one-time admin task: create a generic secret.
* Prepare the vector knowledge base: configure the Pipeline API to read files from the object store and store them in the vector database.


## Create a generic secret for Object Store 

First, we must create a generic secret at the resource group level. Secrets allow and control connections across directories and tools without compromising your credentials.

In [None]:
import init_env
init_env.set_environment_variables()

To create the generic secret, we send a POST request to ```{{apiurl}}/v2/admin/secrets```.

**Note**: 
* Every value in the *data* dictionary must be base64-encoded. 
* The labels must contain the key-value pairs *"ext.ai.sap.com/document-grounding"* with value true and *"ext.ai.sap.com/documentRepositoryType"* with value S3. These enable grounding and declare S3 as the repository source. 

In [None]:
import base64

def b64(val):
    # Encode a string as base64 and return the result as a string.
    return base64.b64encode(val.encode("utf-8")).decode("utf-8")


In [None]:

import os  # needed here: the values below are read from os.environ

def secret_dict():
    """Build the request body for the generic secret; every value in 'data' is base64-encoded."""
    return {
        "name": "aws3-secret-3",
        "data": {
            "url": b64("https://s3-eu-central-1.amazonaws.com"),
            "authentication": b64("NoAuthentication"),
            "description": b64("For Grounding"),
            "access_key_id": b64(os.environ["ACCESS_KEY_ID"]),
            "bucket": b64(os.environ["BUCKET"]),
            "host": b64("s3-eu-central-1.amazonaws.com"),
            "region": b64("eu-central-1"),
            "secret_access_key": b64(os.environ["SECRET"]),
            "username": b64(os.environ["USER"]),
        },
        # These labels enable grounding and declare S3 as the repository source.
        "labels": [
            {
                "key": "ext.ai.sap.com/document-grounding",
                "value": "true"
            },
            {
                "key": "ext.ai.sap.com/documentRepositoryType",
                "value": "S3"
            }
        ]
    }

# The request body contains the name, data, and labels of the secret.
body = secret_dict()
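
Before sending the request, it can help to check the two rules from the note above locally. The following is a minimal sketch: `validate_secret_body` and `sample_body` are hypothetical names introduced here for illustration and are not part of any SAP SDK.

```python
import base64
import binascii

# Labels required by the note above.
REQUIRED_LABELS = {
    "ext.ai.sap.com/document-grounding": "true",
    "ext.ai.sap.com/documentRepositoryType": "S3",
}

def validate_secret_body(body):
    """Return a list of problems found in a secret request body (hypothetical helper)."""
    problems = []
    for key, value in body.get("data", {}).items():
        try:
            base64.b64decode(value, validate=True)
        except (binascii.Error, ValueError):
            problems.append(f"data['{key}'] is not valid base64")
    labels = {label["key"]: label["value"] for label in body.get("labels", [])}
    for key, expected in REQUIRED_LABELS.items():
        if labels.get(key) != expected:
            problems.append(f"label '{key}' should be '{expected}'")
    return problems

# Deliberately broken example body: one plain-text value and one missing label.
sample_body = {
    "name": "example-secret",
    "data": {
        "region": base64.b64encode(b"eu-central-1").decode("utf-8"),
        "bucket": "not base64!",
    },
    "labels": [{"key": "ext.ai.sap.com/document-grounding", "value": "true"}],
}
for problem in validate_secret_body(sample_body):
    print(problem)
```

Calling `validate_secret_body(body)` on the real request body should return an empty list. Note that the check only catches values that cannot be base64 at all; a short plain-text value that happens to consist of valid base64 characters would pass.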


In [None]:
from ai_core_sdk.ai_core_v2_client import AICoreV2Client
import os

client = AICoreV2Client(
    base_url=os.environ["AICORE_BASE_URL"] + "/v2",
    auth_url=os.environ["AICORE_AUTH_URL"],
    client_id=os.environ["AICORE_CLIENT_ID"],
    client_secret=os.environ["AICORE_CLIENT_SECRET"],
    resource_group=os.environ["AICORE_RESOURCE_GROUP"],
)

In [None]:
import requests

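# Send the secret to the admin endpoint ({{apiurl}}/v2/admin/secrets).
response = requests.post(
    url=f"{client.rest_client.base_url}/admin/secrets",
    headers={
        "Content-Type": "application/json",
        "AI-Tenant-Scope": "false",
        "Authorization": client.rest_client.get_token(),
        "AI-Resource-Group": "AI167",  # resource group in which the secret is created
    },
    json=body,
)
# Print the HTTP status code and the response body returned by the service.
print(response.status_code, response.text)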

## Create Data Pipeline

In [None]:
from gen_ai_hub.proxy import get_proxy_client
from gen_ai_hub.document_grounding.client import PipelineAPIClient
from gen_ai_hub.document_grounding.models.pipeline import S3PipelineCreateRequest, CommonConfiguration


In [None]:

aicore_client = get_proxy_client()
pipeline_api_client = PipelineAPIClient(aicore_client)

In [None]:

generic_secret_s3_bucket = "aws3-secret-3"
# The destination is the name of the generic secret created above.
s3_config = S3PipelineCreateRequest(
    configuration=CommonConfiguration(destination=generic_secret_s3_bucket)
)


In [None]:

response = pipeline_api_client.create_pipeline(s3_config)
print(f"Reference the Vector knowledge base using the pipeline ID: {response.pipelineId}")

In [None]:
# check the status of the vectorization pipeline until it is completed
print(pipeline_api_client.get_pipeline_status(response.pipelineId))
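
Instead of re-running the status cell by hand, you could poll in a loop. The sketch below is generic and runs without a service: `wait_for_status` is a hypothetical helper, the intermediate status names in the stand-in are made up, and you would need to adapt the `get_status` callable to whatever `get_pipeline_status` actually returns in your SDK version.

```python
import time

def wait_for_status(get_status, target="FINISHED", interval_s=10.0, timeout_s=600.0):
    """Call get_status() repeatedly until it returns target or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last status was {status!r}, expected {target!r}")
        time.sleep(interval_s)

# Stand-in for the real status call; "PENDING" and "RUNNING" are placeholder names.
fake_statuses = iter(["PENDING", "RUNNING", "FINISHED"])
print(wait_for_status(lambda: next(fake_statuses), interval_s=0.01))
```

The timeout keeps the notebook from hanging forever if the pipeline never reaches the target status.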

Once the status switches to ```'FINISHED'```, the vectorization is complete and we can continue with the next steps. Our PDF is now vectorized and stored in the HANA Vector Store. 

To see all pipelines, run ```get_pipelines()```. This lists all the pipelines in your resource group. 

In [None]:
print(pipeline_api_client.get_pipelines())

🎉 Congratulations, you successfully created your first data repository via the Pipeline API. 🎉  
Now let us use this data repository to generate more accurate responses. 
You will again use the Orchestration Service, as we did in Exercise 2. 

## Use the data repository to ground the response

### Assign the model you want to use 

In [None]:
from gen_ai_hub.orchestration.models.llm import LLM

llm = LLM(
    name="gemini-2.5-flash",
    parameters={
        'temperature': 0.0,
    }
)

### Create a prompt Template

This time we would like to get answers to questions about SAP TechEd 2025 and the mascot Kasimir.

In [None]:
from gen_ai_hub.orchestration.models.template import Template
from gen_ai_hub.orchestration.models.message import SystemMessage, UserMessage

template = Template(
            messages=[
                SystemMessage("You are a helpful SAP TechEd assistant."),
                UserMessage("""Answer the request by providing relevant answers that fit to the request.
                Request: {{ ?user_query }}
                Context:{{ ?grounding_response }}
                """),
            ]
        )
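
The ```{{ ?user_query }}``` and ```{{ ?grounding_response }}``` placeholders in the template are filled in when the request runs. As a rough local illustration of that substitution (an assumption about the general idea, not the Orchestration Service's actual implementation), the hypothetical helper `fill_placeholders` below replaces each placeholder with a value from a dictionary:

```python
import re

# Matches placeholders of the form {{ ?name }} and captures the name.
PLACEHOLDER = re.compile(r"\{\{\s*\?(\w+)\s*\}\}")

def fill_placeholders(text, values):
    """Replace every {{ ?name }} with values[name]; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)

prompt = "Request: {{ ?user_query }}\nContext:{{ ?grounding_response }}"
print(fill_placeholders(prompt, {
    "user_query": "Does Kasimir like dogs?",
    "grounding_response": "<retrieved chunks would appear here>",
}))
```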

### Define the data repository
We again need to configure the Grounding Module. First, we specify the data repository that we want to use via the **filter** parameter. 
We will use the repository that we created via the Pipeline API. 


In [None]:
from gen_ai_hub.orchestration.models.document_grounding import DocumentGroundingFilter
from gen_ai_hub.orchestration.models.document_grounding import DataRepositoryType
filters = [
    DocumentGroundingFilter(
        id="KasimirTechEd2025",
        data_repository_type=DataRepositoryType.VECTOR.value,
        # Replace with the ID of your own repository (see the pipeline ID printed above).
        data_repositories=["50beb7ac-de7d-4127-b74b-3515146cbdea"],
    )
]

### Create Grounding Configuration
Next, we create the grounding configuration using **GroundingModule**, which manages and applies grounding configurations.

In [None]:

from gen_ai_hub.orchestration.models.document_grounding import GroundingModule
from gen_ai_hub.orchestration.models.document_grounding import GroundingType
from gen_ai_hub.orchestration.models.document_grounding import DocumentGrounding

grounding_config = GroundingModule(
            type=GroundingType.DOCUMENT_GROUNDING_SERVICE.value,
            config=DocumentGrounding(input_params=["user_query"], output_param="grounding_response", filters=filters)
        )

### Create orchestration configuration including Grounding Config

In [None]:
from gen_ai_hub.orchestration.models.config import OrchestrationConfig

config = OrchestrationConfig(
    template=template,
    llm=llm,
    grounding=grounding_config
)

### Execute the Query
We pass the configuration to the OrchestrationService again and then run it to retrieve the answer.

In [None]:
import importlib
import variables
from gen_ai_hub.orchestration.models.template import TemplateValue
from gen_ai_hub.orchestration.service import OrchestrationService

variables = importlib.reload(variables)

orchestration_service = OrchestrationService(
    api_url=variables.AICORE_ORCHESTRATION_DEPLOYMENT_URL,
    config=config
)

response = orchestration_service.run(
    template_values=[
        TemplateValue( 
            name="user_query",
            value="Does Kasimir like dogs?"
        )
    ]
)

print(response.orchestration_result.choices[0].message.content)