# Deploy the RAG Pipeline on SAP AI Core

SAP AI Core is a service on the SAP Business Technology Platform for managing AI assets in a scalable, standardized, and cloud-agnostic way. It integrates seamlessly with SAP solutions, supports open-source frameworks, and enables full lifecycle management of AI scenarios, including generative AI and prompt management through the Generative AI Hub.

In this notebook, we will walk you through a step-by-step process to deploy the RAG Pipeline on SAP AI Core.

### Prerequisites
- SAP BTP Enterprise Account
- SAP AI Core with "Extended" [Service Plan](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/service-plans)
- SAP AI Launchpad with "Standard" [Service Plan](https://help.sap.com/docs/ai-launchpad/sap-ai-launchpad/service-plans)
- A [personal access token (PAT)](https://docs.docker.com/security/for-developers/access-tokens/) to access the Docker registry
- A [GitHub repository](https://docs.github.com/en/get-started/quickstart/create-a-repo) and [personal access token (PAT)](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) to access the repository

### Table of Contents

1.  [Step 1: Install SAP AI Core SDK](#install_aicore_sdk)

1.  [Step 2: Prepare input parameters, variables, and credentials](#input_params)

1.  [Step 3: Build the RAG Pipeline application container image](#build_image)

1.  [Step 4: Prepare the serving template](#prepare_serving_template)

1.  [Step 5: Connect to SAP AI Core](#connect_to_aicore)

1.  [Step 6: Onboard a GitHub repository to SAP AI Core](#onboard)

1.  [Step 7: Create a Docker registry secret](#create_docker_registry_secret)

1.  [Step 8: Create an Application](#create_application)

1.  [Step 9: Create a Configuration](#create_configuration)

1.  [Step 10: Create a Deployment](#create_deployment)

1.  [Step 11: Test the RAG Pipeline](#test_rag_pipeline)

1.  [Step 12: Stop the Deployment](#stop_deployment)

1.  [Summary](#summary)

<a id="install_aicore_sdk"></a>
## Step 1: Install SAP AI Core SDK

Run the code cell below to install SAP AI Core SDK, if it's missing.

In [1]:
try:
    from ai_core_sdk.ai_core_v2_client import AICoreV2Client
    %pip show ai-core-sdk
except ImportError:
    %pip install ai-core-sdk

Name: ai-core-sdk
Version: 2.4.12
Summary: SAP AI Core SDK
Home-page: https://www.sap.com/
Author: SAP SE
Author-email: 
License: SAP DEVELOPER LICENSE AGREEMENT
Location: /opt/homebrew/Caskroom/miniconda/base/envs/sap-genai-hub/lib/python3.11/site-packages
Requires: ai-api-client-sdk, click
Required-by: generative-ai-hub-sdk
Note: you may need to restart the kernel to use updated packages.


<a id="input_params"></a>
## Step 2: Prepare input parameters, variables, and credentials

Download the service key for your SAP AI Core instance, save it in the same directory as this Jupyter notebook, and name it `aicore_service_key.json`. The content of the file should look like this:

```json
{
    "clientid": "sb-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx!xxxxxxx|aicore!xxxx",
    "clientsecret": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx$***",
    "url": "https://***.authentication.***.hana.ondemand.com",
    "identityzone": "***",
    "identityzoneid": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "appname": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx!xxxxxxx|aicore!xxxx",
    "serviceurls": {
        "AI_API_URL": "https://api.ai.***.hana.ondemand.com"
    }
}
```

Create a file named `rag_pipeline_params.json` in the same directory as this Jupyter notebook. The content of the file should look like this:

```json
{
    "aicore_auth_url": "https://***.authentication.***.hana.ondemand.com",
    "aicore_client_id": "sb-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx!xxxxxxx|aicore!xxxx",
    "aicore_client_secret": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx$***",
    "aicore_ai_api_url": "https://api.ai.***.hana.ondemand.com",
    "aicore_resource_group": "***",
    "application_name": "***",
    "application_path": "***",
    "scenario_id": "***",
    "executable_id": "***",
    "configuration_name": "***",
    "orc_api_endpoint": "https://api.ai.***.hana.ondemand.com/v2/inference/deployments/***",
    "hana_db_host": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx.***.***.hanacloud.ondemand.com",
    "hana_db_user": "***",
    "hana_db_password": "***",
    "hana_db_table_name": "***"
}
```
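
Four of these values use the same placeholders as fields in the service key. As a minimal sketch, they could be derived from the service key rather than copied by hand. The helper name `aicore_params_from_service_key` is hypothetical, and the field mapping is an assumption based on the matching placeholders in the two sample files above.

```python
def aicore_params_from_service_key(service_key):
    """Map service key fields to the aicore_* entries of rag_pipeline_params.json.

    Assumption: the four aicore_* values are copies of the service key fields.
    """
    return {
        "aicore_auth_url": service_key["url"],
        "aicore_client_id": service_key["clientid"],
        "aicore_client_secret": service_key["clientsecret"],
        "aicore_ai_api_url": service_key["serviceurls"]["AI_API_URL"],
    }


# Example with made-up placeholder values (not real credentials)
example_key = {
    "clientid": "sb-example",
    "clientsecret": "example-secret",
    "url": "https://example.authentication.example.hana.ondemand.com",
    "serviceurls": {"AI_API_URL": "https://api.ai.example.hana.ondemand.com"},
}
derived = aicore_params_from_service_key(example_key)
print(derived["aicore_ai_api_url"])
```

The remaining entries (resource group, application, scenario, orchestration endpoint, and SAP HANA settings) are specific to your setup and still have to be filled in manually.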

Create a file named `github_info.json` in the same directory as this Jupyter notebook. The content of the file should look like this:

```json
{
    "name": "my-github-secret",
    "url": "https://github.com/kevinxhuang/myrepo",
    "username": "kevinxhuang",
    "password": "ghp_***"
}
```

Create a file named `docker_registry_secret.json` in the same directory as this Jupyter notebook. The content of the file should look like this:

```json
{
    "name": "my-dockerhub-secret",
    "data": {
        ".dockerconfigjson": "{\"auths\":{\"https://index.docker.io\":{\"username\":\"kevinxhuang\",\"password\":\"dckr_pat_***\"}}}"
    }
}
```
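
Escaping the nested JSON string by hand is error-prone. The sketch below builds the same structure with `json.dumps`, so the escaping is generated for you. The helper name `build_docker_registry_secret` is hypothetical, and the `username`/`password` layout inside `auths` is an assumption about the expected Docker config format.

```python
import json


def build_docker_registry_secret(name, registry_url, username, token):
    """Return a dict shaped like docker_registry_secret.json."""
    # Assumed layout: one registry entry holding a username and an access token
    docker_config = {
        "auths": {registry_url: {"username": username, "password": token}}
    }
    # The inner document is stored as a compact JSON string, not a nested object
    return {
        "name": name,
        "data": {".dockerconfigjson": json.dumps(docker_config, separators=(",", ":"))},
    }


secret = build_docker_registry_secret(
    "my-dockerhub-secret", "https://index.docker.io", "kevinxhuang", "dckr_pat_***"
)
# Writing this with json.dump(secret, file) produces the escaped form automatically
print(json.dumps(secret, indent=4))
```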

Load the input parameters, variables, and credentials from the JSON files:

In [2]:
import json

# Load the Service Key for the SAP AI Core instance
aicore_service_key_path = './aicore_service_key.json'
with open(aicore_service_key_path) as ask:
    aicore_service_key = json.load(ask)

# Load the input parameters, variables, and credentials
rag_pipeline_params_path = './rag_pipeline_params.json'
with open(rag_pipeline_params_path) as rpp:
    rag_pipeline_params = json.load(rpp)

# Load the GitHub credential
github_info_path = 'github_info.json'
with open(github_info_path) as ghi:
    github_info = json.load(ghi)

# Load the Docker registry credentials for creating a Docker registry secret
docker_registry_secret_path = 'docker_registry_secret.json'
with open(docker_registry_secret_path) as drs:
    docker_registry_secret = json.load(drs)
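
A missing or empty entry in `rag_pipeline_params.json` would otherwise only surface several steps later. As a minimal sketch, a check like the following could run right after loading. The helper name `find_missing_keys` is hypothetical, and the key list is taken from the sample file above.

```python
REQUIRED_PARAM_KEYS = [
    "aicore_auth_url", "aicore_client_id", "aicore_client_secret",
    "aicore_ai_api_url", "aicore_resource_group", "application_name",
    "application_path", "scenario_id", "executable_id", "configuration_name",
    "orc_api_endpoint", "hana_db_host", "hana_db_user", "hana_db_password",
    "hana_db_table_name",
]


def find_missing_keys(params, required):
    """Return the required keys that are absent or empty in params."""
    return [key for key in required if not params.get(key)]


# Example with an incomplete, made-up parameter set
sample_params = {"aicore_auth_url": "https://example.invalid", "scenario_id": ""}
missing = find_missing_keys(sample_params, REQUIRED_PARAM_KEYS)
print(f"{len(missing)} keys missing")
```

In the notebook, you would pass `rag_pipeline_params` instead of `sample_params` and stop if the returned list is not empty.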

<a id="build_image"></a>
## Step 3: Build the RAG Pipeline application container image


The complete source code and step-by-step instructions are available [here](../code/ai-core-rag-pipeline). They cover how to:
- Build a Docker container image
- Push it to a Docker registry

Take note of the Docker image URL. You'll need it to prepare the serving template in the next step.

<a id="prepare_serving_template"></a>
## Step 4: Prepare the serving template


Serving templates allow you to manage the deployment of a model or application at the main tenant level. They define how models or applications are deployed, and are used to deploy one or more models on a server that handles inference requests. These templates are stored in your Git repository for versioning.

In SAP AI Core, serving templates are mapped as executables, requiring specific metadata attributes. Models are deployed using a Kubernetes Custom Resource Definition (CRD) provided by KServe, with a YAML specification that outlines the necessary input parameters and artifacts.

We have also prepared a Git repository with a serving template for this example. The reference YAML and repository can be found [here](../code/ai-core-serving-template).

**NOTE:**
- You should create a separate Git repository for versioning the serving template.
- You can specify a resource plan for the deployment with `ai.sap.com/resourcePlan`. Otherwise, the default `starter` resource plan applies, which should be enough to run the RAG pipeline. Find out more about the available resource plans in the [SAP Help Portal](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/choose-resource-plan-train).
- You can configure autoscaling of the server with the `minReplicas` and `maxReplicas` parameters.
- You can use a Docker registry secret in SAP AI Core via `imagePullSecrets` to access a container image in a private container registry. If you are using an image from `docker.io`, the Docker registry secret must point to the URL `https://index.docker.io`. The name of the secret must match the one set in the `docker_registry_secret.json` file.
- You must set the name of the container to `kfserving-container` or `kserve-container`.
- Make sure the values of the following metadata match what's set in the `rag_pipeline_params.json` file:
  - `metadata.name` = `executable_id`
  - `metadata.labels.scenarios.ai.sap.com/id` = `scenario_id`

```yaml
apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: rag-pipeline-20241118
  annotations:
    scenarios.ai.sap.com/description: "RAG Pipeline"
    scenarios.ai.sap.com/name: "rag-pipeline"
    executables.ai.sap.com/description: "RAG Pipeline: retriever, building prompt, generation"
    executables.ai.sap.com/name: "rag-pipeline-executable"
  labels:
    scenarios.ai.sap.com/id: "rag-pipeline"
    ai.sap.com/version: "1.0.0"
spec:
  inputs:
    parameters:
      - name: myResourcePlan
        type: string
        default: Starter
      - name: AICORE_AUTH_URL
        type: string
      - name: AICORE_CLIENT_ID
        type: string
      - name: AICORE_CLIENT_SECRET
        type: string
      - name: AICORE_BASE_URL
        type: string
      - name: AICORE_RESOURCE_GROUP
        type: string
      - name: ORC_API_URL
        type: string
      - name: HANA_DB_HOST
        type: string
      - name: HANA_DB_USER
        type: string
      - name: HANA_DB_PASSWORD
        type: string
      - name: HANA_DB_TABLE_NAME
        type: string
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: 0
      labels: |
        ai.sap.com/resourcePlan: "{{inputs.parameters.myResourcePlan}}"
    spec: |
      predictor:
        imagePullSecrets:
          - name: my-dockerhub-secret
        minReplicas: 1
        maxReplicas: 5
        containers:
        - name: kserve-container
          image: docker.io/kevinxhuang/rag-pipeline:latest
          command: ["python3", "main.py"]
          ports:
            - containerPort: 3001
              protocol: TCP
          env:
            - name: AICORE_AUTH_URL
              value: "{{inputs.parameters.AICORE_AUTH_URL}}"
            - name: AICORE_CLIENT_ID
              value: "{{inputs.parameters.AICORE_CLIENT_ID}}"
            - name: AICORE_CLIENT_SECRET
              value: "{{inputs.parameters.AICORE_CLIENT_SECRET}}"
            - name: AICORE_BASE_URL
              value: "{{inputs.parameters.AICORE_BASE_URL}}"
            - name: AICORE_RESOURCE_GROUP
              value: "{{inputs.parameters.AICORE_RESOURCE_GROUP}}"
            - name: ORC_API_URL
              value: "{{inputs.parameters.ORC_API_URL}}"
            - name: HANA_DB_HOST
              value: "{{inputs.parameters.HANA_DB_HOST}}"
            - name: HANA_DB_USER
              value: "{{inputs.parameters.HANA_DB_USER}}"
            - name: HANA_DB_PASSWORD
              value: "{{inputs.parameters.HANA_DB_PASSWORD}}"
            - name: HANA_DB_TABLE_NAME
              value: "{{inputs.parameters.HANA_DB_TABLE_NAME}}"
```
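
Every `{{inputs.parameters.<name>}}` placeholder in the template has to correspond to a parameter declared under `spec.inputs.parameters`. The sketch below is one way to check this before committing the template, using only a regular expression over the file text. The helper name `undeclared_placeholders` and the parameter `HANA_DB_PORT` in the example are hypothetical and not part of the template above.

```python
import re

# Matches placeholders such as {{inputs.parameters.HANA_DB_HOST}}
PLACEHOLDER = re.compile(
    r"\{\{\s*inputs\.parameters\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}"
)


def undeclared_placeholders(template_text, declared):
    """Return placeholder names used in the text but missing from declared."""
    used = set(PLACEHOLDER.findall(template_text))
    return sorted(used - set(declared))


# Made-up excerpt: HANA_DB_PORT is referenced but not declared
snippet = '''
ai.sap.com/resourcePlan: "{{inputs.parameters.myResourcePlan}}"
value: "{{inputs.parameters.HANA_DB_HOST}}"
value: "{{inputs.parameters.HANA_DB_PORT}}"
'''
problems = undeclared_placeholders(snippet, ["myResourcePlan", "HANA_DB_HOST"])
print(problems)
```

A full check would parse the YAML to collect the declared names. Here they are passed in explicitly to keep the example free of third-party dependencies.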

<a id="connect_to_aicore"></a>
## Step 5: Connect to SAP AI Core

In [3]:

from ai_core_sdk.ai_core_v2_client import AICoreV2Client

# Create an AI API client instance
ai_core_client = AICoreV2Client(
    base_url = aicore_service_key["serviceurls"]["AI_API_URL"] + "/v2", # The present AI API version is 2
    auth_url = aicore_service_key["url"] + "/oauth/token",
    client_id = aicore_service_key['clientid'],
    client_secret = aicore_service_key['clientsecret']
)

# Query the number of GitHub repositories using the client instance
response = ai_core_client.repositories.query()
print(response.count)

1


<a id="onboard"></a>
## Step 6: Onboard a GitHub repository to SAP AI Core

You can use your own git repository to version control your SAP AI Core templates. The GitOps onboarding to SAP AI Core instances involves setting up your git repository and synchronizing your content.

Onboard a new GitHub repository:

In [4]:
ai_core_client.repositories.create(
    name = github_info["name"],
    url = github_info["url"],
    username = github_info["username"],
    password = github_info["password"]
)

<ai_core_sdk.models.base_models.Message at 0x10db449d0>

Execute the command below to check the onboarding status:

In [5]:
response = ai_core_client.repositories.query()

for repository in response.resources:
    if repository.name == github_info["name"]:
        print('Name:', repository.name)
        print('URL:', repository.url)
        print('Status:', repository.status)

Name: my-github-secret
URL: https://github.com/kevinxhuang/aicore-rag-pipeline
Status: RepositoryStatus.COMPLETED


<a id="create_docker_registry_secret"></a>
## Step 7: Create a Docker registry secret

The credentials for your Docker registries are managed using secrets. The following code cell sends the credentials through a POST request. The encoded data is then stored as a secret on SAP AI Core.

In [6]:
response = ai_core_client.docker_registry_secrets.create(
    name = docker_registry_secret["name"],
    data = docker_registry_secret["data"]
)

print(response.__dict__)

{'message': 'secret has been created'}


<a id="create_application"></a>
## Step 8: Create an Application

Create an `Application` in SAP AI Core.

In [7]:
response = ai_core_client.applications.create(
    application_name = rag_pipeline_params["application_name"],
    repository_url = github_info["url"],
    path = rag_pipeline_params["application_path"],
    revision = "HEAD"
)

print(response.__dict__)

{'id': 'my-rag-pipeline-app', 'message': 'Application has been successfully created.'}


In [8]:
import time

time.sleep(60)
response = ai_core_client.applications.get_status(application_name=rag_pipeline_params["application_name"])

print(response.__dict__)
print('*'*80)
print(response.sync_ressources_status[0].__dict__)

{'health_status': 'Healthy', 'sync_status': 'Synced', 'message': 'successfully synced (all tasks run)', 'source': <ai_core_sdk.models.application_source.ApplicationSource object at 0x108704f90>, 'sync_finished_at': '2024-11-19T07:02:11Z', 'sync_started_at': '2024-11-19T07:02:11Z', 'reconciled_at': '2024-11-19T07:02:12Z', 'sync_resources_status': [<ai_core_sdk.models.application_resource_sync_status.ApplicationResourceSyncStatus object at 0x10871c510>], 'sync_ressources_status': [<ai_core_sdk.models.application_resource_sync_status.ApplicationResourceSyncStatus object at 0x10871c510>]}
********************************************************************************
{'name': 'rag-pipeline-20241118', 'kind': 'ServingTemplate', 'status': 'Synced', 'message': 'servingtemplate.ai.sap.com/rag-pipeline-20241118 created'}


After the application is synced, a `Scenario` is automatically created in SAP AI Core. Its name and ID are the same as those specified in your serving template. Once syncing completes, the serving template is recognized as an executable.

<a id="create_configuration"></a>
## Step 9: Create a Configuration

Here are the important pieces of your configuration:

- The `scenario_id` should contain the same value as in your executable.
- The `executable_id` is the value of `metadata.name` in your serving template.

In [9]:
# Bind each serving template parameter to its value from rag_pipeline_params.json
from ai_core_sdk.models import ParameterBinding

response = ai_core_client.configuration.create(
    name = rag_pipeline_params["configuration_name"],
    resource_group = "default",
    scenario_id = rag_pipeline_params["scenario_id"],
    executable_id = rag_pipeline_params["executable_id"],
    parameter_bindings = [
        ParameterBinding(key="AICORE_AUTH_URL", value=rag_pipeline_params["aicore_auth_url"]),
        ParameterBinding(key="AICORE_CLIENT_ID", value=rag_pipeline_params["aicore_client_id"]),
        ParameterBinding(key="AICORE_CLIENT_SECRET", value=rag_pipeline_params["aicore_client_secret"]),
        ParameterBinding(key="AICORE_BASE_URL", value=rag_pipeline_params["aicore_ai_api_url"] + "/v2"),
        ParameterBinding(key="AICORE_RESOURCE_GROUP", value=rag_pipeline_params["aicore_resource_group"]),
        ParameterBinding(key="ORC_API_URL", value=rag_pipeline_params["orc_api_endpoint"]),
        ParameterBinding(key="HANA_DB_HOST", value=rag_pipeline_params["hana_db_host"]),
        ParameterBinding(key="HANA_DB_USER", value=rag_pipeline_params["hana_db_user"]),
        ParameterBinding(key="HANA_DB_PASSWORD", value=rag_pipeline_params["hana_db_password"]),
        ParameterBinding(key="HANA_DB_TABLE_NAME", value=rag_pipeline_params["hana_db_table_name"])
    ]
)

print(response.__dict__)
configuration_id=response.__dict__['id']

{'id': 'e2192eee-066f-46c3-9fd9-1b6dc963bf6e', 'message': 'Configuration created'}
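
The ten bindings above follow a fixed mapping from serving template parameters to keys in `rag_pipeline_params.json`. As a sketch, that mapping can be kept in one table so the bindings are generated rather than written out one by one. The names `PARAM_SOURCES` and `binding_pairs` are hypothetical.

```python
# Serving template parameter -> key in rag_pipeline_params.json
PARAM_SOURCES = {
    "AICORE_AUTH_URL": "aicore_auth_url",
    "AICORE_CLIENT_ID": "aicore_client_id",
    "AICORE_CLIENT_SECRET": "aicore_client_secret",
    "AICORE_BASE_URL": "aicore_ai_api_url",
    "AICORE_RESOURCE_GROUP": "aicore_resource_group",
    "ORC_API_URL": "orc_api_endpoint",
    "HANA_DB_HOST": "hana_db_host",
    "HANA_DB_USER": "hana_db_user",
    "HANA_DB_PASSWORD": "hana_db_password",
    "HANA_DB_TABLE_NAME": "hana_db_table_name",
}


def binding_pairs(params):
    """Return (template_parameter, value) pairs for the configuration."""
    pairs = []
    for template_name, source_key in PARAM_SOURCES.items():
        value = params[source_key]
        if template_name == "AICORE_BASE_URL":
            value += "/v2"  # the cell above appends the API version to the base URL
        pairs.append((template_name, value))
    return pairs


# Made-up values that only show the shape of the result
sample = {key: f"<{key}>" for key in PARAM_SOURCES.values()}
pairs = binding_pairs(sample)
print(len(pairs))

# In the notebook, the pairs could be wrapped like this:
# parameter_bindings = [ParameterBinding(key=k, value=v)
#                       for k, v in binding_pairs(rag_pipeline_params)]
```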


<a id="create_deployment"></a>
## Step 10: Create a Deployment

Execute the code below to create a Deployment from the Configuration:

In [10]:
response = ai_core_client.deployment.create(
    resource_group = "default",
    configuration_id = configuration_id
)

print(response.__dict__)
deployment_id=response.__dict__['id']

{'id': 'dead9a76aed1d730', 'message': 'Deployment scheduled.', 'deployment_url': '', 'status': <Status.UNKNOWN: 'UNKNOWN'>, 'ttl': None}


<div class="alert alert-block alert-warning">
<b>Important:</b>

Note the unique ID of your deployment. You may create multiple deployments using the same configuration ID, each of which would have a unique endpoint.

</div>

Check the deployment status:

In [11]:
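import time

# Poll every 30 seconds until the deployment is running
while str(response.status)!='Status.RUNNING':

    response = ai_core_client.deployment.get(
        resource_group = "default",
        deployment_id = deployment_id
    )

    print("Status: ", response.status)
    print('*'*80)
    # A dead deployment never reaches RUNNING, so stop polling instead of looping forever
    if str(response.status)=='Status.DEAD':
        raise RuntimeError('The deployment failed. Check the deployment logs.')
    time.sleep(30)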

print(response.__dict__)
deployment_url=response.__dict__['deployment_url']

Status:  Status.UNKNOWN
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.PENDING
********************************************************************************
Status:  Status.RUNNING
********************************************************************************
{'id': 'dead9a76aed1d730', 'configuration_id': 'e2192ee

<div class="alert alert-block alert-info">

<b>Tip:</b>

It may take a few minutes for the status to change: `UNKNOWN` > `PENDING` > `RUNNING`.

</div>
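
Because the transition can take a while, a polling loop benefits from a timeout and an explicit failure state. The sketch below shows one such loop in isolation. The helper name `wait_for_status`, the `DEAD` failure state, and the default timings are assumptions. The example drives the loop with simulated statuses instead of a real deployment.

```python
import time


def wait_for_status(get_status, target="RUNNING", failed=("DEAD",),
                    interval=30, timeout=900, sleep=time.sleep):
    """Poll get_status() until it returns target.

    Raises RuntimeError on a failed state and TimeoutError after `timeout` seconds.
    """
    waited = 0
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Deployment entered state {status}")
        if waited >= timeout:
            raise TimeoutError(f"Still {status} after {waited} seconds")
        sleep(interval)
        waited += interval


# Simulated status sequence; sleeping is skipped so the example finishes instantly
simulated = iter(["UNKNOWN", "PENDING", "PENDING", "RUNNING"])
result = wait_for_status(lambda: next(simulated), sleep=lambda seconds: None)
print(result)

# In the notebook, get_status could be (assuming the status enum exposes .value):
# lambda: ai_core_client.deployment.get(
#     resource_group="default", deployment_id=deployment_id).status.value
```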

Now query the deployment logs to view its output:

In [12]:
# Fetch the log lines written by the deployment's container

response = ai_core_client.deployment.query_logs(
    resource_group = "default",
    deployment_id = deployment_id
)

for log in response.data.result:
    print(log.msg)
    print("---")

INFO:     Will watch for changes in these directories: ['/app/backend']
---
INFO:     Uvicorn running on http://0.0.0.0:3001 (Press CTRL+C to quit)
---
INFO:     Started reloader process [1] using StatReload
---
INFO:     Started server process [8]
---
INFO:     Waiting for application startup.
---
INFO:     Application startup complete.
---


<a id="test_rag_pipeline"></a>
## Step 11: Test the RAG Pipeline

In [13]:
import json
import requests

data = {
    "query": "what is ibm cloud?"
}

endpoint = f"{deployment_url}/v2/generate"
headers = {"Content-Type": "application/json",
           "Authorization": ai_core_client.rest_client.get_token(),
           "AI-Resource-Group": "default"}
response = requests.post(endpoint, headers=headers, data=json.dumps(data))

print(response.text)

{"response":"IBM Cloud is a suite of cloud computing services offered by IBM. It includes Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) offerings, among others. These services are designed to help businesses and organizations build, deploy, and manage applications and services in the cloud. IBM Cloud is built on a secure, scalable, and highly available global network, ensuring consistent performance and reliability. It also offers a wide range of tools and resources to help developers and businesses build, manage, and optimize their cloud applications and services.\n\nI cannot answer that question based on the provided document, as it does not contain information about what IBM Cloud is. However, I can provide you with the information I mentioned earlier."}
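
The request and response shapes used in the cell above can be isolated into two small helpers, which makes them easier to reuse from another client. This is a sketch only. The helper names `build_generate_request` and `extract_answer` are hypothetical, and the `response` key is assumed from the sample output above.

```python
import json


def build_generate_request(deployment_url, token, query, resource_group="default"):
    """Return (endpoint, headers, body) for a POST to the /v2/generate route."""
    endpoint = f"{deployment_url.rstrip('/')}/v2/generate"
    headers = {
        "Content-Type": "application/json",
        "Authorization": token,
        "AI-Resource-Group": resource_group,
    }
    body = json.dumps({"query": query})
    return endpoint, headers, body


def extract_answer(response_text):
    """Pull the generated text out of the JSON response body."""
    return json.loads(response_text)["response"]


# Made-up URL and token, used only to show the resulting request
endpoint, headers, body = build_generate_request(
    "https://example.invalid/v2/inference/deployments/d123/",
    "Bearer example-token",
    "what is ibm cloud?",
)
print(endpoint)
print(extract_answer('{"response": "example answer"}'))
```

The returned values can be passed to `requests.post(endpoint, headers=headers, data=body)` as in the cell above.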


<a id="stop_deployment"></a>
## Step 12: Stop the Deployment

A running deployment incurs costs for the allocated cloud resources. A deployment with the status `Stopped` incurs no charge. The following code cell stops the deployment if you enter `yes` when prompted.

In [14]:
from ai_core_sdk.models import TargetStatus

while True:
    user_input=input("The deployment will be stopped in the next step, and it cannot be started again.\nContinue?(yes/no):")
    if user_input == "yes":
        print('Stopping the deployment...')
        response = ai_core_client.deployment.modify(
            resource_group = "default",
            deployment_id = deployment_id,
            target_status = TargetStatus.STOPPED
        )
        print(response.__dict__)
        break
    elif user_input == "no":
        raise UserWarning('User has interrupted the code execution.')

Stopping the deployment...
{'id': 'dead9a76aed1d730', 'message': 'Deployment modification scheduled'}


<div class="alert alert-block alert-info">

<b>Tip:</b>

You cannot restart a deployment. You must create a new deployment, reusing the configuration. Each deployment will have a different URL.
</div>

<a id="summary"></a>
## Summary

Now we've demonstrated how you can deploy the containerized RAG Pipeline as a REST service on SAP AI Core and securely send queries to its endpoints.