### Introduction
This notebook illustrates and automates the continuous deployment process that brings [Infinity](https://michaelfeil.eu/infinity/latest/), a popular open-source inference service for embedding models, into SAP AI Core. 
With Infinity, you can then load popular sentence-transformer embedding models, e.g. [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased), and expose them as a service in SAP AI Core through the BYOM (Bring Your Own Model) approach. <br/>

### Prerequisites
Before running this notebook, please ensure you have completed the [Prerequisites](../../README.md).<br/><br/>

If the configuration of the Infinity scenario was created through SAP AI Launchpad instead of by running [00-init-config.ipynb](../00-init-config.ipynb), please manually update the configuration_id in [env.json](env.json):
```json
{
    "configuration_id": "<YOUR_CONFIGURATION_ID_OF_INFINITY_SCENARIO>",
    "deployment_id": "<WILL_BE_UPDATED_BY_THIS_NOTEBOOK>"
}
```
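
Before creating a deployment, it can help to confirm that `configuration_id` in env.json has actually been filled in. The following is a minimal sketch, not part of the original notebook: the function name `read_configuration_id` is hypothetical, and it assumes that unset values still start with `<` as in the template above.

```python
import json
from pathlib import Path


def read_configuration_id(env_path="env.json"):
    """Return configuration_id from env.json.

    Raises ValueError if the value is missing, empty, or still the
    "<...>" placeholder from the template (an assumption of this sketch).
    """
    env = json.loads(Path(env_path).read_text())
    configuration_id = env.get("configuration_id", "")
    if not configuration_id or configuration_id.startswith("<"):
        raise ValueError(f"configuration_id in {env_path} is not set")
    return configuration_id
```

Calling `read_configuration_id("env.json")` early surfaces a forgotten placeholder locally. It does not verify that the ID exists in SAP AI Core.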

### The high-level flow of this Continuous Deployment process:
- Build a custom Infinity docker image adapted for SAP AI Core<br/>
- Push the docker image to docker hub<br/>
- Connect to SAP AI Core via SDK<br/>
- Create a deployment<br/>
- Check the status and logs of the deployment<br/>


#### 1.Build a custom Infinity docker image adapted for SAP AI Core
Please refer to [Dockerfile](Dockerfile) for details.

In [1]:
%%sh
# 0.Login to docker hub
# docker login -u <YOUR_DOCKER_USER> -p <YOUR_DOCKER_ACCESS_TOKEN>

# 1.Build the docker image
docker build --platform=linux/amd64 -t docker.io/seojungsierra/infinity-ko-sroberta-multitask:ai-core .

#0 building with "desktop-linux" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.00kB 0.0s done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
#2 ...

#3 [auth] pytorch/pytorch:pull token for registry-1.docker.io
#3 DONE 0.0s

#2 [internal] load metadata for docker.io/pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
#2 DONE 2.1s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [1/6] FROM docker.io/pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime@sha256:0279f7aa29974bf64e61d0ff6e979b41a249b3662a46e30778dbf80b8c99c361
#5 DONE 0.0s

#6 [internal] load build context
#6 transferring context: 724B 0.0s done
#6 DONE 0.0s

#7 [4/6] RUN python3 -m pip install --upgrade pip==23.2.1 --no-cache-dir &&     python3 -m pip install "infinity-emb[all]==0.0.54" --no-cache-dir &&     rm -rf /root/.cache/pip
#7 CACHED

#8 [5/6] COPY run.sh /usr/src/run.sh
#8 C

#### 2.Push the docker image to docker hub

In [2]:
%%sh
# 2.Push the docker image to docker hub to be used by deployment in SAP AI Core
docker push docker.io/seojungsierra/infinity-ko-sroberta-multitask:ai-core 

The push refers to repository [docker.io/seojungsierra/infinity-ko-sroberta-multitask]
19bcc2e93034: Preparing
b172e6fbf491: Preparing
c734c149e46d: Preparing
da884695c81c: Preparing
5f70bf18a086: Preparing
23fcdddbb6de: Preparing
5f70bf18a086: Preparing
36e21b1812d2: Preparing
2d49b5c9bc32: Preparing
e0a9f5911802: Preparing
36e21b1812d2: Waiting
2d49b5c9bc32: Waiting
e0a9f5911802: Waiting
23fcdddbb6de: Waiting
da884695c81c: Mounted from seojungsierra/infinity-2-cesco-tenant
b172e6fbf491: Mounted from seojungsierra/infinity-2-cesco-tenant
5f70bf18a086: Mounted from seojungsierra/infinity-2-cesco-tenant
19bcc2e93034: Mounted from seojungsierra/infinity-2-cesco-tenant
c734c149e46d: Mounted from seojungsierra/infinity-2-cesco-tenant
23fcdddbb6de: Mounted from seojungsierra/infinity-2-cesco-tenant
2d49b5c9bc32: Mounted from seojungsierra/infinity-2-cesco-tenant
e0a9f5911802: Mounted from seojungsierra/infinity-2-cesco-tenant
36e21b1812d2: Mounted from seojungsierra/infinity-2-cesco-tenant


#### 3.Initiate an SAP AI Core SDK client
- resource_group loaded from [../config.json](../config.json)
- ai_core_sk(service key) loaded from [../config.json](../config.json)

In [16]:
import json, time
import requests
# Import only the datetime class; a plain "import datetime" would be shadowed by this line anyway
from datetime import datetime
from ai_core_sdk.ai_core_v2_client import AICoreV2Client

In [17]:
# load the configuration from ../config.json 
with open("../config.json") as f:
    config = json.load(f)

resource_group = config.get("resource_group", "default")
print( "resource group: ", resource_group)

resource group:  oss-llm


In [18]:
# Initiate an AI Core SDK client with the information of service key
ai_core_sk = config["ai_core_service_key"]
base_url = ai_core_sk.get("serviceurls").get("AI_API_URL") + "/v2/lm"
client = AICoreV2Client(base_url=ai_core_sk.get("serviceurls").get("AI_API_URL")+"/v2",
                        auth_url=ai_core_sk.get("url")+"/oauth/token",
                        client_id=ai_core_sk.get("clientid"),
                        client_secret=ai_core_sk.get("clientsecret"),
                        resource_group=resource_group)

In [19]:
# Prepare the HTTP headers used later for direct REST calls with requests.
token = client.rest_client.get_token()
headers = {
    "Authorization": token,
    "ai-resource-group": resource_group,
    "Content-Type": "application/json",
}
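
The header construction above can be wrapped in a small helper so that it is reusable when the token is refreshed. This is a sketch under assumptions: `build_headers` is a hypothetical name, and the prefix check assumes the token may arrive with or without a leading `Bearer `, depending on where it was obtained.

```python
def build_headers(token, resource_group):
    """Build the HTTP headers for direct REST calls to the AI API."""
    # Assumption: add the "Bearer " prefix only when it is missing.
    if not token.startswith("Bearer "):
        token = f"Bearer {token}"
    return {
        "Authorization": token,
        "ai-resource-group": resource_group,
        "Content-Type": "application/json",
    }
```

For example, `headers = build_headers(client.rest_client.get_token(), resource_group)` would rebuild the headers after a token expires during a long polling loop.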

#### 4.Create a deployment for Infinity scenario
Creating a deployment in SAP AI Core requires the corresponding resource_group and configuration_id:
- resource_group loaded from [../config.json](../config.json)
- configuration_id loaded from [env.json](env.json)

In [20]:
# resource_group: The target resource group to create the deployment
# configuration_id: The target configuration to create the deployment, which is created in ../00-init-config.ipynb 
with open("./env.json") as f:
    env = json.load(f)

configuration_id = env["configuration_id"]
print("configuration id:", configuration_id)

configuration id: dd70fd0b-5dfd-40d2-a46f-7dc0f73d7c2a


**Helper function**
- Get the current UTC time in `YYYY-MM-DDTHH:MM:SS.ffffffZ` format, used to filter deployment logs

In [21]:
# Helper function to get the current UTC time as a timestamp string, used to filter deployment logs
def get_current_time():  
    current_time = datetime.utcnow()
    # Format current time in the desired format
    formatted_time = current_time.strftime("%Y-%m-%dT%H:%M:%S.%fZ")
    return formatted_time
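
`datetime.utcnow()` is deprecated as of Python 3.12. A timezone-aware variant that produces the same string format could look like the sketch below. The name `get_current_time_utc` and the optional `now` parameter (added only to make the function easy to test) are assumptions, not part of the original notebook.

```python
from datetime import datetime, timezone


def get_current_time_utc(now=None):
    """Return a UTC timestamp string like 2024-09-19T08:18:19.996269Z.

    `now` is an optional timezone-aware datetime; if omitted, the
    current time is used.
    """
    current_time = now if now is not None else datetime.now(timezone.utc)
    # Convert to UTC first so the trailing "Z" is always accurate.
    return current_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")
```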

**Helper function**
- Write an updated configuration value back to the configuration JSON file

In [22]:
# Helper function to write an updated configuration value back to the configuration JSON file
def update_json_file(file_path, key, value):
    # Load the JSON configuration file
    with open(file_path, 'r') as file:
        config = json.load(file)

    # Update the value
    config[key] = value

    # Write the updated configuration back to the file
    with open(file_path, 'w') as file:
        json.dump(config, file, indent=4)
        print(f"{file_path} updated. {key}: {value}")
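
If the notebook is interrupted while env.json is being rewritten, the file can be left truncated. One possible safeguard, shown here only as a sketch, is to write to a temporary file and swap it in with `os.replace`. The name `update_json_file_atomic` is hypothetical and the function is an alternative to, not a description of, the helper above.

```python
import json
import os
import tempfile


def update_json_file_atomic(file_path, key, value):
    """Set key=value in a JSON file and replace the file in one step."""
    with open(file_path, "r") as file:
        config = json.load(file)
    config[key] = value

    # Write to a temporary file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(config, tmp, indent=4)
        os.replace(tmp_path, file_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return config
```

The temporary file is created next to the target so that `os.replace` stays on one filesystem.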

**Create a deployment for Infinity in SAP AI Core**
- configuration_id
- resource_group
<br/><br/>
The ID of the created deployment is written back to [env.json](env.json), where it is used by:
- [01-deployment.ipynb](01-deployment.ipynb) and [02-embedding.ipynb](02-embedding.ipynb) to test the inference of open-source embedding with Infinity in SAP AI Core
- [04-cleanup.ipynb](04-cleanup.ipynb) to stop the deployment and clean up the resource.

In [23]:
# Create a Deployment in SAP AI Core
print("Creating deployment.")
response = client.deployment.create(
    configuration_id=configuration_id,
    resource_group=resource_group
)

# last_check_time will be used to check the deployment status continuously afterwards
# set initial last_check_time right after creating deployment
last_check_time = get_current_time()
deployment_start_time = datetime.now()

deployment_id = response.id
status = response.status
update_json_file("env.json", "deployment_id", deployment_id)
print("Deployment Result:\n", response.__dict__)

Creating deployment.


AIAPIInvalidRequestException: Failed to post /deployments: Invalid Request, Runtime Adapter Exception; The Configuration dd70fd0b-5dfd-40d2-a46f-7dc0f73d7c2a you've provided is invalid. Please ensure you supply a valid Configuration.. 
 Status Code: 400, Request ID:ea1cbefb-3de1-4dd9-99bd-2af8005d99d3

#### 5.Check the status and logs of the deployment

In [10]:
print("4.Checking deployment status.")
deployment_url = f"{base_url}/deployments/{deployment_id}"
deployment_log_url = f"{deployment_url}/logs?start="
interval_s = 20

while status != "RUNNING" and status != "DEAD":
    current_time = get_current_time()
    #check deployment status
    response = requests.get(url=deployment_url, headers=headers)
    resp = response.json()
    
    status = resp['status']
    print(f'...... Deployment Status at {current_time}......', flush=False)
    print(f"Deployment status: {status}")

    #retrieve deployment logs
    response_log = requests.get(url=f"{deployment_log_url}{last_check_time}", headers=headers)
    last_check_time = current_time
    print(f"Deployment logs: {response_log.text}")

    # Sleep for interval_s seconds between polls to avoid overwhelming the API with requests
    time.sleep(interval_s)

deployment_end_time = datetime.now()
# Dividing a timedelta by 60 yields another timedelta, so convert to seconds first
duration_in_min = (deployment_end_time - deployment_start_time).total_seconds() / 60

if status == "RUNNING":
    print("Deployment is up and running now!")
else:
    print(f"Deployment {deployment_id} failed!")   

print(f"Deployment duration: {duration_in_min} mins")

4.Checking deployment status.
...... Deployment Status at 2024-09-19T08:18:19.996269Z......
Deployment status: UNKNOWN
Deployment logs: {
  "error": {
    "code": "05011000",
    "message": "DeploymentNotFoundError: Deployment d034a9b1577bc081 not found.",
    "target": "/api/v4alpha/deployments/d034a9b1577bc081/logs"
  }
}

...... Deployment Status at 2024-09-19T08:18:45.037555Z......
Deployment status: UNKNOWN
Deployment logs: {
  "error": {
    "code": "05011000",
    "message": "DeploymentNotFoundError: Deployment d034a9b1577bc081 not found.",
    "target": "/api/v4alpha/deployments/d034a9b1577bc081/logs"
  }
}

...... Deployment Status at 2024-09-19T08:19:12.393026Z......
Deployment status: UNKNOWN
Deployment logs: {
  "error": {
    "code": "05011000",
    "message": "DeploymentNotFoundError: Deployment d034a9b1577bc081 not found.",
    "target": "/api/v4alpha/deployments/d034a9b1577bc081/logs"
  }
}

...... Deployment Status at 2024-09-19T08:19:34.480968Z......
Deployment status

KeyboardInterrupt: 
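
As the run above shows, the polling loop continues indefinitely when the deployment never reaches `RUNNING` or `DEAD`, and has to be interrupted manually. A minimal sketch of a bounded variant follows. The function `wait_for_deployment` and its parameters (`fetch_status`, `timeout_s`, `sleep`, `clock`) are hypothetical names introduced for illustration; the status fetcher is passed in as a callable so the logic can be tried without calling SAP AI Core.

```python
import time


def wait_for_deployment(fetch_status, interval_s=20, timeout_s=1800,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns RUNNING or DEAD.

    Raises TimeoutError once timeout_s seconds have elapsed.
    Returns a tuple (final_status, elapsed_minutes).
    """
    start = clock()
    status = fetch_status()
    while status not in ("RUNNING", "DEAD"):
        if clock() - start >= timeout_s:
            raise TimeoutError(f"deployment still {status} after {timeout_s} s")
        # Wait between polls to avoid overwhelming the API with requests.
        sleep(interval_s)
        status = fetch_status()
    return status, (clock() - start) / 60
```

In this notebook, `fetch_status` could be something like `lambda: requests.get(url=deployment_url, headers=headers).json()["status"]`, reusing the `deployment_url` and `headers` variables defined earlier.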