# Initial configuration of SAP AI Core for BYOM-OSS-LLM-AI-CORE
This notebook automates the initial configuration of the application BYOM-OSS-LLM-AI-CORE, which brings open-source LLMs into SAP AI Core. Alternatively, you can perform the same steps with SAP AI Launchpad.
- Review and update the configuration in config.json
- Initialize a client of SAP AI Core SDK
- Create a resource group
- Register a docker secret
- Onboard the Git repository and create an application for BYOM-OSS-LLM-AI-CORE
- Synchronize the application and check its status
- Create the configurations for the scenarios ollama, local-ai, llama.cpp, vllm and infinity (embedding)

#### 1: Copy [config.template.json](config.template.json) as [config.json](config.json)

In [None]:
%%sh
cp config.template.json config.json
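
Re-running the `cp` cell above overwrites an existing config.json, including any credentials already entered there. As a minimal sketch, the copy can be guarded in Python so that it only happens when config.json does not exist yet. The helper name `copy_template_once` is hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def copy_template_once(template="config.template.json", target="config.json"):
    """Copy the template to the target unless the target already exists.

    Returns True if the file was copied, False if it was left untouched.
    """
    target_path = Path(target)
    if target_path.exists():
        # Keep the existing file so that edited credentials are not lost
        return False
    shutil.copyfile(template, target_path)
    return True
```

Calling `copy_template_once()` in a notebook cell then behaves like the `cp` command on the first run and does nothing on later runs.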

#### 2: Review and Update configuration in [config.json](config.json)
Please read the **comments** carefully in [config.json](config.json) and update the necessary configurations.  
- **name**: used as name of git repository and application. 
- **resource_group**: Optional. "default" is used if not specified, and the "default" resource group is in place for every AI Core instance. It is recommended to create a dedicated resource group and set it in [config.json](config.json). Note that AI Core on the free tier plan cannot create a new resource group.
- **ai_core_service_key**: Update with your own AI Core service key.
- **docker_secret**: Update the user name, password, etc. in `.dockerconfigjson`. Please refer to [this document](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/register-your-docker-registry-secret) for more details.
    - username: Replace `<REPLACE_WITH_YOUR_DOCKER_USERNAME>` with your docker user name.
    - password: Replace `<REPLACE_WITH_YOUR_DOCKER_ACCESS_TOKEN>` with your docker access token.
- **git_repo**: Update the Git repository configuration with your own values.
    - repo_url: URL of your forked repository. It should be `https://github.com/<YOUR_GITHUB_ORG_OR_USER>/btp-generative-ai-hub-use-cases`
    - user: Update with your GitHub user
    - access_token: Update with your GitHub user access token
- **application**: The SAP AI Core application hosts the scenarios (ollama etc.) that serve open-source LLMs in SAP AI Core.
    - path_in_repo: relative path to the serving templates. No change needed.
- **configurations**: Review the configurations of the scenarios. By default, they are configured to load a quantized Mistral-7B model with [resource plan infer.s in SAP AI Core](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/choose-resource-plan-c58d4e584a5b40a2992265beb9b6be3c) defined in [../byom-oss-llm-templates](../byom-oss-llm-templates). It is recommended to start with the default configurations in config.json.
    - **Ollama**: By default, the **[Phi3:14b](https://ollama.com/library/phi3)** model is configured. The model is pulled dynamically in [ollama/ollama.ipynb](ollama/ollama.ipynb).
    - **LocalAI**: LocalAI allows you to [preload models during startup](https://localai.io/advanced/#preloading-models-during-startup). The initial configuration in config.json preloads the model [Mistral-7B-OpenOrca-GGUF](https://github.com/go-skynet/model-gallery/blob/main/mistral.yaml) with local-ai on resource plan 'infer.s' defined in [local-ai-template.yaml](../byom-oss-llm-templates/local-ai-template.yaml). GPU acceleration isn't enabled in its model config file, hence inference is quite slow. Please review the [full config model file reference](https://localai.io/advanced/#full-config-model-file-reference) for all options. To enable GPU acceleration for a model, set options such as the following in its model config YAML file, as done for example in [mixtral-Q6.yaml](https://github.com/go-skynet/model-gallery/blob/main/mixtral-Q6.yaml):
        ```yaml
        f16: true 
        mmap: true 
        gpu_layers: xx 
        ```
        In addition, you can install more models through the endpoint /models/apply in [local-ai/local-ai.ipynb](local-ai/local-ai.ipynb). Please refer to https://localai.io/advanced/#preloading-models-during-startup
    - **llama.cpp**: By default, the model [mistral-7b-instruct-v0.2.Q5_K_M.gguf](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF) is configured with the alias mistral. Unlike ollama and local-ai, the llama.cpp scenario supports only one model per configuration. If you need multiple models to be served with llama.cpp, please create multiple configurations through SAP AI Launchpad.
    - **vllm**: By default, the model [TheBloke/Mistral-7B-Instruct-v0.2-AWQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-AWQ) is configured. Unlike ollama and local-ai, the vllm scenario supports only one model per configuration. If you need multiple models to be served with vllm, please create multiple configurations through SAP AI Launchpad.
    - **transformer**: By default, [Microsoft's Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) is configured.
    - **infinity**: By default, the open-source text embedding model [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) is configured. Only one model is supported per configuration. If you need multiple models to be served with infinity, please create multiple configurations through SAP AI Launchpad.
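
Before continuing, it can help to check that no template placeholder was left behind in config.json. The following is a minimal sketch, assuming that all placeholders follow the upper-case angle-bracket style shown above (for example `<REPLACE_WITH_YOUR_DOCKER_USERNAME>`). The helper name `find_placeholders` is hypothetical.

```python
import re

# Matches template placeholders such as <REPLACE_WITH_YOUR_DOCKER_USERNAME>
# (assumption: placeholders are upper-case tokens in angle brackets)
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def find_placeholders(node, path=""):
    """Return (path, value) pairs for every string that still contains a placeholder."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_placeholders(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_placeholders(value, f"{path}/{index}"))
    elif isinstance(node, str) and PLACEHOLDER.search(node):
        hits.append((path, node))
    return hits


# Usage in the notebook (assumes config.json exists in the working directory):
# import json
# with open("config.json") as f:
#     for path, value in find_placeholders(json.load(f)):
#         print(f"Unreplaced placeholder at {path}")
```

An empty result means every placeholder matching this pattern has been replaced. It does not verify that the values themselves are valid.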

#### 3: Load the configurations from [config.json](config.json)
This cell loads [config.json](config.json) and reads `name` and `resource_group` from it.<br/>
If they are not set, `name` falls back to "open-source-llms" and `resource_group` to "default".

In [53]:
import json
from time import sleep

with open("config.json") as f:
    config = json.load(f)

# Initializations
resource_group = config.get("resource_group", "default")
name = config.get("name", "open-source-llms")
print("Configurations loaded from config.json")
print("name: ", name, "resource_group: ", resource_group )

Configurations loaded from config.json
name:  byom-open-source-llms resource_group:  oss-llm


#### 4: Initialize AI Core SDK Client
The AI Core service key is located in the section `ai_core_service_key` of [config.json](config.json).<br/>
Please update it with your own service key before running this notebook.

In [47]:
from ai_core_sdk.ai_core_v2_client import AICoreV2Client

ai_core_sk = config["ai_core_service_key"]
client = AICoreV2Client(base_url=ai_core_sk.get("serviceurls").get("AI_API_URL")+"/v2",
                        auth_url=ai_core_sk.get("url")+"/oauth/token",
                        client_id=ai_core_sk.get("clientid"),
                        client_secret=ai_core_sk.get("clientsecret"),
                        resource_group=resource_group)
print(f"resource group: {resource_group}, name: {name}")


resource group: oss-llm, name: byom-open-source-llms
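
The cell above chains `.get()` calls, so a service key with a missing field fails with a rather unspecific error. A minimal sketch of a helper that checks the fields first and returns the same keyword arguments as used above is shown below. The helper name `client_kwargs_from_service_key` is hypothetical; the field names are the ones read in the cell above.

```python
def client_kwargs_from_service_key(service_key, resource_group="default"):
    """Map an AI Core service key dict to the keyword arguments passed to AICoreV2Client above."""
    try:
        api_url = service_key["serviceurls"]["AI_API_URL"]
        auth_url = service_key["url"]
        client_id = service_key["clientid"]
        client_secret = service_key["clientsecret"]
    except KeyError as missing:
        # Report the missing field by name instead of failing later on None
        raise ValueError(f"Service key is missing field {missing}") from None
    return {
        "base_url": api_url.rstrip("/") + "/v2",
        "auth_url": auth_url.rstrip("/") + "/oauth/token",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource_group": resource_group,
    }
```

With this in place, the client could be created as `AICoreV2Client(**client_kwargs_from_service_key(ai_core_sk, resource_group))`.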


#### 5: Create a dedicated resource group (Optional but recommended)
The resource group created here must match `resource_group` in [config.json](config.json), which is "oss-llm" by default.

In [12]:
response = client.resource_groups.create(resource_group_id = resource_group)
print(response.__dict__)

{'resource_group_id': 'oss-llm-2', 'labels': None, 'status': None, 'created_at': None}


#### 6: Register Docker Secret within SAP AI Core

Please skip this step if you have already registered your docker secret within SAP AI Core.

In [15]:
docker_secret = config["docker_secret"]
response = client.docker_registry_secrets.create(
    name = docker_secret["name"],
    data = docker_secret["data"]
)

print(response.__dict__)
print(f'Docker Registry Secret: {docker_secret["name"]}')

{'message': 'secret has been created'}
Docker Registry Secret: docker-secret
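
The `data` field passed above carries a `.dockerconfigjson` entry whose value is itself a JSON string. As a minimal sketch, assuming the `auths` layout described in the SAP help page linked in step 2, the payload can be built programmatically instead of being edited by hand. The helper name `build_docker_secret_data` and the registry URL in the usage note are illustrative assumptions.

```python
import json


def build_docker_secret_data(registry_url, username, access_token):
    """Build the `data` payload for a Docker registry secret.

    The value stored under ".dockerconfigjson" is a JSON string, not a nested dict.
    """
    docker_config = {
        "auths": {
            registry_url: {"username": username, "password": access_token}
        }
    }
    return {".dockerconfigjson": json.dumps(docker_config)}
```

For example, `build_docker_secret_data("https://index.docker.io", user, token)` returns a dict that could be passed as `data` to `client.docker_registry_secrets.create`.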


#### 7: Update the serving templates
Please replace the placeholders in the following serving templates.
- `<YOUR_DOCKER_SECRET>`: replace with **docker-secret** created in step 6, or with your own docker secret.
- `<YOUR_DOCKER_USER>`: replace with your own Docker Hub user.
- ai.sap.com/resourcePlan: The default resource plan is **infer.s**, which is sufficient for the 7B models used in the sample test notebooks. If you would like to run larger models such as 13B, 30B or beyond, please use the **infer.m** or **infer.l** resource plan. See [Resource Plan in SAP AI Core](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/choose-resource-plan-c58d4e584a5b40a2992265beb9b6be3c) for more details.
```yaml
    metadata:
      #...
      labels: |
        ai.sap.com/resourcePlan: infer.s
    spec: |
      predictor:
        imagePullSecrets:
          - name: <YOUR_DOCKER_SECRET>
          ...
        containers:
            - name: kserve-container
              image: docker.io/<YOUR_DOCKER_USER>/ollama:ai-core
```
- [../byom-oss-llm-templates/llama.cpp-template.yaml](../byom-oss-llm-templates/llama.cpp-template.yaml)
- [../byom-oss-llm-templates/local-ai-template.yaml](../byom-oss-llm-templates/local-ai-template.yaml)
- [../byom-oss-llm-templates/ollama-template.yaml](../byom-oss-llm-templates/ollama-template.yaml)
- [../byom-oss-llm-templates/vllm-template.yaml](../byom-oss-llm-templates/vllm-template.yaml)
- [../byom-oss-llm-templates/infinity-template.yaml](../byom-oss-llm-templates/infinity-template.yaml)
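
Editing five templates by hand is error-prone, so the replacement can also be scripted. The following is a minimal sketch that assumes the templates contain the two placeholders literally as `<YOUR_DOCKER_SECRET>` and `<YOUR_DOCKER_USER>`. The helper names are hypothetical.

```python
from pathlib import Path


def fill_placeholders(text, docker_secret, docker_user):
    """Replace the two placeholders used in the serving templates."""
    return (text.replace("<YOUR_DOCKER_SECRET>", docker_secret)
                .replace("<YOUR_DOCKER_USER>", docker_user))


def fill_templates(template_dir, docker_secret, docker_user):
    """Rewrite every *-template.yaml in template_dir in place.

    Returns the names of the files that were actually changed.
    """
    changed = []
    for path in sorted(Path(template_dir).glob("*-template.yaml")):
        original = path.read_text()
        updated = fill_placeholders(original, docker_secret, docker_user)
        if updated != original:
            path.write_text(updated)
            changed.append(path.name)
    return changed
```

A call such as `fill_templates("../byom-oss-llm-templates", "docker-secret", "my-docker-user")` updates the files locally. Since SAP AI Core reads the templates from the onboarded Git repository, the changes still need to be committed and pushed to your fork.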

#### 8: Onboard GitHub repository and create an application

In [64]:
# Onboard repository
repo_config = config["git_repo"]
repository = client.repositories.create(name,
                                        url=repo_config.get("repo_url"),
                                        username=repo_config.get("user"),
                                        password=repo_config.get("access_token")
                                        )
print(repository)

# Create application
app_config = config["application"]
application = client.applications.create(revision=app_config.get("revision", "HEAD"),
                                        path=app_config.get("path_in_repo"),
                                        application_name=name,
                                        repository_name=name
                                        )
print(application)

Message: Repository has been on-boarded.
Id: byom-open-source-llms, Message: Application has been successfully created.


As a result, the GitHub repository is onboarded in SAP AI Core.<br/>
![github-repo-onboarded](../resources/02-onboarding-github-repo.jpg)

#### 9: Check if the application has synced and the scenarios are created

In [3]:
max_tries = 10
interval_s = 20
for _ in range(max_tries):
    app_status = client.applications.get_status(name)
    print(f"Health Status: {app_status.health_status}, Sync Status: {app_status.sync_status}, Sync Finished at: {app_status.sync_finished_at}")

    if app_status.sync_status == "Synced":
        break

    # Trigger a synchronization of the application, then wait before polling again
    client.applications.refresh(name)
    sleep(interval_s)

if app_status.sync_status == "Synced":
    print("Application synced")
    # Check scenarios
    scenarios = client.scenario.query()

    scenario_list = config["scenarios"]
    for scenario in scenario_list:
        scenario_name = scenario["name"]
        scenario_exists = scenario_name in [s.name for s in scenarios.resources]
        if scenario_exists:
            print(f"Scenario {scenario_name} synced")
        else:
            print(f"Scenario {scenario_name} not yet available")

else:
    # If the sync keeps failing, check the templates under path_in_repo in the Git repository.
    print(f"Application not yet synced after {max_tries} attempts. Please run this cell again.")

Health Status: Healthy, Sync Status: Synced, Sync Finished at: 2024-05-09T08:36:31Z
Application synced
Scenario ollama synced
Scenario local-ai synced
Scenario llama.cpp synced
Scenario vllm synced
Scenario infinity synced


As a result, the scenarios are created<br/>
![byom-oss-llm-app](../resources/03-byom-oss-llm-app-ai-launchpad.jpg)
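
The polling loop in this step follows a common pattern: check a condition, trigger a refresh if it is not met, wait, and try again. A minimal sketch of that pattern as a reusable helper is shown below. The name `wait_until` and its parameters are hypothetical, and unlike the cell above it does not refresh or sleep after the final failed attempt.

```python
import time


def wait_until(check, max_tries=10, interval_s=20, on_retry=None, sleep=time.sleep):
    """Call check() up to max_tries times and return True as soon as it succeeds.

    on_retry, if given, runs after each failed check (e.g. to trigger a refresh).
    No refresh or sleep happens after the final failed attempt.
    """
    for attempt in range(1, max_tries + 1):
        if check():
            return True
        if attempt < max_tries:
            if on_retry is not None:
                on_retry()
            sleep(interval_s)
    return False


# Usage with the calls from the cell above (requires the initialized client):
# synced = wait_until(
#     lambda: client.applications.get_status(name).sync_status == "Synced",
#     on_retry=lambda: client.applications.refresh(name),
# )
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.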

#### 10: Create configurations
Create a configuration for each scenario based on [config.json](config.json) described in step 2.<br/>
<br/>
If you need models other than the defaults listed in the table below, please create a configuration for the target scenario through SAP AI Launchpad.
Scenario | Model | Resource Plan
---------|----------|----------
ollama   | [Phi3:14b](https://ollama.com/library/phi3) | infer.s
local-ai | [Mistral-7B-OpenOrca-GGUF](https://github.com/go-skynet/model-gallery/blob/main/mistral.yaml) | infer.s
llama.cpp   | [mistral-7b-instruct-v0.2.Q5_K_M.gguf](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF)| infer.s
vllm   | [TheBloke/Mistral-7B-Instruct-v0.2-AWQ](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-AWQ) | infer.s
transformer   | [Microsoft's Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) | infer.s
infinity   | [nreimers/MiniLM-L6-H384-uncased](https://huggingface.co/nreimers/MiniLM-L6-H384-uncased) | infer.s
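
The configuration cell later in this step reads a fixed set of fields from each entry under `configurations` in config.json. As a minimal sketch, the entries can be checked for those fields up front, so that a missing key does not abort the loop halfway through. The field lists are derived from what that cell accesses; the helper name `validate_configurations` is hypothetical.

```python
# Fields read by the configuration cell in this step
REQUIRED_CONF_KEYS = ("name", "scenario_id", "executable_id", "input_artifacts", "parameters")
REQUIRED_ARTIFACT_KEYS = ("key", "name", "url", "description", "artifact_id")
REQUIRED_PARAMETER_KEYS = ("key", "value")


def validate_configurations(conf_list):
    """Return a list of readable problems; an empty list means every entry is complete."""
    problems = []
    for i, conf in enumerate(conf_list):
        label = conf.get("name", f"#{i}")
        for key in REQUIRED_CONF_KEYS:
            if key not in conf:
                problems.append(f"configuration {label}: missing '{key}'")
        for ia in conf.get("input_artifacts", []):
            for key in REQUIRED_ARTIFACT_KEYS:
                if key not in ia:
                    problems.append(f"configuration {label}: input artifact missing '{key}'")
        for pb in conf.get("parameters", []):
            for key in REQUIRED_PARAMETER_KEYS:
                if key not in pb:
                    problems.append(f"configuration {label}: parameter missing '{key}'")
    return problems
```

Running `validate_configurations(config["configurations"])` before creating anything in SAP AI Core gives a quick completeness check. It only verifies that the fields are present, not that their values are accepted by the service.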

In [50]:
"""
Helper function to update a JSON file (e.g. config.json) with a (key, value) pair
"""
def update_json_file(file_path, key, value):
    # Load the JSON configuration file
    with open(file_path, 'r') as file:
        config = json.load(file)

    # Update the value
    config[key] = value

    # Write the updated configuration back to the file
    with open(file_path, 'w') as file:
        json.dump(config, file, indent=4)
        print(f"{file_path} updated. {key}: {value}")

In [52]:
from ai_core_sdk.models import InputArtifactBinding,ParameterBinding
from ai_api_client_sdk.models.artifact import Artifact
from ai_api_client_sdk.models.label import Label

# Create serving configurations
conf_list = config["configurations"]

for conf in conf_list:
    print(f'--------------{conf["scenario_id"]}--------------')
    # Create input artifacts for model
    input_artifact_bindings = []
    for ia in conf["input_artifacts"]:
        # The artifact is only a dummy model placeholder, so create it only when
        # no artifact_id is recorded yet; otherwise reuse the existing one.
        if len(ia["artifact_id"]) == 0:
            artifact = client.artifact.create(
                name = ia["name"], # Custom non-unique identifier
                kind = Artifact.Kind.MODEL,
                url = ia["url"], 
                scenario_id = conf["scenario_id"],
                description = ia["description"]
                # labels = [
                #      Label(key="ext.ai.sap.com/model", value=ia["name"]) # any descriptive key-value pair, helps in filtering, key must have the prefix ext.ai.sap.com/
                # ]
            )
            
            print(artifact.__dict__)
            # Write back the artifact_id to configuration
            ia["artifact_id"] = artifact.id
        input_artifact_bindings.append(InputArtifactBinding(key=ia['key'], artifact_id=ia["artifact_id"]))
    parameter_bindings = [ParameterBinding(key=pb['key'], value=pb['value']) for pb in conf["parameters"]] 
  
    # Create the configuration with associated parameters and input artifacts
    configuration = client.configuration.create(
        name=conf["name"],
        scenario_id=conf["scenario_id"],
        executable_id=conf["executable_id"],
        parameter_bindings=parameter_bindings,
        input_artifact_bindings = input_artifact_bindings
    )
    
    print(configuration)

    # Update the configuration_id in env.json under the corresponding folder
    # which will be used in continuos-deployment.ipynb to create deployment automatically.
    update_json_file(f'{conf["executable_id"]}/env.json',"configuration_id", configuration.id)
    config_id = configuration.id

# Write the new artifact IDs back to config.json
with open("config.json", 'w') as file:
    json.dump(config, file, indent=4)
    print("config.json updated.")

--------------byom-ollama-server--------------
{'id': 'afb4c65a-be36-4bf4-adad-94a3ccf466d8', 'message': 'Artifact acknowledged', 'url': 'ai://object-store-secret/Dummy'}
Id: 9338f418-36be-40c9-b739-5a12ec4f853e, Message: Configuration created
ollama/env.json updated. configuration_id: 9338f418-36be-40c9-b739-5a12ec4f853e
--------------byom-local-ai-server--------------
{'id': '1849757a-cd5a-4bb3-9b3c-c5717fb727a9', 'message': 'Artifact acknowledged', 'url': 'ai://object-store-secret/Dummy'}
Id: 05c671e4-5723-44ac-91e7-f769c8f1aaea, Message: Configuration created
local-ai/env.json updated. configuration_id: 05c671e4-5723-44ac-91e7-f769c8f1aaea
--------------byom-llama.cpp-server--------------
{'id': 'e0be0462-0df5-48e2-a1b8-283c31db2b8c', 'message': 'Artifact acknowledged', 'url': 'ai://object-store-secret/Dummy'}
Id: d93b2fa1-5c21-447c-ba3a-d39e9fcd35c8, Message: Configuration created
llama.cpp/env.json updated. configuration_id: d93b2fa1-5c21-447c-ba3a-d39e9fcd35c8
--------------byom