### Introduction
In this notebook, we test inference with Microsoft's [Phi3:14b](https://ollama.com/library/phi3) model served by [Ollama](https://ollama.com/) in SAP AI Core through the [SAP Generative AI Hub SDK](https://pypi.org/project/generative-ai-hub-sdk/). The SDK significantly simplifies integrating self-hosted open-source LLMs in SAP AI Core with your own application, and it provides the same interface as for the proprietary LLMs in SAP Generative AI Hub. You can also run Llama 3, Phi-3, Mistral, Mixtral, LLaVA and other [supported models in Ollama](https://ollama.com/library). <br/><br/>

For more details, please refer to the blog post [Bring Open-Source LLMs into SAP AI Core with Ollama](https://community.sap.com/t5/artificial-intelligence-and-machine-learning-blogs/bring-open-source-llms-into-sap-ai-core-with-ollama/ba-p/13659769).

### Prerequisites
Before running this notebook, please ensure you have completed the [Prerequisites](../../README.md) and [01-deployment.ipynb](01-deployment.ipynb). As a result, a deployment of the Ollama scenario should be running in SAP AI Core.<br/><br/>

If the configuration and deployment were created through SAP AI Launchpad, please manually update `configuration_id` and `deployment_id` in [env.json](env.json):
```json
{
    "configuration_id": "<YOUR_CONFIGURATION_ID_OF_OLLAMA_SCENARIO>",
    "deployment_id": "<YOUR_DEPLOYMENT_ID_BASED_ON_CONFIG_ABOVE>"
}
```
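
A quick sanity check can catch an `env.json` that still contains the template placeholders before any request is sent. The sketch below is a hypothetical helper (`validate_env` is not part of any SDK), and the configuration ID in the example is made up.

```python
import json

# Prefix used by the template values shown above, e.g. "<YOUR_DEPLOYMENT_ID_...>"
PLACEHOLDER_PREFIX = "<YOUR_"


def validate_env(env: dict) -> dict:
    """Return env unchanged if both IDs are set; raise ValueError otherwise."""
    for key in ("configuration_id", "deployment_id"):
        value = env.get(key, "")
        if not value or value.startswith(PLACEHOLDER_PREFIX):
            raise ValueError(f"{key} is missing or still a placeholder in env.json")
    return env


# Illustrative content; in the notebook this would come from json.load(open("./env.json"))
sample = json.loads(
    '{"configuration_id": "example-config-id", "deployment_id": "df240ccfb2d899b0"}'
)
print(validate_env(sample)["deployment_id"])
```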
 
### The high-level flow:
- Load configurations info
- Connect to SAP AI Core via its SDK
- Check the status and logs of the deployment
- Pull the target model from the Ollama model library through the API
- Register the byom-ollama-server scenario as a foundation model scenario via SAP Generative AI Hub SDK
- Run inference on the model through **SAP Generative AI Hub SDK**
    - Option 1: Proxy with OpenAI-like interface
    - Option 2: Proxy with Langchain-like interface
    - Option 3: Proxy with Langchain-like interface, together with Langchain components


#### 1.Load config info 
- resource_group loaded from [config.json](../config.json)
- deployment_id (created in 01-deployment.ipynb) loaded from [env.json](env.json)

In [70]:
import requests, json
from ai_core_sdk.ai_core_v2_client import AICoreV2Client

In [71]:
# Load the settings created in the previous steps:
# ../config.json holds the resource group and service key; ./env.json holds the deployment_id.
with open("../config.json") as f:
    config = json.load(f)

with open("./env.json") as f:
    env = json.load(f)

deployment_id = env["deployment_id"]
resource_group = config.get("resource_group", "default")
print("deployment id: ", deployment_id, " resource group: ", resource_group)

deployment id:  df240ccfb2d899b0  resource group:  oss-llm


#### 2.Initiate connection to SAP AI Core 

In [72]:
# Initiate an AI Core SDK client with the information of service key
ai_core_sk = config["ai_core_service_key"]
base_url = ai_core_sk.get("serviceurls").get("AI_API_URL") + "/v2/lm"
ai_core_client = AICoreV2Client(base_url=ai_core_sk.get("serviceurls").get("AI_API_URL")+"/v2",
                        auth_url=ai_core_sk.get("url")+"/oauth/token",
                        client_id=ai_core_sk.get("clientid"),
                        client_secret=ai_core_sk.get("clientsecret"),
                        resource_group=resource_group)
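
The cell above derives three URLs from the service key. As a minimal sketch, the same derivation can be isolated in a small function that fails early with a readable message when a field is missing. `build_ai_core_urls` is a hypothetical helper, and the host names in the example are placeholders, not real endpoints.

```python
def build_ai_core_urls(service_key: dict) -> dict:
    """Derive the API, deployment-management and OAuth URLs from a service key."""
    try:
        api_url = service_key["serviceurls"]["AI_API_URL"].rstrip("/")
        auth_url = service_key["url"].rstrip("/")
    except KeyError as missing:
        raise KeyError(f"service key is missing field {missing}") from None
    return {
        "base_url": f"{api_url}/v2",          # passed to AICoreV2Client
        "lm_url": f"{api_url}/v2/lm",         # used for the deployment status calls
        "auth_url": f"{auth_url}/oauth/token",
    }


# Placeholder service key for illustration only
example_key = {
    "serviceurls": {"AI_API_URL": "https://api.ai.example.com"},
    "url": "https://auth.example.com",
}
urls = build_ai_core_urls(example_key)
print(urls["lm_url"])
```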


In [73]:
token = ai_core_client.rest_client.get_token()
headers = {
        "Authorization": token,
        'ai-resource-group': resource_group,
        "Content-Type": "application/json"}


#### 3.Check the deployment status 

In [74]:
# Check deployment status before inference request
deployment_url = f"{base_url}/deployments/{deployment_id}"
response = requests.get(url=deployment_url, headers=headers)
resp = response.json()    
status = resp['status']

deployment_log_url = f"{base_url}/deployments/{deployment_id}/logs"
if status == "RUNNING":
    print(f"Deployment-{deployment_id} is running. Ready for inference request")
else:
    print(f"Deployment-{deployment_id} status: {status}. Not yet ready for inference request")
    # Retrieve the deployment logs from {AI_API_URL}/v2/lm/deployments/{deployment_id}/logs
    response = requests.get(deployment_log_url, headers=headers)
    print('Deployment Logs:\n', response.text)


Deployment-df240ccfb2d899b0 is running. Ready for inference request
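
If the deployment is still starting, the status check can be repeated until it reports `RUNNING`. The following is a minimal polling sketch under stated assumptions: `wait_until_running` is a hypothetical helper, and treating `DEAD` and `STOPPED` as terminal states is an assumption about the status values, not something shown in this notebook.

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], str],
                       timeout_s: float = 600.0,
                       interval_s: float = 10.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns RUNNING, a terminal state, or the timeout hits."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "RUNNING":
            return status
        if status in ("DEAD", "STOPPED"):  # assumed terminal states
            raise RuntimeError(f"deployment ended in status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"deployment still {status} after {timeout_s}s")
        sleep(interval_s)


# Simulated status sequence so the sketch runs without a live deployment
fake_statuses = iter(["PENDING", "PENDING", "RUNNING"])
print(wait_until_running(lambda: next(fake_statuses), sleep=lambda s: None))
```

In the notebook, `get_status` could be `lambda: requests.get(deployment_url, headers=headers).json()["status"]`.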


#### 4.Pull the model into Ollama 

In [75]:
model = "phi3:14b"                   #phi3-medium
#model = "llama3:latest"             #8b
#model = "llama3:70b"                #Very slow even with resource plan infer.l. Not recommended
#model = "llama3:70b-instruct-q2_K"  #Very slow even with resource plan infer.l. Not recommended
#model = "mistral:latest"            #7b-q4_0
#model = "mistral:7b-instruct-q5_K_M"
#model = "mixtral:8x7b-instruct-v0.1-q4_0" #Important: please use resource plan as train.l or infer.l in configuration

deployment = ai_core_client.deployment.get(deployment_id)
inference_base_url = f"{deployment.deployment_url}/v1"

In [None]:
# pull the model from ollama model repository
endpoint = f"{inference_base_url}/api/pull"
print(endpoint)

#let's pull the target model from ollama
json_data = {  "name": model}

response = requests.post(endpoint, headers=headers, json=json_data)
print('Result:', response.text)
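
By default, Ollama's pull endpoint answers with a stream of newline-delimited JSON progress objects, so `response.text` can be long. A small parser makes the outcome easier to read. This is a sketch: `summarize_pull_stream` is a hypothetical helper, and the sample body below is hand-written for illustration rather than captured from this deployment.

```python
import json


def summarize_pull_stream(body: str) -> dict:
    """Parse newline-delimited JSON progress objects and report the final outcome."""
    statuses = []
    error = None
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if "error" in event:
            error = event["error"]
        elif "status" in event:
            statuses.append(event["status"])
    last_status = statuses[-1] if statuses else None
    return {
        "ok": error is None and last_status == "success",
        "last_status": last_status,
        "error": error,
    }


# Hand-written sample of a successful pull
sample_body = "\n".join([
    '{"status": "pulling manifest"}',
    '{"status": "verifying sha256 digest"}',
    '{"status": "success"}',
])
print(summarize_pull_stream(sample_body))
```

In the notebook, the function could be called as `summarize_pull_stream(response.text)`.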

Next, let's list the available models and check that the target model is among them.

In [76]:
# Check the model list 
endpoint = f"{inference_base_url}/api/tags"
print(endpoint)

response = requests.get(endpoint, headers=headers)
print('Result:', response.text)

https://api.ai.prod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/df240ccfb2d899b0/v1/api/tags
Result: {"models":[{"name":"phi3:14b","model":"phi3:14b","modified_at":"2024-05-23T09:50:06.013426087Z","size":7897126241,"digest":"1e67dff39209b792d22a20f30ebabe679c64db83de91544693c4915b57e475aa","details":{"parent_model":"","format":"gguf","family":"phi3","families":["phi3"],"parameter_size":"14.0B","quantization_level":"F16"},"expires_at":"0001-01-01T00:00:00Z"},{"name":"llama3:latest","model":"llama3:latest","modified_at":"2024-05-23T02:47:43.160643007Z","size":4661224676,"digest":"365c0bd3c000a25d28ddbf732fe1c6add414de7275464c4e4d1c3b5fcb5d8ad1","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"8.0B","quantization_level":"Q4_0"},"expires_at":"0001-01-01T00:00:00Z"},{"name":"llama3:8b","model":"llama3:8b","modified_at":"2024-05-23T02:19:25.163506171Z","size":4661224676,"digest":"365c0bd3c000a25d28ddbf732fe1c6add414de7275
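
Rather than scanning the raw JSON by eye, the check can be done programmatically. The sketch below assumes the response shape shown above (a `models` array whose entries carry a `name`); `has_model` is a hypothetical helper and the sample body is abbreviated.

```python
import json


def has_model(tags_body: str, model: str) -> bool:
    """Return True if the /api/tags response lists a model with exactly this name."""
    models = json.loads(tags_body).get("models", [])
    return any(entry.get("name") == model for entry in models)


# Abbreviated sample in the same shape as the output above
sample_tags = json.dumps({
    "models": [
        {"name": "phi3:14b", "details": {"family": "phi3"}},
        {"name": "llama3:latest", "details": {"family": "llama"}},
    ]
})
print(has_model(sample_tags, "phi3:14b"))
```

In the notebook, this could be called as `has_model(response.text, model)`.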

#### 5.Inference completion and chat completion APIs with SAP Generative AI Hub SDK

##### 5.0 Register the scenario as a foundation model scenario

In [77]:
from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient

GenAIHubProxyClient.add_foundation_model_scenario(
    scenario_id="byom-ollama-server",
    config_names="ollama*",
    prediction_url_suffix="/v1/chat/completions",
)

proxy_client = GenAIHubProxyClient(ai_core_client = ai_core_client)
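
The value `config_names="ollama*"` reads like a shell-style wildcard. Assuming the SDK matches configuration names in that way (an assumption, not verified here), the standard library's `fnmatch` illustrates which names would qualify. The configuration names below are hypothetical.

```python
from fnmatch import fnmatch

pattern = "ollama*"

# Hypothetical configuration names in a resource group
candidates = ["ollama", "ollama-phi3", "my-ollama", "vllm-server"]

# Keep only the names that the wildcard would select
matching = [name for name in candidates if fnmatch(name, pattern)]
print(matching)
```

If the deployment's configuration name does not start with `ollama`, the pattern passed to `add_foundation_model_scenario` would need to be adjusted accordingly.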

##### 5.1 Option 1-Proxy with OpenAI-like interface
Now let's test Ollama's [OpenAI compatible API for Chat Completion](https://github.com/ollama/ollama/blob/main/docs/openai.md) via the proxy with OpenAI-like interface in SAP Generative AI Hub SDK. This is the same interface used for GPT-3.5/4 chat completion in SAP Generative AI Hub.

In [80]:
# Option 1: Proxy with OpenAI-like interface
from gen_ai_hub.proxy.native.openai import OpenAI

openai = OpenAI(proxy_client=proxy_client)
messages = [{"role": "user", "content": "Tell me a joke"}]
# kwargs = dict(deployment_id='xxxxxxx', model=model,messages = messages)
result = openai.chat.completions.create(
    # **kwargs
    deployment_id=deployment_id,
    model=model,
    messages=messages
)

print("Option 1: Proxy with OpenAI-like interface\n", result.choices[0].message.content)

Option 1: Proxy with OpenAI-like interface
 Here's one:

Why don't scientists trust atoms?

Because they make up everything!

Hope that made you smile!


##### 5.2 Option 2-Proxy with Langchain-like interface
Now let's test Ollama's [OpenAI compatible API for Chat Completion](https://github.com/ollama/ollama/blob/main/docs/openai.md) via the proxy with Langchain-like interface in SAP Generative AI Hub SDK. This is the same interface used for GPT-3.5/4 chat completion in SAP Generative AI Hub.

In [81]:
# Option 2: Proxy with Langchain-like interface
from gen_ai_hub.proxy.langchain.openai import ChatOpenAI
from langchain.schema.messages import HumanMessage, SystemMessage

messages = [HumanMessage(content="Tell me a joke")]
llm = ChatOpenAI(
    proxy_client=proxy_client,
    deployment_id=deployment_id,
    model_name=model
)
completion = llm.invoke(messages)
print("Option 2: Proxy with Langchain-like interface\n", completion.content)

Option 2: Proxy with Langchain-like interface
 Here's one:

Why don't scientists trust atoms?

Because they make up everything!

Hope that made you smile! Do you want to hear another?


##### 5.3 Option 3-Proxy with Langchain-like interface, together with Langchain components
Now let's test Ollama's [OpenAI compatible API for Chat Completion](https://github.com/ollama/ollama/blob/main/docs/openai.md) via the proxy with Langchain-like interface, together with Langchain components, in SAP Generative AI Hub SDK. This is the same interface used for GPT-3.5/4 chat completion in SAP Generative AI Hub.

In [82]:
# Option 3: Proxy with Langchain-like interface, together with Langchain components
from langchain.prompts import PromptTemplate

llm = ChatOpenAI(
    proxy_client=proxy_client,
    deployment_id=deployment_id,
    model_name=model,
    temperature=0.5,
    max_tokens=400,
    # model_kwargs={
    #     "frequency_penalty": -2, "presence_penalty": -1
    # }
)

template = "Tell me a joke about {topic}"
prompt = PromptTemplate(template=template, input_variables=["topic"])
llm_chain = prompt | llm

completion = llm_chain.invoke({"topic": "Generative AI"})

print("Option 3: Proxy with Langchain-like interface, together with Langchain components\n",completion.content)

Option 3: Proxy with Langchain-like interface, together with Langchain components
 Why did the Generative AI model go to therapy?

Because it was having some "generated" issues! It kept producing outputs that were just a little too repetitive, and its creators were worried it had become stuck in an infinite loop of self-fulfillment!

Hope that one generated a smile on your face!


##### 5.4 Sample#4: Customer Message Processing via Proxy with OpenAI-like Interface in JSON mode
In our sample [btp-industry-use-cases/04-customer-interaction-gpt4](https://github.com/SAP-samples/btp-industry-use-cases/tree/main/04-customer-interaction-gpt4), GPT-3.5/4 processes customer messages from customer interactions with plain prompting and returns the result as JSON:
- Summarize customer message into title and a short description
- Analyze the sentiment of the customer message
- Extract the entities from the customer message, such as customer, product, order no etc.

Let's see whether the same scenario can be achieved with an open-source model such as phi3:14b, llama3-8b or mistral-7b.


In [51]:
# Prepare the messages to process customer message with
# summarization, sentiment analysis and entities extraction and output as json
sys_msg = r'''
You are an AI assistant to process the input text. Here are your tasks on the text.
1.Apply Sentiment Analysis
2.Generate a title less than 100 characters,and summarize the text into a short description less than 200 characters
3.Extract the entities such as customer,product,order,delivery,invoice etc from the text. Here is a preliminary list of the target entity fields and description. Please extract all the identifiable entities even if they are not in the list below. Don't include any field with unknown value. 
-customer_no: alias customer number, customer id, account id, account number which could be used to identify a customer.
-customer_name: customer name, account name
-customer_phone: customer contact number.
-product_no: product number, product id
-product_name
-order_no: sales order number, order id
-order_date 
-delivery_no: delivery number, delivery id
-delivery_date: delivery date, shipping date
-invoice_no: alias invoice number, invoice id, receipt number, receipt id etc. which can be used to locate a invoice.
-invoice_date: invoice date, purchase date
-store_name
-store_location
etc.
    
Fields not in the list must follow the snake_case naming convention, like product_name; no spaces allowed. 

Output expected in JSON format as below: 
{"sentiment":"{{Positive/Neutral/Negative}}","title":"{{The generated title based on the input text less than 100 characters}}","summary":"{{The generated summary based on the input text less than 300 characters}}","entities":[{"field":"{{the extracted fields such as product_name listed above}}","value":"{{the extracted value of the field}}"}]}
'''

user_msg = r'''
Input text: 
Everything was working fine one day I went to make a shot of coffee it stopped brewing after 3 seconds Then I tried the milk frother it stopped after 3 seconds again I took it back they fixed it under warranty but it’s happening again I don’t see this machine lasting more then 2 years to be honest I’m spewing I actually really like the machine It’s almost like it’s losing pressure somewhere, they wouldn’t tell my what the problem was when they fixed it.. Purchased at Harvey Norman for $1,349. 
Product is used: Several times a week
 
JSON:
'''

messages = [
            {
                "role": "system",
                "content": sys_msg
            },
            {
                "role": "user",
                "content": user_msg
            }
        ]

#JSON Mode
response_format={"type": "json_object"} 

In [None]:
# Option 1: Proxy with OpenAI-like interface

# kwargs = dict(deployment_id=deployment_id, model=model,messages = messages)
result = openai.chat.completions.create(
    # **kwargs
    deployment_id=deployment_id,
    model=model,
    response_format=response_format,
    messages=messages
)

print("Option 1: Proxy with OpenAI-like interface\n", result.choices[0].message.content)
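
JSON mode encourages syntactically valid JSON but does not guarantee that every expected key is present, so it can help to validate the reply before using it. The sketch below is one possible check: `parse_customer_result` is a hypothetical helper, and the sample reply is hand-written for illustration, not actual model output.

```python
import json

# Top-level keys requested in the system prompt
REQUIRED_KEYS = ("sentiment", "title", "summary", "entities")


def parse_customer_result(content: str) -> dict:
    """Parse the model reply, check the expected keys, and flatten entities to a dict."""
    data = json.loads(content)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    data["entities"] = {item["field"]: item["value"] for item in data["entities"]}
    return data


# Hand-written reply in the requested format
sample_reply = json.dumps({
    "sentiment": "Negative",
    "title": "Coffee machine stops after 3 seconds",
    "summary": "Machine stops brewing and frothing after 3 seconds, even after a warranty repair.",
    "entities": [{"field": "store_name", "value": "Harvey Norman"}],
})
parsed = parse_customer_result(sample_reply)
print(parsed["sentiment"], parsed["entities"])
```

In the notebook, the input would be `result.choices[0].message.content`.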

##### 5.5 Sample#5: Customer Message Processing with Option 2-Langchain-compatible Interface

In [None]:
# Option 2: Proxy with Langchain-like interface
# summarization, sentiment analysis and entities extraction and output as json

messages = [SystemMessage(content=sys_msg),
            HumanMessage(content=user_msg)]
llm = ChatOpenAI(
    proxy_client=proxy_client,
    deployment_id=deployment_id,
    model_name=model
).bind(
 response_format=response_format
)

completion = llm.invoke(messages)
print("Option 2: Proxy with Langchain-like interface\n", completion.content)

##### 5.6 Sample#6: Citizen Reporting Use Case with Option 3-Langchain-like interface, together with Langchain components

Prepare the schema of the entities to extract from a social media post in the citizen reporting app.<br/>
Output example:<br/>
```json
{
    "address": "Oakwood Road",
    "category": "PUBLIC CLEANLINESS",
    "description": "The public area on Oakwood Road in Sagenai is in a disgraceful state with piles of rubbish and litter scattered everywhere. The author is frustrated with the local authorities for not maintaining cleanliness despite the taxes they pay. They hope for immediate action.",
    "location": "51.57470453612761,0.003792117010085437",
    "priority": "3-Medium",
    "sentiment": "NEGATIVE",
    "summary": "Dirty public area on Oakwood Road"
}
```

In [66]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
        
category = '''Identify if the social media reports a situation related to one of the following categories: 
            1. PUBLIC CLEANLINESS: Dirty public areas, overflowing dustbins and littering. Bulky waste in common areas.  
            2. ROADS & FOOTPATHS: Including covered linkways, signboards & streetlights. E.g. Pot holes, huge cracks, etc.
            3. FACILITY & PARK MAINTENANCE: Fallen trees, overgrown grass, and maintenance of park lighting and facilities.
            4. PESTS: Sighting of bees and hornets, potential mosquito breeding sites, and more.
            5. DRAINS & SEWERS: Choked, overflowing, or damaged drains, bad sewage smells, flooding.   
            Output the category name. If none of the categories fits, or in doubt, return OTHER - PLEASE CHECK.  
            '''

            
priority = '''Identify the priority to be given to the reported issues:
            4-Low : the issue does not pose any problem with public safety and does not necessarily need to be handled urgently. 
            3-Medium : the issue does not cause any immediate danger, but it has significant and negative impact on the daily life of people in the neighborhood.
            2-High : the issue needs to be resolved quickly because it can potentially cause dangerous situations or disruptions. 
            1-Very High : the issue needs to be handled as soon as possible, as it is a matter of public safety. 
            Return the priority level. If in doubt, return 3-Medium '''
            
        
sentiment ='''Extract the sentiment of the post: 
            1. NEUTRAL: if the issue is reported politely
            2. NEGATIVE: if the post shows irritation, impatience, annoyance
            3. VERY NEGATIVE: the post expresses rage, hatred
            '''

address = ResponseSchema(name="address",
            description="Extract the address where the issue has been noticed. Return the street only and omit the town or country. For example: Oakwood Road.")
category = ResponseSchema(name="category",
            description=category)
description = ResponseSchema(name="description",
            description="Summarize the issue that is being reported in not more than 300 characters and a neutral tone.")
location = ResponseSchema(name="location",
            description="Extract the coordinates where the issue has been noticed. The format should be: (51.57470453612761,0.003792117010085437).")
priority = ResponseSchema(name="priority",
            description=priority)
sentiment = ResponseSchema(name="sentiment",
            description=sentiment)
summary = ResponseSchema(name="summary",
            description="Summarize the issue that is being reported in 40 characters and a neutral tone.")
        
response_schemas = [
            address,
            category,
            description,
            location,
            priority,
            sentiment,
            summary
        ]

Helper function to convert the social media post into a string.

In [67]:
def post_to_str(input_message):
    #message = f"redditPostId: {input_message["id"]}, author: {input_message["author"]}, title: {input_message["title"]}, message: {input_message["longText"]}, postingDate: {input_message["postingDate"]}"
    message = "redditPostId: " + input_message["id"]+\
            ", author: "+input_message["author"]+", title: "+input_message["title"]+\
            ", message: "+input_message["longText"]+", postingDate: "+input_message["postingDate"]
    return message
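
The commented-out f-string in the helper reuses double quotes inside a double-quoted f-string, which is a syntax error before Python 3.12. As a sketch, the same result can be produced with an f-string on older versions by using single quotes for the dictionary keys. `post_to_str_fstring` is a hypothetical name, and the sample post is shortened for illustration.

```python
def post_to_str_fstring(input_message: dict) -> str:
    """Build the same string as post_to_str, using f-strings with single-quoted keys."""
    return (
        f"redditPostId: {input_message['id']}, "
        f"author: {input_message['author']}, "
        f"title: {input_message['title']}, "
        f"message: {input_message['longText']}, "
        f"postingDate: {input_message['postingDate']}"
    )


# Shortened sample post
sample_post = {
    "id": "198qqqm",
    "author": "jacobtan89",
    "title": "Dirty public area",
    "longText": "Piles of rubbish on Oakwood Road.",
    "postingDate": "2024-01-17T07:13:48.000Z",
}
print(post_to_str_fstring(sample_post))
```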

Prepare the final prompt to extract the entities from a social media post about a public facility issue reported through the citizen reporting app.

In [68]:
template_string = '''Extract information from the following social media post: 
            {post}
            {format_instructions}
            '''
        
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
        
prompt_template = ChatPromptTemplate.from_template(template=template_string)
input_message = {
        "id": "198qqqm",
        "author": "jacobtan89",
        "title": "Dirty public area",
        "longText": "The public area on Oakwood Road in Sagenai is in a disgraceful state with piles of rubbish and litter scattered everywhere. The author is frustrated with the local authorities for not maintaining cleanliness despite the taxes they pay. They hope for immediate action. #CleanUpYourAct #OakwoodRoadNightmare #DisgustingNeighborhood Coordinates:(51.57470453612761,0.003792117010085437)",
        "postingDate": "2024-01-17T07:13:48.000Z"
    }

message = post_to_str(input_message)
complete_prompt = prompt_template.format_messages(
            post = message,
            format_instructions = format_instructions
)

print(complete_prompt)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"address": string  // Extract the address where the issue has been noticed. Return the street only and omit the town or country. For example: Oakwood Road.
	"category": string  // Identify if the social media reports a situation related to one of the following categories: 
            1. PUBLIC CLEANLINESS: Dirty public areas, overflowing dustbins and littering. Bulky waste in common areas.  
            2. ROADS & FOOTPATHS: Including covered linkways, signboards & streetlights. E.g. Pot holes, huge cracks, etc.
            3. FACILITY & PARK MAINTENANCE: Fallen trees, overgrown grass, and maintenance of park lighting and facilities.
            4. PESTS: Sighting of bees and hornets, potential mosquito breeding sites, and more.
            5. DRAINS & SEWERS: Choked, overflowing, or damaged drains, bad sewage smells, flooding.   
        

In [69]:
llm = ChatOpenAI(
    proxy_client=proxy_client,
    deployment_id=deployment_id,
    model_name=model,
    temperature=0.5,
    max_tokens=400,
    # model_kwargs={
    #     "frequency_penalty": -2, "presence_penalty": -1
    # }
)

#llm_chain = complete_prompt | llm
completion = llm.invoke(complete_prompt)

print("Option 3: Proxy with Langchain-like interface, together with Langchain components\n",completion.content)

Option 3: Proxy with Langchain-like interface, together with Langchain components
 Here is the extracted information in the required markdown code snippet:

```json
{
    "address": "Oakwood Road",
    "category": "PUBLIC CLEANLINESS",
    "description": "Piles of rubbish and litter scattered everywhere, despite taxes being paid.",
    "location": "(51.57470453612761,0.003792117010085437)",
    "priority": "2-High",
    "sentiment": "NEGATIVE",
    "summary": "Dirty public area with litter"
}
```
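
The reply wraps the JSON in a markdown code snippet with some leading text, so it needs to be extracted before use. Within the notebook, `output_parser.parse(completion.content)` from the `StructuredOutputParser` created earlier is the usual route. The sketch below shows a standard-library-only alternative; `extract_json_block` is a hypothetical helper and the sample reply is abbreviated from the output above.

```python
import json
import re

# Three backticks, built programmatically to keep this example's own fence intact
FENCE = "`" * 3

# Match an optional "json" tag, then capture the object up to the closing fence
JSON_BLOCK = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def extract_json_block(reply: str) -> dict:
    """Return the JSON object inside a fenced snippet, or parse the whole reply."""
    match = JSON_BLOCK.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload)


# Abbreviated version of the reply shown above
sample_reply = (
    "Here is the extracted information:\n\n"
    + FENCE + "json\n"
    + '{"address": "Oakwood Road", "category": "PUBLIC CLEANLINESS", "priority": "2-High"}\n'
    + FENCE
)
report = extract_json_block(sample_reply)
print(report["address"], report["priority"])
```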
