### Introduction
In this notebook, we will test [Microsoft's Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) served by the [custom inference server built on the Hugging Face transformers library](app/server.py) in SAP AI Core. The goal is to enhance [btp-generative-ai-hub-use-cases/01-social-media-citizen-reporting](https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/01-social-media-citizen-reporting) with vision capability powered by the Phi-3-vision model, so that public facility issues reported by citizens can be analyzed automatically from the attached image through the vision API. <br/><br/>

You can also run other transformer-based open-source LLMs from [Hugging Face](https://huggingface.co), such as [microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct).

### Prerequisites
Before running this notebook, please ensure you have completed the [Prerequisites](../../README.md) and [01-deployment.ipynb](01-deployment.ipynb), so that a deployment of the transformer scenario is running in SAP AI Core.<br/><br/>

If the configuration and deployment were created through SAP AI Launchpad, please manually update the configuration_id and deployment_id in [env.json](env.json):
```json
{
    "configuration_id": "<YOUR_CONFIGURATION_ID_OF_TRANSFORMER_SCENARIO>",
    "deployment_id": "<YOUR_DEPLOYMENT_ID_BASED_ON_CONFIG_ABOVE>"
}
```
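Before running the remaining cells, it can help to confirm that the placeholders in env.json were actually replaced. The following is a minimal sketch, assuming placeholders keep the `<YOUR_...>` form shown above; the helper name `find_unfilled_placeholders` is hypothetical and not part of any SDK.

```python
import json


def find_unfilled_placeholders(env: dict) -> list:
    """Return the keys whose values are empty or still look like <PLACEHOLDER> text."""
    unfilled = []
    for key in ("configuration_id", "deployment_id"):
        value = str(env.get(key, "")).strip()
        # Treat a missing/empty value or an angle-bracket template as not filled in.
        if not value or (value.startswith("<") and value.endswith(">")):
            unfilled.append(key)
    return unfilled


# Example with an in-memory document instead of reading env.json from disk.
sample_env = json.loads(
    '{"configuration_id": "<YOUR_CONFIGURATION_ID_OF_TRANSFORMER_SCENARIO>",'
    ' "deployment_id": "d2156e8d87b824ad"}'
)
print(find_unfilled_placeholders(sample_env))  # ['configuration_id']
```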

### The high-level flow:

- Load configurations info
- Connect to SAP AI Core via SDK
- Check the status and logs of the deployment
- Prepare the model name and the inference endpoint
- Run inference on the model with the OpenAI-compatible chat completion API


#### 1.Load config info

- resource_group loaded from [config.json](../config.json)
- deployment_id (created in 01-deployment.ipynb) loaded from [env.json](env.json)


In [1]:
import requests, json
from ai_api_client_sdk.ai_api_v2_client import AIAPIV2Client

In [19]:
# Load the service key and resource group from ../config.json,
# and the deployment ID (created in 01-deployment.ipynb) from ./env.json.
with open("../config.json") as f:
    config = json.load(f)

with open("./env.json") as f:
    env = json.load(f)

deployment_id = env["deployment_id"]
resource_group = config.get("resource_group", "default")
print("deployment id: ", deployment_id, " resource group: ", resource_group)

deployment id:  d2156e8d87b824ad  resource group:  oss-llm


#### 2.Initiate connection to SAP AI Core


In [3]:
aic_sk = config["ai_core_service_key"]
base_url = aic_sk["serviceurls"]["AI_API_URL"] + "/v2/lm"
ai_api_client = AIAPIV2Client(
    base_url=base_url,
    auth_url=aic_sk["url"] + "/oauth/token",
    client_id=aic_sk["clientid"],
    client_secret=aic_sk["clientsecret"],
    resource_group=resource_group,
)

In [4]:
token = ai_api_client.rest_client.get_token()
headers = {
    "Authorization": token,
    "ai-resource-group": resource_group,
    "Content-Type": "application/json",
}

#### 3.Check the deployment status


In [None]:
# Check deployment status before inference request
deployment_url = f"{base_url}/deployments/{deployment_id}"
response = requests.get(url=deployment_url, headers=headers)
resp = response.json()
status = resp["status"]

deployment_log_url = (
    f"{base_url}/deployments/{deployment_id}/logs?start=2024-03-25T01:35:58"
)

if status == "RUNNING":
    print(f"Deployment-{deployment_id} is running. Ready for inference request")
else:
    print(
        f"Deployment-{deployment_id} status: {status}. Not yet ready for inference request"
    )
    # retrieve deployment logs
    # {{apiurl}}/v2/lm/deployments/{{deploymentid}}/logs.
    response = requests.get(url=deployment_log_url, headers=headers)
    print("Deployment Logs:\n", response.text)
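The cell above checks the status only once. If the deployment is still starting, a small polling loop can wait for it. The following is a minimal sketch, not part of the SAP AI Core SDK: `wait_for_status` is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions. The status source is passed in as a function so the sketch can be run without a live deployment.

```python
import time


def wait_for_status(fetch_status, target="RUNNING", timeout_s=600.0,
                    interval_s=10.0, sleep=time.sleep):
    """Call fetch_status() repeatedly until it returns `target` or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Status is still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake status source, so the sketch runs without SAP AI Core access.
fake_statuses = iter(["PENDING", "PENDING", "RUNNING"])
print(wait_for_status(lambda: next(fake_statuses), interval_s=0, sleep=lambda s: None))
```

In this notebook, `fetch_status` could be `lambda: requests.get(url=deployment_url, headers=headers).json()["status"]`, reusing the variables defined in the cell above.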

#### 4.Prepare for inferencing

We'll use [Microsoft's Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct).

In [None]:
model = "microsoft/Phi-3-vision-128k-instruct"
deployment = ai_api_client.deployment.get(deployment_id)

inference_base_url = f"{deployment.deployment_url}"
openai_chat_api_endpoint = f"{inference_base_url}/v1/chat/completions"

#### 5.Run inference with the OpenAI-like chat completion API for Vision

In the following samples, we'll enhance [btp-generative-ai-hub-use-cases/01-social-media-citizen-reporting](https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/01-social-media-citizen-reporting) with vision capability powered by the Phi-3-vision model, served by the [custom inference server built on the Hugging Face transformers library](app/server.py) in SAP AI Core.

##### 5.1 Vision Sample#1: Describe the image (a cracked bridge)

Let's start with a simple question, asking the model to describe the image in natural language.<br/><br/>
Display the sample image.


In [9]:
from IPython.display import display, Image

# URL of the sample image.
# To analyze a different image, replace it with that image's raw URL.
image_url = "https://raw.githubusercontent.com/SAP-samples/btp-generative-ai-hub-use-cases/main/10-byom-oss-llm-ai-core/resources/10-bridge-crack.png"

# Display the image
display(Image(url=image_url))

In [23]:
user_msg = "What is in the image?"
#image_url = "https://raw.githubusercontent.com/SAP-samples/btp-generative-ai-hub-use-cases/main/10-byom-oss-llm-ai-core/resources/11-dirty-street.jpg"
json_data = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_msg},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    },
                },
            ]
        }
    ],
    "max_new_tokens": 500,
    "temperature": 0.7,
    "do_sample": "True"
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print("Result:", response.text)

Result: {"id":"1718385804","object":"chat.completion","created":1718385804,"model":"microsoft/Phi-3-vision-128k-instruct","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"message":{"role":"assistant","content":"The image shows a large concrete overpass with a van parked underneath it. The overpass has pillars supporting it, and there is graffiti on one of the pillars. Below the overpass, there are buildings and a parking lot with various vehicles. A puddle of water is visible on the ground."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":1,"total_tokens":1}}
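The same request shape is reused in every sample of this notebook, and the printed result is the raw response body. The following is a minimal sketch of two convenience helpers: one builds the request body and one pulls out only the assistant's text. Both names (`build_vision_payload`, `extract_reply`) are hypothetical and not part of the server or the SDK. The `do_sample` value is kept as the string `"True"` to mirror the cell above; whether the custom server also accepts a boolean is not confirmed here.

```python
import json


def build_vision_payload(model, text, image_url, max_new_tokens=500, temperature=0.7):
    """Build a chat-completion request body with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": text},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "do_sample": "True",  # kept as a string, exactly as in the cell above
    }


def extract_reply(response_text):
    """Return the assistant message text from a chat-completion response body."""
    body = json.loads(response_text)
    return body["choices"][0]["message"]["content"]


# Demo on a shortened response with the same shape as the result printed above.
sample = (
    '{"choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"The image shows a large concrete overpass."},'
    '"finish_reason":"stop"}]}'
)
print(extract_reply(sample))
```

With these helpers, the request above could be written as `requests.post(url=openai_chat_api_endpoint, headers=headers, json=build_vision_payload(model, user_msg, image_url))`.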


##### 5.2 Vision Sample#2: Public Facilities Issue Spotter for [Citizen Reporting use case](https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/01-social-media-citizen-reporting) (e.g. Bridge Damage)

In the next sample, we'll ask [Microsoft's Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) to act as a Public Facilities Issue Spotter assistant for a city council.
The assistant is responsible for analyzing images reported by citizens through a mobile app to identify issues related to public facilities. <br/>
Here are the tasks: <br/>

- 1.Analyze images reported by citizens through a mobile app to identify issues related to public facilities. If no issue is identified, go to step 5; otherwise continue with the next steps.
- 2.Extract photographic date and location information from images for accurate documentation.
- 3.Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
- 4.Assess the severity and priority of identified issues to determine appropriate action plans.
- 5.Output the result as JSON, following the schema below:

```json
{ "issue_identified": "{{true or false}}",
#below section only output when there is an issue identified
"title": "{{A title about the issue less than 100 characters}}",
"description": "{{A short description about the issue less than 300 characters}}",
"photo_date": "{{Extracted photographic date from its metadata in yyyy-mm-dd:hh:mm:ss format}}",
"longitude": "{{Extracted the longitude of photographic location from its metadata. Output -1 if fails to extract location info from image}}",
"latitude": "{{Extracted the latitude of photographic location from its metadata. Output -1 if fails to extract location info from image}}",
"category": "{{Identified category: 01-Infrastructure Damage, 02-Cleanliness, 03-Safety Hazards, 04-Duplicated}}",
"priority": "{{Identified priority: 01-Very High, 02-High, 03-Medium, 04-Low}}",
"suggested_action": "{{01-Immediate Attendance, 02-Schedule Inspection, 03-Schedule Service, 04-Refer to similar issue}}"
}
```
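Because the schema restricts several fields to fixed values, a parsed reply can be checked before it is passed on to the citizen reporting application. The following is a minimal sketch of such a check. `validate_report` is a hypothetical helper, the allowed values are copied from the schema above, and the coordinate limits are the standard geographic ranges rather than something the schema states. The sample report is made up for illustration.

```python
ALLOWED = {
    "category": ("01-Infrastructure Damage", "02-Cleanliness",
                 "03-Safety Hazards", "04-Duplicated"),
    "priority": ("01-Very High", "02-High", "03-Medium", "04-Low"),
    "suggested_action": ("01-Immediate Attendance", "02-Schedule Inspection",
                         "03-Schedule Service", "04-Refer to similar issue"),
}
# Assumption: standard geographic limits; the sentinel -1 falls inside both ranges.
LIMITS = {"longitude": 180.0, "latitude": 90.0}


def validate_report(report):
    """Return a list of problems found in a parsed report (empty if none)."""
    problems = []
    flag = str(report.get("issue_identified")).lower()
    if flag not in ("true", "false"):
        problems.append("issue_identified must be true or false")
    if flag != "true":
        return problems  # the remaining fields are only expected for an issue
    for field, allowed in ALLOWED.items():
        if report.get(field) not in allowed:
            problems.append(f"{field} has an unexpected value: {report.get(field)!r}")
    for field, limit in LIMITS.items():
        try:
            value = float(report.get(field))
        except (TypeError, ValueError):
            problems.append(f"{field} is not a number: {report.get(field)!r}")
            continue
        if not abs(value) <= limit:
            problems.append(f"{field} is out of range: {value}")
    return problems


# Made-up report for illustration: the category is a bare number, latitude is too large.
report = {
    "issue_identified": True,
    "category": 1,
    "priority": "01-Very High",
    "suggested_action": "01-Immediate Attendance",
    "longitude": -1,
    "latitude": 200,
}
for problem in validate_report(report):
    print(problem)
```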


In [24]:
user_msg = '''
You are a helpful Assistant of Public Facilities Issue Spotter for city council.
Responsible for analyzing images reported by citizens through a mobile app to identify issues related to public facilities. 
Here are your tasks:
1.Analyze images reported by citizens through a mobile app to identify issues related to public facilities. 
If no issue identified, go to step 5, otherwise continue with next steps 
2.Extract photographic date and location information from images for accurate documentation. 
3.Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
4.Assess the severity and priority of identified issues to determine appropriate action plans. 
5.Only Output JSON result with schema as below:
{ "issue_identified": "{{true or false}}", 
#below section only output when there is an issue identified
"title": "{{A short title about the issue}}", 
"description": "{{A detail description about the issue}}", 
"photo_date": "{{Extracted photographic date from its metadata in yyyy-mm-dd:hh:mm:ss format. Leave it blank if no metadata found in it.}}", 
"longitude": "{{Extracted longitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"latitude": "{{Extracted latitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"category": "{{Identified category: 01-Infrastructure Damage, 02-Cleanliness, 03-Safety Hazards, 04-Duplicated}}",
"priority": "{{Suggested Priority: 01-Very High, 02-High, 03-Medium, 04-Low}}",
"suggested_action": "{{01-Immediate Attendance, 02-Schedule Inspection, 03-Schedule Service, 04-Refer to similar issue }}"
} 
'''

json_data = {
    "model": model,
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_msg},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    },
                },
            ]
        }
    ],
    "max_new_tokens": 1000,
    "temperature": 0.7,
    "do_sample": "True"
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print("Result:", response.text)

Result: {"id":"1718385918","object":"chat.completion","created":1718385918,"model":"microsoft/Phi-3-vision-128k-instruct","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"message":{"role":"assistant","content":"{\n  \"issue_identified\": true,\n  \"title\": \"Damage to Concrete\",\n  \"description\": \"The concrete under the bridge appears to be cracked and damaged, posing a potential safety hazard.\",\n  \"photo_date\": \"2023-04-01 10:30:00\",\n  \"longitude\": -85.3687,\n  \"latitude\": 43.8500,\n  \"category\": 01,\n  \"priority\": 01,\n  \"suggested_action\": \"Immediate Attendance\"\n}"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":1,"total_tokens":1}}
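Note that the `content` string in the result above is not strictly valid JSON: values such as `01` are numbers with a leading zero, which the JSON grammar does not allow. The following is a minimal sketch of parsing the reply defensively so that downstream code can react to a malformed answer (for example by retrying the request). The helper name `try_parse_json` is hypothetical.

```python
import json


def try_parse_json(content):
    """Return (data, None) on success or (None, error message) if parsing fails."""
    try:
        return json.loads(content), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


# Shortened reply in the style of the result above: 01 is not a valid JSON number.
data, error = try_parse_json('{"issue_identified": true, "category": 01}')
print("parsed:", data, "| error:", error)

# The same field as a quoted string, as the schema intends, parses without error.
data, error = try_parse_json(
    '{"issue_identified": true, "category": "01-Infrastructure Damage"}'
)
print("parsed:", data, "| error:", error)
```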


##### 5.3 Vision Sample#3: Describe the image (e.g. a littered street)

Let's continue with another sample image, this time of a littered street, and ask the model to describe the image in natural language.


Display the sample image


In [25]:
image_url = "https://raw.githubusercontent.com/SAP-samples/btp-generative-ai-hub-use-cases/main/10-byom-oss-llm-ai-core/resources/11-dirty-street.jpg"
# Display the image
display(Image(url=image_url))

Let's ask it to spot any public facility issue in the image and, if there is one, describe the issue in detail.


In [26]:
user_msg = "Spots on the public facility issue in the image, and Describe the issue in detail if any"

json_data = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_msg},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    },
                },
            ]
        }
    ],
    "max_new_tokens": 500,
    "temperature": 0.7,
    "do_sample": "True"
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print("Result:", response.text)

Result: {"id":"1718385980","object":"chat.completion","created":1718385980,"model":"microsoft/Phi-3-vision-128k-instruct","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"message":{"role":"assistant","content":"There are no visible spots on the public facility in the image that can be described in detail."},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":1,"total_tokens":1}}


##### 5.4 Vision Sample#4: Public Facilities Issue Spotter for [Citizen Reporting use case](https://github.com/SAP-samples/btp-generative-ai-hub-use-cases/tree/main/01-social-media-citizen-reporting) (e.g. a dirty street)

In the next sample, we'll again ask [Microsoft's Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct) to act as a Public Facilities Issue Spotter assistant for a city council.
The assistant is responsible for analyzing images reported by citizens through a mobile app to identify issues related to public facilities. <br/>
Here are the tasks: <br/>

- 1.Analyze images reported by citizens through a mobile app to identify issues related to public facilities. If no issue is identified, go to step 5; otherwise continue with the next steps.
- 2.Extract photographic date and location information from images for accurate documentation.
- 3.Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
- 4.Assess the severity and priority of identified issues to determine appropriate action plans.
- 5.Output the result as JSON, following the schema below:

```json
{ "issue_identified": "{{true or false}}",
#below section only output when there is an issue identified
"title": "{{A title about the issue less than 100 characters}}",
"description": "{{A short description about the issue less than 300 characters}}",
"photo_date": "{{Extracted photographic date from its metadata in yyyy-mm-dd:hh:mm:ss format}}",
"longitude": "{{Extracted the longitude of photographic location from its metadata. Output -1 if fails to extract location info from image}}",
"latitude": "{{Extracted the latitude of photographic location from its metadata. Output -1 if fails to extract location info from image}}",
"category": "{{Identified category: 01-Infrastructure Damage, 02-Cleanliness, 03-Safety Hazards, 04-Duplicated}}",
"priority": "{{Identified priority: 01-Very High, 02-High, 03-Medium, 04-Low}}",
"suggested_action": "{{01-Immediate Attendance, 02-Schedule Inspection, 03-Schedule Service, 04-Refer to similar issue}}"
}
```

In [28]:
user_msg = '''
You are a helpful Assistant of Public Facilities Issue Spotter for city council.
Responsible for analyzing images reported by citizens through a mobile app to identify issues related to public facilities. 
Here are your tasks:
1.Analyze images reported by citizens through a mobile app to identify issues related to public facilities. 
If no issue identified, go to step 5, otherwise continue with next steps 
2.Extract photographic date and location information from images for accurate documentation. 
3.Categorize identified issues based on predefined categories (e.g., infrastructure damage, cleanliness, safety hazards).
4.Assess the severity and priority of identified issues to determine appropriate action plans. 
5.Only Output JSON result with schema as below:
{ "issue_identified": "{{true or false}}", 
#below section only output when there is an issue identified
"title": "{{A short title about the issue}}", 
"description": "{{A detail description about the issue}}", 
"photo_date": "{{Extracted photographic date from its metadata in yyyy-mm-dd:hh:mm:ss format. Leave it blank if no metadata found in it.}}", 
"longitude": "{{Extracted longitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"latitude": "{{Extracted latitude of photographic location from its metadata. Do not make up any number. Output -1 if fails to extract location info from image}}",
"category": "{{Identified category: 01-Infrastructure Damage, 02-Cleanliness, 03-Safety Hazards, 04-Duplicated}}",
"priority": "{{Suggested Priority: 01-Very High, 02-High, 03-Medium, 04-Low}}",
"suggested_action": "{{01-Immediate Attendance, 02-Schedule Inspection, 03-Schedule Service, 04-Refer to similar issue }}"
} 
'''

json_data = {
    "model": model,
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": user_msg},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": image_url
                    },
                },
            ]
        }
    ],
    "max_new_tokens": 1000,
    "temperature": 0.7,
    "do_sample": "True"
}

response = requests.post(url=openai_chat_api_endpoint, headers=headers, json=json_data)
print("Result:", response.text)

Result: {"id":"1718386052","object":"chat.completion","created":1718386052,"model":"microsoft/Phi-3-vision-128k-instruct","system_fingerprint":"fp_44709d6fcb","choices":[{"index":0,"message":{"role":"assistant","content":"```json\n\n{\n\n  \"issue_identified\": true,\n\n  \"title\": \"Litter on Roadway\",\n\n  \"description\": \"A street with litter on the ground and along the side of the road.\",\n\n  \"photo_date\": \"2023-04-01 13:47:35\",\n\n  \"longitude\": \"123.456789\",\n\n  \"latitude\": \"45.678910\",\n\n  \"category\": \"01-Infrastructure Damage\",\n\n  \"priority\": \"01-Very High\",\n\n  \"suggested_action\": \"01-Immediate Attendance\"\n\n}\n\n```"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":0,"completion_tokens":1,"total_tokens":1}}
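Even with `response_format` set to `json_object`, the reply above wraps the JSON in a Markdown code fence, so passing the `content` string directly to `json.loads` would fail. The following is a minimal sketch of removing an optional fence before parsing. `parse_fenced_json` is a hypothetical helper, and the sample reply is a shortened copy of the content shown above.

```python
import json

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def parse_fenced_json(content):
    """Parse JSON that may be wrapped in a Markdown code fence."""
    text = content.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        # Drop an optional language tag such as "json" after the opening fence.
        if text.lstrip().lower().startswith("json"):
            text = text.lstrip()[4:]
    return json.loads(text)


# Shortened copy of the assistant content from the result above.
reply = (
    FENCE + "json\n\n{\n\n"
    '  "issue_identified": true,\n\n'
    '  "title": "Litter on Roadway"\n\n'
    "}\n\n" + FENCE
)
report = parse_fenced_json(reply)
print(report["title"])  # Litter on Roadway
```

Unfenced JSON passes through unchanged, so the same helper can be applied to every reply regardless of whether the model added a fence.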
