In [73]:
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from dotenv import load_dotenv
import os

load_dotenv()

True

In [74]:
instruction = """You are a medical classification engine for health conditions. Classify the prompt into one of the following possible treatment options: 'doctor_required' (serious condition), 'pharmacist_required' (light condition) or 'rest_required' (general tiredness). If you cannot classify the prompt, output 'unknown'.
Only respond with the single word classification. Do not produce any additional output.

# Examples:
User: "I did not sleep well." Assistant: "rest_required"
User: "I chopped off my arm." Assistant: "doctor_required"

# Task
User: 
"""

In [75]:
user_inputs = [
    "I'm tired.", # rest_required
    "I'm bleeding from my eyes.", # doctor_required
    "I have a headache." # pharmacist_required
]

In [76]:
AZURE_OPENAI_RESOURCE = os.environ["AZURE_OPENAI_RESOURCE"]
AZURE_OPENAI_KEY = os.environ["AZURE_OPENAI_KEY"]

The first example uses the inference client against an Azure OpenAI endpoint. In this case, three arguments are mandatory:
 * an endpoint URL of the form `https://<resource-name>.openai.azure.com/openai/deployments/<deployment-name>`
 * the credential used to access it (either an API key or the Azure SDK's integrated authentication)
 * the API version (required for any access to the Azure OpenAI API)
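As a small sketch of the endpoint format described above, the helper below assembles the URL from a resource name and a deployment name. The function `build_endpoint` and the placeholder names are hypothetical and exist only for illustration; they are not part of the `azure-ai-inference` package.

```python
def build_endpoint(resource_name: str, deployment_name: str) -> str:
    """Assemble an Azure OpenAI deployment endpoint URL (hypothetical helper)."""
    if not resource_name or not deployment_name:
        raise ValueError("resource_name and deployment_name must be non-empty")
    return (
        f"https://{resource_name}.openai.azure.com"
        f"/openai/deployments/{deployment_name}"
    )


# Placeholder names; substitute your own resource and deployment.
endpoint = build_endpoint("my-resource", "gpt-4o-mini")
print(endpoint)
```

Keeping the resource and deployment names as separate values makes it easier to switch deployments without editing the URL by hand.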

In [77]:
client = ChatCompletionsClient(
    endpoint=f"https://{AZURE_OPENAI_RESOURCE}.openai.azure.com/openai/deployments/gpt-4o-mini/",
    credential=AzureKeyCredential(AZURE_OPENAI_KEY),
    api_version="2024-06-01",
)

run_inference()  # defined in the cell below; run that cell first

I'm tired. -> rest_required
I'm bleeding from my eyes. -> doctor_required
I have a headache. -> pharmacist_required


In [78]:
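def run_inference():
    """Classify each entry in user_inputs with the current client and print the streamed result."""
    for user_input in user_inputs:
        # Append the user input to the few-shot instruction and cue the assistant's reply.
        messages = [{
            "role": "user",
            "content": f"{instruction}{user_input} Assistant: "
        }]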
        print(f"{user_input} -> ", end="")
        stream = client.complete(
            messages=messages,
            stream=True
        )
        for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="")
        print()
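If the classification is needed as a value rather than as printed output, the same streaming loop can collect the text deltas into a string. This is a minimal sketch: `collect_stream` and `fake_chunk` are hypothetical helpers, and the stand-in chunks only mimic the attribute shape (`choices[0].delta.content`) that the loop above reads, so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Some chunks carry no choices or an empty delta; skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute shape (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: an empty chunk, two text deltas, and a final empty delta.
fake_stream = [
    SimpleNamespace(choices=[]),
    fake_chunk("rest_"),
    fake_chunk("required"),
    fake_chunk(None),
]
result = collect_stream(fake_stream)
print(result)
```

With a real client, the iterable returned by `client.complete(messages=messages, stream=True)` would take the place of `fake_stream`.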

The final example creates a `ChatCompletionsClient` that points at the local completion server from LM Studio. Because the server runs locally and accepts requests without authentication, no real credentials are needed; the code below simply passes an empty key.

The inference code stays the same, but everything now runs locally.

In [79]:
client = ChatCompletionsClient(
    endpoint="http://localhost:1234/v1",
    credential=AzureKeyCredential("")
)

run_inference()

I'm tired. ->  rest_required
I'm bleeding from my eyes. ->  doctor_required
I have a headache. ->  pharmacist_required
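The local model's responses above carry a leading space, and a model may also wrap its answer in quotes as in the prompt's examples. The sketch below shows one way to normalize such output before using it. The name `normalize_label` and the choice to map anything unexpected to `'unknown'` are assumptions made for illustration, not part of the original notebook.

```python
# The labels the instruction prompt allows the model to return.
ALLOWED_LABELS = {
    "doctor_required",
    "pharmacist_required",
    "rest_required",
    "unknown",
}


def normalize_label(raw: str) -> str:
    """Strip whitespace and surrounding quotes, then validate the label."""
    cleaned = raw.strip().strip("'\"").strip().lower()
    # Assumption: anything outside the allowed set is treated as 'unknown'.
    return cleaned if cleaned in ALLOWED_LABELS else "unknown"


for raw in [" rest_required", '"doctor_required"', "Please see a doctor."]:
    print(repr(raw), "->", normalize_label(raw))
```

Validating against a fixed set keeps downstream code from acting on free-form text if the model ignores the single-word instruction.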
