# Semantic Kernel Orchestrator


In [None]:
import os

from dotenv import load_dotenv
from semantic_kernel.connectors.ai.azure_ai_inference import AzureAIInferenceChatCompletion

# Load environment variables (including GITHUB_TOKEN) from a local .env file
load_dotenv()

# To authenticate with the model, generate a personal access token (PAT) in your GitHub settings.
# Create your PAT by following the instructions here: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
base_url = "https://models.inference.ai.azure.com"
api_key = os.environ["GITHUB_TOKEN"]

chat_completion_service = AzureAIInferenceChatCompletion(
    ai_model_id="gpt-4o",
    api_key=api_key,
    endpoint=base_url, # Used to point to your service
    service_id="azure_openai", # Optional; for targeting specific services within Semantic Kernel
)
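
Note that `os.environ["GITHUB_TOKEN"]` raises a bare `KeyError` when the variable is not set. The following is a minimal sketch of a friendlier check, assuming you would rather fail with an explicit message. `require_env` is a hypothetical helper written for this notebook, not part of Semantic Kernel or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a descriptive error.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it before running this notebook."
        )
    return value
```

With this in place, `api_key = require_env("GITHUB_TOKEN")` could replace the direct dictionary lookup above.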

You can use the chat completion service directly, or register it with a kernel so that it can be looked up later by type or ID. The following code adds the service to a kernel.

In [None]:
from semantic_kernel import Kernel

# Initialize the kernel
kernel = Kernel()

# Add the chat completion service created above to the kernel
kernel.add_service(chat_completion_service)

Once you've added chat completion services to your kernel, you can retrieve them with the kernel's `get_service` method, either by type or by service ID. The example below shows both lookups and also retrieves the default execution settings for the service.

In [None]:
from semantic_kernel.connectors.ai.chat_completion_client_base import ChatCompletionClientBase

# Retrieve the chat completion service by type
chat_completion_service = kernel.get_service(type=ChatCompletionClientBase)

# Retrieve the chat completion service by id
chat_completion_service = kernel.get_service(service_id="azure_openai")

# Retrieve the default inference settings
execution_settings = kernel.get_prompt_execution_settings_from_service_id("azure_openai")
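
To make the two lookup styles concrete, here is a toy registry that resolves a service either by base type or by ID. It is only a sketch of the idea and not Semantic Kernel's implementation. `ToyRegistry`, `ChatService`, and `EchoChatService` are hypothetical names introduced for illustration.

```python
class ChatService:
    """Hypothetical base type standing in for a chat completion interface."""

    def __init__(self, service_id: str) -> None:
        self.service_id = service_id


class EchoChatService(ChatService):
    """Hypothetical concrete service."""


class ToyRegistry:
    """Toy service registry keyed by service ID (illustration only)."""

    def __init__(self) -> None:
        self._services = {}

    def add_service(self, service: ChatService) -> None:
        if service.service_id in self._services:
            raise ValueError(f"Service {service.service_id!r} is already registered")
        self._services[service.service_id] = service

    def get_service(self, service_id=None, service_type=None) -> ChatService:
        # Return the first registered service that satisfies every given filter
        for sid, service in self._services.items():
            if service_id is not None and sid != service_id:
                continue
            if service_type is not None and not isinstance(service, service_type):
                continue
            return service
        raise KeyError(f"No service matches id={service_id!r}, type={service_type!r}")


registry = ToyRegistry()
registry.add_service(EchoChatService("azure_openai"))

by_type = registry.get_service(service_type=ChatService)  # lookup by base type
by_id = registry.get_service(service_id="azure_openai")   # lookup by ID
```

Both lookups return the same object here because only one service is registered. With several services of the same type, the ID is what disambiguates them.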

## Using chat completion service

Now that you have a chat completion service, you can use it to generate responses from an AI agent. There are two main ways to use a chat completion service:

**Non-streaming**: You wait for the service to generate an entire response before returning it to the user.

**Streaming**: Individual chunks of the response are generated and returned to the user as they are created.

Before getting started, note that if you did not register the service with the kernel, you will need to create an execution settings instance manually to use the chat completion service.

### Non-streaming

In [None]:
from semantic_kernel.contents.chat_history import ChatHistory

chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

response = await chat_completion_service.get_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)

print(response)

### Streaming

In [None]:
from semantic_kernel.contents.chat_history import ChatHistory

chat_history = ChatHistory()
chat_history.add_user_message("Hello, how are you?")

response = chat_completion_service.get_streaming_chat_message_content(
    chat_history=chat_history,
    settings=execution_settings,
)

async for chunk in response:
    print(chunk, end="")
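
The difference between the two modes can be sketched offline with plain `asyncio`, without calling any model. `fake_stream`, `collect`, and `main` are hypothetical stand-ins: `fake_stream` yields chunks from an async generator (the streaming shape), and `collect` waits for every chunk and returns the full text (the non-streaming shape).

```python
import asyncio
from typing import AsyncIterator, List


async def fake_stream(text: str, chunk_size: int = 5) -> AsyncIterator[str]:
    """Yield `text` in small chunks, simulating a streaming response."""
    for start in range(0, len(text), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop between chunks
        yield text[start:start + chunk_size]


async def collect(text: str) -> str:
    """Wait for every chunk and return the whole response at once."""
    chunks: List[str] = []
    async for chunk in fake_stream(text):
        chunks.append(chunk)
    return "".join(chunks)


async def main() -> List[str]:
    reply = "I'm doing well, thanks for asking!"

    # Streaming: handle each chunk as soon as it arrives
    seen: List[str] = []
    async for chunk in fake_stream(reply):
        seen.append(chunk)

    # Non-streaming: a single value once the response is complete
    full = await collect(reply)
    assert "".join(seen) == full
    return seen
```

In a notebook cell you would run this with `await main()`. In a standalone script, use `asyncio.run(main())` instead.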

## Conclusion

In this notebook, we explored how to use the Semantic Kernel orchestrator. We:

1. Set up and configured a chat completion service with the Azure AI Inference connector, authenticated with a GitHub personal access token.
2. Integrated chat completion services into a Semantic Kernel instance.
3. Demonstrated retrieving and using chat completion services.
4. Explored both non-streaming and streaming methods for generating AI responses.

Registering AI services with a Semantic Kernel instance gives you one place to configure them and retrieve them by type or ID, and the same service object supports both non-streaming and streaming calls.

For more information, visit the [Semantic Kernel documentation](https://learn.microsoft.com/en-us/semantic-kernel/).