# Generating images with AI

This notebook demonstrates how to use OpenAI DALL-E 3 to generate images, in combination with other LLM features like text and embedding generation.

Here, we use Chat Completion to generate a random image description and DALL-E 3 to create an image from that description, showing the image inline.

Lastly, the notebook asks the user to describe the image. The embedding of the user's description is compared with the embedding of the original description using cosine similarity. The resulting score is, in practice, between 0 and 1, where 1 means an exact match.
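
For reference, the standard definition of cosine similarity can be written as follows. The symbols $\mathbf{a}$ and $\mathbf{b}$ are notation introduced here for the two embedding vectors (original description and user guess) and $n$ for their dimension; they do not appear in the notebook code.

$$
\operatorname{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}
$$

In general this value lies in $[-1, 1]$. Text embeddings of this kind tend to yield positive values, which is consistent with the 0-to-1 range quoted above.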

In [13]:
// Usual setup: importing Semantic Kernel SDK and SkiaSharp, used to display images inline.

#r "nuget: Microsoft.SemanticKernel, 1.0.1"
#r "nuget: System.Numerics.Tensors, 8.0.0"
#r "nuget: SkiaSharp, 2.88.3"
#r "nuget: SkiaSharp.NativeAssets.Linux.NoDependencies, 2.88.3"

#!import ../setup/config/Settings.cs
#!import ../setup/config/Utils.cs
#!import ../setup/config/SkiaUtils.cs

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextToImage;
using Microsoft.SemanticKernel.Embeddings;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;
using System.Numerics.Tensors;

# Setup, using three AI services: images, text, embedding

The notebook uses:

* **OpenAI DALL-E 3** to transform the image description into an image
* a **chat completion** model (`gpt-4-vision` in the code below) to generate a random image description
* **text-embedding-ada-002** to compare your guess against the real image description

**Note:** For Azure OpenAI, your endpoint should have the DALL-E API enabled.

In [9]:
using Kernel = Microsoft.SemanticKernel.Kernel;

#pragma warning disable SKEXP0001, SKEXP0002, SKEXP0011, SKEXP0012

// Load OpenAI credentials from config/settings.json
var (useAzureOpenAI, model, azureEndpoint, apiKey, orgId) = Settings.LoadFromFile("../setup/config/settings.json");

// Configure the three AI features: text embedding (using Ada), chat completion, image generation (DALL-E 3)
var builder = Kernel.CreateBuilder();

if(useAzureOpenAI)
{
    builder.AddAzureOpenAITextEmbeddingGeneration("text-embedding-ada-002", azureEndpoint, apiKey);
    builder.AddAzureOpenAIChatCompletion("gpt-4-vision", azureEndpoint, apiKey);
    builder.AddAzureOpenAITextToImage("Dalle3", azureEndpoint, apiKey);
}
else
{
    builder.AddOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey, orgId);
    builder.AddOpenAIChatCompletion("gpt-4-vision", apiKey, orgId);
    builder.AddOpenAITextToImage(apiKey, orgId);
}

var kernel = builder.Build();

// Get AI service instance used to generate images
var dallE = kernel.GetRequiredService<ITextToImageService>();

// Get AI service instance used to extract embedding from a text
var textEmbedding = kernel.GetRequiredService<ITextEmbeddingGenerationService>();

# Generate a (random) image with DALL-E 3

**genImgDescription** is a semantic function that generates a random image description.
It takes a random number as input to increase the diversity of its output.

The random image description is then passed to **DALL-E 3**, which creates an image from it.

In [10]:
#pragma warning disable SKEXP0002

var prompt = @"
Think about an artificial object correlated to number {{$input}}.
Describe the image with one detailed sentence. The description cannot contain numbers.";

var executionSettings = new OpenAIPromptExecutionSettings
{
    MaxTokens = 256,
    Temperature = 1
};

// Create a semantic function that generates a random image description.
var genImgDescription = kernel.CreateFunctionFromPrompt(prompt, executionSettings);

var random = new Random().Next(0, 200);
var imageDescriptionResult = await kernel.InvokeAsync(genImgDescription, new() { ["input"] = random });
var imageDescription = imageDescriptionResult.ToString();

// Use DALL-E 3 to generate an image. In this case OpenAI returns a URL (though you can request a base64-encoded image instead)
var imageUrl = await dallE.GenerateImageAsync(imageDescription.Trim(), 1024, 1024);

await SkiaUtils.ShowImage(imageUrl, 1024, 1024);

# Let's play a guessing game

Try to guess what the image is about, describing the content.

You'll get a score at the end 😉
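
To make the score easier to interpret, here is a minimal sketch of the cosine similarity calculation on toy vectors. It is only an illustration: `CosineSimilarity` below is a hypothetical hand-written helper, not the `TensorPrimitives.CosineSimilarity` call the notebook uses, and the three-element arrays are made-up stand-ins for real embeddings, which are much longer.

```csharp
using System;

// Hypothetical helper for illustration: dot product divided by the product of the vector lengths.
float CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length || a.Length == 0)
    {
        throw new ArgumentException("Vectors must be non-empty and of equal length.");
    }

    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Made-up toy "embeddings", not output from any model.
float[] original = { 1f, 2f, 3f };
float[] sameDirection = { 2f, 4f, 6f };   // original scaled by 2
float[] orthogonal = { 3f, 0f, -1f };     // dot product with original is 0

// Same direction: the score is approximately 1, regardless of vector length.
Console.WriteLine($"Same direction: {CosineSimilarity(original, sameDirection):0.00}");

// Orthogonal vectors: the score is 0.
Console.WriteLine($"Orthogonal: {CosineSimilarity(original, orthogonal):0.00}");
```

Because only the direction of the vectors matters, two descriptions can score high even when they differ in length or wording, as long as their embeddings point the same way.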

In [11]:
// Prompt the user to guess what the image is
var guess = await InteractiveKernel.GetInputAsync("Describe the image in your words");

// Compare user guess with real description and calculate score
var origEmbedding = await textEmbedding.GenerateEmbeddingsAsync(new List<string> { imageDescription } );
var guessEmbedding = await textEmbedding.GenerateEmbeddingsAsync(new List<string> { guess } );
var similarity = TensorPrimitives.CosineSimilarity(origEmbedding.First().Span, guessEmbedding.First().Span);

Console.WriteLine($"Your description:\n{Utils.WordWrap(guess, 90)}\n");
Console.WriteLine($"Real description:\n{Utils.WordWrap(imageDescription.Trim(), 90)}\n");
Console.WriteLine($"Score: {similarity:0.00}\n\n");

// Uncomment the next line to see the URL provided by OpenAI
// Console.WriteLine(imageUrl);

Your description:
73


Real description:
The image depicts a brightly lit scoreboard displaying the home team's score next to a
bold, illuminated "SEVENTY-THREE" with the away team's score trailing behind.


Score: 0.82




In [34]:
const string ImageUri = "https://media.licdn.com/dms/image/D4D1FAQFTEXiTqfELaw/feedshare-document-images_1280/2/1706096961111?e=1707350400&v=beta&t=jNxHoHCBw4KFwbIurcqIdRcKKQ9P--l_UEDEn6jinbI";

var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

var chatHistory = new ChatHistory("You are trained to interpret images about people and make responsible assumptions about them.");

chatHistory.AddUserMessage(new ChatMessageContentItemCollection
{
    new TextContent("Give me a json with the names, roles and company of the people in the image"),
    new ImageContent(new Uri(ImageUri))
});

var executionSettings = new OpenAIPromptExecutionSettings
{
    MaxTokens = 256,
    Temperature = 0.2
};

var reply = await chatCompletionService.GetChatMessageContentAsync(chatHistory, executionSettings);

Console.WriteLine(reply.Content);

```json
{
  "people": [
    {
      "name": "José Camacho",
      "role": "Tech Lead",
      "company": "Xpand IT"
    },
    {
      "name": "Jorge Borralho",
      "role": "Tech Lead",
      "company": "Xpand IT"
    },
    {
      "name": "Stefan Müller",
      "role": "Germany General Manager",
      "company": "Xpand IT"
    },
    {
      "name": "Horst Urlberger",
      "role": "Business Lead GTM",
      "company": "Microsoft Germany"
    }
  ]
}
```
