Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net: Add overload ctor to ImageContent that takes an string "url" #4781

Closed
dersia opened this issue Jan 29, 2024 · 10 comments · Fixed by #5008
Closed

.Net: Add overload ctor to ImageContent that takes an string "url" #4781

dersia opened this issue Jan 29, 2024 · 10 comments · Fixed by #5008
Assignees
Labels
.NET Issue or Pull requests regarding .NET code

Comments

@dersia
Copy link
Contributor

dersia commented Jan 29, 2024

When using the Vision models of AzureOpenAI and OpenAI, the API allows for an Image to be attached either as an URL or as a Data URI. In Semantic Kernel we do have ImageContent as a KernelContent-Component, but unfortunately, it takes only an System.Uri as the parameter for the object.

I suggest to following addition to ImageContent:

Proposed API

public sealed class ImageContent : KernelContent
{
    // existing, I think we should just change the type
    // this is a breaking change, although it don't think we should have both
    // but alternatively we could just add another Property 'DataUri' instead of changing the type
    public Uri? Uri { get; set; }
    // replaced by this:
    public string Uri { get; set; }

    // existing, though changed to call the new overload instead of base
    public ImageContent(
        Uri uri,
        string? modelId = null,
        object? innerContent = null,
        Encoding? encoding = null,
        IReadOnlyDictionary<string, object?>? metadata = null)
        : this(uri.ToString(), modelId, innerContent, encoding, metadata)
    {
        this.Uri = uri;
    }

    public ImageContent(
        string dataUri,
        string? modelId = null,
        object? innerContent = null,
        Encoding? encoding = null,
        IReadOnlyDictionary<string, object?>? metadata = null)
        : base(innerContent, modelId, metadata)
    {
        this.Uri = uri;
    }
}

This would allow us to use Data URIs with embedded images for the Vision APIs.

Usage

var chatHistory = new ChatHistory();
var image = System.Convert.ToBase64String(imageBytes);
var imageDataLink = $"data:{imageContentType};base64,{image}";
chatHistory.Add(new ImageContent(imageDataLink));
var result = await chat.GetChatMessageContentAsync(chatHistory);

Alternate Design

This would be a breaking Change, since we change the Type of the Uri property on ImageContent, so instead

  • we could add another property DataUri to ImageContent, but this would also mean that we have to check which of the two properties are used and prefer one other the other, if both are filled.
  • We could also just make a private string field that holds the DataUri and when the Uri ctor is used we just store the Uri as string in that field and only use the field on implementation site.
  • We could also make ImageContent take a ROS, byte[] or even just a Stream of the image and handle all the Base64Encoding in the ImageContent ctor (I think I would prefer a static Create method over ctor for this option, sonce we can than read the stream async), but this would mean, that we also need a second parameter with the image-ContentType and this would not allow for non base64 encoded data uris

I am happy to work on the implementation, since I have to do it anyway, because we need the solution fairly soon.

@shawncal shawncal added .NET Issue or Pull requests regarding .NET code triage labels Jan 29, 2024
@github-actions github-actions bot changed the title Add overload ctor to ImageContent that takes an string "url" .Net: Add overload ctor to ImageContent that takes an string "url" Jan 29, 2024
@hez2010
Copy link

hez2010 commented Jan 31, 2024

Another way is to remove the length limitation of Uri in .NET: dotnet/runtime#96544

data:image/jpeg;base64,... is a valid uri and can be parsed by System.Uri without issue, but the .NET implementation of Uri has a length limitation (65520 bytes) so it can only process small images.

@dersia
Copy link
Contributor Author

dersia commented Jan 31, 2024

Another way is to remove the length limitation of Uri in .NET: dotnet/runtime#96544

data:image/jpeg;base64,... is a valid uri and can be parsed by System.Uri without issue, but the .NET implementation of Uri has a length limitation (65520 bytes) so it can only process small images.

this was my initial thought, but I am not sure how this would work with validation that is done today in System.Uri and if the validation needs to be changed, than this might be a breaking change.
if dotnet/runtime#96544 is a valid option, then I would also suggest to extend the UriKind enum to also include Data.

@Krzysztof318
Copy link
Contributor

I think this is bad idea to introduce this breaking change in abstraction layer.
So better would be create additional property.

@dersia
Copy link
Contributor Author

dersia commented Feb 1, 2024

I think this is bad idea to introduce this breaking change in abstraction layer. So better would be create additional property.

I would also prefer one of the alternative designs, when I think more about it. I still think it is a too bad, that Uri is a set-able public property :/

@dmytrostruk
Copy link
Member

@dersia This is exactly the reason why ImageContent.Uri is nullable at the moment - to avoid breaking changes for this class in the future. In OpenAI API, there is image_url.url object and it's not clear what data it accepts now and what possibly can be accepted in the future:
image

But from Semantic Kernel point of view, we want developers to use Uri in case if it's URL to image located on some server, and we want new properties like string? Base64 to pass image in base64 format or string Path to provide path to local image file (we should not add this, just as an example).

So, answering your question - we don't want to introduce breaking changes and new property that keeps image as base64 should work. Hope that makes sense, thank you!

@dersia
Copy link
Contributor Author

dersia commented Feb 6, 2024

@dersia This is exactly the reason why ImageContent.Uri is nullable at the moment - to avoid breaking changes for this class in the future. In OpenAI API, there is image_url.url object and it's not clear what data it accepts now and what possibly can be accepted in the future: image

But from Semantic Kernel point of view, we want developers to use Uri in case if it's URL to image located on some server, and we want new properties like string? Base64 to pass image in base64 format or string Path to provide path to local image file (we should not add this, just as an example).

So, answering your question - we don't want to introduce breaking changes and new property that keeps image as base64 should work. Hope that makes sense, thank you!

@dmytrostruk sounds good to me. I have started on the implementation (as I said, we need this sooner rather than later) and in this implementation I have gone down the non breaking path. I also added overloads for byte[] and ROS<byte>. I am just adding new tests for this and then I could create a pr if you'd like.

also I was wondering if there would be a need for a stream version that would use a crypto stream to base64 encode on the fly while writing to the stream. this would be interesting for larger image files, but I don't think this is a concern as of now, right?

@dmytrostruk
Copy link
Member

I also added overloads for byte[] and ROS<byte>

Did you consider using System.Memory.Data.BinaryData? It supports ReadOnlyMemory<byte>, Stream, string and byte[].

I am just adding new tests for this and then I could create a pr if you'd like.

That would be awesome, thank you!

also I was wondering if there would be a need for a stream version that would use a crypto stream to base64 encode on the fly while writing to the stream. this would be interesting for larger image files, but I don't think this is a concern as of now, right?

Yes, I think it's not a concern, at least for now.

@dersia
Copy link
Contributor Author

dersia commented Feb 7, 2024

I also added overloads for byte[] and ROS<byte>

Did you consider using System.Memory.Data.BinaryData? It supports ReadOnlyMemory<byte>, Stream, string and byte[].

I am just adding new tests for this and then I could create a pr if you'd like.

That would be awesome, thank you!

also I was wondering if there would be a need for a stream version that would use a crypto stream to base64 encode on the fly while writing to the stream. this would be interesting for larger image files, but I don't think this is a concern as of now, right?

Yes, I think it's not a concern, at least for now.

System.Memory.Data.BinaryData is a great idea! I didn't think of it. unfortunately the Stream ctor for BinaryData still isn't good enough for large files, but since that isn't a concern for now, I will just implement it with BinaryData .

I will also adjust the tests for to work with BinaryData.

dersia added a commit to dersia/semantic-kernel that referenced this issue Feb 7, 2024
dersia added a commit to dersia/semantic-kernel that referenced this issue Feb 7, 2024
dersia added a commit to dersia/semantic-kernel that referenced this issue Feb 7, 2024
@dersia
Copy link
Contributor Author

dersia commented Feb 7, 2024

@dmytrostruk please find my PR #4919

dersia added a commit to dersia/semantic-kernel that referenced this issue Feb 14, 2024
dersia added a commit to dersia/semantic-kernel that referenced this issue Feb 15, 2024
github-merge-queue bot pushed a commit that referenced this issue Feb 20, 2024
Reopend PR. See #4919
for previous PR.
Reopening this PR from another feature branch makes rebasing easier.

### Motivation and Context

As Described in #4781 right now there is no possibility in SK to add
Images as DataUris to ChatCompletion APIs, although the Azure OpenAI API
and the Open AI API both support this.

Fixes #4781

### Description

As per Discussion added overload to the ImageContent ctor that takes
BinaryData.
For backward Compat we kept the ctor that takes an URI.

Also the new ctor throws, if the BinaryData is null, empty or if there
is not MediaType provided.

I thought about allowing plain, non base64 encoded DataUris with
BinaryData.
The Idea was to not encode to base64, if the MediaType is set to
"text/plain", but then I decided, that this is not needed, since `Uri`
in general allows for DataUris like `new
Uri("data:text/plain;http://exmpaledomain.com")` just not for DataUris
that are longer than 65520 bytes. I feel like that is ok, for plain
DataUris.

We can still add this if needed.

Also as per discussion in the issue, I did not add additional overloads
for direct Streams support.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>
@dersia
Copy link
Contributor Author

dersia commented Feb 20, 2024

Thanks for merging this @RogerBarreto, however I thought we would wait for Azure/azure-rest-api-specs#27780 to be done so we can fully integrate this. Because as of now I does not do anything :/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.NET Issue or Pull requests regarding .NET code
Projects
Archived in project
6 participants