
.Net: HuggingFace implement also IChatCompletionService? #5403

Closed
jkone27 opened this issue Mar 10, 2024 · 14 comments · Fixed by #5785
Assignees
Labels
.NET Issue or Pull requests regarding .NET code

Comments

@jkone27

jkone27 commented Mar 10, 2024

I am testing the Hugging Face preview package for .NET, but I cannot make use of IChatCompletionService.

The kernel fails to resolve it. I am not using OpenAI, only the Hugging Face APIs.

#r "nuget: Microsoft.SemanticKernel"
#r "nuget: Microsoft.Extensions.Logging.Debug"
#r "nuget: Microsoft.SemanticKernel.Abstractions"
#r "nuget: Microsoft.SemanticKernel.Connectors.HuggingFace, 1.5.0-preview"
#r "nuget: Microsoft.Extensions.DependencyInjection"

open Microsoft.Extensions.Logging
//open Microsoft.SemanticKernel.Plugins.Core
open Microsoft.SemanticKernel
open Microsoft.SemanticKernel.Connectors.HuggingFace
open System
open System.ComponentModel
open Microsoft.Extensions.DependencyInjection
open Microsoft.Extensions.Logging.Abstractions
open Microsoft.SemanticKernel.ChatCompletion
open Microsoft.SemanticKernel.Connectors.OpenAI
open System.Threading.Tasks

// type Test() =
//     interface IChatCompletionService with
//         member this.GetChatMessageContentsAsync(chatHistory, executionSettings, kernel, cancellationToken) =
//             task { return [ ChatMessageContent() ] :> IReadOnlyList<ChatMessageContent> }


let builder =
    let apikey = "hf_..." // use your own Hugging Face API token; never commit real keys
    let model = "openchat/openchat-3.5-0106"

    Kernel
        .CreateBuilder()
        .AddHuggingFaceTextGeneration(model = model, apiKey = apikey)
        // .AddOpenAIChatCompletion(model) // not using OpenAI; tried adding this, but it also fails to load

type LightPlugin() =
    let mutable isOn = false

    [<KernelFunction>]
    [<Description("Gets the state of the light.")>]
    member this.GetState() = if isOn then "on" else "off"

    [<KernelFunction>]
    [<Description("Changes the state of the light.")>]
    member this.ChangeState(newState: bool) =
        isOn <- newState
        let state = this.GetState()
        printfn $"[Light is now {state}]"
        state

// load a plugin from code
builder.Services.AddLogging(fun c -> c.AddDebug().SetMinimumLevel(LogLevel.Trace) |> ignore)
builder.Plugins.AddFromType<LightPlugin>() |> ignore
let kernel = builder.Build()

// or read the input from fsi.CommandLineArgs[1], for example
printfn "prompt to summarize:"
let request = Console.ReadLine()

let testPrompt =
    $"make a 1 liner quick and short summary of max 20 words of this request: `{request}`"

// 1. test "raw" prompts to verify your connection
let result = kernel.InvokePromptAsync(testPrompt).Result

printfn $"> {result}"

// 2. test SK prompts

let history = new ChatHistory()

let chatCompletionService = kernel.GetRequiredService<IChatCompletionService>()


let switchTheLight (cmd: string) = $"switch the light {cmd}"

switchTheLight "ON" |> history.AddUserMessage

// Enable auto function calling
let openAIPromptExecutionSettings = new OpenAIPromptExecutionSettings()
openAIPromptExecutionSettings.ToolCallBehavior <- ToolCallBehavior.AutoInvokeKernelFunctions

// Get the response from the AI
let result2 =
    chatCompletionService
        .GetChatMessageContentAsync(history, openAIPromptExecutionSettings, kernel)
        .Result

printfn $">> {result2.Content}"
history.AddMessage(result2.Role, result2.Content) |> ignore
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code triage labels Mar 10, 2024
@github-actions github-actions bot changed the title HuggingFace implement also IChatCompletionService? .Net: HuggingFace implement also IChatCompletionService? Mar 10, 2024
@Krzysztof318
Contributor

Krzysztof318 commented Mar 10, 2024

I suggest implementing your own IChatCompletionService for your specific model. You can reuse the existing HuggingFaceTextGenerationService, just with a specially formatted prompt.

The Hugging Face Inference API doesn't have a common syntax for chat, so the chat payload can vary depending on the model you use.

@jkone27
Author

jkone27 commented Mar 10, 2024

Can you provide a quick sample of how to implement a very basic one? I was thinking the same thing.

@Krzysztof318
Contributor

Look at this sample: https://github.com/microsoft/semantic-kernel/blob/main/dotnet%2Fsamples%2FKernelSyntaxExamples%2FExample16_CustomLLM.cs

A chat completion service is similar, but it takes a ChatHistory parameter instead of a string prompt.

You have to build a prompt from the chat history, something like this:

User: hello
Assistant: hi, what can I do?
User: help me with...
Assistant: <left empty so the model can complete this turn>

But keep in mind this pattern may not work with every model.
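
The flattening step described above can be sketched as a small helper. This is a minimal illustration using plain (role, content) tuples rather than SK's ChatHistory type, so the snippet stays self-contained; a custom IChatCompletionService could build such a prompt and pass it to HuggingFaceTextGenerationService.

```fsharp
// Hypothetical helper: flattens a chat transcript into a single prompt
// string, leaving a trailing "Assistant:" turn for the model to complete.
let toChatPrompt (history: (string * string) list) =
    let turns =
        history |> List.map (fun (role, content) -> $"{role}: {content}")
    String.concat "\n" (turns @ [ "Assistant:" ])

let prompt =
    toChatPrompt
        [ "User", "hello"
          "Assistant", "hi, what can I do?"
          "User", "help me with..." ]
```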

@markwallace-microsoft markwallace-microsoft added question Further information is requested and removed triage labels Mar 12, 2024
@markwallace-microsoft markwallace-microsoft removed the question Further information is requested label Mar 12, 2024
@markwallace-microsoft
Member

@jkone27 if you do want to make a contribution to Semantic Kernel, @RogerBarreto can help by setting up a feature branch and getting your changes reviewed.

@Krzysztof318
Contributor

My oversight: I see now that Hugging Face supports chat via the Inference API: https://huggingface.co/docs/api-inference/detailed_parameters?code=python#conversational-task

@JonathanVelkeneers

I was looking for this functionality as well. Is it possible to implement it in the connector, seeing as it has been added to Hugging Face's API recently?

https://huggingface.co/docs/text-generation-inference/messages_api#hugging-face-inference-endpoints
https://huggingface.co/blog/tgi-messages-api

@jkone27
Author

jkone27 commented Mar 20, 2024

@markwallace-microsoft @Krzysztof318 ⬆️

@Krzysztof318
Contributor

@jkone27 hi, unfortunately I don't have time to implement that now. But why don't you contribute and create a PR for it? Implementing this should be quite simple.

@RogerBarreto
Member

> I was looking for this functionality as well. Is it possible to implement it in the connector, seeing as it has been added to Hugging Face's API recently?

It is; although not supported by the public API, this seems to be valid for TGI deployments.

Hugging Face - Chat Completion POC.


@JonathanVelkeneers

JonathanVelkeneers commented Mar 29, 2024

> I was looking for this functionality as well. Is it possible to implement it in the connector, seeing as it has been added to Hugging Face's API recently?
>
> It is; although not supported by the public API, this seems to be valid for TGI deployments.
>
> Hugging Face - Chat Completion POC.

@RogerBarreto
It is supported by the public API, albeit poorly documented. Here is an example of how a Hugging Face model can be used with an existing OpenAI client library, and thus in chat mode.

This in turn can be translated to a cURL request.

A requirement for this to work with a specific model is that it has the chat_template property in its tokenizer_config.json file example. So not all models will work out of the box with the public Inference API, but most popular ones have this implemented.
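
A rough way to check this programmatically, assuming the hub's raw-file URL layout (huggingface.co/&lt;model&gt;/raw/main/tokenizer_config.json), is a quick string search over the config; this is a sketch, not an official API:

```fsharp
open System.Net.Http

// Sketch: fetch a model's tokenizer_config.json from the hub and check
// whether it declares a chat_template.
let hasChatTemplate (model: string) =
    use http = new HttpClient()
    let url = $"https://huggingface.co/{model}/raw/main/tokenizer_config.json"
    let json = http.GetStringAsync(url).Result
    json.Contains "\"chat_template\""
```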

Making a cURL request to this model on the public API works in OpenAI chat style:

    curl https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions \
    -X POST \
    -d '{"model":"NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO", "messages": [{"role":"user","content":"How is the weather in Antwerp, Belgium?"}], "parameters": {"temperature": 0.7, "max_new_tokens": 100}}' \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer XXXXXXXXXXXXXXX"

This yields the following result:

    {
      "id": "",
      "object": "text_completion",
      "created": 1711714682,
      "model": "text-generation-inference/Nous-Hermes-2-Mixtral-8x7B-DPO-medusa",
      "system_fingerprint": "1.4.3-sha-e6bb3ff",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "As weather data changes constantly, the most accurate and up-to-date weather information for Antwerp, Belgium, can be found through weather websites or apps. These sources provide real-time and forecasted weather updates, including temperature, wind speed, humidity, and chance of precipitation. To get the current data, visit websites like Weather.com or AccuWeather.com and search for 'Antwerp, Belgium'. Or, you can use a weather app like AccuWe"
          },
          "logprobs": null,
          "finish_reason": "length"
        }
      ],
      "usage": { "prompt_tokens": 20, "completion_tokens": 100, "total_tokens": 120 }
    }

Streaming is also supported when "stream": true is present in the request body.
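
For .NET readers, the same request as the cURL call above can be sketched with HttpClient from an F# script; the "hf_..." token is a placeholder for your own Hugging Face API token:

```fsharp
open System.Net.Http
open System.Net.Http.Headers
open System.Text

let model = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO"
let url =
    $"https://api-inference.huggingface.co/models/{model}/v1/chat/completions"

// Same payload as the cURL example above.
let payload =
    """{"model": "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
        "messages": [{"role": "user", "content": "How is the weather in Antwerp, Belgium?"}],
        "parameters": {"temperature": 0.7, "max_new_tokens": 100}}"""

let http = new HttpClient()
http.DefaultRequestHeaders.Authorization <-
    AuthenticationHeaderValue("Bearer", "hf_...")

let response =
    http.PostAsync(url, new StringContent(payload, Encoding.UTF8, "application/json")).Result

printfn $"{response.Content.ReadAsStringAsync().Result}"
```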

@jkone27
Author

jkone27 commented Apr 3, 2024


Probably all the models supported in HuggingChat should support this as well? Just a thought.

github-merge-queue bot pushed a commit that referenced this issue Apr 16, 2024
### Motivation and Context

Closes #5403 

1. Adds chat completion support for TGI (Text Generation Inference) deployments.
2. Adds missing unit tests for streaming and non-streaming scenarios (text/chat completion).
3. Updates metadata and usage details for the Hugging Face clients.

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
@jkone27
Author

jkone27 commented Apr 17, 2024

thank you @RogerBarreto

@jkone27
Author

jkone27 commented Apr 17, 2024

Is there an example of this somewhere in the tests or docs, using any Hugging Face chat API?

@JonathanVelkeneers

@jkone27

There is an example here.
In theory it's just a matter of replacing the localhost URL with a Hugging Face Inference API URL. For example, either
https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO/v1/chat/completions
or https://api-inference.huggingface.co/models/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
should work.

However, these changes have not been released in a new version yet (the last release was 2 weeks ago).
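
Once a release containing these changes ships, usage might look like the sketch below. Note this is speculative: the AddHuggingFaceChatCompletion method name and its endpoint parameter are assumptions based on the PR, not a released API.

```fsharp
open System
open Microsoft.SemanticKernel
open Microsoft.SemanticKernel.ChatCompletion

// Sketch only: method name and parameters are assumed from the PR.
let kernel =
    Kernel
        .CreateBuilder()
        .AddHuggingFaceChatCompletion(
            model = "NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO",
            endpoint = Uri "https://api-inference.huggingface.co",
            apiKey = "hf_...")
        .Build()

let chatService = kernel.GetRequiredService<IChatCompletionService>()
```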
