
.Net: Filters use cases to be supported before making the feature non-experimental #5436

Closed
matthewbolanos opened this issue Mar 12, 2024 · 11 comments
matthewbolanos commented Mar 12, 2024

Tasks

  • Create showcase application which demonstrates all functionality below

To make filters non-experimental, the following user stories should be met:

  • Telemetry – any of the telemetry that is made available via Semantic Kernel logging should be possible to recreate using filters. This would allow a developer to author telemetry with additional correlation IDs and signals that are not available in the out-of-the-box telemetry. To validate this, we should ensure that filters are available in the same spots as existing telemetry and that the filters have access to the same information.
  • Approving function calls – a developer should be able to request approval from a user before performing an action requested by an AI. If the user approves, the function should be invoked. If the user disapproves, the developer should be able to customize the cancellation message sent to the AI so that it understands that the action was rejected. For example, it should be possible to return a message like "The function call to wire $1000 was rejected by the current user" (a sketch of this flow appears after this list).
  • Semantic Caching – with filters, a developer should be able to get a rendered prompt and check if there is a cached result that can be given to a user instead of spending tokens with an LLM. To achieve this, a developer should be able to add a filter after a prompt is rendered to check if there is a cached result. If there is, the dev should be able to cancel the operation and send back the cached result instead. It should also be possible to cache results from an LLM request using the function invoked filter.
  • FrugalGPT – a dev should be able to implement FrugalGPT with filters. With filters, a developer should be able to call a local AI model to determine how complex a request is. Depending on the complexity of the request, the dev should be able to change the execution settings of the prompt so that it can use different models. This might be possible with the AI service selector. If it is, we should rationalize which path should be used when.
  • Catching/throwing errors – a function may result in an error. Instead of sending the default error to the LLM, the developer should be able to do one of three things... 1) customize the error message sent to the LLM, 2) let the error propagate up so that it can be caught elsewhere, 3) do nothing and let the existing error message go to the LLM.
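
A minimal sketch of the approval story above, assuming the `IFunctionInvocationFilter`/`FunctionInvocationContext` shape that filters later stabilized on; the console prompt, the rejection wording, and the exact `FunctionResult` constructor are illustrative assumptions rather than a confirmed SK API surface:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Sketch only: asks the user before a function runs and, on rejection,
// replaces the result so the model understands the call was refused
// rather than silently cancelled.
public sealed class ApprovalFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        Console.Write($"Allow {context.Function.PluginName}.{context.Function.Name}? (y/n): ");
        bool approved = string.Equals(Console.ReadLine()?.Trim(), "y", StringComparison.OrdinalIgnoreCase);

        if (!approved)
        {
            // Customize the message sent back to the AI instead of the default cancellation.
            context.Result = new FunctionResult(context.Result,
                $"The call to {context.Function.Name} was rejected by the current user.");
            return;
        }

        await next(context); // approved: invoke the function as usual
    }
}

// Hypothetical registration:
// kernel.FunctionInvocationFilters.Add(new ApprovalFilter());
```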

Additionally, thought should be given to making filters easier to apply. These are likely extensions, so they should not block making filters non-experimental:

  • Targeting filter – It should be "easy" to apply a filter to a single function or an entire plugin without having to write conditions within the filter. For example, semantic caching may only be necessary (or valid) on a couple of semantic functions. The developer should be able to take an off-the-shelf filter (e.g., written by the Redis Cache team) and selectively apply it to only some prompts (see the targeting sketch after this list).
  • Targeting filters by property – Some filters should only be enabled if a function has a particular property (e.g., if it is a semantic function). One property we may also consider is whether a function is "consequential". With this property, a developer could choose which functions should require approvals.
  • Targeting by invocation – Some filters should only be enabled if the function is invoked via a tool call. For example, it's likely not necessary to request user approval if a function is called within a template or explicitly by the developer. Instead, it's only necessary to run the filter if it's invoked by an AI (where less trust is available).
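
One possible way to get the targeting behavior without baking conditions into every filter is a small wrapper; this is a sketch, not a proposed SK API, and `CachingFilter` plus the function names in the usage comment are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Sketch: applies an off-the-shelf filter only to the listed plugin.function
// names and passes every other invocation straight through.
public sealed class TargetedFilter : IFunctionInvocationFilter
{
    private readonly IFunctionInvocationFilter _inner;
    private readonly HashSet<string> _targets;

    public TargetedFilter(IFunctionInvocationFilter inner, params string[] targets)
    {
        _inner = inner;
        _targets = new HashSet<string>(targets, StringComparer.OrdinalIgnoreCase);
    }

    public Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        string key = $"{context.Function.PluginName}.{context.Function.Name}";
        return _targets.Contains(key)
            ? _inner.OnFunctionInvocationAsync(context, next) // filter applies here
            : next(context);                                  // everything else is untouched
    }
}

// Hypothetical usage: apply a third-party caching filter to two prompts only.
// kernel.FunctionInvocationFilters.Add(
//     new TargetedFilter(new CachingFilter(), "WriterPlugin.ShortPoem", "WriterPlugin.Summarize"));
```
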
@markwallace-microsoft markwallace-microsoft added the .NET (Issue or Pull requests regarding .NET code) and sk team issue (a tag to denote issues that were created by the Semantic Kernel team, i.e., not the community) labels and removed the triage label Mar 12, 2024
@github-actions github-actions bot changed the title Make filters non-experimental .Net: Make filters non-experimental Mar 12, 2024
@markwallace-microsoft markwallace-microsoft changed the title .Net: Make filters non-experimental .Net: Filters use cases to be supported before making the feature non-experimental Mar 12, 2024
lavinir commented Mar 12, 2024

For Approving function calls:

Not sure I completely understand the use case for user vs. developer. In terms of what happens after a 'cancellation', there should be more flexibility. There are two places for filters: pre-function invocation and post-invocation.
If the user decides to cancel, it should also bypass (or at least have the option to bypass) any additional call to the LLM and return the relevant data back to the user:

  1. Updated chat history including any successful function invocations that were not cancelled.
  2. Function response (if this is a post invocation filter)
  3. Perhaps some additional context object supplied by the function filter.

@matthewbolanos , does that make sense ?

@matthewbolanos

I created a second issue to track the need for filters at the automatic function call level here: #5470

matthewbolanos commented May 1, 2024

In terms of prioritization of samples, I would demonstrate filters in the following order:

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled).
  2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., an embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM; instead, the cached answer should be provided. Ideally this sample highlights Redis Cache and/or Cosmos DB, but it should make it easy to swap in another memory connector (see the caching sketch after this list).
  3. Frugal GPT – The developer should be able to make a request to a cheaper model (e.g., GPT-3.5 turbo) to determine how complex a query is. If the model thinks the request is complex, GPT-4 should be used; otherwise, GPT-3.5 should be used.
  4. Long running memory – After a chat completion has been completed, the developer should be able to cache the previous back-and-forth. Later, during a future prompt rendering, the developer should be able to use RAG to retrieve previous conversations that are relevant to the most recent query/topic and inject them into the chat history.
  5. Using a moderation classification model – The developer should be able to use a classification model to determine whether a prompt is acceptable. If it is not, the response should be updated so that the user is given a reason why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279
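
A minimal sketch of the lookup side of item 2, assuming the `IPromptRenderFilter` shape and the behavior that setting a result after rendering short-circuits the LLM call; the in-memory dictionary stands in for Redis/Cosmos DB, and exact-match keys stand in for an embedding lookup. Populating the cache with new answers would live in a function invocation filter, as described in the issue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Sketch: after the prompt is rendered, look it up in a cache; on a hit,
// return the cached answer instead of calling the model.
public sealed class PromptCacheLookupFilter : IPromptRenderFilter
{
    private readonly ConcurrentDictionary<string, string> _cache;

    public PromptCacheLookupFilter(ConcurrentDictionary<string, string> cache) => _cache = cache;

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // renders the prompt; context.RenderedPrompt is now available

        if (context.RenderedPrompt is { } prompt && _cache.TryGetValue(prompt, out var cached))
        {
            // Setting a result here skips the request to the LLM (assumed behavior).
            context.Result = new FunctionResult(context.Function, cached);
        }
    }
}
```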

@dmytrostruk

3. Using moderation classification model – The developer should be able to use a classification model to determine if a prompt is not ok. If a prompt is not "ok" the response should be updated so that the user is provided a reason for why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279

@matthewbolanos I've already added text moderation together with Prompt Shields in the same PR, but it uses the Azure AI Content Safety service instead of the OpenAI moderation endpoint:
https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Demos/ContentSafety/Filters/TextModerationFilter.cs

We can extend this demo app later once the OpenAI text moderation connector is implemented. Let me know what you think.

@matthewbolanos

Lowered the priority of the text classification example per your comment.

@dmytrostruk

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled).

#6109

@dmytrostruk

2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM. Instead, the cached answer should be provided. Ideally this sample highlights Redis cache and/or COSMOS DB, but it should make it easy to swap out another memory connector.

#6151

@dmytrostruk

Example of PII detection: #6171

@dmytrostruk

Example of text summarization and translation evaluation: #6262

@dmytrostruk

Example of FrugalGPT: #6815

github-merge-queue bot pushed a commit that referenced this issue Jun 25, 2024
### Motivation and Context


Related: #5436

The Kernel and connectors have out-of-the-box telemetry that captures key information available during requests.
In most cases this telemetry should be enough to understand how the application behaves.
This example recreates the same telemetry using filters.
This makes it possible to extend the existing telemetry with additional information if needed and to have the same set of logging messages for custom connectors.

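For illustration, a minimal sketch of the kind of filter this refers to (not the actual code in this PR); the interface and context names assume the released filter abstractions, and the log wording is hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

// Sketch: recreates basic function-invocation logging with a filter so extra
// correlation ids or custom signals could be attached if needed.
public sealed class FunctionTelemetryFilter : IFunctionInvocationFilter
{
    private readonly ILogger _logger;

    public FunctionTelemetryFilter(ILogger<FunctionTelemetryFilter> logger) => _logger = logger;

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        string name = $"{context.Function.PluginName}.{context.Function.Name}";
        var stopwatch = Stopwatch.StartNew();
        _logger.LogInformation("Function {Function} invoking.", name);

        try
        {
            await next(context);
            _logger.LogInformation("Function {Function} succeeded after {Ms} ms.", name, stopwatch.ElapsedMilliseconds);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Function {Function} failed after {Ms} ms.", name, stopwatch.ElapsedMilliseconds);
            throw; // preserve the original error for callers and other filters
        }
    }
}
```
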
### Contribution Checklist


- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄