
.Net: Filters use cases to be supported before making the feature non-experimental #5436

Closed
matthewbolanos opened this issue Mar 12, 2024 · 11 comments
matthewbolanos commented Mar 12, 2024

Tasks

  • Create showcase application which demonstrates all functionality below

To make filters non-experimental, the following user stories should be met:

  • Telemetry – any of the telemetry that is made available via Semantic Kernel logging should be possible to recreate using filters. This would allow a developer to author telemetry with additional correlation IDs and signals that are not available in the out-of-the-box telemetry. To validate this, we should ensure that filters are available in the same spots as existing telemetry and that the filters have access to the same information.
  • Approving function calls – a developer should be able to request approval from a user before performing an action requested by an AI. If the user approves, the function should be invoked. If the user disapproves, the developer should be able to customize the cancellation message sent to the AI so that it understands that the action was rejected. For example, it should be possible to return a message like "The function call to wire $1000 was rejected by the current user" (a sketch of this flow appears after this list).
  • Semantic Caching – with filters, a developer should be able to get a rendered prompt and check if there is a cached result that can be given to a user instead of spending tokens with an LLM. To achieve this, a developer should be able to add a filter after a prompt is rendered to check if there is a cached result. If there is, the dev should be able to cancel the operation and send back the cached result instead. It should also be possible to cache results from an LLM request using the function invoked filter.
  • FrugalGPT – a dev should be able to implement FrugalGPT with filters. With filters, a developer should be able to call a local AI model to determine how complex a request is. Depending on the complexity of the request, the dev should be able to change the execution settings of the prompt so that it can use different models. This might be possible with the AI service selector. If it is, we should rationalize which path should be used when.
  • Catching/throwing errors – a function may result in an error. Instead of sending the default error to the LLM, the developer should be able to do one of three things... 1) customize the error message sent to the LLM, 2) let the error propagate up so that it can be caught elsewhere, 3) do nothing and let the existing error message go to the LLM.
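
A minimal sketch of the approval story above, assuming the `IFunctionInvocationFilter`/`FunctionInvocationContext` shape that filters later stabilized on; the console prompt, the rejection wording, and the exact `FunctionResult` constructor are illustrative assumptions rather than a confirmed SK API surface:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Sketch only: asks the user before a function runs and, on rejection,
// replaces the result so the model understands the call was refused
// rather than silently cancelled.
public sealed class ApprovalFilter : IFunctionInvocationFilter
{
    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        Console.Write($"Allow {context.Function.PluginName}.{context.Function.Name}? (y/n): ");
        bool approved = string.Equals(Console.ReadLine()?.Trim(), "y", StringComparison.OrdinalIgnoreCase);

        if (!approved)
        {
            // Customize the message sent back to the AI instead of the default cancellation.
            context.Result = new FunctionResult(context.Result,
                $"The call to {context.Function.Name} was rejected by the current user.");
            return;
        }

        await next(context); // approved: invoke the function as usual
    }
}

// Hypothetical registration:
// kernel.FunctionInvocationFilters.Add(new ApprovalFilter());
```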

Additionally, thought should be given to making filters easier to apply. These are likely extensions, so they should not block making filters non-experimental:

  • Targeting filter – It should be "easy" to apply a filter to a single function or an entire plugin without having to write conditions within the filter. For example, semantic caching may only be necessary (or valid) on a couple of semantic functions. The developer should be able to take an off-the-shelf filter (e.g., written by the Redis Cache team) and selectively apply it to only some prompts (see the targeting sketch after this list).
  • Targeting filters by property – Some filters should only be enabled if a function has a particular property (e.g., if it is a semantic function). One property we may also consider is whether a function is "consequential". With this property, a developer could choose which functions should require approvals.
  • Targeting by invocation – Some filters should only be enabled if the function is invoked via a tool call. For example, it's likely not necessary to request user approval if a function is called within a template or explicitly by the developer. Instead, it's only necessary to run the filter if it's invoked by an AI (where less trust is available).
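
One possible way to get the targeting behavior without baking conditions into every filter is a small wrapper; this is a sketch, not a proposed SK API, and `CachingFilter` plus the function names in the usage comment are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Sketch: applies an off-the-shelf filter only to the listed plugin.function
// names and passes every other invocation straight through.
public sealed class TargetedFilter : IFunctionInvocationFilter
{
    private readonly IFunctionInvocationFilter _inner;
    private readonly HashSet<string> _targets;

    public TargetedFilter(IFunctionInvocationFilter inner, params string[] targets)
    {
        _inner = inner;
        _targets = new HashSet<string>(targets, StringComparer.OrdinalIgnoreCase);
    }

    public Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        string key = $"{context.Function.PluginName}.{context.Function.Name}";
        return _targets.Contains(key)
            ? _inner.OnFunctionInvocationAsync(context, next) // filter applies here
            : next(context);                                  // everything else is untouched
    }
}

// Hypothetical usage: apply a third-party caching filter to two prompts only.
// kernel.FunctionInvocationFilters.Add(
//     new TargetedFilter(new CachingFilter(), "WriterPlugin.ShortPoem", "WriterPlugin.Summarize"));
```
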
@markwallace-microsoft markwallace-microsoft added the .NET (Issue or Pull requests regarding .NET code) and sk team issue (a tag to denote issues that were created by the Semantic Kernel team, i.e., not the community) labels and removed the triage label Mar 12, 2024
@github-actions github-actions bot changed the title Make filters non-experimental .Net: Make filters non-experimental Mar 12, 2024
@markwallace-microsoft markwallace-microsoft changed the title .Net: Make filters non-experimental .Net: Filters use cases to be supported before making the feature non-experimental Mar 12, 2024
lavinir commented Mar 12, 2024

For Approving function calls:

Not sure I completely understand the use case for user vs. developer. In terms of what happens after a 'cancellation', there should be more flexibility. There are two places for filters: pre-function invocation and post-invocation.
If the user decides to cancel, it should also bypass (or at least have the option to bypass) any additional call to the LLM and return the relevant data back to the user:

  1. Updated chat history including any successful function invocations that were not cancelled.
  2. Function response (if this is a post invocation filter)
  3. Perhaps some additional context object supplied by the function filter.

@matthewbolanos , does that make sense ?

@matthewbolanos

I created a second issue to track the need for filters at the automatic function call level here: #5470

matthewbolanos commented May 1, 2024

In terms of prioritization of samples, I would demonstrate filters in the following order:

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled).
  2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., an embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM; instead, the cached answer should be provided. Ideally this sample highlights Redis Cache and/or Cosmos DB, but it should make it easy to swap in another memory connector (see the caching sketch after this list).
  3. Frugal GPT – The developer should be able to make a request to a cheaper model (e.g., GPT-3.5 turbo) to determine how complex a query is. If the model thinks the request is complex, GPT-4 should be used; otherwise, GPT-3.5 should be used.
  4. Long running memory – After a chat completion has been completed, the developer should be able to cache the previous back-and-forth. Later, during a future prompt rendering, the developer should be able to use RAG to retrieve previous conversations that are relevant to the most recent query/topic and inject them into the chat history.
  5. Using a moderation classification model – The developer should be able to use a classification model to determine whether a prompt is acceptable. If it is not, the response should be updated so that the user is given a reason why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279
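
A minimal sketch of the lookup side of item 2, assuming the `IPromptRenderFilter` shape and the behavior that setting a result after rendering short-circuits the LLM call; the in-memory dictionary stands in for Redis/Cosmos DB, and exact-match keys stand in for an embedding lookup. Populating the cache with new answers would live in a function invocation filter, as described in the issue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Sketch: after the prompt is rendered, look it up in a cache; on a hit,
// return the cached answer instead of calling the model.
public sealed class PromptCacheLookupFilter : IPromptRenderFilter
{
    private readonly ConcurrentDictionary<string, string> _cache;

    public PromptCacheLookupFilter(ConcurrentDictionary<string, string> cache) => _cache = cache;

    public async Task OnPromptRenderAsync(
        PromptRenderContext context,
        Func<PromptRenderContext, Task> next)
    {
        await next(context); // renders the prompt; context.RenderedPrompt is now available

        if (context.RenderedPrompt is { } prompt && _cache.TryGetValue(prompt, out var cached))
        {
            // Setting a result here skips the request to the LLM (assumed behavior).
            context.Result = new FunctionResult(context.Function, cached);
        }
    }
}
```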

@dmytrostruk

3. Using moderation classification model – The developer should be able to use a classification model to determine if a prompt is not ok. If a prompt is not "ok" the response should be updated so that the user is provided a reason for why their request was not processed or why the LLM's response was inappropriate. This may require Text classification ADR #5279

@matthewbolanos I've already added text moderation together with Prompt Shields in the same PR, but it uses the Azure AI Content Safety service instead of the OpenAI moderation endpoint:
https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Demos/ContentSafety/Filters/TextModerationFilter.cs

We can extend this demo app later once the OpenAI text moderation connector is implemented. Let me know what you think.

@matthewbolanos

Lowered the priority of the text classification example per your comment.

@dmytrostruk

  1. Approving functions before they're run – if a function is "consequential", it should require the user to first approve the action before it happens. If the function is rejected, the result should be modified so that the LLM knows that the function was rejected (instead of just cancelled).

#6109

@dmytrostruk

2. Semantic caching – after a prompt has been invoked, a developer should be able to cache the response by using the original question as the key (i.e., embedding). During subsequent prompt renders, a check should be performed to see if a question has already been answered. If so, no request should be sent to the LLM. Instead, the cached answer should be provided. Ideally this sample highlights Redis cache and/or COSMOS DB, but it should make it easy to swap out another memory connector.

#6151

@dmytrostruk

Example of PII detection: #6171

@dmytrostruk

Example of text summarization and translation evaluation: #6262

@dmytrostruk

Example of FrugalGPT: #6815

github-merge-queue bot pushed a commit that referenced this issue Jun 25, 2024
### Motivation and Context


Related: #5436

The Kernel and connectors have out-of-the-box telemetry that captures key information available during requests.
In most cases this telemetry should be enough to understand how the application behaves.
This example recreates the same telemetry using filters.
This makes it possible to extend the existing telemetry with additional information if needed and to have the same set of logging messages for custom connectors.

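For illustration, a minimal sketch of the kind of filter this refers to (not the actual code in this PR); the interface and context names assume the released filter abstractions, and the log wording is hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;

// Sketch: recreates basic function-invocation logging with a filter so extra
// correlation ids or custom signals could be attached if needed.
public sealed class FunctionTelemetryFilter : IFunctionInvocationFilter
{
    private readonly ILogger _logger;

    public FunctionTelemetryFilter(ILogger<FunctionTelemetryFilter> logger) => _logger = logger;

    public async Task OnFunctionInvocationAsync(
        FunctionInvocationContext context,
        Func<FunctionInvocationContext, Task> next)
    {
        string name = $"{context.Function.PluginName}.{context.Function.Name}";
        var stopwatch = Stopwatch.StartNew();
        _logger.LogInformation("Function {Function} invoking.", name);

        try
        {
            await next(context);
            _logger.LogInformation("Function {Function} succeeded after {Ms} ms.", name, stopwatch.ElapsedMilliseconds);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Function {Function} failed after {Ms} ms.", name, stopwatch.ElapsedMilliseconds);
            throw; // preserve the original error for callers and other filters
        }
    }
}
```
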
### Contribution Checklist


- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄