Add detection strategy module abstraction #13

Open
woop opened this issue May 30, 2023 · 1 comment
Labels
help wanted Extra attention is needed

Comments

@woop
Collaborator

woop commented May 30, 2023

Currently, the detection strategies are hard-coded in the main application. All of the core logic in Rebuff should live in a single library, with detection strategies having their own abstraction. Ideally, users can also contribute their own strategies.

@ristomcgehee
Collaborator

ristomcgehee commented Oct 17, 2023

Here's the approach I think I would take if I worked on this issue.

When initializing the SDK, the user configures zero or more strategies, where each strategy consists of one or more checks and the score threshold for each check. For example, a user could configure a "fast_and_cheap" strategy that includes the heuristic check, the vector store check, and GPT-3.5. Another configured strategy could be the "slow_and_thorough" strategy that makes multiple calls to GPT-4. One of the strategies will be marked as the default. Then at detection time, the client optionally picks one of the strategies to execute. If the client doesn't pick a strategy, the default one will be used. If the user does not configure a strategy when initializing the SDK, we'll enable a default strategy that is reasonably effective without being too slow.
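To make the proposal above concrete, here's a rough sketch of what the configuration and detection flow could look like. None of this is Rebuff's current API: `Check`, `Strategy`, `Detector`, `keyword_heuristic`, and `builtin_default_strategy` are all hypothetical names, and the keyword check is just a stand-in for the real heuristic, vector store, and LLM checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumption: a check maps the user input to a score in [0, 1].
CheckFn = Callable[[str], float]


@dataclass
class Check:
    name: str
    run: CheckFn
    threshold: float  # detection triggers when score >= threshold


@dataclass
class Strategy:
    name: str
    checks: List[Check]


def keyword_heuristic(text: str) -> float:
    """Placeholder check, standing in for a real heuristic."""
    return 1.0 if "ignore previous instructions" in text.lower() else 0.0


def builtin_default_strategy() -> Strategy:
    """Hypothetical fallback used when the user configures no strategies."""
    return Strategy("default", [Check("heuristic", keyword_heuristic, 0.75)])


class Detector:
    def __init__(self, strategies: List[Strategy], default: Optional[str] = None):
        if not strategies:
            strategies = [builtin_default_strategy()]
        self._strategies = {s.name: s for s in strategies}
        # If no default is named, fall back to the first configured strategy.
        self._default = default or strategies[0].name
        if self._default not in self._strategies:
            raise ValueError(f"unknown default strategy: {self._default}")

    def detect(self, text: str, strategy: Optional[str] = None) -> bool:
        chosen = self._strategies[strategy or self._default]
        # Current behavior: trigger detection if any check fails.
        return any(c.run(text) >= c.threshold for c in chosen.checks)
```

A client would then call something like `detector.detect(user_input)` to use the default, or `detector.detect(user_input, strategy="slow_and_thorough")` to pick a specific one.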

I'd probably do this in two phases: Phase 1 would include everything I described above, and Phase 2 would add the ability to define custom checks.

I'd like to note that this would involve breaking changes to the API when invoking it for detection.

Another idea is to support logic other than "trigger detection if any check fails", such as a weighted voting system or a way to chain checks based on the results of other checks. But I think that can be done as a future improvement in a different issue.
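As a rough illustration of the difference between the two decision rules, here's a minimal sketch. The function names, the tuple layout, and the `cutoff` parameter are my own assumptions for the example, not anything that exists in the codebase.

```python
from typing import List, Tuple


def any_check_fails(results: List[Tuple[float, float]]) -> bool:
    """results holds (score, threshold) pairs; one failing check is enough."""
    return any(score >= threshold for score, threshold in results)


def weighted_vote(
    results: List[Tuple[float, float, float]], cutoff: float = 0.5
) -> bool:
    """results holds (score, threshold, weight) triples.

    Triggers when the failing checks carry at least `cutoff` of the total weight.
    """
    total = sum(weight for _, _, weight in results)
    if total <= 0:
        return False
    failing = sum(weight for score, threshold, weight in results if score >= threshold)
    return failing / total >= cutoff
```

With weighted voting, a single low-weight check that fires would no longer be enough to trigger detection on its own, which is the main behavioral change compared to the "any check fails" rule.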

Our code currently uses the term "check", which is what I've been using here, but I think a better term might be "tactic". The user would configure a collection of "tactics" to create a "strategy".
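With that naming, the abstraction might look something like the sketch below, which would also leave room for the custom checks in Phase 2. Everything here is hypothetical: the `Tactic` protocol, `ConfiguredTactic`, and the toy `MaxLengthTactic` are only meant to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Tactic(Protocol):
    """Anything with a name and a score method could serve as a tactic."""

    name: str

    def score(self, user_input: str) -> float:
        """Return a score in [0, 1]; higher means more suspicious."""
        ...


@dataclass
class MaxLengthTactic:
    """Toy user-defined tactic: longer inputs score higher, capped at 1.0."""

    name: str = "max_length"
    limit: int = 1000

    def score(self, user_input: str) -> float:
        return min(len(user_input) / self.limit, 1.0)


@dataclass
class ConfiguredTactic:
    tactic: Tactic
    threshold: float  # the tactic fails when score >= threshold


@dataclass
class Strategy:
    name: str
    tactics: List[ConfiguredTactic]

    def failing(self, user_input: str) -> List[str]:
        """Names of the tactics whose score meets or exceeds their threshold."""
        return [
            ct.tactic.name
            for ct in self.tactics
            if ct.tactic.score(user_input) >= ct.threshold
        ]
```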

How does all that sound?

Projects
None yet
Development

No branches or pull requests

3 participants