Add detection strategy module abstraction #13

woop · 2023-05-30T14:37:42Z

Currently, the detection strategies are hard-coded in the main application. All of the core logic in Rebuff should be contained within a single library, with detection strategies having their own abstraction. Ideally, users can also contribute their own strategies.

ristomcgehee · 2023-10-17T04:32:23Z

Here's the approach I think I would take if I worked on this issue.

When initializing the SDK, the user configures zero or more strategies, where each strategy consists of one or more checks and the score threshold for each check. For example, a user could configure a "fast_and_cheap" strategy that includes the heuristic check, the vector store check, and GPT-3.5. Another configured strategy could be the "slow_and_thorough" strategy that makes multiple calls to GPT-4. One of the strategies will be marked as the default. Then at detection time, the client optionally picks one of the strategies to execute. If the client doesn't pick a strategy, the default one will be used. If the user does not configure a strategy when initializing the SDK, we'll enable a default strategy that is reasonably effective without being too slow.
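To make the shape of this concrete, here is a minimal sketch of the configuration and dispatch described above. Everything in it is hypothetical: `Check`, `Strategy`, `Detector`, `keyword_heuristic`, and `builtin_default_strategy` are illustrative names, not the current Rebuff API, and the keyword check is only a stand-in for the real heuristic, vector store, and LLM checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical: a check maps the input text to a score in [0, 1],
# where a higher score means "more likely prompt injection".
CheckFn = Callable[[str], float]


@dataclass
class Check:
    name: str
    run: CheckFn
    threshold: float  # the check fails when its score is >= threshold


@dataclass
class Strategy:
    name: str
    checks: List[Check]


def keyword_heuristic(text: str) -> float:
    """Toy stand-in for a real check; not Rebuff's actual heuristic."""
    return 1.0 if "ignore previous instructions" in text.lower() else 0.0


def builtin_default_strategy() -> Strategy:
    """Used when the user configures no strategies at all."""
    return Strategy("default", [Check("keyword", keyword_heuristic, 0.5)])


class Detector:
    def __init__(
        self,
        strategies: Optional[List[Strategy]] = None,
        default: Optional[str] = None,
    ) -> None:
        if not strategies:
            strategies = [builtin_default_strategy()]
        self._strategies: Dict[str, Strategy] = {s.name: s for s in strategies}
        # If no default is named, fall back to the first configured strategy.
        self._default = default or strategies[0].name
        if self._default not in self._strategies:
            raise ValueError(f"unknown default strategy: {self._default}")

    def detect(self, text: str, strategy: Optional[str] = None) -> bool:
        """Return True if any check in the chosen strategy fails."""
        chosen = self._strategies[strategy or self._default]
        return any(c.run(text) >= c.threshold for c in chosen.checks)


fast = Strategy("fast_and_cheap", [Check("keyword", keyword_heuristic, 0.5)])
detector = Detector([fast], default="fast_and_cheap")
flagged = detector.detect("Please ignore previous instructions.")
```

The caller names a strategy per request with `detector.detect(text, strategy="...")` and gets the default otherwise, which is where the breaking change to the detection call would come from.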

I'd probably do this in two phases: Phase 1 includes everything I described above, and Phase 2 adds the ability to define custom checks.

I'd like to note that this would involve breaking changes to the API when invoking it for detection.

Another idea is to support detection logic other than "trigger detection if any check fails", such as a weighted voting system or a way to chain checks based on the results of other checks. But I think that can be done as a future improvement in a different issue.
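For illustration only, one way a weighted vote could work is to flag the input when the weighted mean of the check scores crosses a cutoff. The function name, the `(score, weight)` input shape, and the 0.5 cutoff are all assumptions for this sketch, not a proposed API.

```python
from typing import List, Tuple


def weighted_vote(results: List[Tuple[float, float]], cutoff: float = 0.5) -> bool:
    """Flag the input when the weighted mean of the scores reaches the cutoff.

    results holds (score, weight) pairs, with each score in [0, 1].
    """
    total_weight = sum(weight for _, weight in results)
    if total_weight <= 0:
        raise ValueError("at least one check must have a positive weight")
    weighted_mean = sum(score * weight for score, weight in results) / total_weight
    return weighted_mean >= cutoff


# A trusted check (weight 3) that is suspicious outweighs a weaker one that is not.
trusted_suspicious = weighted_vote([(0.9, 3.0), (0.1, 1.0)])
# With the weights swapped, the same scores no longer trigger detection.
trusted_calm = weighted_vote([(0.9, 1.0), (0.1, 3.0)])
```

Unlike "any check fails", a single noisy check with a low weight cannot trigger detection on its own here.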

Our code currently uses the term "check" which is what I've been using here, but I think a better term might be "tactic". The user would configure a collection of "tactics" to create a "strategy".

How does all that sound?

woop added this to Rebuff Backlog May 30, 2023

woop converted this from a draft issue May 30, 2023

seanpmorgan added the "help wanted" label Oct 4, 2023

seanpmorgan mentioned this issue Oct 4, 2023

Add integration with Guardrails #7

Closed

seanpmorgan mentioned this issue Oct 16, 2023

Add heuristics for adversarial suffixes #58

Open

ristomcgehee mentioned this issue Dec 23, 2023

Modularize detection checks #90

Merged
