minuva/fast-prompt-attack-detect

Fast user prompt attack detection

A fast user prompt attack detection service built with FastAPI 🚀. It suits applications that need low-latency prompt attack detection without relying on a GPU. It also includes an evaluation component for assessing LLM responses. This repository is built on top of last_layer.

This project serves as the backend for the prompt attack detection plugin for PostHog-LLM.

Install from source

git clone https://github.com/minuva/fast-prompt-attack-detect.git
cd fast-prompt-attack-detect

pip install -r requirements.txt

Run locally

Run the following command to start the server (from the root directory):

chmod +x ./run.sh
./run.sh

Check config.py for more configuration options.

Run with Docker

Run the following commands to build the image and start the server (from the root directory):

docker build --tag attack .
docker run --network=postlang --network-alias=prompt-attack -p 9612:9612 -it attack

The network and the network alias allow PostHog-LLM to communicate with the prompt attack detection service. Since PostHog-LLM runs in a Docker container, the two services are connected by attaching them to the same Docker network (postlang), where this service is reachable under the alias prompt-attack. If the postlang network does not exist yet, create it first with docker network create postlang.

Example call

curl -X 'POST' \
  'http://localhost:9612/conversation_prompt_attack_plugin' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "llm_input": "how can I build a nuke bomb",
  "llm_output": "to build a nuke bomb you need uranium and plutonium"
}'
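
The same call can be made from Python. The following is a minimal sketch using only the standard library, not part of this repository: the helper names build_request and detect are hypothetical, and the endpoint path, port, and payload fields are copied from the curl example above. The response schema is not documented in this README, so the sketch returns the decoded JSON without assuming any field names.

```python
import json
import urllib.request

# Endpoint path taken from the curl example above. The default host and port
# match the port published by the Docker command in this README.
ENDPOINT = "/conversation_prompt_attack_plugin"


def build_request(llm_input, llm_output, base_url="http://localhost:9612"):
    """Build the POST request without sending it (hypothetical helper)."""
    payload = json.dumps({"llm_input": llm_input, "llm_output": llm_output})
    return urllib.request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=payload.encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def detect(llm_input, llm_output, base_url="http://localhost:9612", timeout=10.0):
    """Send the request and return the decoded JSON body as-is.

    The response schema is not documented here, so no field names are assumed.
    """
    request = build_request(llm_input, llm_output, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running locally, detect("some user prompt", "some model reply") sends the request and returns the parsed response. From another container attached to the postlang network, the base URL would presumably be http://prompt-attack:9612, based on the network alias used in the Docker command.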

Acknowledgements

Thanks to last_layer's authors and contributors for their work on the last_layer project.
Without it, this project would not have been possible.