Skip to content

swyxio/unpwnable

Repository files navigation

Unpwnable - The prompt that cannot be leaked!

demo: https://unpwnable.netlify.app/

comment guesses: https://lspace.swyx.io/p/jan-2023-update

What is this?

This is a test of the "Unpwnable" prompt injection protection strategy. We want to demonstrate that you can take normal product prompts and sufficiently protect against prompt injection attacks.

First, you can verify that the prompt works as advertised, by submitting topics you would like GPT3 to write about (e.g. "dog", "netlify", "sam altman"). In the API we've used a simple prompt that is meant to be reflective of a realistic product prompt, with our "unpwnable" protection strategy.

Your real mission, should you choose to accept it, is to reverse engineer the source prompt to as high fidelity as possible, within our rate limit.

You can leave your guesses and process in the accompanying blogpost. We expect that you will get no more than the first sentence (16 words) of the source prompt.

The prompt is a simple variation on real product prompts:

  • a ~90 word, ~500 character string
  • starting with "You are an assistant"
  • ending by concatenating the user input to the source prompt
  • There are NO special characters or formatting used to protect the prompt

It's SHA-256 hash is cf58ad59e753e80419325ce57901efe40b4e141d819a13b9c0ba2d0c3402de50.

We will publish the source prompt and code in a few days; you can then compare your results to the actual prompt.