shriram/porpoise

Porpoise

Porpoise uses an LLM to translate HtDP-style purpose statements into code, as a way of evaluating the quality of those statements. You can read more about Porpoise in this thread on Twitter/X or Mastodon.

Configuring Porpoise

How to run Porpoise on your own:

  1. Install Racket. Porpoise was last tested on Racket 8.11.1.

  2. Clone this repo.

  3. Several files build paths to read and write to the filesystem. All these are routed through the file config.rkt. In it, replace my path with yours. Specifically, just change the content of the function path-maker. Everything is relative to the path generated by it. It could be as simple as replacing the strings "Desktop" "r" "sk" "porpoise" (which on macOS corresponds to the path /Users/sk/Desktop/r/sk/porpoise) with your location for this repository. You can also replace 'home-dir in the call to find-system-path, if needed: see documentation.

  4. Porpoise logs who is using the system and what they do. It has a very simple notion of identity: the user simply indicates who they are. Porpoise checks that they are legal users against the content of the logins.sexp file. Replace the content of that file with the legal login names, one per line, which can be any strings you like. These are matched verbatim against what the user enters. You will find it useful to give yourself at least one extra username, so you can log in periodically and make sure the system is working, without it counting against any individual user's analytics. You should probably choose a not-very-guessable name. Note that you don't even have to use the same format as everyone else: if they are all using email addresses, you can use the name of a pet instead. The system just matches strings.

  5. Place your problem definitions in the problem-sets directory. The format is given at the top of problem-sets/sample.sexp. The format is quite particular, and maybe also a bit confusing (for each test, the expected answer is written before the arguments). For now, stick to check-equal? as the testing predicate. See below to learn more about bad-impl, which is not documented in the original Porpoise social media threads.

  6. Define these top-level environment variables (e.g., in your shell):

    1. One called PORPOISE_PROBLEMS. This should name the problem set you want to use. Choose a full filename from the problem-sets directory, e.g., sample.sexp. So a good default value to test that the system is running properly is sample.sexp: e.g.,

      export PORPOISE_PROBLEMS=sample.sexp

    2. One called OPENAI_API_KEY. Put your OpenAI API key here. Configure at OpenAI.

    3. Optionally, one called PORPOISE_TIMEOUT_MS to define how long (in ms) to wait before timing out OpenAI requests. If this is not set, the timeout defaults to 7000, i.e., 7 seconds. Set whatever value seems reasonable for your system. If you see a lot of invalid runs, it means your timeout is probably too low.
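Before starting the server, it can help to confirm that these variables are actually set in the shell you will launch from. The following is a minimal sketch, not part of Porpoise: the function name porpoise_check_env is hypothetical, and it only checks that the variables are non-empty, not that the problem set exists or that the API key is valid.

```shell
# Hypothetical helper (not shipped with Porpoise): report which required
# environment variables are missing, or print the effective timeout.
porpoise_check_env() {
  missing=""
  [ -n "${PORPOISE_PROBLEMS:-}" ] || missing="$missing PORPOISE_PROBLEMS"
  [ -n "${OPENAI_API_KEY:-}" ] || missing="$missing OPENAI_API_KEY"
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  # PORPOISE_TIMEOUT_MS is optional; 7000 is the default described above.
  echo "timeout: ${PORPOISE_TIMEOUT_MS:-7000} ms"
}
```

Calling porpoise_check_env right before racket serve.rkt gives a quick sanity check; a non-zero exit status means at least one required variable is unset.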

Running Porpoise

You should be all ready! Running Porpoise should be as simple as

racket serve.rkt

Parameters

You might want to initialize Porpoise with a user ID (e.g., if users are coming from a form that generates an anonymous identity). To handle this, the starting URL allows you to add a ?init-id=<id> parameter. For instance, if your server is running as

http://localhost:8000/servlets/standalone.rkt

then you can also run

http://localhost:8000/servlets/standalone.rkt?init-id=shriram

and when you do, the identity field will be pre-populated with shriram.

Use this with care! The initial value is inserted into the Web page as the initial identity. It is done using JavaScript rather than inserted directly into the page source, but there may still be some weird error cases. Also, this mechanism is visible to users.

Logging and Output

While the instructions above tell you how to start running Porpoise, for longer-term use, you need it to be more robust. You will want to avoid it shutting down accidentally (e.g., when you close a shell). You should also redirect standard output and error to files to inspect. Something like

nohup racket serve.rkt >> std-out 2>> std-err &

should do the trick:

  • & puts it in the background right away
  • nohup ensures that if your connection to the server is lost, Porpoise doesn't necessarily stop running
  • >> std-out puts all the standard output in the file std-out
  • 2>> std-err puts all the error output in the file std-err

Of course, you should feel free to put things in different paths and replace their pathnames in the above line. In particular, it is smart to give an explicit pathname to the Racket binary, so if another one gets installed, Porpoise doesn't suddenly fail.
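As a rough sketch of that advice, the launch line can be wrapped in a small function that takes the Racket binary and the log directory as explicit arguments. The function name porpoise_start, the DRY_RUN switch, and the example paths are all assumptions made for illustration; nothing here is part of Porpoise itself.

```shell
# Hypothetical wrapper around the nohup line above.
#   $1: absolute path to the Racket binary (e.g. the output of `command -v racket`)
#   $2: directory in which to keep std-out and std-err
# Set DRY_RUN=1 to print the command instead of running it.
porpoise_start() {
  racket_bin="$1"
  log_dir="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "nohup $racket_bin serve.rkt >> $log_dir/std-out 2>> $log_dir/std-err &"
    return 0
  fi
  # Run from the repository directory so that serve.rkt is found.
  nohup "$racket_bin" serve.rkt >> "$log_dir/std-out" 2>> "$log_dir/std-err" &
}
```

For example, DRY_RUN=1 followed by porpoise_start /usr/local/bin/racket /var/log/porpoise (both paths are placeholders) prints the command that would be run, which is a cheap way to check the paths before launching for real.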

Error Logs

You can expect to see the following kinds of content in std-err:

system error: Too many open files; errno=24

This just means too many people are trying to use Porpoise concurrently for your current system networking configuration. Have users take a break and space things out.
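errno 24 is the operating system's "too many open files" condition, which is governed by the per-process limit on open file descriptors. If spacing users out is not enough, one thing worth checking (an assumption about your setup, not something Porpoise documents) is the limit in the shell that launches the server. The helper name fd_limit_ok and the threshold in the usage note are illustrative only.

```shell
# Hypothetical check: succeed if the soft limit on open file descriptors
# in this shell is at least the requested value.
fd_limit_ok() {
  want="$1"
  have="$(ulimit -n)"   # current soft limit, or the word "unlimited"
  [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]
}
```

For instance, fd_limit_ok 4096 || echo "consider raising the limit with ulimit -n" warns when the limit is below 4096. That number is an arbitrary example; a suitable value depends on your system and expected load.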

system error: System error; gai_err=-11

This means OpenAI was temporarily unresponsive. This doesn't happen too often or for too long.

Connection error: read-request: malformed request

Ha, these are fun! These are bots that are trying to attack your server using requests that are known to be problematic (i.e., trigger security vulnerabilities) on other mainstream Web servers. They fail harmlessly on the Racket Web server. (Still, it is a bit disconcerting to be under attack this way. If possible, use a VPN, so that the machine isn't even visible outside your campus network. If you are still being attacked…)

Seed Programs

In collaboration with Jan Vahrenhold, Porpoise has a feature where users can be shown a seed program before they begin writing their prompts. This feature is optional; if you don't provide a seed program in the configuration of a problem (using bad-impl), then it's as if the feature doesn't exist. However, if you do provide a seed, then the following are relevant:

  • The program must be a properly formatted s-expression. However, it's supplied as a string in case you need to do something strange in here that would otherwise not pass muster.
  • Use the same name as the synthesized-name for the function (unless you have some really good reason not to).
  • Make sure you get the indentation inside the string right, so it doesn't look weird to users and throw them off.
  • Not everyone is shown this! Rather, this is done in an A/B manner. To understand the logic, look for show-bad-impl in serve.rkt.

Docker

Thore Thießen has very kindly created a Dockerfile to enable Porpoise to run inside Docker. There's a small chance it may get out-of-date relative to the current set of files, so please double-check before using. Thanks, Thore!

Logo

The logo is from vintage map elements by Vector Tradition, licensed under the Adobe Education License. I like to think of it as a porpoise as depicted on a nautical map used by a pirate.
