
Record actual model used to run the prompt #34

Closed
simonw opened this issue Jun 16, 2023 · 7 comments
simonw commented Jun 16, 2023

Right now I'm just recording the model that was requested, e.g. gpt-3.5-turbo in the model column.

But... it turns out the response from OpenAI includes this - "model": "gpt-3.5-turbo-0301" - and there are meaningful differences between those model versions, e.g. the latest is gpt-3.5-turbo-0613 but you have to opt into it.

I'd like to record the model that was actually used. Not sure how best to put this in the schema though, since it may only make sense for OpenAI models.
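A minimal sketch of pulling the actual model out of the response, using a stand-in dict shaped like the OpenAI payload quoted below rather than a real client call:

```python
# Stand-in for the dict-shaped response returned by the OpenAI API;
# a real call would go through the openai client library.
response = {
    "model": "gpt-3.5-turbo-0301",
    "object": "chat.completion",
}

requested_model = "gpt-3.5-turbo"
# Fall back to the requested model if the provider omits the field
actual_model = response.get("model", requested_model)
```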

@simonw simonw added the enhancement New feature or request label Jun 16, 2023
@simonw simonw added this to the 0.4 milestone Jun 16, 2023
simonw commented Jun 16, 2023

Current schema:

% sqlite-utils schema "$(llm logs path)"
CREATE TABLE [_llm_migrations] (
   [name] TEXT PRIMARY KEY,
   [applied_at] TEXT
);
CREATE TABLE "log" (
   [id] INTEGER PRIMARY KEY,
   [model] TEXT,
   [timestamp] TEXT,
   [prompt] TEXT,
   [system] TEXT,
   [response] TEXT,
   [chat_id] INTEGER REFERENCES [log]([id])
);

simonw commented Jun 16, 2023

There's other data about individual runs that I'm interested in storing. For non-streaming responses from OpenAI I get back this:

  "created": 1686896201,
  "id": "chatcmpl-7Rx3BL9grubSusAyCEiRoJta8vEh7",
  "model": "gpt-3.5-turbo-0301",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 399,
    "prompt_tokens": 15,
    "total_tokens": 414
  }
}

I don't think I get the "usage" block for streaming responses, which is annoying.
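One way to fold that into a provider-specific debug blob, hedging for streaming responses that omit usage — a sketch, where build_debug is a hypothetical helper, not part of llm's actual API:

```python
def build_debug(response: dict) -> dict:
    """Collect model-specific debug info from an OpenAI-style response dict."""
    debug = {"model": response.get("model")}
    usage = response.get("usage")
    if usage is not None:  # streaming responses don't include a usage block
        debug["usage"] = usage
    return debug

non_streaming = {
    "model": "gpt-3.5-turbo-0301",
    "usage": {"completion_tokens": 399, "prompt_tokens": 15, "total_tokens": 414},
}
streaming = {"model": "gpt-3.5-turbo-0301"}
```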

simonw commented Jun 16, 2023

I have another feature in the pipeline that will use a different model from the requested one.

That may want to store "user requested 'auto' but we ran gpt-4-32k". But that's even more confusing, because there are actually three models there: auto was requested, gpt-4-32k was then selected, and gpt-4-32k-0601 or whatever was actually executed.

I think in that case I don't actually care that they said "auto".

simonw commented Jun 16, 2023

I'm going to add a duration_ms integer column to store the duration of the prompt, and a debug column which I'll dump JSON into with model-specific debug things - that's usage and model for the OpenAI ones and who-knows-what for the other models.
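A sketch of how those two columns could be populated, assuming a monotonic timer wrapped around the prompt execution and json.dumps for the debug dict — variable names here are illustrative, not llm's internals:

```python
import json
import time

start = time.monotonic()
time.sleep(0.01)  # stand-in for the actual prompt execution / API call
duration_ms = int((time.monotonic() - start) * 1000)

debug = {"model": "gpt-3.5-turbo-0301"}  # model-specific extras go here
row = {"duration_ms": duration_ms, "debug": json.dumps(debug)}
```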

simonw commented Jun 16, 2023

@migration
def m005_debug(db):
    # debug: JSON string of model-specific info; duration_ms: prompt runtime
    db["log"].add_column("debug", str)
    db["log"].add_column("duration_ms", int)

@simonw simonw closed this as completed in 2b1169c Jun 16, 2023
simonw added a commit that referenced this issue Jun 16, 2023
simonw commented Jun 16, 2023

Example output:

% llm logs
[
  {
    "id": 435,
    "model": "gpt-3.5-turbo",
    "timestamp": "2023-06-16 07:46:45.781006",
    "prompt": "say one duration",
    "system": null,
    "response": "1 hour",
    "chat_id": null,
    "debug": "{\"model\": \"gpt-3.5-turbo-0301\"}",
    "duration_ms": 820
  },
  {
    "id": 434,
    "model": "gpt-3.5-turbo",
    "timestamp": "2023-06-16 07:46:42.106479",
    "prompt": "say one duration",
    "system": null,
    "response": "One hour.",
    "chat_id": null,
    "debug": "{\"model\": \"gpt-3.5-turbo-0301\", \"usage\": {\"prompt_tokens\": 11, \"completion_tokens\": 3, \"total_tokens\": 14}}",
    "duration_ms": 1364
  },
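Since debug is stored as a JSON string rather than structured columns, reading token counts back out takes a json.loads step — a sketch against the second row above:

```python
import json

# The debug column as stored in the log table (a JSON string)
debug_raw = (
    '{"model": "gpt-3.5-turbo-0301", '
    '"usage": {"prompt_tokens": 11, "completion_tokens": 3, "total_tokens": 14}}'
)
debug = json.loads(debug_raw)
total_tokens = debug["usage"]["total_tokens"]
```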

simonw added a commit that referenced this issue Jul 10, 2023
simonw added a commit that referenced this issue Jul 10, 2023