
Better ways of storing and accessing API keys #13

Closed
tonydewan opened this issue May 24, 2023 · 18 comments

@tonydewan

I'm not sure this is actually an issue, as I've developed a workaround, but I thought it was worth bringing up for discussion.

I prefer to keep keys like this in my password manager. Among other things, it allows secure access and consistent sync across machines. I already had a function to access the key, but I don't want to call it on every new shell session as it pops up a prompt in my password manager. I'd prefer to only do that when using the tool, and only the first time in the shell session.

So, I wrote a wrapper function to do that:

llm() {
  if [ -z "$OPENAI_API_KEY" ]; then
    export OPENAI_API_KEY="$(open_ai_key)"
  fi
  command llm "$@"
}

I use zsh on macOS.

I'm not aware of other patterns that llm could use to look for a key, and you already provide two reasonable ones...but if there was a third way that would obviate the need for my little wrapper, that would be cool! Otherwise, maybe someone else finds this helpful.

@nealmcb

nealmcb commented Jun 9, 2023

Another pattern I'm seeing for configuring API keys is in the (excellent and free) DeepLearning.AI Short Courses. They rely on the dotenv module, and allow more flexible configuration, allowing other API keys to be set.

import os

import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

That reads the API keys from the .env file, and requires that the file contents look like OPENAI_API_KEY=sk...

So unfortunately you can't use the same file for both mechanisms. Note that if you try to reuse the .env file, you'll get this rather confusing error:

openai.error.AuthenticationError: <empty message>

I hope they provide a better error message via the discussion at openai/openai-python#464

Back to llm: I suggest supporting (or even switching to) the dotenv approach, since I dare say that people will want to specify multiple API keys for the burgeoning variety of LLMs and plugins out there.

@simonw
Owner

simonw commented Jun 15, 2023

This is now blocking this issue, because I need to solve it for other API key providers too:

@simonw simonw changed the title Alternative Access to OpenAI Key Better ways of storing and accessing API keys Jun 15, 2023
@simonw
Owner

simonw commented Jun 15, 2023

I'm going to start storing keys in user_data_dir from #7. I'll add a command people can use to manage their keys (if they don't want to use environment variables).

I think I'll leave people who want to use .env to set that up themselves - it looks like the pip install "python-dotenv[cli]" tool provides options for managing those already, so I don't need to add it as a dependency to this tool. https://github.com/theskumar/python-dotenv#command-line-interface

You can do dotenv run llm ... to run the llm command with environment variables loaded from your .env.

@simonw
Owner

simonw commented Jun 15, 2023

I'm taking inspiration from how Vercel does this: https://til.simonwillison.net/macos/fs-usage

$ cat ~/Library/Application\ Support/com.vercel.cli/auth.json 
{
  "// Note": "This is your Vercel credentials file. DO NOT SHARE!",
  "// Docs": "https://vercel.com/docs/project-configuration#global-configuration/auth-json",
  "token": "... redacted ..."
}

I'm going to use the same filename - auth.json - in my user_data_dir.

I'll use different keys for the tokens though. Each command/provider will have a default key - "openai" for OpenAI, something else for the PaLM ones.

When using llm ... the following will happen:

  • If the command was called with --token xyz and xyz appears to be a token, it will be used.
  • If --token xyz matches a key in the auth.json dictionary, the value of that key will be used. This supports having multiple keys so you can do things like llm "some prompt" --token personal.
  • Now check for the environment variable - $OPENAI_API_KEY or whatever.
  • If no environment variable, use the default key e.g. "openai" in auth.json.

This design ensures people can over-ride their current environment variable if they want to by specifying --token openai.
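The lookup order above could be sketched roughly like this. Note the `resolve_key` helper, its signature, and the `sk-` prefix heuristic are all assumptions for illustration, not the shipped implementation:

```python
import json
import os
import pathlib


def looks_like_token(value):
    # Heuristic assumed for illustration: OpenAI keys start with "sk-".
    return value.startswith("sk-")


def resolve_key(explicit, env_var, default_name, keys_path):
    """Resolve an API key using the lookup order described above."""
    stored = {}
    path = pathlib.Path(keys_path)
    if path.exists():
        stored = json.loads(path.read_text())
    if explicit:
        # 1. A literal token passed on the command line wins.
        if looks_like_token(explicit):
            return explicit
        # 2. Otherwise treat it as a name stored in the keys file.
        if explicit in stored:
            return stored[explicit]
    # 3. Fall back to the environment variable.
    if os.environ.get(env_var):
        return os.environ[env_var]
    # 4. Finally, use the provider's default entry in the keys file.
    return stored.get(default_name)
```

With this ordering, `--token personal` resolves the named entry even when the environment variable is set, which is what makes the override behavior work.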

@simonw
Owner

simonw commented Jun 15, 2023

I'm not going to have a -t shortcut for --token because I want to discourage people from using the --token option - it feels less secure to me than using environment variables or tokens stored in a file. I also may want to use -t for something else in the future.

@simonw
Owner

simonw commented Jun 15, 2023

How do users get their tokens into the system? They can edit auth.json directly, but I'll also provide a llm tokens set of commands for this:

$ llm tokens set openai
Enter token: 

$ llm tokens path
~/.share/llm/auth.json

@simonw
Owner

simonw commented Jun 15, 2023

Now I'm torn on naming. It would be good if the filename and the commands and the concepts were all consistent with each other.

Some options:

llm/tokens.json
$ llm tokens set openai2
$ llm "pelican names" --token openai2

I worry that "tokens" are less obviously secret than the others.

llm/auth.json
$ llm auth set openai2
$ llm "pelican names" --auth openai2
llm/keys.json
$ llm keys set openai2
$ llm "pelican names" --key openai2

Keys do at least have a clear implication that they should be protected.

llm/secrets.json
$ llm secrets set openai2
$ llm "pelican names" --secret openai2

I feel like secrets could be misunderstood to mean some other concept.

@sderev
Contributor

sderev commented Jun 15, 2023

How do users get their tokens into the system? They can edit auth.json directly, but I'll also provide a llm tokens set of commands for this:

$ llm tokens set openai
Enter token: 

$ llm tokens path
~/.share/llm/auth.json

What do you think of using tokens / -t/--tokens for calculating the number of tokens in a prompt?

I know you have ttok for this usage, but I believe it could be better to integrate this feature into llm, since it could then count the number of tokens on its own and automatically choose the best model based on its context length.

And then having key or something to set up the API keys.

@simonw
Owner

simonw commented Jun 15, 2023

OpenAI call them "API keys". Google call them all sorts of things - gcloud auth print-access-token and ?key=xxx in URLs and API keys on https://cloud.google.com/docs/authentication/api-keys and I'm sure I could find more if I kept looking.

I'm going to call them keys, and go with this:

llm/keys.json
$ llm keys set openai2
$ llm "pelican names" --key openai2

@simonw
Owner

simonw commented Jun 15, 2023

What do you think of using tokens / -t/--tokens for calculating the number of tokens in a prompt?

Yes! That's a great reason not to use "tokens" to mean authentication tokens, since tokens already means something different in LLM space.

@simonw
Owner

simonw commented Jun 15, 2023

I'm going with:

/Users/simon/Library/Application Support/io.datasette.llm/keys.json

On macOS at least - that's generated using:

def keys_path():
    return os.path.join(user_dir(), "keys.json")


def user_dir():
    return user_data_dir("io.datasette.llm", "Datasette")

I plan to move the docs for this tool to llm.datasette.io shortly.

@simonw
Owner

simonw commented Jun 15, 2023

Partly for unit testing convenience I'm going to allow the path to io.datasette.llm/keys.json to be over-ridden by an environment variable.
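A minimal sketch of such an override, assuming a hypothetical `LLM_KEYS_PATH` variable name and hard-coding the macOS data directory rather than calling platformdirs:

```python
import os
import pathlib


def user_dir():
    # The real tool derives this from platformdirs.user_data_dir(
    # "io.datasette.llm", "Datasette"); hard-coded here so the
    # sketch has no third-party dependencies.
    return os.path.expanduser("~/Library/Application Support/io.datasette.llm")


def keys_path():
    # "LLM_KEYS_PATH" is an assumed variable name for illustration.
    env_override = os.environ.get("LLM_KEYS_PATH")
    if env_override:
        return pathlib.Path(env_override)
    return pathlib.Path(user_dir()) / "keys.json"
```

Tests can then point the tool at a temporary keys file simply by setting the environment variable, without touching the real data directory.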

@simonw
Owner

simonw commented Jun 15, 2023

Prototype - this seems to work well:

diff --git a/llm/cli.py b/llm/cli.py
index 37dd9ed..b7d7112 100644
--- a/llm/cli.py
+++ b/llm/cli.py
@@ -1,9 +1,13 @@
 import click
 from click_default_group import DefaultGroup
 import datetime
+import getpass
 import json
 import openai
 import os
+import pathlib
+from platformdirs import user_data_dir
+import requests
 import sqlite_utils
 import sys
 import warnings
@@ -124,6 +128,50 @@ def init_db():
     db.vacuum()
 
 
+@cli.group()
+def keys():
+    "Manage API keys for different models"
+
+
+@keys.command()
+def path():
+    "Output path to keys.json file"
+    click.echo(keys_path())
+
+
+def keys_path():
+    return os.path.join(user_dir(), "keys.json")
+
+
+def user_dir():
+    return user_data_dir("io.datasette.llm", "Datasette")
+
+
+@keys.command(name="set")
+@click.argument("name")
+def set_(name):
+    """
+    Save a key in keys.json
+
+    Example usage:
+
+        $ llm keys set openai
+        Enter key: ...
+    """
+    default = {"// Note": "This file stores secret API credentials. Do not share!"}
+    path = pathlib.Path(keys_path())
+    path.parent.mkdir(parents=True, exist_ok=True)
+    value = getpass.getpass("Enter key: ").strip()
+    if not path.exists():
+        path.write_text(json.dumps(default))
+    try:
+        current = json.loads(path.read_text())
+    except json.decoder.JSONDecodeError:
+        current = default
+    current[name] = value
+    path.write_text(json.dumps(current, indent=2) + "\n")
+
+
 @cli.command()
 @click.option(
     "-n",
diff --git a/setup.py b/setup.py
index dc75536..b36718b 100644
--- a/setup.py
+++ b/setup.py
@@ -31,7 +31,13 @@ setup(
         [console_scripts]
         llm=llm.cli:cli
     """,
-    install_requires=["click", "openai", "click-default-group-wheel", "sqlite-utils"],
+    install_requires=[
+        "click",
+        "openai",
+        "click-default-group-wheel",
+        "sqlite-utils",
+        "platformdirs",
+    ],
     extras_require={"test": ["pytest", "requests-mock"]},
     python_requires=">=3.7",
 )

@sderev
Contributor

sderev commented Jun 15, 2023

def keys_path():
    return os.path.join(user_dir(), "keys.json")

Since you're using pathlib later in the code, what do you think of making pathlib the default lib to handle it?

The same result can be achieved with return Path.home() / "keys.json"

I suggested the generalization of pathlib in #19 😊.

@simonw
Owner

simonw commented Jun 15, 2023

OK, I implemented these two commands:

$ llm keys --help
Usage: llm keys [OPTIONS] COMMAND [ARGS]...

  Manage API keys for different models

Options:
  --help  Show this message and exit.

Commands:
  path  Output path to keys.json file
  set   Save a key in keys.json
$ llm keys path --help
Usage: llm keys path [OPTIONS]

  Output path to keys.json file

Options:
  --help  Show this message and exit.
$ llm keys set --help
Usage: llm keys set [OPTIONS] NAME

  Save a key in keys.json

  Example usage:

      $ llm keys set openai
      Enter key: ...

Options:
  --value TEXT  Value to set
  --help        Show this message and exit.

@simonw
Owner

simonw commented Jun 15, 2023

Still needed:

  • Update get_openai_api_key() to use the new keys mechanism (including checking for OPENAI_API_KEY)
  • Documentation, including upgrade documentation for the changelog
  • Move ~/.llm/log.db to the new location
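The first item could look roughly like this; the function signature and exact behavior here are assumptions for illustration, not the shipped code:

```python
import json
import os
import pathlib


def get_openai_api_key(keys_path):
    # The environment variable takes precedence over the stored default,
    # matching the lookup order described earlier in this thread.
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    path = pathlib.Path(keys_path)
    if path.exists():
        return json.loads(path.read_text()).get("openai")
    return None
```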

@simonw
Copy link
Owner

simonw commented Jun 15, 2023

Since you're using pathlib later in the code, what do you think of making pathlib the default lib to handle it?

Good call, I'll do that as part of cleaning up this code.

@simonw
Owner

simonw commented Jun 15, 2023

This is implemented, next step is to write the docs for it - which is blocked on:

@simonw simonw closed this as completed in 50f0b2a Jun 15, 2023
simonw added a commit that referenced this issue Jun 17, 2023
simonw added a commit that referenced this issue Jul 10, 2023
Also documents new keys.json mechanism, closes #13
simonw added a commit that referenced this issue Jul 10, 2023