
Better ways of storing and accessing API keys #13

Closed
tonydewan opened this issue May 24, 2023 · 18 comments

@tonydewan

I'm not sure this is actually an issue, as I've developed a workaround, but I thought it was worth bringing up for discussion.

I prefer to keep keys like this in my password manager. Among other things, it allows secure access and consistent sync across machines. I already had a function to access the key, but I don't want to call it on every new shell session as it pops up a prompt in my password manager. I'd prefer to only do that when using the tool, and only the first time in the shell session.

So, I wrote a wrapper function to do that:

llm() {
  if [ -z "$OPENAI_API_KEY" ]; then
    export OPENAI_API_KEY="$(open_ai_key)"
  fi
  command llm "$@"
}

I use zsh on macOS.

I'm not aware of other patterns that llm could use to look for a key, and you already provide two reasonable ones...but if there was a third way that would obviate the need for my little wrapper, that would be cool! Otherwise, maybe someone else finds this helpful.

@nealmcb

nealmcb commented Jun 9, 2023

Another pattern I'm seeing for configuring API keys is in the (excellent and free) DeepLearning.AI Short Courses. They rely on the dotenv module, and allow more flexible configuration, allowing other API keys to be set.

import os

import openai
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']

That reads the API keys from the .env file, and requires that the file contents look like OPENAI_API_KEY=sk...

So unfortunately you can't use the same file for both mechanisms. Note that if you try to reuse the .env file, you'll get this rather confusing error:

openai.error.AuthenticationError: <empty message>

I hope they provide a better error message via the discussion at openai/openai-python#464

Back to llm: I suggest supporting (or even switching to) the dotenv approach, since I dare say that people will want to specify multiple API keys for the burgeoning variety of LLMs and plugins out there.

@simonw
Owner

simonw commented Jun 15, 2023

This is now blocking this issue, because I need to solve it for other API key providers too:

@simonw simonw changed the title Alternative Access to OpenAI Key Better ways of storing and accessing API keys Jun 15, 2023
@simonw
Owner

simonw commented Jun 15, 2023

I'm going to start storing keys in user_data_dir from #7. I'll add a command people can use to manage their keys (if they don't want to use environment variables).

I think I'll leave people who want to use .env to set that up themselves - it looks like the pip install "python-dotenv[cli]" tool provides options for managing those already, so I don't need to add it as a dependency to this tool. https://github.com/theskumar/python-dotenv#command-line-interface

You can do dotenv run llm ... to run the llm command with environment variables loaded from your .env.

@simonw
Owner

simonw commented Jun 15, 2023

I'm taking inspiration from how Vercel does this: https://til.simonwillison.net/macos/fs-usage

$ cat ~/Library/Application\ Support/com.vercel.cli/auth.json 
{
  "// Note": "This is your Vercel credentials file. DO NOT SHARE!",
  "// Docs": "https://vercel.com/docs/project-configuration#global-configuration/auth-json",
  "token": "... redacted ..."
}

I'm going to use the same filename - auth.json - in my user_data_dir.

I'll use different keys for the tokens though. Each command/provider will have a default key - "openai" for OpenAI, something else for the PaLM ones.

When using llm ... the following will happen:

  • If the command was called with --token xyz and xyz appears to be a token, it will be used.
  • If --token xyz matches a key in the auth.json dictionary, the value of that key will be used. This supports having multiple keys so you can do things like llm "some prompt" --token personal.
  • Now check for the environment variable - $OPENAI_API_KEY or whatever.
  • If no environment variable, use the default key e.g. "openai" in auth.json.

This design ensures people can over-ride their current environment variable if they want to by specifying --token openai.
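The lookup order above could be sketched roughly like this. Note the `resolve_key` helper, its signature, and the `sk-` prefix heuristic are all assumptions for illustration, not the shipped implementation:

```python
import json
import os
import pathlib


def looks_like_token(value):
    # Heuristic assumed for illustration: OpenAI keys start with "sk-".
    return value.startswith("sk-")


def resolve_key(explicit, env_var, default_name, keys_path):
    """Resolve an API key using the lookup order described above."""
    stored = {}
    path = pathlib.Path(keys_path)
    if path.exists():
        stored = json.loads(path.read_text())
    if explicit:
        # 1. A literal token passed on the command line wins.
        if looks_like_token(explicit):
            return explicit
        # 2. Otherwise treat it as a name stored in the keys file.
        if explicit in stored:
            return stored[explicit]
    # 3. Fall back to the environment variable.
    if os.environ.get(env_var):
        return os.environ[env_var]
    # 4. Finally, use the provider's default entry in the keys file.
    return stored.get(default_name)
```

With this ordering, `--token personal` resolves the named entry even when the environment variable is set, which is what makes the override behavior work.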

@simonw
Owner

simonw commented Jun 15, 2023

I'm not going to have a -t shortcut for --token because I want to discourage people from using the --token option - it feels less secure to me than using environment variables or tokens stored in a file. I also may want to use -t for something else in the future.

@simonw
Owner

simonw commented Jun 15, 2023

How do users get their tokens into the system? They can edit auth.json directly, but I'll also provide a llm tokens set of commands for this:

$ llm tokens set openai
Enter token: 

$ llm tokens path
~/.share/llm/auth.json

@simonw
Owner

simonw commented Jun 15, 2023

Now I'm torn on naming. It would be good if the filename and the commands and the concepts were all consistent with each other.

Some options:

llm/tokens.json
$ llm tokens set openai2
$ llm "pelican names" --token openai2

I worry that "tokens" are less obviously secret than the others.

llm/auth.json
$ llm auth set openai2
$ llm "pelican names" --auth openai2
llm/keys.json
$ llm keys set openai2
$ llm "pelican names" --key openai2

Keys do at least have a clear implication that they should be protected.

llm/secrets.json
$ llm secrets set openai2
$ llm "pelican names" --secret openai2

I feel like secrets could be misunderstood to mean some other concept.

@sderev
Contributor

sderev commented Jun 15, 2023

How do users get their tokens into the system? They can edit auth.json directly, but I'll also provide a llm tokens set of commands for this:

$ llm tokens set openai
Enter token: 

$ llm tokens path
~/.share/llm/auth.json

What do you think of using tokens / -t/--tokens for calculating the number of tokens in a prompt?

I know you have ttok for this usage, but I believe it could be better to integrate this feature into llm, since it could then count the number of tokens on its own and automatically choose the best model based on its context length.

And then having key or something to set up the API keys.

@simonw
Owner

simonw commented Jun 15, 2023

OpenAI call them "API keys". Google call them all sorts of things - gcloud auth print-access-token and ?key=xxx in URLs and API keys on https://cloud.google.com/docs/authentication/api-keys and I'm sure I could find more if I kept looking.

I'm going to call them keys, and go with this:

llm/keys.json
$ llm keys set openai2
$ llm "pelican names" --key openai2

@simonw
Owner

simonw commented Jun 15, 2023

What do you think of using tokens / -t/--tokens for calculating the number of tokens in a prompt?

Yes! That's a great reason not to use "tokens" to mean authentication tokens, since tokens already means something different in LLM space.

@simonw
Owner

simonw commented Jun 15, 2023

I'm going with:

/Users/simon/Library/Application Support/io.datasette.llm/keys.json

On macOS at least - that's generated using:

def keys_path():
    return os.path.join(user_dir(), "keys.json")


def user_dir():
    return user_data_dir("io.datasette.llm", "Datasette")

I plan to move the docs for this tool to llm.datasette.io shortly.

@simonw
Owner

simonw commented Jun 15, 2023

Partly for unit testing convenience I'm going to allow the path to io.datasette.llm/keys.json to be over-ridden by an environment variable.
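A minimal sketch of such an override, assuming a hypothetical `LLM_KEYS_PATH` variable name and hard-coding the macOS data directory rather than calling platformdirs:

```python
import os
import pathlib


def user_dir():
    # The real tool derives this from platformdirs.user_data_dir(
    # "io.datasette.llm", "Datasette"); hard-coded here so the
    # sketch has no third-party dependencies.
    return os.path.expanduser("~/Library/Application Support/io.datasette.llm")


def keys_path():
    # "LLM_KEYS_PATH" is an assumed variable name for illustration.
    env_override = os.environ.get("LLM_KEYS_PATH")
    if env_override:
        return pathlib.Path(env_override)
    return pathlib.Path(user_dir()) / "keys.json"
```

Tests can then point the tool at a temporary keys file simply by setting the environment variable, without touching the real data directory.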

@simonw
Owner

simonw commented Jun 15, 2023

Prototype - this seems to work well:

diff --git a/llm/cli.py b/llm/cli.py
index 37dd9ed..b7d7112 100644
--- a/llm/cli.py
+++ b/llm/cli.py
@@ -1,9 +1,13 @@
 import click
 from click_default_group import DefaultGroup
 import datetime
+import getpass
 import json
 import openai
 import os
+import pathlib
+from platformdirs import user_data_dir
+import requests
 import sqlite_utils
 import sys
 import warnings
@@ -124,6 +128,50 @@ def init_db():
     db.vacuum()
 
 
+@cli.group()
+def keys():
+    "Manage API keys for different models"
+
+
+@keys.command()
+def path():
+    "Output path to keys.json file"
+    click.echo(keys_path())
+
+
+def keys_path():
+    return os.path.join(user_dir(), "keys.json")
+
+
+def user_dir():
+    return user_data_dir("io.datasette.llm", "Datasette")
+
+
+@keys.command(name="set")
+@click.argument("name")
+def set_(name):
+    """
+    Save a key in keys.json
+
+    Example usage:
+
+        $ llm keys set openai
+        Enter key: ...
+    """
+    default = {"// Note": "This file stores secret API credentials. Do not share!"}
+    path = pathlib.Path(keys_path())
+    path.parent.mkdir(parents=True, exist_ok=True)
+    value = getpass.getpass("Enter key: ").strip()
+    if not path.exists():
+        path.write_text(json.dumps(default))
+    try:
+        current = json.loads(path.read_text())
+    except json.decoder.JSONDecodeError:
+        current = default
+    current[name] = value
+    path.write_text(json.dumps(current, indent=2) + "\n")
+
+
 @cli.command()
 @click.option(
     "-n",
diff --git a/setup.py b/setup.py
index dc75536..b36718b 100644
--- a/setup.py
+++ b/setup.py
@@ -31,7 +31,13 @@ setup(
         [console_scripts]
         llm=llm.cli:cli
     """,
-    install_requires=["click", "openai", "click-default-group-wheel", "sqlite-utils"],
+    install_requires=[
+        "click",
+        "openai",
+        "click-default-group-wheel",
+        "sqlite-utils",
+        "platformdirs",
+    ],
     extras_require={"test": ["pytest", "requests-mock"]},
     python_requires=">=3.7",
 )

@sderev
Contributor

sderev commented Jun 15, 2023

def keys_path():
    return os.path.join(user_dir(), "keys.json")

Since you're using pathlib later in the code, what do you think of making pathlib the default lib to handle it?

The same result can be achieved with return Path.home() / "keys.json"

I suggested the generalization of pathlib in #19 😊.

@simonw
Owner

simonw commented Jun 15, 2023

OK, I implemented these two commands:

$ llm keys --help
Usage: llm keys [OPTIONS] COMMAND [ARGS]...

  Manage API keys for different models

Options:
  --help  Show this message and exit.

Commands:
  path  Output path to keys.json file
  set   Save a key in keys.json
$ llm keys path --help
Usage: llm keys path [OPTIONS]

  Output path to keys.json file

Options:
  --help  Show this message and exit.
$ llm keys set --help
Usage: llm keys set [OPTIONS] NAME

  Save a key in keys.json

  Example usage:

      $ llm keys set openai
      Enter key: ...

Options:
  --value TEXT  Value to set
  --help        Show this message and exit.

@simonw
Owner

simonw commented Jun 15, 2023

Still needed:

  • Update get_openai_api_key() to use the new keys mechanism (including checking for OPENAI_API_KEY)
  • Documentation, including upgrade documentation for the changelog
  • Move ~/.llm/log.db to the new location
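The first item could look roughly like this; the function signature and exact behavior here are assumptions for illustration, not the shipped code:

```python
import json
import os
import pathlib


def get_openai_api_key(keys_path):
    # The environment variable takes precedence over the stored default,
    # matching the lookup order described earlier in this thread.
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    path = pathlib.Path(keys_path)
    if path.exists():
        return json.loads(path.read_text()).get("openai")
    return None
```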

@simonw
Copy link
Owner

simonw commented Jun 15, 2023

Since you're using pathlib later in the code, what do you think of making pathlib the default lib to handle it?

Good call, I'll do that as part of cleaning up this code.

@simonw
Owner

simonw commented Jun 15, 2023

This is implemented, next step is to write the docs for it - which is blocked on:

@simonw simonw closed this as completed in 50f0b2a Jun 15, 2023
simonw added a commit that referenced this issue Jun 17, 2023
simonw added a commit that referenced this issue Jul 10, 2023
Also documents new keys.json mechanism, closes #13
simonw added a commit that referenced this issue Jul 10, 2023