
Command showing available options for installed models #82

Closed
simonw opened this issue Jul 8, 2023 · 9 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), plugins
Milestone: 0.5
Comments

simonw commented Jul 8, 2023

This might be part of llm models list, or it may be something else.

Follows:

simonw added the documentation, enhancement and plugins labels Jul 8, 2023
simonw added this to the 0.5 milestone Jul 10, 2023

simonw commented Jul 10, 2023

I'll do this:

llm models list --options

And introspect the Options class.
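
Roughly, the idea is something like this (a minimal sketch only, not the actual cli.py code - the example Options fields here are assumed, modelled on the OpenAI ones):

# Minimal sketch of the introspection idea - not the code that shipped.
# Assumes a pydantic v2 style Options class, where model_fields maps each
# field name to a FieldInfo carrying .annotation and .description.
from typing import Optional

from pydantic import BaseModel, Field


class Options(BaseModel):
    # Example fields only - real models declare their own Options
    temperature: Optional[float] = Field(
        None, description="What sampling temperature to use, between 0 and 2."
    )
    max_tokens: Optional[int] = Field(
        None, description="Maximum number of tokens to generate"
    )


def describe_options(options_class):
    # Print "name: type - description" for every declared option
    for name, field in options_class.model_fields.items():
        type_info = str(field.annotation).replace("typing.", "")
        if type_info.startswith("Optional["):
            type_info = type_info[9:-1]
        print(f"  {name}: {type_info} - {field.description or ''}")


describe_options(Options)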

simonw commented Jul 10, 2023

I got this working:

OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
Markov: markov
  length: int
  delay: float
PaLM 2: chat-bison-001 (aliases: palm, palm2)
gpt4all: orca-mini-3b - Orca (Small), 1.80GB download, needs 4GB RAM (installed)
gpt4all: ggml-gpt4all-j-v1 - Groovy, 3.53GB download, needs 8GB RAM (installed)
gpt4all: orca-mini-7b - Orca, 3.53GB download, needs 8GB RAM (installed)
gpt4all: ggml-replit-code-v1-3b - Replit, 4.84GB download, needs 4GB RAM (installed)
gpt4all: ggml-vicuna-13b-1 - Vicuna (large), 7.58GB download, needs 16GB RAM (installed)
gpt4all: nous-hermes-13b - Hermes, 7.58GB download, needs 16GB RAM (installed)
gpt4all: ggml-model-gpt4all-falcon-q4_0 - GPT4All Falcon, 3.78GB download, needs 8GB RAM
gpt4all: ggml-vicuna-7b-1 - Vicuna, 3.92GB download, needs 8GB RAM
gpt4all: ggml-wizardLM-7B - Wizard, 3.92GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-base - MPT Base, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-instruct - MPT Instruct, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-chat - MPT Chat, 4.52GB download, needs 8GB RAM
gpt4all: orca-mini-13b - Orca (Large), 6.82GB download, needs 16GB RAM
gpt4all: GPT4All-13B-snoozy - Snoozy, 7.58GB download, needs 16GB RAM
gpt4all: ggml-nous-gpt4-vicuna-13b - Nous Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: ggml-stable-vicuna-13B - Stable Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: wizardLM-13B-Uncensored - Wizard Uncensored, 7.58GB download, needs 16GB RAM
Mpt30b: mpt30b (aliases: mpt)
  verbose: <class 'bool'>

I don't like it outputting the same help multiple times though. I'm going to have it output the detailed descriptions only once for each model class.
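
One way to do that (an illustrative sketch, not necessarily the change that landed) is to remember which Options classes have already been described and only print the long descriptions the first time each one appears:

# Sketch of the deduplication idea - not necessarily the committed fix.
# Options classes are shared between models (all the OpenAI Chat models use
# the same one), so track which classes have already been described.
from llm import get_models_with_aliases

seen_option_classes = set()

for model_with_aliases in get_models_with_aliases():
    options_class = model_with_aliases.model.Options
    first_time = options_class not in seen_option_classes
    seen_option_classes.add(options_class)
    for name, field in options_class.model_fields.items():
        type_info = str(field.annotation).replace("typing.", "")
        if first_time and field.description:
            print(f"  {name}: {type_info} - {field.description}")
        else:
            print(f"  {name}: {type_info}")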

simonw closed this as completed in 8f7c3a9 Jul 10, 2023
simonw added a commit that referenced this issue Jul 10, 2023

simonw commented Jul 10, 2023

Tests are failing now - docs/usage.md is detected as changed by cog on Python 3.8 for some reason.
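
For context, the check works by having cog re-run the Python embedded in docs/usage.md and fail if the regenerated output differs from what is committed. The embedded block is roughly along these lines (reconstructed from the diff further down, so the exact code in the repo may differ):

<!-- [[[cog
from click.testing import CliRunner
from llm.cli import cli
result = CliRunner().invoke(cli, ["models", "list", "--options"])
cog.out("```\n{}\n```".format(result.output))
]]] -->
...generated model and options listing goes here...
<!-- [[[end]]] -->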

simonw reopened this Jul 10, 2023

simonw commented Jul 10, 2023

Could be related to plugin order. It shouldn't be, though: I would expect this order to be the same each time with only the default openai plugin installed:

llm/llm/__init__.py

Lines 49 to 56 in 18f34b5

def get_models_with_aliases() -> List["ModelWithAliases"]:
    model_aliases = []

    def register(model, aliases=None):
        model_aliases.append(ModelWithAliases(model, aliases or set()))

    pm.hook.register_models(register=register)
    return model_aliases

simonw commented Jul 10, 2023

Could be the order of this bit:

llm/llm/cli.py

Line 381 in 18f34b5

for name, field in model_with_aliases.model.Options.model_fields.items():

simonw added a commit that referenced this issue Jul 10, 2023

simonw commented Jul 10, 2023

Actually the problem was something else. I copied the generated text to my local environment and did a diff and got this:

diff --git a/docs/usage.md b/docs/usage.md
index 8e76010..dfc4c16 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -98,53 +98,53 @@ cog.out("```\n{}\n```".format(result.output))
 OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
-  temperature: float
+  temperature: Union[float, NoneType]
     What sampling temperature to use, between 0 and 2. Higher values like
     0.8 will make the output more random, while lower values like 0.2 will
     make it more focused and deterministic.
-  max_tokens: int
+  max_tokens: Union[int, NoneType]
     Maximum number of tokens to generate
-  top_p: float
+  top_p: Union[float, NoneType]
     An alternative to sampling with temperature, called nucleus sampling,
     where the model considers the results of the tokens with top_p
     probability mass. So 0.1 means only the tokens comprising the top 10%
     probability mass are considered. Recommended to use top_p or
     temperature but not both.
-  frequency_penalty: float
+  frequency_penalty: Union[float, NoneType]
     Number between -2.0 and 2.0. Positive values penalize new tokens based
     on their existing frequency in the text so far, decreasing the model's
     likelihood to repeat the same line verbatim.
-  presence_penalty: float
+  presence_penalty: Union[float, NoneType]
     Number between -2.0 and 2.0. Positive values penalize new tokens based
     on whether they appear in the text so far, increasing the model's
     likelihood to talk about new topics.
-  stop: str
+  stop: Union[str, NoneType]
     A string where the API will stop generating further tokens.
   logit_bias: Union[dict, str, NoneType]
     Modify the likelihood of specified tokens appearing in the completion.
 OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
 OpenAI Chat: gpt-4 (aliases: 4, gpt4)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
 OpenAI Chat: gpt-4-32k (aliases: 4-32k)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]

So clearly this bit of code produces different output on Python 3.8:

llm/llm/cli.py

Lines 382 to 384 in 18f34b5

type_info = str(field.annotation).replace("typing.", "")
if type_info.startswith("Optional["):
    type_info = type_info[9:-1]
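
The difference is in how the annotation's repr comes out: on Python 3.8, str(Optional[float]) is 'typing.Union[float, NoneType]', whereas on more recent versions it is 'typing.Optional[float]', so the Optional[ stripping above only fires on the newer versions. A version-independent way to normalize it (an illustrative sketch, not necessarily the fix that landed) is to inspect the annotation with typing.get_origin() and get_args() instead of matching on its string form:

# Illustrative sketch: render Optional[...] annotations the same way on every
# Python version by inspecting the type rather than its string repr.
# Not necessarily the fix that was committed.
from typing import Optional, Union, get_args, get_origin


def render_annotation(annotation) -> str:
    args = get_args(annotation)
    if get_origin(annotation) is Union and type(None) in args:
        # Optional[X] is really Union[X, None] - drop the NoneType member
        remaining = [a for a in args if a is not type(None)]
        if len(remaining) == 1:
            return render_annotation(remaining[0])
        return "Union[{}]".format(", ".join(render_annotation(a) for a in remaining))
    return getattr(annotation, "__name__", str(annotation).replace("typing.", ""))


print(render_annotation(Optional[float]))          # float
print(render_annotation(Union[dict, str, None]))   # Union[dict, str]

Note that this sketch also drops NoneType from multi-member unions, so the logit_bias line would render as Union[dict, str] rather than Union[dict, str, NoneType] - it only illustrates the approach.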

simonw commented Jul 10, 2023

To test locally I ran:

pyenv install 3.8.17

Then waited for that to compile.

Then:

~/.pyenv/versions/3.8.17/bin/python -m venv /tmp/pvenv
source /tmp/pvenv/bin/activate
pip install -e '.[test]'
/tmp/pvenv/bin/cog --check docs/usage.md

And to rewrite it:

/tmp/pvenv/bin/cog -r docs/usage.md

simonw added a commit that referenced this issue Jul 10, 2023
simonw closed this as completed Jul 10, 2023

simonw commented Jul 10, 2023

Now documented here: https://llm.datasette.io/en/latest/usage.html#listing-available-models - including a cog-powered example.

simonw added a commit that referenced this issue Jul 12, 2023