
Command showing available options for installed models #82

Closed
simonw opened this issue Jul 8, 2023 · 9 comments
Labels: documentation (Improvements or additions to documentation), enhancement (New feature or request), plugins
Milestone: 0.5
Comments

simonw commented Jul 8, 2023

This might be part of llm models list, or it may be something else.

Follows:

simonw added the documentation, enhancement and plugins labels Jul 8, 2023
simonw added this to the 0.5 milestone Jul 10, 2023

simonw commented Jul 10, 2023

I'll do this:

llm models list --options

And introspect the Options class.
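
Roughly, the idea is something like this (a minimal sketch only, not the actual cli.py code - the example Options fields here are assumed, modelled on the OpenAI ones):

# Minimal sketch of the introspection idea - not the code that shipped.
# Assumes a pydantic v2 style Options class, where model_fields maps each
# field name to a FieldInfo carrying .annotation and .description.
from typing import Optional

from pydantic import BaseModel, Field


class Options(BaseModel):
    # Example fields only - real models declare their own Options
    temperature: Optional[float] = Field(
        None, description="What sampling temperature to use, between 0 and 2."
    )
    max_tokens: Optional[int] = Field(
        None, description="Maximum number of tokens to generate"
    )


def describe_options(options_class):
    # Print "name: type - description" for every declared option
    for name, field in options_class.model_fields.items():
        type_info = str(field.annotation).replace("typing.", "")
        if type_info.startswith("Optional["):
            type_info = type_info[9:-1]
        print(f"  {name}: {type_info} - {field.description or ''}")


describe_options(Options)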

simonw commented Jul 10, 2023

I got this working:

OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-4 (aliases: 4, gpt4)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
OpenAI Chat: gpt-4-32k (aliases: 4-32k)
  temperature: float - What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
  max_tokens: int - Maximum number of tokens to generate
  top_p: float - An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Recommended to use top_p or temperature but not both.
  frequency_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
  presence_penalty: float - Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
  stop: str - A string where the API will stop generating further tokens.
  logit_bias: Union[dict, str, NoneType] - Modify the likelihood of specified tokens appearing in the completion.
Markov: markov
  length: int
  delay: float
PaLM 2: chat-bison-001 (aliases: palm, palm2)
gpt4all: orca-mini-3b - Orca (Small), 1.80GB download, needs 4GB RAM (installed)
gpt4all: ggml-gpt4all-j-v1 - Groovy, 3.53GB download, needs 8GB RAM (installed)
gpt4all: orca-mini-7b - Orca, 3.53GB download, needs 8GB RAM (installed)
gpt4all: ggml-replit-code-v1-3b - Replit, 4.84GB download, needs 4GB RAM (installed)
gpt4all: ggml-vicuna-13b-1 - Vicuna (large), 7.58GB download, needs 16GB RAM (installed)
gpt4all: nous-hermes-13b - Hermes, 7.58GB download, needs 16GB RAM (installed)
gpt4all: ggml-model-gpt4all-falcon-q4_0 - GPT4All Falcon, 3.78GB download, needs 8GB RAM
gpt4all: ggml-vicuna-7b-1 - Vicuna, 3.92GB download, needs 8GB RAM
gpt4all: ggml-wizardLM-7B - Wizard, 3.92GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-base - MPT Base, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-instruct - MPT Instruct, 4.52GB download, needs 8GB RAM
gpt4all: ggml-mpt-7b-chat - MPT Chat, 4.52GB download, needs 8GB RAM
gpt4all: orca-mini-13b - Orca (Large), 6.82GB download, needs 16GB RAM
gpt4all: GPT4All-13B-snoozy - Snoozy, 7.58GB download, needs 16GB RAM
gpt4all: ggml-nous-gpt4-vicuna-13b - Nous Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: ggml-stable-vicuna-13B - Stable Vicuna, 7.58GB download, needs 16GB RAM
gpt4all: wizardLM-13B-Uncensored - Wizard Uncensored, 7.58GB download, needs 16GB RAM
Mpt30b: mpt30b (aliases: mpt)
  verbose: <class 'bool'>

I don't like it outputting the same help multiple times though. I'm going to have it output the detailed descriptions only once for each model class.
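
One way to do that (an illustrative sketch, not necessarily the change that landed) is to remember which Options classes have already been described and only print the long descriptions the first time each one appears:

# Sketch of the deduplication idea - not necessarily the committed fix.
# Options classes are shared between models (all the OpenAI Chat models use
# the same one), so track which classes have already been described.
from llm import get_models_with_aliases

seen_option_classes = set()

for model_with_aliases in get_models_with_aliases():
    options_class = model_with_aliases.model.Options
    first_time = options_class not in seen_option_classes
    seen_option_classes.add(options_class)
    for name, field in options_class.model_fields.items():
        type_info = str(field.annotation).replace("typing.", "")
        if first_time and field.description:
            print(f"  {name}: {type_info} - {field.description}")
        else:
            print(f"  {name}: {type_info}")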

simonw closed this as completed in 8f7c3a9 Jul 10, 2023
simonw added a commit that referenced this issue Jul 10, 2023

simonw commented Jul 10, 2023

Tests are failing now - docs/usage.md is detected as changed by cog on Python 3.8 for some reason.
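
For context, the check works by having cog re-run the Python embedded in docs/usage.md and fail if the regenerated output differs from what is committed. The embedded block is roughly along these lines (reconstructed from the diff further down, so the exact code in the repo may differ):

<!-- [[[cog
from click.testing import CliRunner
from llm.cli import cli
result = CliRunner().invoke(cli, ["models", "list", "--options"])
cog.out("```\n{}\n```".format(result.output))
]]] -->
...generated model and options listing goes here...
<!-- [[[end]]] -->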

simonw reopened this Jul 10, 2023

simonw commented Jul 10, 2023

Could be related to plugin order. It shouldn't be, though: I would expect this order to be the same each time with only the default openai plugin installed:

llm/llm/__init__.py

Lines 49 to 56 in 18f34b5

def get_models_with_aliases() -> List["ModelWithAliases"]:
    model_aliases = []

    def register(model, aliases=None):
        model_aliases.append(ModelWithAliases(model, aliases or set()))

    pm.hook.register_models(register=register)
    return model_aliases

simonw commented Jul 10, 2023

Could be the order of this bit:

llm/llm/cli.py

Line 381 in 18f34b5

for name, field in model_with_aliases.model.Options.model_fields.items():

simonw added a commit that referenced this issue Jul 10, 2023

simonw commented Jul 10, 2023

Actually the problem was something else. I copied the generated text to my local environment and did a diff and got this:

diff --git a/docs/usage.md b/docs/usage.md
index 8e76010..dfc4c16 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -98,53 +98,53 @@ cog.out("```\n{}\n```".format(result.output))
 OpenAI Chat: gpt-3.5-turbo (aliases: 3.5, chatgpt)
-  temperature: float
+  temperature: Union[float, NoneType]
     What sampling temperature to use, between 0 and 2. Higher values like
     0.8 will make the output more random, while lower values like 0.2 will
     make it more focused and deterministic.
-  max_tokens: int
+  max_tokens: Union[int, NoneType]
     Maximum number of tokens to generate
-  top_p: float
+  top_p: Union[float, NoneType]
     An alternative to sampling with temperature, called nucleus sampling,
     where the model considers the results of the tokens with top_p
     probability mass. So 0.1 means only the tokens comprising the top 10%
     probability mass are considered. Recommended to use top_p or
     temperature but not both.
-  frequency_penalty: float
+  frequency_penalty: Union[float, NoneType]
     Number between -2.0 and 2.0. Positive values penalize new tokens based
     on their existing frequency in the text so far, decreasing the model's
     likelihood to repeat the same line verbatim.
-  presence_penalty: float
+  presence_penalty: Union[float, NoneType]
     Number between -2.0 and 2.0. Positive values penalize new tokens based
     on whether they appear in the text so far, increasing the model's
     likelihood to talk about new topics.
-  stop: str
+  stop: Union[str, NoneType]
     A string where the API will stop generating further tokens.
   logit_bias: Union[dict, str, NoneType]
     Modify the likelihood of specified tokens appearing in the completion.
 OpenAI Chat: gpt-3.5-turbo-16k (aliases: chatgpt-16k, 3.5-16k)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
 OpenAI Chat: gpt-4 (aliases: 4, gpt4)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]
 OpenAI Chat: gpt-4-32k (aliases: 4-32k)
-  temperature: float
-  max_tokens: int
-  top_p: float
-  frequency_penalty: float
-  presence_penalty: float
-  stop: str
+  temperature: Union[float, NoneType]
+  max_tokens: Union[int, NoneType]
+  top_p: Union[float, NoneType]
+  frequency_penalty: Union[float, NoneType]
+  presence_penalty: Union[float, NoneType]
+  stop: Union[str, NoneType]
   logit_bias: Union[dict, str, NoneType]

So clearly this bit of code produces different output on Python 3.8:

llm/llm/cli.py

Lines 382 to 384 in 18f34b5

type_info = str(field.annotation).replace("typing.", "")
if type_info.startswith("Optional["):
    type_info = type_info[9:-1]
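
The difference is in how the annotation's repr comes out: on Python 3.8, str(Optional[float]) is 'typing.Union[float, NoneType]', whereas on more recent versions it is 'typing.Optional[float]', so the Optional[ stripping above only fires on the newer versions. A version-independent way to normalize it (an illustrative sketch, not necessarily the fix that landed) is to inspect the annotation with typing.get_origin() and get_args() instead of matching on its string form:

# Illustrative sketch: render Optional[...] annotations the same way on every
# Python version by inspecting the type rather than its string repr.
# Not necessarily the fix that was committed.
from typing import Optional, Union, get_args, get_origin


def render_annotation(annotation) -> str:
    args = get_args(annotation)
    if get_origin(annotation) is Union and type(None) in args:
        # Optional[X] is really Union[X, None] - drop the NoneType member
        remaining = [a for a in args if a is not type(None)]
        if len(remaining) == 1:
            return render_annotation(remaining[0])
        return "Union[{}]".format(", ".join(render_annotation(a) for a in remaining))
    return getattr(annotation, "__name__", str(annotation).replace("typing.", ""))


print(render_annotation(Optional[float]))          # float
print(render_annotation(Union[dict, str, None]))   # Union[dict, str]

Note that this sketch also drops NoneType from multi-member unions, so the logit_bias line would render as Union[dict, str] rather than Union[dict, str, NoneType] - it only illustrates the approach.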

simonw commented Jul 10, 2023

To test locally I ran:

pyenv install 3.8.17

Then waited for that to compile.

Then:

~/.pyenv/versions/3.8.17/bin/python -m venv /tmp/pvenv
source /tmp/pvenv/bin/activate
pip install -e '.[test]'
/tmp/pvenv/bin/cog --check docs/usage.md

And to rewrite it:

/tmp/pvenv/bin/cog -r docs/usage.md

simonw added a commit that referenced this issue Jul 10, 2023
simonw closed this as completed Jul 10, 2023

simonw commented Jul 10, 2023

Now documented here: https://llm.datasette.io/en/latest/usage.html#listing-available-models - including a cog-powered example.

simonw added a commit that referenced this issue Jul 12, 2023