
Conversation

Jack-Khuu (Contributor) commented on Jul 10, 2024

The current --help is a mess; it uses a giant add_arguments_for_verb function that doesn't actually filter based on the provided verb subcommand.

This PR is part of a series to clean up this behavior.

Specifically, this PR separates out the arguments that are unique to output generation into an _add_generation_args function. This function adds an argument group containing those parameters from add_arguments_for_verb and attaches it only to the subcommands that need it (chat/browser/generate).

Note: This PR does not attempt to rearrange or prettify the args. This makes reviewing simpler. Those are done in separate PRs.
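For reference, a minimal sketch of the pattern (illustrative only: _add_generation_args and add_arguments_for_verb are the functions named above, but the argument lists, signatures, and wiring below are abbreviated assumptions, not torchchat's actual code):

import argparse

def _add_generation_args(parser: argparse.ArgumentParser) -> None:
    # Group the generation-only flags so --help lists them under "Generation Args".
    gen = parser.add_argument_group(
        "Generation Args",
        "Configs for generating output based on provided prompt",
    )
    gen.add_argument("--prompt", type=str, help="Input prompt for manual output generation")
    gen.add_argument("--num-samples", type=int, help="Number of samples")
    gen.add_argument("--max-new-tokens", type=int, help="Maximum number of new tokens")
    gen.add_argument("--speculate-k", type=int, help="Speculative execution depth")

def add_arguments_for_verb(parser: argparse.ArgumentParser, verb: str) -> None:
    # Arguments shared by every verb (checkpoint paths, dtype, quantization, ...) stay here.
    parser.add_argument("model", nargs="?", help="Model name for well-known models")
    # Generation-only arguments are attached only to the subcommands that need them.
    if verb in ("chat", "browser", "generate"):
        _add_generation_args(parser)

With this split, eval --help no longer shows --prompt, --num-samples, etc., while generate --help still does, as in the outputs below.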


python3 torchchat.py generate --help

usage: torchchat generate [-h] [--prompt PROMPT] [--chat] [--gui] [--num-samples NUM_SAMPLES] [--max-new-tokens MAX_NEW_TOKENS] [--top-k TOP_K] [--temperature TEMPERATURE] [--compile-prefill] [--sequential-prefill] [--speculate-k SPECULATE_K]
                          [--distributed] [--is-chat-model] [--seed SEED] [--compile] [--profile PROFILE] [--draft-checkpoint-path DRAFT_CHECKPOINT_PATH] [--checkpoint-path CHECKPOINT_PATH] [--params-path PARAMS_PATH] [--gguf-path GGUF_PATH]
                          [--tokenizer-path TOKENIZER_PATH] [--dso-path DSO_PATH] [--pte-path PTE_PATH] [--output-pte-path OUTPUT_PTE_PATH] [--output-dso-path OUTPUT_DSO_PATH]
                          [--dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}] [--quantize QUANTIZE] [--draft-quantize DRAFT_QUANTIZE]
                          [--params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}] [--device {fast,cpu,cuda,mps}] [--hf-token HF_TOKEN] [--model-directory MODEL_DIRECTORY]
                          [--port PORT] [-v]
                          [model]

positional arguments:
  model                 Model name for well-known models

options:
  -h, --help            show this help message and exit
  --distributed         Whether to enable distributed inference
  --is-chat-model       Indicate that the model was trained to support chat functionality
  --seed SEED           Initialize torch seed
  --compile             Whether to compile the model with torch.compile
  --profile PROFILE     Profile path.
  --draft-checkpoint-path DRAFT_CHECKPOINT_PATH
                        Use the specified draft checkpoint path
  --checkpoint-path CHECKPOINT_PATH
                        Use the specified model checkpoint path
  --params-path PARAMS_PATH
                        Use the specified parameter file
  --gguf-path GGUF_PATH
                        Use the specified GGUF model file
  --tokenizer-path TOKENIZER_PATH
                        Use the specified model tokenizer file
  --dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}
                        Override the dtype of the model (default is the checkpoint dtype). Options: bf16, fp16, fp32, fast16, fast
  --quantize QUANTIZE   Quantization options. pass in as '{"<mode>" : {"<argname1>" : <argval1>, "<argname2>" : <argval2>,...},}' modes are: embedding, linear:int8, linear:int4, linear:a8w4dq, precision.
  --draft-quantize DRAFT_QUANTIZE
                        Quantization options. Same format as quantize, or 'quantize' to indicate same options specified by --quantize to main model. Applied to draft model.
  --params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}
                        Parameter table to use
  --device {fast,cpu,cuda,mps}
                        Hardware device to use. Options: cpu, cuda, mps
  --hf-token HF_TOKEN   A HuggingFace API token to use when downloading model artifacts
  --model-directory MODEL_DIRECTORY
                        The directory to store downloaded model artifacts. Default: /Users/jackkhuu/.torchchat/model-cache
  --port PORT           Port for the web server in browser mode
  -v, --verbose         Verbose output

Generation Args:
  Configs for generating output based on provided prompt

  --prompt PROMPT       Input prompt for manual output generation
  --chat                Whether to start an interactive chat session
  --gui                 Whether to use a web UI for an interactive chat session
  --num-samples NUM_SAMPLES
                        Number of samples
  --max-new-tokens MAX_NEW_TOKENS
                        Maximum number of new tokens
  --top-k TOP_K         Top-k for sampling
  --temperature TEMPERATURE
                        Temperature for sampling
  --compile-prefill     Whether to compile the prefill. Improves prefill perf, but has higher compile times.
  --sequential-prefill  Whether to perform prefill sequentially. Only used for model debug.
  --speculate-k SPECULATE_K
                        Speculative execution depth

Exported Model Path Args:
  Specify the path of the exported model files to ingest

  --dso-path DSO_PATH   Use the specified AOT Inductor .dso model file
  --pte-path PTE_PATH   Use the specified ExecuTorch .pte model file

Export Output Path Args:
  Specify the output path for the exported model files

  --output-pte-path OUTPUT_PTE_PATH
                        Output to the specified ExecuTorch .pte model file
  --output-dso-path OUTPUT_DSO_PATH
                        Output to the specified AOT Inductor .dso model file

python3 torchchat.py eval --help

usage: torchchat eval [-h] [--distributed] [--is-chat-model] [--seed SEED] [--compile] [--profile PROFILE] [--draft-checkpoint-path DRAFT_CHECKPOINT_PATH] [--checkpoint-path CHECKPOINT_PATH] [--params-path PARAMS_PATH] [--gguf-path GGUF_PATH]
                      [--tokenizer-path TOKENIZER_PATH] [--dso-path DSO_PATH] [--pte-path PTE_PATH] [--output-pte-path OUTPUT_PTE_PATH] [--output-dso-path OUTPUT_DSO_PATH]
                      [--dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}] [--quantize QUANTIZE] [--draft-quantize DRAFT_QUANTIZE]
                      [--params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}] [--device {fast,cpu,cuda,mps}] [--tasks TASKS [TASKS ...]] [--limit LIMIT]
                      [--max-seq-length MAX_SEQ_LENGTH] [--hf-token HF_TOKEN] [--model-directory MODEL_DIRECTORY] [--port PORT] [-v]
                      [model]

positional arguments:
  model                 Model name for well-known models

options:
  -h, --help            show this help message and exit
  --distributed         Whether to enable distributed inference
  --is-chat-model       Indicate that the model was trained to support chat functionality
  --seed SEED           Initialize torch seed
  --compile             Whether to compile the model with torch.compile
  --profile PROFILE     Profile path.
  --draft-checkpoint-path DRAFT_CHECKPOINT_PATH
                        Use the specified draft checkpoint path
  --checkpoint-path CHECKPOINT_PATH
                        Use the specified model checkpoint path
  --params-path PARAMS_PATH
                        Use the specified parameter file
  --gguf-path GGUF_PATH
                        Use the specified GGUF model file
  --tokenizer-path TOKENIZER_PATH
                        Use the specified model tokenizer file
  --dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}
                        Override the dtype of the model (default is the checkpoint dtype). Options: bf16, fp16, fp32, fast16, fast
  --quantize QUANTIZE   Quantization options. pass in as '{"<mode>" : {"<argname1>" : <argval1>, "<argname2>" : <argval2>,...},}' modes are: embedding, linear:int8, linear:int4, linear:a8w4dq, precision.
  --draft-quantize DRAFT_QUANTIZE
                        Quantization options. Same format as quantize, or 'quantize' to indicate same options specified by --quantize to main model. Applied to draft model.
  --params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}
                        Parameter table to use
  --device {fast,cpu,cuda,mps}
                        Hardware device to use. Options: cpu, cuda, mps
  --hf-token HF_TOKEN   A HuggingFace API token to use when downloading model artifacts
  --model-directory MODEL_DIRECTORY
                        The directory to store downloaded model artifacts. Default: /Users/jackkhuu/.torchchat/model-cache
  --port PORT           Port for the web server in browser mode
  -v, --verbose         Verbose output

Exported Model Path Args:
  Specify the path of the exported model files to ingest

  --dso-path DSO_PATH   Use the specified AOT Inductor .dso model file
  --pte-path PTE_PATH   Use the specified ExecuTorch .pte model file

Export Output Path Args:
  Specify the output path for the exported model files

  --output-pte-path OUTPUT_PTE_PATH
                        Output to the specified ExecuTorch .pte model file
  --output-dso-path OUTPUT_DSO_PATH
                        Output to the specified AOT Inductor .dso model file

Evaluation Args:
  Configs for evaluating model performance

  --tasks TASKS [TASKS ...]
                        List of lm-eluther tasks to evaluate. Usage: --tasks task1 task2
  --limit LIMIT         Number of samples to evaluate
  --max-seq-length MAX_SEQ_LENGTH
                        Maximum length sequence to evaluate

Note that Eval no longer shows the Generate-only arguments, and Generate no longer shows the Eval-only ones.

pytorch-bot commented on Jul 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/891

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d71edb6 with merge base a0962d1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jul 10, 2024
Jack-Khuu marked this pull request as ready for review on July 10, 2024 08:24
echo "*************************************************"
if [ "$DTYPE" != "float16" ]; then
python3 -W ignore export.py --dtype ${DTYPE} --quant "$QUANT_OPTIONS" --checkpoint-path "$CHECKPOINT_PATH" --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" || exit 1
python3 -W ignore eval.py --dtype ${DTYPE} --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" --limit 5 > "$MODEL_DIR/output_eval_aoti" || exit 1
Contributor commented:
why are we dropping temperature 0 here? Won't this impact eval?

Jack-Khuu (Contributor, Author) replied on Jul 10, 2024:

Eval currently doesn't use the field at all....

I have a non-MVP task mentioning this, seems like gptfast doesn't take a temp arg either

Jack-Khuu merged commit 03ea37b into main on Jul 10, 2024
fduwjj pushed a commit that referenced this pull request Jul 11, 2024
* Fixing the help mode of the download subcommand

* Initial Addition of subparsers for generation

* Move compile out of generation exclusive

* typo

* Fix test by removing temperature, which is a field eval doesn't use or expect

* Typo Generater => Generator
malfet pushed a commit that referenced this pull request Jul 17, 2024
* Fixing the help mode of the download subcommand

* Initial Addition of subparsers for generation

* Move compile out of generation exclusive

* typo

* Fix test by removing temperature, which is a field eval doesn't use or expect

* Typo Generater => Generator