
Conversation

Jack-Khuu (Contributor) commented on Jul 10, 2024

The current --help is a mess; it uses a giant add_arguments_for_verb function that doesn't actually filter based on the provided verb subcommand.

This PR is part of a series to clean up this behavior.

Specifically, this PR separates out the arguments that are unique to output generation into an _add_generation_args function. This function adds an argument group containing those parameters from add_arguments_for_verb and attaches it only to the subcommands that need it (chat/browser/generate).

Note: This PR does not attempt to rearrange or prettify the args. This makes reviewing simpler. Those are done in separate PRs.
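For reference, a minimal sketch of the pattern (illustrative only: _add_generation_args and add_arguments_for_verb are the functions named above, but the argument lists, signatures, and wiring below are abbreviated assumptions, not torchchat's actual code):

import argparse

def _add_generation_args(parser: argparse.ArgumentParser) -> None:
    # Group the generation-only flags so --help lists them under "Generation Args".
    gen = parser.add_argument_group(
        "Generation Args",
        "Configs for generating output based on provided prompt",
    )
    gen.add_argument("--prompt", type=str, help="Input prompt for manual output generation")
    gen.add_argument("--num-samples", type=int, help="Number of samples")
    gen.add_argument("--max-new-tokens", type=int, help="Maximum number of new tokens")
    gen.add_argument("--speculate-k", type=int, help="Speculative execution depth")

def add_arguments_for_verb(parser: argparse.ArgumentParser, verb: str) -> None:
    # Arguments shared by every verb (checkpoint paths, dtype, quantization, ...) stay here.
    parser.add_argument("model", nargs="?", help="Model name for well-known models")
    # Generation-only arguments are attached only to the subcommands that need them.
    if verb in ("chat", "browser", "generate"):
        _add_generation_args(parser)

With this split, eval --help no longer shows --prompt, --num-samples, etc., while generate --help still does, as in the outputs below.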


python3 torchchat.py generate --help

usage: torchchat generate [-h] [--prompt PROMPT] [--chat] [--gui] [--num-samples NUM_SAMPLES] [--max-new-tokens MAX_NEW_TOKENS] [--top-k TOP_K] [--temperature TEMPERATURE] [--compile-prefill] [--sequential-prefill] [--speculate-k SPECULATE_K]
                          [--distributed] [--is-chat-model] [--seed SEED] [--compile] [--profile PROFILE] [--draft-checkpoint-path DRAFT_CHECKPOINT_PATH] [--checkpoint-path CHECKPOINT_PATH] [--params-path PARAMS_PATH] [--gguf-path GGUF_PATH]
                          [--tokenizer-path TOKENIZER_PATH] [--dso-path DSO_PATH] [--pte-path PTE_PATH] [--output-pte-path OUTPUT_PTE_PATH] [--output-dso-path OUTPUT_DSO_PATH]
                          [--dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}] [--quantize QUANTIZE] [--draft-quantize DRAFT_QUANTIZE]
                          [--params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}] [--device {fast,cpu,cuda,mps}] [--hf-token HF_TOKEN] [--model-directory MODEL_DIRECTORY]
                          [--port PORT] [-v]
                          [model]

positional arguments:
  model                 Model name for well-known models

options:
  -h, --help            show this help message and exit
  --distributed         Whether to enable distributed inference
  --is-chat-model       Indicate that the model was trained to support chat functionality
  --seed SEED           Initialize torch seed
  --compile             Whether to compile the model with torch.compile
  --profile PROFILE     Profile path.
  --draft-checkpoint-path DRAFT_CHECKPOINT_PATH
                        Use the specified draft checkpoint path
  --checkpoint-path CHECKPOINT_PATH
                        Use the specified model checkpoint path
  --params-path PARAMS_PATH
                        Use the specified parameter file
  --gguf-path GGUF_PATH
                        Use the specified GGUF model file
  --tokenizer-path TOKENIZER_PATH
                        Use the specified model tokenizer file
  --dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}
                        Override the dtype of the model (default is the checkpoint dtype). Options: bf16, fp16, fp32, fast16, fast
  --quantize QUANTIZE   Quantization options. pass in as '{"<mode>" : {"<argname1>" : <argval1>, "<argname2>" : <argval2>,...},}' modes are: embedding, linear:int8, linear:int4, linear:a8w4dq, precision.
  --draft-quantize DRAFT_QUANTIZE
                        Quantization options. Same format as quantize, or 'quantize' to indicate same options specified by --quantize to main model. Applied to draft model.
  --params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}
                        Parameter table to use
  --device {fast,cpu,cuda,mps}
                        Hardware device to use. Options: cpu, cuda, mps
  --hf-token HF_TOKEN   A HuggingFace API token to use when downloading model artifacts
  --model-directory MODEL_DIRECTORY
                        The directory to store downloaded model artifacts. Default: /Users/jackkhuu/.torchchat/model-cache
  --port PORT           Port for the web server in browser mode
  -v, --verbose         Verbose output

Generation Args:
  Configs for generating output based on provided prompt

  --prompt PROMPT       Input prompt for manual output generation
  --chat                Whether to start an interactive chat session
  --gui                 Whether to use a web UI for an interactive chat session
  --num-samples NUM_SAMPLES
                        Number of samples
  --max-new-tokens MAX_NEW_TOKENS
                        Maximum number of new tokens
  --top-k TOP_K         Top-k for sampling
  --temperature TEMPERATURE
                        Temperature for sampling
  --compile-prefill     Whether to compile the prefill. Improves prefill perf, but has higher compile times.
  --sequential-prefill  Whether to perform prefill sequentially. Only used for model debug.
  --speculate-k SPECULATE_K
                        Speculative execution depth

Exported Model Path Args:
  Specify the path of the exported model files to ingest

  --dso-path DSO_PATH   Use the specified AOT Inductor .dso model file
  --pte-path PTE_PATH   Use the specified ExecuTorch .pte model file

Export Output Path Args:
  Specify the output path for the exported model files

  --output-pte-path OUTPUT_PTE_PATH
                        Output to the specified ExecuTorch .pte model file
  --output-dso-path OUTPUT_DSO_PATH
                        Output to the specified AOT Inductor .dso model file

python3 torchchat.py eval --help

usage: torchchat eval [-h] [--distributed] [--is-chat-model] [--seed SEED] [--compile] [--profile PROFILE] [--draft-checkpoint-path DRAFT_CHECKPOINT_PATH] [--checkpoint-path CHECKPOINT_PATH] [--params-path PARAMS_PATH] [--gguf-path GGUF_PATH]
                      [--tokenizer-path TOKENIZER_PATH] [--dso-path DSO_PATH] [--pte-path PTE_PATH] [--output-pte-path OUTPUT_PTE_PATH] [--output-dso-path OUTPUT_DSO_PATH]
                      [--dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}] [--quantize QUANTIZE] [--draft-quantize DRAFT_QUANTIZE]
                      [--params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}] [--device {fast,cpu,cuda,mps}] [--tasks TASKS [TASKS ...]] [--limit LIMIT]
                      [--max-seq-length MAX_SEQ_LENGTH] [--hf-token HF_TOKEN] [--model-directory MODEL_DIRECTORY] [--port PORT] [-v]
                      [model]

positional arguments:
  model                 Model name for well-known models

options:
  -h, --help            show this help message and exit
  --distributed         Whether to enable distributed inference
  --is-chat-model       Indicate that the model was trained to support chat functionality
  --seed SEED           Initialize torch seed
  --compile             Whether to compile the model with torch.compile
  --profile PROFILE     Profile path.
  --draft-checkpoint-path DRAFT_CHECKPOINT_PATH
                        Use the specified draft checkpoint path
  --checkpoint-path CHECKPOINT_PATH
                        Use the specified model checkpoint path
  --params-path PARAMS_PATH
                        Use the specified parameter file
  --gguf-path GGUF_PATH
                        Use the specified GGUF model file
  --tokenizer-path TOKENIZER_PATH
                        Use the specified model tokenizer file
  --dtype {fp32,fp16,bf16,float,half,float32,float16,bfloat16,fast,fast16}
                        Override the dtype of the model (default is the checkpoint dtype). Options: bf16, fp16, fp32, fast16, fast
  --quantize QUANTIZE   Quantization options. pass in as '{"<mode>" : {"<argname1>" : <argval1>, "<argname2>" : <argval2>,...},}' modes are: embedding, linear:int8, linear:int4, linear:a8w4dq, precision.
  --draft-quantize DRAFT_QUANTIZE
                        Quantization options. Same format as quantize, or 'quantize' to indicate same options specified by --quantize to main model. Applied to draft model.
  --params-table {13B,70B,CodeLlama-7b-Python-hf,34B,stories42M,30B,stories110M,7B,stories15M,Mistral-7B,Meta-Llama-3-8B}
                        Parameter table to use
  --device {fast,cpu,cuda,mps}
                        Hardware device to use. Options: cpu, cuda, mps
  --hf-token HF_TOKEN   A HuggingFace API token to use when downloading model artifacts
  --model-directory MODEL_DIRECTORY
                        The directory to store downloaded model artifacts. Default: /Users/jackkhuu/.torchchat/model-cache
  --port PORT           Port for the web server in browser mode
  -v, --verbose         Verbose output

Exported Model Path Args:
  Specify the path of the exported model files to ingest

  --dso-path DSO_PATH   Use the specified AOT Inductor .dso model file
  --pte-path PTE_PATH   Use the specified ExecuTorch .pte model file

Export Output Path Args:
  Specify the output path for the exported model files

  --output-pte-path OUTPUT_PTE_PATH
                        Output to the specified ExecuTorch .pte model file
  --output-dso-path OUTPUT_DSO_PATH
                        Output to the specified AOT Inductor .dso model file

Evaluation Args:
  Configs for evaluating model performance

  --tasks TASKS [TASKS ...]
                        List of lm-eluther tasks to evaluate. Usage: --tasks task1 task2
  --limit LIMIT         Number of samples to evaluate
  --max-seq-length MAX_SEQ_LENGTH
                        Maximum length sequence to evaluate

Note that Eval no longer shows the Generate-only arguments, and Generate no longer shows the Eval-only ones.

pytorch-bot commented on Jul 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/891

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d71edb6 with merge base a0962d1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jul 10, 2024
Jack-Khuu marked this pull request as ready for review on July 10, 2024 08:24
echo "*************************************************"
if [ "$DTYPE" != "float16" ]; then
python3 -W ignore export.py --dtype ${DTYPE} --quant "$QUANT_OPTIONS" --checkpoint-path "$CHECKPOINT_PATH" --output-dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" || exit 1
python3 -W ignore eval.py --dtype ${DTYPE} --checkpoint-path "$CHECKPOINT_PATH" --temperature 0 --dso-path ${MODEL_DIR}/${MODEL_NAME}.so --device "$TARGET_DEVICE" --limit 5 > "$MODEL_DIR/output_eval_aoti" || exit 1
Contributor commented:
why are we dropping temperature 0 here? Won't this impact eval?

Jack-Khuu (Contributor, Author) replied on Jul 10, 2024:

Eval currently doesn't use the field at all....

I have a non-MVP task mentioning this, seems like gptfast doesn't take a temp arg either

Jack-Khuu merged commit 03ea37b into main on Jul 10, 2024
fduwjj pushed a commit that referenced this pull request Jul 11, 2024
* Fixing the help mode of the download subcommand

* Initial Addition of subparsers for generation

* Move compile out of generation exclusive

* typo

* Fix test by removing temperature, which is a field eval doesn't use or expect

* Typo Generater => Generator
malfet pushed a commit that referenced this pull request Jul 17, 2024
* Fixing the help mode of the download subcommand

* Initial Addition of subparsers for generation

* Move compile out of generation exclusive

* typo

* Fix test by removing temperature, which is a field eval doesn't use or expect

* Typo Generater => Generator