Benchmarking - Add tensor_parallel_size arg for multi-gpu benchmarking #66
Conversation
```diff
@@ -20,7 +20,7 @@
     "output-len": [
         128
     ],
-    "tensor-parallel-size": [
+    "tensor-parallel-size_": [
```
Why the underscore? I don't see it for other arguments.
The benchmark_throughput.py script is not meant to take this argument directly. It is part of a mutually exclusive arg group:

```python
tp_group = parser.add_mutually_exclusive_group(required=True)
tp_group.add_argument("--tensor-parallel-size_", type=int, default=None)
tp_group.add_argument("--use-all-available-gpus_", action="store_true")
```

The script then derives the correct tensor_parallel_size using this utility:

```python
def get_tensor_parallel_size(args: argparse.Namespace) -> int:
    tensor_parallel_size = num_available_gpus() \
        if args.use_all_available_gpus_ else args.tensor_parallel_size_
    assert tensor_parallel_size > 0 and \
        tensor_parallel_size <= num_available_gpus()
    return tensor_parallel_size
```
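For reference, num_available_gpus isn't shown in the snippet above; a minimal sketch of such a helper, assuming a PyTorch-based environment (the repo's actual implementation may differ), could look like:

```python
import torch

def num_available_gpus() -> int:
    # Sketch: report the number of CUDA devices visible to this process.
    # torch.cuda.device_count() already respects CUDA_VISIBLE_DEVICES.
    return torch.cuda.device_count()
```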
```diff
@@ -8,6 +8,7 @@
         "mistralai/Mistral-7B-Instruct-v0.2",
         "NousResearch/Llama-2-7b-chat-hf"
     ],
+    "use_all_available_gpus" : "",
```
This actually translates to use_all_available_gpus = True in Python. If we want to set it to False, we remove this key entirely. This is really a pain - I am considering moving to yml or some other format. On top of this, JSON doesn't even let you add comments :(
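To make the quirk concrete, here is a hypothetical helper (config_to_cli_args is my name, not the script's) showing how such a JSON dict could be turned into CLI arguments, where an empty string yields a bare store_true flag:

```python
def config_to_cli_args(config: dict) -> list[str]:
    # Hypothetical sketch of the translation described above: a key with
    # an empty-string value becomes a bare flag (argparse store_true),
    # any other value is passed as "--key value", and omitting the key
    # entirely leaves the flag at its False default.
    args = []
    for key, value in config.items():
        args.append(f"--{key}")
        if value != "":
            args.append(str(value))
    return args

# {"use_all_available_gpus": ""}  ->  ["--use_all_available_gpus"]
```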
```diff
@@ -27,36 +28,6 @@
             "sharegpt"
         ]
     }
-},
```
Removed a redundant benchmark.
```diff
@@ -34,7 +31,8 @@
     ],
     "dtype": [
         "auto"
-    ]
+    ],
+    "use-all-available-gpus_" : []
```
Same as above: an empty use-all-available-gpus_ list translates to a bare --use-all-available-gpus_ arg.
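For illustration, a hypothetical sweep expansion along these lines would produce that behavior: list-valued keys are crossed into one run per combination, and an empty list emits only the bare flag. The function name here is an assumption, not the script's actual helper:

```python
import itertools

def expand_benchmark_config(config: dict) -> list[list[str]]:
    # Hypothetical sketch: each key maps to a list of values to sweep.
    # An empty list contributes a bare flag to every run; non-empty
    # lists are expanded into the cross product of "--key value" pairs.
    bare_flags, sweep_keys, sweep_values = [], [], []
    for key, values in config.items():
        if not values:
            bare_flags.append(f"--{key}")
        else:
            sweep_keys.append(key)
            sweep_values.append(values)
    runs = []
    for combo in itertools.product(*sweep_values):
        args = list(bare_flags)
        for key, value in zip(sweep_keys, combo):
            args.extend([f"--{key}", str(value)])
        runs.append(args)
    return runs

# {"dtype": ["auto"], "use-all-available-gpus_": []}
#   -> [["--use-all-available-gpus_", "--dtype", "auto"]]
```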
Looks reasonable, and nice job with the multi-gpu usage.
cool
SUMMARY: Add a benchmarking workflow to run nightly.
TEST PLAN: Manual benchmark triggers on AWS instances - check #66
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
SUMMARY:

TEST PLAN:
Run manual benchmark jobs:
- multi-gpu benchmark: https://github.com/neuralmagic/nm-vllm/actions/runs/8086009234/job/22094787943
- single-gpu benchmark: https://github.com/neuralmagic/nm-vllm/actions/runs/8086016169/job/22094812742

(The benchmarks didn't run to completion because Hugging Face went down mid-way, but the artifacts seem reasonable for what did run.)