Benchmarking - Add tensor_parallel_size arg for multi-gpu benchmarking #66
Conversation
```diff
@@ -20,7 +20,7 @@
     "output-len": [
         128
     ],
-    "tensor-parallel-size": [
+    "tensor-parallel-size_": [
```
Why the underscore? I don't see it for other arguments.
The benchmark_throughput.py script is not meant to take this argument directly. It is part of a mutually exclusive arg group:

```python
tp_group = parser.add_mutually_exclusive_group(required=True)
tp_group.add_argument("--tensor-parallel-size_", type=int, default=None)
tp_group.add_argument("--use-all-available-gpus_", action="store_true")
```

The script then derives the correct tensor_parallel_size using this utility:

```python
def get_tensor_parallel_size(args: argparse.Namespace) -> int:
    tensor_parallel_size = num_available_gpus() \
        if args.use_all_available_gpus_ else args.tensor_parallel_size_
    assert tensor_parallel_size > 0 and \
        tensor_parallel_size <= num_available_gpus()
    return tensor_parallel_size
```
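For reference, num_available_gpus isn't shown in the snippet above; a minimal sketch of such a helper, assuming a PyTorch-based environment (the repo's actual implementation may differ), could look like:

```python
import torch

def num_available_gpus() -> int:
    # Sketch: report the number of CUDA devices visible to this process.
    # torch.cuda.device_count() already respects CUDA_VISIBLE_DEVICES.
    return torch.cuda.device_count()
```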
```diff
@@ -8,6 +8,7 @@
         "mistralai/Mistral-7B-Instruct-v0.2",
         "NousResearch/Llama-2-7b-chat-hf"
     ],
+    "use_all_available_gpus" : "",
```
This actually translates to use_all_available_gpus = True in Python. If we want to set it to False, we remove this key entirely. This is really a pain - I am considering moving to yml or some other format. On top of this, JSON doesn't even let you add comments :(
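To make the quirk concrete, here is a hypothetical helper (config_to_cli_args is my name, not the script's) showing how such a JSON dict could be turned into CLI arguments, where an empty string yields a bare store_true flag:

```python
def config_to_cli_args(config: dict) -> list[str]:
    # Hypothetical sketch of the translation described above: a key with
    # an empty-string value becomes a bare flag (argparse store_true),
    # any other value is passed as "--key value", and omitting the key
    # entirely leaves the flag at its False default.
    args = []
    for key, value in config.items():
        args.append(f"--{key}")
        if value != "":
            args.append(str(value))
    return args

# {"use_all_available_gpus": ""}  ->  ["--use_all_available_gpus"]
```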
```diff
@@ -27,36 +28,6 @@
             "sharegpt"
         ]
     }
-},
```
Removed a redundant benchmark.
```diff
@@ -34,7 +31,8 @@
     ],
     "dtype": [
         "auto"
-    ]
+    ],
+    "use-all-available-gpus_" : []
```
Same as above: an empty use-all-available-gpus_ list translates to a bare --use-all-available-gpus_ arg.
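For illustration, a hypothetical sweep expansion along these lines would produce that behavior: list-valued keys are crossed into one run per combination, and an empty list emits only the bare flag. The function name here is an assumption, not the script's actual helper:

```python
import itertools

def expand_benchmark_config(config: dict) -> list[list[str]]:
    # Hypothetical sketch: each key maps to a list of values to sweep.
    # An empty list contributes a bare flag to every run; non-empty
    # lists are expanded into the cross product of "--key value" pairs.
    bare_flags, sweep_keys, sweep_values = [], [], []
    for key, values in config.items():
        if not values:
            bare_flags.append(f"--{key}")
        else:
            sweep_keys.append(key)
            sweep_values.append(values)
    runs = []
    for combo in itertools.product(*sweep_values):
        args = list(bare_flags)
        for key, value in zip(sweep_keys, combo):
            args.extend([f"--{key}", str(value)])
        runs.append(args)
    return runs

# {"dtype": ["auto"], "use-all-available-gpus_": []}
#   -> [["--use-all-available-gpus_", "--dtype", "auto"]]
```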
Looks reasonable, and nice job with the multi-gpu usage.
cool
SUMMARY: Add a benchmarking workflow to run nightly.
TEST PLAN: Manual benchmark triggers on AWS instances - check #66
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
SUMMARY:

TEST PLAN:
Run manual benchmark jobs:
- multi-gpu benchmark: https://github.com/neuralmagic/nm-vllm/actions/runs/8086009234/job/22094787943
- single-gpu benchmark: https://github.com/neuralmagic/nm-vllm/actions/runs/8086016169/job/22094812742

(The benchmarks didn't run to completion because Hugging Face went down mid-way, but the artifacts seem reasonable for what did run.)