Add the option to use tuned model in shark_runner #79
Conversation
Maybe we should name model_config.json after the model, e.g. tf_minilm_config.json or something similar.
Good point. Done.
@@ -0,0 +1 @@
{"options": [{"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], 
"pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], 
"work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": 
[32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [1, 32, 128], "work_group_sizes": [32, 1, 1], "pipeline": "GPU"}]}
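As a sketch of how a consumer might read this config (the schema here is inferred from the JSON above; the embedded sample and all variable names are illustrative, not the actual shark code):

```python
import json

# Sketch only: parse a tuned-model config with the schema shown above.
# Each entry in "options" carries tuning parameters for one dispatch;
# "split_k" appears only on some entries, so it is read with .get().
sample = (
    '{"options": ['
    '{"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1],'
    ' "pipeline": "GPU_TENSORCORE"},'
    '{"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1],'
    ' "pipeline": "GPU_TENSORCORE", "split_k": 8}'
    ']}'
)

config = json.loads(sample)
for opt in config["options"]:
    # prints e.g.: GPU_TENSORCORE [32, 32, 16] None
    print(opt["pipeline"], opt["work_group_tile_sizes"], opt.get("split_k"))
```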
Can we add this as a gsutil cp call?
In this case, where should the file be stored? I think people outside the project may also want to run the command.
Let this example be checked in. We can wget other configs in the future.
shark/iree_utils.py
@@ -158,7 +161,8 @@ def get_iree_module(module, device, input_type, args, func_name):
 def get_iree_compiled_module(module,
                              device: str,
                              frontend: str = "torch",
-                             func_name: str = "forward"):
+                             func_name: str = "forward",
+                             use_tuned_model: str = None):
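A minimal sketch of what the new optional parameter might gate (illustrative only; load_tuning_options is a hypothetical helper, not the actual shark implementation):

```python
import json

# Hypothetical helper: when no tuned-model config path is supplied,
# compilation falls back to defaults; otherwise the per-dispatch tuning
# options are loaded from the JSON config file.
def load_tuning_options(use_tuned_model=None):
    if use_tuned_model is None:
        return None  # default compilation, no tuning annotation
    with open(use_tuned_model) as f:
        return json.load(f)["options"]
```

Usage would look like load_tuning_options("minilm_tf_gpu_config.json"), returning the list of per-dispatch option dicts shown earlier in this thread.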
NIT: Can we rename use_tuned_model to something like tuned_model_path or model_config_path, since that is more representative of what the variable stores? Ideally it would have the same name in the shark parser for easy readability.
Sure, it's done.
Do you also see the benchmark module report 1.66ms on CUDA A100?
Let me build the latest perf branch and check the performance. I've checked the generated annotated mlir file and it looks good.
You don't need to build the latest branch; the new venv should have it.
I got 1.75ms on my A100 VM. Just solved the merge conflict, and this PR should be ready to merge.
python -m shark.examples.shark_inference.minilm_tf --device="gpu" --model_config_path=shark/examples/shark_inference/minilm_tf_gpu_config.json