
Add the option to use tuned model in shark_runner #79

Merged · 6 commits merged on Jun 3, 2022

Conversation

yzhang93 (Contributor) commented Jun 1, 2022

  • Add a flag to load a pre-tuned model config file in shark_runner; nothing changes in the inference API.
  • The example targets the MiniLM model with the TensorFlow frontend on GPU. To run it:
    python -m shark.examples.shark_inference.minilm_tf --device="gpu" --model_config_path=shark/examples/shark_inference/minilm_tf_gpu_config.json
  • The config file (example/shark-inference/.json) does not work with the Torch frontend directly: torch-mlir emits linalg.batch_matmul (instead of matmul) with the first dimension equal to 1, so one needs to generate a new config file with a 1 prepended as the first dimension of tile_sizes (a sketch of that adjustment follows this list). Further testing is needed for model annotation with the Torch frontend.
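A minimal sketch of that tile-size adjustment, assuming the config layout checked in below; the helper name and the three-entry heuristic are illustrative, not part of this PR:

    import json

    def adapt_config_for_torch(tf_config_path: str, torch_config_path: str) -> None:
        """Prepend a batch tile size of 1 for linalg.batch_matmul dispatches.

        Hypothetical helper: this PR only describes the manual step.
        """
        with open(tf_config_path) as f:
            config = json.load(f)
        for option in config["options"]:
            tile_sizes = option["work_group_tile_sizes"]
            # Heuristic: treat three-entry tiles [m, n, k] as matmul-shaped.
            if len(tile_sizes) == 3:
                option["work_group_tile_sizes"] = [1] + tile_sizes
        with open(torch_config_path, "w") as f:
            json.dump(config, f)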

powderluv (Contributor) left a comment

Maybe we should name the model_config.json the same as the model name, e.g. tf_minilm_config.json.

yzhang93 (Contributor, Author) commented Jun 1, 2022

> maybe we should name the model_config.json the same as the model name tf_minilm_config.json or something

Good point. Done.

powderluv self-requested a review Jun 1, 2022 22:21
@@ -0,0 +1 @@
{"options": [{"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], 
"work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], 
"pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 32, 16], "work_group_sizes": [64, 2, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [32, 64, 32], "work_group_sizes": [128, 1, 1], "pipeline": "GPU_TENSORCORE"}, {"work_group_tile_sizes": [1, 64, 64, 32], "work_group_sizes": [128, 2, 1], "pipeline": "GPU_TENSORCORE", "split_k": 8}, {"work_group_tile_sizes": [1, 32, 128], "work_group_sizes": [32, 1, 1], "pipeline": "GPU"}]}
Member

Can we add this as a gsutil cp call?

yzhang93 (Contributor, Author)

In this case, where should the file be stored? I think people from outside may also want to run the command.

Contributor

Let this example be checked in. We can wget other configs in the future.
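A sketch of how that future fetch-instead-of-check-in flow might look; the URL below is purely illustrative, since nothing in this PR establishes a hosted location:

    import urllib.request

    # Hypothetical download of a tuned config; the host and path are placeholders.
    CONFIG_URL = "https://storage.example.com/shark-configs/minilm_tf_gpu_config.json"
    urllib.request.urlretrieve(
        CONFIG_URL, "shark/examples/shark_inference/minilm_tf_gpu_config.json"
    )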

@@ -158,7 +161,8 @@ def get_iree_module(module, device, input_type, args, func_name):
 def get_iree_compiled_module(module,
                              device: str,
                              frontend: str = "torch",
-                             func_name: str = "forward"):
+                             func_name: str = "forward",
+                             use_tuned_model: str = None):
Member

NIT: Can we rename use_tuned_model to something like tuned_model_path or model_config_path, since that is more representative of what the variable stores? Ideally it would have the same name in the shark parser for easy readability.

yzhang93 (Contributor, Author)

Sure, it's done.
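With the rename applied, the hunk above would presumably read as below; the body is a sketch of where the config path plugs in, with assumed helpers rather than the actual shark_runner code:

    import json

    def get_iree_compiled_module(module,
                                 device: str,
                                 frontend: str = "torch",
                                 func_name: str = "forward",
                                 model_config_path: str = None):
        # Sketch: if a tuned config is supplied, annotate the module with its
        # dispatch options before compiling. annotate_model and compile_module
        # are assumed helpers, not names confirmed by this repo.
        if model_config_path is not None:
            with open(model_config_path) as f:
                tuned_options = json.load(f)["options"]
            module = annotate_model(module, tuned_options)
        return compile_module(module, device, frontend, func_name)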

powderluv (Contributor)

Do you also see the benchmark module report 1.66 ms on CUDA A100?

yzhang93 (Contributor, Author) commented Jun 1, 2022

> Do you also see the benchmark module report 1.66 ms on CUDA A100?

Let me build the latest perf branch and check the performance. I've checked the generated annotated MLIR file, and it looks good.

powderluv (Contributor)

You don't need to build the latest branch; the new venv should have it.

yzhang93 (Contributor, Author) commented Jun 2, 2022

> Do you also see the benchmark module report 1.66 ms on CUDA A100?

I got 1.75 ms on my A100 VM. I've just resolved the merge conflict, and this PR should be ready to merge.

powderluv merged commit 16c50ca into nod-ai:main on Jun 3, 2022