
Detect VNNI support (avx512) in CI and skip ORT test for int8 models if missing VNNI #526

Merged: 6 commits into onnx:main on Jun 1, 2022

Conversation

@jcwchen (Member) commented on May 18, 2022

Description
Some GitHub Actions machines lack VNNI support, but all reference data for the int8 models was produced on machines with VNNI support. To keep CI consistent, it should detect whether the machine supports VNNI and, if not, skip the ORT test for those models.
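
For readers skimming the thread, the skip logic amounts to the following minimal sketch. This is a rough outline, not the exact diff in this PR: has_vnni_support and the '-int8' naming convention are taken from the code reviewed below, while run_ort_test is a hypothetical wrapper with its body elided.

from cpuinfo import get_cpu_info  # py-cpuinfo, installed in the CI workflow below


def has_vnni_support():
    # py-cpuinfo reports CPU feature strings under the 'flags' key
    return 'avx512' in str(get_cpu_info()['flags'])


def run_ort_test(model_name):
    # only int8 models depend on reference outputs produced with VNNI
    if '-int8' in model_name and not has_vnni_support():
        print(f'Skipping ORT test for {model_name}: this machine lacks VNNI/avx512 support.')
        return
    # ... run onnxruntime inference and compare against the stored output.pb ...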

Motivation and Context
Closes #522.

@jcwchen added the CI label on May 18, 2022
@jcwchen (Member, Author) commented on May 19, 2022

@mengniwang95 If you have bandwidth, would you mind helping me review this PR? I added two files for testing: test-mobilenetv2-12-int8.tar.gz and test-resnet50-v1-12-int8.onnx. As you can see in the CI runs for the latest commit:

  • Windows CI does have avx512 support, so it will still run the ORT test for quantized models.
  • Linux CI does not have avx512 support, so it will skip the ORT test for quantized models.

I will remove these two testing files before merging. Thank you for your time!

@jcwchen merged commit b441335 into onnx:main on Jun 1, 2022
@jcwchen deleted the jcw/handle-vnni branch on June 1, 2022
@garymm (Contributor) left a comment


Hope you don't mind the drive-by review.

# TODO: now ONNX only supports Protobuf <= 3.20.1
python -m pip install protobuf==3.20.1
python -m pip install onnx onnxruntime requests py-cpuinfo
python -m cpuinfo
@garymm (Contributor)

Why run cpuinfo here?
If you want to keep it, could you please add a comment?

@jcwchen (Member, Author)

I kept it intentionally for future debugging since the CPU information can be very helpful while debugging why ORT inference behaves differently on different machines. I will send a PR to add comments about it.
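
A hypothetical helper along these lines, using the same py-cpuinfo package the workflow installs, could narrow the debug output to the relevant flags. log_avx512_flags is an illustrative name, not code from this PR, and the 'brand_raw' key assumes a recent py-cpuinfo release.

from cpuinfo import get_cpu_info


def log_avx512_flags():
    info = get_cpu_info()
    # collect every avx512-related feature flag reported for this CPU
    avx512 = sorted(f for f in info.get('flags', []) if 'avx512' in f)
    print('CPU:', info.get('brand_raw', 'unknown'))
    print('avx512 flags:', ', '.join(avx512) or 'none')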

from cpuinfo import get_cpu_info  # needed for the 'flags' lookup below

import ort_test_dir_utils
import onnxruntime
import onnx
import test_utils


def has_vnni_support():
    return 'avx512' in str(get_cpu_info()['flags'])
@garymm (Contributor)

I don't think this is strictly correct. From my reading, there are CPUs that support avx512 "foundation" but not VNNI, which is an extension.
I think the flag to check for is avx512_vnni, and maybe also avx512_4vnniw (see the sketch after the sources below).

Sources:

https://en.wikichip.org/wiki/x86/avx512_vnni
https://bugzilla.redhat.com/show_bug.cgi?id=1761678
https://unix.stackexchange.com/questions/43539/what-do-the-flags-in-proc-cpuinfo-mean
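
A sketch of the stricter check being suggested, assuming py-cpuinfo mirrors the Linux /proc/cpuinfo flag names from the sources above (has_true_vnni_support is an illustrative name):

from cpuinfo import get_cpu_info

# the two VNNI-related flag names from the sources above
VNNI_FLAGS = {'avx512_vnni', 'avx512_4vnniw'}


def has_true_vnni_support():
    flags = set(get_cpu_info().get('flags', []))
    return bool(VNNI_FLAGS & flags)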

@jcwchen (Member, Author)

Thank you for catching this! The function/PR name here might be misleading. I ran several experiments and concluded that machines with avx512f behave differently from machines without avx512f. Unfortunately, I don't have a machine with avx512_vnni support, so I am not sure whether machines with avx512_vnni and machines with only avx512f (without avx512_vnni) behave the same. At least based on previous int8-related PRs, the output.pb produced by machines with avx512_vnni seems to be the same as the output.pb produced by machines with avx512f support only.

In summary, we need to figure out:

  1. Machines without any avx512 support
  2. Machines with avx512f support but without avx512_vnni support
  3. Machines with avx512_vnni support

I am sure that 1 != 2 and 1 != 3, but I am not sure whether 2 == 3.
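
A hypothetical classifier that labels a machine with the three class numbers above, again assuming Linux-style flag names from py-cpuinfo (whether classes 2 and 3 produce identical output.pb files is exactly the open question):

from cpuinfo import get_cpu_info


def avx512_class():
    flags = set(get_cpu_info().get('flags', []))
    if 'avx512_vnni' in flags:
        return 3  # class 3: avx512_vnni support
    if 'avx512f' in flags:
        return 2  # class 2: avx512 foundation only
    return 1  # class 1: no avx512 support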

cc @mengniwang95, who is the main contributor of int8 models in the ONNX Model Zoo. Feel free to comment if you have any insight. Thank you for the help!

    return 'avx512' in str(get_cpu_info()['flags'])


def skip_quant_models_if_missing_vnni(model_name):
@garymm (Contributor)

Suggested change:

from typing import Optional

def ort_skip_reason(model_path: str) -> Optional[str]:
    if '-int8' in model_path and not has_vnni_support():
        return f'Skip ORT test for {model_path} because this machine lacks VNNI support and the output.pb was produced with VNNI support.'
    model = onnx.load(model_path)
    if model.opset_import[0].version < 7:
        return f'Skip ORT test for {model_path} because ORT only *guarantees* support for models stamped with opset version >= 7.'
    return None

def run_backend_ort(...):
    skip_reason = ort_skip_reason(model_path)
    if skip_reason:
        # log skip reason
        return

@jcwchen (Member, Author)

Thank you for the suggestion! I have created a new PR to improve it: #533.

Linked issue (closed by this PR): Detect VNNI support in CIs before running ORT (#522)