Add models from Hugging Face/transformers from MLAgility (#615)
* popular_on_huggingface/bert-base-uncased.py

Signed-off-by: jcwchen <jacky82226@gmail.com>

* add transformers models

Signed-off-by: jcwchen <jacky82226@gmail.com>

* remove gpt1 and gpt2 for now

Signed-off-by: jcwchen <jacky82226@gmail.com>

* config

Signed-off-by: jcwchen <jacky82226@gmail.com>

* get model name from build_dir

Signed-off-by: jcwchen <jacky82226@gmail.com>

* find_model_hash_name

Signed-off-by: jcwchen <jacky82226@gmail.com>

* subprocess.PIPE

Signed-off-by: jcwchen <jacky82226@gmail.com>

* new models

Signed-off-by: jcwchen <jacky82226@gmail.com>

* 7 models

Signed-off-by: jcwchen <jacky82226@gmail.com>

* only keep 4

Signed-off-by: jcwchen <jacky82226@gmail.com>

* remove 4

Signed-off-by: jcwchen <jacky82226@gmail.com>

* remove albert-base-v2

Signed-off-by: jcwchen <jacky82226@gmail.com>

* del model and sess

Signed-off-by: jcwchen <jacky82226@gmail.com>

* check_path

Signed-off-by: jcwchen <jacky82226@gmail.com>

* drop models in CI

Signed-off-by: jcwchen <jacky82226@gmail.com>

* add bert_generation

Signed-off-by: jcwchen <jacky82226@gmail.com>

* --binary

Signed-off-by: jcwchen <jacky82226@gmail.com>

* disable bert_generation.py

Signed-off-by: jcwchen <jacky82226@gmail.com>

* no binary

Signed-off-by: jcwchen <jacky82226@gmail.com>

* cancel in progress

Signed-off-by: jcwchen <jacky82226@gmail.com>

* binary

Signed-off-by: jcwchen <jacky82226@gmail.com>

* minimal

Signed-off-by: jcwchen <jacky82226@gmail.com>

* --mini

Signed-off-by: jcwchen <jacky82226@gmail.com>

* manually check

Signed-off-by: jcwchen <jacky82226@gmail.com>

* only keep

Signed-off-by: jcwchen <jacky82226@gmail.com>

* run_test_dir

Signed-off-by: jcwchen <jacky82226@gmail.com>

* coma

Signed-off-by: jcwchen <jacky82226@gmail.com>

* cache_converted_dir = "~/.cache"

Signed-off-by: jcwchen <jacky82226@gmail.com>

* delete and clean cache

Signed-off-by: jcwchen <jacky82226@gmail.com>

* clean

Signed-off-by: jcwchen <jacky82226@gmail.com>

* clean all

Signed-off-by: jcwchen <jacky82226@gmail.com>

* only clean

Signed-off-by: jcwchen <jacky82226@gmail.com>

* --cache-dir", cache_converted_dir

Signed-off-by: jcwchen <jacky82226@gmail.com>

* disable openai_clip-vit-large-patch14

Signed-off-by: jcwchen <jacky82226@gmail.com>

* disable

Signed-off-by: jcwchen <jacky82226@gmail.com>

* only keep 4

Signed-off-by: jcwchen <jacky82226@gmail.com>

* comma

Signed-off-by: jcwchen <jacky82226@gmail.com>

* runs-on: macos-latest

Signed-off-by: jcwchen <jacky82226@gmail.com>

* not using conda

Signed-off-by: jcwchen <jacky82226@gmail.com>

* final_model_path

Signed-off-by: jcwchen <jacky82226@gmail.com>

* git-lfst pull dir

Signed-off-by: jcwchen <jacky82226@gmail.com>

* git diff

Signed-off-by: jcwchen <jacky82226@gmail.com>

* use onnx.load to compare

Signed-off-by: jcwchen <jacky82226@gmail.com>

* test_utils.pull_lfs_file(final_model_path)

Signed-off-by: jcwchen <jacky82226@gmail.com>

* only test changed models

Signed-off-by: jcwchen <jacky82226@gmail.com>

* test_utils

Signed-off-by: jcwchen <jacky82226@gmail.com>

* get_cpu_info

Signed-off-by: jcwchen <jacky82226@gmail.com>

* ext names

Signed-off-by: jcwchen <jacky82226@gmail.com>

* test_utils.get_changed_models()

Signed-off-by: jcwchen <jacky82226@gmail.com>

* compare 2

Signed-off-by: jcwchen <jacky82226@gmail.com>

* fix init

Signed-off-by: jcwchen <jacky82226@gmail.com>

* transformers==4.29.2

Signed-off-by: jcwchen <jacky82226@gmail.com>

* test

Signed-off-by: jcwchen <jacky82226@gmail.com>

* initializer

Signed-off-by: jcwchen <jacky82226@gmail.com>

* update bert-generation

Signed-off-by: jcwchen <jacky82226@gmail.com>

* fixed numpy

Signed-off-by: jcwchen <jacky82226@gmail.com>

* print(f"initializer {k}")

Signed-off-by: jcwchen <jacky82226@gmail.com>

* update bert from mac

Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>

* remove bert-generation

Signed-off-by: jcwchen <jacky82226@gmail.com>

* mlagility_subdir_count number

Signed-off-by: jcwchen <jacky82226@gmail.com>

* remove unused onnx

Signed-off-by: jcwchen <jacky82226@gmail.com>

---------

Signed-off-by: jcwchen <jacky82226@gmail.com>
Signed-off-by: Chun-Wei Chen <jacky82226@gmail.com>
jcwchen committed Jul 26, 2023
1 parent c5612a4 commit c021460
Showing 31 changed files with 170 additions and 56 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/codeql.yml
@@ -19,6 +19,10 @@ on:
schedule:
- cron: '31 11 * * 4'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
analyze:
name: Analyze
4 changes: 4 additions & 0 deletions .github/workflows/linux_ci.yml
@@ -9,6 +9,10 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
# This workflow contains a single job called "build"
build:
17 changes: 10 additions & 7 deletions .github/workflows/mlagility_validation.yml
@@ -6,20 +6,23 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
runs-on: ubuntu-latest
runs-on: macos-latest
strategy:
matrix:
python-version: ['3.8']
python-version: ["3.8"]

steps:
- uses: actions/checkout@v3
- uses: actions/checkout@c85c95e3d7251135ab7dc9ce3241c5835cc595a9 # v3.5.3
name: Checkout repo
- uses: conda-incubator/setup-miniconda@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@61a6322f88396a6271a6ee3565807d608ecaddd1 # v4.7.0
with:
miniconda-version: "latest"
activate-environment: mla
python-version: ${{ matrix.python-version }}

- name: Install dependencies and mlagility
@@ -34,4 +37,4 @@ jobs:
run: |
# TODO: remove the following after mlagility has resovled version contradict issue
pip install -r models/mlagility/requirements.txt
python workflow_scripts/run_mlagility.py
python workflow_scripts/run_mlagility.py --drop
4 changes: 4 additions & 0 deletions .github/workflows/windows_ci.yml
@@ -9,6 +9,10 @@ on:
pull_request:
branches: [ main, new-models]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
# This workflow contains a single job called "build"
build:
20 Git LFS files not shown
2 changes: 2 additions & 0 deletions models/mlagility/requirements.txt
@@ -1,2 +1,4 @@
numpy==1.24.4
torch==2.0.1
torchvision==0.15.2
transformers==4.29.2
3 changes: 2 additions & 1 deletion workflow_scripts/check_model.py
@@ -16,7 +16,8 @@ def has_vnni_support():

def run_onnx_checker(model_path):
model = onnx.load(model_path)
onnx.checker.check_model(model, full_check=True)
del model
onnx.checker.check_model(model_path, full_check=True)


def ort_skip_reason(model_path):
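The run_onnx_checker change above validates the model by file path rather than an already-loaded ModelProto, and frees the loaded proto first. As an aside (illustrative usage only, not the repository's helper, and the path below is a made-up example): onnx.checker.check_model accepts a path as well as a ModelProto; checking by path keeps only one copy of the model in memory and lets the checker resolve external-data tensors itself, while full_check=True additionally runs shape inference.

import onnx

# Illustrative: check a model file in place; the path is hypothetical.
onnx.checker.check_model("models/mlagility/distilbert-base-uncased/distilbert-base-uncased-16.onnx",
                         full_check=True)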
3 changes: 1 addition & 2 deletions workflow_scripts/generate_onnx_hub_manifest.py
@@ -14,8 +14,7 @@
import onnx
from onnx import shape_inference
import argparse
from test_models import get_changed_models
from test_utils import pull_lfs_file
from test_utils import get_changed_models, pull_lfs_file


# Acknowledgments to pytablereader codebase for this function
11 changes: 11 additions & 0 deletions workflow_scripts/mlagility_config.py
@@ -15,4 +15,15 @@
"torch_hub/densenet121.py",
"torch_hub/inception_v3.py",
"torch_hub/googlenet.py",
#"transformers/bert_generation.py", # non consistent created model from mlagility
#"popular_on_huggingface/bert-base-uncased.py",
#"popular_on_huggingface/xlm-roberta-large.py",
#"popular_on_huggingface/bert-large-uncased.py",
"popular_on_huggingface/openai_clip-vit-large-patch14.py",
#"popular_on_huggingface/xlm-roberta-base.py", # output nan
#"popular_on_huggingface/roberta-base.py", # output nan
"popular_on_huggingface/distilbert-base-uncased.py",
#"popular_on_huggingface/distilroberta-base.py", # output nan
"popular_on_huggingface/distilbert-base-multilingual-cased.py",
#"popular_on_huggingface/albert-base-v2", # Status Message: indices element out of data bounds, idx=8 must be within the inclusive range [-2,1]
]
49 changes: 34 additions & 15 deletions workflow_scripts/run_mlagility.py
@@ -7,6 +7,7 @@
import subprocess
import sys
import ort_test_dir_utils
import test_utils


def get_immediate_subdirectories_count(dir_name):
@@ -21,7 +22,7 @@ def find_model_hash_name(stdout):
line = line.replace("\\", "/")
# last part of the path is the model hash name
return line.split("/")[-1]
raise Exception(f"Cannot find Build dir in {stdout}.")
raise Exception(f"Cannot find Build dir in {stdout}.")


ZOO_OPSET_VERSION = "16"
Expand All @@ -33,34 +34,45 @@ def find_model_hash_name(stdout):


def main():
# caculate first; otherwise the directories might be deleted by shutil.rmtree
mlagility_subdir_count = get_immediate_subdirectories_count(mlagility_models_dir)

parser = argparse.ArgumentParser(description="Test settings")

parser.add_argument("--all_models", required=False, default=False, action="store_true",
help="Test all ONNX Model Zoo models instead of only chnaged models")
parser.add_argument("--create", required=False, default=False, action="store_true",
help="Create new models from mlagility if not exist.")
parser.add_argument("--drop", required=False, default=False, action="store_true",
help="Drop downloaded models after verification. (For space limitation in CIs)")
parser.add_argument("--skip", required=False, default=False, action="store_true",
help="Skip checking models if already exist.")


args = parser.parse_args()
errors = 0

changed_models_set = set(test_utils.get_changed_models())
print(f"Changed models: {changed_models_set}")
for model_info in models_info:
directory_name, model_name = model_info.split("/")
_, model_name = model_info.split("/")
model_name = model_name.replace(".py", "")
model_zoo_dir = model_name
print(f"----------------Checking {model_zoo_dir}----------------")
final_model_dir = osp.join(mlagility_models_dir, model_zoo_dir)
final_model_name = f"{model_zoo_dir}-{ZOO_OPSET_VERSION}.onnx"
final_model_path = osp.join(final_model_dir, final_model_name)
if not args.all_models and final_model_path not in changed_models_set:
print(f"Skip checking {final_model_path} because it is not changed.")
continue
if osp.exists(final_model_path) and args.skip:
print(f"Skip checking {model_zoo_dir} because {final_model_path} already exists.")
continue
try:
print(f"----------------Checking {model_zoo_dir}----------------")
final_model_dir = osp.join(mlagility_models_dir, model_zoo_dir)
final_model_name = f"{model_zoo_dir}-{ZOO_OPSET_VERSION}.onnx"
final_model_path = osp.join(final_model_dir, final_model_name)
if osp.exists(final_model_path) and args.skip:
print(f"Skip checking {model_zoo_dir} because {final_model_path} already exists.")
continue
cmd = subprocess.run(["benchit", osp.join(mlagility_root, model_info), "--cache-dir", cache_converted_dir,
"--onnx-opset", ZOO_OPSET_VERSION, "--export-only"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=sys.stderr, check=True)
model_hash_name = find_model_hash_name(cmd.stdout)
print(model_hash_name)
mlagility_created_onnx = osp.join(cache_converted_dir, model_hash_name, "onnx", model_hash_name + base_name)
if args.create:
ort_test_dir_utils.create_test_dir(mlagility_created_onnx, "./", final_model_dir)
@@ -75,14 +87,21 @@ def main():
except Exception as e:
errors += 1
print(f"Failed to check {model_zoo_dir} because of {e}.")

if args.drop:
subprocess.run(["benchit", "cache", "delete", "--all", "--cache-dir", cache_converted_dir],
cwd=cwd_path, stdout=sys.stdout, stderr=sys.stderr, check=True)
subprocess.run(["benchit", "cache", "clean", "--all", "--cache-dir", cache_converted_dir],
cwd=cwd_path, stdout=sys.stdout, stderr=sys.stderr, check=True)
shutil.rmtree(final_model_dir, ignore_errors=True)
shutil.rmtree(cache_converted_dir, ignore_errors=True)
total_count = len(models_info) if args.all_models else len(changed_models_set)
if errors > 0:
print(f"All {len(models_info)} model(s) have been checked, but {errors} model(s) failed.")
print(f"All {total_count} model(s) have been checked, but {errors} model(s) failed.")
sys.exit(1)
else:
print(f"All {len(models_info)} model(s) have been checked.")
print(f"All {total_count} model(s) have been checked.")


mlagility_subdir_count = get_immediate_subdirectories_count(mlagility_models_dir)
if mlagility_subdir_count != len(models_info):
print(f"Expected {len(models_info)} model(s) in {mlagility_models_dir}, but got {mlagility_subdir_count} model(s) under models/mlagility."
f"Please check if you have added new model(s) to models_info in mlagility_config.py.")
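For context, a sketch of find_model_hash_name, whose tail appears in the hunk above. The loop body and the "Build dir" marker are taken from the visible diff lines and the exception message; the exact format of benchit's output is an assumption here, not something this commit shows in full.

def find_model_hash_name(stdout):
    # benchit --export-only is assumed to print a line containing "Build dir"
    # whose last path component is the model's hash name.
    for line in stdout.decode("utf-8").split("\n"):
        if "Build dir" in line:
            line = line.replace("\\", "/")   # normalize Windows separators
            # last part of the path is the model hash name
            return line.split("/")[-1]
    raise Exception(f"Cannot find Build dir in {stdout}.")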
34 changes: 3 additions & 31 deletions workflow_scripts/test_models.py
@@ -23,25 +23,6 @@ def get_all_models():
return model_list


def get_changed_models():
model_list = []
cwd_path = Path.cwd()
# git fetch first for git diff on GitHub Action
subprocess.run(["git", "fetch", "origin", "main:main"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# obtain list of added or modified files in this PR
obtain_diff = subprocess.Popen(["git", "diff", "--name-only", "--diff-filter=AM", "origin/main", "HEAD"],
cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutput, _ = obtain_diff.communicate()
diff_list = stdoutput.split()

# identify list of changed ONNX models in ONXX Model Zoo
model_list = [str(model).replace("b'", "").replace("'", "")
for model in diff_list if onnx_ext_name in str(model) or tar_ext_name in str(model)]
return model_list


def main():
parser = argparse.ArgumentParser(description="Test settings")
# default all: test by both onnx and onnxruntime
@@ -53,12 +34,12 @@ def main():
parser.add_argument("--create", required=False, default=False, action="store_true",
help="Create new test data by ORT if it fails with existing test data")
parser.add_argument("--all_models", required=False, default=False, action="store_true",
help="Test all ONNX Model Zoo models instead of only chnaged models")
help="Test all ONNX Model Zoo models instead of only changed models")
parser.add_argument("--drop", required=False, default=False, action="store_true",
help="Drop downloaded models after verification. (For space limitation in CIs)")
args = parser.parse_args()

model_list = get_all_models() if args.all_models else get_changed_models()
model_list = get_all_models() if args.all_models else test_utils.get_changed_models()
# run lfs install before starting the tests
test_utils.run_lfs_install()

@@ -106,16 +87,7 @@ def main():
print("[PASS] {} is checked by onnx. ".format(model_name))
if args.target == "onnxruntime" or args.target == "all":
try:
# git lfs pull those test_data_set_* folders
root_dir = Path(model_path).parent
for _, dirs, _ in os.walk(root_dir):
for dir in dirs:
if "test_data_set_" in dir:
test_data_set_dir = os.path.join(root_dir, dir)
for _, _, files in os.walk(test_data_set_dir):
for file in files:
if file.endswith(".pb"):
test_utils.pull_lfs_file(os.path.join(test_data_set_dir, file))
test_utils.pull_lfs_directory(Path(model_path).parent)
check_model.run_backend_ort_with_data(model_path)
print("[PASS] {} is checked by onnxruntime. ".format(model_name))
except Exception as e:
35 changes: 35 additions & 0 deletions workflow_scripts/test_utils.py
@@ -25,6 +25,18 @@ def pull_lfs_file(file_name):
print(f'LFS pull completed for {file_name} with return code= {result.returncode}')


def pull_lfs_directory(directory_name):
# git lfs pull those test_data_set_* folders
for _, dirs, _ in os.walk(directory_name):
for dir in dirs:
if "test_data_set_" in dir:
test_data_set_dir = os.path.join(directory_name, dir)
for _, _, files in os.walk(test_data_set_dir):
for file in files:
if file.endswith(".pb"):
pull_lfs_file(os.path.join(test_data_set_dir, file))


def run_lfs_prune():
result = subprocess.run(['git', 'lfs', 'prune'], cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(f'LFS prune completed with return code= {result.returncode}')
@@ -62,3 +74,26 @@ def remove_tar_dir():
def remove_onnxruntime_test_dir():
if os.path.exists(TEST_ORT_DIR) and os.path.isdir(TEST_ORT_DIR):
rmtree(TEST_ORT_DIR)


def get_changed_models():
tar_ext_name = ".tar.gz"
onnx_ext_name = ".onnx"
model_list = []
cwd_path = Path.cwd()
# TODO: use the main branch instead of new-models
branch_name = "new-models" # "main"
# git fetch first for git diff on GitHub Action
subprocess.run(["git", "fetch", "origin", f"{branch_name}:{branch_name}"],
cwd=cwd_path, stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
# obtain list of added or modified files in this PR
obtain_diff = subprocess.Popen(["git", "diff", "--name-only", "--diff-filter=AM", "origin/" + branch_name, "HEAD"],
cwd=cwd_path, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutput, _ = obtain_diff.communicate()
diff_list = stdoutput.split()

# identify list of changed ONNX models in ONXX Model Zoo
model_list = [str(model).replace("b'", "").replace("'", "")
for model in diff_list if onnx_ext_name in str(model) or tar_ext_name in str(model)]
return model_list
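A hedged sketch of how the relocated helpers fit together from a test script's point of view (the loop is illustrative; the real drivers are test_models.py and run_mlagility.py):

from pathlib import Path
import test_utils

# Validate only the ONNX models touched by the current PR.
test_utils.run_lfs_install()
for model_path in test_utils.get_changed_models():
    test_utils.pull_lfs_file(model_path)                    # fetch the .onnx itself
    test_utils.pull_lfs_directory(Path(model_path).parent)  # fetch test_data_set_*/*.pb
    # ... run onnx.checker / onnxruntime verification here ...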
