
New API ONNXRT example update #187

Merged
chensuyue merged 44 commits into master from new_api_onnx_example
Dec 27, 2022

Conversation

@yuwenzho
Contributor

Type of Change

example

Description

update ONNXRT example for new API

JIRA ticket: ILITV-2468
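For reviewers unfamiliar with the migration: the 2.x "new API" replaces the 1.x YAML-driven flow with in-code config objects. A minimal sketch of the target style, assuming neural_compressor >= 2.0 is installed; the model path and dataloader are placeholders, not files from this PR:

```python
# Guarded import so the sketch is inspectable even without the package installed.
try:
    from neural_compressor import PostTrainingQuantConfig, quantization

    def quantize_onnx(model_path, calib_dataloader):
        # Static post-training quantization configured in code
        # (the 1.x API read this configuration from a YAML file).
        conf = PostTrainingQuantConfig(approach="static")
        q_model = quantization.fit(
            model=model_path, conf=conf, calib_dataloader=calib_dataloader
        )
        q_model.save(model_path.replace(".onnx", "_int8.onnx"))
        return q_model
except ImportError:
    quantize_onnx = None  # neural_compressor is not available in this environment
```

The updated examples presumably follow this pattern, with model-specific dataloaders and accuracy-driven tuning arguments.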

How has this PR been tested?

Extension tests on ONNX models.

Dependency Change?

no

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@yuwenzho yuwenzho marked this pull request as ready for review December 7, 2022 03:28
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@yuwenzho
Contributor Author

yuwenzho commented Dec 9, 2022

Hi @chensuyue, the PR is ready for the extension test.

@chensuyue
Contributor

extension test

  1. Please check the tuning regression.
  2. Please check the benchmark.sh API gap.

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@yuwenzho yuwenzho force-pushed the new_api_onnx_example branch from d0882e3 to 15c4863 on December 16, 2022 03:03
@yuwenzho
Contributor Author

@chensuyue extension test:
https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3784/artifact/report.html

The performance regression is caused by switching the performance dataset from a dummy dataset to a real one.
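To see why the dataset switch alone moves the measured numbers, here is a toy stdlib-only illustration (all names and timings are made up, not from the PR): per-iteration data loading happens inside the benchmark loop, so a real dataset with I/O and decode overhead reports lower throughput than a free in-memory dummy.

```python
import time

def dummy_batch():
    # dummy input: synthesized in memory, effectively free
    return [0.0] * 1024

def real_batch():
    # real input: simulate per-sample disk I/O + decode cost
    time.sleep(0.002)
    return [0.0] * 1024

def measure(loader, iters=20):
    # benchmark loop: data loading is inside the timed region
    start = time.perf_counter()
    for _ in range(iters):
        batch = loader()
        _ = sum(batch)  # stand-in for model inference
    return time.perf_counter() - start

t_dummy = measure(dummy_batch)
t_real = measure(real_batch)
```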

@chensuyue
Contributor

Please run the extension test for the other examples.

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@yuwenzho
Contributor Author

https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3877/ Note: object detection models need new quantization recipe support from the Strategy team and may not pass the extension test for now.

NLP models failed due to some typos and non-working code changes. Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3883/

Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3890/
yolov3, yolov4 and tiny_yolov3 will not be enabled in this version because 'onnxrt.graph_optimization.level' is not supported yet.
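For reference, the unsupported knob maps to ONNX Runtime's session-level graph optimization setting, which the YOLO examples appear to rely on to pre-optimize the graph. Set directly on an ORT session it looks like this (guarded import; illustrative only):

```python
# onnxruntime may not be installed; guard the import so the sketch degrades gracefully.
try:
    import onnxruntime as ort

    so = ort.SessionOptions()
    # ORT counterpart of the INC option 'onnxrt.graph_optimization.level':
    # ORT_DISABLE_ALL / ORT_ENABLE_BASIC / ORT_ENABLE_EXTENDED / ORT_ENABLE_ALL
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
    # A session created with these options would apply the optimizations:
    # sess = ort.InferenceSession("model.onnx", sess_options=so)
except ImportError:
    so = None
```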

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@yuwenzho
Contributor Author

Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3890/ yolov3, yolov4 and tiny_yolov3 will not be enabled in this version because 'onnxrt.graph_optimization.level' is not supported yet.

  1. ssd-12, ssd-12_qdq, faster_rcnn, faster_rcnn_qdq, mask_rcnn and mask_rcnn_qdq will be re-enabled in 2.1 once 'onnxrt.graph_optimization.level' and the required quantization recipe are supported. Please ignore them in the extension test.
  2. The hf model failed with the error 'setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.', which is caused by a NumPy version update (see the linked issue).

Update:

  • remove the ssd, faster_rcnn and mask_rcnn models
  • update the model config JSON
  • pin numpy==1.23.5 in requirements.txt for the huggingface model

Retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3909/
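The NumPy failure in item 2 above is reproducible in isolation: from NumPy 1.24 on, building an array from ragged (inhomogeneous) sequences raises a ValueError instead of implicitly creating an object array, which is why the example pins numpy==1.23.5. A minimal sketch (the ragged data is illustrative):

```python
import numpy as np

ragged = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]  # four rows of unequal length

try:
    # NumPy >= 1.24 raises ValueError with the 'inhomogeneous shape' message;
    # 1.23.x only warns and silently builds an object array.
    arr = np.array(ragged)
except ValueError:
    arr = np.array(ragged, dtype=object)  # explicit object dtype always works

# Alternative fix: pad the rows to a rectangular shape.
padded = np.array([row + [0] * (3 - len(row)) for row in ragged])
```

Pinning the NumPy version (as done here) avoids the behavior change; making the dtype explicit or padding is the forward-compatible fix.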

@yuwenzho
Contributor Author

passed: bert_squad_model_zoo_dynamic, mobilebert_squad_mlperf_dynamic, mobilebert_squad_mlperf_qdq, duc, BiDAF_dynamic and huggingface question answering models

failed: gpt2_lm_head_wikitext_model_zoo_dynamic and huggingface text classification models, retest: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3913/

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
@yuwenzho
Contributor Author

passed:
bert_squad_model_zoo_dynamic, mobilebert_squad_mlperf_dynamic, mobilebert_squad_mlperf_qdq, duc, BiDAF_dynamic and huggingface question answering models: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3908/artifact/report.html
gpt2_lm_head_wikitext_model_zoo_dynamic and huggingface text classification models: https://inteltf-jenk.sh.intel.com/job/intel-lpot-validation-top-mr-extension/3919/artifact/report.html

@mengniwang95 mengniwang95 mentioned this pull request Dec 26, 2022
@chensuyue chensuyue merged commit 97c8e3b into master Dec 27, 2022
@chensuyue chensuyue deleted the new_api_onnx_example branch December 27, 2022 01:50
VincyZhang pushed a commit that referenced this pull request Feb 12, 2023
* SparseLib add vtune support

refine doc about profiling
yiliu30 added a commit that referenced this pull request Apr 5, 2025
Building on the vllm WoQ path, this PR adds support for re-quantizing FP8 weights w/ per-tensor or per-channel scaling.

---------

Co-authored-by: Yi Liu <yiliu4@habana.ai>
mengniwang95 pushed a commit that referenced this pull request Apr 15, 2025
Building on the vllm WoQ path, this PR adds support for re-quantizing FP8 weights w/ per-tensor or per-channel scaling.

---------

Co-authored-by: Yi Liu <yiliu4@habana.ai>
xin3he pushed a commit that referenced this pull request Apr 22, 2025
Building on the vllm WoQ path, this PR adds support for re-quantizing FP8 weights w/ per-tensor or per-channel scaling.

---------

Co-authored-by: Yi Liu <yiliu4@habana.ai>
XuehaoSun pushed a commit that referenced this pull request May 13, 2025
Building on the vllm WoQ path, this PR adds support for re-quantizing FP8 weights w/ per-tensor or per-channel scaling.

---------

Co-authored-by: Yi Liu <yiliu4@habana.ai>
This was referenced Dec 9, 2025
@yiliu30 yiliu30 mentioned this pull request Jan 16, 2026
yiliu30 added a commit that referenced this pull request Mar 12, 2026
* [SW-207748] Support Auto-round on HPU (#25)

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* [SW-209878] Increase threshold to avoid random error in test_layer_wise.py (#36)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-207579] support load vLLM compatible FP8 model (#18)

Support loading vLLM-compatible FP8 models, on both G2 and G3, in both single-card and multi-card setups.
---------

Signed-off-by: changwang <changwang@habana.ai>

* [SW-207451] Implement block-wise calibration for LLM (#41)

* [SW-207451] Implement block-wise calibration for LLM

---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-208986] fix save&load bug (#40)

* [SW-208986] fix save&load bug

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-207748] Add Auto-round Example (#42)

* add autoround hpu example

Change-Id: Ibd537f4667c7c077160427722a5eca2c721aa5cd
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* add requirements

Change-Id: I77a95ec05e41247db9903e8622c31f05259ca365
Signed-off-by: Yi Liu <yiliu4@habana.ai>

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-197077] fix bug (#47)

* [SW-210541] loading for fused_sdpa requires additional amax scale (#51)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* fix PatchedLoRACompatibleLinear init (#65)

Signed-off-by: changwangss <changwang@habana.ai>

* align files with v1.19.0 in fp8_quant folder

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix missing SaveLoadFormat

Signed-off-by: Xin He <xinhe3@habana.ai>

* align and fix config after cherry-pick

Signed-off-by: Xin He <xinhe3@habana.ai>

* Implicit relative imports is abandoned

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix config issue blocking CI

Signed-off-by: Xin He <xinhe3@habana.ai>

* remove synchronize for `pack_unpack_tensor_with_numpy` (#2070)

* remove pack&unpack synchronize

---------

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* stop auto-fix of pre-commit

Signed-off-by: Xin He <xinhe3@habana.ai>

* update autoround example for release test

Signed-off-by: xin3he <xin3.he@intel.com>

* fix AWQ&TEQ loading due to input scale

Signed-off-by: xin3he <xin3.he@intel.com>

* fix HQQ state_dict loading caused by [SW-195965]

Signed-off-by: xin3he <xin3.he@intel.com>

* use per_channel as default config (#2091)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* workaround transformers issue in version 4.47.0 (#2092)

* workaround transformers issue in version 4.47.0

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Refactor FP8 pytest script (#2089)

* Refactor FP8 pytest script

---------

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update ci scan scope

Signed-off-by: chensuyue <suyue.chen@intel.com>

* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-213236] resolve CPU mem issue in CI (#76)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* recover pre-commit

Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix `is_sharded` setting for loading quant model (#2094)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* fix error message for different python version (#2099)

Signed-off-by: changwangss <changwang@habana.ai>

* fix UT of RTN on HPU (#2098)

Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix device issue during calibration (#2100)

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix woq example and update document for v1.19.0 (#2097)

Signed-off-by: xin3he <xin3.he@intel.com>

* Refactor version import paths to common module (#2095)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update CI gaudi-docker to 1.19.0 (#2096)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix device mapping issue of llama gptq (#2101)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Doc: Update readme.md (#2083)

Signed-off-by: fengding <feng1.ding@intel.com>

* update publication_list.md (#2105)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* [pre-commit.ci] pre-commit autoupdate (#2107)

Signed-off-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Update publication list with new blog (#2111)

* Update publication list with new blog

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Update publication list num

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

---------

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update the License (#2108)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Add autotune support for PT2E (#2110)

Add autotune support for PT2E and disable some conv1d-related test on HPU
---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>

* Add intel-extension-for-pytorch to Transformers-like API requirements (#2113)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Add VLM quantization & loading into transformers-like API (#2116)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Fix hf_device_map setting for transformers-like api (#2122)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add mapping entry to 1.20 (#2126)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* fix bug of lwq gtpq (#2128)

Signed-off-by: n1ck-guo <heng.guo@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add xfail for onnx test_layer_wise.py (#2129)

* Cherry pick Habana software v1.20.0  (#2123)

* [SW-210525] release HPU memory when loading neural_magic fp8 models (#48)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-211178] save generation_config when saving model if exists (#57)

* [SW-211178] save generation_config when saving model if exists

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-210543] update gitignore to simplify the git message (#50)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-205334][SW-187731] llama70b vLLM fix graph breaks with  torch.compile (#67)

* fix graph breaks with torch.compile

* remove orig_mod from helper_modules

* fix typos

* fix test_register_apis

---------

Co-authored-by: Rafal Litka <rlitka@habana.ai>

* [SW-213890] Disable test_two_step_layer_wise temporarily (#84)

* [SW-205437] - Support LM-HEAD patching (#79)

* [SW-205437] - Support LM-HEAD patching

* fix CR comments

* Enhance and rename fix_measurements tool to postprocessing_vllm_measurements (#82)

* [SW-214088] Fix graph break caused by PatchedMixtralMoE (#74)

* [SW-208528] Support FP8 per channel Q/DQ (#13)

* add per channel qdq support

Signed-off-by: changwang <changwang@habana.ai>

* improve ut

Signed-off-by: changwang <changwang@habana.ai>

* improve get_scale_dtype func and qdq init

Signed-off-by: changwangss <changwang@habana.ai>

* improve DequantOutput QuantInput init

Signed-off-by: changwangss <changwang@habana.ai>

* add scale_method improve PCQ

Signed-off-by: changwangss <changwang@habana.ai>

* remove scale name

Signed-off-by: changwangss <changwang@habana.ai>

* fix PCQ scale_inv expanding

Signed-off-by: changwangss <changwang@habana.ai>

* merge the qdq_per_channel, qdq_per_tensor to qdq

Signed-off-by: changwangss <changwang@habana.ai>

* move scale_inv change to the QuantInput init

Signed-off-by: changwangss <changwang@habana.ai>

* remove scale_dtype list check

Signed-off-by: changwangss <changwang@habana.ai>

* fix missing axis parameter

Signed-off-by: changwangss <changwang@habana.ai>

---------

Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: changwangss <changwang@habana.ai>

* [SW-204341] explicit scale format for ops (#73)

* [SW-204341] explicit scale format for ops

Added wrapper around fp8 functions

Wrapper decides which flavor of the function to call,
according to scale format

Helper modules call the wrapper

Decide which cast flavor to call,
according to scale format

* [SW-204341] Adjust softmax API , remove commented-out code

* [SW-204341] Fixes from CR 1

* [SW-204341] Fixed CR 2

* [SW-204341] add missing arg is fsdpa

Signed-off-by: Uri Livne <ulivne@habana.ai>

* [SW-204341] Enhance SDPA for measure and quant

* [SW-204341] remove sdpa quantized ops

* reland per op class with more enhancements

* [SW-204341] reland specific arguments, rename class to wrapper

* added call with self in patched lm head

rebased on top of master next
force push

* fix mistake in conflict resolution

restore MethodType fix

* another fix

* modified fp8 matmul test to test quantized matmul func

* another fix of rebase mistake

* hopefully last rebase mistake fix

* restore backward compatibility import protection

---------

Signed-off-by: Uri Livne <ulivne@habana.ai>

* [SW-213890] Revert "[SW-213890] Disable test_two_step_layer_wise temporarily (#84)" (#86)

This reverts commit 27162ae60755aa39a40ac1a1c96a9efaac48751b.

* Revert "[SW-205334][SW-187731] llama70b vLLM fix graph breaks with  torch.com…" (#87)

This reverts commit 01a57343d34d8d0c63935b848bdd94d9dfd40210.

Co-authored-by: Danny Semiat <dsemiat@habana.ai>

* [ALGO-809] PatchedLmHeadLinearAllreduce: replacing the sharding code with the one from deepspeed-fork (#85)

Change-Id: Icb9670cfefdd1880c1ebb9a804a97c9ba79ecdc3

Co-authored-by: smarkovichgolan <smarkovich@habana.ai>

* fix bug of FusedMoE object has no attribute w13_weight (#94)

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

* [SW-208588] Add HPU fp8 Dynamic MOE (#88)

* [SW-208588] Add HPU fp8 Dynamic MOE

* fix review comments

* fix more review comments

* fix comments

* fix tests

* minor config fixes (#96)

* [SW-0] minor cosmetic fixes in quant_config

* remove hooks

* [SW-196641] - Fix type mismatch in linear quantization unit tests (#99)

* [SW-196641] - Fix type mismatch in linear quantization unit tests

* fix atol value

* add hp_dtype to fp8 config dict before parsing

* [SW-214785] Apply PatchedModuleBase for all existing PatchedModules (#92)

* [SW-214785] Apply PatchedModuleBase for all existing PatchedModules

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-215319] threshold of memory usage in test_block_wise.py is too tight (#100)

* [SW-215543] Revert "minor config fixes (#96)" (#104)

This reverts commit fa40142f32404691ea2ae36d13a15c348c853bde.

* fix RowParalleLinear func names from string to tuple (#106)

* [SW-215615] memory is unreleased during loading neural_magic models on multi-cards (#105)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-212423] RuntimeError when load the gptq model from HF (#70)

* [SW-212423] RuntimeError when load the gptq model from HF
* skip tie_word_embeddings=False

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-214785] fix issue when self._mod_extra_config is None (#108)

* [SW-211826] [example] demonstrate layer-wise, block-wise and lm_eval usage (#66)

* [SW-211826] [example] demonstrate layer-wise&block-wise usage to quantize LLM with limited host&device memory

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-215295] Force single object from quantized func wrapper classes (#103)

* [SW-215295] Force single object from quantized func wrapper classes

* Modify the factory object to be cleared after module patching

* Move cleanup to Quantizer object

* [SW-216292]Minor update for lm-eval (#113)

* Enable lm-eval 0.4.2 and expose `add_bos_token`

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* [SW-209207] add vllm fp8 dynamic MoE (#116)

* [SW-216239] Align Softmax fp8 scale calc with configuration (#112)

* [SW-217321] Skip auto round tests (#119) (#125)

* Test Commit

* [SW-217321] Skip auto round tests due to CI breakage

* remove unneeded print

* [SW-207451] Implement block-wise calibration for LLM (#24)

For LLMs, measurement in bf16 requires a large amount of HPU memory.
This change makes it possible to measure bf16 llama-405b on 8 Gaudi2 cards, or llama-70b on 1 Gaudi card.
Limitation: the lm_head layer cannot be measured; this may be enhanced later.

---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-197077] fix bug in output arbitrary scales (#45)

* [SW-197077] fix bug

* [SW-197077] fix bug in outputs arbitrary scales

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-197077] fix bug in output arbitrary scales (#45)

* [SW-197077] fix bug

* [SW-197077] fix bug in outputs arbitrary scales

* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54) (#77)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-213236] resolve CPU mem issue in CI (#76) (#83)

Cherry-pick from 1.19
Co-authored-by: Xin He <xin3.he@intel.com>

* [SW-213368] requirements_pt.txt: allow newer pydantic versions to >= 1.10.13 (#80)

* requirements_pt.txt: upgrade pydantic version to >= 2.0.0

* allow newer version of pydantic

newer deepspeed uses pydantic v2, which has slightly different APIs.

* Update requirements_pt.txt

* [SW-212057] Enable scalar scale to support QDQ (#98)

* [SW-212057] Enable scalar scale to support QDQ

Change-Id: Ib5f5accd7a770675609e91c18bd04497b15937c5

* PR comment fixes

Change-Id: I01be41c29721b8d59c887f3d2b4e3cef8433331c
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-215845] Run some unit tests from top level API (#109)

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-212629] Support saving weight-only quantization INT4 model in Hugging Face format (#101)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-205970] update state_dict to save scalar scales (#6)

* update state_dict method in save/load function

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* Revert "[SW-205970] update state_dict to save scalar scales (#6)" (#114)

This reverts commit ffcb97eaf7965bf8ac942523660a6d5e9e8e2184.

* [SW-212092] Save vllm compatible format (#102)

* save vllm compatible format

Signed-off-by: changwangss <changwang@habana.ai>

* add assertion and make max_file_size human-readable

Signed-off-by: changwangss <changwang@habana.ai>

* support default the same with huggingface when saving

Signed-off-by: changwangss <changwang@habana.ai>

* separate save function for single device and multi devices.

Signed-off-by: changwangss <changwang@habana.ai>

* rebase

Signed-off-by: changwangss <changwang@habana.ai>

* rebase save

Signed-off-by: changwangss <changwang@habana.ai>

* remove weight and scale convert on G2

Signed-off-by: changwangss <changwang@habana.ai>

* rebase master_next due to revert #6

Signed-off-by: changwangss <changwang@habana.ai>

* improve convert weight to vllm compatible function

Signed-off-by: changwangss <changwang@habana.ai>

* replace print to logger

Signed-off-by: changwangss <changwang@habana.ai>

* move unit_mapping to common utils

Signed-off-by: changwangss <changwang@habana.ai>

---------

Signed-off-by: changwangss <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-205970] update state_dict to save scalar scales (#115)

* [SW-205970] update state_dict to save scalar scales (#6)

* update state_dict method in save/load function

* support mixtral
---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-215009] support loading per-channel scales (#95)

* [SW-215009] support loading per-channel scales

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix UT

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* Refactoring scales (#22) (#122)

* Refactoring scales (#22)

* [SW-197077] refactoring maxabs scales and adding arbitrary scales.

* [SW-199696] Supporting Dynamic Quantization (#128)

* Calculating dynamic scales using nn.Modules

Change-Id: I8c344ae737803b39117037edaaa3d3b9cbd09f30

* [SW-199696] Supporting Dynamic Quantization

Change-Id: Ic5d6f04ec0b5032ac305e1b3097747c47250385b

* Code cleanup

Change-Id: I213bc7438e06bd1002775066bfb0dc6f10e8a84a

* Review changes and model print issue (circular dependency fix)

Change-Id: I5c41d2f9a937416ce260f55cb045c86858dd201a

* removed debug code from patching_common.py

* Round 2 + CI import issue

Change-Id: I27dbb33de8e027fb0b726336b38156b5d23a6896
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-217334] enable fp8 qdq mode using PatchedModuleBase (#129)

* [SW-217334] enable fp8 qdq mode using PatchedModuleBase

* fix review commnets

* [SW-218871] fp8 multi-cards is not loaded correctly (#138)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* Fix bug in mixtral unitscale (#141)

* [SW-218197] fix bug in Mixtral unitscale

* [SW-218197] fix bug in Mixtral unitscale

* update version to 3.3 for release

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-20808] Make sure save&load format is an Enum object (#58)

* [SW-20808] Make sure save&load format is an Enum object

Signed-off-by: Xin He <xinhe3@habana.ai>

* Update save_load_entry.py

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add xfail for torchvision

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix ILITV-3859

Signed-off-by: xin3he <xin3.he@intel.com>

* workaround for ILITV-3858

Signed-off-by: xin3he <xin3.he@intel.com>

* fix sdxl_smooth_quant

Signed-off-by: xin3he <xin3.he@intel.com>

* fix ILITV-3854

Signed-off-by: xin3he <xin3.he@intel.com>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: changwangss <changwang@habana.ai>
Signed-off-by: Uri Livne <ulivne@habana.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: RafLit <rafal.litka@intel.com>
Co-authored-by: Rafal Litka <rlitka@habana.ai>
Co-authored-by: Dany Kiazada <141814181+kiazada@users.noreply.github.com>
Co-authored-by: Nir David <124874956+nirda7@users.noreply.github.com>
Co-authored-by: Yuwen Zhou <yuwen.zhou@intel.com>
Co-authored-by: Wang, Chang <changwang@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Oz Abramovich <oabramovich@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Nadav Elyahu <88962733+nelyahu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Doc: Update fp8 accuracy test data and update docker image 1.20.0 (#2130)

Signed-off-by: fengding <feng1.ding@intel.com>

* [SW-219274] - Changing the quant method name in lm-head (#150) (#2132)

* [SW-219274] - Changing the quant method name in lm-head (#150)

* Update helper_modules.py

---------

Co-authored-by: Nir David <124874956+nirda7@users.noreply.github.com>

* Adapt ipex xpu transformers version (#2134)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add phi3 vlm  transformers example (#2135)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* fix saving issue for group_size=-1 (#2138)

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add v3.3 release faq (#2139)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* bump release version into 3.3 (#2140)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* upgrade numpy from 1.23.5 to 1.26.4 (#2115)

Signed-off-by: xin3he <xin3.he@intel.com>

* Update publications (#2145)

* update publications

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Add transformers to align onnxruntime-extensions=1.14.0 (#2147)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Freeze 2x package versions (#2151)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix vulnerability (#2149)

Signed-off-by: xin3he <xin3.he@intel.com>

* Bump into v3.3.1 (#2152)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* [SW-218939] fix memory mapping failure in UT (#2154)

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-223106] change code with robust implementation (#2153)

* update habana docker and PyTorch and related packages to latest versions (#2158)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* remove mxnet (#2146)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* Workaround of [SW-208658] (#2162)

Signed-off-by: Xin He <xinhe3@habana.ai>

* compatible with transformers version 4.50 (#2159)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix NotImplementedError: get_type is not implemented (#2133)

Signed-off-by: Xin He <xinhe3@habana.ai>

* improve 3x ut on bf16 supported machine (#2163)

Signed-off-by: changwangss <chang1.wang@intel.com>

* remove numpy version limitation (#2161)

Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] pre-commit autoupdate (#2166)

* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pycqa/isort: 5.13.2 → 6.0.1](https://github.com/pycqa/isort/compare/5.13.2...6.0.1)
- [github.com/psf/black.git: 24.10.0 → 25.1.0](https://github.com/psf/black.git/compare/24.10.0...25.1.0)
- [github.com/codespell-project/codespell: v2.3.0 → v2.4.1](https://github.com/codespell-project/codespell/compare/v2.3.0...v2.4.1)
- [github.com/astral-sh/ruff-pre-commit: v0.8.6 → v0.11.4](https://github.com/astral-sh/ruff-pre-commit/compare/v0.8.6...v0.11.4)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add revision for hf-internal-testing/tiny-random-gptj in UT (#2174)

Signed-off-by: changwangss <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* suit transformers>=4.51 (#2171)

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add the missed change during cherry-pick (#2175)

* Reset `accelerator` when `INC_TARGET_DEVICE` is set in code (#2168)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Fix text-to-image example (#2176)

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

* restrict lm_eval version <= 0.4.7 (#2177)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Add `update_g_idx` flag for setting qweight&g_idx (#2143)



Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* [SW-219134] dynamically remove  2x content (#2173)

* [SW-219134] dynamically remove  2x content

Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix(pytorch): Rename layer_scale parameter to avoid quantization error (#2172)

* fix(pytorch): Rename layer_scale parameter to avoid quantization error

---------

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: V-E-D <vedantthote2019@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>

* Adapt autoround v0.5 (#2187)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update transformers version for VLM example (#2181)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update export function for torch 2.6 (#2184)

Signed-off-by: Yi Liu <yiliu4@habana.ai>

* add mapping between v3.4 and v1.21 (#2189)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Add continueOnError to avoid baseline stages block CI (#2188)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Revert "[SW-219134] dynamically remove 2x content (#2173)" (#2183)

* Add `add_bos_token` for Llama3 evaluation (#2179)

* add_bos_token for llama3

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update code

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix security issue (#2194)

* fix security issue

Signed-off-by: Xin He <xinhe3@habana.ai>

* update pub list with llama4 blog (#2190)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Fix WoQ loading for Large Model (#2200)

* fix save/load

Change-Id: I36d1ca10426b65abfe2093091f41906cecf2fcaf
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* fix check

Change-Id: I2cc647ed3ecc7ff499206a3d79e0f71e25605f7e
Signed-off-by: Yi Liu <yiliu4@habana.ai>

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* add qwen3 blog to pub list (#2201)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* replace XPU with Intel GPU (#2198)

* replace XPU with Intel GPU

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* Revert "remove 1x docs (#1900)" (#2205)

This reverts commit d3204604aad007f3db67c46dcb0575aa8f5cd584.

* Deprecate 2x Tensorflow, Keras and ONNX (#2199)

Signed-off-by: Xin He <xinhe3@habana.ai>

* Fix `g_idx` init in transformers-like API (#2204)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* [SW-205334][SW-187731] Reintroduce PR #67 with PatchedKVCache fix of [SW-214296] (#97)

* [SW-205334] wrap calls to original module methods

* add wrap for fetch_from_cache

* fix PatchedKVCache test, revert the PatchedModule
Base

* PatchedParallelLMHead switch orig_linear_apply

* fix PatchedKVCache

---------

Co-authored-by: Rafal Litka <rlitka@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-218081] temporarily disable fp8_static_quant test (#131)

* [SW-218081] temporarily disable fp8_static_quant test

* skip test block wise measurements

* dummy commit to re-trigger CI

* [FSW-12066] Add initial multi device support (#91)

* [FSW-12066] Multiple PT Bridge support

Added abstraction for per device quantized func wrapper

* [SW-12066] Add quantized func wrapper for multi-device

Removed hpu code from quant_dequant
add unit test

* TEMP commit

* Further adjustments

Rebased above latest master_next
added INC device enum (instead of using device-specific enums)
removed unnecessary device-specific imports
Moved device-specific imports to internal scopes, or wrapped with try-except

* Add fixes after running guadi tests

* added __init__ files and removed unneeded __init__ function

* Call directly to get_quantized_func_wrapper_object

and remove old hpu ops file

* More fixes for xpu tests to work

added import xpu_modules
device for scale calculation
added file headers

* adjustments for dynamic MOE vllm

* Adjustments after rebase of scale refactoring

* Rebase for 1.21, call in load API, refine factory class

* rebase from master_next 8_2_25
* add init/clear calls in load api
* refine factory class impl, use class methods and members only
* refine func wrapper init file, use only api functions

* Fixes after CR 10_02

* More Fixes after CR

* Compare INC Accelerator enum value

* fix var name

* Added Matmul OP type

* Rename gemm and matmul op types

Signed-off-by: Xin He <xinhe3@habana.ai>

* [FSW-12066] small fixes in xpu quantized func (#145)

* [SW-218197] fix bug in Mixtral unitscale (#139)

Signed-off-by: Xin He <xinhe3@habana.ai>

* [ALGO-808] add support for int4 weights + fp8 activations - phase 1 (#43)

* [ALGO-808] add support for int4 weights + fp8 activations - phase 1

* Add code for quantizing only single input to PatchedMatmul

* w4a8 new kernel

---------

Co-authored-by: Tomer Gafni <tgafni@habana.ai>

* [SW-218081] Re-enable tests (#140)

* [SW-214378] remove creation of nc_workspace in each INC run (#151)

* [SW-214378] remove creation of nc_workspace in each INC run

* remove the commented line

* [SW-219274] - Fix getting error log when lm_head in vLLM does not have measurements for scale calculation (#152)

* [SW-207602] INC to Support fp8 communication in PatchedRowParallelLinear (#1)

This commit modifies the PatchedRowParallelLinear collective func to a custom
all_reduce function that:
- During measurement: measures the all_reduce output and the matmul_fp8 maximum
output.
- During quantization: quantizes the all_gather and all_to_all ops inside the all_reduce
func, as they are performed in fp8.
Adds a branch in reduce_forward_quant so the fp8 optimization is done only at the decode phase

---------

Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-207602] Fix bug with PatchedRowParallelLinear (#158)

* [SW-219745] fix fp8 GaudiMixtralSparseMoeBlock graph break (#161)

* Blockwise gptq (#155)

* Add blockwise quantization for GPTQ

* sharded checkpoint additions

* CR fixes

* CR fixes #2

* fix error caused in CI

* update safetensors requirements file to support safetensors hpu support

Signed-off-by: Xin He <xinhe3@habana.ai>

* Raise error when measuring PC without shapes (#163)

* Raise error when measuring PC without shapes

* Update measure.py

* [SW-218303] Fix incorrect bias addition point in PatchedColumnParallelLinear (#160)

* Correct `PatchedVLLMKVCache` to measure the whole input (#170)

Change-Id: I41a07985d602936e5d6c4f25a061a009bc251253

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* [SW-218484] Enhance log for saving (#134)

* [SW-218484] Enhance log for saving

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-222320] Optimize code for TPC fuser in dynamic quantization (#173)

* [SW-221372] allow running Mixtral measurement phase using torch.compile (#167)

Co-authored-by: Ivan Antonov <Iantonov@habana.ai>

* [SW-222366] Switch tests to lazy mode (#174)

* [SW-222366] Move env default to init (#178)

* [SW-223106] Temporary disable mixed precision test (#180)

* fp8 aware gptq (hybrid gptq) (#154)

* fp8 aware gptq (hybrid gptq)

* review1

* loading bias to mixed low precision

* fixing tests for fp8 aware quantization and hybrid re-ordering

* Addressed second review round comments

* Addressed review 3 comments

---------

Co-authored-by: Asaf Karnieli <akarnieli@habana.ai>

* [SW-222513] OSError: does not appear to have a file named generation_config.json (#181)

* [SW-222513] OSError: does not appear to have a file named generation_config.json

* Update save_load.py

* Revert "fp8 aware gptq (hybrid gptq) (#154)" (#184)

This reverts commit 050dc44424debaa659f8dd58b4a4ec6bc9af7d68.

* [SW-221589] AutoRound W4A8 Quantization and Loading (#110)

- Quantize model to W4A8 using auto-round
- Loading W4A8 model

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Asaf Karnieli <akarnieli@habana.ai>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>

* [SW-214855] - Set scale attributes in INC to reduce graph recompilations (#162)

* [SW-219831] - Set scale attributes in INC to reduce graph recompilation

* add scaling methods ids

* fix scaling method ids check and set

* enable feature also for Load QuantMode

* move scale tensors to cpu when feature is enabled

* fix scaling methods ids to start at 1

* fix cr comments

* remove unnecessary imports

* fix cr comments

* fix more cr comments

* fix cr comments

* move scale to float on cpu in scale handler for dynamic scaling

* fix cr comments

* Add unit test

* fix sending scale tensor to bridge and unit-test bug

* [SW-222220] Quantising Llama3.2. 11B/90B fails with GC error (#179)

* refine PatchVLLMKVCache

* move cache out of args

* revert option2

* add get_cache

* Revert "add get_cache"

This reverts commit a89d9d23810ce594743504fea4bc5cd49e8d4192.

* Revert "revert option2"

This reverts commit d2b124c1d30717baf482eb887ba5ab3cb09ac51d.

* add comments

* update comment

* Dummy commit for triggering CI

* Dummy commit for triggering CI

* [SW-216623] Restore patch module to original before convert (#185)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-218081] move htcore.hpu_set_env() to conftest (#146)

* [SW-218081] move htcore.hpu_set_env() to conftest

Signed-off-by: Xin He <xinhe3@habana.ai>

* Update conftest.py

* use htcore.hpu_set_inference_env()

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-221588] Unify weights in multi-cards for HF/INC format (N->M) (#172)

* Unify weights in multi-cards for HF/INC format (N->M)

M, the number of cards for loading, is less than N and divides N evenly
--------------------------------------------------------------------------------------------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
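The N->M unification above can be sketched in isolation with plain lists standing in for per-card weight shards (names are illustrative, not INC's API):

```python
# Hedged sketch of unifying N per-card weight shards onto M cards,
# assuming M divides N. Plain lists stand in for real weight tensors.

def unify_shards(shards, m):
    """Merge N per-card shards into M groups by concatenation."""
    n = len(shards)
    if n % m != 0:
        raise ValueError(f"cannot map {n} shards onto {m} cards")
    step = n // m
    # Each target card receives `step` consecutive source shards.
    return [sum(shards[i * step:(i + 1) * step], []) for i in range(m)]

# Example: 4 source cards unified onto 2 target cards.
shards = [[0, 1], [2, 3], [4, 5], [6, 7]]
print(unify_shards(shards, 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Real weights would be concatenated along the sharding dimension, but the grouping logic is the same.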

* [SW-221594]Re-quantize the Official DeepSeek FP8 Model (#187)

Building on the vllm WoQ path, this PR adds support for re-quantizing FP8 weights w/ per-tensor or per-channel scaling.

---------

Co-authored-by: Yi Liu <yiliu4@habana.ai>

* [SW-218277]Add support for mixtral with expert parallelism (#177)

* Add support for mixtral with expert parallelism

* Remove allreduce from measurement

* Update PatchedVLLMKVCache for deepseek performance (#194)

Co-authored-by: Linoy Buchnik <linoybu@gmail.com>

* [SW-224836] disable test_mixed_precision_gptq_fp8_quant_only_nlp (#208)

Co-authored-by: linoy buchnik <lbuchnik@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* Fix `PatchedMoeMatmul` and Get `num_experts` from Module (#202)

Fix `PatchedMoeMatmul` and Get `num_experts` from Module

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* add back missing changes in cherry-pick

Signed-off-by: Xin He <xinhe3@habana.ai>

* add docstring and fix typo

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix CI failure caused by internal changes

Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* support update config after initialization (#2191)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* add preprocess_quant_config to collect common code (#2192)

* add preprocess_quant_config to collect common code

Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* skip fp8 xpu path in CI

Signed-off-by: Xin He <xinhe3@habana.ai>

* remove htcore.hpu_inference_set_env to suit 1.20

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix typo

Signed-off-by: Xin He <xinhe3@habana.ai>

* support ComposableConfig setattr

Signed-off-by: Xin He <xinhe3@habana.ai>

* workaround for v1.20 missing attribution

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix bug in previous UT

Signed-off-by: Xin He <xinhe3@habana.ai>

* remove experimental related document and code (#2207)

* remove experimental

Signed-off-by: Xin He <xinhe3@habana.ai>

* remove docs (#2206)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>

* Add add_bos_token for Llama3(Instruct) xpu evaluation (#2209)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update habana docker image version and DeepSpeed dependency to 1.21.0 (#2213)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>

* add INC_PT_ONLY and INC_TF_ONLY (#2202)

* add INC_PT_ONLY and INC_TF_ONLY
* compatible with previous install method

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* Bump release version into v3.4 (#2214)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update torch to 2.7.0 in CI (#2216)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Fix VLM model ut (#2218)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Add transformers version limit for ut case (#2219)

Signed-off-by: changwa1 <chang1.wang@intel.com>

* Update `--int8` flag to `--optimized` flag (#2215)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Fix g_idx init for GPTQ (#2222)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>

* Fix CI env(#2225)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Bump release version to v3.4.1 (#2224)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Freeze IPEX version for INT8 SQ support (#2221)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* remove unused examples (#2230)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* [pre-commit.ci] pre-commit autoupdate (#2233)

Signed-off-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Bump transformers (#2234)

Bumps [transformers](https://github.com/huggingface/transformers) from 4.38.0 to 4.51.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.38.0...v4.51.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.51.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* update pytorch int8 notebook example (#2235)

* Adapt transformers v4.53.1 (#2237)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Support saving inc model for transformers-like api  (#2231)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Bump transformers from 4.35 to 4.52.1 (#2236)

Bumps [transformers](https://github.com/huggingface/transformers) from 4.35 to 4.52.1.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.35.0...v4.52.1)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.52.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump transformers from 4.51.0 to 4.52.1 (#2243)

Bumps [transformers](https://github.com/huggingface/transformers) from 4.51.0 to 4.52.1.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.51.0...v4.52.1)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.52.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix Mix Precision Example (#2242)

Signed-off-by: Yi Liu <yiliu4@habana.ai>

* Bump transformers to 4.52.1 (#2244)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* [SW-207602] Fix all_reduce scale format in PatchedRowParallelLinear (#183)

* [SW-199696] Added PatchedLinearBase (#192)

* Added PatchedLinearBase

* Fixed PatchedLinear forward_qdq

* Changed quant strategy - scale to fix ci

* Renamed QuantStrategy to QuantWrapper

* Removed instance member from QuantWrapper

* [SW-224403] Added ticket and throwing error when using row_parallel_linear_allreduce_quantization

* Changed QuantWrapper to a simple method that stores scale

* [SW-224538] Added ticket to TODO comment for init_linear

* Pushed requires_grad to the tensor creation

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Moved copy_scale functions inside PatchedLinearBase

* Update helper_modules.py

* Update helper_modules.py

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Update helper_modules.py

* Update helper_modules.py

* Update helper_modules.py copy scale

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* [SW-199696] Implementing dynamic quantization design for linear ops (#188)

* Implementing dynamic quantization design for linear ops

* Using copy_ to store scale as a member, added qdq, removed dyn

* Added PatchedLinearBase to support all linear modules

* Testing dynamic quantization with scale compare

* CR comments - calling cguid

* Added PatchedLinearBase

* Fixed PatchedLinear forward_qdq

* Changed quant strategy - scale to fix ci

* Renamed QuantStrategy to QuantWrapper

* Removed instance member from QuantWrapper

* [SW-224403] Added ticket and throwing error when using row_parallel_linear_allreduce_quantization

* Changed QuantWrapper to a simple method that stores scale

* [SW-224538] Added ticket to TODO comment for init_linear

* Pushed requires_grad to the tensor creation

* Fixed merge

* Fixed load() flow - handling meta tensors with dummy scale

* [SW-224609] removed non tested dynamic qdq

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Moved copy_scale functions inside PatchedLinearBase

* Added and fixed test cases

* Increased tolerance for new test cases

* Update helper_modules.py

* Update helper_modules.py

* Some tests/ci fixes

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Update helper_modules.py

* cr comments + cguid check change

* Update helper_modules.py

* Update helper_modules.py copy scale

* Update neural_compressor/torch/algorithms/fp8_quant/_quant_common/helper_modules.py

* Maxabs design and some structure changes

* Merged MaxAbsDynamicPts To base + cguid comments

* changed cguid calls to functions

* Log changes

* Update neural_compressor/torch/algorithms/fp8_quant/model_configs.py

* Update neural_compressor/torch/algorithms/fp8_quant/model_configs.py

* Re-set self.scale_input as before, value is none in dynamic

* Changing back dynamic scale_input to intermediate and not member

* Disabling test_linear_dynamic_quantization: not storing scale as member

* Reintroduce MaxAbsDynamicPts: in dynamic we don't save scale as a member

* weight to hpu comment

* Fix incorrect condition in PatchedMatmul (#204)

* Fixing vllm runs for dynamic quantization (#210)

* Revert "Revert "fp8 aware gptq (hybrid gptq) and fix performance drop of W4A16 scheme (#190)

* Revert "Revert "fp8 aware gptq (hybrid gptq) and fix performance drop in gptq test (SW-223441)"

This reverts commit ba9475d7599bd1bb43fa978eef60a655b77e44dd.

* addressing reviewer comments

* Temporarily disable rel_err test until fixed

* fixed pytest error

---------

Co-authored-by: Asaf Karnieli <akarnieli@habana.ai>
Co-authored-by: Mariusz Okroj <mariusz.okroj@intel.com>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Signed-off-by: Xin He <xinhe3@habana.ai>

* enable bf16 h2d scales for dynamic scaling (#215)

* [SW-197607] INC- change hard coded gaudi 2 scales for optimal weight … (#221)

* [SW-197607] INC- change hard coded gaudi 2 scales for optimal weight quantization

* cr fix

* [SW-217321] Add autoround UTs Back (#197)

* add autoround UTs back

Change-Id: I0614ffd8be4f89e9787037ee99e24a60f8548b49

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-225078] [INC][DynamicQuant] Reenable testing dynamic quantization… (#214)

* [SW-225078] [INC][DynamicQuant] Reenable testing dynamic quantization scales on hpu graphs and torch.compile

* CR fixes

* tiny fix

* cr fix

* don't support running _quant_only_scale_methods with dynamic quantization

* string check fix

* fix test_matmul runs and atol in HW_ALIGNED_SINGLE_SCALE

* string fixes

* [SW-227504] disabled test (#226)

* [FSW-13914] Fix gaudi specific code in common location (#224)

Move Gaudi specific code to internal scopes, so it won't
be imported in FS/JS env

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-224874] Implement support for hp/lp dtypes in KV-cache QDQ (#222)

* [SW-225858] Fix unit_scale issue with unquantized modules (#225)

* [SW-223055] Cleanup fetch_from_cache (#229)

* [SW-226788] Fixed handling RowParallelLinear to fix accuracy (#233)

* [SW-226788] Fixed handling RowParallelLinear to fix accuracy

* Changed supported ops to op types

* dynamic quant check changes in quantize.py

* Fixed dynamic quant check changes in quantize.py

* [SW-228966] add codeowners to github (#230)

* fix revision bug when loading from huggingface hub (#235)

* fix revision bug when loading from huggingface hub
---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-228061] define ScaleMethod by its properties in INC FP8 (#227)

* [SW-228061] define ScaleMethod in a better way

* bug fixes and new scale_method_config file

* copilot cr

* tiny fix

* rebase

* some cr fixes

* bug+cr fixes

* cr fix

* cr fix

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-229653] disable fakequant test (#236)

* [SW-226948] Removed redundant cast to hp float in invert scale + some function commentary (#237)

* [SW-218668] add use_mmap in example for llama-70b GPTQ  (#234)

* [SW-218668] add use_mmap in example for llama-70b GPTQ

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-229704] Disable scale rounding CGUID + using exp2 for fuser (#240)

* [SW-230359] Support vector of scales on cpu (#241)

and minor refactor of create_scale_tensor function

* Deepseek FP8 fixes (#242)

* Align MoE API, switch to local_num_experts

* retrigger checks

* retrigger checks

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-229825] support fp32 softmax mode in fp8_fsdpa (#247)

* [SW-199936] Remove collective_func usage as preparation for vLLM upstream (#249)

self.collective_func is not allowed for upstream,
therefore we explicitly use vLLM collective functions,
while protecting against import errors and circular imports

* [SW-219751]improve vllm compatible save function (#217)

* improve vllm compatible save to avoid OOM
---------

Signed-off-by: changwangss <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-232491] Fix collective functions from vLLM (#250)

* [SW-228539] Support setting scale method per node in INC FP8 (#238)

* [SW-228539] Support setting scale method per node in INC FP8

* scale method parser

* tiny fix

* apply all scale methods validations in config parsing correctly

* tiny test fix

* copilot cr

* cr fixes

* cr fix

* Python 3.10 does not support StrEnum

* fix for enum

* remove not needed imports

* fix hard coded path

* bug fix in new scale method test

* save scale_method with provided path

* small fix

* retrigger

* retrigger 2

* [SW-214269] support g_idx for uint4 (#246)

* support g_idx for uint4


---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* [SW-228570] support FP8 GaudiFluxPipeline save and load (#254)

* [SW-228570] support FP8 GaudiFluxPipeline save and load
---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-224612] Set scale calculation to run on cguid as default in dynamic quant (#257)

* [SW-224612] Set scale calculation to run on cguid as default

* Update common.py

* CGUID calculation only in dynamic

---------

Co-authored-by: Danny <dsemiat@habana.ai>

* [SW-228576] Add Dynamic Quant Support For FusedMoE (#243)


Signed-off-by: Yi Liu <yiliu4@habana.ai>

* [SW-230951] Save measurements according to samples counter (#251)

* Added post-forward hook to dump measurements according to samples counter 
* add support for samples counter in config
* removed function in RowParallelLinear as it is removed from the vllm upstream code
* currently only the blocking method is operational; async methods will be completed in a future commit



* fix CR comments

* remove unused files

* add resolve_input method

it can't be defined in vllm due to upstream considerations,
so it is copied here

* fixed logging according to cr

* fixed resolve_input and moved the hook function
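A framework-free sketch of the samples-counter dump logic (class and method names are illustrative, not the actual INC hook API):

```python
# Hedged sketch: a post-forward hook that flushes accumulated
# measurements every `dump_every` samples, in the spirit of the
# change above. Names here are illustrative, not INC's API.

class MeasurementDumper:
    def __init__(self, dump_every):
        self.dump_every = dump_every
        self.seen = 0
        self.buffer = []
        self.dumps = []  # stands in for files written to disk

    def post_forward_hook(self, measurement):
        """Record one sample; flush when the counter hits the threshold."""
        self.buffer.append(measurement)
        self.seen += 1
        if self.seen % self.dump_every == 0:
            self.dumps.append(list(self.buffer))
            self.buffer.clear()

dumper = MeasurementDumper(dump_every=2)
for sample in [0.5, 1.5, 0.25]:
    dumper.post_forward_hook(sample)
print(len(dumper.dumps))  # one flush after two samples; the third stays buffered
```

In the real flow this would be registered as a module forward hook; only the counting and blocking flush are shown here.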

* [SW-230641] Remove smoothquant related scale methods (#258)

* Rename model in UT to reduce CI effort (#245)

* rename model in UT to reduce CI effort
---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-233731] Support FP8 QDQ quant on CPU (#239)

supported module types: Linear, Conv2D, EmbeddingBag (weight-only quant)
validated scheme: per-tensor, sym, E4M3
validated model: DLRM, vit

---------

Signed-off-by: Mengni Wang <mewang@habana.ai>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Co-authored-by: Mengni Wang <mewang@habana.ai>

* [SW-233731] Use torchao op for CPU QDQ and abstract QDQ calling (#264)

Abstract QDQ calling
Fix QDQ model print issue
Use torchao op for CPU QDQ (HPU doesn't have this accuracy issue)
---------

Signed-off-by: Mengni Wang <mewang@habana.ai>
Co-authored-by: Mengni Wang <mewang@habana.ai>

* [SW-0] Update version to 3.5 (#269)

* [SW-234066] fix performance drop due to ordered g_idx (#277)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>

* update format after cherry_pick

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-230053] Merge with public

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-234066] update g_idx fix to make it robust (#281)

The previous fix only covered part of the flow, and the second flow is failing.
The current fix moves the relevant code to a common location that's shared by both flows.

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix kwargs mismatch issue in WOQ pack

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix cpu fp8 quant unittest (#2245)

* fix cpu fp8 quant unittest

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix requirement and recover skipped in UT  (#2248)

Fix requirement and recover skipped in UT
---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xin He <xin3.he@intel.com>

* Fix CPU FP8 per-tensor QDQ (#2246)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Co-authored-by: Xin He <xin3.he@intel.com>

* add version check for 1.22.0

Signed-off-by: Xin He <xinhe3@habana.ai>

* make version check robust

Signed-off-by: Xin He <xinhe3@habana.ai>

* Use SafeUnpickler (#2247)

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* better solution for checking g_idx support (#2251)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Replace all pickle load with safe load (#2252)

* replace all pickle load

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Update neural_compressor/utils/utility.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* add docstring

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Update utility.py

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
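A restricted unpickler of the kind these changes move toward can be sketched with the stdlib alone (the allowlist below is illustrative; the actual SafeUnpickler policy differs):

```python
import io
import pickle

# Hedged sketch of a restricted unpickler: only an explicit allowlist
# of globals may be resolved, so payloads that smuggle in arbitrary
# callables are rejected at load time.

_ALLOWED = {("builtins", "dict"), ("builtins", "list"), ("builtins", "set")}

class SafeUnpickler(pickle.Unpickler):
    """Unpickler that resolves only an explicit allowlist of globals."""

    def find_class(self, module, name):
        if (module, name) in _ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data):
    """Drop-in replacement for pickle.loads that rejects unknown globals."""
    return SafeUnpickler(io.BytesIO(data)).load()

# Plain containers round-trip; a pickled callable is rejected.
print(safe_loads(pickle.dumps({"scale": [1, 2, 3]})))  # {'scale': [1, 2, 3]}
```

Plain containers and numbers never touch `find_class`, so they pass; anything pickled by global reference (functions, classes) must be on the allowlist.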

* [SW-234750] Fix reading distributed data in quant_config (#284) (#2255)

Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>

* add mapping entry for v3.5 (#2250)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Fix autoround CI with amp (#2253)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add CPU FP8 QDQ doc (#2240)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* [SW-235565] Adapt the `FusedMoE` update (#286) (#2256)

* fix moe
* update copy properties

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* Add dlrm_v2 CPU FP8 QDQ example (#2239)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

* fix corrupted file

* fix CR

* fix merge

* fix cr

* fix cr

* fix cr

* fix

* fix

* [SW-240561]Requant LLMC FP8 model (#301)

* add VllmMixtureOfExpertsOpFP8PerChannel


Change-Id: I1e28dbb1f6a6839fe90db8a38da4532bd335d69e
Signed-off-by: Yi Liu <yiliu4@habana.ai>

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* pass extra args to moe (#302)

Change-Id: I47f5259a247bbce0c6290d1d1d1bb47071bd3256

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* fix

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: changwangss <changwang@habana.ai>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: fengding <feng1.ding@intel.com>
Signed-off-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Huang, Tai <tai.huang@intel.com>
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Uri Livne <ulivne@habana.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Daniel Socek <daniel.socek@intel.com>
Signed-off-by: V-E-D <vedantthote2019@gmail.com>
Signed-off-by: changwa1 <chang1.wang@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Mengni Wang <mewang@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Xin He <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Wang, Chang <changwang@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: feng-intel <110514170+feng-intel@users.noreply.github.com>
Co-authored-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: RafLit <rafal.litka@intel.com>
Co-authored-by: Rafal Litka <rlitka@habana.ai>
Co-authored-by: Dany Kiazada <141814181+kiazada@users.noreply.github.com>
Co-authored-by: Nir David <124874956+nirda7@users.noreply.github.com>
Co-authored-by: Yuwen Zhou <yuwen.zhou@intel.com>
Co-authored-by: Oz Abramovich <oabramovich@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Nadav Elyahu <88962733+nelyahu@users.noreply.github.com>
Co-authored-by: fengding <feng1.ding@intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Daniel Socek <daniel.socek@intel.com>
Co-authored-by: Vedant <146507396+ved1beta@users.noreply.github.com>
Co-authored-by: Asaf Karnieli <akarnieli@habana.ai>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Ivan Antonov <Iantonov@habana.ai>
Co-authored-by: Tomasz Bohutyn <tbohutyn@habana.ai>
Co-authored-by: Tomasz Szulist <72727299+tszulist-hbn@users.noreply.github.com>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Krzysztof Wiśniewski <kwisniewski@habana.ai>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mariusz Okroj <mariusz.okroj@intel.com>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>
Co-authored-by: Mengni Wang <mewang@habana.ai>