Cherry pick Habana software 1.18.0 update #2025
Conversation
commit 23fe77e
Author: Uri Livne <ulivne@habana.ai>
Date: Sun Aug 11 19:01:44 2024 +0300
[SW-193273] Merge from public github to gerrit
Merged from INC public master branch, top commit 7056720
Change-Id: I3c016ab98973ac56fc976e5b15a678e91a59291e
commit f02e9bd
Author: Asaf Karnieli <akarnieli@habana.ai>
Date: Tue Aug 13 15:23:33 2024 +0300
[ALGO-801] add additional mark_step in qdq due to difference in results
Change-Id: Ia7adaa70afb4f2990686fdb242d6a8f651fc2986
commit 775d5a2
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Sun Aug 11 09:52:04 2024 +0300
[SW-174155] Fix race condition bug when reading scales
Implement an inter-process reader-writer lock
Implement locking mechanism at save_file/load_file
(A minimal illustration of this locking approach follows this log.)
Change-Id: I140fdc05814286796bb47e6be8170b2ae9dd5154
commit a529cf4
Author: Asaf Karnieli <akarnieli@habana.ai>
Date: Sun Aug 11 12:39:20 2024 +0300
[ALGO-801] Add Fake Quant option in linear and matmul layers
Change-Id: I9888c92ffc33035f75d434044f4ef41b58f51e62
commit 09c6312
Author: Uri Livne <ulivne@habana.ai>
Date: Mon Aug 12 10:42:44 2024 +0300
[SW-192770] Remove regression detection script
It is maintained by QA in other path
Change-Id: Ie343575e0a6da28681283541847ad9541e209e30
commit ac48710
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Tue Aug 6 14:06:52 2024 +0300
[SW-195526] Rename LOG_LEVEL_HQT to LOG_LEVEL_INC
Rename 'HQT' occurrences in fp8_tests.py and logger.py
Change-Id: Ibbf314410de627f98a54d2230bf8db72aca0c40a
commit c7aa37c
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Tue Aug 6 15:02:38 2024 +0300
[SW-195525] INC Logger: Support ENABLE_CONSOLE values 1/0
Add support for values '1' and '0' for 'ENABLE_CONSOLE' env var
Change-Id: I53f71250d7a74d2a8050aa1722b75acaebef0c4c
commit b42b018
Author: yan tomsinsky <ytomsinsky@habana.ai>
Date: Mon Aug 5 13:30:07 2024 +0300
[SW-195483] Remove hard coded strings from FP8 config in INC
Change-Id: I1f58b74ab07eda93739b4e6c8be5041ac2beb714
commit c6af377
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Mon Aug 5 15:49:03 2024 +0300
[SW-194203] Add flag to recalculate scales
Add support for conf 'recalc_scales' in Fp8cfg::parse
Remove 'recalc_scales' parameter from get_config in scale.py - insead read from hqt config
Change-Id: Ie5fe693e8dfdab850fcf3647049fda2880f20ba2
commit 55e1387
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Thu Aug 1 17:02:10 2024 +0300
[SW-186675] Update default configuration of 'allowlist'
Defined default allowlist types to be empty - allows quantization of all models
Refactor parse function to more dynamic code and consistency
Change-Id: I6c8a14cb7ca6830927e5c5b7476e4b03335456aa
commit 3f1d5c0
Author: Eran Geva <egeva@habana.ai>
Date: Sun Aug 4 10:56:03 2024 +0300
[SW-192999] bump the inc version to 3.0
Change-Id: I2236780a613cd7102fa16618bc24aaca0d2f5d86
commit c19fcbd
Author: Nir David <ndavid@habana.ai>
Date: Thu Aug 1 19:48:57 2024 +0300
Adjust INC to run from vLLM with old PA
Change-Id: Ifdea6840aaa22791f478ad10788e5d47fd4a0394
commit ff114b7
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Tue Jul 30 13:15:12 2024 +0300
[SW-194748] Switch tester.py framework from using HQT to using INC
Switch every call to HQT package to use INC instead
Change-Id: I2f2dd4e6d6029aeb73fa2f70e7978aecfdccc65e
commit 7949907
Author: Eran Geva <egeva@habana.ai>
Date: Mon Jul 29 15:53:59 2024 +0300
[SW-194599] fix setup.py get_build_version
Change-Id: I22ab530d88a2f37802859a7f3434e6395390566a
commit ad0625b
Author: yan tomsinsky <ytomsinsky@habana.ai>
Date: Tue Jul 9 12:00:57 2024 +0300
[SW-189684] Add description to functions in HQT
Change-Id: Id5822a21abd1f60f28999574c2ca0e89acc70bf6
commit 7bf9521
Author: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Date: Mon Jul 29 10:08:53 2024 +0300
[SW-193263] Switch HQT unit tests to run on INC
Modify test to point to the correct package in INC instead of HQT.
Add __init__.py file to include needed content for test_layers' tests.
Change-Id: If47acdfc9f7521a54a7f350a444711a7c2b3e5b2
commit a5b6ef8
Author: Uri Livne <ulivne@habana.ai>
Date: Sun Jul 28 13:34:04 2024 +0300
[SW-184689] Adjust correct condition for one step flow align to 1.17
Change-Id: I588680b463a9f8304d95863306b6d5b2503e6e62
commit ae9d934
Author: xinhe3 <xinhe3@hababa.ai>
Date: Tue Jul 16 09:16:50 2024 +0300
[SW-192931] align setup.py with github INC and remove fp8_convert
Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
commit a92d70a
Author: xinhe3 <xinhe3@hababa.ai>
Date: Tue Jul 16 06:16:34 2024 +0300
[SW-192917] Update all HQT logic files with pre-commit check
Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
(cherry picked from commit 099e984)
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
commit 56a1a7e
Author: xinhe3 <xinhe3@hababa.ai>
Date: Thu Jul 18 05:19:42 2024 +0300
[SW-193292] align INC pt requierments with OHF requieremnt (peft==0.11.1)
Change-Id: I55961ff8265177b7916870d9884350af2bb7542f
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
(cherry picked from commit aa26f16)
commit 3f61954
Author: Witold Szczurek <wszczurek@habana.ai>
Date: Mon Jul 22 14:51:03 2024 +0300
[SW-187215] Add valid_seq_len feature to patched SDPA module
Change-Id: Ia627fe8134470d68a7e55fc978a972bb7f7b3d5b
commit 039af39
Author: Nir David <ndavid@habana.ai>
Date: Thu Jul 25 12:18:23 2024 +0300
[SW-194200] Save scale file only with new scales
Change-Id: I14a4ef94d188b13c2fbf4ea77d2b42cb5bd6d952
commit 4f8b257
Author: Zhou Yuwen <zyuwen@habana.ai>
Date: Mon Jul 15 09:02:41 2024 +0000
[SW-192809] fix json_file bug when instantiating FP8Config class
Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
(cherry picked from commit dc4b5f5)
commit 3572617
Author: Nir David <ndavid@habana.ai>
Date: Thu Jul 25 10:30:10 2024 +0300
[SW-194177] - Integrate new vllm-PA algo with HQT
Change-Id: I94c9679f0aff7c2f9a86a802da825bfd6d0772ad
commit 5e3a679
Author: Dudi Lester <dlester@habana.ai>
Date: Thu Jul 11 15:15:58 2024 +0300
[SW-191415] update fp8 maxAbs observer using torch.copy_
Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f
commit 7f62871
Author: Asaf Karnieli <akarnieli@habana.ai>
Date: Sun Jul 21 11:45:02 2024 +0300
[ALGO-790] add GPTQ quantization support for Gaudi
Change-Id: I00ac0c6d2263e1dde3b86b019f84671188f1b482
commit abaa038
Author: Uri Livne <ulivne@habana.ai>
Date: Thu Jul 11 12:41:09 2024 +0300
[SW-192358] Remove HQT reference in INC
Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e
commit 56c03d8
Author: yan tomsinsky <ytomsinsky@habana.ai>
Date: Tue Jul 9 12:31:07 2024 +0300
[SW-190303] Implement HPUWeightOnlyLinear class in INC
Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41
commit 969f467
Author: Zhou Yuwen <zyuwen@habana.ai>
Date: Wed Jun 12 18:49:17 2024 -0700
[SW-184943] Enhance INC WOQ model loading
- Support loading huggingface WOQ model
- Abstract WeightOnlyLinear base class. Add INCWeightOnlyLinear and HPUWeighOnlyLinear subclasses
- Load woq linear weight module by module
- Save hpu format tensor to reuse it once load it again
Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf
commit 6404b06
Author: xinhe3 <xinhe3@hababa.ai>
Date: Tue Jul 9 11:32:29 2024 +0300
[SW-191945] align requirement_pt.txt in gerrit INC with Github INC
Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
commit 7e1e78f
Author: Uri Livne <ulivne@habana.ai>
Date: Tue Jul 9 22:30:50 2024 +0300
[SW-184689] use finalize_calibration intrenaly for one step flow
Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89
commit 997bf9b
Author: Uri Livne <ulivne@habana.ai>
Date: Mon Jul 8 11:29:04 2024 +0300
[SW-190899] Install packages according to configuration
Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525
commit 1ed690c
Author: Uri Livne <ulivne@habana.ai>
Date: Sun Jun 23 11:54:59 2024 +0300
[SW-187731] Save orig module as member of patched module
This allows direct usage of the original module methods, which solves torch compile issue
Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632
commit adfe13b
Author: smarkovichgolan <smarkovich@habana.ai>
Date: Wed Jul 3 18:09:30 2024 +0300
Fix errors in regression_detection
Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438
commit 222402e
Author: Danny Semiat <dsemiat@habana.ai>
Date: Thu Jun 20 12:27:17 2024 +0300
[SW-177468] Removed unused code + cleanup
Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638
commit 7329e4f
Author: Uri Livne <ulivne@habana.ai>
Date: Sun Jul 7 18:23:30 2024 +0300
[SW-184714] Add internal folder to fp8 quant
This is a folder used for experiments, not to be used by users
Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc
commit da4bcd2
Author: Uri Livne <ulivne@habana.ai>
Date: Sat Jul 6 20:06:08 2024 +0300
[SW-184714] Port HQT code into INC
HQT lib content was copied as is under fp8_quant
Tests were copied to 3.x torch location
Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9
commit 768c2a4
Author: Uri Livne <ulivne@habana.ai>
Date: Wed Jul 3 17:22:02 2024 +0300
[SW-191317] Raise exception according to hqt config object
Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a
commit 52a98f4
Author: Uri Livne <ulivne@habana.ai>
Date: Wed Jun 19 15:05:12 2024 +0300
[SW-189361] Fix white list extend
Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3
commit abd570b
Author: Zhou Yuwen <zyuwen@habana.ai>
Date: Wed May 22 07:39:06 2024 +0000
[SW-177474] add HQT FP8 porting code
Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
commit 254de6d
Author: Ron Ben Moshe <rbenmoshe@habana.ai>
Date: Thu Jun 6 10:58:15 2024 +0300
[SW-183320]updated setup.py
Change-Id: I592af89486cb1d9e0b5197521c428920197a9103
commit f23f1fa
Author: yan tomsinsky <ytomsinsky@habana.ai>
Date: Sun May 19 16:39:09 2024 +0300
[SW-184941] INC CI, CD and Promotion
Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52
commit d7ad2d1
Author: Uri Livne <ulivne@habana.ai>
Date: Wed Apr 24 19:47:28 2024 +0300
[SW-181785] Remove torch from INC requierments
Change-Id: I469c5b2ae3b1ff5369fa555fd1bcea193ec02211
commit 31d8bb9
Author: Wang, Mengni <mengni.wang@intel.com>
Date: Tue Jun 11 15:28:40 2024 +0800
Add UT and remove unused code for torch MX quant (#1854)
* Add UT and remove unused code for torch MX quant
---------
Change-Id: I2727aa716fa99467fa2d63b966de4d88470e4bb3
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
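The [SW-174155] entry in the log above describes guarding the shared scale file with an inter-process reader-writer lock around save_file/load_file. Below is a minimal sketch of that kind of locking, assuming POSIX fcntl locks and a JSON scale file; it is illustrative only, not the INC implementation, and the function names simply mirror the commit message.

```python
import fcntl
import json

def save_file(path, scales):
    """Write scales under an exclusive lock so readers never see a partial file."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # writer: exclusive lock
        try:
            f.seek(0)
            f.truncate()
            json.dump(scales, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def load_file(path):
    """Read scales under a shared lock; concurrent readers are allowed."""
    with open(path, "r") as f:
        fcntl.flock(f, fcntl.LOCK_SH)   # reader: shared lock
        try:
            return json.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```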
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Change-Id: I69b3228c708b766fa3d3a7b8f8680bc2a98e5e62
…endencies Change-Id: I43563223dedb8578cdaee230dd8dd68fb70d17c4 Signed-off-by: xinhe3 <xinhe3@habana.ai>
…ted instead of deepspeed env Change-Id: I5a585037ee049dedc671e320c57e6e13151d79a8
Change-Id: I1cf46b9cc4f06cfa74f7bbcb7142c1387f294e6d
Change-Id: Idff4c54d4737a418cd3c56e127259163bdff29e5 Signed-off-by: Yi Liu <yiliu4@habana.ai>
Change-Id: Ie34d80ea536b7b01b38435fe48203c88a3442e37
Change-Id: Ia53ccae8d1fe5beb45ed625ac0defcd05393c047
…W_aligned scale Unit scale and the new HW aligned scale don't need measurement files to run, so we're avoiding loading the measurement files. Also, user can choose whether to supply stat path or not. If it's supplied, then the single scales will be saved also to this directory Change-Id: Ia582f3f10ef06ace592f9c1075af335f2dc3aea5
Change-Id: I4ae7770ace4440a998599d3e6ae5b76e34bf404b
…bility Change-Id: I125e08364835b87d97cf243a89db13fda8958f20
Change-Id: Ie2e7ccf5b6cfe016e93378066ccb5730c2255274
Change-Id: I87311e50a5bb1e0298ba39646930be608f783eee
Change-Id: I2bb14d1a4c5840965bf8bd23def0a4df9aa66abb
Change-Id: Iec299bfb45c167bcac7dc12a12991db4eebce440
Change-Id: I093cb0773f9ca1043c88ba7f1fb80df6ec0570b7 Signed-off-by: xinhe3 <xinhe3@habana.ai>
Change-Id: I049181549e32be923695e18ed31a47e80a57a783
Support config of scales as scalar; create scales tensors according to config (scale or const). Currently the fsdpa op isn't supported due to the op's Python API. Change-Id: Ieb9d550a6118f9134c7d9d39db0bf0355192263c
…d update version to 3.1 Change-Id: I58d8e1e2443e3d16f1ac4a18abf5ef0b66319089 Signed-off-by: xinhe3 <xinhe3@habana.ai>
Previously, when dump_stats_path was in the config, INC still required measurements, even though unit scale doesn't need measurement.
* Also setting default scale_method to maxabs_hw
* Added predefined config test with expected exceptions
Change-Id: I86ad0774f0140da50cac8d1f80126aa3a5f6fc0b
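For illustration only, a config of the kind these entries describe might look like the sketch below. The key names (allowlist, scale_method, dump_stats_path, recalc_scales) are taken from commit messages in this PR; the exact schema accepted by FP8Config may differ, so treat this as an assumption rather than the INC API.

```python
import json

# Hypothetical FP8 quantization config sketch; key names come from the commit
# messages above, not from the FP8Config documentation.
fp8_cfg = {
    "allowlist": {"types": [], "names": []},    # empty -> all modules may be quantized
    "scale_method": "maxabs_hw",                # default per the commit above
    "dump_stats_path": "./hqt_output/measure",  # optional when unit/HW-aligned scales are used
    "recalc_scales": True,                      # flag added in [SW-194203]
}

with open("fp8_quant_config.json", "w") as f:
    json.dump(fp8_cfg, f, indent=2)
```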
Change-Id: I29e7313612435054c751806575b03f3e7a41a9be
Change the test to numel() == 1 instead of dim() == 0 Change-Id: I510aa2cc8f04e30d4c5346040ed0611eaa407cf4
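The two checks differ only for single-element tensors that are not 0-d; a quick PyTorch illustration:

```python
import torch

scalar = torch.tensor(3.0)      # shape (),   dim() == 0, numel() == 1
one_elem = torch.tensor([3.0])  # shape (1,), dim() == 1, numel() == 1

print(scalar.dim() == 0, scalar.numel() == 1)      # True True
print(one_elem.dim() == 0, one_elem.numel() == 1)  # False True
```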
This reverts commit 87fe15e. Change-Id: I22d429626a8a024a5893c439c70eb22e844c6736
Block PatchedSoftmax from accepting SCALAR scale_format Change-Id: I3836232ee06a6c4e76c74290b19436ae8bbff41c
Change-Id: I782d3070b61160562c96dfce243cc4f52c782365 Signed-off-by: xinhe3 <xinhe3@habana.ai>
pls fix DCO and UT issues
Signed-off-by: xinhe3 <xinhe3@habana.ai>
yiliu30 left a comment:
Left some comments, others LGTM.
examples/fp8_sample/README.md (outdated)
@@ -0,0 +1,96 @@
### Usage demo:
Suggest adding these examples to the weekly test or CI. cc @XuehaoSun @chensuyue
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Change-Id: Iafa4a6a8577724bd8a86581bfe38d3269dab2ea2 Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
thuang6 left a comment:
LGTM
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
test/3x/torch/algorithms/fp8_quant/unit_tests/test_functions/test_config_json.py
Signed-off-by: xinhe3 <xinhe3@habana.ai>
.../3.x_api/pytorch/nlp/huggingface_models/language-modeling/quantization/weight_only/README.md
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Merged from INC public master branch
Squashed commit of the following:
commit 27f3e2657b2667e8bca8fb9c02a50d55f404a7e6
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Oct 22 11:51:37 2024 +0800
Adapt autoround format (#2038)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 7775768245ec8beea3210910b166e95e2e730586
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Sun Oct 20 19:41:36 2024 +0800
remove autoround limit (#2036)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 795aeb5bb6950a357da5e339ed97277bc73b4c5c
Author: WeiweiZhang1 <weiwei1.zhang@intel.com>
Date: Fri Oct 18 17:39:10 2024 +0800
Add vlm examples, bugfix (#2012)
* add VLM examples
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix, add utils
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix docstring issues
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refine examples
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix scan issue
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refine shell
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine scripts & requirements
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* typofix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine docs
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* set attn_implementation for Phi3-vision
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine phi3 example
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix code coverage
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update config
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* refine shells, docs and example. enable qwen2-vl quantization
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix ci
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix EOF error
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update qwen dir
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* refine shell, add llama3.2 inference to doc
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine eval shell
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix eval device issue
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine eval dtype
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
---------
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
commit b5f3eb7ea604fc1bd235cce15dda87569b70390e
Author: xinhe <xin3.he@intel.com>
Date: Fri Oct 18 15:03:53 2024 +0800
add back missing image (#2035)
Signed-off-by: xin3he <xin3.he@intel.com>
commit 45b29d46a2e958b103c6f8a5539fead25809a89a
Author: Huang, Tai <tai.huang@intel.com>
Date: Thu Oct 17 15:23:26 2024 +0800
fix broken link to FP8 example (#2034)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 01bf4b2b3a0f12434b5f44f07a9c26abf96fb5f0
Author: Huang, Tai <tai.huang@intel.com>
Date: Thu Oct 17 15:22:23 2024 +0800
update gaudi version mapping table for v3.1 (#2030)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
commit 5fb21847e12acc51ed4f197eb86b066e6578934b
Author: xinhe <xin3.he@intel.com>
Date: Thu Oct 17 15:21:18 2024 +0800
Cherry pick Habana software 1.18.0 update (#2025)
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Danny <dsemiat@habana.ai>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit d6149aac01c8142f6f5ffc18c03433c82f44150c
Author: Yi Liu <yi4.liu@intel.com>
Date: Wed Oct 16 14:02:29 2024 +0800
Update the PT2E CV example (#2032)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 08ec90866f9cbd770bed3d93c35aaaf0087d4fe9
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Wed Oct 16 09:20:33 2024 +0800
Remove itrex dependency for 2x example (#2024)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit d9377b826d24b2e1c206632bc40f39aab02b3d43
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Oct 15 15:28:37 2024 +0800
Support generation search for transformers examples (#2029)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 61f1e393b6374703b6516fe9406bafb0cc088009
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Fri Oct 11 17:07:14 2024 +0800
Support quant procedure on XPU (#2026)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit dfa6aabab3d280085fb166822b7d849a2dc9b36e
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Fri Oct 11 16:36:42 2024 +0800
remove ITREX unit test CI (#2021)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 2bb257e71353d87414ff7e410ca35bce5cc3dbc7
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Thu Oct 10 19:27:11 2024 +0800
Add woq examples (#1982)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 586eb88fc7b4bfe87bf8fed9f77951623e48bd88
Author: Huang, Tai <tai.huang@intel.com>
Date: Wed Oct 9 09:22:39 2024 +0800
add transformers-like api link in readme (#2022)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 4e9c7641589c5f3eec20972f9a16022b7eb7e941
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Oct 8 13:13:45 2024 +0800
Remove itrex dependency for 3x example (#2016)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
commit a0066d4e55f3dc03a2e0b992286d8806509cf368
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Sep 30 18:17:32 2024 +0800
Fix transformers rtn layer-wise quant (#2008)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit 802a5af3c3093941b665e6e9a92c706da1aeccdc
Author: Huang, Tai <tai.huang@intel.com>
Date: Mon Sep 30 17:02:52 2024 +0800
add autoround EMNLP24 to pub list (#2014)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 44795a1ae93f3676a595063cf0e6f680c41989b2
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Sep 30 16:55:22 2024 +0800
Adapt transformers 4.45.1 (#2019)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: changwangss <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit d4662ad47a4af11a9ed8b45429aff007d8c1b605
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Sep 30 15:52:17 2024 +0800
Add transformers-like api doc (#2018)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 72398b69334d90cdd7664ac12a025cd36695b55c
Author: Wang, Chang <chang1.wang@intel.com>
Date: Fri Sep 27 15:11:04 2024 +0800
fix xpu device set weight and bias (#2010)
Signed-off-by: changwangss <chang1.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 9d27743705af66a66aafae7fb1d19e2ffad6b2a2
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Fri Sep 27 14:17:24 2024 +0800
Update model accuracy (#2006)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 7bbc47373033a46fca92ce5ec312d4e051092eee
Author: xinhe <xin3.he@intel.com>
Date: Fri Sep 27 11:47:00 2024 +0800
add pad_to_buckets in evaluation for hpu performance (#2011)
* add pad_to_buckets in evaluation for hpu performance
---------
Signed-off-by: xin3he <xin3.he@intel.com>
commit b6b7d7c3c415d67976e054ab5ad5be6b5d5b460d
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Thu Sep 26 17:21:54 2024 +0800
Update auto_round requirements for transformers example (#2013)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit ee600ba79c5197908ca119446b377be59a5a19e7
Author: Wang, Chang <chang1.wang@intel.com>
Date: Fri Sep 20 13:54:06 2024 +0800
add repack_awq_to_optimum_format function (#1998)
Signed-off-by: changwangss <chang1.wang@intel.com>
commit 4ee6861d666a15c26bb796547d446879e17e6b11
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Thu Sep 19 22:27:05 2024 +0800
remove accelerate version in unit test (#2007)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 24458114c0765e177b3f4dfbb73d7cfda6b196ab
Author: WeiweiZhang1 <weiwei1.zhang@intel.com>
Date: Sat Sep 14 18:13:30 2024 +0800
enable auto_round format export (#2002)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
commit 906333abd41c8be8a6f097da42c1931ea3bb37d5
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Sat Sep 14 16:17:46 2024 +0800
Replace FORCE_DEVICE with INC_TARGET_DEVICE [transformers] (#2005)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 443d00779acac739c3a185f384b78236eaac9643
Author: xinhe <xin3.he@intel.com>
Date: Fri Sep 13 21:35:32 2024 +0800
add INC_FORCE_DEVICE introduction (#1988)
* add INC_FORCE_DEVICE introduction
Signed-off-by: xin3he <xin3.he@intel.com>
* Update PyTorch.md
* Update PyTorch.md
* Update docs/source/3x/PyTorch.md
Co-authored-by: Yi Liu <yi4.liu@intel.com>
* rename to INC_TARGET_DEVICE
Signed-off-by: xin3he <xin3.he@intel.com>
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
commit 5de9a4f56c4cf3901b8ca75d56677255c4e8c833
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Fri Sep 13 20:48:22 2024 +0800
Support transformers-like api for woq quantization (#1987)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
commit 9c39b429baa16591d4fe883e1a7279761f7f86a5
Author: chen, suyue <suyue.chen@intel.com>
Date: Thu Sep 12 14:34:49 2024 +0800
update docker image prune rules (#2003)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 09d4f2d6fb1a6aa91874a0b87a967067800462cb
Author: Huang, Tai <tai.huang@intel.com>
Date: Mon Sep 9 09:24:35 2024 +0800
Add recent publications (#1995)
* add recent publications
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* update total count
Signed-off-by: Huang, Tai <tai.huang@intel.com>
---------
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 399cd44a35583bd96701bee58107c6969be0201e
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Sep 3 16:37:09 2024 +0800
Remove the save of gptq config (#1993)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 05272c48591567d0a1d36fe6cfe5c697d836887b
Author: Yi Liu <yi4.liu@intel.com>
Date: Tue Sep 3 10:21:51 2024 +0800
add per_channel_minmax (#1990)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 82d8c06c6b535d8db21a6c848a2e374b3b16288e
Author: chen, suyue <suyue.chen@intel.com>
Date: Fri Aug 30 21:21:00 2024 +0800
update 3x pt binary build (#1992)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit e9f06af240065fd48066d32ec4d856c0b7a62f14
Author: Huang, Tai <tai.huang@intel.com>
Date: Fri Aug 30 17:49:48 2024 +0800
Update installation_guide.md (#1989)
Correct typo in installation doc
commit 093c9669692c8b9263cfbc16d7299da4170c8201
Author: Wang, Chang <chang1.wang@intel.com>
Date: Fri Aug 30 17:45:54 2024 +0800
add quantize, save, load function for transformers-like api (#1986)
Signed-off-by: changwangss <chang1.wang@intel.com>
commit 4dd49a43dec86aea581db4f29c7ca36b0baf1f7c
Author: xinhe <xin3.he@intel.com>
Date: Thu Aug 29 17:23:18 2024 +0800
add hasattr check for torch fp8 dtype (#1985)
Signed-off-by: xin3he <xin3.he@intel.com>
commit f2c454f88c0ffbb4d30d66eedaa6fc56ad47f804
Author: chen, suyue <suyue.chen@intel.com>
Date: Thu Aug 29 13:45:39 2024 +0800
update installation and ci test for 3x api (#1991)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 7ba9fdcb24a8ea1c1efc27844f39d0c128f83517
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Aug 19 14:50:50 2024 +0800
support gptq `true_sequential` and `quant_lm_head` (#1977)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 68b1f8b734bff723dd4962da08ecdc0d22c5faab
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Fri Aug 16 09:43:46 2024 +0800
Fix UT env and upgrade torch to 2.4.0 (#1978)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit f9dfd54272348483037cc70802cd85a085fec39c
Author: Yi Liu <yi4.liu@intel.com>
Date: Thu Aug 15 14:13:26 2024 +0800
Skip some tests for torch 2.4 (#1981)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 46d9192659f1c0dcf488e2e69f0f7dd7bd0b2f2e
Author: xinhe <xin3.he@intel.com>
Date: Thu Aug 15 09:57:22 2024 +0800
update readme for fp8 (#1979)
Signed-off-by: xinhe3 <xinhe3@habana.ai>
commit 842b7159fafa09300bc0e745c802910a2d60502e
Author: chen, suyue <suyue.chen@intel.com>
Date: Tue Aug 13 12:09:25 2024 +0800
bump main version into v3.1 (#1974)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 3845cdc4837e7f0ede12b9de0906b7d01899fc00
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date: Tue Aug 13 12:09:09 2024 +0800
fix online doc search issue (#1975)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
commit 7056720df96f17c706522bc6b0530df534d22ee7
Author: chen, suyue <suyue.chen@intel.com>
Date: Sun Aug 11 20:58:34 2024 +0800
update main page (#1973)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 95197d1697e19323b124c2a32bdef7425d4d1c3e
Author: xinhe <xin3.he@intel.com>
Date: Sat Aug 10 23:28:43 2024 +0800
Cherry pick v1.17.0 (#1964)
* [SW-184941] INC CI, CD and Promotion
Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52
* [SW-183320]updated setup.py
Change-Id: I592af89486cb1d9e0b5197521c428920197a9103
* [SW-177474] add HQT FP8 porting code
Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
* [SW-189361] Fix white list extend
Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3
* [SW-191317] Raise exception according to hqt config object
Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a
* [SW-184714] Port HQT code into INC
HQT lib content was copied as is under fp8_quant
Tests were copied to 3.x torch location
Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9
* [SW-184714] Add internal folder to fp8 quant
This is a folder used for experiments,
not to be used by users
Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc
* [SW-177468] Removed unused code + cleanup
Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638
* Fix errors in regression_detection
Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438
* [SW-187731] Save orig module as member of patched module
This allows direct usage of the original module methods,
which solves torch compile issue
Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632
* [SW-190899] Install packages according to configuration
Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525
* [SW-184689] use finalize_calibration intrenaly for one step flow
Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89
* [SW-191945] align requirement_pt.txt in gerrit INC with Github INC
Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-192358] Remove HQT reference in INC
Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e
* [SW-191415] update fp8 maxAbs observer using torch.copy_
Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f
* [SW-184943] Enhance INC WOQ model loading
- Support loading huggingface WOQ model
- Abstract WeightOnlyLinear base class. Add INCWeightOnlyLinear and HPUWeighOnlyLinear subclasses
- Load woq linear weight module by module
- Save hpu format tensor to reuse it once load it again
Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf
* [SW-190303] Implement HPUWeightOnlyLinear class in INC
Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41
* [SW-192809] fix json_file bug when instantiating FP8Config class
Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
* [SW-192931] align setup.py with github INC and remove fp8_convert
Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-192917] Update all HQT logic files with pre-commit check
Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update docstring
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* add fp8 example and document (#1639)
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* Update settings to be compatible with gerrit
* enhance ut
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* move fp8 sample to helloworld folder
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* update torch version of habana docker
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update readme demo
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update WeightOnlyLinear to INCWeightOnlyLinear
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add docstring for FP8Config
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix pylint
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update fp8 test scripts
Signed-off-by: chensuyue <suyue.chen@intel.com>
* delete deps
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update container into v1.17.0
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update docker version
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update pt ut
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add lib path
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix dir issue
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update fp8 test scope
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix typo
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update fp8 test scope
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update pre-commit-ci
Signed-off-by: chensuyue <suyue.chen@intel.com>
* work around for hpu
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix UT
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix parameter
Signed-off-by: chensuyue <suyue.chen@intel.com>
* omit some test
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update main page example to llm loading
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix autotune
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
---------
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
commit de0fa21cd9d6291b521281b2b5fc8f6519cb84ae
Author: Huang, Tai <tai.huang@intel.com>
Date: Fri Aug 9 22:32:37 2024 +0800
Fix broken link in docs (#1969)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 385da7c7ed018a66fcba6e28658d1a5eea2e52e4
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Fri Aug 9 21:53:51 2024 +0800
Add 3.x readme (#1971)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit acd8f4f182eaccf03b221f765ec0ddb451be3415
Author: Huang, Tai <tai.huang@intel.com>
Date: Fri Aug 9 15:24:14 2024 +0800
Add version mapping between INC and Gaudi SW Stack (#1967)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 74a4641390b4d8c11dc66ff8ef40df92c298b996
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Fri Aug 9 10:23:59 2024 +0800
remove unnecessary CI (#1966)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit b99abae5d937380cf9df80c9050fce18bddfb72d
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Aug 6 16:02:03 2024 +0800
Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (#1965)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit b35ff8f0044bdf12da87647d0404b62ae5ff7d3d
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Fri Aug 2 09:06:35 2024 +0800
example update for 3.x ipex sq (#1902)
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
commit 000946fce147a02ad6662538e337570c0a56329d
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Thu Aug 1 10:19:32 2024 +0800
add SDXL model example to INC 3.x (#1887)
* add SDXL model example to INC 3.x
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* add evaluation script
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* add test script
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* minor fix
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update run_quant.sh
* add iter limit
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* modify test script
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update json
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add requirements
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update run_benchmark.sh
* Update sdxl_smooth_quant.py
* minor fix
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>
commit aa42e5edcd0b5196a21ee7bb68a7965125601fea
Author: xinhe <xin3.he@intel.com>
Date: Wed Jul 31 15:36:06 2024 +0800
replenish docstring (#1955)
* replenish docstring
Signed-off-by: xin3he <xin3.he@intel.com>
* update Quantizer API docstring
Signed-off-by: xin3he <xin3.he@intel.com>
* Add docstring for auto accelerator (#1956)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* temporary remove torch/quantization and add it back after fp8 code is updated.
* Update config.py
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>
commit 81a076d7c59609be666ddddf64a574cacf1a5c36
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date: Wed Jul 31 13:51:33 2024 +0800
fix welcome.html link issue (#1962)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
commit 87f02c15a2f1047a8b4bcb5b7f443a4cecb4dfc7
Author: chen, suyue <suyue.chen@intel.com>
Date: Wed Jul 31 10:09:47 2024 +0800
fix docs link (#1959)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 03813e2894871fce7a95fb4ee584aab6c5bb18f7
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed Jul 31 10:09:29 2024 +0800
Bump tensorflow version (#1961)
Signed-off-by: dependabot[bot] <support@github.com>
commit 3b5dbf681d8e9beb47eb0d1be4c5a58f4018d42a
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jul 30 17:27:21 2024 +0800
Set low_gpu_mem_usage=False for AutoRound
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 41244d3bc65fd646d2d6d88ca2c6686f2ab65bc6
Author: chen, suyue <suyue.chen@intel.com>
Date: Mon Jul 29 23:05:36 2024 +0800
new previous results could not find all raise issues in CI model test (#1958)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 190e6b2be6b31158a1101729bcf621bc93e85531
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Jul 29 19:39:57 2024 +0800
Fix itrex qbits nf4/int8 training core dumped issue (#1954)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 0e724a4d96ca0d6a170281688ca644b37fa340e0
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Jul 29 16:22:13 2024 +0800
Add save/load for pt2e example (#1927)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 50eb6fb6f5924054b38d8ed99e78e0ebdab51f50
Author: chen, suyue <suyue.chen@intel.com>
Date: Mon Jul 29 13:40:36 2024 +0800
update 3x torch installation (#1957)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 6e1b1da712d20d9291e5932974bc3167b00dd214
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Fri Jul 26 15:58:00 2024 +0800
add ipex xpu example to 3x API (#1948)
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
commit 19024b351372ca76934db33b0d230552c13bff39
Author: zehao-intel <zehao.huang@intel.com>
Date: Fri Jul 26 14:52:01 2024 +0800
Enable yolov5 Example for TF 3x API (#1943)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit d84a93f7db8eeb69099aa332a4c01a743c9f4090
Author: zehao-intel <zehao.huang@intel.com>
Date: Thu Jul 25 14:45:19 2024 +0800
Complement UT of calibration function for TF 3x API (#1945)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit fb8577931c11c3bdc55868e01576b73372d9912b
Author: zehao-intel <zehao.huang@intel.com>
Date: Thu Jul 25 14:04:25 2024 +0800
Update Examples for TF 3x API (#1901)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit 6b30207d0a3b6d6d497ecf8f6bb5891765d798ba
Author: zehao-intel <zehao.huang@intel.com>
Date: Thu Jul 25 13:39:06 2024 +0800
Add Docstring for TF 3x API and Torch 3x Mixed Precision (#1944)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit d254d508be9c6b14c474fd643ad448a4e261ca72
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Wed Jul 24 21:50:44 2024 +0800
Update doc for client-usage and LWQ (#1947)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit f253d35a152b8003cfc8738fd3c6db7930149264
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date: Wed Jul 24 17:48:05 2024 +0800
Update publish.yml (#1950)
commit 6cda338a042073aba61ba411a6fc563fc8731889
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date: Wed Jul 24 17:31:19 2024 +0800
Update publish.yml (#1949)
* Update publish.yml
* Update publish.yml
commit c80b68afdba7a55b19898b1b9ff3e21d18b57427
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jul 23 21:26:53 2024 +0800
Update AutoRound commit version (#1941)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 9077b382259e2e56ff5796084a1f4275e4387537
Author: zehao-intel <zehao.huang@intel.com>
Date: Tue Jul 23 17:04:37 2024 +0800
Refine Pytorch 3x Mixed Precision Example (#1946)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit efcb2930be6b9d575b1fb8a6e86afdd6a09b5857
Author: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Date: Tue Jul 23 10:15:41 2024 +0800
Update for API 3.0 online doc (#1940)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
commit b787940ea2868e1fc8a56a81b94d62d4ea3d8454
Author: Wang, Mengni <mengni.wang@intel.com>
Date: Tue Jul 23 10:12:34 2024 +0800
add docstring for mx quant (#1932)
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
commit 0c52e1243b78734e95fc348834303bc3c3cfe369
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jul 23 09:59:17 2024 +0800
Add docstring for WOQ&LayerWise (#1938)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
commit 08914d6b0e365212fee6016d03dcdc087bd7e441
Author: Huang, Tai <tai.huang@intel.com>
Date: Mon Jul 22 11:14:44 2024 +0800
add read permission token (#1942)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit e106dea73471ddecdb1cfc702e90fcb1a5d41452
Author: zehao-intel <zehao.huang@intel.com>
Date: Sun Jul 21 21:48:51 2024 +0800
Update Example for Pytorch 3x Mixed Precision (#1882)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit 1ebf6987bd054b926d3cdd5630ae058c8d3a66c2
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Fri Jul 19 15:56:09 2024 +0800
add docstring for static quant and smooth quant (#1936)
* add docstring for static quant and smooth quant
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* format fix
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update scan path
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update utility.py
---------
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
commit 296c5d4f1138e5bf33584fb75cea0f6ca5080122
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Fri Jul 19 15:08:05 2024 +0800
Add docstring for PT2E and HQQ (#1937)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 437c8e75706cff1767dcde115e428654766b3f18
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Thu Jul 18 10:00:41 2024 +0800
Fix unused pkgs import (#1931)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit ff3740146a829e845d79266acf233b202843d3fd
Author: chen, suyue <suyue.chen@intel.com>
Date: Wed Jul 17 23:11:15 2024 +0800
3.X API installation update (#1935)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 6c27c19c3ec7a318455bd12d6e66ad9bb757ab93
Author: zehao-intel <zehao.huang@intel.com>
Date: Wed Jul 17 20:35:42 2024 +0800
Support calib_func on TF 3x API (#1934)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit 53e6ee6b75d476bae0382c7d6fb9aa1348c2ab5e
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Wed Jul 17 20:35:03 2024 +0800
Support xpu for ipex static quant (#1916)
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
commit a1cc618df6efa823bb1834ff2f8be83531f91178
Author: chen, suyue <suyue.chen@intel.com>
Date: Wed Jul 17 17:29:49 2024 +0800
remove peft version limit (#1933)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 30583882df76838ea3e4a719e25ddca7bb449b9b
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Wed Jul 17 15:31:38 2024 +0800
Add doc for client usage (#1914)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 29471df05a9e2c36c4ad8083c0b0b285011748d8
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Wed Jul 17 12:12:40 2024 +0800
Enhance load_empty_model import (#1930)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit fd96851f7f8339ec8bfabd602cf494ac6c31d17b
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Wed Jul 17 12:05:32 2024 +0800
Integrate AutoRound v0.3 to 2x (#1926)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit bfa27e422dc4760f6a9b1783eee7dae10fe5324f
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Wed Jul 17 09:33:13 2024 +0800
Integrate AutoRound v0.3 (#1925)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 5767aed4dbc9a400f65f74bdc9c09209f0a4c145
Author: xinhe <xin3.he@intel.com>
Date: Wed Jul 17 09:16:37 2024 +0800
add docstring for torch.quantization and torch.utils (#1928)
Signed-off-by: xin3he <xin3.he@intel.com>
commit f909bca86cfe7881119b62c4e75ca1f330718764
Author: chen, suyue <suyue.chen@intel.com>
Date: Tue Jul 16 21:12:54 2024 +0800
update itrex ut test (#1929)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 649e6b148755bda737009bc323b735b92231c579
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jul 16 21:05:55 2024 +0800
Support LayerWise for RTN/GPTQ (#1883)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>
commit de43d851a24a5f4290fe148f7d3607cad6d8433f
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jul 16 17:18:12 2024 +0800
Support absorb dict for awq (#1920)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit e9765955f991e1270e3b65635285f6b6cb8fc38c
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jul 16 17:17:56 2024 +0800
Support woq Autotune (#1921)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit d56075c7e9f6e3e85385abbff9f1b0d07d157a04
Author: Huang, Tai <tai.huang@intel.com>
Date: Tue Jul 16 15:21:06 2024 +0800
fix typo in architecture diagram (#1924)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 0a542397ac1ea8d6fe2edf04565d3cb673001b2c
Author: chen, suyue <suyue.chen@intel.com>
Date: Tue Jul 16 15:12:43 2024 +0800
update documentation for 3x API (#1923)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit be42d033b25c6dd3bcac0ead964699f25f939014
Author: xinhe <xin3.he@intel.com>
Date: Tue Jul 16 09:48:48 2024 +0800
implement TorchBaseConfig (#1911)
Signed-off-by: xin3he <xin3.he@intel.com>
commit 7a4715c1d488441e383b7c999fd1b574a3f6ceda
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Mon Jul 15 14:59:03 2024 +0800
Support PT2E save and load (#1918)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 34f0a9f450b385aa3227f7f34e8d0f16460080a9
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Mon Jul 15 09:10:14 2024 +0800
Add `save`/`load` support for HQQ (#1913)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
commit d3204604aad007f3db67c46dcb0575aa8f5cd584
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Fri Jul 12 14:48:12 2024 +0800
remove 1x docs (#1900)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 6c547f7c4cd71342e28a1b23f827a6aa7aa91bb8
Author: chen, suyue <suyue.chen@intel.com>
Date: Fri Jul 12 14:42:04 2024 +0800
fix CI docker container clean up issue (#1917)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 17036587d84d2b42e0e9eb501d175e78d552c063
Author: chen, suyue <suyue.chen@intel.com>
Date: Fri Jul 12 11:14:48 2024 +0800
Remove deprecated modules (#1872)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit f698c96c817c56292a66aee07b3e1396e074b966
Author: chen, suyue <suyue.chen@intel.com>
Date: Thu Jul 11 18:00:28 2024 +0800
update Gaudi CI baseline artifacts name (#1912)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 4a45093c1418f34da2660a54052a2ff5c2b4edff
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Thu Jul 11 17:47:47 2024 +0800
Add export support for TEQ (#1910)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 16a7b11508c008d4d4180a0fe0e31c75b8e5d662
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Thu Jul 11 17:13:24 2024 +0800
Get default config based on the auto-detect CPU type (#1904)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 2fc72555c987dc7bce8476b389720e1a29159a43
Author: xinhe <xin3.he@intel.com>
Date: Thu Jul 11 13:22:52 2024 +0800
implement `incbench` command for ease-of-use benchmark (#1884)
implement incbench command as entrypoint for ease-of-use benchmark
automatically check numa/socket info and dump it with table for ease-of-understand
supports both Linux and Windows platform
add benchmark documents
dump benchmark summary
add benchmark UTs
incbench main.py: run 1 instance on NUMA:0.
incbench --num_i 2 main.py: run 2 instances on NUMA:0.
incbench --num_c 2 main.py: run multi-instances with 2 cores per instance on NUMA:0.
incbench -C 24-47 main.py: run 1 instance on COREs:24-47.
incbench -C 24-47 --num_c 4 main.py: run multi-instances with 4 COREs per instance on COREs:24-47.
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
commit de8577ef5874f85d39c3b08f63c98f22c3ce25c6
Author: chen, suyue <suyue.chen@intel.com>
Date: Wed Jul 10 17:21:45 2024 +0800
bump version into 3.0 (#1908)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 01f16c4e816fec9d05d34f9d2bd7e425a59b803c
Author: chen, suyue <suyue.chen@intel.com>
Date: Wed Jul 10 17:19:57 2024 +0800
support habana fp8 UT test in CI (#1909)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 28578b96bf6217fa2b79699838e5a4af30843de4
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Wed Jul 10 13:19:27 2024 +0800
Add docstring for `common` module (#1905)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 5fde50f2c0476dbc08d59481b742515f5a210de1
Author: Wang, Chang <chang1.wang@intel.com>
Date: Wed Jul 10 10:34:46 2024 +0800
update fp4_e2m1 mapping list (#1906)
* update fp4_e2m1 mapping list
* Update utility.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
commit 3fe2fd9aadda4991552d65fef09a75ba5127b5db
Author: xinhe <xin3.he@intel.com>
Date: Tue Jul 9 15:01:25 2024 +0800
fix bf16 symbolic_trace bug (#1892)
Description: fix bf16 symbolic_trace bug,
- cause abnormal recursive calling.
- missing necessary attributes
- By moving BF16 fallback ahead of quantization and removing bf16_symbolic_trace, we fix it.
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
commit e080e06d38447d2ab869fe8265a04e464a732057
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Tue Jul 9 11:04:30 2024 +0800
remove neural insight CI (#1903)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit f28fcee6cc7bd6b3e1642157744f38686b1b9a91
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Fri Jul 5 15:47:37 2024 +0800
Remove 1x API (#1865)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
commit 1386ac5ec7be40608dfac082d2275307b8e4d14e
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Thu Jul 4 12:18:03 2024 +0800
Port auto-detect absorb layers for TEQ (#1895)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 856118e36f7670136c8d83dfbc232010d13d72a6
Author: Wang, Chang <chang1.wang@intel.com>
Date: Wed Jul 3 13:50:00 2024 +0800
remove import pdb (#1897)
Signed-off-by: changwangss <chang1.wang@intel.com>
commit f75ff4082bc7a22d9367d3e91a3ea2c7aaec2bd2
Author: xinhe <xin3.he@intel.com>
Date: Wed Jul 3 13:07:48 2024 +0800
support auto_host2device on RTN and GPTQ(#1894)
Signed-off-by: He, Xin3 <xin3.he@intel.com>
commit b9e73f5cf34f824a9b84d74f725c6157dc6430a2
Author: chen, suyue <suyue.chen@intel.com>
Date: Wed Jul 3 11:10:45 2024 +0800
tmp fix nas deps issue (#1896)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 63b29126b7c1958939af388d48e56fcceb85db6f
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Tue Jul 2 14:46:02 2024 +0800
Refine HQQ UTs (#1888)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 5592acc60562b7fccb308af0eaaba9cad53004a5
Author: zehao-intel <zehao.huang@intel.com>
Date: Tue Jul 2 14:18:51 2024 +0800
Remove Gelu Fusion for TF Newapi (#1886)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit 4372a762585189accc65196e081a0a7a85f5af9e
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Fri Jun 28 14:55:10 2024 +0800
Fix sql injection for Neural Solution gRPC (#1879)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 4ae2e87d2f98eb34c2e523a76ffa6ff77bf767e1
Author: xinhe <xin3.he@intel.com>
Date: Thu Jun 27 09:56:52 2024 +0800
support quant_lm_head arg in all WOQ configs (#1881)
Signed-off-by: xin3he <xin3.he@intel.com>
commit cc763f5134f5f84b3020a8ea1bee409a60d15218
Author: Dina Suehiro Jones <dina.s.jones@intel.com>
Date: Wed Jun 26 18:29:06 2024 -0700
Update the Gaudi container example in the README (#1885)
commit 1f58f024d812b6c1f7f3430b62e61051599cd1b2
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Thu Jun 20 22:03:45 2024 +0800
Add `set_local` support for static quant with pt2e (#1870)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 0341295de95dce5d5c775fdba78de85e3d3a041d
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Wed Jun 19 09:40:11 2024 +0800
rm cov (#1878)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 503d9ef4136023f1952e397a2ab0f7f476040901
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jun 18 17:12:12 2024 +0800
Add op statistics dump for woq (#1876)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit 5a0374e7db23cac209af78f1ace9b38d23bebbb0
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Tue Jun 18 16:21:05 2024 +0800
Enhance autotune to return the best `q_model` directly (#1875)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 90fb43135397a035968b5334eba21931c18a83c0
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Tue Jun 18 16:06:04 2024 +0800
fix layer match (#1873)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
commit f4eb66073fc2c3f13d624c31056d94f2b6735076
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Mon Jun 17 16:12:06 2024 +0800
Limit numpy versions (#1874)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
commit 2928d856336d3cd2db9068950e054ce4f7c7bbe0
Author: chen, suyue <suyue.chen@intel.com>
Date: Fri Jun 14 21:51:13 2024 +0800
update v2.6 release readme (#1871)
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 48c5e3a9c22b8f16446a6849d63fed0cdf4a0a7a
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Fri Jun 14 21:10:14 2024 +0800
Modify WOQ examples structure (#1866)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 498af747839af0f54e8b1e946ac20fb52b0fbb89
Author: Sun, Xuehao <xuehao.sun@intel.com>
Date: Fri Jun 14 21:09:36 2024 +0800
Update SQ/WOQ status (#1869)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
commit b401b02db2cc7d7f4f8412a815fa435e66e330a0
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Fri Jun 14 17:48:03 2024 +0800
Add PT2E cv&llm example (#1853)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit e470f6cdfbbad32fcf17be56903e649a05059780
Author: xinhe <xin3.he@intel.com>
Date: Fri Jun 14 17:34:26 2024 +0800
[3x] add recommendation examples (#1844)
Signed-off-by: xin3he <xin3.he@intel.com>
commit a1415128a8d63af7e1d2798521f11b137eccec81
Author: zehao-intel <zehao.huang@intel.com>
Date: Fri Jun 14 14:56:30 2024 +0800
Improve UT Branch Coverage for TF 3x (#1867)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit b99a79d029e8010d234d3b4259994e598bec1a06
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Fri Jun 14 14:10:49 2024 +0800
modify 3.x ipex example structure (#1858)
* modify 3.x ipex example structure
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* add json path
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* fix for sq
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* minor fix
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* minor fix
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* remove old files
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* fix act_algo
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>
commit 922b2471e617cc4c56376866e991302d0beb0640
Author: zehao-intel <zehao.huang@intel.com>
Date: Fri Jun 14 12:33:39 2024 +0800
Add TF 3x Examples (#1839)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
commit 70a1d501fdfee16a10e34385bca9f15eba4366b4
Author: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Date: Fri Jun 14 10:17:33 2024 +0800
fix 3x ipex static quant regression (#1864)
Description
Fix 3x ipex static quant regression:
- cannot fall back with op type name ('linear')
- dumps wrong op stats (no 'Linear&relu' op type)
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
commit 4e45f8f68bf126ca0c9dd655fce03b21a93ec151
Author: zehao-intel <zehao.huang@intel.com>
Date: Fri Jun 14 10:04:11 2024 +0800
Improve UT Coverage for TF 3x (#1852)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
commit 794b2762c0bb2f076973e1fca5fdecd23efec774
Author: xinhe <xin3.he@intel.com>
Date: Thu Jun 13 18:02:04 2024 +0800
migrate export to 2x and 3x from deprecated (#1845)
Signed-off-by: xin3he <xin3.he@intel.com>
commit 0eced1478c6796a5e2dcb254a65bbc96af4d1b8b
Author: yuwenzho <yuwen.zhou@intel.com>
Date: Wed Jun 12 18:49:17 2024 -0700
Enhance INC WOQ model loading & support Huggingface WOQ model loading (#1826)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
commit 6733dabc4d48a6625e184e4a29a754949f415097
Author: Wang, Mengni <mengni.wang@intel.com>
Date: Wed Jun 12 17:08:31 2024 +0800
update mx script (#1838)
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
commit a0dee94dab0920ba30de049e871b19a72ddb8996
Author: Wang, Chang <chang1.wang@intel.com>
Date: Wed Jun 12 15:01:25 2024 +0800
Remove export_compressed_model in AWQConfig (#1831)
commit 2c3556d441de2f0963167db71ecdee7353bd76bb
Author: Huang, Tai <tai.huang@intel.com>
Date: Wed Jun 12 14:46:14 2024 +0800
Add 3x architecture diagram (#1849)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
commit 0e2cade66f8c3951e6ce7de226421f6700d2ad85
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Wed Jun 12 14:20:06 2024 +0800
Bump braces from 3.0.2 to 3.0.3 in /neural_insights/gui (#1862)
Signed-off-by: dependabot[bot] <support@github.com>
commit 5b5579bf953cb24607dc18b3a01ffe1071c3b604
Author: Kaihui-intel <kaihui.tang@intel.com>
Date: Wed Jun 12 14:12:00 2024 +0800
Fix Neural Solution security issue (#1856)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
commit e9cb48c9462fdc671c523f93611b30b41b6cff98
Author: xinhe <xin3.he@intel.com>
Date: Wed Jun 12 11:19:47 2024 +0800
improve UT coverage of PT Utils and Quantization (#1842)
* update UTs
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
commit 6b2738390dfdab543de1ccd9242fe541c78b6a2e
Author: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Date: Wed Jun 12 11:11:50 2024 +0800
Fix config expansion with empty options (#1861)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
commit 25c71aad5a55210d87d371257344f21762e3bb0e
Author: WenjiaoYue <wenjiao.yue@intel.com>
Date: Tue Jun 11 17:54:31 2024 +0800
Delete the static resources of the JupyterLab extension after packaging (#1860)
Signed-off-by: Yue, Wenjiao <wenjiao.yue@intel.com>
commit 455f1e1f0f0284e87b46d257b6d126ca76fe1748
Author: Wang, Mengni <mengni.wang@intel.com>
Date: Tue Jun 11 15:28:40 2024 +0800
Add UT and remove unused code for torch MX quant (#1854)
* Add UT and remove unused code for torch MX quant
---------
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Change-Id: I543550ffcc16143d3e612fac2f9ea3a31a1143e1
* modify 3.x ipex example structure (#1858)
* modify 3.x ipex example structure
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* add json path
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* fix for sq
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* minor fix
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* Update run_clm_no_trainer.py
* minor fix
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* remove old files
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* fix act_algo
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>
* Improve UT Branch Coverage for TF 3x (#1867)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* [3x] add recommendation examples (#1844)
Signed-off-by: xin3he <xin3.he@intel.com>
* Add PT2E cv&llm example (#1853)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Update SQ/WOQ status (#1869)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
* Modify WOQ examples structure (#1866)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update v2.6 release readme (#1871)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Limit numpy versions (#1874)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix layer match (#1873)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* Enhance autotune to return the best `q_model` directly (#1875)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add op statistics dump for woq (#1876)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* rm cov (#1878)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add `set_local` support for static quant with pt2e (#1870)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Update the Gaudi container example in the README (#1885)
* support quant_lm_head arg in all WOQ configs (#1881)
Signed-off-by: xin3he <xin3.he@intel.com>
* Fix sql injection for Neural Solution gRPC (#1879)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Remove Gelu Fusion for TF Newapi (#1886)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Refine HQQ UTs (#1888)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* tmp fix nas deps issue (#1896)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* support auto_host2device on RTN and GPTQ (#1894)
Signed-off-by: He, Xin3 <xin3.he@intel.com>
* remove import pdb (#1897)
Signed-off-by: changwangss <chang1.wang@intel.com>
* Port auto-detect absorb layers for TEQ (#1895)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Remove 1x API (#1865)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
* remove neural insight CI (#1903)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix bf16 symbolic_trace bug (#1892)
Description: fix bf16 symbolic_trace bug, which
- caused abnormal recursive calls
- left out necessary attributes
Fixed by moving the BF16 fallback ahead of quantization and removing bf16_symbolic_trace.
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* update fp4_e2m1 mapping list (#1906)
* update fp4_e2m1 mapping list
* Update utility.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add docstring for `common` module (#1905)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* support habana fp8 UT test in CI (#1909)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* bump version into 3.0 (#1908)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* implement `incbench` command for ease-of-use benchmark (#1884)
# Description
Implement the `incbench` command as an entry point for ease-of-use benchmarking:
- automatically detects NUMA/socket info and dumps it as a table for readability
- supports both Linux and Windows platforms
- adds benchmark documentation
- dumps a benchmark summary
- adds benchmark UTs
# General Use Cases
- `incbench main.py`: run 1 instance on NUMA:0
- `incbench --num_i 2 main.py`: run 2 instances on NUMA:0
- `incbench --num_c 2 main.py`: run multiple instances with 2 cores per instance on NUMA:0
- `incbench -C 24-47 main.py`: run 1 instance on cores 24-47
- `incbench -C 24-47 --num_c 4 main.py`: run multiple instances with 4 cores per instance on cores 24-47
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
* Get default config based on the auto-detect CPU type (#1904)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add export support for TEQ (#1910)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* update Gaudi CI baseline artifacts name (#1912)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Remove deprecated modules (#1872)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix CI docker container clean up issue (#1917)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* remove 1x docs (#1900)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add `save`/`load` support for HQQ (#1913)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
* Support PT2E save and load (#1918)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* implement TorchBaseConfig (#1911)
Signed-off-by: xin3he <xin3.he@intel.com>
* update documentation for 3x API (#1923)
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* fix typo in architecture diagram (#1924)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Support woq Autotune (#1921)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support absorb dict for awq (#1920)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support LayerWise for RTN/GPTQ (#1883)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>
* update itrex ut test (#1929)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add docstring for torch.quantization and torch.utils (#1928)
Signed-off-by: xin3he <xin3.he@intel.com>
* Integrate AutoRound v0.3 (#1925)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Integrate AutoRound v0.3 to 2x (#1926)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Enhance load_empty_model import (#1930)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add doc for client usage (#1914)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* remove peft version limit (#1933)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Support xpu for ipex static quant (#1916)
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Support calib_func on TF 3x API (#1934)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* 3.X API installation update (#1935)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Fix unused pkgs import (#1931)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add docstring for PT2E and HQQ (#1937)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* add docstring for static quant and smooth quant (#1936)
* add docstring for static quant and smooth quant
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* format fix
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update scan path
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update utility.py
---------
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update Example for Pytorch 3x Mixed Precision (#1882)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* add read permission token (#1942)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Add docstring for WOQ&LayerWise (#1938)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
* add docstring for mx quant (#1932)
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
* Update for API 3.0 online doc (#1940)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
* Refine Pytorch 3x Mixed Precision Example (#1946)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Update AutoRound commit version (#1941)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Update publish.yml (#1949)
* Update publish.yml
* Update publish.yml
* Update publish.yml (#1950)
* Update doc for client-usage and LWQ (#1947)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Add Docstring for TF 3x API and Torch 3x Mixed Precision (#1944)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Update Examples for TF 3x API (#1901)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Complement UT of calibration function for TF 3x API (#1945)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* Enable yolov5 Example for TF 3x API (#1943)
Signed-off-by: zehao-intel <zehao.huang@intel.com>
* add ipex xpu example to 3x API (#1948)
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update 3x torch installation (#1957)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Add save/load for pt2e example (#1927)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Fix itrex qbits nf4/int8 training core dumped issue (#1954)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
* new previous results could not find all raise issues in CI model test (#1958)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Set low_gpu_mem_usage=False for AutoRound
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Bump tensorflow version (#1961)
Signed-off-by: dependabot[bot] <support@github.com>
* fix docs link (#1959)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix welcome.html link issue (#1962)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
* replenish docstring (#1955)
* replenish docstring
Signed-off-by: xin3he <xin3.he@intel.com>
* update Quantizer API docstring
Signed-off-by: xin3he <xin3.he@intel.com>
* Add docstring for auto accelerator (#1956)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* temporary remove torch/quantization and add it back after fp8 code is updated.
* Update config.py
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>
* add SDXL model example to INC 3.x (#1887)
* add SDXL model example to INC 3.x
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
* add evaluation script
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* add test script
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* minor fix
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update run_quant.sh
* add iter limit
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* modify test script
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* update json
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add requirements
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Update run_benchmark.sh
* Update sdxl_smooth_quant.py
* minor fix
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>
* example update for 3.x ipex sq (#1902)
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
* Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (#1965)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* remove unnecessary CI (#1966)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Add version mapping between INC and Gaudi SW Stack (#1967)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Add 3.x readme (#1971)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Fix broken link in docs (#1969)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Cherry pick v1.17.0 (#1964)
* [SW-184941] INC CI, CD and Promotion
Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52
* [SW-183320]updated setup.py
Change-Id: I592af89486cb1d9e0b5197521c428920197a9103
* [SW-177474] add HQT FP8 porting code
Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
* [SW-189361] Fix white list extend
Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3
* [SW-191317] Raise exception according to hqt config object
Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a
* [SW-184714] Port HQT code into INC
HQT lib content was copied as is under fp8_quant
Tests were copied to 3.x torch location
Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9
* [SW-184714] Add internal folder to fp8 quant
This is a folder used for experiments,
not to be used by users
Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc
* [SW-177468] Removed unused code + cleanup
Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638
* Fix errors in regression_detection
Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438
* [SW-187731] Save orig module as member of patched module
This allows direct usage of the original module methods,
which solves a torch.compile issue
Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632
* [SW-190899] Install packages according to configuration
Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525
* [SW-184689] use finalize_calibration internally for one-step flow
Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89
* [SW-191945] align requirement_pt.txt in gerrit INC with Github INC
Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-192358] Remove HQT reference in INC
Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e
* [SW-191415] update fp8 maxAbs observer using torch.copy_
Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f
* [SW-184943] Enhance INC WOQ model loading
- Support loading huggingface WOQ model
- Abstract WeightOnlyLinear base class; add INCWeightOnlyLinear and HPUWeightOnlyLinear subclasses
- Load WOQ linear weights module by module
- Save HPU-format tensors so they can be reused when loaded again
Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf
* [SW-190303] Implement HPUWeightOnlyLinear class in INC
Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41
* [SW-192809] fix json_file bug when instantiating FP8Config class
Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
* [SW-192931] align setup.py with github INC and remove fp8_convert
Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [SW-192917] Update all HQT logic files with pre-commit check
Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update docstring
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* add fp8 example and document (#1639)
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* Update settings to be compatible with gerrit
* enhance ut
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* move fp8 sample to helloworld folder
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
* update torch version of habana docker
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update readme demo
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update WeightOnlyLinear to INCWeightOnlyLinear
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* add docstring for FP8Config
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix pylint
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update fp8 test scripts
Signed-off-by: chensuyue <suyue.chen@intel.com>
* delete deps
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update container into v1.17.0
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update docker version
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update pt ut
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add lib path
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix dir issue
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update fp8 test scope
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix typo
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* update fp8 test scope
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update pre-commit-ci
Signed-off-by: chensuyue <suyue.chen@intel.com>
* work around for hpu
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix UT
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* fix parameter
Signed-off-by: chensuyue <suyue.chen@intel.com>
* omit some test
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update main page example to llm loading
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix autotune
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
---------
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
* update main page (#1973)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* fix online doc search issue (#1975)
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
* bump main version into v3.1 (#1974)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* update readme for fp8 (#1979)
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* Skip some tests for torch 2.4 (#1981)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Fix UT env and upgrade torch to 2.4.0 (#1978)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* support gptq `true_sequential` and `quant_lm_head` (#1977)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* update installation and ci test for 3x api (#1991)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add hasattr check for torch fp8 dtype (#1985)
Signed-off-by: xin3he <xin3.he@intel.com>
* add quantize, save, load function for transformers-like api (#1986)
Signed-off-by: changwangss <chang1.wang@intel.com>
* Update installation_guide.md (#1989)
Correct typo in installation doc
* update 3x pt binary build (#1992)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* add per_channel_minmax (#1990)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Remove the save of gptq config (#1993)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add recent publications (#1995)
* add recent publications
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* update total count
Signed-off-by: Huang, Tai <tai.huang@intel.com>
---------
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* update docker image prune rules (#2003)
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Support transformers-like api for woq quantization (#1987)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
* add INC_FORCE_DEVICE introduction (#1988)
* add INC_FORCE_DEVICE introduction
Signed-off-by: xin3he <xin3.he@intel.com>
* Update PyTorch.md
* Update PyTorch.md
* Update docs/source/3x/PyTorch.md
Co-authored-by: Yi Liu <yi4.liu@intel.com>
* rename to INC_TARGET_DEVICE
Signed-off-by: xin3he <xin3.he@intel.com>
---------
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
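  For context, a hedged usage sketch of the renamed variable: it is typically set before the library is imported so device auto-detection can honor it. The value `"cpu"` and the helper import path below are assumptions for illustration, not confirmed by this log; see docs/source/3x/PyTorch.md for the documented usage.
  ```python
  import os

  # Illustrative: force the device INC targets before importing the library (value assumed).
  os.environ["INC_TARGET_DEVICE"] = "cpu"

  # Assumed helper location; the accelerator is resolved with the variable already in effect.
  from neural_compressor.torch.utils import auto_detect_accelerator

  print(auto_detect_accelerator().name())
  ```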
* Replace FORCE_DEVICE with INC_TARGET_DEVICE [transformers] (#2005)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* enable auto_round format export (#2002)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* remove accelerate version in unit test (#2007)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* add repack_awq_to_optimum_format function (#1998)
Signed-off-by: changwangss <chang1.wang@intel.com>
* Update auto_round requirements for transformers example (#2013)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* add pad_to_buckets in evaluation for hpu performance (#2011)
* add pad_to_buckets in evaluation for hpu performance
---------
Signed-off-by: xin3he <xin3.he@intel.com>
* Update model accuracy (#2006)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix xpu device set weight and bias (#2010)
Signed-off-by: changwangss <chang1.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* Add transformers-like api doc (#2018)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Adapt transformers 4.45.1 (#2019)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: changwangss <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add autoround EMNLP24 to pub list (#2014)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Fix transformers rtn layer-wise quant (#2008)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Remove itrex dependency for 3x example (#2016)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* add transformers-like api link in readme (#2022)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* Add woq examples (#1982)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* remove ITREX unit test CI (#2021)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Support quant procedure on XPU (#2026)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support generation search for transformers examples (#2029)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Remove itrex dependency for 2x example (#2024)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update the PT2E CV example (#2032)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Cherry pick Habana software 1.18.0 update (#2025)
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Danny <dsemiat@habana.ai>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* update gaudi version mapping table for v3.1 (#2030)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
* fix broken link to FP8 example (#2034)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* add back missing image (#2035)
Signed-off-by: xin3he <xin3.he@intel.com>
* Add vlm examples, bugfix (#2012)
* add VLM examples
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix, add utils
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix docstring issues
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refine examples
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix scan issue
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* refine shell
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine scripts & requirements
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* typofix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine docs
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* set attn_implementation for Phi3-vision
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine phi3 example
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix code coverage
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* update config
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* refine shells, docs and example. enable qwen2-vl quantization
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix ci
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix EOF error
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update qwen dir
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* refine shell, add llama3.2 inference to doc
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine eval shell
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* fix eval device issue
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* refine eval dtype
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
---------
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* remove autoround limit (#2036)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Adapt autoround format (#2038)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* remove transformers import from utility (#2045)
* remove transformers import from utility
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* bugfix
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fixtypos
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
---------
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add buckets setting for lm_eval (#2044)
* add buckets setting for lm_eval
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* clear graph cache to avoid OOM
Signed-off-by: xinhe3 <xinhe3@habana.ai>
---------
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
* Enhance example for HPU performance (#2043)
* Enhance example for HPU performance
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* Update run_clm_no_trainer.py
* remove wikitext to avoid oom for llama2-7b bs=8
* remove wikitext
Signed-off-by: xinhe3 <xinhe3@habana.ai>
---------
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
* remove useless code in setup.py (#2046)
* Update the default PT2E config (#2041)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Support non-contiguous weight saving (#2049)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* fix GPTQ oom issue on HPU (#2042)
* fix GPTQ oom issue on HPU
Signed-off-by: xinhe3 <xinhe3@habana.ai>
---------
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
* fix bug and update readme (#2051)
* fix bug and update readme
---------
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
* Support safetensors loading for layerwise (#2047)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Enhance WOQ example Readme and help (#2053)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* improve optimum-habana available check (#2054)
Signed-off-by: changwang <changwang@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fixed CI IPEX version (#2061)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Update torch config kwargs (#2055)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Support client `use_layer_wise` setting (#2048)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Check autoround before import it (#2062)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Delete fp8_quant/scripts/regression_detection directory (#2059)
A missed change when cherry-picking Habana software 1.18.0
* Make PatchedVLLMKVCache resiliant to forward API changes (#2067)
Change-Id: I33fad5c3e80e017099f300782809f24669765d42
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
* Fix glm-4-9b oom issue on BMG
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Update recipes & Bump version to 3.2 (#2037)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Docs: Add customer defined calibration and update docker run (#2057)
Signed-off-by: fengding <feng1.ding@intel.com>
* Adapt torch and ipex 2.5 (#2066)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
* Enhance `TBB` check (#2068)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Fix the PT2E UT (#2071)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* Support gptq layerwise on client (#2069)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Adapt autoround v0.4 (#2073)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Ensure that mul operators with shared initializer will not be absorbed in SmoothQuant (#2063)
Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
* Integrate AutoRound v0.4 [3x] (#2072)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Update CI framework versions and README badge for release 3.1.1 (#2058)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Remove the examples force required torch 1.13.1 (#2074)
* remove alexnet_fashion_mnist notebook
Signed-off-by: chensuyue <suyue.chen@intel.com>
* remove rnnt in pytorch examples
Signed-off-by: chensuyue <suyue.chen@intel.com>
---------
Signed-off-by: chensuyue <suyue.chen@intel.com>
* Fix truthfulqa task evaluation issue
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Add required library for ONNX example (#2078)
* Add required library for ONNX example
* Update requirements.txt
* support autoround new API for VLM (#2075)
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add import check (#2076)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* Update utility.py (#2079)
* Add gptq known issue (#2080)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Fix sdxl `q_unet` config (#2081)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* Fixed the PT2E LLM example (#2082)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* fix dlrm when using incbench (#2084)
Signed-off-by: Xin He <xinhe3@habana.ai>
* add mapping for v3.2 (#2085)
Signed-off-by: Huang, Tai <tai.huang@intel.com>
* [SW-192753] unify StaticQuantConfig and FP8Config
Change-Id: I2fe09ba4c575810a5b130268d63b9eee926bdf08
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-200124] Set Scalar as default scale format + Compatibility check
Set ScaleFormat.SCALAR as the default value of 'scale_format'
Reduce 'scale_format' to 'CONST' when using PCQ or fake_quant
Add a test showing Scalar models don't give wrong outputs
Fix the fakequant test; its use of 'hpu_initialize' is problematic and should be fixed in SW-202697
Change-Id: I43ff4900e9e02ce7f50edcdbb19a28f4f615ef9c
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-201679] support unit_scales for FuseMoE
Change-Id: I02a63332bc09f1f6cdc3f133dd5f58829fcbad5a
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-203698] Add log for converting prepared model
Change-Id: I1464f11bbab27d9041c9ba6f448e5ae6fa43bc2d
Signed-off-by: Mengni Wang <mewang@habana.ai>
* [SW-199737] Measurement dump improvements
Add _validate_dump_path to make sure the dump dir is writable and to back up measurements
Change-Id: Ib64abe772b4c309bbf04de89477cde92ea47ade4
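  For illustration only, a minimal sketch of what such a validation helper could look like: check that the dump directory is writable and back up an existing measurement file before it is overwritten. This is an assumption-based sketch, not the actual `_validate_dump_path` implementation in fp8_quant.
  ```python
  import os
  import shutil
  import time

  def validate_dump_path(dump_path: str) -> None:
      """Illustrative check: dump dir must be writable; back up existing measurements."""
      dump_dir = os.path.dirname(os.path.abspath(dump_path)) or "."
      os.makedirs(dump_dir, exist_ok=True)
      if not os.access(dump_dir, os.W_OK):
          raise OSError(f"Measurement dump directory is not writable: {dump_dir}")
      if os.path.exists(dump_path):
          # Keep a timestamped backup instead of silently overwriting old measurements.
          shutil.copy2(dump_path, f"{dump_path}.{int(time.time())}.bak")
  ```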
* [SW-203452] Fixing and temp skipping G3 unittests
Change-Id: Iafa4a6a8577724bd8a86581bfe38d3269dab2ea2
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-195965] [GPTQ] INC load model loads model in fp32 only
Change-Id: I597d19273786c0c169ad952ebe5a357274e358dc
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-204016] Enable scale calculation with disk offload in INC
- Move scale calculation and quantization config info into the module patching loop, as the weights there are guaranteed to be on CPU.
Change-Id: Ifb2de4e67c1b36c611dcc50b4cd14731b0336c50
* [SW-202614] Llama70b int4 gptq with INC load flow - getting host OOM
Change-Id: Id1797371bb136502d89c4e8d17abcac1eaac4534
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-199823] [HQT] fix INC one-step quantization API workflow
1. fix test_fp8_static_quant.py::TestFP8StaticQuant::test_one_step_quant_cv failure by deep-copying the forward function in common.py
2. fix config.py: "Object of type dict_keys is not JSON serializable" by converting it to a list
3. fix UT download issue by using a local tiny_gptj.json
Change-Id: I2ad3eac411e8fca9d88a021f6a5b9594e6c75ae9
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-202617] vllm mixtral MoE quant and measure using forward call
Change-Id: I919f1e3597b6c95c3fc60db78ac9c0c06242b416
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-200092] Allow fsdpa and softmax to use scalar scales in INC
Change-Id: Ieba4c74c18624fb0c5fce6321671d6f4eb2b8c93
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-205363] Update _load_state_dict_into_meta_model
Update _load_state_dict_into_meta_model to be compatible with the Transformers 4.45 release
Change-Id: Ib5d8ca777d38c7ae225b7174a886b333b6246ab1
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-184948] INC Q/DQ optimization, included conv2d, kv_cache, fsdpa,
softmax and other operators.
Change-Id: I920f8ad85b3493f1bd4bbe770533343e214fc2d1
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-198585] Fix typo causing PatchedVLLMKVCache error
Change-Id: Iafdcc935f702bc4756e2ba89935becb3bc47a728
* [SW-199208] QDQ Refactor for Registering Patched Modules, Scaling Methods, and Observers
1. Extension APIs
- `PatchedModuleBase`, `register_patched_module`
- `ScalingMethodBase`, `register_scaling_methods`
- `ObserverBase`, `register_observer`, `register_module_config_for_observer`
Related files:
- fp8_quant/patched_module_base.py
- fp8_quant/observer_base.py
- fp8_quant/_core/measure.py
- test_register_apis.py
2. Device-agnostic Patching
- Replaced `hpu` with `cur_accelerator.name()`
- Replaced `htcore.mark_step()` with `cur_accelerator.synchronize()`
- Removed `torch.device("hpu")` under observers and scaling method
- Updated `hpu_accelerator.synchronize()` to `htcore.mark_step()` + `torch.hpu.synchronize()`
Change-Id: I83c6de928a991ed2c1b3b434d372f49e095c38d3
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Mengni Wang <mewang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
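  As a rough illustration of the registry pattern behind these extension points (the class and decorator names mirror the commit message, but the signatures and internals below are assumptions for illustration, not the actual `fp8_quant` API):
  ```python
  from typing import Dict, Type

  PATCHED_MODULE_REGISTRY: Dict[str, Type["PatchedModuleBase"]] = {}

  class PatchedModuleBase:
      """Hypothetical base class for quantization-aware module wrappers."""
      def __init__(self, orig_module):
          self.orig_module = orig_module  # keep the original module for fallback/compile paths

  def register_patched_module(*module_types: str):
      """Hypothetical decorator mapping original module type names to patched wrappers."""
      def decorator(cls):
          for name in module_types:
              PATCHED_MODULE_REGISTRY[name] = cls
          return cls
      return decorator

  @register_patched_module("Linear")
  class PatchedLinear(PatchedModuleBase):
      def forward(self, x):
          # Real implementations quantize inputs/weights, call the original op, then dequantize.
          return self.orig_module(x)
  ```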
* [SW-203389] scalar scales don't provide a dtype attribute
Change-Id: I4e40dc9b2d9cb65bc9e49571cd57a9ab030f5d7b
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-199208] fix ModuleInfo conversion issue
Change-Id: Ib6c35e1623dda3e470e569defccd607a18b43312
* [SW-200168] Enable working with G2 HW scales on G3
Change-Id: I17f71540eb78e828f01f1a11c8b233d60951293e
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-203389] fix get_scale_dtype to support PCQ scales
Change-Id: I923ace405a0f751a2e5a0a3aadb7abbb401a6c44
* [SW-199719] reduce PCQ scales memory usage
Removed persistent full weight scales during PCQ quantization; instead only the input- and output-channel scales are kept, and a temporary full scale tensor is created on each input quant op call. Since the full scale tensor is the same size as the original bf16 weight, persistently keeping all full scales alongside the quantized weights would result in a quantized model that uses more memory than the unquantized one.
Change-Id: Idc91c5ac8b9cfea2e2a3ad053cb4dc5464cff776
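  A minimal sketch of the memory trade-off described above, assuming a max-abs per-channel scheme and torch >= 2.1 for the fp8 dtype; the scale math is illustrative, not INC's actual PCQ implementation:
  ```python
  import torch

  FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # standard E4M3 max; HW variants may differ

  def channel_scales(weight: torch.Tensor):
      # Persist only two small vectors: per-output-channel and per-input-channel max-abs scales.
      out_scale = weight.abs().amax(dim=1) / FP8_MAX
      in_scale = weight.abs().amax(dim=0) / FP8_MAX
      return out_scale, in_scale

  def full_scale(out_scale: torch.Tensor, in_scale: torch.Tensor) -> torch.Tensor:
      # Rebuild the full [out_ch, in_ch] scale tensor only for the duration of the quant op.
      return torch.outer(out_scale, in_scale)

  w = torch.randn(4096, 4096)      # stand-in for a bf16 weight
  out_s, in_s = channel_scales(w)
  temp = full_scale(out_s, in_s)   # temporary; same shape (and size) as the weight itself
  assert temp.shape == w.shape
  ```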
* [SW-206112] INC Q/DQ improvement - use Q/DQ ops
Change-Id: Ib03ea8744aa2cca8b606754c45944840da1c3898
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-206693] Convert conv2d_fp8 params to list if necessary
It's needed for the new approach to dynamic shapes in PT2.5.
Change-Id: I8d5e620153970b210675459e3d6aecad8ca7cbde
* [SW-207411] Add catch for OSError in _validate_dump_path
Change-Id: I82bae184257f3da982877b3797f2ee8b40a573c8
* [SW-207328] remove accuracy check due to random issue
Change-Id: Ifbd985c31c3755b6ab353ef8fa45e911dd75d688
Signed-off-by: xinhe3 <xinhe3@habana.ai>
* [SW-207559] Folder layout refactoring and cleanup (phase 1)
Change-Id: Ic9bffd2b7477d4530b4e2a5e411760a731efb84b
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-193262] INC multi device save/load CP design in fp8 (#5)
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-208521] one-step quantization got double memory usage (#3)
* [SW-208521] one-step quantization got double memory usage
Signed-off-by: Xin <xin3.he@intel.com>
* [SW-208789] Support quantizing FP16 model to FP8 (#15)
Since layer-wise quantization uses memory mapping from disk, the model may be fp16 exactly as saved on disk (for example, llama2-7b). Logic is added to support this case so that layer-wise quantization works correctly.
Signed-off-by: Xin He <xinhe3@habana.ai>
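  To make the layer-wise idea above concrete, a rough sketch under stated assumptions (torch >= 2.1, weights memory-mapped from a safetensors checkpoint so they can be fp16 exactly as stored on disk, one tensor scaled and cast at a time); the scale formula and the plain cast to torch.float8_e4m3fn are illustrative, not INC's actual flow:
  ```python
  import torch
  from safetensors import safe_open

  def quantize_checkpoint_layerwise(ckpt_path: str):
      quantized = {}
      # Memory-mapped access: tensors stay on disk until requested, one layer at a time.
      with safe_open(ckpt_path, framework="pt", device="cpu") as f:
          for name in f.keys():
              w = f.get_tensor(name)   # may be fp16, exactly as saved on disk
              w = w.to(torch.float32)  # promote before scaling regardless of on-disk dtype
              scale = (w.abs().max() / torch.finfo(torch.float8_e4m3fn).max).clamp(min=1e-8)
              quantized[name] = ((w / scale).to(torch.float8_e4m3fn), scale)
              del w                    # release the full-precision copy before the next layer
      return quantized
  ```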
* [SW-205959] Update _load_state_dict_into_meta_model for model with bias (#7)
Signed-off-by: Xin <xin3.he@intel.com>
* [SW-208700] release bf16 model memory on HPU in one-step quantization (#14)
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-197077] refactoring maxabs scales and adding arbitrary scales. (#12)
* [SW-197077] refactoring maxabs scales and adding arbitrary scales.
Change-Id: I2c35cf925b6b21983f1770db7d35e14f3d7d3e47
* [SW-197077] refactoring scale:
fix atol
Change-Id: I1c99ddd9ade679286988e7d8a96338b32c0ddc07
* [SW-197077] adding arbitrary scales
* Skip autoround test for HPU (#19)
Change-Id: I6dc9724389c16a05252370b9e09a1db80bc8d696
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct … (#16)
* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct after tensor parallel
---------
Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named… (#33)
* [SW-209256] fix GPTQ oom issue on HPU (#2042) (#20)
* fix GPTQ oom issue on HPU (#2042)
---------
Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named 'safetensors'
Signed-off-by: Xin <xin3.he@intel.com>
---------
Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
* [SW-207748] Support Auto-round on HPU (#25)
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
* [SW-209878] Increase threshold to avoid random error in test_layer_wise.py (#36)
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
* [SW-207579] support load vLLM compatible FP8 model (#18)
Support load vLLM compatible FP8 model, both G2 and G3, both single card and multi-cards.
---------
Signed-off-by: changwang <changwang@habana.ai>
* [SW-207451] Implement block-wise calibration for LLM (#41)
* [SW-207451] Implement block-wise calibration for LLM
---------
Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-208986] fix save&load bug (#40)
* [SW-208986] fix save&load bug
---------
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
* [SW-207748] Add Auto-round Example (#42)
* add autoround hpu example
Change-Id: Ibd537f4667c7c077160427722a5eca2c721aa5cd
Signed-off-by: Yi Liu <yiliu4@habana.ai>
* add requirements
Change-Id: I77a95ec05e41247db9903e8622c31f05259ca365
Signed-off-by: Yi Liu <yiliu4@habana.ai>
---------
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
* [SW-197077] fix bug (#47)
* [SW-210541] loading for fused_sdpa requires additional amax scale (#51)
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
* fix PatchedLoRACompatibleLinear init (#65)
Signed-off-by: changwangss <changwang@habana.ai>
* align files with v1.19.0 in fp8_quant folder
Signed-off-by: Xin He <xinhe3@habana.ai>
* fix missing SaveLoadFormat
Signed-off-by: Xin He <xinhe3@habana.ai>
* align and fix config after cherry-pick
Signed-off-by: Xin He <xinhe3@habana.ai>
* Implicit relative imports are abandoned
Signed-off-by: Xin He <xinhe3@habana.ai>
* fix config issue blocking CI
Signed-off-by: Xin He <xinhe3@habana.ai>
* remove synchronize for `pack_unpack_tensor_with_numpy` (#2070)
* remove pack&unpack synchronize
---------
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* stop auto-fix of pre-commit
Signed-off-by: Xin He <xinhe3@habana.ai>
* update autoround example for release test
Signed-off-by: xin3he <xin3.he@intel.com>
* fix AWQ&TEQ loading due to input scale
Signed-off-by: xin3he <xin3.he@intel.com>
* fix HQQ state_dict loading caused by [SW-195965]
Signed-off-by: xin3he <xin3.he@intel.com>
* use per_channel as default config (#2091)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
* workaround transformers issue in version 4.47.0 (#2092)
* workaround transformers issue in version 4.47.0
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Refactor FP8 pytest script (#2089)
* Refactor FP8 pytest script
---------
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update ci scan scope
Signed-off-by: chensuyue <suyue.chen@intel.com>
* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
* [SW-213236] resolve CPU mem issue in CI (#76)
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
* recover pre-commit
Signed-off-by: Xin He <xinhe3@habana.ai>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix `is_sharded` setting for loading quant model (#2094)
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
* fix error message for different python version (#2099)
Signed-off-by: changwangss <changwang@habana.ai>
* fix UT of RTN on HPU (#2098)
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix device issue during calibration (#2100)
Signed-off-by: Xin He <xinhe3@habana.ai>
* fix woq example and update document for v1.19.0 (#2097)
Signed-off-by: xin3he <xin3.he@intel.com>
* Refactor version import paths to common module (#2095)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* update CI gaudi-docker to 1.19.0 (#2096)
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
* fix device mapping issue of llama gptq (#2101)
Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* remove fix_measurements.py; it exists under a different name, postprocessing_vllm_measurements.py
* fix merge
* remove unused imported functions with wrong path
* change env var requested value from 1 to true
---------
Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Huang, Tai <tai.huang@intel.com>
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: fengding <feng1.ding@intel.com>
Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Mengni Wang <mewang@habana.ai>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: changwangss <changwang@habana.ai>
Co-authored-by: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: zehao-intel <zehao.huang@intel.com>
Co-authored-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: feng-intel <110514170+feng-intel@users.noreply.github.com>
Co-authored-by: duanshengliu <44742794+duanshengliu@users.noreply.github.com>
Co-authored-by: Mengni Wang <mewang@habana.ai>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: changwang <changwang@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Amadeusz Skrzypczak <askrzypczak@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Type of Change
Cherry pick Habana software 1.18.0 update