
Conversation

dependabot[bot] commented on behalf of GitHub on Feb 28, 2022

Bumps pyyaml from 5.3.1 to 5.4.

Changelog

Sourced from pyyaml's changelog.

5.4 (2021-01-19)

Commits
  • 58d0cb7 5.4 release
  • a60f7a1 Fix compatibility with Jython
  • ee98abd Run CI on PR base branch changes
  • ddf2033 constructor.timezone: _copy & deepcopy
  • fc914d5 Avoid repeatedly appending to yaml_implicit_resolvers
  • a001f27 Fix for CVE-2020-14343
  • fe15062 Add 3.9 to appveyor file for completeness sake
  • 1e1c7fb Add a newline character to end of pyproject.toml
  • 0b6b7d6 Start sentences and phrases for capital letters
  • c976915 Shell code improvements
  • Additional commits viewable in compare view
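
Of these, a001f27 (the CVE-2020-14343 fix) is the security-relevant change: `yaml.full_load` could be tricked into constructing arbitrary Python objects from untrusted input. A minimal sketch of the loading pattern this release hardens, using only the documented PyYAML API (prefer `safe_load` for untrusted documents regardless of version):

```python
import yaml

untrusted = "!!python/object/new:os.system [echo pwned]"

# safe_load builds only plain Python types (str, int, list, dict, ...),
# so python/object tags in untrusted input raise instead of executing
try:
    yaml.safe_load(untrusted)
except yaml.YAMLError as exc:
    print("rejected:", exc)

# trusted documents can still opt in to an explicit loader
data = yaml.load("a: 1", Loader=yaml.SafeLoader)
```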

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

Bumps [pyyaml](https://github.com/yaml/pyyaml) from 5.3.1 to 5.4.
- [Release notes](https://github.com/yaml/pyyaml/releases)
- [Changelog](https://github.com/yaml/pyyaml/blob/master/CHANGES)
- [Commits](yaml/pyyaml@5.3.1...5.4)

---
updated-dependencies:
- dependency-name: pyyaml
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot[bot] added the `dependencies` (pull requests that update a dependency file) and `python` (pull requests that update Python code) labels on Feb 28, 2022
@chensuyue commented:

Fixed by internal PR.

chensuyue closed this on Mar 8, 2022
dependabot[bot] commented on behalf of GitHub on Mar 8, 2022

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.

chensuyue deleted the dependabot/pip/examples/pytorch/nlp/huggingface_models/common/examples/research_projects/lxmert/pyyaml-5.4 branch on March 8, 2022 at 14:01
VincyZhang pushed a commit that referenced this pull request Feb 12, 2023
…le node setting for AutoDistillation (#54)

* added Distributed Data Parallel training support on multi-GPU in single node setting for AutoDistillation

* add bert-tiny example in autodistillation

* added readme for Distributed Data Parallel training and bert-tiny
xin3he added a commit that referenced this pull request Dec 18, 2024
…xt for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
xin3he added a commit that referenced this pull request Dec 19, 2024
…xt for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
chensuyue pushed a commit that referenced this pull request Dec 19, 2024
…xt for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
xin3he added a commit that referenced this pull request Feb 14, 2025
…xt for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54) (#77)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
yiliu30 pushed a commit that referenced this pull request Feb 14, 2025
…xt for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54) (#77)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
yiliu30 added a commit that referenced this pull request Feb 14, 2025
* modify 3.x ipex example structure (#1858)

* modify 3.x ipex example structure

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* add json path

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* fix for sq

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* minor fix

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* Update run_clm_no_trainer.py

* Update run_clm_no_trainer.py

* Update run_clm_no_trainer.py

* minor fix

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* remove old files

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* fix act_algo

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

---------

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>

* Improve UT Branch Coverage for TF 3x (#1867)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* [3x] add recommendation examples (#1844)

Signed-off-by: xin3he <xin3.he@intel.com>

* Add PT2E cv&llm example (#1853)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update SQ/WOQ status (#1869)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>

* Modify WOQ examples structure (#1866)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>

* update v2.6 release readme (#1871)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Limit numpy versions (#1874)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix layer match (#1873)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Enhance autotune to return the best `q_model` directly (#1875)

Signed-off-by: yiliu30 <yi4.liu@intel.com>
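
For context, a sketch of how the enhanced entry point is meant to be used, assuming the 3.x torch API names (`autotune`, `TuningConfig`, `RTNConfig`); the eval function is a user-supplied placeholder:

```python
# sketch, assuming the neural_compressor 3.x torch API; eval_fn returns a float metric
from neural_compressor.torch.quantization import RTNConfig, TuningConfig, autotune

def eval_fn(model) -> float:
    return evaluate_accuracy(model)  # hypothetical user-defined metric

tune_config = TuningConfig(config_set=[RTNConfig(bits=[4, 8])])
# after #1875, the best q_model is returned directly instead of via a callback
q_model = autotune(model=float_model, tune_config=tune_config, eval_fn=eval_fn)
```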

* Add op statistics dump for woq (#1876)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* rm cov (#1878)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Add `set_local` support for static quant with pt2e (#1870)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Update the Gaudi container example in the README (#1885)

* support quant_lm_head arg in all WOQ configs (#1881)

Signed-off-by: xin3he <xin3.he@intel.com>

* Fix sql injection for Neural Solution gRPC (#1879)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Remove Gelu Fusion for TF Newapi (#1886)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* Refine HQQ UTs (#1888)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* tmp fix nas deps issue (#1896)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* support auto_host2device on RTN and GPTQ(#1894)

Signed-off-by: He, Xin3 <xin3.he@intel.com>

* remove import pdb (#1897)

Signed-off-by: changwangss <chang1.wang@intel.com>

* Port auto-detect absorb layers for TEQ (#1895)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Remove 1x API (#1865)

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>

* remove neural insight CI (#1903)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix bf16 symbolic_trace bug (#1892)

Description: fix a bf16 symbolic_trace bug that

- caused abnormal recursive calling
- left necessary attributes missing

By moving the BF16 fallback ahead of quantization and removing bf16_symbolic_trace, we fix it.

---------

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* update fp4_e2m1 mapping list (#1906)

* update fp4_e2m1 mapping list

* Update utility.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add docstring for `common` module (#1905)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* support habana fp8 UT test in CI (#1909)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* bump version into 3.0 (#1908)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* implement `incbench` command for ease-of-use benchmark (#1884)

# Description
- implement `incbench` command as the entry point for ease-of-use benchmarking
- automatically check NUMA/socket info and dump it as a table for ease of understanding
- support both Linux and Windows platforms
- add benchmark documents
- dump a benchmark summary
- add benchmark UTs

# General Use Cases
- `incbench main.py`: run 1 instance on NUMA:0
- `incbench --num_i 2 main.py`: run 2 instances on NUMA:0
- `incbench --num_c 2 main.py`: run multiple instances with 2 cores per instance on NUMA:0
- `incbench -C 24-47 main.py`: run 1 instance on cores 24-47
- `incbench -C 24-47 --num_c 4 main.py`: run multiple instances with 4 cores per instance on cores 24-47

---------

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>

* Get default config based on the auto-detect CPU type (#1904)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Add export support for TEQ (#1910)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* update Gaudi CI baseline artifacts name (#1912)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Remove deprecated modules (#1872)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* fix CI docker container clean up issue (#1917)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* remove 1x docs (#1900)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Add `save`/`load` support for HQQ (#1913)

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>

* Support PT2E save and load (#1918)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* implement TorchBaseConfig (#1911)

Signed-off-by: xin3he <xin3.he@intel.com>

* update documentation for 3x API (#1923)

Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>

* fix typo in architecture diagram (#1924)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Support woq Autotune (#1921)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Support absorb dict for awq (#1920)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Support LayerWise for RTN/GPTQ (#1883)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>

* update itrex ut test (#1929)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* add docstring for torch.quantization and torch.utils (#1928)

Signed-off-by: xin3he <xin3.he@intel.com>

* Integrate AutoRound v0.3 (#1925)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Integrate AutoRound v0.3 to 2x (#1926)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Enhance load_empty_model import (#1930)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Add doc for client usage (#1914)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* remove peft version limit (#1933)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Support xpu for ipex static quant (#1916)

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* Support calib_func on TF 3x API (#1934)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* 3.X API installation update (#1935)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Fix unused pkgs import (#1931)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Add docstring for PT2E and HQQ (#1937)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* add docstring for static quant and smooth quant (#1936)

* add docstring for static quant and smooth quant

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* format fix

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* update scan path

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* Update utility.py

---------

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* Update Example for Pytorch 3x Mixed Precision (#1882)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* add read permission token (#1942)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Add docstring for WOQ&LayerWise (#1938)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>

* add docstring for mx quant (#1932)

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>

* Update for API 3.0 online doc (#1940)

Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>

* Refine Pytorch 3x Mixed Precision Example (#1946)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* Update AutoRound commit version (#1941)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update publish.yml (#1949)

* Update publish.yml

* Update publish.yml

* Update publish.yml (#1950)

* Update doc for client-usage and LWQ (#1947)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Add Docstring for TF 3x API and Torch 3x Mixed Precision (#1944)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* Update Examples for TF 3x API (#1901)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* Complement UT of calibration function for TF 3x API (#1945)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* Enable yolov5 Example for TF 3x API  (#1943)

Signed-off-by: zehao-intel <zehao.huang@intel.com>

* add ipex xpu example to 3x API (#1948)

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* update 3x torch installation (#1957)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Add save/load for pt2e example (#1927)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Fix itrex qbits nf4/int8 training core dumped issue (#1954)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>

* new previous results could not find all raise issues in CI model test (#1958)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Set low_gpu_mem_usage=False for AutoRound

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Bump tensorflow version (#1961)

Signed-off-by: dependabot[bot] <support@github.com>

* fix docs link (#1959)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* fix welcome.html link issue (#1962)

Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>

* replenish docstring (#1955)

* replenish docstring

Signed-off-by: xin3he <xin3.he@intel.com>

* update  Quantizer API docstring

Signed-off-by: xin3he <xin3.he@intel.com>

* Add docstring for auto accelerator (#1956)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* temporary remove torch/quantization and add it back after fp8 code is updated.

* Update config.py

---------

Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>

* add SDXL model example to INC 3.x (#1887)

* add SDXL model example to INC 3.x

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>

* add evaluation script

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* add test script

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* minor fix

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* Update run_quant.sh

* add iter limit

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* modify test script

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* update json

Signed-off-by: chensuyue <suyue.chen@intel.com>

* add requirements

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* Update run_benchmark.sh

* Update sdxl_smooth_quant.py

* minor fix

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

---------

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: chensuyue <suyue.chen@intel.com>

* example update for 3.x ipex sq (#1902)

Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>

* Fix `opt_125m_woq_gptq_int4_dq_ggml` issue (#1965)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* remove unnecessary CI (#1966)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Add version mapping between INC and Gaudi SW Stack (#1967)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Add 3.x readme (#1971)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Fix broken link in docs (#1969)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Cherry pick v1.17.0 (#1964)

* [SW-184941] INC CI, CD and Promotion

Change-Id: I60c420f9776e1bdab7bb9e02e5bcbdb6891bfe52

* [SW-183320]updated setup.py

Change-Id: I592af89486cb1d9e0b5197521c428920197a9103

* [SW-177474] add HQT FP8 porting code

Change-Id: I4676f13a5ed43c444f2ec68675cc41335e7234dd
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>

* [SW-189361] Fix white list extend

Change-Id: Ic2021c248798fce37710d28014a6d59259c868a3

* [SW-191317] Raise exception according to hqt config object

Change-Id: I06ba8fa912c811c88912987c11e5c12ef328348a

* [SW-184714] Port HQT code into INC

HQT lib content was copied as is under fp8_quant

Tests were copied to 3.x torch location

Change-Id: Iec6e1fa7ac4bf1df1c95b429524c40e32bc13ac9

* [SW-184714] Add internal folder to fp8 quant

This is a folder used for experiments,
not to be used by users

Change-Id: I9e221ae582794e304e95392c0f37638f7bce69bc

* [SW-177468] Removed unused code + cleanup

Change-Id: I4d27c067e87c1a30eb1da9df16a16c46d092c638

* Fix errors in regression_detection

Change-Id: Iee5318bd5593ba349812516eb5641958ece3c438

* [SW-187731] Save orig module as member of patched module

This allows direct usage of the original module methods,
which solves a torch.compile issue

Change-Id: I464d8bd1bacdfc3cd1f128a67114e1e43f092632

* [SW-190899] Install packages according to configuration

Change-Id: I570b490658f5d2c5399ba1db93f8f52f56449525

* [SW-184689] use finalize_calibration internally for one-step flow

Change-Id: Ie0b8b426c951cf57ed7e6e678c86813fb2d05c89

* [SW-191945] align requirement_pt.txt in gerrit INC with Github INC

Change-Id: If5c0dbf21bf989af37a8e29246e4f8760cd215ef
Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* [SW-192358] Remove HQT reference in INC

Change-Id: Ic25f9323486596fa2dc6d909cd568a37ab84dd5e

* [SW-191415] update fp8 maxAbs observer using torch.copy_

Change-Id: I3923c832f9a8a2b14e392f3f4719d233a457702f

* [SW-184943] Enhance INC WOQ model loading

- Support loading huggingface WOQ models
- Abstract a WeightOnlyLinear base class; add INCWeightOnlyLinear and HPUWeightOnlyLinear subclasses
- Load WOQ linear weights module by module
- Save HPU-format tensors so they can be reused on the next load

Change-Id: I679a42759b49e1f45f52bbb0bdae8580a23d0bcf

* [SW-190303] Implement HPUWeightOnlyLinear class in INC

Change-Id: Ie05c8787e708e2c3559dce24ef0758d6c498ac41

* [SW-192809] fix json_file bug when instantiating FP8Config class

Change-Id: I4a715d0a706efe20ccdb49033755cabbc729ccdc
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>

* [SW-192931] align setup.py with github INC and remove fp8_convert

Change-Id: Ibbc157646cfcfad64b323ecfd96b9bbda5ba9e2f
Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* [SW-192917] Update all HQT logic files with pre-commit check

Change-Id: I119dc8578cb10932fd1a8a674a8bdbf61f978e42
Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* update docstring

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

* add fp8 example and document (#1639)

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* Update settings to be compatible with gerrit

* enhance ut

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

* move fp8 sample to helloworld folder

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

* update torch version of habana docker

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update readme demo

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* update WeightOnlyLinear to INCWeightOnlyLinear

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add docstring for FP8Config

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* fix pylint

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* update fp8 test scripts

Signed-off-by: chensuyue <suyue.chen@intel.com>

* delete deps

Signed-off-by: chensuyue <suyue.chen@intel.com>

* update container into v1.17.0

Signed-off-by: chensuyue <suyue.chen@intel.com>

* update docker version

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* update pt ut

Signed-off-by: chensuyue <suyue.chen@intel.com>

* add lib path

Signed-off-by: chensuyue <suyue.chen@intel.com>

* fix dir issue

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update fp8 test scope

Signed-off-by: chensuyue <suyue.chen@intel.com>

* fix typo

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* update fp8 test scope

Signed-off-by: chensuyue <suyue.chen@intel.com>

* update pre-commit-ci

Signed-off-by: chensuyue <suyue.chen@intel.com>

* work around for hpu

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* fix UT

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* fix parameter

Signed-off-by: chensuyue <suyue.chen@intel.com>

* omit some test

Signed-off-by: chensuyue <suyue.chen@intel.com>

* update main page example to llm loading

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix autotune

Signed-off-by: xinhe3 <xinhe3@hababa.ai>

---------

Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>

* update main page (#1973)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* fix online doc search issue (#1975)

Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>

* bump main version into v3.1 (#1974)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* update readme for fp8 (#1979)

Signed-off-by: xinhe3 <xinhe3@habana.ai>

* Skip some tests for torch 2.4 (#1981)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Fix UT env and upgrade torch to 2.4.0 (#1978)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* support gptq `true_sequential` and `quant_lm_head` (#1977)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* update installation and ci test for 3x api (#1991)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* add hasattr check for torch fp8 dtype (#1985)

Signed-off-by: xin3he <xin3.he@intel.com>
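
The fix is the standard feature-probe pattern; a sketch (which fp8 dtypes are probed is an assumption — older torch builds lack them entirely):

```python
import torch

# probe with hasattr instead of referencing torch.float8_e4m3fn directly,
# so the code also imports on torch versions without fp8 dtypes
FP8_DTYPES = [
    getattr(torch, name)
    for name in ("float8_e4m3fn", "float8_e5m2")
    if hasattr(torch, name)
]
```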

* add quantize, save, load function for transformers-like api (#1986)

Signed-off-by: changwangss <chang1.wang@intel.com>
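
A sketch of what "transformers-like" means here — a `from_pretrained`/`save_pretrained` round trip; the module path and config class names are assumptions for illustration:

```python
# sketch; neural_compressor.transformers and RtnConfig are assumed names
from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig

q_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=RtnConfig(bits=4),  # 4-bit weight-only RTN
)
q_model.save_pretrained("./opt-125m-woq")  # save quantized weights
q_model = AutoModelForCausalLM.from_pretrained("./opt-125m-woq")  # load back
```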

* Update installation_guide.md (#1989)

Correct typo in installation doc

* update 3x pt binary build (#1992)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* add per_channel_minmax (#1990)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Remove the save of gptq config (#1993)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Add recent publications (#1995)

* add recent publications

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* update total count

Signed-off-by: Huang, Tai <tai.huang@intel.com>

---------

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* update docker image prune rules (#2003)

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Support transformers-like api for woq quantization (#1987)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>

* add INC_FORCE_DEVICE introduction (#1988)

* add INC_FORCE_DEVICE introduction

Signed-off-by: xin3he <xin3.he@intel.com>

* Update PyTorch.md

* Update PyTorch.md

* Update docs/source/3x/PyTorch.md

Co-authored-by: Yi Liu <yi4.liu@intel.com>

* rename to INC_TARGET_DEVICE

Signed-off-by: xin3he <xin3.he@intel.com>

---------

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Yi Liu <yi4.liu@intel.com>

* Replace FORCE_DEVICE with INC_TARGET_DEVICE [transformers] (#2005)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* enable auto_round format export (#2002)

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* remove accelerate version in unit test (#2007)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* add repack_awq_to_optimum_format function (#1998)

Signed-off-by: changwangss <chang1.wang@intel.com>

* Update auto_round requirements for transformers example (#2013)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* add pad_to_buckets in evaluation for hpu performance (#2011)

* add pad_to_buckets in evaluation for hpu performance
---------

Signed-off-by: xin3he <xin3.he@intel.com>
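
Bucketed padding keeps the number of distinct input shapes small, so HPU graphs are compiled once per bucket instead of once per sequence length; a minimal sketch with illustrative bucket sizes:

```python
def pad_to_bucket(length: int, buckets=(128, 256, 512, 1024, 2048)) -> int:
    # round a sequence length up to the nearest bucket; the HPU then reuses
    # the compiled graph for every length that maps to the same bucket
    for bucket in buckets:
        if length <= bucket:
            return bucket
    return buckets[-1]

assert pad_to_bucket(100) == 128 and pad_to_bucket(300) == 512
```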

* Update model accuracy (#2006)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix xpu device set weight and bias (#2010)

Signed-off-by: changwangss <chang1.wang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Add transformers-like api doc (#2018)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Adapt transformers 4.45.1 (#2019)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: changwangss <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add autoround EMNLP24 to pub list (#2014)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Fix transformers rtn layer-wise quant (#2008)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Remove itrex dependency for 3x example (#2016)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* add transformers-like api link in readme (#2022)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* Add woq examples (#1982)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* remove ITREX unit test CI (#2021)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Support quant procedure on XPU (#2026)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Support generation search for transformers examples (#2029)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Remove itrex dependency for 2x example  (#2024)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update the PT2E CV example (#2032)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Cherry pick Habana software 1.18.0 update (#2025)

Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Danny <dsemiat@habana.ai>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update gaudi version mapping table for v3.1 (#2030)

Signed-off-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>

* fix broken link to FP8 example (#2034)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* add back missing image (#2035)

Signed-off-by: xin3he <xin3.he@intel.com>

* Add vlm examples, bugfix (#2012)

* add VLM examples

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* bugfix, add utils

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix docstring issues

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine examples

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix scan issue

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refine shell

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine scripts & requirements

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* typofix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine docs

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* set attn_implementation for Phi3-vision

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine phi3 example

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix code coverage

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update config

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* refine shells, docs and example. enable qwen2-vl quantization

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ci

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix EOF error

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update qwen dir

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* refine shell, add llama3.2 inference to doc

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine eval shell

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* fix eval device issue

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* refine eval dtype

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* remove autoround limit (#2036)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Adapt autoround format (#2038)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* remove transformers import from utility (#2045)

* remove transformers import from utility

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* bugfix

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixtypos

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>

---------

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add buckets setting for lm_eval (#2044)

* add buckets setting for lm_eval

Signed-off-by: xinhe3 <xinhe3@habana.ai>

* clear graph cache to avoid OOM

Signed-off-by: xinhe3 <xinhe3@habana.ai>

---------

Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* Enhance example for HPU performance (#2043)

* Enhance example for HPU performance

Signed-off-by: xinhe3 <xinhe3@habana.ai>

* Update run_clm_no_trainer.py

* remove wikitext to avoid oom for llama2-7b bs=8

* remove wikitext

Signed-off-by: xinhe3 <xinhe3@habana.ai>

---------

Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* remove useless code in setup.py (#2046)

* Update the default PT2E config (#2041)

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Support non-contiguous weight saving (#2049)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* fix GPTQ oom issue on HPU (#2042)

* fix GPTQ oom issue on HPU

Signed-off-by: xinhe3 <xinhe3@habana.ai>

---------

Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* fix bug and update readme (#2051)

* fix bug and update readme

---------

Signed-off-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* Support safetensors loading for layerwise (#2047)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Enhance WOQ example Readme and help (#2053)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* improve optimum-habana available check (#2054)

Signed-off-by: changwang <changwang@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fixed CI IPEX version (#2061)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Update torch config kwargs (#2055)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Support client `use_layer_wise` setting (#2048)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Check autoround before import it (#2062)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Delete fp8_quant/scripts/regression_detection directory (#2059)

A missed change when cherry-picking Habana software 1.18.0

* Make PatchedVLLMKVCache resilient to forward API changes (#2067)

Change-Id: I33fad5c3e80e017099f300782809f24669765d42

Co-authored-by: Konrad Zawora <kzawora@habana.ai>

* Fix glm-4-9b oom issue on BMG

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Update recipes & Bump version to 3.2 (#2037)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Docs: Add customer defined calibration and update docker run (#2057)

Signed-off-by: fengding <feng1.ding@intel.com>

* Adapt torch and ipex 2.5 (#2066)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>

* Enhance `TBB` check (#2068)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Fix the PT2E UT (#2071)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* Support gptq layerwise on client (#2069)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Adapt autoround v0.4 (#2073)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Ensure that mul operators with shared initializer will not be absorbed in SmoothQuant (#2063)

Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>

* Integrate AutoRound v0.4 [3x] (#2072)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Update CI framework versions and README badge for release 3.1.1 (#2058)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Remove the examples force required torch 1.13.1  (#2074)

* remove alexnet_fashion_mnist notebook

Signed-off-by: chensuyue <suyue.chen@intel.com>

* remove rnnt in pytorch examples

Signed-off-by: chensuyue <suyue.chen@intel.com>

---------

Signed-off-by: chensuyue <suyue.chen@intel.com>

* Fix truthfulqa task evaluation issue

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Add required library for ONNX example (#2078)

* Add required library for ONNX example

* Update requirements.txt

* support autoround new API for VLM (#2075)

Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add import check (#2076)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* Update utility.py (#2079)

* Add gptq known issue (#2080)


Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Fix sdxl `q_unet` config (#2081)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* Fixed the PT2E LLM example (#2082)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* fix dlrm when using incbench (#2084)

Signed-off-by: Xin He <xinhe3@habana.ai>

* add mapping for v3.2 (#2085)

Signed-off-by: Huang, Tai <tai.huang@intel.com>

* [SW-192753] unify StaticQuantConfig and FP8Config

Change-Id: I2fe09ba4c575810a5b130268d63b9eee926bdf08
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-200124] Set Scalar as default scale format + Compatibility check

Set ScaleFormat.SCALAR as the default value of 'scale_format'.
Reduce 'scale_format' to 'CONST' when using a PCQ scale_format or fake_quant.
Add a test to show Scalar models aren't giving wrong outputs.
Fix the fakequant test, as it makes problematic use of 'hpu_initialize'; this should be fixed in SW-202697.

Change-Id: I43ff4900e9e02ce7f50edcdbb19a28f4f615ef9c
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-201679] support unit_scales for FuseMoE

Change-Id: I02a63332bc09f1f6cdc3f133dd5f58829fcbad5a
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-203698] Add log for converting prepared model

Change-Id: I1464f11bbab27d9041c9ba6f448e5ae6fa43bc2d
Signed-off-by: Mengni Wang <mewang@habana.ai>

* [SW-199737] Measurement dump improvements

Add _validate_dump_path to make sure dump dir is writable and backup measurements

Change-Id: Ib64abe772b4c309bbf04de89477cde92ea47ade4
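
A hypothetical sketch of such a helper — the real one lives inside fp8_quant, so the name reuse, backup policy, and error type here are assumptions (a later commit, SW-207411, adds a catch for OSError around it):

```python
import os
import shutil

def _validate_dump_path(path: str) -> None:
    # hypothetical sketch: make sure the measurement dump dir exists and is
    # writable, and back up previous measurements before they are overwritten
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise OSError(f"measurement dump path is not writable: {path}")
    if os.listdir(path):
        shutil.copytree(path, path.rstrip(os.sep) + ".bak", dirs_exist_ok=True)
```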

* [SW-203452] Fixing and temp skipping G3 unittests

Change-Id: Iafa4a6a8577724bd8a86581bfe38d3269dab2ea2
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-195965] [GPTQ] INC load model loads model in fp32 only

Change-Id: I597d19273786c0c169ad952ebe5a357274e358dc
Signed-off-by: xinhe3 <xinhe3@habana.ai>

* [SW-204016] Enable scale calculation with disk offload in INC

- move scale calculation and quantization config info into the module patching loop, as the weights there are guaranteed to be on CPU.

Change-Id: Ifb2de4e67c1b36c611dcc50b4cd14731b0336c50

* [SW-202614] Llama70b int4 gptq with INC load flow - getting host OOM

Change-Id: Id1797371bb136502d89c4e8d17abcac1eaac4534
Signed-off-by: xinhe3 <xinhe3@habana.ai>

* [SW-199823] [HQT] fix INC one-step quantization API workflow

1. fix the test_fp8_static_quant.py::TestFP8StaticQuant::test_one_step_quant_cv failure by deepcopying the forward function in common.py
2. fix config.py: "Object of type dict_keys is not JSON serializable" by converting it to a list
3. fix the UT download issue by using a local tiny_gptj.json

Change-Id: I2ad3eac411e8fca9d88a021f6a5b9594e6c75ae9
Signed-off-by: xinhe3 <xinhe3@habana.ai>
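
The second fix is a common stdlib pitfall worth spelling out: `dict.keys()` returns a view, and `json` refuses to encode it until it is converted to a list:

```python
import json

cfg = {"mode": "MEASURE", "scale_method": "maxabs"}
# json.dumps(cfg.keys())  -> TypeError: Object of type dict_keys is not JSON serializable
print(json.dumps(list(cfg.keys())))  # ["mode", "scale_method"]
```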

* [SW-202617] vllm mixtral MoE quant and measure using forward call

Change-Id: I919f1e3597b6c95c3fc60db78ac9c0c06242b416
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-200092] Allow fsdpa and softmax to use scalar scales in INC

Change-Id: Ieba4c74c18624fb0c5fce6321671d6f4eb2b8c93
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-205363] Update _load_state_dict_into_meta_model

Update _load_state_dict_into_meta_model to be compatible with the Transformers 4.45 release.

Change-Id: Ib5d8ca777d38c7ae225b7174a886b333b6246ab1
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-184948] INC Q/DQ optimization, including conv2d, kv_cache, fsdpa, softmax and other operators.

Change-Id: I920f8ad85b3493f1bd4bbe770533343e214fc2d1
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-198585] Fix typo causing PatchedVLLMKVCache error

Change-Id: Iafdcc935f702bc4756e2ba89935becb3bc47a728

* [SW-199208] QDQ Refactor for Registering Patched Modules, Scaling Methods, and Observers

1. Extension APIs
    - `PatchedModuleBase` , `register_patched_module`
    - `ScalingMethodBase`, `register_scaling_methods`
    - `ObserverBase` ``register_observer`, `register_module_config_for_observer`

    Related files:
    - fp8_quant/patched_module_base.py
    - fp8_quant/observer_base.py
    - fp8_quant/_core/measure.py
    - test_register_apis.py

2. Device-agnostic Patching
    - Replaced `hpu` with `cur_accelerator.name()`
    - Replaced `htcore.mark_step()` with `cur_accelerator.synchronize()`
    - Removed `torch.device("hpu")` under observers and scaling method
    - Updated `hpu_accelerator.synchronize()` to `htcore.mark_step()` + `torch.hpu.synchronize()`

Change-Id: I83c6de928a991ed2c1b3b434d372f49e095c38d3
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Mengni Wang <mewang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-203389] scalar scales don't provide a dtype attribute

Change-Id: I4e40dc9b2d9cb65bc9e49571cd57a9ab030f5d7b
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-199208] fix ModuleInfo conversion issue

Change-Id: Ib6c35e1623dda3e470e569defccd607a18b43312

* [SW-200168] Enable working with G2 HW scales on G3

Change-Id: I17f71540eb78e828f01f1a11c8b233d60951293e
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-203389] fix get_scale_dtype to support PCQ scales

Change-Id: I923ace405a0f751a2e5a0a3aadb7abbb401a6c44

* [SW-199719] reduce PCQ scales memory usage

removed persistent full weight scales during PCQ quantization;
instead we keep only the input- and output-channel scales and
create a temporary full scale tensor on the input quant op call.
Since the full scale tensor is the same size as the orig bf16 weight,
keeping all full scales persistently alongside the quantized weights would
result in a quantized model that uses more memory than the unquantized one.

Change-Id: Idc91c5ac8b9cfea2e2a3ad053cb4dc5464cff776
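
A sketch of the trade-off under stated assumptions (the full scale tensor is derived from the per-input-channel and per-output-channel scales, here as an outer product, and only materialized inside the quant op):

```python
import torch

in_ch, out_ch = 4096, 4096
in_scale = torch.rand(in_ch)    # kept persistently: O(in_ch) memory
out_scale = torch.rand(out_ch)  # kept persistently: O(out_ch) memory

def quant_input(weight: torch.Tensor) -> torch.Tensor:
    # the full scale tensor is weight-sized; build it only for this call so it
    # is freed on return instead of doubling the model's persistent footprint
    full_scale = out_scale[:, None] * in_scale[None, :]
    return weight / full_scale
```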

* [SW-206112] INC Q/DQ improvement - use Q/DQ ops

Change-Id: Ib03ea8744aa2cca8b606754c45944840da1c3898
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-206693] Convert conv2d_fp8 params to list if necessary

It's needed for the new approach to dynamic shapes in PT2.5.

Change-Id: I8d5e620153970b210675459e3d6aecad8ca7cbde

* [SW-207411] Add catch for OSError in _validate_dump_path

Change-Id: I82bae184257f3da982877b3797f2ee8b40a573c8

* [SW-207328] remove accuracy check due to random issue

Change-Id: Ifbd985c31c3755b6ab353ef8fa45e911dd75d688
Signed-off-by: xinhe3 <xinhe3@habana.ai>

* [SW-207559] Folder layout refactoring and cleanup (phase 1)

Change-Id: Ic9bffd2b7477d4530b4e2a5e411760a731efb84b
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-193262] INC multi device save/load CP design in fp8 (#5)

Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-208521] one-step quantization got double memory usage (#3)

* [SW-208521] one-step quantization got double memory usage

Signed-off-by: Xin <xin3.he@intel.com>

* [SW-208789] Support quantizing FP16 model to FP8 (#15)

Since layer-wise quantization uses memory mapping from disk, the model could be fp16 as saved on disk, for example llama2-7b.

We need to add logic to support this case to make sure layer-wise works well.

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-205959] Update _load_state_dict_into_meta_model for model with bias (#7)

Signed-off-by: Xin <xin3.he@intel.com>

* [SW-208700] release bf16 model memory on HPU in one-step quantization (#14)

Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-197077] refactoring maxabs scales and adding arbitrary scales. (#12)

* [SW-197077] refactoring maxabs scales and adding arbitrary scales.

Change-Id: I2c35cf925b6b21983f1770db7d35e14f3d7d3e47

* [SW-197077] refactoring scale:
fix atol

Change-Id: I1c99ddd9ade679286988e7d8a96338b32c0ddc07

* [SW-197077]  adding arbitrary scales

* Skip autoround test for HPU (#19)

Change-Id: I6dc9724389c16a05252370b9e09a1db80bc8d696

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct … (#16)

* [SW-199728] [DeepSpeed] Buffers initialized by model are not correct after tensor parallel

---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named… (#33)

* [SW-209256] fix GPTQ oom issue on HPU (#2042) (#20)

* fix GPTQ oom issue on HPU (#2042)
---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>

* [SW-208151] CD 1.19.0 - PT Docker - test_quantization No module named 'safetensors'

Signed-off-by: Xin <xin3.he@intel.com>

---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>

* [SW-207748] Support Auto-round on HPU (#25)

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>

* [SW-209878] Increase threshold to avoid random error in test_layer_wise.py (#36)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-207579] support load vLLM compatible FP8 model (#18)

Support loading vLLM-compatible FP8 models on both G2 and G3, single-card and multi-card.
---------

Signed-off-by: changwang <changwang@habana.ai>

* [SW-207451] Implement block-wise calibration for LLM (#41)

* [SW-207451] Implement block-wise calibration for LLM

---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
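
A hypothetical sketch of the idea: move one decoder block at a time onto the device, push the calibration activations through it, and let measurement hooks record statistics, so device memory holds a single block rather than the whole LLM:

```python
import torch

@torch.no_grad()
def blockwise_calibrate(blocks, calib_inputs, device="hpu"):
    # hypothetical sketch: blocks are the model's decoder layers, calib_inputs
    # the activations entering the first block; hooks collect max-abs stats
    acts = calib_inputs
    for block in blocks:
        block.to(device)
        acts = [block(a.to(device)).cpu() for a in acts]
        block.to("cpu")  # release device memory before the next block
    return acts
```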

* [SW-208986] fix save&load bug (#40)

* [SW-208986] fix save&load bug

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-207748] Add Auto-round Example (#42)

* add autoround hpu example

Change-Id: Ibd537f4667c7c077160427722a5eca2c721aa5cd
Signed-off-by: Yi Liu <yiliu4@habana.ai>

* add requirements

Change-Id: I77a95ec05e41247db9903e8622c31f05259ca365
Signed-off-by: Yi Liu <yiliu4@habana.ai>

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-197077] fix bug (#47)

* [SW-210541] loading for fused_sdpa requires additional amax scale (#51)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* fix PatchedLoRACompatibleLinear init (#65)

Signed-off-by: changwangss <changwang@habana.ai>

* align files with v1.19.0 in fp8_quant folder

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix missing SaveLoadFormat

Signed-off-by: Xin He <xinhe3@habana.ai>

* align and fix config after cherry-pick

Signed-off-by: Xin He <xinhe3@habana.ai>

* Implicit relative imports are abandoned

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix config issue blocking CI

Signed-off-by: Xin He <xinhe3@habana.ai>

* remove synchronize for `pack_unpack_tensor_with_numpy` (#2070)

* remove pack&unpack synchronize

---------

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
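
For orientation, a sketch of what numpy-side 4-bit pack/unpack looks like (illustrative, not the library's exact helper):

```python
import numpy as np

def pack_int4(q: np.ndarray) -> np.ndarray:
    # pack pairs of 4-bit values (0..15) into one uint8 each
    q = q.astype(np.uint8).reshape(-1, 2)
    return (q[:, 0] | (q[:, 1] << 4)).astype(np.uint8)

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    low, high = packed & 0x0F, (packed >> 4) & 0x0F
    return np.stack([low, high], axis=1).reshape(-1)

q = np.random.randint(0, 16, size=8)
assert (unpack_int4(pack_int4(q)) == q).all()
```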

* stop auto-fix of pre-commit

Signed-off-by: Xin He <xinhe3@habana.ai>

* update autoround example for release test

Signed-off-by: xin3he <xin3.he@intel.com>

* fix AWQ&TEQ loading due to input scale

Signed-off-by: xin3he <xin3.he@intel.com>

* fix HQQ state_dict loading caused by [SW-195965]

Signed-off-by: xin3he <xin3.he@intel.com>

* use per_channel as default config (#2091)

Signed-off-by: yiliu30 <yi4.liu@intel.com>

* workaround transformers issue in version 4.47.0 (#2092)

* workaround transformers issue in version 4.47.0

Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Refactor FP8 pytest script (#2089)

* Refactor FP8 pytest script

---------

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update ci scan scope

Signed-off-by: chensuyue <suyue.chen@intel.com>

* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-213236] resolve CPU mem issue in CI (#76)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* recover pre-commit

Signed-off-by: Xin He <xinhe3@habana.ai>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix `is_sharded` setting for loading quant model (#2094)

Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>

* fix error message for different python version (#2099)

Signed-off-by: changwangss <changwang@habana.ai>

* fix UT of RTN on HPU (#2098)

Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix device issue during calibration (#2100)

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix woq example and update document for v1.19.0 (#2097)

Signed-off-by: xin3he <xin3.he@intel.com>

* Refactor version import paths to common module (#2095)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* update CI gaudi-docker to 1.19.0 (#2096)

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>

* fix device mapping issue of llama gptq (#2101)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* remove fix_measurements.py; it exists under a different name, postprocessing_vllm_measurements.py

* fix merge

* remove unused imported functions with wrong path

* change env var requested value from 1 to true

---------

Signed-off-by: Cheng, Zixuan <zixuan.cheng@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Signed-off-by: Kaihui-intel <kaihui.tang@intel.com>
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Signed-off-by: chensuyue <suyue.chen@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: He, Xin3 <xin3.he@intel.com>
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Huang, Tai <tai.huang@intel.com>
Signed-off-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Zhou Yuwen <zyuwen@habana.ai>
Signed-off-by: xinhe3 <xinhe3@hababa.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: xinhe3 <xinhe3@habana.ai>
Signed-off-by: Zhang, Weiwei1 <weiwei1.zhang@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: fengding <feng1.ding@intel.com>
Signed-off-by: duansheng.liu <44742794+duanshengliu@users.noreply.github.com>
Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Mengni Wang <mewang@habana.ai>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: changwangss <changwang@habana.ai>
Co-authored-by: Zixuan Cheng <110808245+violetch24@users.noreply.github.com>
Co-authored-by: xinhe <xin3.he@intel.com>
Co-authored-by: zehao-intel <zehao.huang@intel.com>
Co-authored-by: Kaihui-intel <kaihui.tang@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Yi Liu <106061964+yiliu30@users.noreply.github.com>
Co-authored-by: Dina Suehiro Jones <dina.s.jones@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Huang, Tai <tai.huang@intel.com>
Co-authored-by: violetch24 <zixuan@aia-sdp-spr-117706.jf.intel.com>
Co-authored-by: Wang, Mengni <mengni.wang@intel.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
Co-authored-by: ZhangJianyu <zhang.jianyu@outlook.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Ron Ben Moshe <rbenmoshe@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Dudi Lester <dlester@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: WeiweiZhang1 <weiwei1.zhang@intel.com>
Co-authored-by: Tomer Gafni <tgafni@habana.ai>
Co-authored-by: Eran Geva <egeva@habana.ai>
Co-authored-by: Daniel Ohayon <danielohayon444@gmail.com>
Co-authored-by: Roi Tiefenbrunn <rtiefenbrunn@habana.ai>
Co-authored-by: Kamil Felskowski <kfelskowskix@habana.ai>
Co-authored-by: xinhe3 <xinhe3@habana.ai>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: feng-intel <110514170+feng-intel@users.noreply.github.com>
Co-authored-by: duanshengliu <44742794+duanshengliu@users.noreply.github.com>
Co-authored-by: Mengni Wang <mewang@habana.ai>
Co-authored-by: Jimin Ha <jha@habana.ai>
Co-authored-by: changwang <changwang@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Amadeusz Skrzypczak <askrzypczak@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
XuehaoSun added a commit that referenced this pull request Feb 27, 2025
* [SW-210525] release HPU memory when loading neural_magic fp8 models (#48)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-211178] save generation_config when saving model if exists (#57)

* [SW-211178] save generation_config when saving model if exists

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-210543] update gitignore to simplify the git message (#50)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-205334][SW-187731] llama70b vLLM fix graph breaks with torch.compile (#67)

* fix graph breaks with torch.compile

* remove orig_mod from helper_modules

* fix typos

* fix test_register_apis

---------

Co-authored-by: Rafal Litka <rlitka@habana.ai>

* [SW-213890] Disable test_two_step_layer_wise temporarily (#84)

* [SW-205437] - Support LM-HEAD patching (#79)

* [SW-205437] - Support LM-HEAD patching

* fix CR comments

* Enhance and rename fix_measurements tool to postprocessing_vllm_measurements (#82)

* [SW-214088] Fix graph break caused by PatchedMixtralMoE (#74)

* [SW-208528] Support FP8 per channel Q/DQ (#13)

* add per channel qdq support

Signed-off-by: changwang <changwang@habana.ai>

* improve ut

Signed-off-by: changwang <changwang@habana.ai>

* improve get_scale_dtype func and qdq init

Signed-off-by: changwangss <changwang@habana.ai>

* improve DequantOutput QuantInput init

Signed-off-by: changwangss <changwang@habana.ai>

* add scale_method improve PCQ

Signed-off-by: changwangss <changwang@habana.ai>

* remove scale name

Signed-off-by: changwangss <changwang@habana.ai>

* fix PCQ scale_inv expanding

Signed-off-by: changwangss <changwang@habana.ai>

* merge qdq_per_channel and qdq_per_tensor into qdq

Signed-off-by: changwangss <changwang@habana.ai>

* move scale_inv change to the QuantInput init

Signed-off-by: changwangss <changwang@habana.ai>

* remove scale_dtype list check

Signed-off-by: changwangss <changwang@habana.ai>

* fix missing axis parameter

Signed-off-by: changwangss <changwang@habana.ai>

---------

Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: changwangss <changwang@habana.ai>
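
For illustration, a minimal sketch of per-channel FP8 quantize/dequantize, assuming a 2D weight and PyTorch's `float8_e4m3fn` dtype; `qdq_per_channel` is a hypothetical name, not the PR's actual API:

```python
import torch

# Hypothetical sketch of per-channel FP8 quantize/dequantize (QDQ);
# the function name and the 2D-weight assumption are illustrative only.
def qdq_per_channel(weight: torch.Tensor, axis: int = 0) -> torch.Tensor:
    fp8_max = torch.finfo(torch.float8_e4m3fn).max          # 448.0 for e4m3fn
    reduce_dim = 1 - axis                                   # assumes a 2D weight
    amax = weight.abs().amax(dim=reduce_dim, keepdim=True)  # per-channel absolute max
    scale = (amax / fp8_max).clamp(min=1e-12)               # one scale per channel
    q = (weight / scale).to(torch.float8_e4m3fn)            # quantize
    return q.to(weight.dtype) * scale                       # dequantize

w = torch.randn(4, 8)
print((w - qdq_per_channel(w, axis=0)).abs().max())  # small quantization error
```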

* [SW-204341] explicit scale format for ops (#73)

* [SW-204341] explicit scale format for ops

Added a wrapper around the fp8 functions. The wrapper decides which flavor of the function to call according to the scale format, and the helper modules call the wrapper. Cast flavors are chosen the same way, according to the scale format (a sketch of this dispatch pattern follows this commit message).

* [SW-204341] Adjust softmax API, remove commented-out code

* [SW-204341] Fixes from CR 1

* [SW-204341] Fixed CR 2

* [SW-204341] add missing arg in fsdpa

Signed-off-by: Uri Livne <ulivne@habana.ai>

* [SW-204341] Enhance SDPA for measure and quant

* [SW-204341] remove sdpa quantized ops

* reland per-op class with more enhancements

* [SW-204341] reland specific arguments, rename class to wrapper

* added call with self in patched lm head

rebased on top of master next
force push

* fix mistake in conflict resolution

restore MethodType fix

* another fix

* modified fp8 matmul test to test quantized matmul func

* another fix of rebase mistake

* hopefully last rebase mistake fix

* restore backward-compatibility import protection

---------

Signed-off-by: Uri Livne <ulivne@habana.ai>
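
For illustration only, a minimal sketch of the dispatch pattern this commit describes; `ScaleFormat`, `matmul_fp8_scalar`, `matmul_fp8_const`, and `QuantizedFuncWrapper` are hypothetical names, not the actual ops:

```python
from enum import Enum, auto

# Illustrative sketch of dispatch-by-scale-format; all names are hypothetical.
class ScaleFormat(Enum):
    SCALAR = auto()   # one Python float / 0-dim tensor per tensor
    CONST = auto()    # scale baked in as a constant tensor

def matmul_fp8_scalar(a, b, scale):
    ...  # flavor specialized for scalar scales

def matmul_fp8_const(a, b, scale):
    ...  # flavor specialized for constant-tensor scales

class QuantizedFuncWrapper:
    """Helper modules call this wrapper; it picks the flavor once, by scale format."""
    _FLAVORS = {ScaleFormat.SCALAR: matmul_fp8_scalar,
                ScaleFormat.CONST: matmul_fp8_const}

    def __init__(self, scale_format: ScaleFormat):
        self._impl = self._FLAVORS[scale_format]

    def __call__(self, a, b, scale):
        return self._impl(a, b, scale)
```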

* [SW-213890] Revert "[SW-213890] Disable test_two_step_layer_wise temporarily (#84)" (#86)

This reverts commit 27162ae.

* Revert "[SW-205334][SW-187731] llama70b vLLM fix graph breaks with  torch.com…" (#87)

This reverts commit 01a5734.

Co-authored-by: Danny Semiat <dsemiat@habana.ai>

* [ALGO-809] PatchedLmHeadLinearAllreduce: replacing the sharding code with the one from deepspeed-fork (#85)

Change-Id: Icb9670cfefdd1880c1ebb9a804a97c9ba79ecdc3

Co-authored-by: smarkovichgolan <smarkovich@habana.ai>

* fix bug where FusedMoE object has no attribute w13_weight (#94)

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>

* [SW-208588] Add HPU fp8 Dynamic MOE (#88)

* [SW-208588] Add HPU fp8 Dynamic MOE

* fix review comments

* fix more review comments

* fix comments

* fix tests

* minor config fixes (#96)

* [SW-0] minor cosmetic fixes in quant_config

* remove hooks

* [SW-196641] - Fix type mismatch in linear quantization unit tests (#99)

* [SW-196641] - Fix type mismatch in linear quantization unit tests

* fix atol value

* add hp_dtype to fp8 config dict before parsing

* [SW-214785] Apply PatchedModuleBase for all existing PatchedModules (#92)

* [SW-214785] Apply PatchedModuleBase for all existing PatchedModules

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-215319] threshold of memory usage in test_block_wise.py is too tight (#100)

* [SW-215543] Revert "minor config fixes (#96)" (#104)

This reverts commit fa40142.

* fix RowParalleLinear func names from string to tuple (#106)

* [SW-215615] memory is unreleased during loading neural_magic models on multi-cards (#105)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-212423] RuntimeError when load the gptq model from HF (#70)

* [SW-212423] RuntimeError when load the gptq model from HF
* skip tie_word_embeddings=False

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-214785] fix issue when self._mod_extra_config is None (#108)

* [SW-211826] [example] demonstrate layer-wise, block-wise and lm_eval usage (#66)

* [SW-211826] [example] demonstrate layer-wise & block-wise usage to quantize LLMs with limited host & device memory

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-215295] Force single object from quantized func wrapper classes (#103)

* [SW-215295] Force single object from quantized func wrapper classes

* Modify the factory object to be cleared after module patching

* Move cleanup to Quantizer object

* [SW-216292]Minor update for lm-eval (#113)

* Enable lm-eval 0.4.2 and expose `add_bos_token` (see the usage sketch after this commit message)

---------

Signed-off-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
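
For context, a hedged usage sketch against lm-eval 0.4.x, where `add_bos_token` is forwarded to the Hugging Face model wrapper; the model and task here are arbitrary examples:

```python
# Assumes lm-eval 0.4.x; add_bos_token is the knob being exposed.
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

# Wrap a HF checkpoint; add_bos_token=True prepends BOS to each input.
lm = HFLM(pretrained="meta-llama/Llama-2-7b-hf", add_bos_token=True)
results = simple_evaluate(model=lm, tasks=["lambada_openai"])
print(results["results"])
```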

* [SW-209207] add vllm fp8 dynamic MoE (#116)

* [SW-216239] Align Softmax fp8 scale calc with configuration (#112)

* [SW-217321] Skip auto round tests (#119) (#125)

* Test Commit

* [SW-217321] Skip auto round tests due to CI breakage

* remove unneeded print

* [SW-207451] Implement block-wise calibration for LLM (#24)

For LLMs, measurement in bf16 requires a large amount of HPU memory.
This change makes it possible to measure bf16 llama-405b on 8 Gaudi2 cards, or llama-70b on a single Gaudi card (a rough sketch of the idea follows this commit message).
Limitation: the lm_head layer cannot be measured yet; we may enhance this later.

---------

Signed-off-by: Xin <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
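
A rough sketch of the block-wise measurement idea under stated assumptions (decoder blocks in `model.layers`, measurement observers already attached); `measure_block_wise` is a hypothetical helper, and real blocks may return tuples rather than a single tensor:

```python
import torch

# Hypothetical sketch of block-wise calibration: keep the model in host memory
# and move one transformer block at a time onto the device for measurement.
def measure_block_wise(model, hidden, device="hpu"):
    for block in model.layers:          # assumes decoder blocks live in model.layers
        block.to(device)                # only one block occupies device memory
        with torch.no_grad():
            hidden = block(hidden)      # observers record per-op max stats here
        block.to("cpu")                 # release device memory before the next block
    return hidden
```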

* [SW-197077] fix bug in output arbitrary scales (#45)

* [SW-197077] fix bug

* [SW-197077] fix bug in outputs arbitrary scales

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-197077] fix bug in output arbitrary scales (#45)

* [SW-197077] fix bug

* [SW-197077] fix bug in outputs arbitrary scales

* [SW-210500] [Optimum-Habana] [Regression] [fp8] [INC] No generated text for llava models [llava-1.5-7b-hf] [llava-1.5-13b-hf ] (#54) (#77)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-213236] resolve CPU mem issue in CI (#76) (#83)

Cherry-pick from 1.19
Co-authored-by: Xin He <xin3.he@intel.com>

* [SW-213368] requirements_pt.txt: allow newer pydantic versions to >= 1.10.13 (#80)

* requirements_pt.txt: upgrade pydantic version to >= 2.0.0

* allow newer version of pydantic

newer deepspeed uses pydantic v2, which has slightly different APIs (see the sketch after this commit message).

* Update requirements_pt.txt
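
For context, a small example of the kind of v1/v2 API difference involved, using the documented rename of `.dict()` to `.model_dump()`; the `Config` model is made up for illustration:

```python
import pydantic
from pydantic import BaseModel

class Config(BaseModel):
    lr: float = 1e-4

cfg = Config()
# pydantic v1 and v2 expose different method names for the same operation:
if pydantic.VERSION.startswith("1."):
    data = cfg.dict()          # v1 API
else:
    data = cfg.model_dump()    # v2 API (v1's .dict() is deprecated)
```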

* [SW-212057] Enable scalar scale to support QDQ (#98)

* [SW-212057] Enable scalar scale to support QDQ

Change-Id: Ib5f5accd7a770675609e91c18bd04497b15937c5

* PR comment fixes

Change-Id: I01be41c29721b8d59c887f3d2b4e3cef8433331c
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-215845] Run some unit tests from top level API (#109)

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-212629] Support saving weight-only quantization INT4 model in Hugging Face format (#101)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-205970] update state_dict to save scalar scales (#6)

* update state_dict method in save/load function

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>

* Revert "[SW-205970] update state_dict to save scalar scales (#6)" (#114)

This reverts commit ffcb97e.

* [SW-212092] Save vllm compatible format (#102)

* save vllm compatible format

Signed-off-by: changwangss <changwang@habana.ai>

* add assertion and make max_file_size human-readable (see the parsing sketch after this commit message)

Signed-off-by: changwangss <changwang@habana.ai>

* make the default saving behavior match Hugging Face

Signed-off-by: changwangss <changwang@habana.ai>

* separate save functions for single-device and multi-device cases.

Signed-off-by: changwangss <changwang@habana.ai>

* rebase

Signed-off-by: changwangss <changwang@habana.ai>

* rebase save

Signed-off-by: changwangss <changwang@habana.ai>

* remove weight and scale convert on G2

Signed-off-by: changwangss <changwang@habana.ai>

* rebase master_next due to revert #6

Signed-off-by: changwangss <changwang@habana.ai>

* improve the function that converts weights to a vLLM-compatible format

Signed-off-by: changwangss <changwang@habana.ai>

* replace print with logger

Signed-off-by: changwangss <changwang@habana.ai>

* move unit_mapping to common utils

Signed-off-by: changwangss <changwang@habana.ai>

---------

Signed-off-by: changwangss <changwang@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
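
A minimal sketch of the human-readable max_file_size idea, assuming a size string such as "5GB" is parsed into bytes; `parse_size` and `_UNIT_MAPPING` are hypothetical names loosely echoing the unit_mapping mentioned above:

```python
# Hypothetical helper: convert a human-readable size string into bytes,
# so callers can write max_file_size="5GB" instead of 5_000_000_000.
_UNIT_MAPPING = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

def parse_size(size: str) -> int:
    size = size.strip().upper()
    for unit, factor in _UNIT_MAPPING.items():
        if size.endswith(unit):
            return int(float(size[: -len(unit)]) * factor)
    return int(size)  # plain number of bytes

assert parse_size("5GB") == 5 * 10**9
```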

* [SW-205970] update state_dict to save scalar scales (#115)

* [SW-205970] update state_dict to save scalar scales (#6)

* update state_dict method in save/load function

* support mixtral
---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* [SW-215009] support loading per-channel scales (#95)

* [SW-215009] support loading per-channel scales

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix UT

Signed-off-by: Xin He <xinhe3@habana.ai>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* Refactoring scales (#22) (#122)

* Refactoring scales (#22)

* [SW-197077] refactoring maxabs scales and adding arbitrary scales.

* [SW-199696] Supporting Dynamic Quantization (#128)

* Calculating dynamic scales using nn.Modules (see the sketch after this commit message)

Change-Id: I8c344ae737803b39117037edaaa3d3b9cbd09f30

* [SW-199696] Supporting Dynamic Quantization

Change-Id: Ic5d6f04ec0b5032ac305e1b3097747c47250385b

* Code cleanup

Change-Id: I213bc7438e06bd1002775066bfb0dc6f10e8a84a

* Review changes and model print issue (circular dependency fix)

Change-Id: I5c41d2f9a937416ce260f55cb045c86858dd201a

* removed debug code from patching_common.py

* Round 2 + CI import issue

Change-Id: I27dbb33de8e027fb0b726336b38156b5d23a6896
Signed-off-by: Xin He <xinhe3@habana.ai>
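
A minimal sketch of what calculating dynamic scales inside an nn.Module might look like, assuming per-tensor maxabs scaling and PyTorch's `float8_e4m3fn` dtype; `DynamicQuantInput` is a hypothetical name, not the PR's class:

```python
import torch
from torch import nn

# Illustrative sketch of "calculating dynamic scales using nn.Modules":
# the scale is computed from the current input instead of a calibrated constant.
class DynamicQuantInput(nn.Module):
    def __init__(self, fp8_dtype=torch.float8_e4m3fn):
        super().__init__()
        self.fp8_dtype = fp8_dtype
        self.fp8_max = torch.finfo(fp8_dtype).max

    def forward(self, x: torch.Tensor):
        scale = (x.abs().amax() / self.fp8_max).clamp(min=1e-12)  # fresh scale per call
        return (x / scale).to(self.fp8_dtype), scale              # quantized tensor + scale
```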

* [SW-217334] enable fp8 qdq mode using PatchedModuleBase (#129)

* [SW-217334] enable fp8 qdq mode using PatchedModuleBase

* fix review comments

* [SW-218871] fp8 multi-cards is not loaded correctly (#138)

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>

* Fix bug in mixtral unitscale (#141)

* [SW-218197] fix bug in Mixtral unitscale

* [SW-218197] fix bug in Mixtral unitscale

* update version to 3.3 for release

Signed-off-by: Xin He <xinhe3@habana.ai>

* [SW-20808] Make sure save&load format is an Enum object (#58)

* [SW-20808] Make sure save&load format is an Enum object (see the sketch after this commit message)

Signed-off-by: Xin He <xinhe3@habana.ai>

* Update save_load_entry.py

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Co-authored-by: Xin He <xinhe3@habana.ai>
Signed-off-by: Xin He <xinhe3@habana.ai>
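
A minimal sketch of the normalize-to-Enum idea, under the assumption that callers may pass either a string or an Enum member; `SaveLoadFormat` and `get_format` are hypothetical names:

```python
from enum import Enum

# Hypothetical sketch: accept either a string or an Enum member for the format,
# and normalize to the Enum so downstream code can rely on one type.
class SaveLoadFormat(Enum):
    HUGGINGFACE = "huggingface"
    VLLM = "vllm"

def get_format(fmt) -> SaveLoadFormat:
    if isinstance(fmt, SaveLoadFormat):
        return fmt
    return SaveLoadFormat(str(fmt).lower())  # raises ValueError on unknown formats

assert get_format("vllm") is SaveLoadFormat.VLLM
```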

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add xfail for torchvision

Signed-off-by: Xin He <xinhe3@habana.ai>

* fix ILITV-3859

Signed-off-by: xin3he <xin3.he@intel.com>

* workaround for ILITV-3858

Signed-off-by: xin3he <xin3.he@intel.com>

* fix sdxl_smooth_quant

Signed-off-by: xin3he <xin3.he@intel.com>

* fix ILITV-3854

Signed-off-by: xin3he <xin3.he@intel.com>

---------

Signed-off-by: Xin He <xinhe3@habana.ai>
Signed-off-by: changwang <changwang@habana.ai>
Signed-off-by: changwangss <changwang@habana.ai>
Signed-off-by: Uri Livne <ulivne@habana.ai>
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Yi Liu <yiliu4@habana.ai>
Signed-off-by: Xin <xin3.he@intel.com>
Signed-off-by: xin3he <xin3.he@intel.com>
Co-authored-by: Xin He <xinhe3@habana.ai>
Co-authored-by: RafLit <rafal.litka@intel.com>
Co-authored-by: Rafal Litka <rlitka@habana.ai>
Co-authored-by: Dany Kiazada <141814181+kiazada@users.noreply.github.com>
Co-authored-by: Nir David <124874956+nirda7@users.noreply.github.com>
Co-authored-by: Yuwen Zhou <yuwen.zhou@intel.com>
Co-authored-by: Wang, Chang <changwang@habana.ai>
Co-authored-by: Uri Livne <ulivne@habana.ai>
Co-authored-by: Oz Abramovich <oabramovich@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Danny Semiat <dsemiat@habana.ai>
Co-authored-by: smarkovichgolan <smarkovich@habana.ai>
Co-authored-by: Yi Liu <yi4.liu@intel.com>
Co-authored-by: Yi Liu <yiliu4@habana.ai>
Co-authored-by: Linoy Buchnik <linoybu@gmail.com>
Co-authored-by: Nadav Elyahu <88962733+nelyahu@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>
Co-authored-by: Sun, Xuehao <xuehao.sun@intel.com>