
include nncf in openvino extras #586

Merged
1 commit merged into huggingface:main on Mar 6, 2024

Conversation

@eaidova (Collaborator) commented on Mar 5, 2024

What does this PR do?

Install nncf as part of the openvino extras. The role of NNCF in model optimization during OpenVINO model export and loading has grown in recent releases (e.g. exporting large models with int8 or int4 weight compression), so users benefit from having it available from the start.
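
For illustration, a hedged sketch of the kind of setup.py extras change this PR makes; the extras names follow the PR title, but the package lists and version pins here are assumptions, not the project's real setup.py contents:

```python
# Illustrative sketch only: extras layout inferred from the PR title,
# version pins are placeholders rather than the project's real pins.
EXTRAS_REQUIRE = {
    "openvino": ["openvino>=2023.3", "nncf>=2.8.1"],  # nncf now bundled here
    "nncf": ["nncf>=2.8.1"],  # standalone extra kept for compatibility
}
```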

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@helena-intel (Collaborator) commented:

> users benefit from having it available from the start

Users already get NNCF by default, unless they choose not to install the nncf extra, since installing it is what we recommend in the documentation (and users who don't read the documentation wouldn't know to install the openvino extra either).

Two downsides of including NNCF in [openvino]:

  • NNCF adds 33 new transitive dependencies. All of these can have bugs or incompatibilities, and since NNCF is imported on importing OVModelFor..., such issues can also affect users who do not even need NNCF (this happens from time to time).
  • Having nncf installed increases import time (again, because when it is available, it is imported). On my laptop (10th gen Core), from optimum.intel import OVModelForSequenceClassification takes 4.5-5.5 seconds without nncf installed and 7.5-10 seconds with it. Not a problem when you're exporting a model once and need NNCF, but when you're writing code and have to rerun an inference script or notebook cell many times, this is noticeable (a minimal timing sketch follows below).
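
For reference, a minimal sketch for reproducing the import-time comparison above, run once with and once without nncf installed; the numbers quoted in this comment are the commenter's own measurements, not reproduced here:

```python
import time

start = time.perf_counter()
from optimum.intel import OVModelForSequenceClassification  # noqa: E402,F401

# Total wall-clock import time; compare environments with and without nncf.
print(f"import took {time.perf_counter() - start:.2f} s")
```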

@AlexKoff88 (Collaborator) commented:


Thanks, @helena-intel, for sharing your opinion here.
Regarding the 33 additional dependencies you mention: NNCF's direct dependency list is much shorter (see https://github.com/openvinotoolkit/nncf/blob/481ce9d9624c4b699878aa337d2e9b78dc25a517/setup.py#L104). We also validate the compatibility of NNCF with OpenVINO on a regular basis, since this is part of the product, and Optimum relies only on official NNCF releases. If you know of cases where importing NNCF causes problems, please share them with reproducible steps.
As for the import-time comment, I don't think it is a blocker, but we will investigate it. cc @alexsu52, @l-bat, please take a look.

Again, our motivation is to provide a smooth UX when working with LLMs and optimizing them. We are also enabling Stable Diffusion optimization in a separate PR, which involves NNCF as well.
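
To make the intended UX concrete, here is a hedged sketch of the LLM weight-compression flow that having NNCF installed enables; OVWeightQuantizationConfig is assumed from the optimum-intel releases of this period, and the model id and output path are illustrative:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Export a causal LM to OpenVINO with int4 weight compression (requires nncf).
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # illustrative model id
    export=True,
    quantization_config=OVWeightQuantizationConfig(bits=4),
)
model.save_pretrained("llama2_ov_int4")  # illustrative output path
```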

@helena-intel (Collaborator) commented:

Thanks @AlexKoff88. I agree, and my comments are not intended as a blocker. There is still a way to install without NNCF, so we can use that when needed.

@echarlaix merged commit 1a821b6 into huggingface:main on Mar 6, 2024
8 of 10 checks passed
PenghuiCheng pushed a commit to PenghuiCheng/optimum-intel that referenced this pull request on Mar 13, 2024
echarlaix added a commit that referenced this pull request on Mar 27, 2024:
Support weight-only quantization with quantized operators in intel-extension-for-transformers (#455)