Enable NeuralChat Unit Test process #195

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* initial commit of n_head_kv in MQA
  Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
* add attn ln
  Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
* reorder QKV weight when convert
  Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
* fix typo
  Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
* cherry-pick ggml MQA
  Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
* fix kv cache and reduce handmade mem buffer size
  Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>
---------
Signed-off-by: Yu, Zhentao <zhentao.yu@intel.com>

no need to maintain mpt model any more in itrex (contained in transformers 4.32.0)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>

* Update README.md
  Update the readme
* Update README.md
* Update README.md
* Update README.md

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

* Update README.md
* Refine the collaboration
  Signed-off-by: hshen14 <haihao.shen@intel.com>
---------
Signed-off-by: hshen14 <haihao.shen@intel.com>

* refine code-generation example
  Signed-off-by: changwangss <chang1.wang@intel.com>
* remove code
  Signed-off-by: changwangss <chang1.wang@intel.com>
* remove invalid code
* improve readme and line length
  Signed-off-by: changwangss <chang1.wang@intel.com>
---------
Signed-off-by: changwangss <chang1.wang@intel.com>
Co-authored-by: Haihao Shen <haihao.shen@intel.com>

* add gptq examples
  Signed-off-by: YIYANGCAI <yiyang.cai@intel.com>
---------
Signed-off-by: YIYANGCAI <yiyang.cai@intel.com>
Co-authored-by: xinhe <xin3.he@intel.com>

* add OPTIMIZATION_ONLY for setup
  Signed-off-by: Xin He <xin3.he@intel.com>
* change name: backends to runtime
  Signed-off-by: Xin He <xin3.he@intel.com>
---------
Signed-off-by: Xin He <xin3.he@intel.com>

This reverts commit 120e233.

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Refine Inference Workflow Readme
---------
Signed-off-by: hshen14 <haihao.shen@intel.com>
Co-authored-by: lvliang-intel <liang1.lv@intel.com>
Co-authored-by: Wang, Chang <chang1.wang@intel.com>

)
Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
Co-authored-by: chen, suyue <suyue.chen@intel.com>

* add finetuning test for mpt-7b-chat with hpu
  Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
---------
Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>

* add s8 perchannel quant and kernel.
* add QKV, add fusion support for s8 PerN
* add amx_int8 pern gelu fusion
* add gelu add fusion for vnni
* split jblas file. add compute type fp32.
* add comp_type fp32 for ffn fusion
* add bf16 for s4 and s4 ffn fusion
* add workspace for jblas functions
* keep one jblas code
* disable mmap as default. change arg --no_mmap to --use_mmap.

* add OPTIMIZATION_ONLY for setup
  Signed-off-by: Xin He <xin3.he@intel.com>
* change name: backends to runtime
  Signed-off-by: Xin He <xin3.he@intel.com>
* fix bug
  Signed-off-by: Xin He <xin3.he@intel.com>
---------
Signed-off-by: Xin He <xin3.he@intel.com>

* Update generate.py
* limit autocast
  Signed-off-by: changwangss <chang1.wang@intel.com>
* update readme
  Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
* update readme
  Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
* Unify the BKC settings
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* Unify the BKC settings
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* Simplify docker file readme
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* Format the readme
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* Add short description
  Signed-off-by: hshen14 <haihao.shen@intel.com>
---------
Signed-off-by: changwangss <chang1.wang@intel.com>
Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>
Signed-off-by: hshen14 <haihao.shen@intel.com>
Co-authored-by: Lv, Liang1 <liang1.lv@intel.com>
Co-authored-by: hshen14 <haihao.shen@intel.com>

* refine readme
* refine readme
* refine table
* Refine LLM Runtime readme
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* Continue updating the readme
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* Simplify the readme
  Signed-off-by: hshen14 <haihao.shen@intel.com>
* add back run_llm.py
* change script arg name
* rename arg
* fix
* add description
* add another way to convert model
* remove additional line
* refine readme
* refine readme, but we need to modify convert script later
* fix model_maps
  Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
* fix convert_gptj
  Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
* refine readme
* refine
---------
Signed-off-by: hshen14 <haihao.shen@intel.com>
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: hshen14 <haihao.shen@intel.com>
Co-authored-by: zhenwei-intel <zhenwei.liu@intel.com>

* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: Haihao Shen <haihao.shen@intel.com>

* refined finetuning config.
  Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* updated readme for new finetuning config.
  Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
* simplified code.
  Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>
---------
Signed-off-by: Ye, Xinyu <xinyu.ye@intel.com>

Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

* support bloom
  Signed-off-by: Dong, Bo1 <bo1.dong@intel.com>

Signed-off-by: lvliang-intel <liang1.lv@intel.com>

…ransformers into lvl/neuralchat_ut
Signed-off-by: Lv, Liang1 <liang1.lv@intel.com>