<h1 style="text-align: center">Large Language Model on Edge</h1>

### What is a Large Language Model (LLM)?

A large language model (LLM) is a deep learning algorithm that can perform a variety of natural language processing (NLP) tasks. Large language models use **transformer** models and are trained using massive datasets — hence, large. This enables them to recognize, translate, predict, or generate text or other content.

<img src="./imgs/tf_image.png" width="800"/>


LLMs have a history dating back to the 1960s, when MIT researcher Joseph Weizenbaum created Eliza, the first chatbot. Eliza used pattern matching to simulate human conversation, laying the foundation for later NLP research. Key innovations followed: Long Short-Term Memory (LSTM) networks, introduced in 1997, made it practical to train neural networks on long sequences; Stanford's CoreNLP suite, released in 2010, provided essential tools for complex NLP tasks; and Google Brain, launched in 2011, gave researchers powerful resources that contributed to advances such as the Transformer architecture in 2017. That architecture underpins models such as OpenAI's GPT-3 and has revolutionized LLMs. More recently, platforms such as Hugging Face have made it easier for researchers and developers to build and share their own models, while chat assistants such as Google's Bard have brought LLMs to a wide audience.

![image.png](attachment:98858846-32ac-43df-983e-b7bc25337977.png)

[Reference: LLM History Evolutions & Future](https://www.scribbledata.io/large-language-models-history-evolutions-and-future/)

### What is the difference between LLM and Generative AI?
Generative AI is an umbrella term that refers to artificial intelligence models that have the capability to generate content. Generative AI can generate text, code, images, video, and music. Examples of generative AI include Midjourney, DALL-E, and ChatGPT.

An LLM is a type of generative AI that is trained on text and produces textual content. ChatGPT is a popular example of generative text AI.

All LLMs are generative AI, but not all generative AI models are LLMs.

### Types of LLM

1. **Generic or raw language models**
   predict the next word based on the language in the training data. These language models perform information retrieval tasks.
2. **Instruction-tuned language models**
   are trained to predict responses to the instructions given in the input. This allows them to perform sentiment analysis, or to generate text or code.
3. **Dialog-tuned language models**
   are trained to have a dialog by predicting the next response. Think of chatbots or conversational AI.
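
To make "predict the next word based on the language in the training data" concrete, here is a toy sketch built on simple word-pair (bigram) counts. It is an illustration only and far simpler than a transformer-based LLM; the corpus and the function names are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            followers[current_word][next_word] += 1
    return followers


def predict_next_word(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = followers.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical toy training data (an assumption for this sketch).
corpus = [
    "the model runs on the edge",
    "the model predicts the next word",
    "the model runs on a laptop",
]
followers = train_bigram_model(corpus)
print(predict_next_word(followers, "model"))  # runs
print(predict_next_word(followers, "runs"))   # on
```

A real LLM replaces the count table with a neural network that scores every token in its vocabulary, but the training objective is the same idea: given the text so far, predict what comes next.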

<h1 style="text-align: center">Intel® Offering for LLM</h1>

<div align="center" style="margin-top: 20px">
<h2> Inferencing </h2>
<img src="./imgs/openvino-logo-purple-black.png" width="300"/><br/>
<img src="./imgs/hf_logo.png" width="300"/>
</div>
<div align="center" style="margin-top: 20px">
<h2> Fine Tuning </h2><img src="./imgs/bigdl_logo.png" width="300"/>
</div>


## OpenVINO™ + Optimum Intel

<p style='text-align: justify;'>
🤗 Optimum Intel is the interface between the 🤗 Transformers and Diffusers libraries and the different tools and libraries provided by Intel® to accelerate end-to-end pipelines on Intel® architectures.

Intel® Neural Compressor is an open-source library that enables the most popular compression techniques, such as quantization, pruning and knowledge distillation. It supports automatic accuracy-driven tuning strategies so that users can easily generate a quantized model. Users can apply static quantization, dynamic quantization and quantization-aware training while specifying an expected accuracy criterion. It also supports different weight pruning techniques, enabling the creation of a pruned model that meets a predefined sparsity target.

OpenVINO™ is an open-source toolkit that enables high-performance inference on Intel® CPUs, GPUs, and special DL inference engines. It is supplied with a set of tools to optimize your models with compression techniques such as quantization, pruning and knowledge distillation.

Optimum Intel provides a simple interface to optimize your Transformers and Diffusers models, convert them to the OpenVINO™ Intermediate Representation (IR) format and run inference using OpenVINO™ Runtime.
</p>
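
As a rough illustration of the convert-and-run flow described above, the sketch below uses the `OVModelForCausalLM` class from Optimum Intel. It assumes the packages are installed (for example with `pip install "optimum[openvino]"`); the helper name `generate_with_openvino` and its parameters are chosen for this example and are not part of any library.

```python
def generate_with_openvino(model_id, prompt, max_new_tokens=32):
    """Export a Hugging Face causal LM to OpenVINO IR and generate text with it."""
    # Imported lazily so this file can be loaded even when the
    # optional packages are not installed.
    from optimum.intel import OVModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
    model = OVModelForCausalLM.from_pretrained(model_id, export=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_with_openvino("gpt2", "Edge devices can run LLMs because")` would download the model, convert it and return the generated text; `gpt2` is only a small placeholder model picked for this sketch.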


## BigDL-LLM
**[`bigdl-llm`](https://bigdl.readthedocs.io/en/latest/doc/LLM/index.html)** is a library for running **LLMs** on Intel® **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4** with very low latency[^1] (for any **PyTorch** model).

Developers can optimize their LLMs for edge devices by using BigDL-LLM with INT4 support on compatible Intel® XPUs. Storing weights in 4 bits instead of 16 or 32 reduces memory use and memory traffic, which in turn improves computational performance.

> *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
>
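
To give a feel for why INT4 helps with memory, here is a minimal sketch of symmetric 4-bit quantization in plain Python, plus a back-of-the-envelope size estimate. This is an assumption-laden illustration of the general idea, not the quantization scheme BigDL-LLM actually implements; all function names are hypothetical.

```python
def quantize_int4(weights):
    """Symmetric quantization of floats to integers in the INT4 range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int4(quantized, scale):
    """Recover approximate float weights from INT4 values and their scale."""
    return [q * scale for q in quantized]


def weights_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB, ignoring scales and other metadata."""
    return num_params * bits_per_weight / 8 / 1e9


weights = [0.7, -0.3, 0.12, 0.0]
quantized, scale = quantize_int4(weights)
restored = dequantize_int4(quantized, scale)
print(quantized)  # [7, -3, 1, 0]

# Rough footprint of a 7-billion-parameter model (illustrative numbers).
print(weights_size_gb(7_000_000_000, 16))  # 14.0
print(weights_size_gb(7_000_000_000, 4))   # 3.5
```

The restored values differ slightly from the originals (here `0.12` comes back as roughly `0.1`), which is the accuracy cost traded for a roughly fourfold reduction in weight storage compared with 16-bit floats.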

You can use BigDL-LLM to run any **Hugging Face Transformers PyTorch model**. It automatically optimizes and accelerates LLMs using low-precision **(INT4/INT5/INT8)** techniques, modern hardware accelerations and the latest software optimizations.
Hugging Face Transformers-based applications can run on BigDL-LLM with a one-line code change, and you should see a significant speedup.
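
The one-line change can be sketched as follows: the application imports `AutoModelForCausalLM` from `bigdl.llm.transformers` instead of from `transformers` and passes `load_in_4bit=True`. The sketch assumes `bigdl-llm` is installed (for example with `pip install --pre --upgrade bigdl-llm[all]`); the helper names `load_int4_model` and `generate` are invented for this example.

```python
def load_int4_model(model_path):
    """Load a Hugging Face causal LM with BigDL-LLM INT4 optimizations applied."""
    # Lazy imports so this file loads even without the optional packages.
    # The changed line: AutoModelForCausalLM comes from bigdl.llm.transformers
    # rather than from transformers.
    from bigdl.llm.transformers import AutoModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    return model, tokenizer


def generate(model, tokenizer, prompt, max_new_tokens=32):
    """Run generation exactly as with a regular Transformers model."""
    import torch

    with torch.inference_mode():
        input_ids = tokenizer.encode(prompt, return_tensors="pt")
        output_ids = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Everything after loading (tokenizing, calling `generate`, decoding) stays the same as in an unmodified Transformers application, which is why the migration is described as a one-line change.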

## Supported Models
Over 20 models have been optimized and verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, Mistral, Falcon, MPT, Dolly, StarCoder, Whisper, Baichuan, InternLM, Qwen, Aquila, MOSS,* and more; see the complete list below.


| Model      | CPU      | GPU     |
|------------|----------|---------| 
| LLaMA      | **Yes**  | **Yes** |  
| Vicuna     | **Yes**  | **Yes** |
| Guanaco    | **Yes**  | **Yes** |
| Baize      | **Yes**  | **Yes** |
| WizardLM   | **Yes**  | **Yes** |
| LLaMA 2    | **Yes**  | **Yes** |
| ChatGLM    | **Yes**  | **No**  |
| ChatGLM2   | **Yes**  | **Yes** |
| ChatGLM3   | **Yes**  | **Yes** |
| Mistral    | **Yes**  | **Yes** |
| Falcon     | **Yes**  | **Yes** |
| MPT        | **Yes**  | **Yes** |
| Dolly-v1   | **Yes**  | **Yes** |
| Dolly-v2   | **Yes**  | **Yes** |
| Replit Code| **Yes**  | **Yes** |
| RedPajama  | **Yes**  | **No**  |
| Phoenix    | **Yes**  | **No**  |
| StarCoder  | **Yes**  | **Yes** |
| Baichuan   | **Yes**  | **Yes** |
| Baichuan2  | **Yes**  | **Yes** |
| InternLM   | **Yes**  | **Yes** |
| Qwen       | **Yes**  | **Yes** |
| Qwen-VL    | **Yes**  | **Yes** |
| Aquila     | **Yes**  | **Yes** |
| Aquila2    | **Yes**  | **Yes** |
| MOSS       | **Yes**  | **No**  |
| Whisper    | **Yes**  | **Yes** |
| Phi-1_5    | **Yes**  | **Yes** |
| Flan-t5    | **Yes**  | **Yes** |
| Qwen-VL    | **Yes**  | **No**  |
| LLaVA      | **Yes**  | **Yes** |
| CodeLlama  | **Yes**  | **Yes** |
| Skywork    | **Yes**  | **No**  |
| InternLM-XComposer    | **Yes**  | **No**  |
| WizardCoder-Python | **Yes**  | **No**  |
| CodeShell  | **Yes**  | **No**  |
| Fuyu       | **Yes**  | **No**  |
| Distil-Whisper        | **Yes** | **Yes** |
| Yi         | **Yes**  | **Yes** |



## Notices & Disclaimers 

Intel technologies may require enabled hardware, software or service activation. 

No product or component can be absolutely secure.  

Your costs and results may vary.  

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document, with the sole exception that code included in this document is licensed subject to the Zero-Clause BSD open source license (0BSD), Open Source Initiative. No rights are granted to create modifications or derivatives of this document. 

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.  