Simple LLM Inference: Interactive Text Generation with Intel GPUs

This example is a Python script that demonstrates interactive text generation using pre-trained Large Language Models (LLMs) from Hugging Face Transformers on Intel discrete GPUs (dGPUs). Although the code is structured like a chatbot, the models used were not specifically trained for conversation. The repository provides a command-line interface (CLI) for interacting with the models in two modes: with context (remembering previous interactions) and without context. The code is optimized to run on Intel GPUs using Intel Extension for PyTorch (IPEX).

Features

Model Selection: Choose between predefined models or enter a custom model repository from Hugging Face Hub.
Context Control: Interact with the model in two modes, with and without context.
Generation Parameters Control: Customize the response generation by adjusting parameters like temperature, top_p, top_k, num_beams, and repetition_penalty.
Repetition Removal: The code includes logic to remove repetitive sentences from the generated text.
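
As an illustration of the repetition-removal feature, here is a minimal sketch that drops sentences already seen earlier in the generated text. It is not the logic shipped in this repository; the function name `remove_repeated_sentences` and the punctuation-based sentence splitting are assumptions made for the example.

```python
import re


def remove_repeated_sentences(text: str) -> str:
    """Drop sentences that already appeared earlier in the text (hypothetical helper)."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen = set()
    kept = []
    for sentence in sentences:
        # Compare case-insensitively and ignore surrounding whitespace.
        key = sentence.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(sentence.strip())
    return " ".join(kept)
```

Only exact repeats (ignoring case) are removed, so near-duplicates with small wording changes would pass through; a fuzzier comparison may be needed in practice.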

Prerequisites

Python 3.8 or higher (the PyTorch 2.0.1 build pinned under Setup does not support older versions)
Intel Extension for PyTorch (IPEX)
Hugging Face Transformers
Hugging Face Accelerate

Setup

Clone the repository:

git clone https://github.com/rahulunair/simple_llm_inference.git && cd simple_llm_inference

Install dependencies:

python -m pip install torch==2.0.1a0 torchvision==0.15.2a0 intel_extension_for_pytorch==2.0.110+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
python -m pip install transformers accelerate sentencepiece

To learn more about running IPEX on Intel GPUs, refer to the Intel Extension for PyTorch installation guide.
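
After installing, it can help to confirm that the packages are importable and that PyTorch can see the GPU. The sketch below is one way to do that; the helper names `missing_packages` and `xpu_available` are hypothetical, and the import names in `REQUIRED` are assumed to match the packages installed above.

```python
import importlib.util

# Import names for the packages installed above (assumed; adjust if they differ).
REQUIRED = ["torch", "intel_extension_for_pytorch", "transformers", "accelerate"]


def missing_packages(names):
    """Return the subset of `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def xpu_available() -> bool:
    """Report whether PyTorch sees an Intel GPU; False if the stack is incomplete."""
    if missing_packages(["torch", "intel_extension_for_pytorch"]):
        return False
    import torch
    import intel_extension_for_pytorch  # noqa: F401  (registers the "xpu" device)
    return torch.xpu.is_available()
```

Calling `missing_packages(REQUIRED)` should return an empty list on a complete install, and `xpu_available()` should then return `True` when the GPU driver and runtime are set up correctly.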

Usage

Run bot.py:

python bot.py

Follow the on-screen prompts to select a model and the interaction mode.
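
To make the difference between the two interaction modes concrete, here is a rough sketch of how a prompt might be assembled with and without context. The `PromptBuilder` class, its `max_turns` cap, and the `You:`/`Bot:` prompt layout are assumptions for illustration, not the implementation in bot.py.

```python
from typing import List, Tuple


class PromptBuilder:
    """Builds the text passed to the model, optionally replaying earlier turns."""

    def __init__(self, use_context: bool, max_turns: int = 3):
        if max_turns < 1:
            raise ValueError("max_turns must be at least 1")
        self.use_context = use_context
        self.max_turns = max_turns  # hypothetical cap that keeps prompts short
        self.history: List[Tuple[str, str]] = []

    def build(self, user_input: str) -> str:
        lines = []
        if self.use_context:
            # Replay only the most recent turns so the prompt stays bounded.
            for user, bot in self.history[-self.max_turns:]:
                lines.append(f"You: {user}")
                lines.append(f"Bot: {bot}")
        lines.append(f"You: {user_input}")
        lines.append("Bot:")
        return "\n".join(lines)

    def record(self, user_input: str, reply: str) -> None:
        # History is only kept when context mode is on.
        if self.use_context:
            self.history.append((user_input, reply))
```

In context mode each new prompt grows with the conversation, which is why a cap such as `max_turns` (or a token budget) is usually needed to stay within the model's maximum length.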

Sample output

Please select a model:
1. Writer/camel-5b-hf
2. openlm-research/open_llama_3b_v2
3. Enter a custom model repo from HuggingFace Hub
Enter 1 to 3: 2
Using max length: 256
Note: This is a demonstration using pretrained models which were not fine-tuned for chat.
You can choose between two modes of interaction:
1. Interact with context
2. Interact without context
Enter 1 or 2: 1
You: Hello, Bot!
Bot: Hello! How can I assist you today?
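
The "Using max length: 256" line in the output is one of several generation settings, alongside the parameters listed under Features. The sketch below shows one possible way to collect and validate them before passing them to the Transformers `generate` method. The `GenerationSettings` class and every default other than the max length of 256 are illustrative assumptions, not values taken from bot.py.

```python
from dataclasses import asdict, dataclass


@dataclass
class GenerationSettings:
    """Generation settings; defaults are illustrative, not the script's values."""

    max_length: int = 256  # matches "Using max length: 256" in the sample output
    temperature: float = 0.7
    top_p: float = 0.9
    top_k: int = 50
    num_beams: int = 1
    repetition_penalty: float = 1.2

    def validate(self) -> None:
        if self.temperature <= 0:
            raise ValueError("temperature must be positive")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.top_k < 0 or self.num_beams < 1 or self.max_length < 1:
            raise ValueError("top_k must be >= 0; num_beams and max_length must be >= 1")
        if self.repetition_penalty <= 0:
            raise ValueError("repetition_penalty must be positive")

    def to_kwargs(self) -> dict:
        """Return keyword arguments that could be passed to `model.generate`."""
        self.validate()
        kwargs = asdict(self)
        # temperature, top_p, and top_k only take effect when sampling is enabled.
        kwargs["do_sample"] = True
        return kwargs
```

Assuming `model` is a Transformers causal language model and `inputs` is the tokenizer output, the settings could be applied with `model.generate(**inputs, **GenerationSettings().to_kwargs())`. Lower `temperature` and `top_p` values make the output more conservative, while a `repetition_penalty` above 1.0 discourages repeated tokens.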

Contributing

Feel free to submit issues and pull requests. Contributions are welcome!
