This project provides a Docker-based inference engine for running Large Language Models (LLMs) on AMD GPUs. It's designed to work with models from Hugging Face, with a focus on the LLaMA model family.
- AMD GPU with ROCm support
- Docker installed on your system
- ROCm drivers installed on your host system (version 5.4.2 or compatible)
```
amd-gpu-inference/
├── src/
│   ├── __init__.py
│   ├── engine.py
│   ├── model.py
│   ├── utils.py
│   └── amd_setup.py
├── Dockerfile
├── requirements.txt
├── run_inference.py
├── run-docker-amd.sh
└── README.md
```
- Clone this repository:

  ```bash
  git clone https://github.com/slashml/amd-gpu-inference.git
  cd amd-gpu-inference
  ```
- Make the run script executable:

  ```bash
  chmod +x run-docker-amd.sh
  ```
- Run the inference engine with a specified model and prompt:

  ```bash
  ./run-docker-amd.sh "meta-llama/Llama-2-7b-chat-hf" "Translate the following English text to French: 'Hello, how are you?'"
  ```

  Replace `"meta-llama/Llama-2-7b-chat-hf"` with the Hugging Face model you want to use, and provide your own prompt.
The project includes an `Aptfile` that lists the necessary ROCm packages to be installed in the Docker container. This ensures that all required ROCm drivers and libraries are available for the inference engine to utilize the AMD GPU effectively.
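The exact contents depend on your ROCm version; as a rough illustration only (these package names are assumptions, not copied from this repository's `Aptfile`), such a file lists one apt package per line, for example:

```
rocm-dev
rocm-libs
```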
The `run-docker-amd.sh` script builds the Docker image automatically. If you want to build it manually, use:

```bash
docker build -t amd-gpu-inference .
```
The `run-docker-amd.sh` script handles running the container with the necessary AMD GPU flags. If you want to run it manually:

```bash
docker run --rm -it \
  --device=/dev/kfd \
  --device=/dev/dri \
  --group-add=video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  amd-gpu-inference "model_name" "your prompt here"
```

Replace `"model_name"` with the Hugging Face model you want to use, and `"your prompt here"` with your input text.
You can use any model available on Hugging Face by specifying its repository name when running the container. For example:

```bash
./run-docker-amd.sh "facebook/opt-1.3b" "Your prompt here"
```
If you need to change how the inference is performed, modify the `run_inference.py` file. Remember to rebuild the Docker image after making changes.
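As a rough sketch of the kind of logic such a script typically contains (this is an assumption based on the standard Hugging Face Transformers API, not the repository's actual code):

```python
# Hypothetical minimal inference script; the real run_inference.py may differ.
import sys

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def main():
    model_name, prompt = sys.argv[1], sys.argv[2]

    # ROCm builds of PyTorch expose the AMD GPU through the regular "cuda" device.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Load the tokenizer and model; float16 keeps GPU memory usage manageable.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype).to(device)

    # Tokenize the prompt and generate a completion.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```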
- Ensure that your AMD GPU drivers and ROCm are correctly installed and configured on your host system (see the quick check after this list).
- If you encounter "out of memory" errors, try using a smaller model or reducing the input/output length.
- For model-specific issues, refer to the model's documentation on Hugging Face.
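A quick way to confirm that the PyTorch build inside the container actually sees the AMD GPU (ROCm builds of PyTorch report the device through the regular `torch.cuda` interface):

```python
# Sanity check: run inside the container. A False/empty result points to a
# driver or container-flag problem rather than a model issue.
import torch

print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    print("Device count:", torch.cuda.device_count())
```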
Contributions are welcome! Please feel free to submit a Pull Request.
- This project uses the Hugging Face Transformers library.
- ROCm is developed by AMD and licensed under the MIT License; see https://rocm.docs.amd.com/en/latest/about/license.html for details.
For any questions or issues, please open an issue in the GitHub repository.