
Run LLM chat in real time on an 8GB NVIDIA GPU

Dockerfile for alpaca_lora_4bit

This repo is a Dockerfile wrapper for https://github.com/johnsmith0031/alpaca_lora_4bit

Use

Run real-time LLM chat using Alpaca on an 8GB NVIDIA/CUDA GPU (e.g. a 3070 Ti mobile).

Requirements

  • Linux
  • Docker
  • NVIDIA GPU with driver version that supports CUDA 11.7+ (e.g. 525)
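A quick way to sanity-check the driver requirement before building is to compare the driver's major version against a known-good branch. This is a minimal sketch, not part of the upstream repo; it uses 525 (the example branch listed above) as the threshold, and the `driver_ok` helper name is made up for illustration.

```shell
# Sketch: check that the host driver's major version is at least 525,
# the example CUDA 11.7-capable branch mentioned above.
driver_ok() {
  # $1 is a driver version string such as "525.105.17"
  major=${1%%.*}
  [ "$major" -ge 525 ]
}

# On a real host, feed in the live version (requires nvidia-smi):
# driver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)" \
#   && echo "driver supports CUDA 11.7+"
```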

Installation

git clone https://github.com/andybarry/alpaca_lora_4bit_docker.git
docker build -t alpaca_lora_4bit .
docker run --gpus=all -p 7860:7860 alpaca_lora_4bit

Point your browser to http://localhost:7860
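The container can take a little while to load the model before the web UI answers, so a blank page right after `docker run` is normal. If you want to script around that, a small polling loop works; this is a hedged sketch (the `wait_for_url` helper and the timeout are assumptions, not something the upstream repo provides):

```shell
# Sketch: poll a URL until it responds or a timeout expires.
wait_for_url() {
  url=$1
  tries=${2:-30}   # default: 30 one-second attempts
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "timed out waiting for $url" >&2
  return 1
}

# Example: wait_for_url http://localhost:7860
```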

Results

Chat runs in real time on a 3070 Ti mobile, using 5-6 GB of GPU RAM.

The model isn't all that good, sometimes it goes crazy. But hey, as I always say, "when 4-bits you reach look as good, you will not."

