A reproducible benchmark of Text Generation Inference and Transformers as of April 2024 on AMD Instinct MI250 and MI300

TGI benchmark

The TGI benchmark with TP=8 (tensor parallelism across 8 GPUs) can be reproduced as follows on MI250 and MI300:

docker run --rm -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 256g \
    --net host -v $(pwd)/hf_cache:/data -e HUGGING_FACE_HUB_TOKEN=$HF_READ_TOKEN \
    ghcr.io/huggingface/text-generation-inference:sha-293b8125-rocm \
    --model-id meta-llama/Meta-Llama-3-70B-Instruct --num-shard 8
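
Optionally, once all shards have started, the server can be sanity-checked from the host with a single request. This is a sketch assuming the launcher's default port 80 (reachable directly on the host since the container runs with --net host); adjust the port if it was overridden:

# Assumes TGI's default --port 80; adjust if a different port was passed to the launcher.
curl http://localhost:80/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'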

Then, a second shell needs to be opened in TGI's server container:

docker container ls
docker exec -it container_name /bin/bash
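
If only one TGI container is running, the two steps can be combined into a single command (a convenience sketch assuming the image tag used above):

docker exec -it $(docker ps -q --filter ancestor=ghcr.io/huggingface/text-generation-inference:sha-293b8125-rocm) /bin/bash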

From the second shell:

huggingface-cli login --token your_hf_read_token

text-generation-benchmark --tokenizer-name meta-llama/Meta-Llama-3-70B-Instruct \
    --sequence-length 2048 --decode-length 128 --warmups 2 --runs 10 \
    -b 1 -b 2 -b 4 -b 8 -b 16 -b 32 -b 64

Once the benchmark is finished, press Ctrl+C in the benchmark shell; a markdown table summarizing prefill and decode latencies as well as throughput is then printed.

Note: TGI's text-generation-benchmark tool tends to run out of memory (OOM) at batch sizes that do not reflect the real memory limits of the benchmarked GPUs. For reference: huggingface/text-generation-inference#1831, huggingface/text-generation-inference#1286

Note: Once released, we recommend using the image ghcr.io/huggingface/text-generation-inference:2.1-rocm instead of ghcr.io/huggingface/text-generation-inference:sha-293b8125-rocm. TGI on ROCm can also be built from source using this dockerfile.
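
As a rough sketch of a source build (assuming the ROCm Dockerfile is named Dockerfile_amd at the root of the text-generation-inference repository; check the repository for its current name and location):

git clone https://github.com/huggingface/text-generation-inference
cd text-generation-inference
# Hypothetical local tag; the Dockerfile name may differ in newer releases.
docker build -f Dockerfile_amd -t tgi-rocm:local .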

Recommended setup

We recommend the following settings on the host (reference):

sudo sh -c "/usr/bin/echo 0 > /proc/sys/kernel/numa_balancing"
sudo rocm-smi --setperfdeterminism 1900
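
To verify that the settings took effect (an optional check; rocm-smi output formatting varies across ROCm versions):

# Should print 0 once NUMA balancing is disabled.
cat /proc/sys/kernel/numa_balancing
# The concise rocm-smi table reports the current SCLK per GPU, which should stay capped around the 1900MHz limit.
rocm-smi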

More details: https://github.com/ROCm/triton/wiki/A-script-to-set-program-execution-environment-in-ROCm
