A reproducible benchmark of Text Generation Inference and Transformers as of April 2024 on AMD Instinct MI250 and MI300

TGI benchmark

The TGI benchmark with TP=8 (tensor parallelism across 8 GPUs) can be reproduced as follows on MI250 and MI300:

docker run --rm -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 256g \
    --net host -v $(pwd)/hf_cache:/data -e HUGGING_FACE_HUB_TOKEN=$HF_READ_TOKEN \
    ghcr.io/huggingface/text-generation-inference:sha-293b8125-rocm \
    --model-id meta-llama/Meta-Llama-3-70B-Instruct --num-shard 8
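
Optionally, once all shards have started, the server can be sanity-checked from the host with a single request. This is a sketch assuming the launcher's default port 80 (reachable directly on the host since the container runs with --net host); adjust the port if it was overridden:

# Assumes TGI's default --port 80; adjust if a different port was passed to the launcher.
curl http://localhost:80/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'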

Then, a second shell needs to be opened in TGI's server container:

docker container ls
docker exec -it container_name /bin/bash
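
If only one TGI container is running, the two steps can be combined into a single command (a convenience sketch assuming the image tag used above):

docker exec -it $(docker ps -q --filter ancestor=ghcr.io/huggingface/text-generation-inference:sha-293b8125-rocm) /bin/bash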

From the second shell:

huggingface-cli login --token your_hf_read_token

text-generation-benchmark --tokenizer-name meta-llama/Meta-Llama-3-70B-Instruct \
    --sequence-length 2048 --decode-length 128 --warmups 2 --runs 10 \
    -b 1 -b 2 -b 4 -b 8 -b 16 -b 32 -b 64

Once the benchmark is finished, press Ctrl+C in the benchmark shell; a markdown table summarizing prefill and decode latencies as well as throughput is then printed.

Note: TGI's text-generation-benchmark tool tends to run out of memory (OOM) at batch sizes that do not reflect the real memory limits of the benchmarked GPUs. For reference: huggingface/text-generation-inference#1831, huggingface/text-generation-inference#1286

Note: Once released, we recommend using the image ghcr.io/huggingface/text-generation-inference:2.1-rocm instead of ghcr.io/huggingface/text-generation-inference:sha-293b8125-rocm. TGI on ROCm can also be built from source using this dockerfile.
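
As a rough sketch of a source build (assuming the ROCm Dockerfile is named Dockerfile_amd at the root of the text-generation-inference repository; check the repository for its current name and location):

git clone https://github.com/huggingface/text-generation-inference
cd text-generation-inference
# Hypothetical local tag; the Dockerfile name may differ in newer releases.
docker build -f Dockerfile_amd -t tgi-rocm:local .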

Recommended setup

We recommend the following settings on the host (reference):

sudo sh -c "/usr/bin/echo 0 > /proc/sys/kernel/numa_balancing"
sudo rocm-smi --setperfdeterminism 1900
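
To verify that the settings took effect (an optional check; rocm-smi output formatting varies across ROCm versions):

# Should print 0 once NUMA balancing is disabled.
cat /proc/sys/kernel/numa_balancing
# The concise rocm-smi table reports the current SCLK per GPU, which should stay capped around the 1900MHz limit.
rocm-smi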

More details: https://github.com/ROCm/triton/wiki/A-script-to-set-program-execution-environment-in-ROCm
