rkooo567/llm_benchmark
llm_benchmark on llama2-70B

Run on 8x A100 (40 GiB) GPUs with tensor parallelism = 8.

| Backend | batch_size: 32, input_len: 1, output_len: 2048 | batch_size: 24, input_len: 1024, output_len: 1024 |
| --- | --- | --- |
| Huggingface | 948.26 s | 439.02 s |
| Triton | 70.19 s | 38.97 s |
| vLLM | 133.70 s | 76.12 s |
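The timings above are end-to-end latencies for a single fixed-shape batch. A minimal sketch of such a harness (the helper name and `generate_fn` signature are hypothetical, not this repo's actual scripts; any of the three backends would be wrapped to match it):

```python
import time

def benchmark(generate_fn, batch_size, input_len, output_len):
    """Time one end-to-end generation pass for a fixed batch shape.

    generate_fn(prompts, output_len) is a hypothetical wrapper that is
    expected to return one completion per prompt; a Huggingface, Triton,
    or vLLM backend would each be adapted to this signature.
    """
    # Dummy prompts approximating the requested input length
    # (here: input_len repeated words, not exact tokenizer tokens).
    prompts = [" ".join(["hello"] * input_len) for _ in range(batch_size)]

    start = time.perf_counter()
    outputs = generate_fn(prompts, output_len)
    elapsed = time.perf_counter() - start
    return elapsed, outputs
```

Calling `benchmark(fn, batch_size=32, input_len=1, output_len=2048)` then corresponds to the first column of the table; a single wall-clock measurement like this captures scheduling and batching overhead of each backend, which is what the table compares.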
