slight performance improving(ㄒoㄒ) #50

480284856 · 2023-12-14T07:27:45Z

I only got a small improvement over the non-compiled code. Is there anything I missed?

Commands

cli 1:
time python generate.py --compile --compile_prefill --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens 32 --num_samples 50

cli 2:
time python generate.py --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens 32 --num_samples 50

Results

result of cli 1: 4.45 tokens/sec, 151.52 GB/s bandwidth
result of cli 2: 4.24 tokens/sec, 144.55 GB/s bandwidth

relative improvement (compiled vs. not compiled):
speed: 4.9%
memory bandwidth: 4.8%
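
For anyone who wants to recompute the relative figures from the raw numbers, here is a minimal sketch. The helper name and dictionary keys are made up for illustration and are not part of gpt-fast. Because the inputs are already rounded to two decimals, the percentages are only approximate.

```python
def relative_improvement(new: float, old: float) -> float:
    """Return the percentage gain of `new` over `old`."""
    return (new - old) / old * 100.0


# Figures reported above (cli 1 = with --compile, cli 2 = without).
compiled = {"tokens_per_sec": 4.45, "bandwidth_gb_s": 151.52}
eager = {"tokens_per_sec": 4.24, "bandwidth_gb_s": 144.55}

speed_gain = relative_improvement(
    compiled["tokens_per_sec"], eager["tokens_per_sec"]
)
bandwidth_gain = relative_improvement(
    compiled["bandwidth_gb_s"], eager["bandwidth_gb_s"]
)

print(f"speed: {speed_gain:.2f}%")
print(f"memory bandwidth: {bandwidth_gain:.2f}%")
```

Both gains come out just under 5%, which matches the numbers quoted above.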

Env

gpu: 1 x L40S
docker: python:3.9
pytorch installation: pip install torch

Chillee · 2023-12-15T01:51:35Z

Are you using the PyTorch nightly build? This perf seems much worse than I would expect.
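
One quick way to answer this is to look at the installed version string. Nightly wheels normally carry a `.dev<date>` segment (for example `2.2.0.dev20231214+cu121`), while a plain `pip install torch` gets the stable release. The sketch below assumes that naming convention holds; `is_nightly` is a hypothetical helper, not a PyTorch API.

```python
def is_nightly(version: str) -> bool:
    """Heuristic: nightly builds are assumed to contain '.dev' in the version."""
    return ".dev" in version


try:
    import torch  # only needed to read the installed version

    installed = str(torch.__version__)
except ImportError:
    installed = None

if installed is None:
    print("torch is not installed in this environment")
else:
    kind = "nightly" if is_nightly(installed) else "stable release"
    print(f"torch {installed} looks like a {kind}")
```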

480284856 changed the title ~~slightly performance improving(ㄒoㄒ)~~ slight performance improving(ㄒoㄒ) Dec 14, 2023
