slight performance improving(ㄒoㄒ) #50

Open
480284856 opened this issue Dec 14, 2023 · 1 comment
Comments

@480284856

I only got a small improvement over the native (non-compiled) code. Is there anything I missed?

Commands

cli 1:
time python generate.py --compile --compile_prefill --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens 32 --num_samples 50

cli 2:
time python generate.py --checkpoint_path /root/gpt-fast/codellama-34b-python/model_int8.pth --prompt "def quicksort(arr):" --max_new_tokens 32 --num_samples 50

Results

result of cli 1: 4.45 tokens/sec, 151.52 GB/s memory bandwidth
result of cli 2: 4.24 tokens/sec, 144.55 GB/s memory bandwidth

relative improvement (compiled vs. not compiled):
speed: 4.9%
memory bandwidth: 4.8%
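
For anyone checking the arithmetic, the percentages can be reproduced from the rounded figures reported above. This is only a minimal sketch: `relative_improvement` is a hypothetical helper name, not part of gpt-fast, and because the inputs are already rounded, the last digit may differ slightly from a value computed from the raw measurements.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the percentage gain of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Figures copied from the results above (cli 1 = compiled, cli 2 = not compiled).
speed_gain = relative_improvement(4.45, 4.24)
bandwidth_gain = relative_improvement(151.52, 144.55)

print(f"speed: {speed_gain:.2f}%")
print(f"memory bandwidth: {bandwidth_gain:.2f}%")
```

Both gains come out at roughly 5%, so the two metrics agree with each other.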

Env

gpu: 1*L40S
docker: python:3.9
pytorch installation: pip install torch

@480284856 changed the title from "slightly performance improving(ㄒoㄒ)" to "slight performance improving(ㄒoㄒ)" on Dec 14, 2023
@Chillee
Contributor

Chillee commented Dec 15, 2023

Are you using PyTorch nightly? This perf seems much worse than I would expect.
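
One quick way to answer the nightly question is to print the installed version string. The sketch below assumes that nightly wheels carry a `.dev` date tag in `torch.__version__` (for example `2.2.0.dev20231214+cu121`), while stable releases from a plain `pip install torch` do not. `looks_like_nightly` is a hypothetical helper name, not a PyTorch API.

```python
def looks_like_nightly(version: str) -> bool:
    """Heuristic: nightly builds include a '.dev<date>' segment in the version."""
    return ".dev" in version


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        version = torch.__version__
        kind = "nightly" if looks_like_nightly(version) else "stable release"
        print(f"torch {version} ({kind})")
```

Run it inside the same `python:3.9` container that produced the timings, so it reports the build that was actually used.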
