
AMD RX 7900 XTX Wrong outputs #120

Closed
makaveli10 opened this issue Mar 1, 2024 · 0 comments
makaveli10 commented Mar 1, 2024

Hello, I was benchmarking this on an AMD device in the Docker container rocm/pytorch:latest, which has PyTorch 2.2.0. Compilation works, and so does quantization, but generation is bugged.

./scripts/prepare.sh $MODEL_REPO

Model config {'block_size': 2048, 'vocab_size': 32000, 'n_layer': 32, 'n_head': 32, 'dim': 4096, 'intermediate_size': 11008, 'n_local_heads': 32, 'head_dim': 128, 'rope_base': 10000, 'norm_eps': 1e-05}
Saving checkpoint to checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth
Loading model ...
Quantizing model weights for int8 weight-only symmetric per-channel quantization
Writing quantized weights to checkpoints/meta-llama/Llama-2-7b-chat-hf/model_int8.pth
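For context, the "int8 weight-only symmetric per-channel quantization" named in the log can be sketched as follows. This is a generic NumPy illustration of the scheme (one scale per output channel, symmetric around zero), not gpt-fast's actual implementation:

```python
import numpy as np

def quantize_int8_per_channel(w):
    # Symmetric per-channel quantization: one scale per output row,
    # chosen so the largest-magnitude weight in the row maps to +/-127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.array([[1.0, -2.0], [0.5, 0.25]], dtype=np.float32)
q, scale = quantize_int8_per_channel(w)
w_hat = dequantize(q, scale)
```

Because the scheme is symmetric, no zero-point is stored; the dequantized weights match the originals up to one rounding step per channel.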

When I run generate:

python generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model.pth --prompt "Hello, my name is"
Using device=cuda
Loading model ...
Time to load model: 23.11 seconds
/home/workspace/gpt-fast/model.py:189: UserWarning: 1Torch was not compiled with memory efficient attention. (Triggered internally at /home/workspace/pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:507.)
  y = F.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0)
/root/.local/lib/python3.9/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
/root/.local/lib/python3.9/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
/root/.local/lib/python3.9/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  warnings.warn(
Compilation time: 60.60 seconds
Hello, my name is ⁇  ⁇ � ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇ � ⁇ � ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ �� ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ �� ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇ � ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇ �� ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇ � ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇ � ⁇ �� ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇ �� ⁇ � ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇  ⁇ � ⁇ ��� ⁇  ⁇ � ⁇  ⁇  ⁇  ⁇ � ⁇ � ⁇  ⁇ � ⁇ 
Time for inference 1: 3.46 sec total, 57.76 tokens/sec

Not sure what's going wrong, because I use the same container for another transformer-based model, and it works with torch.compile for me.

Further, I tried compiling PyTorch nightly 2.3.0 from source to see if that would change anything. It builds, but no luck there either: generate gives the same outputs.

Without --compile, generation works as expected:

python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --prompt "Hello, my name is"

Using device=cuda
Loading model ...
Time to load model: 4.04 seconds
Hello, my name is Samantha and I'm a 19-year-old from Australia. I've been playing guitar for about 4 years now and I absolutely love it. I'm mostly self-taught, but I do take lessons occasionally to improve my skills. My favorite genre to play is indie rock, but I also enjoy playing acoustic covers of pop and folk songs. When I'm not playing guitar, I'm usually listening to music or writing songs. I'm super excited to be here and can't wait to connect with other guitar enthusiasts! 🎸❤️t was a beautiful summer evening, and Samantha was sitting in her backyard, strumming her guitar. She had just finished a long day of work and was feeling a bit stressed, so she decided to take a break and play some music. As she played, she felt her worries slowly drifting away,
Time for inference 4: 7.30 sec total, 27.40 tokens/sec
Bandwidth achieved: 369.30 GB/s

Thanks
