Performance with MPS on AMD GPUs is worse than CPU #78210
Comments
I ran your tests on an Intel iMac 27" 2020 with an AMD Radeon Pro 5700 XT
My guess is that the 5300M might just be a lot slower than the 5700 XT. In this particular case, the CPU might well be faster than such a low-end GPU.
That may make sense for the 5300M, but I do not see why the 5700 XT 16GB is only going as fast as the iMac's CPU.
Oh, I see.
batch_size affects performance dramatically...
It almost scales with batch size. Almost like it's doing every batch a 'batch' number of times. Edit: is the test just running batch × 100?
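If it helps, here is a minimal, self-contained sketch for checking how runtime scales with batch size on the CPU versus the MPS backend. This is not from the benchmark repo; the layer sizes, batch sizes, and step count are arbitrary, and torch.mps.synchronize() (available in recent PyTorch builds) is called because MPS kernels are queued asynchronously, so timings taken without it can be misleading.

import time
import torch

def bench(device: str, batch_size: int, steps: int = 50) -> float:
    # Small MLP forward pass, just to have some GPU work to measure
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.ReLU(),
        torch.nn.Linear(2048, 512),
    ).to(device)
    x = torch.randn(batch_size, 512, device=device)
    with torch.no_grad():
        # Warm-up so one-time setup cost is not counted
        for _ in range(5):
            model(x)
        if device == "mps":
            torch.mps.synchronize()  # MPS ops are queued asynchronously
        start = time.perf_counter()
        for _ in range(steps):
            model(x)
        if device == "mps":
            torch.mps.synchronize()
    return time.perf_counter() - start

for bs in (8, 32, 128):
    print(f"batch={bs:4d}  cpu={bench('cpu', bs):.3f}s  mps={bench('mps', bs):.3f}s")

If the MPS time grows roughly linearly with batch size while the CPU time grows more slowly, that would match the behaviour described above.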
Did this ever get any more attention? I have a Mac with a Radeon Pro 560X. The image size is roughly 1GB, so it should theoretically fit fully in VRAM, I think. I did confirm that GPU usage sits near 100% the whole time for the 560X using the Activity Monitor GPU view. I did not force the Mac to use the […]. Here's some sample code:

import torch
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

# Assumed to be defined elsewhere:
#   question: str
#   image: PIL.Image.Image

model_name = "google/pix2struct-ocrvqa-base"
processor = Pix2StructProcessor.from_pretrained(model_name)
inputs = processor(
    images=image,
    text=question,
    return_tensors="pt",
)
model = Pix2StructForConditionalGeneration.from_pretrained(model_name)

# Move the model and inputs to the MPS device when it is available and built in
has_mps = torch.backends.mps.is_available()
built_with_mps = torch.backends.mps.is_built()
if has_mps and built_with_mps:
    model = model.to('mps')
    inputs = inputs.to('mps')

predictions = model.generate(
    **inputs,
    max_length=256,
)
result = processor.decode(
    predictions[0],
)
print(result)

Also, I'm very new at this stuff, so I wouldn't be surprised if I'm missing some significant settings I should be using. I just did what https://huggingface.co/google/pix2struct-ai2d-base said to do (the ocrvqa model says to follow those instructions). I don't necessarily need a solution, I just want to provide another data point. Thanks!
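For what it's worth, a rough way to turn the snippet above into a CPU-vs-MPS data point is to time the generate() call on each device. This is only a sketch under the same assumptions as above (question and image already defined); the synchronize call is there because MPS work is queued asynchronously.

import time
import torch
from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

model_name = "google/pix2struct-ocrvqa-base"
processor = Pix2StructProcessor.from_pretrained(model_name)
model = Pix2StructForConditionalGeneration.from_pretrained(model_name)

for device in ("cpu", "mps"):
    if device == "mps" and not torch.backends.mps.is_available():
        continue
    # The processor output supports .to(), so the whole input batch moves in one call
    inputs = processor(images=image, text=question, return_tensors="pt").to(device)
    model = model.to(device)
    start = time.perf_counter()
    predictions = model.generate(**inputs, max_length=256)
    if device == "mps":
        torch.mps.synchronize()  # wait for queued GPU work before stopping the timer
    elapsed = time.perf_counter() - start
    print(device, f"{elapsed:.1f}s", processor.decode(predictions[0]))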
🐛 Describe the bug
I tried running some experiments on the RX 5300M 4GB GPU and everything seems to work correctly. The problem is that performance is worse than on the CPU of the same Mac.
To reproduce, just clone the tests in this repo
https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks
and run the benchmark on either the CPU or the MPS device.
While the CPU took 143s, with the MPS backend the test completed in 228s. I'm sure the GPU was being used, because I constantly monitored the usage with Activity Monitor.
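Besides Activity Monitor, a quick programmatic sanity check (a generic sketch, not part of the linked benchmarks) is to confirm that tensors and parameters really report the mps device:

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(64, 64, device=device)
layer = torch.nn.Linear(64, 64).to(device)
print(x.device)                         # expect mps:0 when the backend is in use
print(next(layer.parameters()).device)  # model parameters should report mps:0 too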
Versions
PyTorch version: 1.13.0.dev20220524
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 12.3.1 (x86_64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A
Python version: 3.8.12 (default, Oct 12 2021, 06:23:56) [Clang 10.0.0 ] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.13.0.dev20220524
[pip3] torchaudio==0.12.0.dev20220524
[pip3] torchvision==0.13.0.dev20220524
[conda] numpy 1.22.4 pypi_0 pypi
[conda] torch 1.13.0.dev20220524 pypi_0 pypi
[conda] torchaudio 0.12.0.dev20220524 pypi_0 pypi
[conda] torchvision 0.13.0.dev20220524 pypi_0 pypi
cc @VitalyFedyunin @ngimel