
Performance with MPS on AMD GPUs is worse than CPU #78210

Open
lucadiliello opened this issue May 24, 2022 · 8 comments
Labels
module: mps Related to Apple Metal Performance Shaders framework module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


lucadiliello commented May 24, 2022

🐛 Describe the bug

I tried running some experiments on the RX 5300M 4 GB GPU and everything seems to work correctly. The problem is that performance is worse than on the CPU of the same Mac.

To reproduce, just clone the tests in this repo https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks and run either

python tests/transformers_sequence_classification.py --device mps --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 16

or

python tests/transformers_sequence_classification.py --device cpu --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 16

While the CPU took 143s, with the MPS backend the test completed in 228s. I'm sure the GPU was being used, because I constantly monitored the usage with Activity Monitor.
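For scale, those wall times work out to the following per-sequence throughput (plain arithmetic over the numbers above):

```python
# Throughput implied by the reported timings:
# 100 steps x batch size 16 = 1600 sequences total.
steps, batch_size = 100, 16
sequences = steps * batch_size

cpu_seconds, mps_seconds = 143.0, 228.0
cpu_throughput = sequences / cpu_seconds  # ~11.2 sequences/s
mps_throughput = sequences / mps_seconds  # ~7.0 sequences/s
print(f"cpu: {cpu_throughput:.1f} seq/s, mps: {mps_throughput:.1f} seq/s")
```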

Versions

PyTorch version: 1.13.0.dev20220524
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 12.3.1 (x86_64)
GCC version: Could not collect
Clang version: 13.1.6 (clang-1316.0.21.2.5)
CMake version: Could not collect
Libc version: N/A

Python version: 3.8.12 (default, Oct 12 2021, 06:23:56) [Clang 10.0.0 ] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] torch==1.13.0.dev20220524
[pip3] torchaudio==0.12.0.dev20220524
[pip3] torchvision==0.13.0.dev20220524
[conda] numpy 1.22.4 pypi_0 pypi
[conda] torch 1.13.0.dev20220524 pypi_0 pypi
[conda] torchaudio 0.12.0.dev20220524 pypi_0 pypi
[conda] torchvision 0.13.0.dev20220524 pypi_0 pypi

cc @VitalyFedyunin @ngimel

@malfet added the module: performance, module: mps, and triaged labels on May 24, 2022

dbl001 commented May 25, 2022

I ran your tests on an Intel iMac 27" 2020 with a 3.8 GHz 8-Core Intel Core i7 and an AMD Radeon Pro 5700 XT 16 GB.
Activity Monitor showed heavy GPU usage during the 'mps' test.

% python tests/transformers_sequence_classification.py --device mps --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 16
Downloading: 100%|██████████| 570/570 [00:00<00:00, 209kB/s]
Downloading: 100%|██████████| 213k/213k [00:00<00:00, 997kB/s]
Downloading: 100%|██████████| 436k/436k [00:00<00:00, 1.47MB/s]
Downloading: 100%|██████████| 29.0/29.0 [00:00<00:00, 26.6kB/s]
Downloading: 100%|██████████| 436M/436M [00:40<00:00, 10.7MB/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([16, 128])
INFO:root: * attention_mask: torch.Size([16, 128])
INFO:root: * labels: torch.Size([16])
Testing...: 100%|██████████| 100/100 [01:40<00:00,  1.00s/it]
INFO:root:Model bert-base-cased took 100.02 seconds to do 100 steps in inference with batch size 16 on mps.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % python tests/transformers_sequence_classification.py --device cpu --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 16
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([16, 128])
INFO:root: * attention_mask: torch.Size([16, 128])
INFO:root: * labels: torch.Size([16])
Testing...: 100%|██████████| 100/100 [01:38<00:00,  1.02it/s]
INFO:root:Model bert-base-cased took 98.11 seconds to do 100 steps in inference with batch size 16 on cpu.
 % python tests/transformers_sequence_classification.py --device mps --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 8
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([8, 128])
INFO:root: * attention_mask: torch.Size([8, 128])
INFO:root: * labels: torch.Size([8])
Testing...: 100%|██████████| 100/100 [00:51<00:00,  1.94it/s]
INFO:root:Model bert-base-cased took 51.48 seconds to do 100 steps in inference with batch size 8 on mps.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % 


xsacha commented May 25, 2022

My guess is that the 5300M is just a lot slower than the 5700 XT. In this particular case, the CPU might simply be faster than such a low-end GPU.
That doesn't mean the GPU is useless, since it still offloads work from the CPU.

@lucadiliello (Author)

My guess is that the 5300M is just a lot slower than the 5700 XT. In this particular case, the CPU might simply be faster than such a low-end GPU.
That doesn't mean the GPU is useless, since it still offloads work from the CPU.

That may make sense for the 5300M, but I do not see why the 5700 XT 16 GB is only matching the iMac's CPU.


xsacha commented May 26, 2022

Oh, I see.


dbl001 commented May 26, 2022

batch_size affects performance dramatically...

% python tests/transformers_sequence_classification.py --device mps --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 4
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([4, 128])
INFO:root: * attention_mask: torch.Size([4, 128])
INFO:root: * labels: torch.Size([4])
Testing...: 100%|██████████| 100/100 [00:28<00:00,  3.51it/s]
INFO:root:Model bert-base-cased took 28.49 seconds to do 100 steps in inference with batch size 4 on mps.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % python tests/transformers_sequence_classification.py --device mps --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 2
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([2, 128])
INFO:root: * attention_mask: torch.Size([2, 128])
INFO:root: * labels: torch.Size([2])
Testing...: 100%|██████████| 100/100 [00:16<00:00,  6.11it/s]
INFO:root:Model bert-base-cased took 16.39 seconds to do 100 steps in inference with batch size 2 on mps.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % python tests/transformers_sequence_classification.py --device mps --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 1
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([1, 128])
INFO:root: * attention_mask: torch.Size([1, 128])
INFO:root: * labels: torch.Size([1])
Testing...: 100%|██████████| 100/100 [00:09<00:00, 10.37it/s]
INFO:root:Model bert-base-cased took 9.67 seconds to do 100 steps in inference with batch size 1 on mps.
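Reading the MPS step times above against batch size (just arithmetic on the reported numbers) shows the time for 100 steps growing almost linearly with the batch size, i.e. batching buys almost nothing on this backend:

```python
# MPS wall times for 100 steps at each batch size, from the logs above.
mps_seconds = {1: 9.67, 2: 16.39, 4: 28.49, 8: 51.48, 16: 100.02}

baseline = mps_seconds[1]
for bs, secs in mps_seconds.items():
    # With effective batching the ratio would grow much slower than the
    # batch size; here it roughly tracks it (1.00, 1.69, 2.95, 5.32, 10.34).
    print(f"batch {bs:2d}: {secs:6.2f}s -> {secs / baseline:.2f}x the bs=1 time")
```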


xsacha commented May 26, 2022

The time almost scales linearly with batch size, almost as if it were processing each batch one sample at a time.

Edit: is the test just running batch × 100?


dbl001 commented May 26, 2022

CPU

 % python tests/transformers_sequence_classification.py --device cpu --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 1
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([1, 128])
INFO:root: * attention_mask: torch.Size([1, 128])
INFO:root: * labels: torch.Size([1])
Testing...: 100%|██████████| 100/100 [00:05<00:00, 16.86it/s]
INFO:root:Model bert-base-cased took 5.95 seconds to do 100 steps in inference with batch size 1 on cpu.
 % python tests/transformers_sequence_classification.py --device cpu --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 2
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([2, 128])
INFO:root: * attention_mask: torch.Size([2, 128])
INFO:root: * labels: torch.Size([2])
Testing...: 100%|██████████| 100/100 [00:10<00:00,  9.12it/s]
INFO:root:Model bert-base-cased took 10.98 seconds to do 100 steps in inference with batch size 2 on cpu.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % 
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % python tests/transformers_sequence_classification.py --device cpu --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 4
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([4, 128])
INFO:root: * attention_mask: torch.Size([4, 128])
INFO:root: * labels: torch.Size([4])
Testing...: 100%|██████████| 100/100 [00:22<00:00,  4.46it/s]
INFO:root:Model bert-base-cased took 22.45 seconds to do 100 steps in inference with batch size 4 on cpu.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % python tests/transformers_sequence_classification.py --device cpu --pre_trained_name bert-base-cased --mode inference --steps 100 --sequence_length 128 --batch_size 8
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
INFO:root:Input tensors size:
INFO:root: * input_ids: torch.Size([8, 128])
INFO:root: * attention_mask: torch.Size([8, 128])
INFO:root: * labels: torch.Size([8])
Testing...: 100%|██████████| 100/100 [00:43<00:00,  2.31it/s]
INFO:root:Model bert-base-cased took 43.39 seconds to do 100 steps in inference with batch size 8 on cpu.
(base) davidlaxer@x86_64-apple-darwin13 pytorch-apple-silicon-benchmarks % 
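Putting the CPU and MPS runs from this machine side by side (the batch-16 numbers come from the earlier comment; plain arithmetic otherwise): the CPU wins at every batch size, though the gap narrows as the batch grows:

```python
# 100-step wall times (seconds) reported in this thread for the 5700 XT iMac.
cpu_seconds = {1: 5.95, 2: 10.98, 4: 22.45, 8: 43.39, 16: 98.11}
mps_seconds = {1: 9.67, 2: 16.39, 4: 28.49, 8: 51.48, 16: 100.02}

for bs in cpu_seconds:
    # Ratio > 1.0 means MPS was slower than the CPU at this batch size.
    ratio = mps_seconds[bs] / cpu_seconds[bs]
    print(f"batch {bs:2d}: mps/cpu = {ratio:.2f}x")
```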

@james-brown-upfeat

Did this ever get any more attention?

I have a Mac with a 2.9 GHz 6-Core Intel Core i9 CPU and a Radeon Pro 560X 4 GB GPU. When I run google/pix2struct-ocrvqa-base on a random image, it takes 38.959s on the CPU, but it seems to run forever on the GPU.

The model is roughly 1 GB, so I think it should theoretically fit fully in VRAM.

I did confirm GPU usage sits near 100% the whole time for the 560X using the Activity Monitor GPU view.

I did not force the Mac to use the Intel UHD Graphics 630 1536 MB for my monitors, so the mouse got laggy while the model was running; maybe the window manager used up some VRAM?

Here's some sample code:

import torch
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

question = "What is shown in the image?"  # placeholder question
image = Image.open("example.png")         # placeholder input image
model_name = "google/pix2struct-ocrvqa-base"

processor = Pix2StructProcessor.from_pretrained(model_name)
inputs = processor(
    images=image,
    text=question,
    return_tensors="pt",
)
model = Pix2StructForConditionalGeneration.from_pretrained(model_name)

# Move the model and inputs to the MPS device when it is available
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    model = model.to("mps")
    inputs = inputs.to("mps")

predictions = model.generate(
    **inputs,
    max_length=256,
)
result = processor.decode(predictions[0])
print(result)

Also I'm very new at this stuff, so I wouldn't be surprised if I'm missing some significant settings I should be using. I just did what https://huggingface.co/google/pix2struct-ai2d-base said to do (the ocrvqa model says to follow those instructions).

I don't necessarily need a solution, I just want to provide another data point. Thanks!
