ResNet50 inference: fail to set device #148

Closed
mguo2021 opened this issue Aug 4, 2023 · 1 comment
mguo2021 commented Aug 4, 2023

I am trying to run ResNet50 PyTorch inference on our cluster. I launched a container on one compute node (which has 8 PVC cards) and ran quickstart/image_recognition/pytorch/resnet50v1_5/inference/gpu/inference_block_format.sh.

These are the parameters I set:

export DATASET_DIR=/mnt/daos/datasets_shared/imagenet
export OUTPUT_DIR=/mnt/daos/scratch_guob/resnet50/inference/logs
export BATCH_SIZE=1024
export NUM_ITERATIONS=2
export Tile=1

And this is the error I got:

resnet50 int8 inference block
oneccl_bindings_for_pytorch not available!
Use XPU: 0
=> using pre-trained model 'resnet50'
/nfs/home/guob/.local/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/nfs/home/guob/.local/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=ResNet50_Weights.IMAGENET1K_V1. You can also use weights=ResNet50_Weights.DEFAULT to get the most up-to-date weights.
  warnings.warn(msg)
Traceback (most recent call last):
  File "/applications.devops.montecristo.onboarding/example_resnet50_inference_pt/code/AImodels/models/image_recognition/pytorch/resnet50v1_5/inference/gpu/main.py", line 1116, in <module>
    main()
  File "/applications.devops.montecristo.onboarding/example_resnet50_inference_pt/code/AImodels/models/image_recognition/pytorch/resnet50v1_5/inference/gpu/main.py", line 276, in main
    main_worker(ngpus_per_node, args)
  File "/applications.devops.montecristo.onboarding/example_resnet50_inference_pt/code/AImodels/models/image_recognition/pytorch/resnet50v1_5/inference/gpu/main.py", line 416, in main_worker
    torch.xpu.set_device(args.xpu)
  File "/nfs/home/guob/.local/lib/python3.9/site-packages/intel_extension_for_pytorch/xpu/__init__.py", line 159, in set_device
    intel_extension_for_pytorch._C._setDevice(device)
AttributeError: module 'intel_extension_for_pytorch._C' has no attribute '_setDevice'
awk: cmd. line:1: fatal: division by zero attempted

Could someone tell me what's wrong and how to solve the problem? Thanks!
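
For anyone hitting a similar AttributeError, one way to narrow it down is to check directly whether the compiled module exposes the attribute named in the traceback. The sketch below is only a diagnostic aid, not part of the model scripts; the helper name `has_attr_in_module` is hypothetical, and the module and attribute names are copied from the traceback above.

```python
import importlib


def has_attr_in_module(module_name, attr):
    """Return True/False if the module imports, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # ImportError: package missing; OSError: a shared library failed to load.
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # Module and attribute names taken from the traceback in this issue.
    result = has_attr_in_module("intel_extension_for_pytorch._C", "_setDevice")
    if result is None:
        print("intel_extension_for_pytorch._C could not be imported")
    elif result:
        print("_setDevice is present")
    else:
        print("_setDevice is missing from the compiled extension")
```

If the attribute is reported missing, the Python wrapper and the compiled extension it calls into may not come from the same build, though that is an assumption and not something the log confirms.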

mguo2021 commented Aug 4, 2023

I fixed it by installing this version:
python -m pip install intel_extension_for_pytorch==1.10.200+gpu -f https://developer.intel.com/ipex-whl-stable-xpu
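
The thread does not say why pinning this version helps. A plausible reading, stated here as an assumption, is that the previously installed intel_extension_for_pytorch build did not match the rest of the environment. A minimal sketch for listing what is installed before and after the reinstall follows; the helper name `installed_versions` is hypothetical, and the package list is a guess at what is relevant.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed set of packages worth comparing; adjust to your environment.
    names = ["torch", "torchvision", "intel_extension_for_pytorch"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the container shows which builds Python will actually import, which matters here because the traceback points at packages under the user's ~/.local site-packages.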

mguo2021 closed this as completed Aug 4, 2023