RuntimeError: 0 <= device.index() && device.index() < static_cast<c10::DeviceIndex>(device_ready_queues_.size()) INTERNAL ASSERT FAILED at "/build/pytorch/torch/csrc/autograd/engine.cpp":1418 #571

SoldierWz · 2024-03-26T06:16:29Z

Describe the bug

When this problem occurred, I tried disabling a CPU core. Training then ran, but the results were very poor: accuracy dropped sharply and training took longer. I reported that in issue #565. When I re-enabled the CPU core, the error above occurred.
Here is the part of the code where the error occurs:
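import torch
import torch.nn as nn
import torch.optim as optim
import intel_extension_for_pytorch as ipex
from torch.utils.data import DataLoader

# X_tensor, y_tensor, kf, CustomDataset and MLP are defined earlier in the
# notebook and are not shown in this report.
device = "xpu"
for train_idx, test_idx in kf.split(X_tensor):
    X_train, X_test = X_tensor[train_idx], X_tensor[test_idx]
    y_train, y_test = y_tensor[train_idx], y_tensor[test_idx]

    train_dataset = CustomDataset(X_train, y_train)
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

    model = MLP(X_train.shape[1])
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    # Move the model to the XPU first, then let IPEX return the optimized model/optimizer pair.
    model = model.to(device)
    criterion = criterion.to(device)
    model, optimizer = ipex.optimize(model, optimizer=optimizer)

    for epoch in range(1000):
        model.train()
        for features, labels in train_loader:
            features, labels = features.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(features)
            loss = criterion(outputs, labels)
            loss.backward()  # the RuntimeError in the title is raised here
            optimizer.step()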

Versions

wget https://raw.githubusercontent.com/intel/intel-extension-for-pytorch/master/scripts/collect_env.py

For security purposes, please check the contents of collect_env.py before running it.

python collect_env.py


jgong5 · 2024-03-26T11:50:13Z

May I know what you mean by "disable CPU core"? It sounds like no GPU was found according to the error message. But we should report more meaningful error messages. cc @gujinghui

SoldierWz · 2024-03-27T08:26:38Z

> May I know what you mean by "disable CPU core"? It sounds like no GPU was found according to the error message. But we should report more meaningful error messages. cc @gujinghui

I edited the GRUB configuration file and changed
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
to
GRUB_CMDLINE_LINUX_DEFAULT="nohz=off"
I left the other line, GRUB_CMDLINE_LINUX="i915.enable_hangcheck=0", unchanged.
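Edits to the GRUB defaults file take effect only after the GRUB configuration is regenerated (`update-grub` on Ubuntu) and the machine is rebooted. The parameters the running kernel actually booted with can be read from `/proc/cmdline`. Below is a minimal sketch for confirming whether `nohz=off` and `i915.enable_hangcheck=0` are active. The function names are hypothetical, and the parser splits on whitespace only, so quoted values that contain spaces are not handled.

```python
from typing import Dict, Optional


def parse_kernel_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {parameter: value}; bare flags map to None."""
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():  # naive: does not handle quoted values with spaces
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_booted_params(path: str = "/proc/cmdline") -> Dict[str, Optional[str]]:
    """Read the command line the running Linux kernel was booted with."""
    with open(path, encoding="utf-8") as f:
        return parse_kernel_cmdline(f.read())
```

For example, `read_booted_params().get("nohz")` returns `"off"` only if the new GRUB setting is actually in effect.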
After this change the GPU can be used, but when I tried again just now, a new problem occurred. The error is below.

ImportError Traceback (most recent call last)
Cell In[2], line 8
6 import modin.pandas as pd
7 import numpy as np
----> 8 import torch
9 import intel_extension_for_pytorch as ipex
10 import torch.nn as nn

File ~/mambaforge/envs/pytorch-arc/lib/python3.11/site-packages/torch/__init__.py:235
233 if USE_GLOBAL_DEPS:
234 _load_global_deps()
--> 235 from torch._C import * # noqa: F403
237 # Appease the type checker; ordinarily this binding is inserted by the
238 # torch._C module initialization code in C
239 if TYPE_CHECKING:

ImportError: /home/wangzhen/mambaforge/envs/pytorch-arc/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent
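`undefined symbol: iJIT_NotifyEvent` means the dynamic loader opened `libtorch_cpu.so` but could not resolve that function in any library loaded alongside it. Below is a minimal `ctypes` sketch for checking whether a given shared library exports a symbol. This thread does not say which library in the environment is expected to provide `iJIT_NotifyEvent`, so the path to check is left to the reader. The helper name `exports_symbol` is hypothetical.

```python
import ctypes
from typing import Optional


def exports_symbol(library: Optional[str], symbol: str) -> bool:
    """Return True if the shared library at `library` exports `symbol`.

    Passing library=None inspects the running Python process itself (Linux dlopen(NULL)).
    """
    try:
        lib = ctypes.CDLL(library)
    except OSError:
        # The library could not be loaded (missing file or its own unresolved dependencies).
        return False
    # Attribute lookup on a CDLL resolves the name via dlsym; a missing symbol
    # raises AttributeError, which hasattr reports as False.
    return hasattr(lib, symbol)
```

For example, `exports_symbol(path, "iJIT_NotifyEvent")` can be called for each candidate library under the environment's `lib` directory. Note that loading a library this way runs its initializers.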

SoldierWz mentioned this issue on Mar 26, 2024 in pytorch/pytorch#122549, "Error using Intel PyTorch extension for Arc GPU on Ubuntu" (closed).

jgong5 assigned gujinghui Mar 26, 2024

ZhaoqiongZ added the ARC ARC GPU label Apr 24, 2024
