Description
Issue
On a Windows laptop with an Intel iGPU and a discrete NVIDIA GPU (NVIDIA Optimus), model outputs from DirectML on the Intel iGPU differ from the CPU results, while CPU and DirectML on the NVIDIA GPU match within 1e-5 tolerance.
Originally discovered using ONNX Runtime with DirectML (issue opened here: microsoft/onnxruntime#14214), but further testing shows that torch-directml is also affected, which indicates that the issue lies in the underlying DirectML backend.
Using Python 3.9, torch 1.13.1, and torch-directml 0.1.13.dev221216 installed from PyPI.
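For reference, the installed package versions can be confirmed from Python (a small sketch assuming the packages were installed with pip):
import torch
from importlib.metadata import version
print(torch.__version__)          # expected: 1.13.1
print(version("torch-directml"))  # expected: 0.1.13.dev221216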
System information:
Intel Core i7-11800H with Intel UHD Graphics (Tiger Lake GT1)
NVIDIA RTX 3060 Laptop GPU
The laptop uses NVIDIA Optimus to switch between the Intel iGPU and the discrete NVIDIA GPU
Platform: Windows
OS Version: Windows 11 21H2 Build 22000.1335
ONNX Runtime Version or Commit ID: 1.13.1
ONNX Runtime API: Python
Architecture: X64
Execution Provider: DirectML
Execution Provider Library Version: onnxruntime-directml 1.13.1
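Since the problem was first observed through ONNX Runtime's DirectML execution provider, it may also be worth confirming that the provider is registered in the installed build; a minimal check (assuming onnxruntime-directml 1.13.1 is installed) is:
import onnxruntime as rt
# 'DmlExecutionProvider' should appear in this list when onnxruntime-directml is installed
print(rt.get_available_providers())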
To Reproduce
Run the following code, updating the device IDs based on the ordering on your device:
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = "TRUE"
import torch
import torch.nn as nn
import torch.onnx
import torchvision
import onnxruntime as rt
import numpy as np
import matplotlib.pyplot as plt
import torch_directml
# Print devices
print([torch_directml.device_name(d_idx) for d_idx in range(torch_directml.device_count())])
# On my machine, NVIDIA GPU is device 0, Intel iGPU is devices 1 and 2 (not sure why it's showing up twice)
import skimage.data
from torchvision.io.image import read_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image
img = torch.from_numpy(skimage.data.cat().T)
## Run using CPU
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
model.to('cpu')
model.eval()
# Step 2: Initialize the inference transforms
preprocess = weights.transforms()
# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)
# Step 4: Use the model and visualize the prediction
with torch.no_grad():
    prediction_cpu = model(batch)["out"][0, 8]  # Class 8 is cat
## Run using NVIDIA GPU
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
device = torch_directml.device(0) # !! Update device number here
model.to(device)
model.eval()
# Step 2: Initialize the inference transforms
preprocess = weights.transforms()
# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)
# Step 4: Use the model and visualize the prediction
with torch.no_grad():
    prediction_nvidia_gpu = model(batch.to(device))["out"].to('cpu')[0, 8]
## Run using Intel iGPU
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
device = torch_directml.device(1) # !! Update device number here
model.to(device)
model.eval()
# Step 2: Initialize the inference transforms
preprocess = weights.transforms()
# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)
# Step 4: Use the model and visualize the prediction
with torch.no_grad():
    prediction_intel_igpu = model(batch.to(device))["out"].to('cpu')[0, 8]
# Print numerical comparisons
print(np.isclose(prediction_cpu.detach().numpy(), prediction_intel_igpu.detach().numpy(), atol=1e-4).all()) # Returns False, only returns True around atol=1e1
print(np.isclose(prediction_cpu.detach().numpy(), prediction_nvidia_gpu.detach().numpy(), atol=1e-5).all()) # Returns True
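# (Illustrative addition, not part of the original report) Print the maximum absolute
# difference to quantify how far each DirectML result drifts from the CPU reference
print(np.abs(prediction_cpu.detach().numpy() - prediction_intel_igpu.detach().numpy()).max())
print(np.abs(prediction_cpu.detach().numpy() - prediction_nvidia_gpu.detach().numpy()).max())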
# Display image results
plt.figure()
plt.subplot(1, 4, 1)
plt.imshow(np.transpose(img, [1, 2, 0]))
plt.subplot(1, 4, 2)
plt.imshow(np.squeeze(prediction_cpu.detach().numpy()))
plt.colorbar()
plt.subplot(1, 4, 3)
plt.imshow(np.squeeze(prediction_intel_igpu.detach().numpy()))
plt.colorbar()
plt.subplot(1, 4, 4)
plt.imshow(np.squeeze(prediction_nvidia_gpu.detach().numpy()))
plt.colorbar()
# Note that colorbar scales will be different
plt.show()
Other models have the same issue.
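To help narrow down where the divergence originates, a minimal single-operator comparison may be useful. The following is only a sketch, not part of the original report; it assumes the same torch_directml device ordering as above and that Conv2d is supported on all listed devices:
import torch
import torch_directml
import numpy as np

torch.manual_seed(0)
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
x = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    out_cpu = conv(x)
    for d_idx in range(torch_directml.device_count()):
        device = torch_directml.device(d_idx)
        out_dml = conv.to(device)(x.to(device)).to('cpu')
        # A large max-abs-diff on one device but not the others points at that backend
        print(d_idx, torch_directml.device_name(d_idx),
              np.abs((out_cpu - out_dml).numpy()).max())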