
[DirectML] Divergent results between CPU and DirectML only on Intel Integrated GPU #362

@apleynes

Issue

On a Windows laptop with an Intel iGPU and a discrete NVIDIA GPU (NVIDIA Optimus), results differ between CPU and DirectML running on the Intel iGPU. CPU and DirectML running on the NVIDIA GPU match within 1e-5 tolerance.

Originally discovered using ONNX Runtime's DirectML EP (issue opened here: microsoft/onnxruntime#14214), but further testing showed that torch-directml is affected as well, which indicates the issue is in the underlying DirectML backend.

Using Python 3.9, torch 1.13.1, and torch-directml 0.1.13.dev221216, installed from PyPI.

System information:
Intel Core i7-11800H with Intel UHD Graphics (Tiger Lake GT1)
NVIDIA RTX 3060 Laptop GPU
The laptop uses NVIDIA Optimus to switch between the Intel iGPU and the discrete NVIDIA GPU

Platform: Windows

OS Version: Windows 11 21H2 Build 22000.1335

ONNX Runtime Version or Commit ID
1.13.1

ONNX Runtime API
Python

Architecture
X64

Execution Provider
DirectML

Execution Provider Library Version
onnxruntime-directml 1.13.1

To Reproduce

Run the following code, updating the device indices to match the adapter ordering on your machine:

import os
# Allow duplicate OpenMP runtimes; works around a common libiomp5 conflict on Windows
os.environ['KMP_DUPLICATE_LIB_OK'] = "TRUE"
import torch
import torch.nn as nn
import torch.onnx
import torchvision
import onnxruntime as rt
import numpy as np
import matplotlib.pyplot as plt
import torch_directml

# Print devices
print([torch_directml.device_name(d_idx) for d_idx in range(torch_directml.device_count())])
# On my machine, NVIDIA GPU is device 0, Intel iGPU is devices 1 and 2 (not sure why it's showing up twice)

import skimage.data
from torchvision.io.image import read_image
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from torchvision.transforms.functional import to_pil_image

# skimage.data.cat() is an HxWx3 uint8 array; .T reverses the axes to
# channels-first (3xWxH). The H/W swap does not matter for this comparison.
img = torch.from_numpy(skimage.data.cat().T)

## Run using CPU
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
model.to('cpu')
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and visualize the prediction
with torch.no_grad():
    prediction_cpu = model(batch)["out"][0, 8]  # Class 8 is cat


## Run using NVIDIA GPU
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
device = torch_directml.device(0)  # !! Update device number here
model.to(device)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and visualize the prediction
with torch.no_grad():
    prediction_nvidia_gpu = model(batch.to(device))["out"].to('cpu')[0, 8]

## Run using Intel iGPU
# Step 1: Initialize model with the best available weights
weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights)
device = torch_directml.device(1)  # !! Update device number here
model.to(device)
model.eval()

# Step 2: Initialize the inference transforms
preprocess = weights.transforms()

# Step 3: Apply inference preprocessing transforms
batch = preprocess(img).unsqueeze(0)

# Step 4: Use the model and visualize the prediction
with torch.no_grad():
    prediction_intel_igpu = model(batch.to(device))["out"].to('cpu')[0, 8]


# Print numerical comparisons
print(np.isclose(prediction_cpu.detach().numpy(), prediction_intel_igpu.detach().numpy(), atol=1e-4).all())  # Returns False, only returns True around atol=1e1
print(np.isclose(prediction_cpu.detach().numpy(), prediction_nvidia_gpu.detach().numpy(), atol=1e-5).all())  # Returns True

# Display image results
plt.figure()
plt.subplot(1, 4, 1)
plt.imshow(np.transpose(img, [1, 2, 0]))
plt.subplot(1, 4, 2)
plt.imshow(np.squeeze(prediction_cpu.detach().numpy()))
plt.colorbar()
plt.subplot(1, 4, 3)
plt.imshow(np.squeeze(prediction_intel_igpu.detach().numpy()))
plt.colorbar()
plt.subplot(1, 4, 4)
plt.imshow(np.squeeze(prediction_nvidia_gpu.detach().numpy()))
plt.colorbar()
# Note that colorbar scales will be different
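
To quantify the divergence beyond a pass/fail tolerance check, the worst-case elementwise difference against the CPU reference can also be printed (a small sketch using the prediction tensors from the script above):

# Worst-case elementwise divergence against the CPU reference.
pred_cpu = prediction_cpu.detach().numpy()
print("max |CPU - Intel iGPU|:", np.abs(pred_cpu - prediction_intel_igpu.detach().numpy()).max())
print("max |CPU - NVIDIA GPU|:", np.abs(pred_cpu - prediction_nvidia_gpu.detach().numpy()).max())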

Other models exhibit the same divergence on the Intel iGPU.
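
For completeness, the ONNX Runtime path from the original report can be checked the same way. A minimal sketch, assuming onnxruntime-directml is installed and the script above has been run; the export path, the opset version, the output indexing (the exporter flattens the model's dict output, so index 0 is assumed to be "out"), and the device_id value for the DML execution provider are assumptions to adjust for your machine:

# Export the CPU model and compare ONNX Runtime's CPU and DirectML EPs.
onnx_path = "fcn_resnet50.onnx"  # hypothetical export path
torch.onnx.export(model.to('cpu'), batch, onnx_path,
                  input_names=["input"], opset_version=13)

so = rt.SessionOptions()
so.enable_mem_pattern = False  # recommended when using the DML EP

sess_cpu = rt.InferenceSession(onnx_path, so, providers=["CPUExecutionProvider"])
sess_dml = rt.InferenceSession(onnx_path, so,
                               providers=[("DmlExecutionProvider", {"device_id": 1})])  # iGPU index on my machine

out_cpu = sess_cpu.run(None, {"input": batch.numpy()})[0]
out_dml = sess_dml.run(None, {"input": batch.numpy()})[0]
print(np.isclose(out_cpu, out_dml, atol=1e-4).all())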

Labels

hardware (issue is likely hardware-specific or driver version related)
onnx-runtime (issues that affect ONNX Runtime's DML EP)
pytorch-directml (issues in PyTorch when using its DirectML backend)