# Using DirectML to run deep learning workloads on the integrated GPU of a Windows laptop

## Overview
DirectML (Direct Machine Learning) is a hardware-accelerated machine learning API from Microsoft, built on top of DirectX 12. It enables efficient ML inferencing on GPUs and other accelerators across Windows devices.

By leveraging DirectML, you can utilize integrated GPUs on standard laptops for deep learning tasks, which is beneficial for setting up baseline models. However, this approach has limitations and may not fully replace dedicated GPUs or cloud-based solutions. For long-term use cases, transitioning to these more robust methods is recommended.

DirectML supports popular frameworks such as TensorFlow and PyTorch. For PyTorch, the torch-directml package enables GPU acceleration via DirectML. Similarly, the tensorflow-directml-plugin package allows TensorFlow to train and run inference on any Windows device with a DirectX 12-capable GPU.

Additionally, community projects have explored running YOLOv8 on DirectML, though these implementations may still face some constraints and require further modifications. 

## Limitations
TensorFlow and PyTorch support DirectML only partially, so some operators and data types may still be unsupported.

The current support status is tracked at the following links:   
[PyTorch](https://github.com/microsoft/DirectML/wiki/PyTorch-DirectML-Operator-Roadmap)   
[TensorFlow](https://github.com/microsoft/tensorflow-directml-plugin)   
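
One way to cope with an unsupported operator is to retry it on the CPU. The helper below is only a sketch: `run_with_cpu_fallback` is a hypothetical name, and it assumes that an unsupported operation surfaces as a `RuntimeError` or `NotImplementedError`, which may not hold for every operator or package version (some versions may instead fall back on their own and only emit a warning).

```python
def run_with_cpu_fallback(op, *tensors):
    """Call op(*tensors); if the backend rejects it, retry on CPU copies.

    Hypothetical helper: `op` is any callable taking tensors, and each
    tensor is expected to offer a `.to(device)` method, as torch.Tensor does.
    """
    try:
        return op(*tensors)
    except (RuntimeError, NotImplementedError) as err:
        # Assumption: an unsupported operator raises one of these errors
        print(f"Falling back to CPU: {err}")
        cpu_tensors = [t.to("cpu") for t in tensors]
        return op(*cpu_tensors)
```

The result of a fallback call lives on the CPU, so it has to be moved back to the DirectML device before it is combined with tensors that stayed there.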

I'm aware that this topic has been widely discussed, and this summary is intended to consolidate key points and help the community save time and resources.

## PyTorch
For PyTorch, install the torch-directml package (imported as torch_directml). Then create a DirectML device and move the model and each batch to it, as in the template below.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import torch_directml

In [2]:
# Define a simple neural network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

In [3]:
# Load the MNIST dataset
transform = transforms.Compose([transforms.ToTensor()])
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./data\MNIST\raw\train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:11<00:00, 853121.91it/s] 


Extracting ./data\MNIST\raw\train-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./data\MNIST\raw\train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 95845.89it/s]


Extracting ./data\MNIST\raw\train-labels-idx1-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./data\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:01<00:00, 936383.22it/s] 


Extracting ./data\MNIST\raw\t10k-images-idx3-ubyte.gz to ./data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 404: Not Found

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 1845981.47it/s]

Extracting ./data\MNIST\raw\t10k-labels-idx1-ubyte.gz to ./data\MNIST\raw






In [4]:
# Create a DirectML device
dml = torch_directml.device()
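
If the notebook also has to run on machines without DirectML, the device can be chosen defensively. The following is a minimal sketch, not part of the original template: `pick_device` is a hypothetical helper, and it assumes the installed torch_directml version exposes `is_available()`, `device_count()` and `device_name()`.

```python
def pick_device():
    """Return (device, description), preferring DirectML and falling back to CPU."""
    try:
        import torch_directml  # only importable if torch-directml is installed
    except ImportError:
        return "cpu", "torch_directml is not installed; using CPU"

    if not torch_directml.is_available():
        return "cpu", "no DirectML adapter found; using CPU"

    # List every adapter DirectML can see (integrated and discrete GPUs)
    for index in range(torch_directml.device_count()):
        print(f"adapter {index}: {torch_directml.device_name(index)}")

    # device() with no argument returns the default adapter
    return torch_directml.device(), "using the default DirectML adapter"


device, description = pick_device()
print(description)
```

The returned value can be passed to `.to(...)` in place of `dml`, for example `model = SimpleNN().to(device)`.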

In [5]:
# Initialize the model, loss function, and optimizer
model = SimpleNN()
model = model.to(dml)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

In [6]:
# Training loop
for epoch in range(5):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Move data and target to the DirectML device
        data, target = data.to(dml), target.to(dml)

        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        if batch_idx % 100 == 0:
            print(f'Epoch [{epoch+1}/5], Step [{batch_idx+1}/{len(train_loader)}], Loss: {loss.item():.4f}')

print('Training completed.')


Epoch [1/5], Step [1/938], Loss: 2.3056
Epoch [1/5], Step [101/938], Loss: 0.6486
Epoch [1/5], Step [201/938], Loss: 0.5244
Epoch [1/5], Step [301/938], Loss: 0.3761
Epoch [1/5], Step [401/938], Loss: 0.3318
Epoch [1/5], Step [501/938], Loss: 0.4377
Epoch [1/5], Step [601/938], Loss: 0.2180
Epoch [1/5], Step [701/938], Loss: 0.2488
Epoch [1/5], Step [801/938], Loss: 0.2940
Epoch [1/5], Step [901/938], Loss: 0.5269
Epoch [2/5], Step [1/938], Loss: 0.1859
Epoch [2/5], Step [101/938], Loss: 0.2807
Epoch [2/5], Step [201/938], Loss: 0.2691
Epoch [2/5], Step [301/938], Loss: 0.2416
Epoch [2/5], Step [401/938], Loss: 0.2049
Epoch [2/5], Step [501/938], Loss: 0.0756
Epoch [2/5], Step [601/938], Loss: 0.3464
Epoch [2/5], Step [701/938], Loss: 0.0964
Epoch [2/5], Step [801/938], Loss: 0.1446
Epoch [2/5], Step [901/938], Loss: 0.3289
Epoch [3/5], Step [1/938], Loss: 0.2025
Epoch [3/5], Step [101/938], Loss: 0.0832
Epoch [3/5], Step [201/938], Loss: 0.1234
Epoch [3/5], Step [301/938], Loss: 0.187
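
To check the trained model, an evaluation pass can reuse the same device. This is a sketch under assumptions: `evaluate` is a hypothetical helper, and it copies the logits back to the CPU before calling `argmax`, so that only the forward pass depends on DirectML operator support.

```python
import torch


def evaluate(model, loader, device):
    """Return the classification accuracy of `model` over `loader`."""
    model.eval()  # switch off training-only behaviour such as dropout
    correct = 0
    total = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for data, target in loader:
            # Forward pass on the chosen device, then copy the logits to the CPU
            logits = model(data.to(device)).cpu()
            predictions = logits.argmax(dim=1)
            correct += (predictions == target).sum().item()
            total += target.size(0)
    return correct / total if total else 0.0
```

With a test loader built the same way as `train_loader` but with `train=False` (called `test_loader` here for illustration), the call would be `evaluate(model, test_loader, dml)`.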

## TensorFlow
For TensorFlow, install the tensorflow-directml-plugin package. The remaining steps are the same as in a standard TensorFlow workflow.

In [8]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers, losses
from tensorflow.keras.datasets import mnist

In [9]:
# List all devices TensorFlow can see.
# With tensorflow-directml-plugin installed, the DirectML adapter is exposed
# as a regular GPU device (e.g. '/physical_device:GPU:0'), so its name does
# not contain the string "DirectML".
physical_devices = tf.config.list_physical_devices()

# Check that at least one GPU device was registered
gpus = [device for device in physical_devices if device.device_type == 'GPU']
if not gpus:
    print('No GPU device found; TensorFlow will run on the CPU.')


In [10]:
physical_devices

[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
 PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

In [11]:
# Load and preprocess the MNIST dataset
(train_images, train_labels), _ = mnist.load_data()
train_images = train_images.reshape((60000, 28 * 28)) / 255.0
train_labels = tf.keras.utils.to_categorical(train_labels)

# Define a simple neural network model
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(28 * 28,)),
    layers.Dense(10)
])

# Compile the model
model.compile(optimizer=optimizers.SGD(learning_rate=0.1),
              loss=losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Training loop
model.fit(train_images, train_labels, epochs=5, batch_size=64, verbose=1)

print('Training completed.')

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Training completed.
