# Implementing DoRA: Weight-Decomposed Low-Rank Adaptation

In this notebook we implement [Weight-Decomposed Low-Rank Adaptation (DoRA)](https://arxiv.org/abs/2402.09353), a parameter-efficient fine-tuning technique that its authors report consistently outperforms LoRA on the fine-tuning tasks they evaluate.

To follow along conceptually, you can refer to the [writeup](https://medium.com/p/f814ba519af4/edit) where we go over the theoretical concepts and the motivation behind LoRA and DoRA.
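
As a quick orientation before the setup code, the sketch below illustrates what "weight-decomposed" means: a weight matrix is split into a per-column magnitude and a unit-norm direction, and the LoRA-style low-rank update is applied to the direction only. The shapes, the rank `r`, and the names `W0`, `m`, `A`, and `B` are illustrative assumptions, not values taken from this notebook.

```python
import torch

torch.manual_seed(0)

# Hypothetical pretrained weight W0 with shape (d, k) = (4, 3).
W0 = torch.randn(4, 3)

# Magnitude: the L2 norm of each column, shape (1, k).
m = W0.norm(p=2, dim=0, keepdim=True)

# Direction: each column rescaled to unit length.
direction = W0 / m

# Multiplying the two components recovers the original weight.
W_rebuilt = m * direction
print(torch.allclose(W_rebuilt, W0))  # True

# A low-rank update B @ A (rank r = 2, assumed) is added to the direction term.
r = 2
B = torch.zeros(4, r)         # B starts at zero, so the update starts at zero
A = torch.randn(r, 3) * 0.01
V = W0 + B @ A

# Renormalise the updated direction column-wise, then rescale by the magnitude.
W_adapted = m * (V / V.norm(p=2, dim=0, keepdim=True))
print(torch.allclose(W_adapted, W0))  # True, because B is still zero
```

During fine-tuning, `m`, `A`, and `B` would be the trainable parameters while `W0` stays frozen.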

## Importing libraries

In [None]:
import time
import numpy as np
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
import torch.nn.functional as F
import torch.nn as nn
import torch


# Restrict cuDNN to deterministic algorithms so GPU runs are more repeatable.
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
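
The `cudnn.deterministic` flag only constrains which algorithms cuDNN uses; it does not seed any random number generator. A minimal sketch of seeding, assuming a hypothetical `set_seed` helper and an arbitrary `RANDOM_SEED` value (neither appears in the original notebook), could look like this:

```python
import random

import numpy as np
import torch

RANDOM_SEED = 123  # hypothetical value; any fixed integer works


def set_seed(seed: int) -> None:
    """Seed the Python, NumPy, and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


# Re-seeding with the same value reproduces the same random draw.
set_seed(RANDOM_SEED)
first = torch.rand(3)
set_seed(RANDOM_SEED)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```

Calling such a helper once before building the data loaders and the model would make weight initialisation and batch shuffling repeatable from run to run on the same setup.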

## Settings and dataset

In [None]:
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
BATCH_SIZE = 64

train_dataset = datasets.MNIST(root='data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data/',
                              train=False,
                              transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=BATCH_SIZE,
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=BATCH_SIZE,
                         shuffle=False)

for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break  # Inspect only the first batch; the final batch may be smaller

Image batch dimensions: torch.Size([64, 1, 28, 28])
Image label dimensions: torch.Size([64])
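
The printed shapes follow PyTorch's `(batch, channels, height, width)` layout: 64 grayscale images of 28×28 pixels, with one integer label per image. If the images are later fed to fully connected layers (an assumption here, since the model is not defined in this section), each one would first be flattened into a vector of 784 values. The sketch below uses a random stand-in batch rather than the real loader, so it runs without downloading MNIST:

```python
import torch

# Stand-in for one batch from train_loader (random values, same shape).
images = torch.rand(64, 1, 28, 28)

# Keep the batch dimension and merge channels, height, and width.
flat = images.flatten(start_dim=1)

print(flat.shape)  # torch.Size([64, 784])
```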
