Accuracy drops during extended training. #8

Closed
Sahel13 opened this issue Oct 15, 2021 · 6 comments

@Sahel13

Sahel13 commented Oct 15, 2021

Hi,

I've built the following fully connected quaternion network using the layers provided.

  class QLeNet_300_100(nn.Module):
      def __init__(self):
          super().__init__()
          self.fc1 = layers.QLinear(196, 75)
          self.fc2 = layers.QLinear(75, 25)
          self.fc3 = layers.QLinear(25, 10)
          self.abs = layers.QuaternionToReal(10)
  
      def forward(self, x):
          x = torch.flatten(x, 1)
          x = F.relu(self.fc1(x))
          x = F.relu(self.fc2(x))
          x = self.abs(self.fc3(x))
          return x

When training the model for an extended duration on the MNIST dataset, the accuracy suddenly drops to nearly 10%, which is chance level for the 10 MNIST classes and what we would expect from an untrained model, and it doesn't improve any further. An image of the accuracy values as training progresses is attached.

[Image: accuracy]

The same issue also occurs when using the methods in Parcollet's original repo. I would appreciate some insight into why this might be happening. If you need additional info, I can provide the code to recreate this issue.

Thanks,
Sahel

@giorgiozannini
Contributor

Thanks a lot for the interest in our work!
Yes, if you could kindly provide the full code, we will look into it ASAP!

@Sahel13
Author

Sahel13 commented Oct 16, 2021

Hi,

Thanks for the quick reply. I'm attaching the file.

Sahel13 closed this as completed Oct 16, 2021

@Sahel13
Author

Sahel13 commented Oct 16, 2021

I closed the issue by mistake. Here's the code. You might need to change the 'data_directory' variable.

import os
import torch
import torch.nn as nn
import torchvision
from htorch import layers
import torchvision.transforms as transforms
import torch.nn.functional as F


"""
Parameters.
"""
batch_size = 128
learning_rate = 0.1
num_epochs = 40
data_directory = os.path.join('data', 'mnist')

use_gpu = True
# Fall back to the CPU when CUDA is not available.
device = torch.device("cuda:0" if use_gpu and torch.cuda.is_available() else "cpu")

"""
Get the data.
"""
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.1307], std=[0.3081])
])

trainset = torchvision.datasets.MNIST(
    root=data_directory,
    train=True,
    download=True,
    transform=transform
)
testset = torchvision.datasets.MNIST(
    root=data_directory,
    train=False,
    download=True,
    transform=transform
)

trainloader = torch.utils.data.DataLoader(
    trainset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=2
)
testloader = torch.utils.data.DataLoader(
    testset,
    batch_size=batch_size,
    shuffle=False,  # shuffling is not needed for evaluation
    num_workers=2
)


"""
Define the model.
"""
class LeNet_300_100(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = layers.QLinear(196, 75)
        self.fc2 = layers.QLinear(75, 25)
        self.fc3 = layers.QLinear(25, 10)
        self.abs = layers.QuaternionToReal(10)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.abs(self.fc3(x))
        return x


model = LeNet_300_100()
model.to(device)


"""
Train and test.
"""
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)


def test_model(model, testloader, device):
    correct = 0
    total = 0

    with torch.no_grad():
        for data in testloader:
            images, labels = data[0].to(device), data[1].to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    return accuracy


for epoch in range(num_epochs):
    if epoch == 0:
        accuracy = test_model(model, testloader, device)
        print("ep  {:03d}  loss  {:.3f}  acc  {:.3f}%".format(
            epoch, 0.0, accuracy))

    epoch_loss = 0.0
    for i, data in enumerate(trainloader, 0):

        images, labels = data[0].to(device), data[1].to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    # Test accuracy at the end of each epoch
    accuracy = test_model(model, testloader, device)

    print("ep  {:03d}  loss  {:.3f}  acc  {:.3f}%".format(
        epoch + 1, epoch_loss / len(trainloader), accuracy))

print("\nTraining complete.\n")
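
A training loop like the one above could stop early once the loss blows up, rather than running the remaining epochs at chance accuracy. The helper below is a minimal sketch in plain Python. The names `loss_diverged` and `blowup_factor` are hypothetical, and so is the assumption that the collapse shows up as a non-finite or sharply rising epoch loss; none of this is confirmed in the thread.

```python
import math


def loss_diverged(epoch_losses, blowup_factor=10.0):
    """Return True if the latest epoch loss looks like a divergence.

    Assumes non-negative losses (e.g. cross-entropy). The latest loss counts
    as diverged if it is NaN/inf, or if it exceeds `blowup_factor` times the
    smallest finite loss seen in earlier epochs.
    """
    if not epoch_losses:
        return False

    latest = epoch_losses[-1]
    if not math.isfinite(latest):
        return True

    # Only compare against earlier epochs that produced a finite loss.
    earlier = [loss for loss in epoch_losses[:-1] if math.isfinite(loss)]
    if not earlier:
        return False

    return latest > blowup_factor * min(earlier)
```

One possible use is to append `epoch_loss / len(trainloader)` to a list at the end of each epoch and `break` out of the epoch loop when `loss_diverged(...)` returns True.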

Sahel13 reopened this Oct 16, 2021

@giorgiozannini
Contributor

This looks like a learning rate problem! I tried running your code with a learning rate of 0.05 and had no problems.
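
As a toy illustration of why halving the learning rate can separate stable training from collapse, here is a sketch in plain Python. It is not a diagnosis of the network above. It applies SGD with momentum 0.9 (the basic `velocity = momentum * velocity + grad` update, with no dampening or Nesterov) to the one-dimensional quadratic loss `0.5 * curvature * x**2`. The function name `run_sgd_momentum` and the `curvature` value of 40 are assumptions made for illustration, chosen so that the stability limit for this update, `lr * curvature < 2 * (1 + momentum) = 3.8`, falls between the two learning rates mentioned in this thread.

```python
def run_sgd_momentum(lr, momentum=0.9, curvature=40.0, steps=200, x0=1.0):
    """Minimise 0.5 * curvature * x**2 with SGD plus momentum; return the final x."""
    x = x0
    velocity = 0.0
    for _ in range(steps):
        grad = curvature * x  # derivative of 0.5 * curvature * x**2
        velocity = momentum * velocity + grad
        x = x - lr * velocity
    return x


if __name__ == "__main__":
    # lr * curvature is 4.0 for lr=0.1 (above the 3.8 limit) and 2.0 for lr=0.05.
    for lr in (0.1, 0.05):
        final_x = run_sgd_momentum(lr)
        print("lr={:.2f}  final |x| = {:.3e}".format(lr, abs(final_x)))
```

In this toy setting the iterate grows without bound at a learning rate of 0.1 and shrinks towards zero at 0.05. A real network has many directions with different curvature, so the actual threshold for the model in this issue is unknown.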

@Sahel13
Author

Sahel13 commented Oct 16, 2021

Oh, okay. I was using the same parameters for a comparable real-valued network and that worked fine, so that may be why I missed this. Thank you.

On a side note, have you used your methods to try to construct a quaternion model that performs better than a real-valued counterpart at a classification task, such as the one in Gaudet's paper?

@giorgiozannini
Contributor

We did! Actually, that's what we are working on right now. As soon as we find a configuration with a noticeable improvement over real-valued NNs, we will update the repo.
