
How to use l2l.vision.models.ResNet12? #389

Closed

Jeong-Bin opened this issue Apr 1, 2023 · 4 comments

Jeong-Bin commented Apr 1, 2023

Hi, I'm using l2l to create a large MAML model.
However, I have a question about the usage of l2l.vision.models.ResNet12 or WRN28.
I tried the following three methods.

# Method 1
import torch
from torch import nn, optim
import learn2learn as l2l

class Lambda(nn.Module):
    """Wraps an arbitrary function as an nn.Module."""
    def __init__(self, func):
        super().__init__()
        self.func = func
    def forward(self, x):
        return self.func(x)

features = l2l.vision.models.ResNet12(output_size=256)
features = torch.nn.Sequential(features, Lambda(lambda x: x.view(-1, 84)))
features.to(device)
head = torch.nn.Linear(84, ways)
head = l2l.algorithms.MAML(head, lr=fast_lr)
head.to(device)
all_parameters = list(features.parameters()) + list(head.parameters())
optimizer = optim.Adam(all_parameters, meta_lr)

In Method 1, I got RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of size: : [5].
Also, when I modified the code to lambda x: x.view(-1, 256) and torch.nn.Linear(256, ways), I got RuntimeError: mat1 and mat2 shapes cannot be multiplied (1260x84 and 256x5).


# Method 2
features = l2l.vision.models.ResNet12(output_size=256)
features = torch.nn.Sequential(features, Lambda(lambda x: x.view(-1, 256)))
features.to(device)
head = l2l.vision.models.MiniImagenetCNN(ways)
head = l2l.algorithms.MAML(head, lr=fast_lr)
head.to(device)
all_parameters = list(features.parameters()) + list(head.parameters())
optimizer = optim.Adam(all_parameters, meta_lr)

Method 2 worked well, but its test accuracy was lower than that of the basic MAML model.
I used the following code for the basic MAML:

model = l2l.vision.models.MiniImagenetCNN(output_size=ways)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)
optimizer = optim.Adam(maml.parameters(), meta_lr)

# Method 3
model = l2l.vision.models.ResNet12(output_size=ways)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)
optimizer = optim.Adam(maml.parameters(), meta_lr)

Method 3 worked well during training, but I encountered an OutOfMemoryError during testing.
(The training was also very slow.)

What is the right way to do this, and what should I modify?
Or is there another way to build a large MAML model?

I set the training and testing configurations as follows:

# train setting
ways = 5
shot = 1
adaptation_steps = 5
batch_size = 4
meta_lr = 1e-3
fast_lr = 0.01

# test setting
ways = 5
shot = 15
adaptation_steps = 10
batch_size = 4
fast_lr = 0.01
seba-1511 (Member) commented

Hello @Jeong-Bin,

Method 3 is correct. Try using maml.clone(first_order=True) when testing. Or, you can reduce the number of adaptation steps (at the price of performance).
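
Something like this sketch should work at test time (it assumes loss_fn and the adaptation_data / adaptation_labels / evaluation_data / evaluation_labels task split from your script; those names are placeholders):

# Sketch: first-order adaptation at test time.
learner = maml.clone(first_order=True)  # no second-order graph, so much less memory
for step in range(adaptation_steps):
    adaptation_error = loss_fn(learner(adaptation_data), adaptation_labels)
    learner.adapt(adaptation_error)     # plain first-order gradient update
evaluation_error = loss_fn(learner(evaluation_data), evaluation_labels)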

How much GPU memory do you have? If you have more than 1 GPU, you can use model.features = torch.nn.DataParallel(model.features) to distribute the activations on the GPUs.
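
For example (a sketch, assuming two or more visible GPUs):

# Sketch: replicate the feature extractor so its activations are
# split across the available GPUs.
model = l2l.vision.models.ResNet12(output_size=ways)
model.features = torch.nn.DataParallel(model.features)
model.to(device)
maml = l2l.algorithms.MAML(model, lr=fast_lr, first_order=False)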

Jeong-Bin (Author) commented Apr 2, 2023

@seba-1511
Thanks! My GPU is an RTX 3090 Ti with 20GB of memory.
I'll try your solution.

Additionally, I have a question about 'adaptation steps' in the MAML paper.
In Section A.1 (Classification) of the paper, the authors use 10 gradient steps at test time.
Does 'gradient step' mean the same thing as 'adaptation step'?

seba-1511 (Member) commented

Yes, gradient steps are adaptation steps.
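
In code, every call to adapt below is one gradient step (a sketch, reusing the same placeholder names as above):

# Sketch: `adaptation_steps` calls to adapt == that many gradient steps.
learner = maml.clone()
for step in range(adaptation_steps):  # e.g. 10 at test time, as in the paper
    error = loss_fn(learner(adaptation_data), adaptation_labels)
    learner.adapt(error)  # one gradient step on the cloned parameters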

Jeong-Bin (Author) commented

All right, thank you for your help! Have a nice day😊
