Question about the MMN network design #24

Closed
laubeke opened this issue Mar 23, 2021 · 3 comments

Comments

laubeke commented Mar 23, 2021

Hi, thanks for the nice work and code!

The MMN is a single fully-connected layer, defined and used in
https://github.com/microsoft/Cream/blob/main/lib/models/structures/supernet.py#L92
https://github.com/microsoft/Cream/blob/main/lib/models/structures/supernet.py#L127
and then used by the board in
https://github.com/microsoft/Cream/blob/main/lib/models/PrioritizedBoard.py#L32

I wonder why the MMN size depends on the slice size (i.e. the number of images) and why it concatenates the images' class probabilities, instead of using the same weights for each image?

Collaborator

Z7zuqer commented Mar 25, 2021

Hi,

Thanks for your interest in our project!

MMN aims to compute the matching degree between student and teacher subnets. A single image is not enough to reflect that matching degree, so we use a batch of several images instead. That is why the size of the MMN depends on the slice size.

Best,
Hao.

Author

laubeke commented Mar 25, 2021

Hi,

thanks for the quick response!
I'm not sure if I was understood correctly, so let me be more precise:

n = num classes
s = slice size

the MMN takes an [s, n] shaped tensor, reshapes it to [s * n], and then uses a fully connected layer to get an output of size [1]
the alternative is to not reshape (keep [s, n]), but apply the same fully connected layer to each image to get an [s, 1] sized output, which can then be averaged to get the [1] sized output

in the second case, neither slice size nor the order of the images matters, and it is also not limited to a single image.
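
The two designs described above can be sketched as follows. This is a minimal illustration in plain Python rather than the PyTorch code from the repository; the function names `flattened_head` and `shared_head`, the example input `x`, and all weight values are hypothetical and chosen only to make the difference visible.

```python
def flattened_head(x, weights, bias=0.0):
    """Design 1: flatten [s, n] to [s * n], one weight per (image, class) pair."""
    flat = [v for row in x for v in row]
    if len(flat) != len(weights):
        raise ValueError("weight count is tied to the slice size s")
    return sum(w * v for w, v in zip(weights, flat)) + bias


def shared_head(x, weights, bias=0.0):
    """Design 2: apply the same n weights to every image, then average."""
    scores = [sum(w * v for w, v in zip(weights, row)) + bias for row in x]
    return sum(scores) / len(scores)


# Hypothetical input with s = 4 images and n = 3 classes.
x = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.25, 0.50, 0.25],
]
# Arbitrary illustrative weights: s * n for design 1, n for design 2.
w_flat = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4, 0.2, 0.2, -0.1, -0.3, 0.6, 0.1]
w_shared = [0.4, -0.3, 0.2]

reordered = x[::-1]  # same images, reversed order

# Design 1 is sensitive to the image order, design 2 is not.
flat_changes = abs(flattened_head(x, w_flat) - flattened_head(reordered, w_flat)) > 1e-6
shared_changes = abs(shared_head(x, w_shared) - shared_head(reordered, w_shared)) > 1e-9
print(flat_changes, shared_changes)

# Design 2 also accepts a different slice size without new weights.
print(shared_head(x[:2], w_shared))
```

With design 1, passing a slice of a different size raises an error because the weight count no longer matches, which is the dependence on the slice size raised in the question.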

Best,
Kevin

Collaborator

Z7zuqer commented Mar 26, 2021

Hi,

Thanks for your interest in our project!

We designed the MMN to predict the matching degree of two subnets over mini-batch data. The second case would also work. However, in the second case the weight tensor of the MMN is only Nx1, which is theoretically worse than (SxN)x1. Besides, the batch size used to update the MMN is fixed during supernet training, so we chose the first case.
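
One way to read the Nx1 versus (SxN)x1 comparison is that the shared-weight design is a weight-tied special case of the flattened one: repeating the N shared weights S times and dividing by S gives a flattened layer with the same output. The sketch below checks this numerically in plain Python. It is only an illustration under that reading, not the repository's code, and the names `flattened_head`, `shared_head`, `tie_weights` and all values are hypothetical.

```python
def flattened_head(x, weights, bias=0.0):
    """(S*N) x 1 design: one weight per (image, class) pair."""
    flat = [v for row in x for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


def shared_head(x, weights, bias=0.0):
    """N x 1 design: same weights for every image, scores averaged over S."""
    scores = [sum(w * v for w, v in zip(weights, row)) + bias for row in x]
    return sum(scores) / len(scores)


def tie_weights(shared_weights, s):
    """Expand N shared weights into S*N flattened weights, each scaled by 1/S."""
    return [w / s for _ in range(s) for w in shared_weights]


# Hypothetical input with S = 4 images and N = 3 classes.
x = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.25, 0.50, 0.25],
]
w_shared = [0.4, -0.3, 0.2]  # arbitrary illustrative values
w_tied = tie_weights(w_shared, len(x))

# The tied flattened layer reproduces the shared layer's output.
gap = abs(flattened_head(x, w_tied, bias=0.1) - shared_head(x, w_shared, bias=0.1))
print(gap < 1e-12)
```

So every function the Nx1 design can represent is also available to the (SxN)x1 design, at the cost of S times as many weights and a fixed slice size.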

Best,
Hao.
