Difference between _fixed_layer and _enas_layer in cifar10/micro_child.py #8

bkj opened this issue Apr 3, 2018 · 4 comments

@bkj

bkj commented Apr 3, 2018

There are a number of differences between _fixed_layer and _enas_layer in cifar10/micro_child.py.

  1. layer_base variable scope
  2. strided pooling layers and convolutions
  3. possible _factorized_reduction for output

Are you able to give some insight into why the code works like this? It seems that when a fixed architecture is specified, the resulting model is not necessarily exactly the same as the one used during RL training. To me, the easiest way to fix the child architecture would be an alternate "dummy controller" that just keeps normal_arc and reduce_arc fixed at the desired architecture, roughly as sketched below.
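
For concreteness, something like the following is what I have in mind. This is just a sketch, and I'm assuming the arcs are the flat integer sequences the controller samples; the numbers below are arbitrary placeholders, not a real architecture.

```python
import tensorflow as tf

# Hypothetical "dummy controller": never samples, just hands the child model
# the same architectures at every step (the encodings below are arbitrary
# placeholders).
fixed_normal_arc = tf.constant([0, 2, 0, 0, 1, 4, 0, 1], dtype=tf.int32)
fixed_reduce_arc = tf.constant([1, 0, 1, 0, 0, 3, 0, 2], dtype=tf.int32)
```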

Thanks
Ben

@hyhieu
Collaborator

hyhieu commented Apr 3, 2018

Hi Ben,

Thanks for the questions. I'll try.

  1. The point of layer_base, which is just a 1x1 convolution, is to standardize the number of output channels to out_filters before performing the main operations in a convolutional cell or a reduction cell (see the sketch after point 3). In _enas_layer, we do this in final_conv. The effect is almost the same, but we found it easier to implement this way.

  2. I am not sure I understand this point. Both _fixed_layer and _enas_layer use both convolutions and pooling. For _fixed_layer, I hope the code is quite straightforward. For _enas_layer, since we need to implement a somewhat dynamic graph, we separate the process into the function _enas_cell.

  3. The purpose of _factorized_reduction is to reduce both spatial dimensions (width and height) by a factor of 2, and potentially to change the number of output filters. At the place you mention, it is used to make sure that the outputs of all operations in a convolutional cell or a reduction cell have the same spatial dimensions, so that they can be concatenated along the depth dimension (see the sketch below).
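
To make 1 and 3 more concrete, here is a simplified sketch of the two pieces in TF 1.x style. It is not the exact code from micro_child.py (batch norm and nonlinearities are omitted, and the names are illustrative), but it shows the intent: the 1x1 convolution standardizes the channel count, and the factorized reduction halves the spatial dimensions while setting the channel count.

```python
import tensorflow as tf  # TF 1.x style, as in this repo

def conv_1x1(x, out_filters, name):
  """1x1 convolution that maps the incoming channels to out_filters."""
  with tf.variable_scope(name):
    in_filters = x.get_shape()[-1].value
    w = tf.get_variable("w", [1, 1, in_filters, out_filters])
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

def factorized_reduction(x, out_filters):
  """Halve height and width and set the channel count to out_filters.

  Two parallel stride-2 average pools (the second on an input shifted by
  one pixel), each followed by a 1x1 conv, concatenated along depth.
  """
  with tf.variable_scope("factorized_reduction"):
    path1 = tf.nn.avg_pool(x, [1, 1, 1, 1], [1, 2, 2, 1], "VALID")
    path1 = conv_1x1(path1, out_filters // 2, "path1_conv")

    path2 = tf.pad(x, [[0, 0], [0, 1], [0, 1], [0, 0]])[:, 1:, 1:, :]
    path2 = tf.nn.avg_pool(path2, [1, 1, 1, 1], [1, 2, 2, 1], "VALID")
    path2 = conv_1x1(path2, out_filters - out_filters // 2, "path2_conv")

    return tf.concat([path1, path2], axis=3)
```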

The reason why we cannot just fix normal_arc and reduce_arc and use the same code for both the search process and the fixed-architecture process is efficiency. Dynamic graphs in TF, at least the way we implement them, are slow and very memory inefficient.
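
As a toy illustration (not the actual ops in the repo) of why the two code paths differ: when the op choice is a tensor sampled by the controller, every candidate branch has to be built into the graph and selected at run time, e.g. with tf.case; when the architecture is fixed, the choice is an ordinary Python value and only the chosen branch is built.

```python
import tensorflow as tf  # TF 1.x style

def dynamic_op(x, op_id):
  """op_id is a tensor sampled by the controller, so every candidate branch
  must exist in the graph; tf.case merely selects one at run time."""
  candidates = [
      lambda: tf.nn.relu(x),
      lambda: tf.nn.avg_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], "SAME"),
      lambda: tf.nn.max_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], "SAME"),
  ]
  preds = [tf.equal(op_id, i) for i in range(len(candidates))]
  return tf.case(list(zip(preds, candidates)), default=candidates[0])

def fixed_op(x, op_id):
  """op_id is a plain Python int known at graph-construction time, so only
  the chosen branch is ever built."""
  if op_id == 1:
    return tf.nn.avg_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], "SAME")
  if op_id == 2:
    return tf.nn.max_pool(x, [1, 3, 3, 1], [1, 1, 1, 1], "SAME")
  return tf.nn.relu(x)
```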

Let us know if you still have more questions 😃

@bkj
Author

bkj commented Apr 3, 2018

For number 2, the point was that you're using pooling w/ stride > 1 in the fixed architecture, but a combination of _factorized_reduction and pooling w/ stride = 1 in the ENAS cells.

Makes sense about the dynamic graphs being slow.

Thanks for the quick response. (And thanks for releasing the code! I've been working on a similar project for a little while, so am very excited to compare what I've done to your code.)

~ Ben

@hyhieu
Collaborator

hyhieu commented Apr 3, 2018

> For number 2, the point was that you're using pooling w/ stride > 1 in the fixed architecture, but a combination of _factorized_reduction and pooling w/ stride = 1 in the ENAS cells.

I think it's just because we couldn't figure out how to syntactically make _factorized_reduction run with the output of a dynamic operation, such as tf.case.

@stanstarks

@hyhieu I am wondering whether the reduction cells in _fixed_layer and _enas_layer have the same previous layers. The result of _factorized_reduction is appended to the layers list.

If I understand it correctly, to make the previous layers consistent, this line should be

layers = [layers[0], x]
