How to derive the architecture? Running the train_search on cifar10, but the architecture is different #32
Hi, thanks very much for the reply! About answer 1, may I ask how you split the validation set? In your initial code, it seems that you use the "test" data as the "validation" data when searching. In commit a00cad0, you fixed this problem. So do you just split the real training set half-and-half into "train" and "valid" sets, and use the performance on the "valid" set (half of the real training set) to select the best architecture?
The test set was never used, even before that commit. Yes, your understanding of the architecture selection is correct.
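For concreteness, the half-and-half split described above can be sketched as follows. This is my own minimal illustration, not the repo's code: the dataset size (CIFAR-10's 50,000 training images) and the seed are assumptions, and the test set is left untouched.

```python
import numpy as np

# Assumption: 50,000 CIFAR-10 training images; the test set is untouched.
num_train = 50000
indices = np.random.RandomState(0).permutation(num_train)
split = num_train // 2

# Network weights w are trained on the first half; the architecture
# parameters alpha are updated (and the best architecture selected)
# on the second half.
train_idx, valid_idx = indices[:split], indices[split:]
```

In the actual script these index lists would feed two separate data loaders (e.g. via samplers), so the two halves never mix.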
Hi, and about answer 2: could you please explain why you want to "allow different edges to have different total strengths"? When you generate the final architecture, you use softmax over the alphas, so the total strengths of these edges are not used. Is there any reason for that? Thanks!
To derive the final architecture, we retain the top-2 strongest predecessors for each intermediate node (Sec. 2.4). The edge strengths are needed here to rank the candidate predecessors. While argmax tells us which op to put on an edge, it does not tell us whether that particular edge should be retained.
Hi, DARTS is really nice work! I just would like to understand the model better; no offense intended.
Sure, I hope the following helps (the strength I'm referring to here is slightly different from that in the paper):
In other words, strength means the sum of the mixing probabilities of all non-zero ops on a given edge. Also, suppose you have only a single non-zero op (e.g. conv): without introducing the zero op, all edges would be equally important, hence it would be impossible to determine the two predecessors for each node.
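That notion of strength can be sketched in a few lines (my own illustration, not the repo's code): softmax the raw alphas for one edge, then sum the probabilities of everything except the zero op.

```python
import numpy as np

def edge_strength(alpha_row):
    """alpha_row: raw (pre-softmax) architecture parameters for one edge,
    ordered as ['none', op_1, ..., op_k]."""
    probs = np.exp(alpha_row - np.max(alpha_row))
    probs /= probs.sum()                 # softmax over all ops on the edge
    return probs[1:].sum()               # drop the zero op's share
```

Since the probabilities sum to 1, strength equals 1 - p(none): an edge dominated by the zero op is weak, which is exactly what lets the two predecessors be ranked.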
Got it. Thanks!
Hi
"Overfitting" \alpha to the validation set is precisely what we want. While there could be better selection strategies, exploring them is beyond the scope of this paper. |
Hi @quark0, about question 1: I found that even if we disable cuDNN and set num_workers to 0 in the dataloader, we still cannot reproduce the same search result with a fixed seed across multiple runs. What other factors do you think affect the reproducibility?
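For what it's worth, here is a sketch of the usual RNG knobs beyond cuDNN and num_workers (the torch calls are guarded so the snippet also runs where PyTorch is absent). Even with all of these set, CUDA kernels that rely on atomic adds can still be a remaining source of nondeterminism.

```python
import os
import random
import numpy as np

def set_all_seeds(seed=0):
    """Seed every RNG a typical PyTorch training script touches."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:  # guarded so the sketch also runs without PyTorch installed
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```

Calling `set_all_seeds(0)` at the top of the script pins Python's, NumPy's, and (when available) PyTorch's generators to the same `--seed` value.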
@PipiZong, have you been able to obtain the same structure as in the paper? I found I am not able to reproduce it either.
Hi
I have 2 questions about how to derive the final architecture:
I use the default search script on cifar10
python train_search.py --unrolled --seed 0
to search the architecture. I found the architecture is different from the one provided in the paper (also in genotypes.py of the repo). If I change the seed, the architecture also changes. In the paper, the authors mention the results are obtained from 4 runs. So my questions:
1. Do the 4 runs use the same architecture, or 4 different architectures? And how is the architecture illustrated in the paper selected?
2. In my own search run, I found the probability of the zero op is the highest. However, in the paper, the authors mention that the zero op is not used in the final architecture (Sec. 2.4), and this is also confirmed in the code. If the zero op is not used, why do we add a zero op to the search space? It is really weird, since if we do not exclude the zero op, all the ops will be zero ;-(. Did the authors have the same problem? For example, the alphas for the normal cell are
[[0.1838, 0.0982, 0.081 , 0.1736, 0.1812, 0.0846, 0.091 , 0.1066],
[0.4717, 0.0458, 0.0496, 0.0945, 0.1113, 0.0556, 0.0953, 0.0762],
[0.2946, 0.1425, 0.0855, 0.1768, 0.0837, 0.0735, 0.0731, 0.0704],
[0.3991, 0.0631, 0.0581, 0.1053, 0.1307, 0.0577, 0.1043, 0.0817],
[0.6298, 0.0382, 0.035 , 0.0658, 0.0435, 0.0551, 0.0605, 0.0721],
[0.3526, 0.0974, 0.0693, 0.1346, 0.1245, 0.0697, 0.091 , 0.061 ],
[0.4829, 0.06 , 0.0612, 0.115 , 0.0969, 0.065 , 0.0624, 0.0565],
[0.6591, 0.0303, 0.0282, 0.0558, 0.0578, 0.054 , 0.0581, 0.0568],
[0.7612, 0.0199, 0.0207, 0.0294, 0.0343, 0.0442, 0.0431, 0.0472],
[0.3519, 0.1231, 0.0692, 0.1381, 0.0925, 0.076 , 0.0748, 0.0744],
[0.4767, 0.0781, 0.0679, 0.1216, 0.0679, 0.0701, 0.0548, 0.0629],
[0.6769, 0.032 , 0.0292, 0.0547, 0.0533, 0.0427, 0.0614, 0.0498],
[0.7918, 0.0191, 0.0199, 0.0279, 0.0423, 0.0223, 0.0392, 0.0375],
[0.8325, 0.0153, 0.0158, 0.0199, 0.0284, 0.0255, 0.0313, 0.0313]]
Each row gives the probabilities of ['none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect', 'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5'] for one edge.
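To illustrate what the derivation rule from Sec. 2.4 does with numbers like these, here is a hypothetical re-implementation (the helper name `parse_node` is my own, not the repo's) applied to the first two rows above, which form intermediate node 0; that node has only two incoming edges, so both are retained.

```python
import numpy as np

OPS = ['none', 'max_pool_3x3', 'avg_pool_3x3', 'skip_connect',
       'sep_conv_3x3', 'sep_conv_5x5', 'dil_conv_3x3', 'dil_conv_5x5']

def parse_node(rows):
    """rows: softmaxed alphas for all incoming edges of one node.
    Keep the two edges whose strongest non-zero op is largest, and
    put that op on each retained edge."""
    rows = np.asarray(rows)
    best = rows[:, 1:].max(axis=1)     # ignore the 'none' column
    keep = np.argsort(-best)[:2]       # two strongest incoming edges
    return [(OPS[1 + rows[e, 1:].argmax()], int(e)) for e in sorted(keep)]

# First two rows of the alpha table above (intermediate node 0):
node0 = parse_node([
    [0.1838, 0.0982, 0.081 , 0.1736, 0.1812, 0.0846, 0.091 , 0.1066],
    [0.4717, 0.0458, 0.0496, 0.0945, 0.1113, 0.0556, 0.0953, 0.0762],
])
# node0 -> [('sep_conv_3x3', 0), ('sep_conv_3x3', 1)]
```

Note that even though 'none' has the highest probability on both rows, it is skipped by construction, which is why a valid architecture still comes out.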