Efficient Neural Architecture Search #63

Merged
merged 1 commit into from Dec 12, 2018
74 changes: 74 additions & 0 deletions NAS/ENAS/README.md
@@ -0,0 +1,74 @@
# Efficient Neural Architecture Search

## Overview

Reproduction of the work "Efficient Neural Architecture Search via Parameter Sharing" using NNabla.
We provide both methods proposed by the paper: Macro Search and Micro Search.

We strongly recommend running this code on a decent GPU (at least an NVIDIA GeForce GTX 1080 Ti or better).

### Dataset

By default, this example uses the CIFAR-10 dataset, which is downloaded automatically when you run the script.
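
For reference, here is a minimal sketch of using the data iterator defined in `cifar10_data.py` (part of this example) on its own; the batch size of 64 matches the default in `args.py`:

```python
# Minimal sketch: fetch one batch from the CIFAR-10 iterator.
from cifar10_data import data_iterator_cifar10

data = data_iterator_cifar10(batch_size=64, train=True, shuffle=True)
images, labels = data.next()  # images: (64, 3, 32, 32), labels: (64, 1)
```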


### Configuration

In `args.py`, you can find configurations for both architecture search and evaluation.
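
As a minimal sketch, the configuration can be inspected like this (`get_macro_args()` parses `sys.argv`, so any default can be overridden on the command line):

```python
# Minimal sketch: parse the macro-search configuration and inspect defaults.
from args import get_macro_args

args = get_macro_args()
print(args.batch_size)       # 64 by default
print(args.max_search_iter)  # 350 by default
print(args.monitor_path)     # 'tmp.macro_monitor' by default
```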


### Architecture Search

Architecture search can be run with the command below:

```shell
python macro_search.py --device-id 0 --context 'cudnn' \
                       --monitor-path 'search.monitor' \
                       --recommended-arch <filename-you-want>
```

If you want to use micro search instead, just replace `macro_search.py` with `micro_search.py`.

The search takes about 12 hours on a single Tesla P40. With the `--early-stop-over` option, you can finish the search early; it terminates the search process once the validation accuracy surpasses the value you set (e.g., 0.80).
After the architecture search finishes, you will find a `.npy` file which contains the model architecture.
You can give it an arbitrary name by adding the `--recommended-arch <filename-you-want>` option to `macro_search.py` or `micro_search.py`;
by default, its name is either `macro_arch.npy` or `micro_arch.npy`.

Note that this file does not contain weight parameters; it only holds the list(s) representing the model architecture.
During the architecture search, no weight parameters are stored.
However, as a side effect of displaying intermediate training results, the latest records of the CNN training, for instance `Training-loss.series.txt`, are stored in the directory set by `--monitor-path`.
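
Since the file stores Python list(s) rather than a plain numeric array, `allow_pickle=True` is presumably needed on recent NumPy versions to inspect it; a minimal sketch:

```python
# Minimal sketch: load and print a recommended architecture.
import numpy as np

arch = np.load("macro_arch.npy", allow_pickle=True)
print(arch)  # the list(s) representing the model architecture
```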


### Architecture Evaluation

To retrain the model recommended as a result of architecture search, just run:

```shell
python macro_retrain.py --device-id 0 --context 'cudnn' \
                        --recommended-arch <path to npy file> \
                        --monitor-path 'result.monitor' \
                        --model-save-path 'result.monitor'
```

This time the weight parameters are stored in the directory set by `--model-save-path`, along with other training records in `--monitor-path`.
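
A minimal sketch for loading the stored weights back with NNabla; the snapshot file name below is hypothetical, so check the directory set by `--model-save-path` for the actual `.h5` file your run produces:

```python
# Minimal sketch: restore saved weights into NNabla's parameter scope.
import nnabla as nn

nn.load_parameters("result.monitor/params.h5")  # hypothetical file name
```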


### Architecture Derivation
Besides the architecture recommended by the controller during the search process, other architectures can be sampled from the controller. After the architecture search finishes, you will also have `controller_params.h5` in the directory set by `--monitor-path`.
This file contains the parameters of the controller that generated the architecture with the best validation accuracy. You can get other architectures by using the same script with the `--sampling-only` option. If you want 5 more architectures, simply run

```shell
python macro_search.py --sampling-only True \
                       --num-sampling 5
```

Now you have 5 files named `sampled_macro_arch_N.npy`. These newly sampled architectures can be trained by the evaluation process described above.


## NOTE
- Currently, we observe that the final accuracy after retraining (in short, when training the model recommended by the controller as a result of architecture search) does not reach the value reported in the paper; however, it is very close to the result obtained with the authors' code publicly available on GitHub. Note also that we do not apply Cutout to the input images; a reference sketch of Cutout follows.
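
For reference only, a minimal sketch of Cutout (DeVries & Taylor, arXiv:1708.04552); this example does not apply it, and the patch size below is an illustrative assumption:

```python
# Minimal sketch: Cutout zeroes a random square patch of a (C, H, W) image.
import numpy as np

def cutout(image, size=16, rng=np.random):
    _, h, w = image.shape
    y, x = rng.randint(h), rng.randint(w)
    y1, y2 = np.clip([y - size // 2, y + size // 2], 0, h)
    x1, x2 = np.clip([x - size // 2, x + size // 2], 0, w)
    out = image.copy()
    out[:, y1:y2, x1:x2] = 0.0
    return out
```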


## References
- Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, "Efficient Neural Architecture Search via Parameter Sharing", arXiv:1802.03268
- https://github.com/melodyguan/enas
185 changes: 185 additions & 0 deletions NAS/ENAS/args.py
@@ -0,0 +1,185 @@
# Copyright (c) 2017 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


def get_macro_args():
    """
    Get command line arguments for macro search.

    Each argument below also defines its default value.
    """
    import argparse
    parser = argparse.ArgumentParser()

    # General settings
    parser.add_argument('--context', '-c', type=str,
                        default='cudnn', help="Extension path. ex) cpu, cudnn.")
    parser.add_argument("--device-id", "-d", type=str, default='0',
                        help='Device ID the training runs on. This is only valid if you specify `-c cudnn`.')
    parser.add_argument("--type-config", "-t", type=str, default='float',
                        help='Type of computation. e.g. "float", "half".')
    parser.add_argument("--recommended-arch", type=str, default="macro_arch.npy",
                        help='Name of the .npy file which contains the architecture recommended by the trained controller.')

    # Controller-related
    parser.add_argument("--max-search-iter", "-i", type=int, default=350)
    parser.add_argument("--num-candidate", "-C", type=int, default=10)
    parser.add_argument("--early-stop-over", type=float, default=1.0,
                        help='If the validation accuracy exceeds this value, the architecture search finishes.')
    parser.add_argument("--lstm-size", type=int, default=32)
    parser.add_argument("--state-size", type=int, default=32)
    parser.add_argument("--lstm-layers", type=int, default=2)
    parser.add_argument("--skip-prob", type=float, default=0.8)
    parser.add_argument("--skip-weight", type=float, default=1.5)
    parser.add_argument("--entropy-weight", type=float, default=None)
    parser.add_argument("--temperature", type=float, default=None)
    parser.add_argument("--tanh-constant", type=float, default=1.5)
    parser.add_argument("--baseline-decay", type=float, default=0.999)
    parser.add_argument("--num-ops", type=int, default=6,
                        help='Change this value only when you add an operation for the controller to choose from.')
    parser.add_argument("--control-lr", type=float, default=0.001)
    parser.add_argument("--select-strategy", type=str, choices=["best", "last"], default="best",
                        help='Architecture selection strategy, either "best" or "last".')
    # NOTE: with argparse, type=bool turns any non-empty string into True
    # (e.g. "--sampling-only False" still yields True); omit a flag to keep
    # its default.
    parser.add_argument("--use-variance-reduction", type=bool, default=False)
    parser.add_argument("--sampling-only", type=bool, default=False)
    parser.add_argument("--num-sampling", type=int, default=5)

    # CNN-related
    # Basic config, mainly for CNN training during architecture search.
    parser.add_argument("--use-sparse", type=bool, default=True,
                        help='Only for test. If True, no skip connections are made.')
    parser.add_argument("--batch-size", "-b", type=int, default=64)
    parser.add_argument("--num-layers", type=int, default=12)
    parser.add_argument("--output-filter", type=int, default=36,
                        help='Number of output filters of the CNN (used during architecture search); must be an even number.')
    parser.add_argument("--epoch-per-search", "-e", type=int, default=2,
                        help='Number of epochs used for CNN training during architecture search.')

    parser.add_argument("--additional_filters_on_retrain", "-f", type=int, default=60,
                        help='Number of additional output filters of the CNN (used when retraining); must be an even number.')
    parser.add_argument("--epoch-on-retrain", "-r", type=int, default=350,
                        help='Number of epochs used for CNN retraining after architecture search.')

    # Gradient clipping
    parser.add_argument("--with-grad-clip-on-search", type=bool, default=False)
    parser.add_argument("--with-grad-clip-on-retrain", type=bool, default=True)
    parser.add_argument("--grad-clip-value", "-g", type=float, default=5.0)

    # Weight decay
    parser.add_argument("--weight-decay", "-w", type=float, default=0.00025,
                        help='Weight decay rate. Weight decay is applied by default; set it to 0 to virtually disable it.')

    # Learning rate and its control
    parser.add_argument("--child-lr", "-clr", type=float, default=0.1)
    parser.add_argument("--lr-control-on-search", type=bool, default=False,
                        help='Whether or not to use the learning rate controller for CNN training during architecture search.')
    parser.add_argument("--lr-control-on-retrain", type=bool, default=True,
                        help='Whether or not to use the learning rate controller for CNN training when retraining.')

    # Misc
    parser.add_argument("--val-iter", "-j", type=int,
                        default=100, help='Number of validation iterations.')
    parser.add_argument("--monitor-path", "-m", type=str,
                        default='tmp.macro_monitor')
    parser.add_argument("--model-save-interval", "-s", type=int, default=1000)
    parser.add_argument("--model-save-path", "-o",
                        type=str, default='tmp.macro_monitor')

    return parser.parse_args()


def get_micro_args():
    """
    Get command line arguments for micro search.

    Each argument below also defines its default value.
    """
    import argparse
    parser = argparse.ArgumentParser()
    # General settings
    parser.add_argument('--context', '-c', type=str,
                        default='cudnn', help="Extension path. ex) cpu, cudnn.")
    parser.add_argument("--device-id", "-d", type=str, default='0',
                        help='Device ID the training runs on. This is only valid if you specify `-c cudnn`.')
    parser.add_argument("--type-config", "-t", type=str, default='float',
                        help='Type of computation. e.g. "float", "half".')
    parser.add_argument("--recommended-arch", type=str, default="micro_arch.npy",
                        help='Name of the .npy file which contains the architecture recommended by the trained controller.')

    # Controller-related
    parser.add_argument("--max-search-iter", "-i", type=int, default=350)
    parser.add_argument("--num-candidate", "-C", type=int, default=10)
    parser.add_argument("--early-stop-over", type=float, default=1.0,
                        help='If the validation accuracy exceeds this value, the architecture search finishes.')
    parser.add_argument("--lstm-size", type=int, default=32)
    parser.add_argument("--state-size", type=int, default=32)
    parser.add_argument("--lstm-layers", type=int, default=2)
    parser.add_argument("--skip-prob", type=float, default=0.8)
    parser.add_argument("--skip-weight", type=float, default=1.5)
    parser.add_argument("--entropy-weight", type=float, default=None)
    parser.add_argument("--temperature", type=float, default=None)
    parser.add_argument("--tanh-constant", type=float, default=1.5)
    parser.add_argument("--op-tanh-reduce", type=float, default=1.0)
    parser.add_argument("--baseline-decay", type=float, default=0.999)
    parser.add_argument("--num-ops", type=int, default=5,
                        help='Change this value only when you add an operation for the controller to choose from.')
    parser.add_argument("--control-lr", type=float, default=0.001)
    parser.add_argument("--select-strategy", type=str, choices=["best", "last"], default="best",
                        help='Architecture selection strategy, either "best" or "last".')
    # NOTE: with argparse, type=bool turns any non-empty string into True
    # (e.g. "--sampling-only False" still yields True); omit a flag to keep
    # its default.
    parser.add_argument("--use-variance-reduction", type=bool, default=False)
    parser.add_argument("--sampling-only", type=bool, default=False)
    parser.add_argument("--num-sampling", type=int, default=5)

    # CNN-related
    # Basic config, mainly for CNN training during architecture search.
    parser.add_argument("--batch-size", "-b", type=int, default=64)
    parser.add_argument("--num-cells", type=int, default=6)
    parser.add_argument("--num-nodes", type=int, default=7,
                        help='Number of nodes per cell; must be more than 2.')
    parser.add_argument("--output-filter", type=int, default=20,
                        help='Number of output filters of the CNN (used during architecture search); must be an even number.')
    parser.add_argument("--epoch-per-search", "-e", type=int, default=2,
                        help='Number of epochs used for CNN training during architecture search.')

    parser.add_argument("--additional_filters_on_retrain", "-f", type=int, default=60,
                        help='Number of additional output filters of the CNN (used when retraining); must be an even number.')
    parser.add_argument("--epoch-on-retrain", "-r", type=int, default=350,
                        help='Number of epochs used for CNN retraining after architecture search.')

    # Gradient clipping
    parser.add_argument("--with-grad-clip-on-search", type=bool, default=False)
    parser.add_argument("--with-grad-clip-on-retrain", type=bool, default=True)
    parser.add_argument("--grad-clip-value", "-g", type=float, default=5.0)

    # Weight decay
    parser.add_argument("--weight-decay", "-w", type=float, default=0.00025,
                        help='Weight decay rate. Weight decay is applied by default; set it to 0 to virtually disable it.')

    # Learning rate and its control
    parser.add_argument("--child-lr", "-clr", type=float, default=0.1)
    parser.add_argument("--lr-control-on-search", type=bool, default=False,
                        help='Whether or not to use the learning rate controller for CNN training during architecture search.')
    parser.add_argument("--lr-control-on-retrain", type=bool, default=True,
                        help='Whether or not to use the learning rate controller for CNN training when retraining.')

    # Misc
    parser.add_argument("--val-iter", "-j", type=int,
                        default=100, help='Number of validation iterations.')
    parser.add_argument("--monitor-path", "-m", type=str,
                        default='tmp.micro_monitor')
    parser.add_argument("--model-save-interval", "-s", type=int, default=1000)
    parser.add_argument("--model-save-path", "-o",
                        type=str, default='tmp.micro_monitor')

    return parser.parse_args()
123 changes: 123 additions & 0 deletions NAS/ENAS/cifar10_data.py
@@ -0,0 +1,123 @@
# Copyright (c) 2017 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

'''
Provide data iterator for CIFAR10 examples.
'''
from contextlib import contextmanager
import numpy as np
import struct
import tarfile
import zlib
import time
import os
import errno

from nnabla.logger import logger
from nnabla.utils.data_iterator import data_iterator
from nnabla.utils.data_source import DataSource
from nnabla.utils.data_source_loader import download, get_data_home


class Cifar10DataSource(DataSource):
    '''
    Get data directly from the CIFAR-10 dataset on the Internet (cs.toronto.edu).
    '''

    def _get_data(self, position):
        image = self._images[self._indexes[position]]
        label = self._labels[self._indexes[position]]
        return (image, label)

    def __init__(self, train=True, shuffle=False, rng=None):
        super(Cifar10DataSource, self).__init__(shuffle=shuffle, rng=rng)

        self._train = train
        data_uri = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
        logger.info('Getting labeled data from {}.'.format(data_uri))
        r = download(data_uri)  # file object returned
        with tarfile.open(fileobj=r, mode="r:gz") as fpin:
            # Training data
            if train:
                images = []
                labels = []
                for member in fpin.getmembers():
                    if "data_batch" not in member.name:
                        continue
                    fp = fpin.extractfile(member)
                    # allow_pickle is needed on NumPy >= 1.16.3 because the
                    # batch files are pickled dictionaries.
                    data = np.load(fp, encoding="bytes", allow_pickle=True)
                    images.append(data[b"data"])
                    labels.append(data[b"labels"])
                self._size = 50000
                self._images = np.concatenate(
                    images).reshape(self._size, 3, 32, 32)
                self._labels = np.concatenate(labels).reshape(-1, 1)
            # Validation data
            else:
                for member in fpin.getmembers():
                    if "test_batch" not in member.name:
                        continue
                    fp = fpin.extractfile(member)
                    data = np.load(fp, encoding="bytes", allow_pickle=True)
                    images = data[b"data"]
                    labels = data[b"labels"]
                self._size = 10000
                self._images = images.reshape(self._size, 3, 32, 32)
                self._labels = np.array(labels).reshape(-1, 1)
        r.close()
        logger.info('Got labeled data from {}.'.format(data_uri))

        self._size = self._labels.size
        self._variables = ('x', 'y')
        if rng is None:
            rng = np.random.RandomState(313)
        self.rng = rng
        self.reset()

    def reset(self):
        if self._shuffle:
            self._indexes = self.rng.permutation(self._size)
        else:
            self._indexes = np.arange(self._size)
        super(Cifar10DataSource, self).reset()

    @property
    def images(self):
        """Get a copy of the whole data with a shape of (N, 3, H, W)."""
        return self._images.copy()

    @property
    def labels(self):
        """Get a copy of the whole labels with a shape of (N, 1)."""
        return self._labels.copy()


def data_iterator_cifar10(batch_size,
                          train=True,
                          rng=None,
                          shuffle=True,
                          with_memory_cache=False,
                          with_parallel=False,
                          with_file_cache=False):
    '''
    Provide a DataIterator with :py:class:`Cifar10DataSource`.
    The default values of with_memory_cache, with_parallel and with_file_cache
    are all False, because :py:class:`Cifar10DataSource` is able to store all
    data in memory.
    '''
    return data_iterator(Cifar10DataSource(train=train, shuffle=shuffle, rng=rng),
                         batch_size,
                         with_memory_cache,
                         with_parallel,
                         with_file_cache)