Efficient Neural Architecture Search #63

Merged
merged 1 commit into from Dec 12, 2018
74 changes: 74 additions & 0 deletions NAS/ENAS/README.md
@@ -0,0 +1,74 @@
# Efficient Neural Architecture Search

## Overview

Reproduction of the work "Efficient Neural Architecture Search via Parameter Sharing" using NNabla.
We provide both methods proposed by the paper: Macro Search and Micro Search.

We strongly recommend running this code on a decent GPU (at least an NVIDIA GeForce GTX 1080 Ti or better).

### Dataset

By default, this example uses the CIFAR-10 dataset, which is downloaded automatically when you run the script.
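
For reference, here is a minimal sketch of using the data iterator defined in `cifar10_data.py` (part of this example) on its own; the batch size of 64 matches the default in `args.py`:

```python
# Minimal sketch: fetch one batch from the CIFAR-10 iterator.
from cifar10_data import data_iterator_cifar10

data = data_iterator_cifar10(batch_size=64, train=True, shuffle=True)
images, labels = data.next()  # images: (64, 3, 32, 32), labels: (64, 1)
```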


### Configuration

In `args.py`, you can find configurations for both architecture search and evaluation.
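
As a minimal sketch, the configuration can be inspected like this (`get_macro_args()` parses `sys.argv`, so any default can be overridden on the command line):

```python
# Minimal sketch: parse the macro-search configuration and inspect defaults.
from args import get_macro_args

args = get_macro_args()
print(args.batch_size)       # 64 by default
print(args.max_search_iter)  # 350 by default
print(args.monitor_path)     # 'tmp.macro_monitor' by default
```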


### Architecture Search

Architecture search can be run with the command below:

```shell
python macro_search.py --device-id 0 --context 'cudnn' \
                       --monitor-path 'search.monitor' \
                       --recommended-arch <filename-you-want>
```

If you want to use micro search instead, just replace `macro_search.py` with `micro_search.py`.

The search takes about 12 hours on a single Tesla P40. With the `--early-stop-over` option, you can finish the search early; it terminates the search process once the validation accuracy surpasses the value you set (e.g., 0.80).
After the architecture search finishes, you will find a `.npy` file which contains the model architecture.
You can give it an arbitrary name by adding the `--recommended-arch <filename-you-want>` option to `macro_search.py` or `micro_search.py`;
by default, its name is either `macro_arch.npy` or `micro_arch.npy`.

Note that this file does not contain weight parameters; it only holds the list(s) representing the model architecture.
During the architecture search, no weight parameters are stored.
However, as a side effect of displaying intermediate training results, the latest records of the CNN training, for instance `Training-loss.series.txt`, are stored in the directory set by `--monitor-path`.
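
Since the file stores Python list(s) rather than a plain numeric array, `allow_pickle=True` is presumably needed on recent NumPy versions to inspect it; a minimal sketch:

```python
# Minimal sketch: load and print a recommended architecture.
import numpy as np

arch = np.load("macro_arch.npy", allow_pickle=True)
print(arch)  # the list(s) representing the model architecture
```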


### Architecture Evaluation

To retrain the model recommended as a result of architecture search, just run:

```shell
python macro_retrain.py --device-id 0 --context 'cudnn' \
                        --recommended-arch <path to npy file> \
                        --monitor-path 'result.monitor' \
                        --model-save-path 'result.monitor'
```

This time the weight parameters are stored in the directory set by `--model-save-path`, along with other training records in `--monitor-path`.
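
A minimal sketch for loading the stored weights back with NNabla; the snapshot file name below is hypothetical, so check the directory set by `--model-save-path` for the actual `.h5` file your run produces:

```python
# Minimal sketch: restore saved weights into NNabla's parameter scope.
import nnabla as nn

nn.load_parameters("result.monitor/params.h5")  # hypothetical file name
```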


### Architecture Derivation
Besides the architecture recommended by the controller during the search process, other architectures can be sampled from the controller. After the architecture search finishes, you will also have `controller_params.h5` in the directory set by `--monitor-path`.
This file contains the parameters of the controller that generated the architecture with the best validation accuracy. You can get other architectures by using the same script with the `--sampling-only` option. If you want 5 more architectures, simply run

```shell
python macro_search.py --sampling-only True \
                       --num-sampling 5
```

Now you have 5 files named `sampled_macro_arch_N.npy`. These newly sampled architectures can be trained by the evaluation process described above.


## NOTE
- Currently, we observe that the final accuracy after retraining (in short, when training the model recommended by the controller as a result of architecture search) does not reach the value reported in the paper; however, it is very close to the result obtained with the authors' code publicly available on GitHub. Note also that we do not apply Cutout to the input images; a reference sketch of Cutout follows.
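
For reference only, a minimal sketch of Cutout (DeVries & Taylor, arXiv:1708.04552); this example does not apply it, and the patch size below is an illustrative assumption:

```python
# Minimal sketch: Cutout zeroes a random square patch of a (C, H, W) image.
import numpy as np

def cutout(image, size=16, rng=np.random):
    _, h, w = image.shape
    y, x = rng.randint(h), rng.randint(w)
    y1, y2 = np.clip([y - size // 2, y + size // 2], 0, h)
    x1, x2 = np.clip([x - size // 2, x + size // 2], 0, w)
    out = image.copy()
    out[:, y1:y2, x1:x2] = 0.0
    return out
```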


## References
- Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean, "Efficient Neural Architecture Search via Parameter Sharing", arXiv:1802.03268
- https://github.com/melodyguan/enas
185 changes: 185 additions & 0 deletions NAS/ENAS/args.py
@@ -0,0 +1,185 @@
# Copyright (c) 2017 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


def get_macro_args():
    """
    Get command line arguments for macro search.

    Each argument below also defines its default value.
    """
    import argparse
    parser = argparse.ArgumentParser()

    # General settings
    parser.add_argument('--context', '-c', type=str,
                        default='cudnn', help="Extension path. ex) cpu, cudnn.")
    parser.add_argument("--device-id", "-d", type=str, default='0',
                        help='Device ID the training runs on. This is only valid if you specify `-c cudnn`.')
    parser.add_argument("--type-config", "-t", type=str, default='float',
                        help='Type of computation. e.g. "float", "half".')
    parser.add_argument("--recommended-arch", type=str, default="macro_arch.npy",
                        help='Name of the .npy file which contains the architecture recommended by the trained controller.')

    # Controller-related
    parser.add_argument("--max-search-iter", "-i", type=int, default=350)
    parser.add_argument("--num-candidate", "-C", type=int, default=10)
    parser.add_argument("--early-stop-over", type=float, default=1.0,
                        help='If the validation accuracy exceeds this value, the architecture search finishes.')
    parser.add_argument("--lstm-size", type=int, default=32)
    parser.add_argument("--state-size", type=int, default=32)
    parser.add_argument("--lstm-layers", type=int, default=2)
    parser.add_argument("--skip-prob", type=float, default=0.8)
    parser.add_argument("--skip-weight", type=float, default=1.5)
    parser.add_argument("--entropy-weight", type=float, default=None)
    parser.add_argument("--temperature", type=float, default=None)
    parser.add_argument("--tanh-constant", type=float, default=1.5)
    parser.add_argument("--baseline-decay", type=float, default=0.999)
    parser.add_argument("--num-ops", type=int, default=6,
                        help='Change this value only when you add an operation for the controller to choose from.')
    parser.add_argument("--control-lr", type=float, default=0.001)
    parser.add_argument("--select-strategy", type=str, choices=["best", "last"], default="best",
                        help='Architecture selection strategy, either "best" or "last".')
    # NOTE: with argparse, type=bool turns any non-empty string into True
    # (e.g. "--sampling-only False" still yields True); omit a flag to keep
    # its default.
    parser.add_argument("--use-variance-reduction", type=bool, default=False)
    parser.add_argument("--sampling-only", type=bool, default=False)
    parser.add_argument("--num-sampling", type=int, default=5)

    # CNN-related
    # Basic config, mainly for CNN training during architecture search.
    parser.add_argument("--use-sparse", type=bool, default=True,
                        help='Only for test. If True, no skip connections are made.')
    parser.add_argument("--batch-size", "-b", type=int, default=64)
    parser.add_argument("--num-layers", type=int, default=12)
    parser.add_argument("--output-filter", type=int, default=36,
                        help='Number of output filters of the CNN (used during architecture search); must be an even number.')
    parser.add_argument("--epoch-per-search", "-e", type=int, default=2,
                        help='Number of epochs used for CNN training during architecture search.')

    parser.add_argument("--additional_filters_on_retrain", "-f", type=int, default=60,
                        help='Number of additional output filters of the CNN (used when retraining); must be an even number.')
    parser.add_argument("--epoch-on-retrain", "-r", type=int, default=350,
                        help='Number of epochs used for CNN retraining after architecture search.')

    # Gradient clipping
    parser.add_argument("--with-grad-clip-on-search", type=bool, default=False)
    parser.add_argument("--with-grad-clip-on-retrain", type=bool, default=True)
    parser.add_argument("--grad-clip-value", "-g", type=float, default=5.0)

    # Weight decay
    parser.add_argument("--weight-decay", "-w", type=float, default=0.00025,
                        help='Weight decay rate. Weight decay is applied by default; set it to 0 to virtually disable it.')

    # Learning rate and its control
    parser.add_argument("--child-lr", "-clr", type=float, default=0.1)
    parser.add_argument("--lr-control-on-search", type=bool, default=False,
                        help='Whether or not to use the learning rate controller for CNN training during architecture search.')
    parser.add_argument("--lr-control-on-retrain", type=bool, default=True,
                        help='Whether or not to use the learning rate controller for CNN training when retraining.')

    # Misc
    parser.add_argument("--val-iter", "-j", type=int,
                        default=100, help='Number of validation iterations.')
    parser.add_argument("--monitor-path", "-m", type=str,
                        default='tmp.macro_monitor')
    parser.add_argument("--model-save-interval", "-s", type=int, default=1000)
    parser.add_argument("--model-save-path", "-o",
                        type=str, default='tmp.macro_monitor')

    return parser.parse_args()


def get_micro_args():
    """
    Get command line arguments for micro search.

    Each argument below also defines its default value.
    """
    import argparse
    parser = argparse.ArgumentParser()
    # General settings
    parser.add_argument('--context', '-c', type=str,
                        default='cudnn', help="Extension path. ex) cpu, cudnn.")
    parser.add_argument("--device-id", "-d", type=str, default='0',
                        help='Device ID the training runs on. This is only valid if you specify `-c cudnn`.')
    parser.add_argument("--type-config", "-t", type=str, default='float',
                        help='Type of computation. e.g. "float", "half".')
    parser.add_argument("--recommended-arch", type=str, default="micro_arch.npy",
                        help='Name of the .npy file which contains the architecture recommended by the trained controller.')

    # Controller-related
    parser.add_argument("--max-search-iter", "-i", type=int, default=350)
    parser.add_argument("--num-candidate", "-C", type=int, default=10)
    parser.add_argument("--early-stop-over", type=float, default=1.0,
                        help='If the validation accuracy exceeds this value, the architecture search finishes.')
    parser.add_argument("--lstm-size", type=int, default=32)
    parser.add_argument("--state-size", type=int, default=32)
    parser.add_argument("--lstm-layers", type=int, default=2)
    parser.add_argument("--skip-prob", type=float, default=0.8)
    parser.add_argument("--skip-weight", type=float, default=1.5)
    parser.add_argument("--entropy-weight", type=float, default=None)
    parser.add_argument("--temperature", type=float, default=None)
    parser.add_argument("--tanh-constant", type=float, default=1.5)
    parser.add_argument("--op-tanh-reduce", type=float, default=1.0)
    parser.add_argument("--baseline-decay", type=float, default=0.999)
    parser.add_argument("--num-ops", type=int, default=5,
                        help='Change this value only when you add an operation for the controller to choose from.')
    parser.add_argument("--control-lr", type=float, default=0.001)
    parser.add_argument("--select-strategy", type=str, choices=["best", "last"], default="best",
                        help='Architecture selection strategy, either "best" or "last".')
    # NOTE: with argparse, type=bool turns any non-empty string into True
    # (e.g. "--sampling-only False" still yields True); omit a flag to keep
    # its default.
    parser.add_argument("--use-variance-reduction", type=bool, default=False)
    parser.add_argument("--sampling-only", type=bool, default=False)
    parser.add_argument("--num-sampling", type=int, default=5)

    # CNN-related
    # Basic config, mainly for CNN training during architecture search.
    parser.add_argument("--batch-size", "-b", type=int, default=64)
    parser.add_argument("--num-cells", type=int, default=6)
    parser.add_argument("--num-nodes", type=int, default=7,
                        help='Number of nodes per cell; must be more than 2.')
    parser.add_argument("--output-filter", type=int, default=20,
                        help='Number of output filters of the CNN (used during architecture search); must be an even number.')
    parser.add_argument("--epoch-per-search", "-e", type=int, default=2,
                        help='Number of epochs used for CNN training during architecture search.')

    parser.add_argument("--additional_filters_on_retrain", "-f", type=int, default=60,
                        help='Number of additional output filters of the CNN (used when retraining); must be an even number.')
    parser.add_argument("--epoch-on-retrain", "-r", type=int, default=350,
                        help='Number of epochs used for CNN retraining after architecture search.')

    # Gradient clipping
    parser.add_argument("--with-grad-clip-on-search", type=bool, default=False)
    parser.add_argument("--with-grad-clip-on-retrain", type=bool, default=True)
    parser.add_argument("--grad-clip-value", "-g", type=float, default=5.0)

    # Weight decay
    parser.add_argument("--weight-decay", "-w", type=float, default=0.00025,
                        help='Weight decay rate. Weight decay is applied by default; set it to 0 to virtually disable it.')

    # Learning rate and its control
    parser.add_argument("--child-lr", "-clr", type=float, default=0.1)
    parser.add_argument("--lr-control-on-search", type=bool, default=False,
                        help='Whether or not to use the learning rate controller for CNN training during architecture search.')
    parser.add_argument("--lr-control-on-retrain", type=bool, default=True,
                        help='Whether or not to use the learning rate controller for CNN training when retraining.')

    # Misc
    parser.add_argument("--val-iter", "-j", type=int,
                        default=100, help='Number of validation iterations.')
    parser.add_argument("--monitor-path", "-m", type=str,
                        default='tmp.micro_monitor')
    parser.add_argument("--model-save-interval", "-s", type=int, default=1000)
    parser.add_argument("--model-save-path", "-o",
                        type=str, default='tmp.micro_monitor')

    return parser.parse_args()
123 changes: 123 additions & 0 deletions NAS/ENAS/cifar10_data.py
@@ -0,0 +1,123 @@
# Copyright (c) 2017 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

'''
Provide data iterator for CIFAR10 examples.
'''
from contextlib import contextmanager
import numpy as np
import struct
import tarfile
import zlib
import time
import os
import errno

from nnabla.logger import logger
from nnabla.utils.data_iterator import data_iterator
from nnabla.utils.data_source import DataSource
from nnabla.utils.data_source_loader import download, get_data_home


class Cifar10DataSource(DataSource):
    '''
    Get data directly from the CIFAR-10 dataset on the Internet (cs.toronto.edu).
    '''

    def _get_data(self, position):
        image = self._images[self._indexes[position]]
        label = self._labels[self._indexes[position]]
        return (image, label)

    def __init__(self, train=True, shuffle=False, rng=None):
        super(Cifar10DataSource, self).__init__(shuffle=shuffle, rng=rng)

        self._train = train
        data_uri = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
        logger.info('Getting labeled data from {}.'.format(data_uri))
        r = download(data_uri)  # file object returned
        with tarfile.open(fileobj=r, mode="r:gz") as fpin:
            # Training data
            if train:
                images = []
                labels = []
                for member in fpin.getmembers():
                    if "data_batch" not in member.name:
                        continue
                    fp = fpin.extractfile(member)
                    # allow_pickle is needed on NumPy >= 1.16.3 because the
                    # batch files are pickled dictionaries.
                    data = np.load(fp, encoding="bytes", allow_pickle=True)
                    images.append(data[b"data"])
                    labels.append(data[b"labels"])
                self._size = 50000
                self._images = np.concatenate(
                    images).reshape(self._size, 3, 32, 32)
                self._labels = np.concatenate(labels).reshape(-1, 1)
            # Validation data
            else:
                for member in fpin.getmembers():
                    if "test_batch" not in member.name:
                        continue
                    fp = fpin.extractfile(member)
                    data = np.load(fp, encoding="bytes", allow_pickle=True)
                    images = data[b"data"]
                    labels = data[b"labels"]
                self._size = 10000
                self._images = images.reshape(self._size, 3, 32, 32)
                self._labels = np.array(labels).reshape(-1, 1)
        r.close()
        logger.info('Got labeled data from {}.'.format(data_uri))

        self._size = self._labels.size
        self._variables = ('x', 'y')
        if rng is None:
            rng = np.random.RandomState(313)
        self.rng = rng
        self.reset()

    def reset(self):
        if self._shuffle:
            self._indexes = self.rng.permutation(self._size)
        else:
            self._indexes = np.arange(self._size)
        super(Cifar10DataSource, self).reset()

    @property
    def images(self):
        """Get a copy of the whole data with a shape of (N, 3, H, W)."""
        return self._images.copy()

    @property
    def labels(self):
        """Get a copy of the whole labels with a shape of (N, 1)."""
        return self._labels.copy()


def data_iterator_cifar10(batch_size,
                          train=True,
                          rng=None,
                          shuffle=True,
                          with_memory_cache=False,
                          with_parallel=False,
                          with_file_cache=False):
    '''
    Provide a DataIterator with :py:class:`Cifar10DataSource`.
    The default values of with_memory_cache, with_parallel and with_file_cache
    are all False, because :py:class:`Cifar10DataSource` is able to store all
    data in memory.
    '''
    return data_iterator(Cifar10DataSource(train=train, shuffle=shuffle, rng=rng),
                         batch_size,
                         with_memory_cache,
                         with_parallel,
                         with_file_cache)