Added command line arguments for Horovod knob environment variables, config file, and new knobs for autotuning (#1345)
tgaddair committed Aug 27, 2019
1 parent 6efd5dd commit 356ff698ac6d5072872d63b335f24cc1d65c83a0
@@ -94,6 +94,26 @@ example below:
Other MPI RDMA implementations may or may not benefit from disabling multithreading, so please consult vendor
documentation.

+Horovod Parameter Knobs
+-----------------------
+
+Many of the configurable parameters available as command line arguments to ``horovodrun`` can be used with ``mpirun``
+through the use of environment variables.
+
+Tensor Fusion:
+
+.. code-block:: bash
+
+    $ mpirun -x HOROVOD_FUSION_THRESHOLD=33554432 -x HOROVOD_CYCLE_TIME=3.5 ... python train.py
+
+Timeline:
+
+.. code-block:: bash
+
+    $ mpirun -x HOROVOD_TIMELINE=/path/to/timeline.json -x HOROVOD_TIMELINE_MARK_CYCLES=1 ... python train.py
+
+Note that when using ``horovodrun``, any command line arguments will override values set in the environment.
+
Hangs due to non-routed network interfaces
------------------------------------------

@@ -16,25 +16,22 @@ one reduction operation. The algorithm of Tensor Fusion is as follows:
5. Copy data from the fusion buffer into the output tensors.
6. Repeat until there are no more tensors to reduce in this cycle.

-The fusion buffer size can be tweaked using the ``HOROVOD_FUSION_THRESHOLD`` environment variable:
+The fusion buffer size can be adjusted using the ``--fusion-threshold-mb`` command line argument to ``horovodrun``:

.. code-block:: bash
-$ HOROVOD_FUSION_THRESHOLD=33554432 horovodrun -np 4 python train.py
+$ horovodrun -np 4 --fusion-threshold-mb 32 python train.py
-Setting the ``HOROVOD_FUSION_THRESHOLD`` environment variable to zero disables Tensor Fusion:
+Setting ``--fusion-threshold-mb`` to zero disables Tensor Fusion:

.. code-block:: bash
-$ HOROVOD_FUSION_THRESHOLD=0 horovodrun -np 4 python train.py
+$ horovodrun -np 4 --fusion-threshold-mb 0 python train.py
-You can tweak time between cycles (defined in milliseconds) using the ``HOROVOD_CYCLE_TIME`` environment variable:
+You can tweak time between cycles (defined in milliseconds) using the ``--cycle-time-ms`` command line argument:

.. code-block:: bash
-$ HOROVOD_CYCLE_TIME=3.5 horovodrun -np 4 python train.py
+$ horovodrun -np 4 --cycle-time-ms 3.5 python train.py
.. inclusion-marker-end-do-not-remove
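
The old and new examples in this hunk express the same threshold in different units: ``HOROVOD_FUSION_THRESHOLD=33554432`` is a byte count, while ``--fusion-threshold-mb 32`` is in megabytes (32 x 1024 x 1024 = 33554432). A minimal sketch of that conversion, assuming binary megabytes; ``MegabytesToBytes`` is a hypothetical helper written for illustration and is not a Horovod function:

```cpp
#include <cstdint>

// Hypothetical helper (not part of Horovod): converts a threshold given in
// megabytes to bytes, assuming 1 MB = 1024 * 1024 bytes.
constexpr std::int64_t MegabytesToBytes(double megabytes) {
  return static_cast<std::int64_t>(megabytes * 1024.0 * 1024.0);
}

// 32 MB matches the byte value used in the environment-variable examples.
static_assert(MegabytesToBytes(32) == 33554432, "32 MB should be 33554432 bytes");

// Zero stays zero, the value the docs describe as disabling Tensor Fusion.
static_assert(MegabytesToBytes(0) == 0, "0 MB should be 0 bytes");
```
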
@@ -9,12 +9,12 @@ Horovod has the ability to record the timeline of its activity, called Horovod T
:alt: Horovod Timeline


-To record a Horovod Timeline, set the ``HOROVOD_TIMELINE`` environment variable to the location of the timeline
+To record a Horovod Timeline, set the ``--timeline-filename`` command line argument to the location of the timeline
file to be created. This file is only recorded on rank 0, but it contains information about activity of all workers.

.. code-block:: bash
-$ HOROVOD_TIMELINE=/path/to/timeline.json horovodrun -np 4 python train.py
+$ horovodrun -np 4 --timeline-filename /path/to/timeline.json python train.py
You can then open the timeline file using the ``chrome://tracing`` facility of the `Chrome <https://www.google.com/chrome/browser/>`__ browser.
@@ -49,13 +49,10 @@ Horovod performs work in cycles. These cycles are used to aid `Tensor Fusion <h
:alt: Cycle Markers


-Since this information makes timeline view very crowded, it is not enabled by default. To add cycle markers to the timeline, set the ``HOROVOD_TIMELINE_MARK_CYCLES`` environment variable to ``1``:
+Since this information makes timeline view very crowded, it is not enabled by default. To add cycle markers to the timeline, set the ``--timeline-mark-cycles`` flag:

.. code-block:: bash
-$ HOROVOD_TIMELINE=/path/to/timeline.json HOROVOD_TIMELINE_MARK_CYCLES=1 \
-horovodrun -np 4 python train.py
+$ horovodrun -np 4 --timeline-filename /path/to/timeline.json --timeline-mark-cycles python train.py
.. inclusion-marker-end-do-not-remove
@@ -63,6 +63,10 @@ namespace common {
#define HOROVOD_TIMELINE_MARK_CYCLES "HOROVOD_TIMELINE_MARK_CYCLES"
#define HOROVOD_AUTOTUNE "HOROVOD_AUTOTUNE"
#define HOROVOD_AUTOTUNE_LOG "HOROVOD_AUTOTUNE_LOG"
+#define HOROVOD_AUTOTUNE_WARMUP_SAMPLES "HOROVOD_AUTOTUNE_WARMUP_SAMPLES"
+#define HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE "HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE"
+#define HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES "HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES"
+#define HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE "HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE"
#define HOROVOD_FUSION_THRESHOLD "HOROVOD_FUSION_THRESHOLD"
#define HOROVOD_CYCLE_TIME "HOROVOD_CYCLE_TIME"
#define HOROVOD_STALL_CHECK_DISABLE "HOROVOD_STALL_CHECK_DISABLE"
@@ -20,14 +20,15 @@
#include <limits>

#include "logging.h"
+#include "utils/env_parser.h"

namespace horovod {
namespace common {

-#define WARMUPS 3
-#define CYCLES_PER_SAMPLE 10
-#define BAYES_OPT_MAX_SAMPLES 20
-#define GAUSSIAN_PROCESS_NOISE 0.8
+#define DEFAULT_WARMUPS 3
+#define DEFAULT_STEPS_PER_SAMPLE 10
+#define DEFAULT_BAYES_OPT_MAX_SAMPLES 20
+#define DEFAULT_GAUSSIAN_PROCESS_NOISE 0.8

Eigen::VectorXd CreateVector(double x1, double x2) {
Eigen::VectorXd v(2);
@@ -38,23 +39,28 @@ Eigen::VectorXd CreateVector(double x1, double x2) {

// ParameterManager
ParameterManager::ParameterManager() :
+warmups_(GetIntEnvOrDefault(HOROVOD_AUTOTUNE_WARMUP_SAMPLES, DEFAULT_WARMUPS)),
+steps_per_sample_(GetIntEnvOrDefault(HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE, DEFAULT_STEPS_PER_SAMPLE)),
hierarchical_allreduce_(CategoricalParameter<bool>(std::vector<bool>{false, true})),
hierarchical_allgather_(CategoricalParameter<bool>(std::vector<bool>{false, true})),
cache_enabled_(CategoricalParameter<bool>(std::vector<bool>{false, true})),
joint_params_(BayesianParameter(
std::vector<BayesianVariableConfig>{
{ BayesianVariable::fusion_buffer_threshold_mb, std::pair<double, double>(0, 64) },
{ BayesianVariable::cycle_time_ms, std::pair<double, double>(1, 100) }
-}, std::vector<Eigen::VectorXd>{
+},
+std::vector<Eigen::VectorXd>{
CreateVector(4, 5),
CreateVector(32, 50),
CreateVector(16, 25),
CreateVector(8, 10)
-})),
+},
+GetIntEnvOrDefault(HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES, DEFAULT_BAYES_OPT_MAX_SAMPLES),
+GetDoubleEnvOrDefault(HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE, DEFAULT_GAUSSIAN_PROCESS_NOISE))),
parameter_chain_(std::vector<ITunableParameter*>{&joint_params_, &hierarchical_allreduce_, &hierarchical_allgather_,
&cache_enabled_}),
active_(false),
-warmup_remaining_(WARMUPS),
+warmup_remaining_(warmups_),
sample_(0),
rank_(-1),
root_rank_(0),
@@ -80,7 +86,7 @@ void ParameterManager::Initialize(int32_t rank, int32_t root_rank,

void ParameterManager::SetAutoTuning(bool active) {
if (active != active_) {
-warmup_remaining_ = WARMUPS;
+warmup_remaining_ = warmups_;
}
active_ = active;
};
@@ -140,8 +146,8 @@ bool ParameterManager::Update(const std::vector<std::string>& tensor_names,
}

for (const std::string& tensor_name : tensor_names) {
-int32_t cycle = tensor_counts_[tensor_name]++;
-if (cycle >= (sample_ + 1) * CYCLES_PER_SAMPLE) {
+int32_t step = tensor_counts_[tensor_name]++;
+if (step >= (sample_ + 1) * steps_per_sample_) {
auto now = std::chrono::steady_clock::now();
double duration = std::chrono::duration_cast<std::chrono::microseconds>(now - last_sample_start_).count();
scores_[sample_] = total_bytes_ / duration;
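
The renamed variables in this hunk make the sampling rule easier to follow: a sample ends once a tensor's step count reaches `(sample_ + 1) * steps_per_sample_`, and the sample's score is throughput in bytes per microsecond. The sketch below restates that logic in isolation. `SampleIsComplete` and `Score` are hypothetical free functions written for illustration, not members of `ParameterManager`, and the sketch assumes a non-zero elapsed time.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical restatement of the check in ParameterManager::Update:
// the sample with index `sample` is complete once a tensor's step count
// reaches (sample + 1) * steps_per_sample.
bool SampleIsComplete(std::int32_t step, std::int32_t sample,
                      std::int32_t steps_per_sample) {
  return step >= (sample + 1) * steps_per_sample;
}

// Hypothetical restatement of the score: bytes processed per microsecond.
// Assumes `elapsed` is at least one microsecond.
double Score(std::int64_t total_bytes,
             std::chrono::steady_clock::duration elapsed) {
  const double micros = static_cast<double>(
      std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count());
  return static_cast<double>(total_bytes) / micros;
}
```

With the default of 10 steps per sample, sample 0 completes at step 10 and sample 1 at step 20.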
@@ -391,10 +397,14 @@ void ParameterManager::CategoricalParameter<T>::ResetState() {
// BayesianParameter
ParameterManager::BayesianParameter::BayesianParameter(
std::vector<BayesianVariableConfig> variables,
-std::vector<Eigen::VectorXd> test_points) :
+std::vector<Eigen::VectorXd> test_points,
+int max_samples,
+double gaussian_process_noise) :
TunableParameter<Eigen::VectorXd>(test_points[0]),
variables_(variables),
test_points_(test_points),
+max_samples_(max_samples),
+gaussian_process_noise_(gaussian_process_noise),
iteration_(0) {
ResetBayes();
Reinitialize(FilterTestPoint(0));
@@ -453,7 +463,7 @@ void ParameterManager::BayesianParameter::OnTune(double score, Eigen::VectorXd&
}

bool ParameterManager::BayesianParameter::IsDoneTuning() const {
-return iteration_ > BAYES_OPT_MAX_SAMPLES;
+return iteration_ > max_samples_;
}

void ParameterManager::BayesianParameter::ResetState() {
@@ -474,7 +484,7 @@ void ParameterManager::BayesianParameter::ResetBayes() {
}
}

-bayes_.reset(new BayesianOptimization(bounds, GAUSSIAN_PROCESS_NOISE));
+bayes_.reset(new BayesianOptimization(bounds, gaussian_process_noise_));
}

Eigen::VectorXd ParameterManager::BayesianParameter::FilterTestPoint(int i) {
@@ -185,7 +185,8 @@ class ParameterManager {
// A set of numerical parameters optimized jointly using Bayesian Optimization.
class BayesianParameter : public TunableParameter<Eigen::VectorXd> {
public:
-BayesianParameter(std::vector<BayesianVariableConfig> variables, std::vector<Eigen::VectorXd> test_points);
+BayesianParameter(std::vector<BayesianVariableConfig> variables, std::vector<Eigen::VectorXd> test_points,
+int max_samples, double gaussian_process_noise);

void SetValue(BayesianVariable variable, double value, bool fixed);
double Value(BayesianVariable variable) const;
@@ -201,6 +202,9 @@ class ParameterManager {

std::vector<BayesianVariableConfig> variables_;
std::vector<Eigen::VectorXd> test_points_;
+int max_samples_;
+double gaussian_process_noise_;
+
uint32_t iteration_;

struct EnumClassHash {
@@ -215,6 +219,9 @@ class ParameterManager {
std::unordered_map<BayesianVariable, int32_t, EnumClassHash> index_;
};

+int warmups_;
+int steps_per_sample_;
+
CategoricalParameter<bool> hierarchical_allreduce_;
CategoricalParameter<bool> hierarchical_allgather_;
CategoricalParameter<bool> cache_enabled_;
@@ -236,7 +243,6 @@ class ParameterManager {
int32_t root_rank_;
std::ofstream file_;
bool writing_;
-
};

} // namespace common
@@ -154,5 +154,10 @@ int GetIntEnvOrDefault(const char* env_variable, int default_value) {
return env_value != nullptr ? std::strtol(env_value, nullptr, 10) : default_value;
}

+double GetDoubleEnvOrDefault(const char* env_variable, double default_value) {
+auto env_value = std::getenv(env_variable);
+return env_value != nullptr ? std::strtod(env_value, nullptr) : default_value;
+}
+
} // namespace common
}
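
To show how the new helper pairs with the existing integer one, here is a standalone sketch of the lookup pattern used for the autotuning knobs: read the variable, parse it if it is set, otherwise fall back to the default. The function names carry a `Sketch` suffix, and `AutotuneSettings` and `LoadAutotuneSettings` are hypothetical, to make clear this is an illustration and not Horovod's code. The variable names and default values are taken from the diff.

```cpp
#include <cstdlib>

// Returns the parsed value when the variable is set, otherwise the default.
// Note: a value that is set but not numeric parses as 0, not the default.
int GetIntEnvOrDefaultSketch(const char* name, int default_value) {
  const char* value = std::getenv(name);
  return value != nullptr
             ? static_cast<int>(std::strtol(value, nullptr, 10))
             : default_value;
}

double GetDoubleEnvOrDefaultSketch(const char* name, double default_value) {
  const char* value = std::getenv(name);
  return value != nullptr ? std::strtod(value, nullptr) : default_value;
}

// Hypothetical aggregate of the four autotuning knobs added in this commit.
struct AutotuneSettings {
  int warmups;
  int steps_per_sample;
  int max_samples;
  double gaussian_process_noise;
};

// Hypothetical loader using the defaults shown in the diff (3, 10, 20, 0.8).
AutotuneSettings LoadAutotuneSettings() {
  return AutotuneSettings{
      GetIntEnvOrDefaultSketch("HOROVOD_AUTOTUNE_WARMUP_SAMPLES", 3),
      GetIntEnvOrDefaultSketch("HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE", 10),
      GetIntEnvOrDefaultSketch("HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES", 20),
      GetDoubleEnvOrDefaultSketch("HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE", 0.8)};
}
```

Because `strtol` and `strtod` are called with a null end pointer, malformed input is not detected. A stricter variant could pass an end pointer and fall back to the default when no characters were consumed.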
@@ -41,6 +41,8 @@ void SetIntFromEnv(const char* env, int& val);

int GetIntEnvOrDefault(const char* env_variable, int default_value);

+double GetDoubleEnvOrDefault(const char* env_variable, double default_value);
+
} // namespace common
} // namespace horovod
