[WIP] model averaging benchmark #83

lgarithm · 2019-06-09T13:14:18Z

No description provided.

lgarithm · 2019-06-10T11:58:52Z

srcs/cpp/src/tensorflow/ops/peer_to_peer.cpp

-            other_flt      = 0.5 * (input.flat<float>() + other.flat<float>());
-            std::copy(other.tensor_data().begin(), other.tensor_data().end(),
-                      const_cast<char *>(input.tensor_data().begin()));
+            // FIXME: don't write to input tensor


@luomai @andrei3131 I think the memory bandwidth consumed by AverageAssign is 2/11 of that consumed by the original implementation.

AverageAssign performs a single std::transform which operates on 2 tensors.

The original performs two std::copy calls, each operating on 2 tensors,
and I assume the expression other_flt = 0.5 * (input.flat<float>() + other.flat<float>()); operates on 2 + 2 + 3 tensors. That gives 2 × 2 + 7 = 11 tensor accesses for the original, versus 2 for AverageAssign.

I noticed that this implementation gives a performance benefit over the GPU version on platypus2 with ResNet-32, but on 8 V100 GPUs with ResNet-50 it performs worse than GPU averaging (using TF operators executed on the GPU).

This operator is registered on CPU only; will it run on GPU?

@andrei3131 means doing averaging using TensorFlow operators.

@andrei3131 Could you point Guo to the model averaging (GPU) operator?

@lgarithm We are not currently using this model averaging operator as the main one in our experiments, because we found that averaging on the CPU does not let ResNet-50 converge, whereas averaging on the GPU does.

@luomai I see, that will definitely be better if the GPU is powerful.
But I think this commit would still help improve the CPU operator.


lgarithm added 5 commits June 9, 2019 20:14

init fake_model_averaging.py

5f2a256

barrier

05761b5

copy *C.char into go string before lambda

bfba76a

request_mode='async', peer_selection_strategy='roundrobin'

7aa866b

simplify ModelStore

3107149

lgarithm mentioned this pull request Jun 9, 2019

Synchronous P2P Request Latency Too Large #67

Closed

inplace AverageAssign

6a0dd73

lgarithm commented Jun 10, 2019


lgarithm added a commit that referenced this pull request Jun 11, 2019

* copy *C.char into go string before entering goroutine

4abe5fc

* cleanup model store cherrypicked from #83

lgarithm added a commit that referenced this pull request Jun 11, 2019

* copy *C.char into go string before entering goroutine (#85)

f1868ba

* cleanup model store cherrypicked from #83

merge master

d898aab

lgarithm added a commit that referenced this pull request Jun 14, 2019

cherrypick fix from #83

3611d63

lgarithm closed this Oct 19, 2019

lgarithm deleted the lg-model-ave branch October 19, 2019 17:09


lgarithm commented Jun 9, 2019

lgarithm Jun 10, 2019

andrei3131 Jun 10, 2019

lgarithm Jun 10, 2019

luomai Jun 10, 2019

luomai Jun 10, 2019

lgarithm Jun 10, 2019
