
Add SyncBatchNormalization layer for TensorFlow. #2075

Merged
2 commits merged on Jul 13, 2020

Conversation

@romerojosh (Collaborator)

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This PR adds a SyncBatchNormalization layer implementation for TensorFlow, using Horovod to synchronize batch statistics across workers.

Fixes #2066.
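For reference, a minimal usage sketch (not taken from the PR itself): it assumes the new layer is exposed as hvd.SyncBatchNormalization and behaves as a drop-in replacement for tf.keras.layers.BatchNormalization, averaging batch statistics across all Horovod workers during training.

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Assumed drop-in replacement for tf.keras.layers.BatchNormalization:
# batch mean/variance are synchronized across workers via allreduce
# instead of being computed independently on each worker.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding='same'),
    hvd.SyncBatchNormalization(),
    tf.keras.layers.ReLU(),
])
```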

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

@romerojosh requested a review from @tgaddair on July 2, 2020 17:37
@romerojosh (Collaborator, Author)

@tgaddair As discussed, would we be open to removing support for the very old TF release in our tests (v1.6) to unblock this PR?

@tgaddair (Collaborator) commented Jul 8, 2020

Sure, I can take a stab at that today.

[2 commits added, each Signed-off-by: Josh Romero <joshr@nvidia.com>]
@tgaddair (Collaborator) left a comment:

LGTM! I will create an issue around the comment I suggested. Feel free to merge in if you're ready.

worker_mean, worker_variance = super(SyncBatchNormalization, self)._moments(
    inputs, reduction_axes, keep_dims=keep_dims)

if size() > 1:
@tgaddair (Collaborator):
We may want to make this work with dynamic worker count in a follow-up PR.

@romerojosh (Collaborator, Author):
Yes, sounds good to me. Thanks for pointing that out and opening the issue.
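For readers following along, here is a sketch of the synchronization being discussed (assumptions: equal per-worker batch sizes, a fixed worker count, and that hvd.allreduce averages across workers by default; this is illustrative, not the exact PR code). Note that the global variance cannot be obtained by averaging per-worker variances directly; it must be reconstructed from the averaged second moment:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

def sync_moments(worker_mean, worker_variance):
    """Combine per-worker moments into global moments (illustrative only)."""
    if hvd.size() > 1:
        # Average the first moments across workers: E[x].
        group_mean = hvd.allreduce(worker_mean)
        # Per worker, E[x^2] = Var(x) + E[x]^2. Average that across workers,
        # then recover the global variance as
        # Var_global = E[x^2]_global - E[x]_global^2.
        group_second_moment = hvd.allreduce(
            worker_variance + tf.square(worker_mean))
        group_variance = group_second_moment - tf.square(group_mean)
        return group_mean, group_variance
    return worker_mean, worker_variance
```

As the review comment notes, hvd.size() is read once when the graph is built, so supporting a dynamic worker count would need a follow-up change.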

@weiminggao

I tested the batch norm layer in TF 1.14 with graph mode, and it does not work. Why?

Linked issue: Sync Batch Norm for Tensorflow (#2066) · 3 participants