
Add SyncBatchNormalization layer for TensorFlow. #2075

Merged
2 commits merged on Jul 13, 2020

Conversation

@romerojosh (Collaborator)

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This PR adds a SyncBatchNormalization layer implementation for TensorFlow, using Horovod to synchronize batch statistics across workers.

Fixes #2066.
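For reference, a minimal usage sketch (not taken from the PR itself): it assumes the new layer is exposed as hvd.SyncBatchNormalization and behaves as a drop-in replacement for tf.keras.layers.BatchNormalization, averaging batch statistics across all Horovod workers during training.

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Assumed drop-in replacement for tf.keras.layers.BatchNormalization:
# batch mean/variance are synchronized across workers via allreduce
# instead of being computed independently on each worker.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding='same'),
    hvd.SyncBatchNormalization(),
    tf.keras.layers.ReLU(),
])
```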

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

@romerojosh requested a review from @tgaddair on July 2, 2020 17:37
@romerojosh (Collaborator, Author)

@tgaddair As discussed, would we be open to removing support for the very old TF release in our tests (v1.6) to unblock this PR?

@tgaddair (Collaborator) commented Jul 8, 2020

Sure, I can take a stab at that today.

[2 commits added, each Signed-off-by: Josh Romero <joshr@nvidia.com>]
@tgaddair (Collaborator) left a comment:

LGTM! I will create an issue around the comment I suggested. Feel free to merge in if you're ready.

worker_mean, worker_variance = super(SyncBatchNormalization, self)._moments(
    inputs, reduction_axes, keep_dims=keep_dims)

if size() > 1:
@tgaddair (Collaborator):
We may want to make this work with dynamic worker count in a follow-up PR.

@romerojosh (Collaborator, Author):
Yes, sounds good to me. Thanks for pointing that out and opening the issue.
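For readers following along, here is a sketch of the synchronization being discussed (assumptions: equal per-worker batch sizes, a fixed worker count, and that hvd.allreduce averages across workers by default; this is illustrative, not the exact PR code). Note that the global variance cannot be obtained by averaging per-worker variances directly; it must be reconstructed from the averaged second moment:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

def sync_moments(worker_mean, worker_variance):
    """Combine per-worker moments into global moments (illustrative only)."""
    if hvd.size() > 1:
        # Average the first moments across workers: E[x].
        group_mean = hvd.allreduce(worker_mean)
        # Per worker, E[x^2] = Var(x) + E[x]^2. Average that across workers,
        # then recover the global variance as
        # Var_global = E[x^2]_global - E[x]_global^2.
        group_second_moment = hvd.allreduce(
            worker_variance + tf.square(worker_mean))
        group_variance = group_second_moment - tf.square(group_mean)
        return group_mean, group_variance
    return worker_mean, worker_variance
```

As the review comment notes, hvd.size() is read once when the graph is built, so supporting a dynamic worker count would need a follow-up change.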

@weiminggao

I tested the batch norm layer in TF 1.14 with graph mode, and it does not work. Why?

Linked issue: Sync Batch Norm for Tensorflow (#2066) · 3 participants