Support backward_passes_per_step > 1 for TF Keras Eager Execution #2371

Merged: 5 commits, Oct 16, 2020

Conversation

Contributor

@aaron276h aaron276h commented Oct 13, 2020

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This PR is a follow-up to #2346. It adds support for backward_passes_per_step > 1 for TF Keras optimizers executing in eager mode. This is one of the features we have built into Determined AI's fork of Horovod and would like to upstream.
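
For readers unfamiliar with the option, here is a minimal, framework-free sketch of the general idea behind backward_passes_per_step > 1: gradients are summed locally over several backward passes and only communicated on the last one. This is not the code in this PR and not Horovod's implementation. The class `LocalGradientAccumulator`, its `step` method, the `average` flag, and the `allreduce` callable are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Optional

Gradients = List[float]


class LocalGradientAccumulator:
    """Hypothetical helper (not a Horovod API): sums gradients locally and
    only calls `allreduce` once every `backward_passes_per_step` passes."""

    def __init__(
        self,
        backward_passes_per_step: int,
        allreduce: Callable[[Gradients], Gradients],
        average: bool = False,
    ) -> None:
        if backward_passes_per_step < 1:
            raise ValueError("backward_passes_per_step must be >= 1")
        self.backward_passes_per_step = backward_passes_per_step
        self.allreduce = allreduce  # stand-in for cross-worker communication
        self.average = average      # divide the local sum by the pass count
        self._sums: Optional[Gradients] = None
        self._passes = 0

    def step(self, grads: Gradients) -> Optional[Gradients]:
        """Record one backward pass.

        Returns None while still accumulating, and the reduced gradients
        on the pass that completes the step.
        """
        if self._sums is None:
            self._sums = [0.0] * len(grads)
        self._sums = [s + g for s, g in zip(self._sums, grads)]
        self._passes += 1

        if self._passes < self.backward_passes_per_step:
            return None  # no communication and no weight update yet

        total = self._sums
        if self.average:
            total = [s / self.backward_passes_per_step for s in total]

        # Reset local state before the next accumulation window.
        self._sums, self._passes = None, 0
        return self.allreduce(total)
```

In this sketch a caller would invoke `step` after every backward pass and apply the optimizer update only when the return value is not None. How the actual change handles this inside the TF Keras optimizer in eager mode is defined by the PR's diff, not by this example.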

@aaron276h aaron276h changed the title [WIP] Support backward_passes_per_step > 1 for TF Keras Eager Execution Support backward_passes_per_step > 1 for TF Keras Eager Execution Oct 13, 2020

@aaron276h
Contributor Author

The PR is currently failing test-cpu-gloo-py3_8-tf2_3_0-keras2_3_1-torch1_6_0-mxnet1_5_0-pyspark3_0_1, but the failure doesn't seem related to these changes; it is likely a Python 3.8 issue.

@tgaddair
Collaborator

Hey @aaron276h, I landed a fix for the Python 3.8 issue. Please rebase and force push.

Signed-off-by: aaron276h <aaron@determined.ai>
Signed-off-by: aaron276h <aaron@determined.ai>
@aaron276h
Contributor Author

Thanks @tgaddair!

Signed-off-by: aaron276h <aaron@determined.ai>
Signed-off-by: aaron276h <aaron@determined.ai>
@github-actions

Unit Test Results

   395 files     395 suites   2h 47m 29s ⏱️
   508 tests    446 ✔️      60 💤   2 ✖️
7 731 runs  5 807 ✔️ 1 911 💤 13 ✖️

results for commit 6a08ba9

Signed-off-by: aaron276h <aaron@determined.ai>
@github-actions

Unit Test Results

   521 files  -     60     521 suites  -60   4h 24m 19s ⏱️ - 5m 23s
   508 tests +       1     482 ✔️ +    1       26 💤 ±    0  0 ✖️ ±0 
9 876 runs  -1 190  7 873 ✔️ -896  2 003 💤 -294  0 ✖️ ±0 

results for commit 9b3b2cf ± comparison against base commit 9aaeb06

Collaborator

@tgaddair tgaddair left a comment

LGTM!

@tgaddair tgaddair merged commit 5f9df54 into horovod:master Oct 16, 2020