Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Horovod Timeline and add Cycle Markers #782

Merged
merged 10 commits into from Jan 26, 2019

Conversation

4 participants
@alsrgv
Copy link
Collaborator

alsrgv commented Jan 25, 2019

This change introduces a separate thread for writing timeline to disk that uses a lock-free queue to receive timeline records from the main thread.

With this change, tensor accounting during negotiation is taking 1.0-1.2 microsecond per tensor x rank, while it used to take 3+ microseconds.

@alsrgv alsrgv requested review from abditag2 , tgaddair and sblotner Jan 25, 2019

@alsrgv alsrgv self-assigned this Jan 25, 2019

@abditag2
Copy link
Collaborator

abditag2 left a comment

Looks great.

for (unsigned rc = 0; rc < horovod_global.size; rc++) {
for (unsigned ec = 0; ec < entries.size(); ec++) {
for (int rc = 0; rc < horovod_global.size; rc++) {
for (size_t ec = 0; ec < entries.size(); ec++) {

This comment has been minimized.

@abditag2

abditag2 Jan 25, 2019

Collaborator

any reason you use int in one loop and size_t in the other?

This comment has been minimized.

@alsrgv

alsrgv Jan 25, 2019

Author Collaborator

Underlying types are different. horovod_global.size is int, while entries.size() are size_t. This change is to avoid compilation warnings.


def test_timeline(self):
with tempfile.NamedTemporaryFile() as t:
os.environ['HOROVOD_TIMELINE'] = t.name

This comment has been minimized.

@tgaddair

tgaddair Jan 25, 2019

Collaborator

Safer thing to do would be to use:

from unittest.mock import patch
...
with patch.dict('os.environ', {'HOROVOD_TIMELINE': t.name, 'HOROVOD_TIMELINE_MARK_CYCLES': '1'}):
   ...
@sblotner
Copy link
Collaborator

sblotner left a comment

Minor doc edit, otherwise looks good


![Cycle Markers](https://user-images.githubusercontent.com/16640218/51659458-64806100-1f5f-11e9-9a27-ba934ceec75f.png)

Since this information makes timeline view very crowded it is not enabled by default.

This comment has been minimized.

@sblotner

sblotner Jan 25, 2019

Collaborator

add a comma after "crowded"

alsrgv added some commits Jan 25, 2019

@alsrgv alsrgv force-pushed the cycle_marker branch 4 times, most recently from ba04ef0 to d6a150a Jan 26, 2019

@alsrgv alsrgv force-pushed the cycle_marker branch from d6a150a to 4fe2ee1 Jan 26, 2019

@alsrgv alsrgv merged commit c314949 into master Jan 26, 2019

3 checks passed

License Compliance All checks passed.
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details

@alsrgv alsrgv deleted the cycle_marker branch Jan 26, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.