Add performance analysis to hvd #412

Merged
Hedingber merged 10 commits into development from add-performance-analysis-to-hvd on Aug 19, 2020

Contributor

@zilbermanor zilbermanor commented Aug 19, 2020

Added easy performance analysis and tuning capabilities to MLRun's MPIJob runtime.

Horovod Autotune

<mpi-function>.with_autotune(log_file_path: str, warmup_samples: int, steps_per_sample: int, bayes_opt_max_samples: int, gaussian_process_noise: float)

Adds an autotuner that helps optimize Horovod's parameters for better performance. The autotuner collects metrics and tunes Horovod's parameters during the run using Bayesian optimization. This may reduce performance early in the run, but performance should improve once the best parameters have been found.

Since autotuning trades early performance for better performance later on, it is advised to enable it only when both of the following hold:

  • Training is expected to take a long time
  • Scaling efficiency was found lacking with the default settings
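
As a rough illustration of what these parameters control, the sketch below maps them onto the environment variables that Horovod itself reads for autotuning. The helper name `autotune_env` and its optional defaults are hypothetical and are not taken from this PR; how `with_autotune` actually passes the settings to the job is an assumption here.

```python
from typing import Dict, Optional


def autotune_env(
    log_file_path: Optional[str] = None,
    warmup_samples: Optional[int] = None,
    steps_per_sample: Optional[int] = None,
    bayes_opt_max_samples: Optional[int] = None,
    gaussian_process_noise: Optional[float] = None,
) -> Dict[str, str]:
    """Hypothetical helper: build the Horovod autotune environment variables.

    Only settings that were explicitly given are emitted, so Horovod's own
    defaults apply to everything else.
    """
    # HOROVOD_AUTOTUNE=1 switches the autotuner on.
    env = {"HOROVOD_AUTOTUNE": "1"}
    optional = {
        "HOROVOD_AUTOTUNE_LOG": log_file_path,
        "HOROVOD_AUTOTUNE_WARMUP_SAMPLES": warmup_samples,
        "HOROVOD_AUTOTUNE_STEPS_PER_SAMPLE": steps_per_sample,
        "HOROVOD_AUTOTUNE_BAYES_OPT_MAX_SAMPLES": bayes_opt_max_samples,
        "HOROVOD_AUTOTUNE_GAUSSIAN_PROCESS_NOISE": gaussian_process_noise,
    }
    # Environment variable values must be strings.
    env.update({key: str(value) for key, value in optional.items() if value is not None})
    return env


if __name__ == "__main__":
    for key, value in autotune_env(log_file_path="/tmp/autotune.csv", warmup_samples=5).items():
        print(f"{key}={value}")
```

Writing the autotune log to a file makes it possible to inspect which parameter combinations were tried during the run.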

Horovod Timeline

<mpi-function>.with_tracing(log_file_path: str, enable_cycle_markers: bool)

Adds Horovod Timeline activity tracking to the job so its performance can be analyzed. The data is saved as JSON to {log_file_path}. It can then be viewed in a trace viewer such as Chrome's chrome://tracing or Edge's edge://tracing.
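
For orientation, the two tracing parameters correspond to the environment variables Horovod reads for its Timeline feature. The sketch below shows that mapping. The helper name `timeline_env` and the `False` default are hypothetical, and whether `with_tracing` sets exactly these variables is an assumption rather than something stated in this PR.

```python
from typing import Dict


def timeline_env(log_file_path: str, enable_cycle_markers: bool = False) -> Dict[str, str]:
    """Hypothetical helper: build the Horovod Timeline environment variables."""
    # HOROVOD_TIMELINE is the path of the JSON trace file Horovod writes.
    env = {"HOROVOD_TIMELINE": log_file_path}
    if enable_cycle_markers:
        # Marks Horovod's processing cycles in the trace.
        env["HOROVOD_TIMELINE_MARK_CYCLES"] = "1"
    return env


if __name__ == "__main__":
    for key, value in timeline_env("/tmp/timeline.json", enable_cycle_markers=True).items():
        print(f"{key}={value}")
```

Cycle markers add extra events to the trace, so they are left off unless requested.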

Contributor

@Hedingber Hedingber left a comment

one little typo

mlrun/runtimes/mpijob/abstract.py (outdated, resolved)
@Hedingber Hedingber merged commit bc992e9 into development Aug 19, 2020
@zilbermanor zilbermanor deleted the add-performance-analysis-to-hvd branch November 30, 2020 10:35