---
title: "OSMI"
linkTitle: "OSMI"
weight: 100
description: >
  We explore the relationship between certain network configurations and the performance of distributed Machine
  Learning systems. We build upon the Open Surrogate Model Inference (OSMI) Benchmark, a distributed inference
  benchmark for analyzing the performance of machine-learned surrogate models.
---

## Abstract

We explore the relationship between certain network configurations and
the performance of distributed Machine Learning systems. We build upon
the Open Surrogate Model Inference (OSMI) Benchmark, a distributed
inference benchmark for analyzing the performance of machine-learned
surrogate models developed by Wes Brewer et al. We focus on analyzing
distributed machine-learning systems, via machine-learned surrogate
models, across varied hardware environments. By deploying the OSMI
Benchmark on platforms such as the Rivanna HPC system, WSL, and
Ubuntu, we offer a comprehensive study of system performance under
different configurations. The paper presents insights into optimizing
distributed machine learning systems, enhancing their scalability and
efficiency. We also develop a framework for automating the OSMI
benchmark.

## Introduction

With the proliferation of machine learning as a tool for science, the
need for efficient and scalable systems is paramount. This paper
explores the Open Surrogate Model Inference (OSMI) Benchmark, a tool
for testing the performance of machine-learning systems via
machine-learned surrogate models. The OSMI Benchmark, originally
created by Wes Brewer and colleagues, serves to evaluate various
configurations and their impact on system performance.

Our research pivots around the deployment and analysis of the OSMI
Benchmark across various hardware platforms, including the
high-performance computing (HPC) system Rivanna, Windows Subsystem for
Linux (WSL), and Ubuntu environments.

In each experiment, there is a variable number of TensorFlow model
server instances, overseen by an HAProxy load balancer that distributes
inference requests among the servers. Each server instance operates on
a dedicated GPU, choosing between the V100 or A100 GPUs available on
Rivanna. This setup mirrors real-world scenarios where load balancing
is crucial for system efficiency.

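As a minimal sketch of this setup, the snippet below starts one
TensorFlow model server per GPU, each pinned through
`CUDA_VISIBLE_DEVICES`. The ports, model name, and paths are
placeholders, and HAProxy is assumed to be configured separately to
distribute requests across the listed ports; the actual experiments
configure these details through the Python interface described later.

```python
import os
import subprocess

# Illustrative values only; the real OSMI experiments set these elsewhere.
MODEL_NAME = "osmi_model"            # placeholder model name
MODEL_PATH = "/path/to/saved_model"  # placeholder SavedModel directory
BASE_GRPC_PORT = 8500
NUM_SERVERS = 4                      # one server per dedicated GPU

processes = []
for i in range(NUM_SERVERS):
    env = os.environ.copy()
    # Pin each TensorFlow model server to its own GPU.
    env["CUDA_VISIBLE_DEVICES"] = str(i)
    cmd = [
        "tensorflow_model_server",
        f"--port={BASE_GRPC_PORT + i}",                 # gRPC port
        f"--rest_api_port={BASE_GRPC_PORT + 100 + i}",  # REST port
        f"--model_name={MODEL_NAME}",
        f"--model_base_path={MODEL_PATH}",
    ]
    processes.append(subprocess.Popen(cmd, env=env))

# HAProxy (configured separately) lists the per-instance ports as
# backends and balances client requests among them.
for p in processes:
    p.wait()
```
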
On the client side, we initiate a variable number of concurrent
clients executing the OSMI benchmark to simulate different levels of
system load and analyze the corresponding inference throughput.

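The sketch below illustrates the idea of concurrent load generation;
it is not the actual OSMI client. The endpoint, batch shape, and
request counts are assumptions, and each client posts to the load
balancer's TensorFlow Serving REST predict endpoint and reports its
own throughput.

```python
import json
import time
from concurrent.futures import ProcessPoolExecutor

import requests

# Placeholder endpoint: the HAProxy frontend in front of the model servers.
ENDPOINT = "http://localhost:8080/v1/models/osmi_model:predict"
N_CLIENTS = 8          # number of concurrent clients
N_REQUESTS = 100       # requests per client
BATCH = [[0.0] * 16]   # placeholder input shape; the real models differ

def run_client(client_id: int) -> float:
    """Send N_REQUESTS inference requests and return this client's throughput."""
    payload = json.dumps({"instances": BATCH})
    start = time.time()
    for _ in range(N_REQUESTS):
        requests.post(ENDPOINT, data=payload, timeout=60)
    elapsed = time.time() - start
    return N_REQUESTS / elapsed  # inferences per second for this client

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=N_CLIENTS) as pool:
        rates = list(pool.map(run_client, range(N_CLIENTS)))
    print(f"aggregate throughput: {sum(rates):.1f} inferences/s")
```
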
On top of the original OSMI-Bench, we implemented an object-oriented
interface in Python for running experiments with ease, streamlining
the process of benchmarking and analysis. The experiments rely on
custom-built images based on NVIDIA's TensorFlow image. The code works
on several hardware platforms, assuming the proper images are built.

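The interface itself is not reproduced here; the sketch below only
suggests the kind of object-oriented wrapper the text refers to, and
all class, parameter, and value names are hypothetical.

```python
import subprocess
from dataclasses import dataclass

@dataclass
class OSMIExperiment:
    """Hypothetical wrapper bundling one benchmark configuration."""
    model: str          # e.g. "small", "medium", "large"
    num_servers: int    # TensorFlow model server instances
    num_clients: int    # concurrent benchmark clients
    gpu: str            # e.g. "v100" or "a100"

    def run(self) -> None:
        # In the real interface this would launch the servers, HAProxy,
        # and clients; here we only show the shape of the call.
        cmd = ["echo", f"run {self.model} servers={self.num_servers} "
                       f"clients={self.num_clients} gpu={self.gpu}"]
        subprocess.run(cmd, check=True)

OSMIExperiment(model="medium", num_servers=2, num_clients=8, gpu="a100").run()
```
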
Additionally, we develop a script for launching simultaneous
experiments with permutations of pre-defined parameters using the
Cloudmesh Experiment Executor. The Experiment Executor is a tool that
automates the generation and execution of experiment variations with
different parameters. This automation is crucial for conducting tests
across a spectrum of scenarios.

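The Experiment Executor's own configuration format is not shown here;
the following sketch only illustrates the underlying idea of sweeping
the Cartesian product of pre-defined parameters. The parameter names
and values are placeholders.

```python
from itertools import product

# Placeholder parameter grid; the actual sweep values live in the
# experiment configuration consumed by the Cloudmesh Experiment Executor.
parameters = {
    "model": ["small", "medium", "large"],
    "num_servers": [1, 2, 4],
    "num_clients": [1, 8, 16],
}

# Generate one experiment per permutation of the parameter values.
for values in product(*parameters.values()):
    config = dict(zip(parameters.keys(), values))
    print(f"launch experiment with {config}")
```
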
Finally, we analyze the inference throughput and total time for each
experiment. By graphing and examining these results, we draw critical
insights into the performance dynamics of distributed machine learning
systems.

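As an illustration of this analysis step, the sketch below plots
aggregate throughput against the number of concurrent clients; the
numbers are placeholders, not measured results.

```python
import matplotlib.pyplot as plt

# Placeholder values for illustration only; real runs derive these
# numbers from the benchmark output files.
clients = [1, 2, 4, 8, 16]
throughput = [120, 230, 410, 650, 700]  # inferences/s (illustrative)

plt.plot(clients, throughput, marker="o")
plt.xlabel("concurrent clients")
plt.ylabel("throughput (inferences/s)")
plt.title("Example throughput scaling (illustrative data)")
plt.savefig("throughput.png")
```
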
In summary, a comprehensive examination of the OSMI Benchmark in
diverse distributed ML systems is provided. We aim to contribute to
the optimization of these systems by providing a framework for
finding the best-performing system configuration for a given use
case. Our findings pave the way for more efficient and scalable
distributed computing environments.

[^1][^2]

## References

[^1]: Brewer, Wesley, Daniel Martinez, Mathew Boyer, Dylan Jude, Andy
Wissink, Ben Parsons, Junqi Yin, and Valentine Anantharaj. "Production
Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC." In
2021 IEEE/ACM Workshop on Machine Learning in High Performance
Computing Environments (MLHPC), pp. 21-32. IEEE,
2021. <https://ieeexplore.ieee.org/abstract/document/9652868>. Note
that OSMI-Bench differs from SMI-Bench described in the paper only in
that the models used in OSMI are trained on synthetic data,
whereas the models in SMI were trained using data from proprietary CFD
simulations. Also, the OSMI medium and large models are very similar
architectures to the SMI medium and large models, but not identical.

[^2]: Gregor von Laszewski, J. P. Fleischer, and Geoffrey
C. Fox. 2022. Hybrid Reusable Computational Analytics Workflow
Management with Cloudmesh. <https://doi.org/10.48550/ARXIV.2210.16941>

---
title: "Cosmoflow"
linkTitle: "Cosmoflow"
weight: 15
description: >
  The CosmoFlow training application benchmark from the MLPerf HPC v0.5 benchmark suite. It involves training a 3D convolutional neural network on N-body cosmology simulation data to predict physical parameters of the universe.
resources:
- src: "**.{png,jpg}"
  title: "Image #:counter"
---

## Overview

This application is based on the original CosmoFlow paper presented at SC18 and continued by the ExaLearn project, and adopted as a benchmark in the MLPerf HPC suite. It involves training a 3D convolutional neural network on N-body cosmology simulation data to predict physical parameters of the universe. The reference implementation for MLPerf HPC v0.5 CosmoFlow uses TensorFlow with the Keras API and Horovod for data-parallel distributed training. The dataset comes from simulations run by ExaLearn, with universe volumes split into cubes of size 128x128x128 with 4 redshift bins. The total dataset volume preprocessed for MLPerf HPC v0.5 in TFRecord format is 5.1 TB. The target objective in MLPerf HPC v0.5 is to train the model to a validation mean-average-error < 0.124. However, the problem size can be scaled down and the training throughput can be used as the primary objective for a small-scale or shorter-timescale benchmark.[^1][^2][^3]

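To make the data-parallel training setup concrete, the sketch below
shows a minimal Keras 3D convolutional model wrapped with Horovod, in
the spirit of the reference implementation. The layer sizes, output
dimension, and data pipeline are placeholders, not the actual
CosmoFlow network.

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU under mpirun/srun

# Pin each Horovod process to a single local GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

# Placeholder network: a small 3D CNN over 128^3 volumes with 4 channels
# (redshift bins), regressing the cosmological target parameters.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 128, 4)),
    tf.keras.layers.Conv3D(16, 3, activation="relu"),
    tf.keras.layers.MaxPool3D(2),
    tf.keras.layers.Conv3D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(4),  # output size is illustrative
])

# Scale the learning rate with the number of workers and wrap the optimizer.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(1e-3 * hvd.size()))
model.compile(optimizer=opt, loss="mae")

# A dataset built from the preprocessed TFRecord files is omitted here:
# model.fit(dataset, epochs=..., callbacks=[
#     hvd.callbacks.BroadcastGlobalVariablesCallback(0)])
```
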
## References

[^1]: <https://proxyapps.exascaleproject.org/app/mlperf-cosmoflow/>

[^2]: <https://github.com/sparticlesteve/cosmoflow-benchmark>

[^3]: <https://github.com/sparticlesteve/cosmoflow-benchmark/blob/master/README.md>