
MLAgility Tools User Guide

The MLAgility Benchmarking and Tools package provides a CLI, benchit, and Python API for benchmarking machine learning and deep learning models. This document reviews the functionality provided by MLAgility. If you are looking for repo and code organization, you can find that here.

For a hands-on learning approach, check out the benchit CLI tutorials.

MLAgility's tools currently support the following combinations of runtimes and devices:

| Device Type | Device arg | Runtime | Runtime arg | Specific Devices |
| --- | --- | --- | --- | --- |
| Nvidia GPU | nvidia | TensorRT† | trt | Any Nvidia GPU supported by TensorRT |
| x86 CPU | x86 | ONNX Runtime‡, PyTorch Eager§, PyTorch 2.x Compiled*§ | ort, torch-eager, torch-compiled | Any Intel or AMD CPU supported by the runtime |
| Groq | groq | GroqFlow | groq | GroqChip1 |

† Requires TensorRT >= 8.5.2
‡ Requires ONNX Runtime >= 1.13.1
\* Requires PyTorch >= 2.0.0
§ Only available on local backend

Table of Contents

Just Benchmark It

The simplest way to get started with MLAgility's tools is to use our benchit command line interface (CLI), which can take any Python script that instantiates and calls PyTorch model(s) and benchmark them on any supported device and runtime.

On your command line:

pip install mlagility
benchit your_script.py --device x86

Example Output:

> Performance of YourModel on device Intel® Xeon® Platinum 8380 is:
> latency: 0.033 ms
> throughput: 21784.8 ips

Where your_script.py is a Python script that instantiates and executes a PyTorch model named YourModel. The benchmarking results are also saved to a build directory in the MLAgility cache (see Build).

The benchit CLI performs the following steps:

  1. Analysis: profile the Python script to identify the PyTorch models within
  2. Build: call the benchmark_script() API to prepare each model for benchmarking
  3. Benchmark: call the benchmark_model() API on each model to gather performance statistics

Note: The benchmarking methodology is defined here. If you are looking for more detailed instructions on how to install mlagility, you can find that here.

For a detailed example, see the CLI Hello World tutorial.

benchit can also benchmark ONNX files with a command like benchit your_model.onnx. See the CLI ONNX tutorial for details. However, the majority of this document focuses on the use case of passing .py scripts as input to benchit.

The MLAgility API

Most of the functionality provided by the benchit CLI is also available in the MLAgility API:

  • mlagility.benchmark_script() provides the same benchmarking functionality as the benchit CLI: it takes a script and target device, and returns performance results.
  • mlagility.benchmark_model() provides a subset of this functionality: it takes a model and its inputs, and returns performance results.
    • The main difference is that benchmark_model() does not include the Analysis feature, and benchmark_script() does.

Generally speaking, the benchit CLI is a command line interface for the benchmark_script() API, which internally calls benchmark_model(). You can read more about this code organization here.

For example, the following script:

from mlagility import benchmark_model

model = YourModel()                    # any torch.nn.Module instantiated in your code
results = model(**inputs)              # inputs: a dict of example inputs with the shapes to benchmark
perf = benchmark_model(model, inputs)

Will print an output like this:

> Performance of YourModel on device Intel® Xeon® Platinum 8380 is:
> latency: 0.033 ms
> throughput: 21784.8 ips

benchmark_model() returns a MeasuredPerformance object that includes members:

  • latency_units: unit of time used for measuring latency, which is set to milliseconds (ms).
  • mean_latency: average benchmarking latency, measured in latency_units.
  • throughput_units: unit used for measuring throughput, which is set to inferences per second (IPS).
  • throughput: average benchmarking throughput, measured in throughput_units.
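
For example, a minimal sketch of reading these fields (using a toy torch.nn.Linear as a stand-in for your own model):

```python
import torch
from mlagility import benchmark_model

# Stand-in model and inputs; substitute your own torch.nn.Module and input dict
model = torch.nn.Linear(10, 5)
inputs = {"input": torch.rand(1, 10)}

perf = benchmark_model(model, inputs)
print(f"Latency: {perf.mean_latency} {perf.latency_units}")        # e.g., 0.033 ms
print(f"Throughput: {perf.throughput} {perf.throughput_units}")    # e.g., 21784.8 ips
```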

Note: The benchmarking methodology is defined here.

Definitions

MLAgility uses the following definitions throughout.

Model

A model is a PyTorch (torch.nn.Module) instance that has been instantiated in a Python script.

  • Examples: BERT-Base, ResNet-50, etc.

Device

A device is a piece of hardware capable of running a model.

  • Examples: Nvidia A100 40GB, Intel Xeon Platinum 8380, Groq GroqChip1

Runtime

A runtime is a piece of software that executes a model on a device.

  • Different runtimes can produce different performance results on the same device because:
    • Runtimes often optimize the model prior to execution.
    • The runtime is responsible for orchestrating data movement, device invocation, etc.
  • Examples: ONNX Runtime, TensorRT, PyTorch Eager Execution, etc.

Analysis

Analysis is the process by which benchmark_script() inspects a Python script and identifies the PyTorch models within.

benchmark_script() performs analysis by running and profiling your script. When a model object (see Model) is encountered, it is inspected to gather statistics (such as the number of parameters in the model) and/or passed to the benchmark_model() API for benchmarking.

Note: the benchit CLI and benchmark_script() API both run your entire script. Please ensure that your script is safe to run, especially if you got it from the internet.

See the Multiple Models per Script tutorial for a detailed example of how analysis can discover multiple models from a single script.
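
For reference, here is a minimal, hypothetical script whose model analysis would discover (the file and class names are illustrative only):

```python
# hypothetical_script.py: analysis profiles this script as it runs
import torch

class SmallModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(10, 5)

    def forward(self, x):
        return self.fc(x)

pytorch_model = SmallModel()
inputs = {"x": torch.rand(1, 10)}

# The model must actually be called for the profiler to observe it
pytorch_model(**inputs)
```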

Model Hashes

Each model in a script is identified by a unique hash. The analysis phase of benchmark_script() will display the hash for each model. The build phase will save exported models into the cache according to the naming scheme {script_name}_{hash}.

For example:

benchit example.py --analyze-only

> pytorch_model (executed 1x - 0.15s)
>        Model Type:     Pytorch (torch.nn.Module)
>        Class:          SmallModel (<class 'linear_auto.SmallModel'>)
>        Location:       linear_auto.py, line 19
>        Parameters:     55 (<0.1 MB)
>        Hash:           479b1332

Labels

Each script may have one or more labels, which correspond to a set of key-value pairs that can be used as attributes of that given script. Labels must be in the first line of a .py file and are identified by the pragma #labels. Keys are separated from values by ::, and each label key may have one or more label values.

For example:

#labels domain::nlp author::google task::question_answering,translation

Once a script has been benchmarked, all labels that correspond to that script will also be stored as part of the cache folder.

Build

Build is the process by which the benchmark_model() API consumes a model and produces ONNX files, Groq executables, and other artifacts needed for benchmarking.

We refer to this collection of artifacts as the build directory and store each build in the MLAgility cache for later use.

We leverage ONNX files because of their broad compatibility with model frameworks (PyTorch, Keras, etc.), software (ONNX Runtime, TensorRT, Groq Compiler, etc.), and devices (CPUs, GPUs, GroqChip processors, etc.). You can learn more about ONNX here.

The build functionality of benchmark_model() includes the following steps:

  1. Take a model object and a corresponding set of inputs*.
  2. Check the cache for a successful build we can load. If we get a cache hit, the build is done. If no build is found, or the build in the cache is stale**, continue.
  3. Pass the model and inputs to the ONNX exporter corresponding to the model's framework (e.g., PyTorch models use torch.onnx.export()).
  4. Use ONNX Runtime and ONNX ML tools to optimize the model and convert it to float16, respectively.
  5. [If the build's device type is groq] Pass the optimized float16 ONNX file to Groq Compiler and Assembler to produce a Groq executable.
  6. Save the successful build to the cache for later use.

*Note: Each build corresponds to a set of static input shapes. inputs are passed into the benchmark_model() API to provide those shapes.

**Note: A cached build can be stale because of any of the following changes since the last build:

  • The model changed
  • The shape of the inputs changed
  • The arguments to benchmark_model() changed
  • MLAgility was updated to a new, incompatible version
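
For intuition, here is a hedged sketch of roughly what the export in step 3 amounts to for a PyTorch model; benchmark_model() performs this for you, along with the later optimization and float16 stages:

```python
import torch

model = torch.nn.Linear(10, 5)        # stand-in for any torch.nn.Module
example_input = torch.rand(1, 10)     # the static input shape that defines this build

# Roughly what the export step does for PyTorch models (illustrative only)
torch.onnx.export(
    model,
    (example_input,),
    "model.onnx",
    input_names=["input"],
    opset_version=16,  # assumption: any recent opset; see the ONNX Opset section
)
```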

Benchmark

Benchmark is the process by which the benchmark_model() API collects performance statistics about a model. Specifically, benchmark_model() takes a build of a model and executes it on a target device using target runtime software (see Devices and Runtimes).

By default, benchmark_model() will run the model 100 times to collect the following statistics:

  1. Mean Latency, in milliseconds (ms): the average time it takes the runtime/device combination to execute the model/inputs combination once. This includes the time spent invoking the device and transferring the model's inputs and outputs between host memory and the device (when applicable).
  2. Throughput, in inferences per second (IPS): the number of times the model/inputs combination can be executed on the runtime/device combination per second.
    • Note: benchmark_model() is not aware of whether inputs is a single input or a batch of inputs. If your inputs is actually a batch of inputs, you should multiply benchmark_model()'s reported IPS by the batch size.
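
For example, a minimal sketch of applying that batch-size correction (toy model; your own model and batch size will differ):

```python
import torch
from mlagility import benchmark_model

batch_size = 8
model = torch.nn.Linear(10, 5)
inputs = {"input": torch.rand(batch_size, 10)}   # a batch of 8 samples

perf = benchmark_model(model, inputs)
# Reported IPS counts whole-batch executions, so scale by the batch size
samples_per_second = perf.throughput * batch_size
```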

Devices and Runtimes

MLAgility can be used to benchmark a model across a variety of runtimes and devices, as long as the device is available and the device/runtime combination is supported by MLAgility.

Available Devices

MLAgility supports benchmarking on both locally installed devices (including x86 CPUs / NVIDIA GPUs), as well as devices on remote machines (e.g., remote VMs).

If you are using a remote machine, it must:

  • be turned on
  • be available via SSH
  • include the target device
  • have miniconda, python>=3.8, and docker>=20.10 installed

When you call the benchit CLI or benchmark_model(), the following actions are performed on your behalf:

  1. Perform a build, which exports all models from the script to ONNX and prepares for benchmarking.
    • If the device type selected is groq, this step also compiles the ONNX file into a Groq executable.
  2. [Remote mode only] ssh into the remote machine and transfer the build.
  3. Set up the benchmarking environment by loading a container and/or setting up a conda environment.
  4. Run the benchmarks.
  5. [Remote mode only] Transfer the results back to your local machine.

Arguments

The following arguments are used to configure benchit and the APIs to target a specific device and runtime:

Devices

Specify a device type that will be used for benchmarking.

Usage:

  • benchit benchmark INPUT_FILES --device TYPE
    • Benchmark the model(s) in INPUT_FILES on a locally installed device of type TYPE (e.g., a locally installed Nvidia device).

Valid values of TYPE include:

  • x86 (default): Intel and AMD x86 CPUs.
  • nvidia: Nvidia GPUs.
  • groq: Groq GroqChip processors.

Note: MLAgility is flexible with respect to which specific devices can be used, as long as they meet the requirements in the Devices and Runtimes table.

  • benchit and its APIs will simply use whichever device of the given TYPE is available on the machine.
  • For example, if you specify --device nvidia on a machine with an Nvidia A100 40GB installed, then MLAgility will use that Nvidia A100 40GB device.

Also available as API arguments:

  • benchmark_script(device=...)
  • benchmark_model(device=...).
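
For example, a sketch of selecting a device from the API (assuming input_scripts accepts a list of script paths; see Input Files below):

```python
from mlagility import benchmark_script

# Benchmark the models discovered in your_script.py on a local Nvidia GPU
benchmark_script(input_scripts=["your_script.py"], device="nvidia")
```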

For a detailed example, see the CLI Nvidia tutorial.

Backend (future feature)

Future feature, not yet supported.

Indicates whether the device is installed on the local machine or a remote machine. A device on a remote machine is accessed over SSH (currently via Okta SFT; see the note below).

Usage:

  • benchit benchmark INPUT_FILES --backend BACKEND

Valid values:

  • Defaults to local, indicating the device is installed on the local machine.
  • This can also be set to remote, indicating the target device is installed on a remote machine.

Note: while --backend remote is implemented, and we use it for our own purposes, it has some limitations and we do not recommend using it. The limitations are:

  • Currently requires Okta SFT authentication, which not everyone will have.
  • Not covered by our automatic testing yet.
  • Not all runtimes may be supported when using the remote backend as discussed in the runtime section.

Also available as API arguments:

  • benchmark_script(backend=...)
  • benchmark_model(backend=...)

Runtimes

Indicates which software runtime(s) should be used for the benchmark (e.g., ONNX Runtime vs. TensorRT for a GPU benchmark).

Usage:

  • benchit benchmark INPUT_FILES --runtime SW

Each device type has its own default runtime, as indicated below.

  • Valid runtimes for x86 device
    • ort: ONNX Runtime (default).
    • torch-eager: PyTorch eager execution.
    • torch-compiled: PyTorch 2.x-style compiled graph execution using TorchInductor.
  • Valid runtimes for nvidia device
    • trt: Nvidia TensorRT (default).
  • Valid runtimes for groq device
    • groq: GroqFlow (default).

This feature is also available as API arguments:

  • benchmark_script(runtimes=[...])
  • benchmark_model(runtime=...)

Note: torch-eager and torch-compiled are not available when using the remote backend.

Note: Inputs to torch-eager and torch-compiled are not downcast to FP16 by default. Downcast inputs before benchmarking for a fair comparison between runtimes.
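
A minimal sketch of that downcast for torch-eager (toy model; half-precision CPU support depends on your PyTorch version):

```python
import torch
from mlagility import benchmark_model

model = torch.nn.Linear(10, 5)
inputs = {"input": torch.rand(1, 10)}

# Downcast the model and its floating-point inputs to FP16 so torch-eager results
# are comparable to runtimes that benchmark float16 ONNX files.
fp16_model = model.half()
fp16_inputs = {
    name: t.half() if t.is_floating_point() else t for name, t in inputs.items()
}

perf = benchmark_model(fp16_model, fp16_inputs, runtime="torch-eager")
```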

Additional Commands and Options

benchit and the APIs provide a variety of additional commands and options for users.

The default usage of benchit is to directly provide it with a Python script, for example benchit example.py --device x86. However, benchit also supports the usage benchit COMMAND to accomplish some additional tasks.

Note: Some of these tasks have to do with the MLAgility cache, which stores the build directories (see Build).

The commands are:

  • benchmark (default command): benchmark the model(s) in one or more scripts
  • cache list: list the available builds in the cache
  • cache print: print the state of a build from the cache
  • cache delete: delete one or more builds from the cache
  • cache report: print a report in .csv format summarizing the results of all builds in a cache
  • version: print the benchit version number

You can see the options available for any command by running benchit COMMAND --help.

benchmark Command

The benchmark command supports the arguments from Devices and Runtimes, as well as:

Input Files

Name of one or more script (.py) or ONNX (.onnx) files to be benchmarked.

Examples:

  • benchit models/selftest/linear.py
  • benchit models/selftest/linear.py models/selftest/twolayer.py
  • benchit examples/cli/onnx/sample.onnx

Support for .py scripts is also available as an API argument:

  • benchmark_script(input_scripts=...)

You may also use Bash regular expressions to locate the scripts you want to benchmark.

Examples:

  • benchit *.py
    • Benchmark all scripts in the current working directory.
  • benchit models/*/*.py
    • Benchmark the entire corpus of MLAgility models.
  • benchit *.onnx
    • Benchmark all ONNX files in the current working directory.

See the Benchmark Multiple Scripts tutorial for a detailed example.

You can also leverage model hashes (see Model Hashes) to filter which models in a script will be acted on, in the following manner:

  • benchit example.py::hash_0 will only benchmark the model corresponding to hash_0.
  • You can also supply multiple hashes, for example benchit example.py::hash_0,hash_1 will benchmark the models corresponding to both hash_0 and hash_1.

Note: Using bash regular expressions and filtering models by hashes are mutually exclusive. To filter models by hashes, provide the full path of the Python script rather than a regular expression.

See the Filtering Model Hashes tutorial for a detailed example.

Additionally, you can leverage labels (see Labels) to filter which models in a script will be acted on, in the following manner:

  • benchit *.py --labels test_group::a will only benchmark the scripts labeled with test_group::a.
  • You can also supply multiple labels; for example, benchit *.py --labels test_group::a domain::nlp will only benchmark scripts that have both the test_group::a and domain::nlp labels.


Note: ONNX file input currently supports only models of size less than 2 GB. ONNX files passed directly into benchit *.onnx are benchmarked as-is without applying any additional build stages.

Use Slurm

Execute the build(s) and benchmark(s) on Slurm instead of using local compute resources. Each input runs in its own Slurm job.

Usage:

  • benchit benchmark INPUT_FILES --use-slurm
    • Use Slurm to run benchit on INPUT_FILES.
  • benchit benchmark SEARCH_DIR/*.py --use-slurm
    • Use Slurm to run benchit on all scripts in the search directory. Each script is evaluated as its own Slurm job (i.e., all scripts can be evaluated in parallel on a sufficiently large Slurm cluster).

Available as an API argument:

  • benchmark_script(use_slurm=True/False) (default False)

Note: Requires setting up Slurm as shown here.

Note: while --use-slurm is implemented, and we use it for our own purposes, it has some limitations and we do not recommend using it. Currently, benchit requires Slurm to be configured the same way that it is configured at Groq, which not everyone will have. Please contact the developers by filing an issue if you need Slurm support for your project.

Note: Slurm mode applies a timeout to each job, and will cancel the job if the timeout is exceeded. See Set the Timeout.

Process Isolation

Evaluate each benchit input in its own isolated subprocess. This option allows the main process to continue on to the next input if the current input fails for any reason (e.g., a bug in the input script, the operating system running out of memory, incompatibility between a model and the selected benchmarking runtime, etc.).

Usage:

  • benchit benchmark INPUT_FILES --process-isolation

Also available as an API argument:

  • benchmark_script(process_isolation=True/False) (default False)

Note: Process isolation mode applies a timeout to each subprocess, and will move on to the next input if the timeout is exceeded. See Set the Timeout.

Note: Process isolation mode is mutually exclusive with:

  • Slurm mode.
  • Passing a Sequence instance to benchmark_script() (sequence files are still allowed, see Sequence File).

Cache Directory

-d CACHE_DIR, --cache-dir CACHE_DIR MLAgility build cache directory where the resulting build directories will be stored (defaults to ~/.cache/mlagility).

Also available as API arguments:

  • benchmark_script(cache_dir=...)
  • benchmark_model(cache_dir=...)

See the Cache Directory tutorial for a detailed example.

Lean Cache

--lean-cache Delete all build artifacts except for log files after the build.

Also available as API arguments:

  • benchmark_script(lean_cache=True/False, ...) (default False)
  • benchmark_model(lean_cache=True/False, ...) (default False)

Note: useful for benchmarking many models, since the build artifacts from the models can take up a significant amount of hard drive space.

See the Lean Cache tutorial for a detailed example.

Rebuild Policy

--rebuild REBUILD Sets a cache policy that decides whether to load or rebuild a cached build.

Takes one of the following values:

  • Default: "if_needed" will use a cached model if available, build one if it is not available,and rebuild any stale builds.
  • Set "always" to force benchit to always rebuild your model, regardless of whether it is available in the cache or not.
  • Set "never" to make sure benchit never rebuilds your model, even if it is stale. benchitwill attempt to load any previously built model in the cache, however there is no guarantee it will be functional or correct.

Also available as API arguments:

  • benchmark_script(rebuild=...)
  • benchmark_model(rebuild=...)
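
For example, a minimal sketch of forcing a fresh build from the API (toy model shown for illustration):

```python
import torch
from mlagility import benchmark_model

model = torch.nn.Linear(10, 5)
inputs = {"input": torch.rand(1, 10)}

# "always" ignores any cached build; "never" loads a cached build even if stale
perf = benchmark_model(model, inputs, rebuild="always")
```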

See the GroqFlow rebuild examples to learn more.

Sequence File

Replaces the default build sequence in benchmark_model() with a custom build sequence, defined in a Python script.

Usage:

  • benchit benchmark INPUT_FILES --sequence-file FILE

This script must define a function, get_sequence(), that returns an instance of onnxflow.common.stage.Sequence. See examples/extras/example_sequence.py for an example of a sequence file.

Also available as API arguments:

  • benchmark_script(sequence=...)
  • benchmark_model(sequence=...)

Note: the sequence argument to benchmark_script() can be either a sequence file or a Sequence instance. The sequence argument to benchmark_model() must be a Sequence instance.

See the Sequence File tutorial for a detailed example.

Set Script Arguments

Sets command line arguments for the input script. Useful for customizing the behavior of the input script, for example sweeping parameters such as batch size. Format these as a comma-delimited string.

Usage:

  • benchit benchmark INPUT_FILES --script-args="--batch_size=8,--max_seq_len=128"
    • This will evaluate the input script with the arguments --batch_size=8 and --max_seq_len=128 passed into the input script.

Also available as an API argument:

  • benchmark_script(script_args=...)

See the Parameters documentation for a detailed example.
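
For reference, a hypothetical input script that consumes those arguments might look like this (all names here are illustrative):

```python
# hypothetical input script (e.g., my_model.py) that accepts the --script-args shown above
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--max_seq_len", type=int, default=64)
args = parser.parse_args()

class TinyModel(torch.nn.Module):
    def __init__(self, seq_len):
        super().__init__()
        self.fc = torch.nn.Linear(seq_len, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyModel(args.max_seq_len)
inputs = {"x": torch.rand(args.batch_size, args.max_seq_len)}
model(**inputs)  # called so that analysis can discover and benchmark the model
```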

Maximum Analysis Depth

Depth of sub-models to inspect within the script. Default value is 0, indicating to only analyze models at the top level of the script. Depth of 1 would indicate to analyze the first level of sub-models within the top-level models.

Usage:

  • benchit benchmark INPUT_FILES --max-depth DEPTH

Also available as an API argument:

  • benchmark_script(max_depth=...) (default 0)

Note: --max-depth values greater than 0 are only supported for PyTorch models.

See the Maximum Analysis Depth tutorial for a detailed example.

ONNX Opset

ONNX opset to be used when creating ONNX files, for example when calling torch.onnx.export.

Usage:

  • benchit benchmark INPUT_FILES --onnx-opset 16

Also available as API arguments:

  • benchmark_script(onnx_opset=...)
  • benchmark_model(onnx_opset=...)

Note: ONNX opset can also be set by an environment variable. The --onnx-opset argument takes precedence over the environment variable. See MLAGILITY_ONNX_OPSET.

Analyze Only

Instruct benchit or benchmark_script() to only run the Analysis phase of the benchmark command.

Usage:

  • benchit benchmark INPUT_FILES --analyze-only
    • This discovers models within the input script and prints information about them, but does not perform any build or benchmarking.

Note: any build- or benchmark-specific options will be ignored, such as --backend, --device, --groqview, etc.

Also available as an API argument:

  • benchmark_script(analyze_only=True/False) (default False)

See the Analyze Only tutorial for a detailed example.

Build Only

Instruct benchit, benchmark_script(), or benchmark_model() to only run the Analysis and Build phases of the benchmark command.

Usage:

  • benchit benchmark INPUT_FILES --build-only
    • This builds the models within the input script, but does not run any benchmark.

Note: any benchmark-specific options will be ignored, such as --backend.

Also available as API arguments:

  • benchmark_script(build_only=True/False) (default False)
  • benchmark_model(build_only=True/False) (default False)

See the Build Only tutorial for a detailed example.

Export Only

Instruct benchit, benchmark_script(), or benchmark_model() to only run the Analysis and Build phases of the benchmark command, and to stop the Build phase after exporting the ONNX file. Similar to Build Only, except that no optimization Stages will be applied to the ONNX file.

Usage:

  • benchit benchmark INPUT_FILES --export-only
    • This exports ONNX files for the models within the input script, but does not optimize those ONNX files or run any benchmark.

Note: any benchmark-specific options will be ignored, such as --backend.

Also available as API arguments:

  • benchmark_script(export_only=True/False) (default False)
  • benchmark_model(export_only=True/False) (default False)

Resume

Instruct benchit or benchmark_script() to skip over any input scripts that have been previously attempted.

For example:

  • benchit benchmark INPUT_FILES will benchmark everything in INPUT_FILES, regardless of whether benchmarking those scripts has been attempted previously.
  • benchit benchmark INPUT_FILES --resume will benchmark everything in INPUT_FILES that has not been previously attempted.

The --resume behavior is useful for when you are benchmarking a large corpus of models, and one of the models crashes your run. If you repeat the same command, but with the --resume argument, then the new run will pick up where the last run left off, including skipping over any input scripts that crashed previously.

Note: if --resume is skipping over any input scripts that you do want to evaluate, you have two options:

  • Manually build the input script with benchit benchmark INPUT_SCRIPT without setting --resume
  • Start a new cache directory with --cache-dir NEW_CACHE_DIR or export MLAGILITY_CACHE_DIR=NEW_CACHE_DIR

Also available as an API argument:

  • benchmark_script(resume=True/False) (default False)

Groq-Specific Arguments

The following options are specific to Groq builds and benchmarks, and are passed into the GroqFlow build tool. Learn more about them in the GroqFlow user guide.

  • --groq-compiler-flags COMPILER_FLAGS [COMPILER_FLAGS ...] Sets the groqit(compiler_flags=...) arg within the GroqFlow build tool (default behavior is to use groqit()'s default compiler flags)
    • Also available as API arguments: benchmark_script(groq_compiler_flags=...), benchmark_model(groq_compiler_flags=...).
  • --groq-assembler-flags ASSEMBLER_FLAGS [ASSEMBLER_FLAGS ...] Sets the groqit(assembler_flags=...) arg within the GroqFlow build tool (default behavior is to use groqit()'s default assembler flags)
    • Also available as API arguments: benchmark_script(groq_assembler_flags=...), benchmark_model(groq_assembler_flags=...).
  • --groq-num-chips NUM_CHIPS Sets the groqit(num_chips=...) arg (default behavior is to let groqit() automatically select the number of chips)
    • Also available as API arguments: benchmark_script(groq_num_chips=...), benchmark_model(groq_num_chips=...).
  • --groqview Enables GroqView for the build(s)
    • Also available as API arguments: benchmark_script(groqview=True/False), benchmark_model(groqview=True/False).

Cache Commands

The cache commands help you manage the mlagility cache and get information about the builds and benchmarks within it.

cache list Command

benchit cache list prints the names of all of the builds in a build cache. It presents the following options:

  • -d CACHE_DIR, --cache-dir CACHE_DIR Search path for builds (defaults to ~/.cache/mlagility)

Note: cache list is not available as an API.

See the Cache Commands tutorial for a detailed example.

cache stats Command

benchit cache stats prints out the selected build's state.yaml file, which contains useful information about that build. The stats command presents the following options:

  • build_name Name of the specific build whose stats are to be printed, within the cache directory
  • -d CACHE_DIR, --cache-dir CACHE_DIR Search path for builds (defaults to ~/.cache/mlagility)

Note: cache stats is not available as an API.

See the Cache Commands tutorial for a detailed example.

cache delete Command

benchit cache delete deletes one or more builds from a build cache. It presents the following options:

  • build_name Name of the specific build to be deleted, within the cache directory
  • -d CACHE_DIR, --cache-dir CACHE_DIR Search path for builds (defaults to ~/.cache/mlagility)
  • --all Delete all builds in the cache directory

Note: cache delete is not available as an API.

See the Cache Commands tutorial for a detailed example.

cache clean Command

benchit cache clean removes the build artifacts from one or more builds from a build cache. It presents the following options:

  • build_name Name of the specific build to be cleaned, within the cache directory
  • -d CACHE_DIR, --cache-dir CACHE_DIR Search path for builds (defaults to ~/.cache/mlagility)
  • --all Clean all builds in the cache directory

Note: cache clean is not available as an API.

cache report Command

benchit cache report analyzes the state of all builds in a build cache and saves the result to a CSV file. It presents the following options:

  • -d CACHE_DIR, --cache-dir CACHE_DIR Search path for builds (defaults to ~/.cache/mlagility)
  • -r REPORT_DIR, --report-dir REPORT_DIR Path to folder where report will be saved (defaults to current working directory)

Note: cache report is not available as an API.

cache location Command

benchit cache location prints out the location of the default cache directory.

Note: Also available programmatically, with mlagility.filesystem.DEFAULT_CACHE_DIR

Models Commands

The models commands help you work with the models in the MLAgility benchmark.

models location Command

benchit models location prints out the location of the models directory, which contains the MLAgility benchmark with over 1000 models. It presents the following options:

  • --quiet Command output will only include the directory path

Note: Also available programmatically, with mlagility.filesystem.MODELS_DIR

version Command

benchit version prints the version number of the installed mlagility package.

version does not have any options.

Note: version is not available as an API.

Environment Variables

There are some environment variables that can control the behavior of the MLAgility tools.

Overwrite the Cache Location

By default, the MLAgility tools will use ~/.cache/mlagility as the MLAgility cache location. You can override this cache location with the --cache-dir and cache_dir= arguments for the CLI and APIs, respectively.

However, you may want to override cache location for future runs without setting those arguments every time. This can be accomplished with the MLAGILITY_CACHE_DIR environment variable. For example:

export MLAGILITY_CACHE_DIR=~/a_different_cache_dir

Show Traceback

By default, benchit and benchmark_script() will display the traceback for any exceptions caught during model build. However, you may sometimes want a cleaner output on your terminal. To accomplish this, set the MLAGILITY_TRACEBACK environment variable to False, which will catch any exceptions during model build and benchmark and display a simple error message like Status: Unknown benchit error: {e}.

For example:

export MLAGILITY_TRACEBACK=False

Preserve Terminal Outputs

By default, benchit and benchmark_script() will erase the contents of the terminal in order to present a clean status update for each script and model evaluated.

However, you may want to see everything that is being printed to the terminal. You can accomplish this by setting the MLAGILITY_DEBUG environment variable to True. For example:

export MLAGILITY_DEBUG=True

Set the ONNX Opset

By default, benchit, benchmark_script(), and benchmark_model() will use the default ONNX opset defined in onnxflow.common.build.DEFAULT_ONNX_OPSET. You can set a different default ONNX opset by setting the MLAGILITY_ONNX_OPSET environment variable.

For example:

export MLAGILITY_ONNX_OPSET=16

Set the Timeout

benchit and benchmark_script() apply a timeout, mlagility.cli.spawn.DEFAULT_TIMEOUT_SECONDS, to each input script evaluated in Slurm or process isolation modes. If the timeout is exceeded, evaluation of the current input script is terminated and the program moves on to the next input script.

This timeout can be overridden by setting the MLAGILITY_TIMEOUT_SECONDS environment variable.

For example:

export MLAGILITY_TIMEOUT_SECONDS=1800

would set the timeout to 1800 seconds (30 minutes).