Make sure report only account for enabled benches, handle missing bench more clearly #205

Closed
104 changes: 104 additions & 0 deletions docs/process.rst
@@ -0,0 +1,104 @@
Request for Proposal
====================

Preparing
---------

1. Make sure milabench supports the targeted hardware

   * NVIDIA
   * AMD

2. Create a milabench configuration for your RFP.
   Milabench comes with a wide variety of benchmarks.
   You should select and weight each benchmark according to your
   target hardware.

   .. code-block:: yaml

      include:
        - base.yaml

      llama:
        enabled: true
        weight: 1.0

      resnet50:
        enabled: true
        weight: 1.0


   .. code-block:: bash

      milabench resolve myconfig.yaml > RFP.yaml


3. Prepare a container for your RFP


   .. code-block::

      FROM milabench:cuda-v1.2.3

      COPY RFP.yaml .../RFP.yaml

      ENV MILABENCH_CONFIG=".../RFP.yaml"

      CMD milabench run


4. Hot fixes

   * Disable a benchmark
   * Update the container


Vendor Instructions
-------------------

1. The vendor needs to create a system configuration that
   specifies the compute nodes milabench will use

   .. code-block::

      system:
        sshkey: <privatekey>
        arch: cuda
        docker_image: ghcr.io/mila-iqia/milabench:cuda-nightly

        nodes:
          - name: node1
            ip: 192.168.0.25
            main: true
            port: 8123
            user: <username>

          - name: node2
            ip: 192.168.0.26
            main: false
            user: <username>


2. Run milabench

   .. code-block::

      export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

      # create ...
      mkdir -p configs
      mkdir -p results

      # put your vendor-specific configuration
      vi configs/system.yaml

      # pull the image
      docker pull $MILABENCH_IMAGE

      # run milabench
      docker run -it --rm --gpus all --network host --ipc=host --privileged \
        -v $SSH_KEY_FILE:/milabench/id_milabench \
        -v $(pwd)/results:/milabench/envs/runs \
        -v $(pwd)/configs:/milabench/envs/configs \
        $MILABENCH_IMAGE \
        milabench run --system /milabench/envs/configs/system.yaml
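
For vendors who script the run instead of typing it, the `docker run` invocation above could be assembled as an argument list. This is only a sketch: `build_docker_command` is a hypothetical helper that is not part of milabench, and the key path and working directory in the usage section are placeholder values. The flags and mount points are copied from the instructions above.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_command(image, ssh_key_file, workdir):
    """Return the `docker run` argument list from the vendor instructions.

    Hypothetical helper, not part of milabench. The flags and container
    paths mirror the documented command; nothing is executed here.
    """
    workdir = PurePosixPath(workdir)
    return [
        "docker", "run", "-it", "--rm",
        "--gpus", "all",
        "--network", "host",
        "--ipc=host",
        "--privileged",
        # SSH key used by milabench to reach the other nodes
        "-v", f"{ssh_key_file}:/milabench/id_milabench",
        # results are written to the host
        "-v", f"{workdir / 'results'}:/milabench/envs/runs",
        # vendor-specific system configuration
        "-v", f"{workdir / 'configs'}:/milabench/envs/configs",
        image,
        "milabench", "run",
        "--system", "/milabench/envs/configs/system.yaml",
    ]


if __name__ == "__main__":
    cmd = build_docker_command(
        image="ghcr.io/mila-iqia/milabench:cuda-nightly",
        ssh_key_file="/home/user/.ssh/id_rsa",  # placeholder key path
        workdir="/tmp/rfp",  # placeholder working directory
    )
    # Print a shell-quoted command line for inspection
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell-quoting problems if the list is later passed to `subprocess.run`.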
6 changes: 3 additions & 3 deletions milabench/_version.py
@@ -1,5 +1,5 @@
 """This file is generated, do not modify"""

-__tag__ = "v0.0.6-41-g932e30e"
-__commit__ = "932e30e79513fdd2448cedaf98a003bb4b5b9148"
-__date__ = "2024-01-17 14:33:14 -0500"
+__tag__ = "v0.0.6-47-g015ce01"
+__commit__ = "015ce01e6ccec87e60c8f240f4b090d550ff62bc"
+__date__ = "2024-02-27 18:15:05 +0000"
4 changes: 4 additions & 0 deletions milabench/cli/__init__.py
@@ -18,6 +18,7 @@
 from .slurm import cli_slurm_system
 from .sql import cli_sqlsetup
 from .summary import cli_summary
+from .resolve import cli_resolve


 class Main:
@@ -82,6 +83,9 @@ def sqlsetup():
     def write_report_to_pr():
         cli_write_report_to_pr()

+    def resolve():
+        cli_resolve()
+

 def main(argv=None):
     sys.path.insert(0, os.path.abspath(os.curdir))
88 changes: 88 additions & 0 deletions milabench/cli/resolve.py
@@ -0,0 +1,88 @@
from dataclasses import dataclass

from coleo import Option, tooled

from milabench.config import _config_layers, merge


# fmt: off
@dataclass
class Arguments:
    config : str
    lean: bool = False
# fmt: on


@tooled
def arguments():
    # Path to the configuration file to resolve
    config: Option & str

    # Only keep enabled benchmarks that also have a non-zero weight
    lean: Option & bool = False

    return Arguments(config, lean)


@tooled
def cli_resolve(args=None):
    """Generate a configuration"""

    if args is None:
        args = arguments()

    overrides = {}
    configs = [args.config, overrides]

    config = {}
    for layer in _config_layers(configs):
        config = merge(config, layer)

    wip_config = {}
    parents = []

    #
    # Only keep enabled benchmarks
    #
    for benchname, benchconfig in config.items():
        is_enabled = benchconfig.get("enabled", False)
        is_weighted = benchconfig.get("weight", 0)

        parent = benchconfig.get("inherits", None)

        if parent:
            parents.append(parent)

        condition = is_enabled
        if args.lean:
            condition = is_enabled and is_weighted

        if condition:
            wip_config[benchname] = benchconfig

    #
    # Keep the parents as well
    #
    parents = set(parents)
    for parent in parents:
        wip_config[parent] = config[parent]

    #
    # Remove resolved fields
    #
    resolved = ["dirs", "config_file", "config_base"]
    for benchname, benchconfig in wip_config.items():
        for field in resolved:
            benchconfig.pop(field, None)

    #
    # Finished
    #

    import yaml

    print(yaml.dump(wip_config))


if __name__ == "__main__":
    args = Arguments("/workspaces/milabench/config/standard.yaml")

    cli_resolve(args)
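
The selection rule in `cli_resolve` can be tried without milabench or PyYAML installed. The sketch below is a standalone variation, not the PR's code: `select_benchmarks` and the `_defaults` entry are hypothetical names. It differs from the PR in two ways: it only collects parents of benchmarks that are actually kept, and it raises an explicit error when an `inherits` target is missing instead of failing on the dictionary lookup. Like the PR, it follows only one level of `inherits`.

```python
def select_benchmarks(config, lean=False):
    """Return the enabled benchmarks plus the definitions they inherit from.

    Hypothetical helper mirroring the filtering idea in cli_resolve:
    - a benchmark is kept when "enabled" is true;
    - in lean mode it must also have a non-zero "weight";
    - the direct parent named by "inherits" is kept as well.
    """
    selected = {}
    parents = set()

    for name, bench in config.items():
        enabled = bench.get("enabled", False)
        weighted = bench.get("weight", 0) > 0

        if enabled and (weighted or not lean):
            selected[name] = bench
            parent = bench.get("inherits")
            if parent:
                parents.add(parent)

    # Report missing parents explicitly rather than with a bare lookup error
    missing = sorted(p for p in parents if p not in config)
    if missing:
        raise KeyError(f"unknown parent definition(s): {missing}")

    for parent in parents:
        selected.setdefault(parent, config[parent])

    return selected


if __name__ == "__main__":
    # Made-up configuration used only for illustration
    config = {
        "_defaults": {"max_duration": 600},
        "llama": {"inherits": "_defaults", "enabled": True, "weight": 1.0},
        "resnet50": {"inherits": "_defaults", "enabled": True, "weight": 0.0},
        "bert": {"inherits": "_defaults", "enabled": False, "weight": 1.0},
    }
    print(sorted(select_benchmarks(config)))
    print(sorted(select_benchmarks(config, lean=True)))
```

With this sample configuration, the default mode keeps `llama` and `resnet50` along with `_defaults`, while lean mode drops `resnet50` because its weight is zero.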
12 changes: 6 additions & 6 deletions milabench/log.py
@@ -300,9 +300,9 @@ def on_data(self, entry, data, row):
 load = int(data.get("load", 0) * 100)
 currm, totalm = data.get("memory", [0, 0])
 temp = int(data.get("temperature", 0))
-row[
-    f"gpu:{gpuid}"
-] = f"{load}% load | {currm:.0f}/{totalm:.0f} MB | {temp}C"
+row[f"gpu:{gpuid}"] = (
+    f"{load}% load | {currm:.0f}/{totalm:.0f} MB | {temp}C"
+)
 row["gpu_load"] = f"{load}%"
 row["gpu_mem"] = f"{currm:.0f}/{totalm:.0f} MB"
 row["gpu_temp"] = f"{temp}C"
@@ -376,9 +376,9 @@ def on_data(self, entry, data, row):
 load = int(data.get("load", 0) * 100)
 currm, totalm = data.get("memory", [0, 0])
 temp = int(data.get("temperature", 0))
-row[
-    f"gpu:{gpuid}"
-] = f"{load}% load | {currm:.0f}/{totalm:.0f} MB | {temp}C"
+row[f"gpu:{gpuid}"] = (
+    f"{load}% load | {currm:.0f}/{totalm:.0f} MB | {temp}C"
+)
 else:
 task = data.pop("task", "")
 units = data.pop("units", "")
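
The `log.py` change only rewraps the assignment, so the formatted text should be unchanged. A minimal standalone sketch of the same formatting, assuming the `data` keys shown in the hunks (`load`, `memory`, `temperature`); `format_gpu_row` is a hypothetical function, not one defined in milabench:

```python
def format_gpu_row(gpuid, data):
    """Build the GPU columns the way the on_data hunks do.

    Hypothetical helper: "load" is a 0..1 fraction, "memory" is a
    (current, total) pair in MB, "temperature" is in degrees Celsius.
    """
    load = int(data.get("load", 0) * 100)
    currm, totalm = data.get("memory", [0, 0])
    temp = int(data.get("temperature", 0))

    row = {}
    row[f"gpu:{gpuid}"] = (
        f"{load}% load | {currm:.0f}/{totalm:.0f} MB | {temp}C"
    )
    row["gpu_load"] = f"{load}%"
    row["gpu_mem"] = f"{currm:.0f}/{totalm:.0f} MB"
    row["gpu_temp"] = f"{temp}C"
    return row


if __name__ == "__main__":
    sample = {"load": 0.5, "memory": [1024, 4096], "temperature": 61.7}
    print(format_gpu_row(0, sample)["gpu:0"])
```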
1 change: 0 additions & 1 deletion milabench/merge.py
@@ -1,6 +1,5 @@
 """Utilities to merge dictionaries and other data structures."""

-
 from collections import deque
 from functools import reduce
 from typing import Union