
Tools - Add runner for sys info and update docs #532

Merged
merged 17 commits into from
Jun 29, 2023
Changes from 5 commits
48 changes: 48 additions & 0 deletions docs/cli.md
@@ -335,6 +335,54 @@ sb run --no-docker --host-list localhost --config-override \
superbench.enable=kernel-launch superbench.env.SB_MICRO_PATH=/path/to/superbenchmark
```

### `sb node-info`

```bash title="SB CLI"
sb node-info [--docker-image]
             [--docker-password]
             [--docker-username]
             [--host-file]
             [--host-list]
             [--host-password]
             [--host-username]
             [--no-image-pull]
             [--output-dir]
             [--private-key]
```

#### Optional arguments

| Name | Default | Description |
|-----------------------|-------------------------|-----------------------------------------------------------------------------------|
| `--docker-image` `-i` | `superbench/superbench` | Docker image URI. See [container images](./user-tutorial/container-images.mdx) for all available images. |
| `--docker-password`   | `None`                  | Docker registry password if authentication is needed.                              |
| `--docker-username`   | `None`                  | Docker registry username if authentication is needed.                              |
| `--host-file` `-f`    | `None`                  | Path to Ansible inventory host file.                                               |
| `--host-list` `-l`    | `None`                  | Comma-separated host list.                                                         |
| `--host-password`     | `None`                  | Host password or key passphrase if needed.                                         |
| `--host-username`     | `None`                  | Host username if needed.                                                           |
| `--no-image-pull`     | `False`                 | Skip pull and use local Docker image.                                              |
| `--output-dir`        | `None`                  | Path to output directory; `outputs/{datetime}` is used if not specified.           |
| `--private-key` | `None` | Path to private key if needed. |
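To illustrate the `--host-list` format, the minimal Python sketch below splits a comma-separated value into host names. `parse_host_list` is a hypothetical helper, not part of SuperBench, and SuperBench's own handling of whitespace or empty entries may differ.

```python
def parse_host_list(host_list):
    """Split a comma-separated --host-list value into host names (hypothetical helper)."""
    if not host_list:
        # No --host-list given.
        return []
    # Strip surrounding spaces and skip empty entries, e.g. from a trailing comma.
    return [host.strip() for host in host_list.split(',') if host.strip()]


print(parse_host_list('node-0, node-1,node-2,'))  # ['node-0', 'node-1', 'node-2']
```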

#### Global arguments

| Name | Default | Description |
|---------------|---------|--------------------|
| `--help` `-h` | N/A | Show help message. |

#### Examples

Collect system info on the local GPU node:
```bash title="SB CLI"
sb node-info
```

Collect system info on all nodes in `./host.ini`:
```bash title="SB CLI"
sb node-info --host-file ./host.ini
```

### `sb version`

Print the current SuperBench CLI version.
18 changes: 17 additions & 1 deletion docs/user-tutorial/system-config.md
@@ -4,6 +4,8 @@ id: system-config

# System Config Info

This tool automatically collects system information on the tested GPU nodes, covering the following hardware categories:

- [System](#system)
- [Memory](#memory)
- [CPU](#cpu)
@@ -12,7 +14,21 @@ id: system-config
- [Accelerator](#accelerator)
- [PCIe](#pcie)

## Parameter amd Details
## Usage

1. [Install SuperBench](../getting-started/installation.mdx) on the local machine.

2. On the local machine, prepare a host file listing the tested GPU nodes in [Ansible Inventory](../getting-started/configuration.md#ansible-inventory) format.

3. Once SuperBench is installed and the host file is ready, run the `sb node-info` command to collect the system info automatically. See [SuperBench CLI](../cli.md) for the full command reference.

```
sb node-info -f host.ini --output-dir ${output_dir}
```

4. After the command finishes, the system info file `sys_info.json` of each node can be found under `${output_dir}/nodes/${node_name}`.
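Once the per-node files are fetched, they can be gathered in one place for comparison. The Python sketch below assumes the `${output_dir}/nodes/${node_name}/sys_info.json` layout described in step 4 and that each file holds a single JSON object; `load_node_info` and the demo node names and fields are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_node_info(output_dir):
    """Map each node name to its parsed sys_info.json (assumed layout: nodes/<node>/sys_info.json)."""
    results = {}
    for info_file in sorted(Path(output_dir).glob('nodes/*/sys_info.json')):
        # The parent directory name is taken as the node name.
        with open(info_file) as f:
            results[info_file.parent.name] = json.load(f)
    return results


# Demo with a made-up layout; real files contain the categories listed above.
with tempfile.TemporaryDirectory() as demo_dir:
    for node, info in {'node-0': {'cpu': 'demo'}, 'node-1': {'cpu': 'demo'}}.items():
        node_dir = Path(demo_dir) / 'nodes' / node
        node_dir.mkdir(parents=True)
        (node_dir / 'sys_info.json').write_text(json.dumps(info))
    collected = load_node_info(demo_dir)

print(sorted(collected))  # ['node-0', 'node-1']
```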

## Parameter and Details

### System

1 change: 1 addition & 0 deletions superbench/cli/_commands.py
@@ -23,6 +23,7 @@ def load_command_table(self, args):
            g.command('deploy', 'deploy_command_handler')
            g.command('exec', 'exec_command_handler')
            g.command('run', 'run_command_handler')
            g.command('node-info', 'info_command_handler')
        with CommandGroup(self, 'benchmark', 'superbench.cli._benchmark_handler#{}') as g:
            g.command('list', 'benchmark_list_command_handler')
            g.command('list-parameters', 'benchmark_list_params_command_handler')
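The command group is built from the template `superbench.cli._handler#{}`, so registering `node-info` with `info_command_handler` points the command at a function in that module. The sketch below shows one way a `module#function` template can be resolved into a callable with `importlib`; it is an assumption about the general mechanism, not knack's actual code, and it uses the stdlib `json` module so it runs without SuperBench installed.

```python
import importlib


def resolve_operation(template, handler_name):
    """Turn a 'package.module#{}' template plus a handler name into a callable (sketch)."""
    module_path, attr_path = template.format(handler_name).split('#')
    target = importlib.import_module(module_path)
    # Walk dotted attribute paths such as 'Class.method'.
    for attr in attr_path.split('.'):
        target = getattr(target, attr)
    return target


# Stdlib stand-in for 'superbench.cli._handler#{}' + 'info_command_handler'.
dumps = resolve_operation('json#{}', 'dumps')
print(dumps({'command': 'node-info'}))  # {"command": "node-info"}
```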
64 changes: 64 additions & 0 deletions superbench/cli/_handler.py
@@ -3,6 +3,7 @@

"""SuperBench CLI command handler."""

import json
import sys
from pathlib import Path
from importlib_metadata import version, PackageNotFoundError
@@ -14,6 +15,7 @@
from superbench.runner import SuperBenchRunner
from superbench.executor import SuperBenchExecutor
from superbench.common.utils import create_sb_output_dir, get_sb_config
from superbench.tools import SystemInfo


def check_argument_file(name, file):
@@ -319,3 +321,65 @@ def run_command_handler(
    runner.run()
    if runner.get_failure_count() != 0:
        sys.exit(runner.get_failure_count())


def info_command_handler(
    docker_image='superbench/superbench',
    docker_username=None,
    docker_password=None,
    no_image_pull=False,
    host_file=None,
    host_list=None,
    host_username=None,
    host_password=None,
    output_dir=None,
    private_key=None
):
    """Collect the system info on all given nodes.

    Args:
        docker_image (str, optional): Docker image URI. Defaults to superbench/superbench:latest.
        docker_username (str, optional): Docker registry username if authentication is needed. Defaults to None.
        docker_password (str, optional): Docker registry password if authentication is needed. Defaults to None.
        no_image_pull (bool, optional): Skip pull and use local Docker image. Defaults to False.
        host_file (str, optional): Path to Ansible inventory host file. Defaults to None.
        host_list (str, optional): Comma separated host list. Defaults to None.
        host_username (str, optional): Host username if needed. Defaults to None.
        host_password (str, optional): Host password or key passphrase if needed. Defaults to None.
        output_dir (str, optional): Path to output directory. Defaults to None.
        private_key (str, optional): Path to private key if needed. Defaults to None.

    Raises:
        CLIError: If input arguments are invalid.
        RuntimeError: If collecting the system info on the local node fails.
    """
    # Local mode: no host file or host list given, so collect on this machine.
    if not (host_file or host_list):
        try:
            output_dir = create_sb_output_dir(output_dir)
            info = SystemInfo().get_all()
            output_dir_path = Path(output_dir)
            with open(output_dir_path / 'sys_info.json', 'w') as f:
                json.dump(info, f)
        except Exception as ex:
            raise RuntimeError('Failed to get node info.') from ex
        return

    # Remote mode: run `sb node-info` on every host through Ansible.
    docker_config, ansible_config, sb_config, sb_output_dir = process_runner_arguments(
        docker_image=docker_image,
        docker_username=docker_username,
        docker_password=docker_password,
        no_docker=False,
        no_image_pull=no_image_pull,
        host_file=host_file,
        host_list=host_list,
        host_username=host_username,
        host_password=host_password,
        output_dir=output_dir,
        private_key=private_key,
    )

    runner = SuperBenchRunner(sb_config, docker_config, ansible_config, sb_output_dir)
    runner.run_sys_info()
    if runner.get_failure_count() != 0:
        sys.exit(runner.get_failure_count())
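The handler takes the local path only when neither `--host-file` nor `--host-list` is given. The runnable sketch below mirrors that local path under stated assumptions: `collect` is a hypothetical stand-in for `SystemInfo().get_all()`, and `Path.mkdir` replaces `create_sb_output_dir`, so neither the real collector nor SuperBench is needed.

```python
import json
import tempfile
from pathlib import Path


def is_local(host_file=None, host_list=None):
    """Local mode when neither --host-file nor --host-list is given."""
    return not (host_file or host_list)


def collect_locally(output_dir, collect):
    """Write collect() to <output_dir>/sys_info.json; collect stands in for SystemInfo().get_all()."""
    output_path = Path(output_dir)
    try:
        output_path.mkdir(parents=True, exist_ok=True)
        info = collect()
        with open(output_path / 'sys_info.json', 'w') as f:
            json.dump(info, f)
    except Exception as ex:
        # Same wrapping as the handler: one descriptive error, original cause chained.
        raise RuntimeError('Failed to get node info.') from ex
    return output_path / 'sys_info.json'


with tempfile.TemporaryDirectory() as out:
    written = collect_locally(out, lambda: {'system': {'hostname': 'demo'}})
    content = json.loads(written.read_text())

print(is_local(), is_local(host_list='node-0'), content)
```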
10 changes: 10 additions & 0 deletions superbench/cli/_help.py
@@ -65,6 +65,16 @@
--config-override superbench.enable=kernel-launch superbench.env.SB_MICRO_PATH=/path/to/superbenchmark
""".format(cli_name=CLI_NAME)

helps['node-info'] = """
    type: command
    short-summary: Collect the system info on the local node or on all given nodes.
    examples:
        - name: Collect system info on local GPU node
          text: {cli_name} node-info
        - name: Collect system info on all nodes in ./host.ini
          text: {cli_name} node-info --host-file ./host.ini
""".format(cli_name=CLI_NAME)

helps['benchmark'] = """
type: group
short-summary: Commands to manage benchmarks.
18 changes: 18 additions & 0 deletions superbench/runner/runner.py
@@ -197,6 +197,24 @@ def deploy(self): # pragma: no cover
        )
        self._ansible_client.run(self._ansible_client.get_playbook_config('deploy.yaml', extravars=extravars))

    def run_sys_info(self):
        """Collect the system info on all nodes."""
        self.check_env()

        logger.info('Runner is going to run node info.')

        fcmd = "docker exec sb-workspace bash -c '{command}'"
        if self._docker_config.skip:
            fcmd = "bash -c 'cd $SB_WORKSPACE && {command}'"
        ansible_runner_config = self._ansible_client.get_shell_config(
            fcmd.format(command='sb node-info --output-dir {output_dir}'.format(output_dir=self._sb_output_dir))
        )
        ansible_rc = self._ansible_client.run(ansible_runner_config, sudo=(not self._docker_config.skip))

        if ansible_rc != 0:
            self.cleanup()
        self.fetch_results()

    def check_env(self):    # pragma: no cover
        """Check SuperBench environment."""
        logger.info('Checking SuperBench environment.')
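`run_sys_info` does not collect anything on the controller itself: it sends an `sb node-info --output-dir ...` shell command to every host through Ansible and then fetches the results. The sketch below reproduces only that string formatting; `build_node_info_command` is hypothetical and its `skip_docker` flag stands in for `self._docker_config.skip`. Because `info_command_handler` passes `no_docker=False`, the `docker exec` form is presumably the one taken when invoked from the CLI.

```python
def build_node_info_command(output_dir, skip_docker):
    """Rebuild the per-node shell command formatted in run_sys_info (sketch)."""
    # Default: run inside the sb-workspace container on each node.
    fcmd = "docker exec sb-workspace bash -c '{command}'"
    if skip_docker:
        # No Docker: run directly from the SuperBench workspace on the host.
        fcmd = "bash -c 'cd $SB_WORKSPACE && {command}'"
    return fcmd.format(command='sb node-info --output-dir {}'.format(output_dir))


print(build_node_info_command('outputs/demo', skip_docker=False))
print(build_node_info_command('outputs/demo', skip_docker=True))
```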