Make BenchmarkConfig.compute support instance configuration in YAML #586

@R-Palazzo

Problem Description

The BenchmarkConfig supports defining and validating the configuration used to launch benchmark runs on remote compute services.

Currently, the compute section only requires a service key, with values such as gcp or aws. Based on that service, the instance configuration is resolved from Python-side defaults that are hardcoded in config_utils.py.

We want to update the compute section so that instance-level configuration is defined directly in the YAML file.
As part of this change, we should define provider-agnostic terminology for the YAML schema and continue supporting provider-specific aliases internally. This will make it clearer which values users are expected to provide while still allowing the implementation to map those fields to each compute service.

Expected behavior

  • Move the instance configuration currently defined in config_utils.py into benchmark_base.yaml.
  • Define the compute instance settings explicitly in YAML, for example:
compute:
  service: gcp
  instance_type: n1-highmem-16
  boot_image: projects/deeplearning-platform-release/global/images/family/common-cu128-ubuntu-2204-nvidia-570
  root_disk_gb: 300
  gpu_type: nvidia-tesla-t4
  gpu_count: 1
  swap_gb: 64
  name_prefix: sdgym-run
  • Validate the new compute schema as part of BenchmarkConfig.
  • Update BenchmarkLauncher and benchmark methods to work with the new structure.
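
As a rough illustration of the validation step, the sketch below checks a compute mapping against the keys used in the YAML example above. The function name validate_compute, the expected types, and the choice to treat every key as required are assumptions made for this example, not the existing BenchmarkConfig behavior.

```python
# Hypothetical sketch: names, types, and the "all keys required" rule are assumptions.
SUPPORTED_SERVICES = {'gcp', 'aws'}

# Canonical key -> expected Python type (keys taken from the YAML example above).
COMPUTE_SCHEMA = {
    'service': str,
    'instance_type': str,
    'boot_image': str,
    'root_disk_gb': int,
    'gpu_type': str,
    'gpu_count': int,
    'swap_gb': int,
    'name_prefix': str,
}


def validate_compute(compute):
    """Return a list of error messages; an empty list means the section is valid."""
    if not isinstance(compute, dict):
        return ['compute must be a mapping']

    errors = []
    for key in sorted(set(compute) - set(COMPUTE_SCHEMA), key=str):
        errors.append(f'unknown key: {key}')

    for key, expected in COMPUTE_SCHEMA.items():
        if key not in compute:
            errors.append(f'missing key: {key}')
            continue
        value = compute[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f'{key} must be of type {expected.__name__}')
        elif expected is int and value < 0:
            errors.append(f'{key} must not be negative')

    service = compute.get('service')
    if isinstance(service, str) and service not in SUPPORTED_SERVICES:
        errors.append(f'unsupported service: {service}')
    return errors
```

Returning a list of messages rather than raising on the first problem lets the caller report every issue in the YAML file at once; whether BenchmarkConfig should behave that way is a design choice left open here.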

Additional context

The config should use canonical keys such as:

  • instance_type
  • boot_image
  • root_disk_gb
  • gpu_type
  • gpu_count

These keys can then be translated internally to provider-specific names. For example:

  • instance_type → machine_type on GCP, instance_type on AWS
  • boot_image → source_image on GCP, ami on AWS
  • root_disk_gb → disk_size_gb on GCP, volume_size_gb on AWS
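
The translation described above could be sketched as a per-service alias table. The names PROVIDER_ALIASES and to_provider_config are hypothetical, and passing through keys that have no listed alias (such as gpu_type or gpu_count) unchanged is an assumption; only the three mappings listed in this issue are taken from the source.

```python
# Hypothetical alias table built only from the mappings listed in this issue.
PROVIDER_ALIASES = {
    'gcp': {
        'instance_type': 'machine_type',
        'boot_image': 'source_image',
        'root_disk_gb': 'disk_size_gb',
    },
    'aws': {
        'instance_type': 'instance_type',
        'boot_image': 'ami',
        'root_disk_gb': 'volume_size_gb',
    },
}


def to_provider_config(compute):
    """Translate canonical compute keys into the names used by the chosen service."""
    service = compute['service']
    if service not in PROVIDER_ALIASES:
        raise ValueError(f'unsupported service: {service}')
    aliases = PROVIDER_ALIASES[service]
    # Assumption: keys without a listed alias are passed through unchanged.
    return {
        aliases.get(key, key): value
        for key, value in compute.items()
        if key != 'service'
    }


gcp_config = to_provider_config({
    'service': 'gcp',
    'instance_type': 'n1-highmem-16',
    'root_disk_gb': 300,
})
print(gcp_config)
```

Keeping the aliases in one table means the YAML schema stays provider-agnostic while each compute service still receives the field names it expects.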

Labels

  • feature request: Request for a new feature
  • internal: The issue doesn't change the API or functionality