WIP: Introduce TrialRunner Abstraction #720

Draft: wants to merge 106 commits into base: main
Commits (106)
616f44e
do not pass the optimizer into _run()
motus Feb 21, 2024
33e332a
mypy fixes
motus Feb 21, 2024
0247259
start splitting the optimization loop into two
motus Feb 22, 2024
483e378
first complete version of the optimization loop (not tested yet)
motus Feb 23, 2024
addd5a4
Merge branch 'main' into sergiym/run/2loops
motus Feb 23, 2024
e97266f
allow running mlos_bench.run._main directly from unit tests + add a u…
motus Feb 23, 2024
64771fd
move in-process launch to a separate unit test file
motus Feb 23, 2024
bd7c55e
add is_warm_up flag to the optimization step
motus Feb 23, 2024
387722a
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/2loops
motus Feb 23, 2024
9f15aee
in-process optimization loop invocation works!
motus Feb 23, 2024
65cd072
add multi-iteration optimization to in-process test; fix the mlos_cor…
motus Feb 24, 2024
c010d95
make in-process launcher tests pass
motus Feb 24, 2024
7cfef3a
remove unnecessary local variables to make pylint happy
motus Feb 24, 2024
7233180
move trial_config_repeat_count checks to the launcher
motus Feb 24, 2024
be7dcec
make experiment.load() return trial_ids and use them in the optimizat…
motus Feb 24, 2024
3c52e03
use proper last_trial_id in the main loop; fix the unit tests
motus Feb 24, 2024
0d9dc97
update launcher tests with the new output patterns
motus Feb 24, 2024
4e171e0
remove unused variable
motus Feb 24, 2024
ab69fa0
Merge branch 'main' into sergiym/run/2loops
motus Feb 26, 2024
52adab8
better naming for functions in the optimization loop
motus Feb 26, 2024
df893d9
start implementing the scheduler class
motus Feb 27, 2024
5aca764
change the default value for is_warm_up parameter to False
motus Feb 27, 2024
4d183df
Merge branch 'sergiym/run/2loops' into sergiym/run/scheduler
motus Feb 27, 2024
309e10c
started to implement the start() method of the sync scheduler
motus Feb 27, 2024
9a72b40
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/sch…
motus Feb 27, 2024
ffe23e1
implement proper Scheduler constructor
motus Feb 27, 2024
cb863e0
more clean-ups to the base scheduler
motus Feb 27, 2024
990b019
minor pylint fixes
motus Feb 28, 2024
2ac0520
add _add_trial_to_queue() method
motus Feb 28, 2024
b95100a
better handling of warm-up phase (no redundant code)
motus Feb 28, 2024
e15033d
split the scheduler implementation into the base class and the sync …
motus Feb 28, 2024
6eab1b0
use the new scheduler in _main()
motus Feb 28, 2024
9c7f2cc
add scheduler config parameters that can be overridden from global co…
motus Feb 28, 2024
479a5ed
add todo comments
motus Feb 28, 2024
50dad9f
update the scores for launcher unit tests + fix the regexps
motus Feb 28, 2024
220ece1
add logging to the sync optimization loop
motus Feb 28, 2024
29cec19
add more logging to the scheduler class
motus Feb 28, 2024
6f8bb2c
move (sync) implementation of the run_trial() to SyncScheduler; other…
motus Feb 29, 2024
6adb2d0
wip
bpkroth Mar 4, 2024
41a0c37
start tracking which trial runner a trial is assigned to
bpkroth Mar 5, 2024
2453427
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/sch…
motus Mar 5, 2024
d8d8dfb
Merge branch 'sergiym/run/scheduler' of github.com:motus/MLOS into se…
motus Mar 5, 2024
8a32e5a
wip: adding trial runner
bpkroth Mar 5, 2024
57dc4c3
Merge remote-tracking branch 'sergiy/sergiym/run/scheduler' into para…
bpkroth Mar 6, 2024
e55f33e
wip: integrating trial runner to merged branch
bpkroth Mar 6, 2024
7df0770
Roll back forceful assignment of PATH when invoking a local process
motus Mar 8, 2024
da55c5e
instantiate Scheduler from JSON config in the launcher (no JSON schem…
motus Mar 8, 2024
f6eb5ef
fix unit tests
motus Mar 8, 2024
97438e7
add test for Launcher scheduler load in test_load_cli_config_examples…
motus Mar 8, 2024
715fab9
Merge branch 'sergiym/local_exec/env' into sergiym/run/scheduler_load
motus Mar 8, 2024
034aef9
fix the way launcher handles trial_config_repeat_count
motus Mar 9, 2024
629236f
minor type fixes
motus Mar 9, 2024
049fdb6
add required_keys for base Scheduler
motus Mar 9, 2024
094155c
remove unnecessary type annotation
motus Mar 9, 2024
a6a7283
typo in pylint exception
motus Mar 9, 2024
0a94a37
make all unit tests run
motus Mar 9, 2024
cf42730
add a missing import
motus Mar 11, 2024
6f31a2d
add ConfigSchema.SCHEDULER (not defined yet)
motus Mar 11, 2024
e6ceb5c
fix the teardown property propagation issue
motus Mar 11, 2024
3121fb0
proper ordering of launcher properties initialization
motus Mar 11, 2024
5951544
fix last unit tests
motus Mar 11, 2024
e3f515c
more unit test fixes
motus Mar 11, 2024
86f155e
add Scheduler JSON config schema
motus Mar 11, 2024
928ceff
validate scheduler JSON schema
motus Mar 11, 2024
1511c6e
add an example config for sync scheduler
motus Mar 11, 2024
38ab457
fix the instantiation of scheduler config from JSON file
motus Mar 11, 2024
9323a1c
minor logging improvements in the Scheduler
motus Mar 11, 2024
6b35444
fix the trial_config_repeat_count default values for CLI
motus Mar 11, 2024
b242f23
roll back some unnecessary test fixes
motus Mar 11, 2024
208c393
temporarily rollback the --max_iterations 9 setting in unit test
motus Mar 11, 2024
303c25f
roll back another small fix to minimize the diff
motus Mar 11, 2024
16ea2cb
undo a fix to LocalExecService that is in a separate PR
motus Mar 11, 2024
5ad4b74
keep minimizing the diff
motus Mar 11, 2024
e0845ea
minimize diff
motus Mar 11, 2024
ed95295
Merge branch 'main' into sergiym/run/scheduler_load
motus Mar 13, 2024
45a9293
Merge branch 'main' of github.com:microsoft/MLOS into sergiym/run/sch…
motus Mar 13, 2024
be0106a
Merge branch 'sergiym/run/scheduler_load' of github.com:motus/MLOS in…
motus Mar 13, 2024
71e3ced
Merge branch 'main' into sergiym/run/scheduler_load
motus Mar 13, 2024
ca9b3a1
Merge branch 'main' into sergiym/run/scheduler_load
motus Mar 14, 2024
52352dc
Merge remote-tracking branch 'upstream/main' into parallel-async-tria…
bpkroth Mar 15, 2024
1aa0e4b
Merge branch 'main' of github.com:motus/MLOS into sergiym/run/schedul…
motus Mar 15, 2024
bbf7922
Merge branch 'sergiym/run/scheduler_load' of github.com:motus/MLOS in…
motus Mar 15, 2024
b204ebc
Fix some storage schema related tests
bpkroth Mar 15, 2024
63da0e0
make local edits scheduler schema aware
bpkroth Mar 15, 2024
ba59035
include the scheduler schema in the global config
bpkroth Mar 15, 2024
2ca34cd
fixup relative paths
bpkroth Mar 15, 2024
946b0c4
basic schema testing
bpkroth Mar 15, 2024
58e8609
Merge branch 'main' into sergiym/run/scheduler_load
bpkroth Mar 15, 2024
7985a3e
add another test case
bpkroth Mar 15, 2024
8070c30
Update mlos_bench/mlos_bench/launcher.py
motus Mar 15, 2024
f395531
pylint
bpkroth Mar 15, 2024
678f4c5
Merge remote-tracking branch 'sergiy/sergiym/run/scheduler_load' into…
bpkroth Mar 15, 2024
969e496
remove async status changes for now - future PR
bpkroth Mar 15, 2024
6f4928f
wip
bpkroth Mar 15, 2024
92382dc
wip: refactor running of a trial to a separate class so we can do the…
bpkroth Mar 19, 2024
f00c975
Merge branch 'main' into trial-runner-abstraction
bpkroth Mar 19, 2024
e91b744
comments
bpkroth Mar 19, 2024
64e7575
consistency
bpkroth Mar 19, 2024
8a9e29e
Merge branch 'main' into trial-runner-abstraction
motus Mar 19, 2024
32c01c0
fixup
bpkroth Mar 20, 2024
0e89e25
schema tests
bpkroth Mar 20, 2024
5549925
spelling
bpkroth Mar 20, 2024
7feba3a
make sure trial_runner_id shows up by default
bpkroth Mar 20, 2024
cc7ed4d
wip: fixups
bpkroth Mar 20, 2024
8d794f1
fixme comments
bpkroth Mar 20, 2024
967b6e2
Launcher args fixups
bpkroth Mar 21, 2024
Files changed
@@ -6,7 +6,7 @@

"config": {
"trial_config_repeat_count": 3,
"max_trials": -1, // Limited only in hte Optimizer logic/config.
"max_trials": -1, // Limited only in the Optimizer logic/config.
"teardown": false
}
}
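For orientation, the fragment above is the `"config"` block of a scheduler JSON(C) file. A minimal sketch of the full file follows; only the `"config"` block comes from the diff, and the `SyncScheduler` class path is an assumption based on the commit history:

```jsonc
{
    // Hypothetical complete scheduler config; the class path is an assumption.
    "class": "mlos_bench.schedulers.SyncScheduler",
    "config": {
        "trial_config_repeat_count": 3,
        "max_trials": -1, // Limited only in the Optimizer logic/config.
        "teardown": false
    }
}
```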
7 changes: 7 additions & 0 deletions mlos_bench/mlos_bench/config/schemas/cli/cli-schema.json
@@ -79,6 +79,13 @@
"examples": [3, 5]
},

"num_trial_runners": {
"description": "Number of trial runner instances to use to execute benchmark environments. Individual TrialRunners can be identified in configs with $trial_runner_id and optionally run in parallel.",
"type": "integer",
"minimum": 1,
"examples": [1, 3, 5, 10]
},

"storage": {
"description": "Path to the json config describing the storage backend to use.",
"$ref": "#/$defs/json_config_path"
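A sketch of how the new option might be set in a CLI config; `num_trial_runners` and `config_path` come from the schema, while the surrounding values are hypothetical:

```jsonc
{
    // Hypothetical CLI config exercising the new field.
    "config_path": ["mlos_bench/config"],
    "num_trial_runners": 3, // must be >= 1 per the schema above
    "trial_config_repeat_count": 3
}
```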
33 changes: 30 additions & 3 deletions mlos_bench/mlos_bench/environments/base_environment.py
@@ -35,6 +35,14 @@ class Environment(metaclass=abc.ABCMeta):
"""
An abstract base of all benchmark environments.
"""
# Should be provided by the runtime.
_COMMON_CONST_ARGS = {
"trial_runner_id",
}
_COMMON_REQ_ARGS = {
"experiment_id",
"trial_id",
}

@classmethod
def new(cls,
@@ -113,6 +121,12 @@ def __init__(self,
An optional service object (e.g., providing methods to
deploy or reboot a VM/Host, etc.).
"""
global_config = global_config or {}
# Make some usual runtime arguments available for tests.
for arg in self._COMMON_CONST_ARGS:
global_config.setdefault(arg, None)
for arg in self._COMMON_REQ_ARGS:
global_config.setdefault(arg, None)
self._validate_json_config(config, name)
self.name = name
self.config = config
@@ -132,7 +146,7 @@ def __init__(self,

groups = self._expand_groups(
config.get("tunable_params", []),
(global_config or {}).get("tunable_params_map", {}))
global_config.get("tunable_params_map", {}))
_LOG.debug("Tunable groups for: '%s' :: %s", name, groups)

self._tunable_params = tunables.subgroup(groups)
@@ -142,8 +156,9 @@ def __init__(self,
set(config.get("required_args", [])) -
set(self._tunable_params.get_param_values().keys())
)
req_args.update(self._COMMON_CONST_ARGS)
merge_parameters(dest=self._const_args, source=global_config, required_keys=req_args)
self._const_args = self._expand_vars(self._const_args, global_config or {})
self._const_args = self._expand_vars(self._const_args, global_config)

self._params = self._combine_tunables(self._tunable_params)
_LOG.debug("Parameters for '%s' :: %s", name, self._params)
@@ -307,6 +322,18 @@ def tunable_params(self) -> TunableGroups:
"""
return self._tunable_params

@property
def const_args(self) -> Dict[str, TunableValue]:
"""
Get the constant arguments for this Environment.

Returns
-------
parameters : Dict[str, TunableValue]
Key/value pairs of all environment const_args parameters.
"""
return self._const_args.copy()

@property
def parameters(self) -> Dict[str, TunableValue]:
"""
@@ -318,7 +345,7 @@ def parameters(self) -> Dict[str, TunableValue]:
parameters : Dict[str, TunableValue]
Key/value pairs of all environment parameters (i.e., `const_args` and `tunable_params`).
"""
return self._params
return self._params.copy()

def setup(self, tunables: TunableGroups, global_config: Optional[dict] = None) -> bool:
"""
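Since `trial_runner_id` is now injected into every Environment's `const_args` and expanded via `_expand_vars`, configs can reference it with the usual `$var` substitution. A minimal sketch, assuming a `LocalEnv` with hypothetical script and paths:

```jsonc
{
    // Hypothetical environment config: $trial_runner_id is expanded from the
    // per-runner global config that the launcher now builds (see the diff below).
    "class": "mlos_bench.environments.local.LocalEnv",
    "config": {
        "const_args": {
            "output_dir": "./output/runner-$trial_runner_id"
        },
        "run": ["./run_benchmark.sh --out $output_dir"]
    }
}
```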
46 changes: 36 additions & 10 deletions mlos_bench/mlos_bench/launcher.py
@@ -23,6 +23,7 @@
from mlos_bench.tunables.tunable import TunableValue
from mlos_bench.tunables.tunable_groups import TunableGroups
from mlos_bench.environments.base_environment import Environment
from mlos_bench.schedulers.trial_runner import TrialRunner

from mlos_bench.optimizers.base_optimizer import Optimizer
from mlos_bench.optimizers.mock_optimizer import MockOptimizer
@@ -54,6 +55,8 @@ class Launcher:

def __init__(self, description: str, long_text: str = "", argv: Optional[List[str]] = None):
# pylint: disable=too-many-statements
# pylint: disable=too-complex
Review comment on lines 57 to +58 (Member), suggested change:
- # pylint: disable=too-many-statements
- # pylint: disable=too-complex
+ # pylint: disable=too-many-statements,too-complex

# pylint: disable=too-many-locals
_LOG.info("Launch: %s", description)
epilog = """
Additional --key=value pairs can be specified to augment or override values listed in --globals.
@@ -95,11 +98,13 @@ def __init__(self, description: str, long_text: str = "", argv: Optional[List[st

self._parent_service: Service = LocalExecService(parent=self._config_loader)

args_dict = vars(args)
self.global_config = self._load_config(
config.get("globals", []) + (args.globals or []),
(args.config_path or []) + config.get("config_path", []),
args_rest,
{key: val for (key, val) in config.items() if key not in vars(args)},
# Prime the global config with the command line args and the config file.
{key: val for (key, val) in config.items() if key not in args_dict or args_dict[key] is None},
)
# experiment_id is generally taken from --globals files, but we also allow overriding it on the CLI.
# It's useful to keep it there explicitly mostly for the --help output.
@@ -108,6 +113,11 @@ def __init__(self, description: str, long_text: str = "", argv: Optional[List[st
# trial_config_repeat_count is a scheduler property but it's convenient to set it via command line
if args.trial_config_repeat_count:
self.global_config["trial_config_repeat_count"] = args.trial_config_repeat_count
self.global_config.setdefault("num_trial_runners", 1)
if args.num_trial_runners:
self.global_config["num_trial_runners"] = args.num_trial_runners
if self.global_config["num_trial_runners"] <= 0:
raise ValueError(f"Invalid num_trial_runners: {self.global_config['num_trial_runners']}")
# Ensure that the trial_id is present since it gets used by some other
# configs but is typically controlled by the run optimize loop.
self.global_config.setdefault('trial_id', 1)
@@ -127,12 +137,21 @@ def __init__(self, description: str, long_text: str = "", argv: Optional[List[st
" Run `mlos_bench --help` and consult `README.md` for more info.")
self.root_env_config = self._config_loader.resolve_path(env_path)

self.environment: Environment = self._config_loader.load_environment(
self.root_env_config, TunableGroups(), self.global_config, service=self._parent_service)
_LOG.info("Init environment: %s", self.environment)

# NOTE: Init tunable values *after* the Environment, but *before* the Optimizer
self.trial_runners: List[TrialRunner] = []
for trial_runner_id in range(0, self.global_config["num_trial_runners"]):
Review comment (Member), suggested change:
- for trial_runner_id in range(0, self.global_config["num_trial_runners"]):
+ for trial_runner_id in range(self.global_config["num_trial_runners"]):

# Create a new global config for each Environment with a unique trial_runner_id for it.
env_global_config = self.global_config.copy()
env_global_config["trial_runner_id"] = trial_runner_id
env = self._config_loader.load_environment(
self.root_env_config, TunableGroups(), env_global_config, service=self._parent_service)
self.trial_runners.append(TrialRunner(trial_runner_id, env))
_LOG.info("Init %d trial runners for environments: %s",
len(self.trial_runners), list(trial_runner.environment for trial_runner in self.trial_runners))

# NOTE: Init tunable values *after* the Environment(s), but *before* the Optimizer
# TODO: should we assign the same or different tunables for all TrialRunner Environments?
self.tunables = self._init_tunable_values(
self.trial_runners[0].environment,
args.random_init or config.get("random_init", False),
config.get("random_seed") if args.random_seed is None else args.random_seed,
config.get("tunable_values", []) + (args.tunable_values or [])
@@ -208,6 +227,11 @@ def _parse_args(parser: argparse.ArgumentParser, argv: Optional[List[str]]) -> T
'--trial_config_repeat_count', '--trial-config-repeat-count', required=False, type=int,
help='Number of times to repeat each config. Default is 1 trial per config, though more may be advised.')

parser.add_argument(
'--num_trial_runners', '--num-trial-runners', required=False, type=int,
help='Number of TrialRunners to use for executing benchmark Environments. '
+ 'Individual TrialRunners can be identified in configs with $trial_runner_id and optionally run in parallel.')

parser.add_argument(
'--scheduler', required=False,
help='Path to the scheduler configuration file. By default, use' +
@@ -314,13 +338,13 @@ def _load_config(self,
global_config["config_path"] = config_path
return global_config

def _init_tunable_values(self, random_init: bool, seed: Optional[int],
def _init_tunable_values(self, env: Environment, random_init: bool, seed: Optional[int],
args_tunables: Optional[str]) -> TunableGroups:
"""
Initialize the tunables and load key/value pairs of the tunable values
from given JSON files, if specified.
"""
tunables = self.environment.tunable_params
tunables = env.tunable_params
_LOG.debug("Init tunables: default = %s", tunables)

if random_init:
@@ -329,6 +353,8 @@ def _init_tunable_values(self, random_init: bool, seed: Optional[int],
config={"start_with_defaults": False, "seed": seed}).suggest()
_LOG.debug("Init tunables: random = %s", tunables)

# TODO: should we assign the same or different tunables for all TrialRunner Environments?

if args_tunables is not None:
for data_file in args_tunables:
values = self._config_loader.load_config(data_file, ConfigSchema.TUNABLE_VALUES)
@@ -402,7 +428,7 @@ def _load_scheduler(self, args_scheduler: Optional[str]) -> Scheduler:
"teardown": self.teardown,
},
global_config=self.global_config,
environment=self.environment,
trial_runners=self.trial_runners,
optimizer=self.optimizer,
storage=self.storage,
root_env_config=self.root_env_config,
@@ -412,7 +438,7 @@ def _load_scheduler(self, args_scheduler: Optional[str]) -> Scheduler:
return self._config_loader.build_scheduler(
config=class_config,
global_config=self.global_config,
environment=self.environment,
trial_runners=self.trial_runners,
optimizer=self.optimizer,
storage=self.storage,
root_env_config=self.root_env_config,
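Putting the launcher changes together: each runner is constructed as `TrialRunner(trial_runner_id, env)` and exposes an `environment` property. A minimal sketch of the shape this diff implies; only the constructor arguments and the `environment` property are grounded in the diff, everything else is an illustrative assumption:

```python
# Minimal sketch of the TrialRunner abstraction as used by the launcher above.
from mlos_bench.environments.base_environment import Environment


class TrialRunner:
    """Runs Trials against a single Environment copy, tagged with trial_runner_id."""

    def __init__(self, trial_runner_id: int, env: Environment):
        self._trial_runner_id = trial_runner_id
        self._env = env

    @property
    def trial_runner_id(self) -> int:
        """The ID exposed to configs as $trial_runner_id."""
        return self._trial_runner_id

    @property
    def environment(self) -> Environment:
        """The Environment this runner executes trials in."""
        return self._env
```

This shape would let the Scheduler hand each queued Trial to a specific runner (and eventually run several runners in parallel), since every runner carries its own Environment instance configured with a distinct `trial_runner_id`.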