Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add preCommand option to support configuring experimental environment by user #2875

Merged
merged 6 commits into from Sep 21, 2020
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
76 changes: 76 additions & 0 deletions docs/en_US/TrainingService/RemoteMachineMode.md
Expand Up @@ -107,3 +107,79 @@ Files in `codeDir` will be uploaded to remote machines automatically. You can ru
```bash
nnictl create --config examples/trials/mnist-annotation/config_remote.yml
```

### Configure python environment

By default, commands and scripts will be executed in the default environment in remote machine. If there are multiple python virtual environments in your remote machine, and you want to run experiments in a specific environment, then use __preCommand__ to specify a python environment on your remote machine.

Use `examples/trials/mnist-tfv2` as the example. Below is content of `examples/trials/mnist-tfv2/config_remote.yml`:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
#machineList can be empty if the platform is local
machineList:
- ip: ${replace_to_your_remote_machine_ip}
username: ${replace_to_your_remote_machine_username}
sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath}
# Pre-command will be executed before the remote machine executes other commands.
# Below is an example of specifying python environment.
# If you want to execute multiple commands, please use "&&" to connect them.
# preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
# preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
```

The __preCommand__ will be executed before the remote machine executes other commands. So you can configure python environment path like this:

```yaml
# Linux remote machine
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
# Windows remote machine
preCommand: set path=${replace_to_python_environment_path_in_your_remote_machine};%path%
```

Or if you want to activate the `virtualenv` environment:

```yaml
# Linux remote machine
preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
# Windows remote machine
preCommand: ${replace_to_absolute_path_recommended_here}\\scripts\\activate
```

Or if you want to activate the `conda` environment:

```yaml
# Linux remote machine
preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
# Windows remote machine
preCommand: call activate ${replace_to_conda_env_name}
```

If there are multiple commands want to execute, you can use `&&` to connect these commands:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If there are multiple commands want to execute" -> "If you want multiple commands to be executed"


```yaml
preCommand: command1 && command2 && command3
```

__Note__: Because __preCommand__ will execute before other commands each time, it is strongly not recommended to set __preCommand__ that will make changes to system, i.e. `mkdir` or `touch`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doc is very clear!

15 changes: 15 additions & 0 deletions docs/en_US/Tutorial/ExperimentConfig.md
Expand Up @@ -58,6 +58,7 @@ This document describes the rules to write the config file, and provides some ex
- [gpuIndices](#gpuindices-3)
- [maxTrialNumPerGpu](#maxtrialnumpergpu-1)
- [useActiveGpu](#useactivegpu-1)
- [preCommand](#preCommand)
+ [kubeflowConfig](#kubeflowconfig)
- [operator](#operator)
- [storage](#storage)
Expand Down Expand Up @@ -583,6 +584,14 @@ Optional. Bool. Default: false.

Used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no other active process in the GPU. If __useActiveGpu__ is set to true, NNI will use the GPU regardless of another processes. This field is not applicable for NNI on Windows.

#### preCommand

Optional. String.

Specifies the pre-command that will be executed before the remote machine executes other commands. Users can configure the experimental environment on remote machine by setting __preCommand__. If there are multiple commands need to execute, use `&&` to connect them, such as `preCommand: command1 && command2 && ...`.

__Note__: Because __preCommand__ will execute before other commands each time, it is strongly not recommended to set __preCommand__ that will make changes to system, i.e. `mkdir` or `touch`.

### kubeflowConfig

#### operator
Expand Down Expand Up @@ -795,6 +804,12 @@ If run trial jobs in remote machine, users could specify the remote machine info
username: test
sshKeyPath: /nni/sshkey
passphrase: qwert
# Pre-command will be executed before the remote machine executes other commands.
# Below is an example of specifying python environment.
# If you want to execute multiple commands, please use "&&" to connect them.
# preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
# preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
```

### PAI mode
Expand Down
32 changes: 32 additions & 0 deletions examples/trials/mnist-tfv2/config_remote.yml
@@ -0,0 +1,32 @@
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
#SMAC (SMAC should be installed through nnictl)
builtinTunerName: TPE
classArgs:
#choice: maximize, minimize
optimize_mode: maximize
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
#machineList can be empty if the platform is local
machineList:
- ip: ${replace_to_your_remote_machine_ip}
username: ${replace_to_your_remote_machine_username}
sshKeyPath: ${replace_to_your_remote_machine_sshKeyPath}
# Pre-command will be executed before the remote machine executes other commands.
# Below is an example of specifying python environment.
# If you want to execute multiple commands, please use "&&" to connect them.
# preCommand: source ${replace_to_absolute_path_recommended_here}/bin/activate
# preCommand: source ${replace_to_conda_path}/bin/activate ${replace_to_conda_env_name}
preCommand: export PATH=${replace_to_python_environment_path_in_your_remote_machine}:$PATH
3 changes: 2 additions & 1 deletion src/nni_manager/rest_server/restValidationSchemas.ts
Expand Up @@ -17,7 +17,8 @@ export namespace ValidationSchemas {
passphrase: joi.string(),
gpuIndices: joi.string(),
maxTrialNumPerGpu: joi.number(),
useActiveGpu: joi.boolean()
useActiveGpu: joi.boolean(),
preCommand: joi.string()
})),
local_config: joi.object({ // eslint-disable-line @typescript-eslint/camelcase
gpuIndices: joi.string(),
Expand Down
Expand Up @@ -123,11 +123,19 @@ class LinuxCommands extends OsCommands {
if (isFile) {
command = `bash '${script}'`;
} else {
script = script.replace('"', '\\"');
script = script.replace(/"/g, '\\"');
command = `bash -c "${script}"`;
}
return command;
}

public addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined{
if (command === undefined || command === '' || preCommand === undefined || preCommand === ''){
return command;
} else {
return `${preCommand} && ${command}`;
}
}
}

export { LinuxCommands };
Expand Up @@ -46,7 +46,7 @@ class WindowsCommands extends OsCommands {
}

public generateGpuStatsScript(scriptFolder: string): string {
return `powershell -command $env:METRIC_OUTPUT_DIR='${scriptFolder}';$app = Start-Process -FilePath python -NoNewWindow -passthru -ArgumentList '-m nni_gpu_tool.gpu_metrics_collector' -RedirectStandardOutput ${scriptFolder}\\scriptstdout -RedirectStandardError ${scriptFolder}\\scriptstderr;Write $PID ^| Out-File ${scriptFolder}\\pid -NoNewline -encoding utf8;wait-process $app.ID`;
return `powershell -command $env:Path=If($env:prePath){$env:prePath}Else{$env:Path};$env:METRIC_OUTPUT_DIR='${scriptFolder}';$app = Start-Process -FilePath python -NoNewWindow -passthru -ArgumentList '-m nni_gpu_tool.gpu_metrics_collector' -RedirectStandardOutput ${scriptFolder}\\scriptstdout -RedirectStandardError ${scriptFolder}\\scriptstderr;Write $PID ^| Out-File ${scriptFolder}\\pid -NoNewline -encoding utf8;wait-process $app.ID`;
}

public createFolder(folderName: string, sharedFolder: boolean = false): string {
Expand Down Expand Up @@ -122,6 +122,14 @@ class WindowsCommands extends OsCommands {
const command = `${script}`;
return command;
}

public addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined{
if (command === undefined || command === '' || preCommand === undefined || preCommand === ''){
return command;
} else {
return `${preCommand} && set prePath=%path% && ${command}`;
}
}
}

export { WindowsCommands };
Expand Up @@ -28,6 +28,7 @@ abstract class OsCommands {
public abstract killChildProcesses(pidFileName: string, killSelf: boolean): string;
public abstract extractFile(tarFileName: string, targetFolder: string): string;
public abstract executeScript(script: string, isFile: boolean): string;
public abstract addPreCommand(preCommand: string | undefined, command: string | undefined): string | undefined;

public joinPath(...paths: string[]): string {
let dir: string = paths.filter((path: any) => path !== '').join(this.pathSpliter);
Expand Down
Expand Up @@ -23,6 +23,7 @@ export class RemoteMachineMeta {
//TODO: initialize varialbe in constructor
public occupiedGpuIndexMap?: Map<number, number>;
public readonly useActiveGpu?: boolean = false;
public readonly preCommand?: string;
}

/**
Expand Down
Expand Up @@ -32,6 +32,7 @@ class ShellExecutor {
private tempPath: string = "";
private isWindows: boolean = false;
private channelDefaultOutputs: string[] = [];
private preCommand: string | undefined;

constructor() {
this.log = getLogger();
Expand All @@ -47,6 +48,7 @@ class ShellExecutor {
username: rmMeta.username,
tryKeyboard: true,
};
this.preCommand = rmMeta.preCommand;
this.name = `${rmMeta.username}@${rmMeta.ip}:${rmMeta.port}`;
if (rmMeta.passwd !== undefined) {
connectConfig.password = rmMeta.passwd;
Expand Down Expand Up @@ -349,6 +351,9 @@ class ShellExecutor {
let exitCode: number;

const commandIndex = randomInt(10000);
if(this.osCommands !== undefined){
command = this.osCommands.addPreCommand(this.preCommand, command);
}
this.log.debug(`remoteExeCommand(${commandIndex}): [${command}]`);

// Windows always uses shell, and it needs to disable to get it works.
Expand Down
Expand Up @@ -36,6 +36,7 @@ async function getRemoteFileContentLoop(executor: ShellExecutor): Promise<void>

describe('ShellExecutor test', () => {
let skip: boolean = false;
let isWindows: boolean;
let rmMeta: any;
try {
rmMeta = JSON.parse(fs.readFileSync('../../.vscode/rminfo.json', 'utf8'));
Expand Down Expand Up @@ -86,4 +87,28 @@ describe('ShellExecutor test', () => {
await getRemoteFileContentLoop(executor);
await executor.close();
});

it('Test preCommand-1', async () => {
if (skip) {
return;
}
const executor: ShellExecutor = new ShellExecutor();
await executor.initialize(rmMeta);
const result = await executor.executeScript("ver", false, false);
isWindows = result.exitCode == 0 && result.stdout.search("Windows") > -1;
await executor.close();
});

it('Test preCommand-2', async () => {
if (skip) {
return;
}
const executor: ShellExecutor = new ShellExecutor();
rmMeta.preCommand = isWindows ? "set TEST_PRE_COMMAND=test_pre_command" : "export TEST_PRE_COMMAND=test_pre_command";
await executor.initialize(rmMeta);
const command = isWindows ? "python -c \"import os; print(os.environ.get(\'TEST_PRE_COMMAND\'))\"" : "python3 -c \"import os; print(os.environ.get(\'TEST_PRE_COMMAND\'))\"";
const result = (await executor.executeScript(command, false, false)).stdout.replace(/[\ +\r\n]/g, "");
chai.expect(result).eq("test_pre_command");
await executor.close();
});
});
Expand Up @@ -25,8 +25,8 @@ describe('Unit Test for RemoteMachineTrainingService', () => {
Default/.vscode/rminfo.json, whose content looks like:
{
"ip": "10.172.121.40",
"user": "user1",
"password": "mypassword"
"username": "user1",
"passwd": "mypassword"
}
*/
let skip: boolean = false;
Expand Down
6 changes: 4 additions & 2 deletions tools/nni_cmd/config_schema.py
Expand Up @@ -382,7 +382,8 @@ def validate(self, data):
Optional('passphrase'): setType('passphrase', str),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
Optional('maxTrialNumPerGpu'): setType('maxTrialNumPerGpu', int),
Optional('useActiveGpu'): setType('useActiveGpu', bool)
Optional('useActiveGpu'): setType('useActiveGpu', bool),
Optional('preCommand'): setType('preCommand', str)
},
{
'ip': setType('ip', str),
Expand All @@ -391,7 +392,8 @@ def validate(self, data):
'passwd': setType('passwd', str),
Optional('gpuIndices'): Or(int, And(str, lambda x: len([int(i) for i in x.split(',')]) > 0), error='gpuIndex format error!'),
Optional('maxTrialNumPerGpu'): setType('maxTrialNumPerGpu', int),
Optional('useActiveGpu'): setType('useActiveGpu', bool)
Optional('useActiveGpu'): setType('useActiveGpu', bool),
Optional('preCommand'): setType('preCommand', str)
})]
}

Expand Down