Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi trial jobs on same GPU #1109

Merged
merged 93 commits into from May 28, 2019

Conversation

SparkSnail
Copy link
Contributor

No description provided.

private designatedGpuIndices!: Set<number>;
private log: Logger;
private localTrailConfig?: TrialConfig;
private localConfig?: LocalConfig;
private isMultiPhase: boolean = false;
private jobStreamMap: Map<string, ts.Stream>;
private maxTrialNumPerGPU: number = 1;
private useActiveGPU: boolean = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about init them in constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, moved to constructor.

if(trialJobDetail === undefined) {
throw new Error(`could not get trialJobDetail by id ${trialJobId}`);
}
if (trialJobDetail.rmMeta !== undefined && trialJobDetail.rmMeta.occupiedGpuIndexMap !== undefined && trialJobDetail.gpuIndices !== undefined && trialJobDetail.gpuIndices.length > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to split this line to four lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, split to four lines.

if(rmMeta.occupiedGpuIndexMap !== undefined) {
let num = rmMeta.occupiedGpuIndexMap.get(gpuInfo.index);
let maxTrialNumPerGPU: number = rmMeta.maxTrialNumPerGPU? rmMeta.maxTrialNumPerGPU: 1;
if((num === undefined && (!rmMeta.useActiveGPU && gpuInfo.activeProcessNum === 0 || !rmMeta.useActiveGPU)) || (num !== undefined && num < maxTrialNumPerGPU)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same, better to use multiple lines

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest 2 lines for this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, split to 2 lines.

}
rmMeta.occupiedGpuIndexMap.set(gpuInfo.index, num + 1);
}else {
rmMeta.occupiedGpuIndexMap = new Map<number, number>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also reconsider this logic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, initialize in constructor.

if request_data['local_config']:
if request_data['local_config'].get('gpuIndices') and isinstance(request_data['local_config'].get('gpuIndices'), int):
request_data['local_config']['gpuIndices'] = str(request_data['local_config'].get('gpuIndices'))
if request_data['local_config'].get('maxTrialNumOnEachGPU'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

if request_data['local_config'].get('maxTrialNumOnEachGPU'):
request_data['local_config']['maxTrialNumOnEachGPU'] = request_data['local_config'].get('maxTrialNumOnEachGPU')
if request_data['local_config'].get('useActiveGPU'):
request_data['local_config']['useActiveGPU'] = request_data['local_config'].get('useActiveGPU')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GPU? -> Gpu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


* __useActiveGpu__

__useActiveGpu__ is used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no another active process in the GPU, if __useActiveGpu__ is set to true, NNI will use the GPU regardless of another processes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to state that this option is not applicable for NNI on Windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

* __useActiveGpu__

__useActiveGpu__ is used to specify whether to use a GPU if there is another process. By default, NNI will use the GPU only if there is no another active process in the GPU, if __useActiveGpu__ is set to true, NNI will use the GPU regardless of another processes. This field is not applicable for NNI on Windows.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice Job!
I test it on my local machine and it works successfully.
It seems that it also works in remote mode when I catch a sight of the code.

I suggest put one example code here.

localConfig:
  maxTrialNumPerGpu:  5

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good suggestion, I've added an configuration in cifar10-pytorch-example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants