
[automl] Memory Aware Config Tuning #1257

Merged 4 commits into ludwig-ai:master on Aug 20, 2021

Conversation

ANarayan (Collaborator)

This PR adds a new capability to the automl module that tightens the hyperparameter search space ranges in a Ludwig config in order to avoid OOM failures. These failures can occur when the provided search space ranges allow the hyperparameter optimization experiment to sample parameter assignments that are implausible given the available memory.
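In spirit, the tuning step caps each hyperparameter's range so that no sampled assignment can exceed the available memory. A minimal sketch of the idea (the function name and the `estimate_memory` callback are illustrative, not Ludwig's actual API):

```python
def tighten_search_space(param_ranges, memory_budget, estimate_memory):
    """Shrink each parameter's upper bound until its largest value fits
    within the memory budget (a hypothetical stand-in for the PR's logic)."""
    tightened = {}
    for name, (low, high) in param_ranges.items():
        # Halve the upper bound while the worst-case assignment would OOM.
        while high > low and estimate_memory(name, high) > memory_budget:
            high //= 2
        tightened[name] = (low, high)
    return tightened
```

For example, with a toy cost model where memory grows linearly with batch size, a (1, 16) batch-size range gets capped at the largest upper bound that still fits the budget.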

@ANarayan ANarayan requested a review from tgaddair August 16, 2021 16:51
tgaddair (Collaborator) left a comment


This is amazing! Just a few suggestions.

import ray

try:
    import GPUtil

Can we add this to the requirements_ray.txt instead of asking the user to install this by hand?
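The packaging change being suggested might look like the following; the exact contents of requirements_ray.txt are not shown in this thread, so the unpinned entries below are an assumption:

```
# requirements_ray.txt
ray
GPUtil
```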


We should probably move the import ray in here, too, given the error message.
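A sketch of the suggested pattern (an assumption, not the merged code): moving `import ray` inside the try block keeps the error message accurate when either package is missing.

```python
def import_tuning_deps():
    """Import the optional dependencies behind one actionable error message."""
    try:
        import ray
        import GPUtil
        return ray, GPUtil
    except ImportError as e:
        # Both optional deps are guarded, so the message covers both.
        raise ImportError(
            "ray and GPUtil are required for memory-aware config tuning; "
            "install them with `pip install ray GPUtil`."
        ) from e
```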

def get_remote_gpu():
    gpus = GPUtil.getGPUs()
    total_mem = gpus[0].memory_total
    return total_mem * BYTES_TO_MB

This calculation is a little bit confusing, maybe use total_mem_mb * BYTES_PER_MB so it's clear we're converting from MB to bytes.

}


BYTES_TO_MB = 1e6

nvidia-smi (which is what GPUtil is using under the hood) reports in "MiB" (which is actually the correct way to measure MB, but I digress). So the correct conversion is:

BYTES_TO_MB = 1024 * 1024
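Putting the two suggestions together (the renamed constant and the pure helper below are illustrative, not the code as merged):

```python
BYTES_PER_MB = 1024 * 1024  # nvidia-smi (and hence GPUtil) report memory in MiB

def gpu_total_memory_bytes(total_mem_mb):
    """Convert a GPUtil memory_total reading (MiB) to bytes, with the
    direction of the conversion explicit in the names."""
    return total_mem_mb * BYTES_PER_MB
```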

    return current_param_values


def memory_tune_config(config, dataset):

I would suggest running this with ray.remote if ray is available, just because TensorFlow has a tendency to have some issues with state corruption when creating models and potentially allocating GPU memory.
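One way to act on this suggestion, sketched under the assumption that the memory probe is a plain callable (this is not the code from the PR): run the probe inside a ray task so any TensorFlow/GPU state it creates dies with the worker process.

```python
def run_isolated(probe_fn, *args):
    """Run a memory probe in a ray worker when ray is available, so state
    corruption in the probe cannot leak into the driver process."""
    try:
        import ray
        if ray.is_initialized():
            # ray.remote wraps the callable; ray.get fetches the result.
            return ray.get(ray.remote(probe_fn).remote(*args))
    except ImportError:
        pass
    # Fall back to running in-process when ray is unavailable.
    return probe_fn(*args)
```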

tgaddair (Collaborator) left a comment

LGTM!


@tgaddair tgaddair merged commit 510455f into ludwig-ai:master Aug 20, 2021
ShreyaR pushed a commit that referenced this pull request Aug 24, 2021