[automl] Memory Aware Config Tuning #1257
Conversation
This is amazing! Just a few suggestions.
import ray

try:
    import GPUtil
Can we add this to requirements_ray.txt instead of asking the user to install it by hand?
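For reference, the line added to requirements_ray.txt would just be the PyPI package name (leaving it unpinned is an assumption):

GPUtil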
We should probably move the import ray in here, too, given the error message.
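A minimal sketch of the guarded import this suggests (the error text is an assumption, not the PR's actual message):

try:
    import ray
    import GPUtil
except ImportError:
    # Hypothetical wording; the PR's actual message may differ.
    raise ImportError(
        "ray and GPUtil are required for memory-aware config tuning; "
        "install them with: pip install ray GPUtil"
    )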
ludwig/automl/auto_tune_config.py
Outdated
def get_remote_gpu():
    gpus = GPUtil.getGPUs()
    total_mem = gpus[0].memoryTotal
    return total_mem * BYTES_TO_MB
This calculation is a little confusing; maybe use total_mem_mb * BYTES_PER_MB so it's clear we're converting from MB to bytes.
ludwig/automl/auto_tune_config.py
Outdated
}


BYTES_TO_MB = 1e6
nvidia-smi (which is what GPUtil uses under the hood) reports in "MiB" (which is actually the correct way to measure MB, but I digress). So the correct conversion is:
BYTES_TO_MB = 1024 * 1024
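Combining both suggestions, a rough sketch of how the helper could read (the GPUtil calls are real; reading only the first GPU is carried over from the PR):

BYTES_PER_MB = 1024 * 1024  # nvidia-smi, and hence GPUtil, reports MiB

def get_remote_gpu():
    gpus = GPUtil.getGPUs()
    total_mem_mb = gpus[0].memoryTotal  # MiB on the first visible GPU
    return total_mem_mb * BYTES_PER_MB  # convert MiB to bytes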
    return current_param_values


def memory_tune_config(config, dataset):
I would suggest running this with ray.remote if Ray is available, just because TensorFlow tends to run into state corruption issues when creating models and potentially allocating GPU memory.
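A minimal sketch of that pattern (the wrapper name is hypothetical; max_calls=1 makes Ray tear down the worker, and with it any TensorFlow state, after each call):

import ray

@ray.remote(max_calls=1)
def _memory_tune_remote(config, dataset):
    # Runs in a separate worker process, so TensorFlow state and any GPU
    # memory allocated here are released when the worker exits.
    return memory_tune_config(config, dataset)

if ray.is_initialized():
    tuned_config = ray.get(_memory_tune_remote.remote(config, dataset))
else:
    tuned_config = memory_tune_config(config, dataset)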
LGTM!
This PR adds a new capability to the automl module that tightens the hyperparameter search space ranges in a Ludwig config file in order to avoid OOM issues. These OOM issues can arise when the hyperparameter optimization experiment samples parameter assignments that are implausible given the memory constraints but still fall within the provided search space ranges.
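As a purely hypothetical illustration of what tightening a range could look like (the parameter name, search space format, and bounds below are invented for clarity, not taken from the PR):

# Before: the user-supplied range permits batch sizes too large for the GPU.
search_space = {"training.batch_size": {"lower": 32, "upper": 4096}}

# After memory-aware tuning: the upper bound is clamped to the largest
# batch size estimated to fit in available GPU memory.
search_space = {"training.batch_size": {"lower": 32, "upper": 512}}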