
[automl] Memory Aware Config Tuning #1257

Merged 4 commits into ludwig-ai:master on Aug 20, 2021

Conversation

ANarayan (Collaborator)

This PR adds a new capability to the automl module that tightens the hyperparameter search space ranges in a Ludwig config in order to avoid OOM failures. These failures can occur when the provided search space ranges allow the hyperparameter optimization experiment to sample parameter assignments that are implausible given the available memory.
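In spirit, the tuning step caps each hyperparameter's range so that no sampled assignment can exceed the available memory. A minimal sketch of the idea (the function name and the `estimate_memory` callback are illustrative, not Ludwig's actual API):

```python
def tighten_search_space(param_ranges, memory_budget, estimate_memory):
    """Shrink each parameter's upper bound until its largest value fits
    within the memory budget (a hypothetical stand-in for the PR's logic)."""
    tightened = {}
    for name, (low, high) in param_ranges.items():
        # Halve the upper bound while the worst-case assignment would OOM.
        while high > low and estimate_memory(name, high) > memory_budget:
            high //= 2
        tightened[name] = (low, high)
    return tightened
```

For example, with a toy cost model where memory grows linearly with batch size, a (1, 16) batch-size range gets capped at the largest upper bound that still fits the budget.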

@ANarayan ANarayan requested a review from tgaddair August 16, 2021 16:51
tgaddair (Collaborator) left a comment


This is amazing! Just a few suggestions.

import ray

try:
    import GPUtil

Can we add this to the requirements_ray.txt instead of asking the user to install this by hand?
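The packaging change being suggested might look like the following; the exact contents of requirements_ray.txt are not shown in this thread, so the unpinned entries below are an assumption:

```
# requirements_ray.txt
ray
GPUtil
```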


We should probably move the import ray in here, too, given the error message.
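A sketch of the suggested pattern (an assumption, not the merged code): moving `import ray` inside the try block keeps the error message accurate when either package is missing.

```python
def import_tuning_deps():
    """Import the optional dependencies behind one actionable error message."""
    try:
        import ray
        import GPUtil
        return ray, GPUtil
    except ImportError as e:
        # Both optional deps are guarded, so the message covers both.
        raise ImportError(
            "ray and GPUtil are required for memory-aware config tuning; "
            "install them with `pip install ray GPUtil`."
        ) from e
```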

def get_remote_gpu():
    gpus = GPUtil.getGPUs()
    total_mem = gpus[0].memory_total
    return total_mem * BYTES_TO_MB

This calculation is a little bit confusing, maybe use total_mem_mb * BYTES_PER_MB so it's clear we're converting from MB to bytes.

}


BYTES_TO_MB = 1e6

nvidia-smi (which is what GPUtil is using under the hood) reports in "MiB" (which is actually the correct way to measure MB, but I digress). So the correct conversion is:

BYTES_TO_MB = 1024 * 1024
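Putting the two suggestions together (the renamed constant and the pure helper below are illustrative, not the code as merged):

```python
BYTES_PER_MB = 1024 * 1024  # nvidia-smi (and hence GPUtil) report memory in MiB

def gpu_total_memory_bytes(total_mem_mb):
    """Convert a GPUtil memory_total reading (MiB) to bytes, with the
    direction of the conversion explicit in the names."""
    return total_mem_mb * BYTES_PER_MB
```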

    return current_param_values


def memory_tune_config(config, dataset):

I would suggest running this with ray.remote if ray is available, just because TensorFlow has a tendency to have some issues with state corruption when creating models and potentially allocating GPU memory.
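One way to act on this suggestion, sketched under the assumption that the memory probe is a plain callable (this is not the code from the PR): run the probe inside a ray task so any TensorFlow/GPU state it creates dies with the worker process.

```python
def run_isolated(probe_fn, *args):
    """Run a memory probe in a ray worker when ray is available, so state
    corruption in the probe cannot leak into the driver process."""
    try:
        import ray
        if ray.is_initialized():
            # ray.remote wraps the callable; ray.get fetches the result.
            return ray.get(ray.remote(probe_fn).remote(*args))
    except ImportError:
        pass
    # Fall back to running in-process when ray is unavailable.
    return probe_fn(*args)
```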

tgaddair (Collaborator) left a comment

LGTM!


@tgaddair tgaddair merged commit 510455f into ludwig-ai:master Aug 20, 2021
ShreyaR pushed a commit that referenced this pull request Aug 24, 2021