[tune/core] serialization debugging utility #12142

Merged: richardliaw merged 22 commits into ray-project:master from serialization-check on Dec 2, 2020

@richardliaw richardliaw commented Nov 19, 2020

Why are these changes needed?

A helper method for debugging serialization issues with Ray.

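import threading

# Module path taken from the file reviewed in this PR:
# python/ray/util/check_serialize.py
from ray.util.check_serialize import inspect_serializability

# A lock cannot be pickled, so anything that references it fails to serialize.
lock = threading.Lock()

def test():
    print(lock)

class TestClass:
    def __init__(self):
        print(lock)

    def __eq__(self, *args, **kwargs):
        print(lock)

# Report which variable referenced by `test` breaks serialization.
inspect_serializability(test, "test")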

(base) ➜  ray git:(serialization-check) ✗ RAY_PICKLE_VERBOSE_DEBUG=1 python cloudpickle/_test.py
File descriptor limit 256 is too low for production servers and may result in connection errors. At least 8192 is recommended. --- Fix with 'ulimit -n 8192'
2020-11-25 01:57:26,546	INFO services.py:1210 -- View the Ray dashboard at http://127.0.0.1:8265
2020-11-25 01:57:28,466	WARNING function_runner.py:540 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
================================================================================
Checking Serializability of <class 'ray.tune.function_runner.wrap_function.<locals>.ImplicitFunc'>
================================================================================
!!! FAIL serialization: can't pickle _thread.lock objects
    Serializing '__init__' <function Trainable.__init__ at 0x7ffb40596680>...
    Serializing '_close_logfiles' <function Trainable._close_logfiles at 0x7ffb40596f80>...
    Serializing '_create_logger' <function Trainable._create_logger at 0x7ffb40596e60>...
    Serializing '_export_model' <function Trainable._export_model at 0x7ffb4059bb00>...
    Serializing '_is_overridden' <function Trainable._is_overridden at 0x7ffb4059bb90>...
    Serializing '_log_result' <function Trainable._log_result at 0x7ffb4059b950>...
    Serializing '_open_logfiles' <function Trainable._open_logfiles at 0x7ffb40596ef0>...
    Serializing '_report_thread_runner_error' <function FunctionRunner._report_thread_runner_error at 0x7ffb405f2dd0>...
    Serializing '_restore' <function Trainable._restore at 0x7ffb4059b710>...
    Serializing '_save' <function Trainable._save at 0x7ffb4059b5f0>...
    Serializing '_setup' <function Trainable._setup at 0x7ffb4059b830>...
    Serializing '_start' <function FunctionRunner._start at 0x7ffb405f28c0>...
    Serializing '_stop' <function Trainable._stop at 0x7ffb4059ba70>...
    Serializing '_train' <function Trainable._train at 0x7ffb4059b4d0>...
    Serializing '_trainable_func' <function wrap_function.<locals>.ImplicitFunc._trainable_func at 0x7ffb105dc8c0>...
    !!! FAIL serialization: can't pickle _thread.lock objects
    Detected 3 global variables. Checking serializability...
        Serializing 'partial' <class 'functools.partial'>...
        Serializing 'inspect' <module 'inspect' from '/Users/rliaw/miniconda3/lib/python3.7/inspect.py'>...
        Serializing 'RESULT_DUPLICATE' __duplicate__...
    Detected 3 nonlocal variables. Checking serializability...
        Serializing 'train_func' <function test at 0x7ffb50214560>...
        !!! FAIL serialization: can't pickle _thread.lock objects
        Detected 2 global variables. Checking serializability...
            Serializing 'lock' <unlocked _thread.lock object at 0x7ffb500c8f00>...
            !!! FAIL serialization: can't pickle _thread.lock objects
    Serializing '_close_logfiles' <function Trainable._close_logfiles at 0x7ffb40596f80>...
================================================================================
Variable:

	lock [obj=<unlocked _thread.lock object at 0x7ffb500c8f00>, parent=<function test at 0x7ffb50214560>]
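
The trace above walks from the function into its global and nonlocal variables until it reaches the object that cannot be pickled. The following is a minimal sketch of that idea using only the standard library. It assumes `inspect.getclosurevars` is a reasonable way to list those variables, it uses plain `pickle` rather than Ray's cloudpickle, and the helper name `find_unpicklable_vars` is hypothetical and not part of Ray.

```python
import inspect
import pickle
import threading

lock = threading.Lock()


def test():
    print(lock)


def find_unpicklable_vars(func):
    """Return {name: error message} for variables `func` references that fail to pickle.

    Hypothetical helper for illustration; not Ray's implementation.
    """
    closure = inspect.getclosurevars(func)
    failures = {}
    # Check both module-level globals and enclosing-scope (nonlocal) variables.
    for scope in (closure.globals, closure.nonlocals):
        for name, value in scope.items():
            try:
                pickle.dumps(value)
            except Exception as exc:  # pickle raises TypeError for locks
                failures[name] = str(exc)
    return failures


failures = find_unpicklable_vars(test)
print(failures)
```

Only `lock` is reported here because `print` is a builtin, which `inspect.getclosurevars` lists separately from globals and nonlocals.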

TODO:

  • Add basic check test for function, class, object
  • Make sure this isn't triggered without the environment variable.
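
The second TODO item amounts to gating the verbose output on an environment variable. One possible shape is sketched below. The variable name `RAY_PICKLE_VERBOSE_DEBUG` comes from the command line shown above; the names `verbose_debug_enabled` and `maybe_inspect`, and the choice of which values count as enabled, are assumptions made for this sketch and may differ from what the PR does.

```python
import os

ENV_VAR = "RAY_PICKLE_VERBOSE_DEBUG"  # name taken from the command line above


def verbose_debug_enabled(environ=None):
    """Return True only when the variable is set to a non-empty value other than "0".

    The accepted values are an assumption made for this sketch.
    """
    environ = os.environ if environ is None else environ
    return environ.get(ENV_VAR, "") not in ("", "0")


def maybe_inspect(obj, inspect_fn, environ=None):
    """Call `inspect_fn(obj)` only when verbose debugging is enabled."""
    if verbose_debug_enabled(environ):
        return inspect_fn(obj)
    return None
```

Passing `environ` explicitly keeps the check easy to test without modifying the real process environment.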

Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
@richardliaw richardliaw changed the title from "[wip] serialization debugging utility" to "serialization debugging utility" on Nov 19, 2020
@rkooo567 rkooo567 left a comment

Maybe we can document this method in our serialization page?

@richardliaw richardliaw changed the title from "serialization debugging utility" to "[tune/core[ serialization debugging utility" on Nov 30, 2020
@richardliaw richardliaw changed the title from "[tune/core[ serialization debugging utility" to "[tune/core] serialization debugging utility" on Nov 30, 2020
@rkooo567 rkooo567 self-assigned this Dec 1, 2020
@rkooo567 rkooo567 left a comment

LGTM! This tool will make it much easier for users to debug serialization issues. I only have some minor comments.

python/ray/util/check_serialize.py (outdated, resolved)
python/ray/cloudpickle/__init__.py (resolved)
doc/source/serialization.rst (resolved)
python/ray/util/check_serialize.py (outdated, resolved)
python/ray/util/check_serialize.py (outdated, resolved)
name=None,
parent=None,
depth=3,
_failure_set=None):

Contributor

Can you add a type annotation here?

Contributor Author

done!
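
For readers who cannot see the resolved diff, an annotated version of the parameters quoted above might look like the sketch below. The parameter names and defaults come from the fragment in this thread; the chosen types, the `base_obj` parameter, the return type, and the name `inspect_sketch` are assumptions and not necessarily what was merged.

```python
from typing import Any, Optional, Set, Tuple


def inspect_sketch(
        base_obj: Any,
        name: Optional[str] = None,
        parent: Any = None,
        depth: int = 3,
        _failure_set: Optional[Set[str]] = None) -> Tuple[bool, Set[str]]:
    """Annotated stand-in for the signature under review (body omitted on purpose)."""
    # A None default avoids sharing one mutable set across calls.
    if _failure_set is None:
        _failure_set = set()
    return True, _failure_set
```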

python/ray/util/check_serialize.py (resolved)
python/ray/util/check_serialize.py (outdated, resolved)

@krfricke krfricke left a comment

This looks good so far, just a minor question

Comment on lines +96 to +97
if found:
    break

Contributor

Why do we break here? Shouldn't we try to identify all unserializable members?

Contributor

If not, should we only check other members (from line 100) if we didn't find anything before?

Contributor Author

We break because it's a bit difficult to print out something legible if we identify all (multiple objects can reference the same underlying culprit).

Contributor Author

It's a DFS approach

Contributor

Ok, thanks
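
The design discussed in this thread, a depth-first search that stops at the first failing branch so that a shared culprit is reported once, can be sketched as follows. This is an illustration under assumptions, not the code in `check_serialize.py`: it walks plain instance attributes through `__dict__`, uses the standard `pickle` module, and `find_first_culprit`, `Inner`, and `Outer` are hypothetical names.

```python
import pickle
import threading


def find_first_culprit(obj, name="root", depth=3):
    """Depth-first search for the deepest reachable attribute that fails to pickle.

    Returns a dotted path such as "root.child.lock", or None if `obj` pickles.
    """
    try:
        pickle.dumps(obj)
        return None  # this object and everything under it serialize fine
    except Exception:
        pass
    if depth > 0:
        # Instance attributes only; objects without __dict__ have no children here.
        for attr, value in getattr(obj, "__dict__", {}).items():
            found = find_first_culprit(value, f"{name}.{attr}", depth - 1)
            if found:
                return found  # stop at the first failing branch
    return name  # no failing child found (or depth exhausted): blame this object


class Inner:
    def __init__(self):
        self.lock = threading.Lock()


class Outer:
    def __init__(self):
        self.value = 1
        self.first = Inner()
        self.second = Inner()  # also fails, but is never reported


culprit = find_first_culprit(Outer())
print(culprit)  # root.first.lock
```

Returning on the first hit trades completeness for a short, readable report, which is the trade-off described in the replies above.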

richardliaw and others added 3 commits December 1, 2020 12:57
@richardliaw richardliaw merged commit a21523c into ray-project:master Dec 2, 2020
@richardliaw richardliaw deleted the serialization-check branch December 2, 2020 08:52

4 participants