[fixed] Task status messages #1625

riga · 2016-04-01T09:07:39Z

This PR is a fixed version of PR #1621 which has a faulty commit history.

This PR adds status messages to tasks which are also visible on the scheduler GUI.


Status messages are meant to be updated opportunistically during the run method. Especially for long-running non-Hadoop tasks, being able to read these messages directly in the scheduler GUI is quite helpful (at least for us). Internally, message changes are propagated to the scheduler via a _status_message_callback which, in contrast to tracking_url_callback, is not passed to the run method but set by the TaskProcess.

Usage example:

import luigi


class MyTask(luigi.Task):
    ...
    def run(self):
        for i in range(100):
            # do some hard work here
            if i % 10 == 0:
                self.set_status_message("Progress: %s / 100" % i)
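To make the propagation path concrete, here is a minimal, Luigi-free sketch of the idea described above. It is not Luigi's implementation: StandInTask and run_with_status_callback are hypothetical names, and a plain list stands in for the scheduler. The runner injects the callback before run() and detaches it afterwards, which is the role TaskProcess plays in this PR.

```python
class StandInTask:
    """Hypothetical stand-in for luigi.Task, for illustration only."""

    def __init__(self):
        # Injected by the runner (TaskProcess in this PR); None outside run().
        self._status_message_callback = None

    def set_status_message(self, message):
        # Forward the message only if a callback has been injected.
        if callable(self._status_message_callback):
            self._status_message_callback(message)

    def run(self):
        for i in range(100):
            # do some hard work here
            if i % 10 == 0:
                self.set_status_message("Progress: %s / 100" % i)


def run_with_status_callback(task, scheduler_messages):
    """Hypothetical runner: inject a callback, run the task, then detach it."""
    task._status_message_callback = scheduler_messages.append
    try:
        task.run()
    finally:
        task._status_message_callback = None


messages = []
run_with_status_callback(StandInTask(), messages)
print(messages[-1])  # Progress: 90 / 100
```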

I know that you don't like PRs that affect core code (which is reasonable =) ), but IMHO this feature is both lightweight and really helpful.

riga · 2016-04-01T09:10:42Z

To-dos from the discussion in #1621:

~~Remove status_message cache variable~~ done
~~Add pydocs~~ done

Tarrasch · 2016-04-01T09:46:14Z

luigi/task.py

+    def set_status_message(self, message):
+        """
+        Sets the status message of the task to message, i.e., invokes _status_message_callback if it
+        is a callable. This propagates the message down to the scheduler.


In the docs here, can you add a reference to the _Task.set_status_message: section? I think there are a few examples in this file you can copy and paste.

Tarrasch · 2016-04-01T09:47:56Z

Cool. This looks totally merge-worthy. Feel free to fix the minor doc-comment.

riga · 2016-04-01T09:50:45Z

Done ;)

Tarrasch · 2016-04-01T10:16:26Z

Nice!

erikbern · 2016-04-01T13:23:05Z

can you resolve the failing tests? https://travis-ci.org/spotify/luigi/jobs/120027335

riga · 2016-04-01T14:26:22Z

Ok, this is a tough one.

update_status_message is created at the worker level because it needs access to the scheduler. It is then passed to the task process, where it is stored as _status_message_callback on the task itself. This is why the task can no longer be pickled.

This is not the case with tracking_url_callback, since it is not stored on the task but passed to its run method in TaskProcess._run_get_new_deps. I'm afraid the status message callback should use the same mechanism.
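A small illustration of the pickling problem, assuming (as is typical) that update_status_message is a function defined inside a worker method and is therefore a local object that pickle cannot reference. make_update_status_message and is_picklable are hypothetical helpers, and types.SimpleNamespace stands in for a task that only holds plain parameters.

```python
import pickle
from types import SimpleNamespace


def make_update_status_message(scheduler_messages):
    """Hypothetical worker-level factory, analogous to update_status_message."""
    def update_status_message(message):
        scheduler_messages.append(message)
    return update_status_message


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A task holding only plain parameters pickles fine.
task = SimpleNamespace(n=3)
print(is_picklable(task))  # True

# Storing the worker-level local function on the task breaks pickling.
task._status_message_callback = make_update_status_message([])
print(is_picklable(task))  # False
```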

Here is a suggestion for changes in worker.py that would clean up this mechanism so that it can cope with arbitrary callbacks (like update_status_message) as well (and would solve the "problem" that @Tarrasch mentioned here):

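import inspect
import multiprocessing
import types


class TaskProcess(multiprocessing.Process):

    def __init__(self, <current kw/args except tracking_url_callback>, run_callbacks=None):
        ...
        if run_callbacks is None:
            run_callbacks = {}
        self.run_callbacks = run_callbacks

    def _run_get_new_deps(self):
        # getargspec() was removed in Python 3.11; getfullargspec() exposes
        # the **kwargs name as .varkw instead of .keywords.
        run_spec = inspect.getfullargspec(self.task.run)
        if run_spec.varkw:
            # run(**kwargs) accepts every callback.
            run_kwargs = self.run_callbacks.copy()
        else:
            # Only pass the callbacks that run() names explicitly.
            accepted = set(run_spec.args) | set(run_spec.kwonlyargs)
            run_kwargs = {key: cb for key, cb in self.run_callbacks.items() if key in accepted}

        task_gen = self.task.run(**run_kwargs)

        if not isinstance(task_gen, types.GeneratorType):
            return None
        ...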

This way, the code in Worker._create_task_process becomes:

def _create_task_process(self, task):
    ...
    run_callbacks = {
        "tracking_url_callback": update_tracking_url,
        "status_message_callback": update_status_message
    }

    return TaskProcess(
        task, self._id, self._task_result_queue,
        random_seed=bool(self.worker_processes > 1),
        worker_timeout=self._config.timeout,
        run_callbacks=run_callbacks
    )

I'm not a fan of using inspect too much myself, but in this case it's quite robust. I can add this to the PR if you want.
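For comparison, the same filtering can be written with inspect.signature, which handles bound methods and keyword-only parameters directly. This is a hedged sketch, not code from this PR; select_run_kwargs and the three task classes are hypothetical.

```python
import inspect


def select_run_kwargs(run_method, run_callbacks):
    """Return only the callbacks that run() can accept (hypothetical helper)."""
    params = inspect.signature(run_method).parameters.values()
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        # run(**kwargs) accepts every callback.
        return dict(run_callbacks)
    names = {p.name for p in params
             if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                           inspect.Parameter.KEYWORD_ONLY)}
    return {key: cb for key, cb in run_callbacks.items() if key in names}


class OldStyleTask:
    def run(self, tracking_url_callback=None):
        pass


class NewStyleTask:
    def run(self):
        pass


class KwargsTask:
    def run(self, **kwargs):
        pass


callbacks = {"tracking_url_callback": print, "status_message_callback": print}
print(sorted(select_run_kwargs(OldStyleTask().run, callbacks)))  # ['tracking_url_callback']
print(select_run_kwargs(NewStyleTask().run, callbacks))  # {}
```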

erikbern · 2016-04-01T14:32:12Z

I remember running into similar issues before

What's the reason we pickle the Tasks in the first place?

riga · 2016-04-01T14:39:55Z

I think parallel scheduling requires that tasks can be pickled. At least, that's mentioned in the "parallel-scheduling" section of the docs.

Many contrib tasks also make use of it, e.g. in contrib/spark.py.

erikbern · 2016-04-01T14:52:04Z

Hm, right, makes sense. Tasks are in theory interchangeable given the same name and parameters, so instead of pickling you could just de/serialize them using that mechanism (that's how the assistant works). It would be quite easy to fix in worker.py.

But you are right that some tasks use pickling (e.g. Hadoop MapReduce), so it would be a mess to avoid pickling everywhere.
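The name-plus-parameters alternative mentioned above can be sketched as follows: only the task family name and its parameters are written out, and the task is rebuilt through a registry on the other side, so attached callbacks never need to be serialized. TASK_REGISTRY, register, dump_task and load_task are hypothetical names for illustration, not Luigi's API.

```python
import json

# Hypothetical registry mapping a task family name to its class.
TASK_REGISTRY = {}


def register(cls):
    TASK_REGISTRY[cls.__name__] = cls
    return cls


@register
class CountTask:
    def __init__(self, n):
        self.n = n
        # Runtime-only attribute; never part of the serialized form.
        self._status_message_callback = lambda message: None

    def params(self):
        return {"n": self.n}


def dump_task(task):
    # Only the family name and the parameters are written out.
    return json.dumps({"family": type(task).__name__, "params": task.params()})


def load_task(payload):
    data = json.loads(payload)
    return TASK_REGISTRY[data["family"]](**data["params"])


restored = load_task(dump_task(CountTask(7)))
print(type(restored).__name__, restored.n)  # CountTask 7
```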

Wouldn't your suggestion with inspect run into the same issue when contrib tasks use pickle?

riga · 2016-04-01T15:10:24Z

Wouldn't your suggestion with inspect run into the same issue when contrib tasks use pickle?

The run_callbacks would be owned by the TaskProcess instance, the task itself still doesn't know anything about the callbacks. I just ran the tests, looks good.

riga · 2016-04-01T16:09:34Z

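Actually, using inspect has another advantage. Users might put decorators on Task.run, and in the current implementation those decorators are called twice if tracking_url_callback is missing from the signature of run: the first call with that keyword argument raises a TypeError, and run() is then called again without it.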
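A minimal reproduction of the double call, assuming a user decorator with a side effect and the try/except TypeError fallback used for tracking_url_callback. logged, MyTask and call_run_with_fallback are hypothetical names.

```python
calls = []


def logged(func):
    """Hypothetical user decorator with a side effect on every call."""
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


class MyTask:
    @logged
    def run(self):  # no tracking_url_callback parameter
        return "done"


def call_run_with_fallback(task, tracking_url_callback):
    # Mirrors the "try with the keyword, retry without it" fallback.
    try:
        return task.run(tracking_url_callback=tracking_url_callback)
    except TypeError as ex:
        if "unexpected keyword argument" not in str(ex):
            raise
        return task.run()


result = call_run_with_fallback(MyTask(), print)
print(result, len(calls))  # done 2
```

The wrapper runs on both attempts because the TypeError is only raised once it forwards the unexpected keyword to the original run().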

dlstadther · 2016-04-04T14:31:44Z

Due to the changes in worker.py, do any of its tests need to be added/updated?

riga · 2016-04-04T14:49:13Z

LGTM. test_tracking_url and test_type_error_in_tracking_run seem to be the only tests that make use of tracking_url_callback. The changes to TaskProcess and Worker should be pretty much covered by many other test cases.

Tarrasch · 2016-04-05T02:21:55Z

test/task_status_message_test.py

+luigi.notifications.DEBUG = True
+
+
+class TaskStatusMessageTest(LuigiTestCase):


Actually we do have a test suite which is run in two modes automatically (in-memory scheduler and rpc to external process scheduler). Can you make sure that the rpc code path is also tested?

Tarrasch · 2016-04-05T02:22:10Z

We also have a set of scheduler tests (they are disabled in Travis for flakiness reasons), but can you add that case there too and make sure it passes locally?

Tarrasch · 2016-04-05T02:23:38Z

doc/tasks.rst

+            for i in range(100):
+                # do some hard work here
+                if i % 10 == 0:
+                    status_message_callback("Progress: %d / 100" % i)


Is this still correct?! I thought you pass a dict now.

Keyword argument unpacking also works for regular (positional-or-keyword) parameters, so

def func(a, b):
    print(a, b)

func(**{"b": "bVal", "a": "aVal"})  # prints: aVal bVal

Tarrasch · 2016-04-05T02:34:54Z

I appreciate that you introduced inspect so that @decorators are not called twice. But feel free to defer that change to a separate PR if that simplifies things. Such a change would, for example, require tests showing that decorators are indeed only called once.

What about making the dict callable, as I suggested in an inline comment? Then we would keep run() to only one positional parameter. Less magic, no inspection of argument names.

riga · 2016-04-06T07:24:55Z

Ok. How about this solution:

import multiprocessing
import warnings


class TaskProcess(multiprocessing.Process):

    def __init__(self, ..., tracking_url_callback=None, status_message_callback=None):
        ...
        self.tracking_url_callback = tracking_url_callback
        self.status_message_callback = status_message_callback
        ...

    def _run_get_new_deps(self):
        self.task.set_tracking_url = self.tracking_url_callback
        self.task.set_status_message = self.status_message_callback

        def deprecated_tracking_url_callback(*args, **kwargs):
            warnings.warn("tracking_url_callback in run() args is deprecated, use "
                          "set_tracking_url instead.", DeprecationWarning)
            self.tracking_url_callback(*args, **kwargs)

        run_again = False
        try:
            task_gen = self.task.run(tracking_url_callback=deprecated_tracking_url_callback)
        except TypeError as ex:
            if 'unexpected keyword argument' not in str(ex):
                raise
            run_again = True
        if run_again:
            task_gen = self.task.run()

        self.task.set_tracking_url = None
        self.task.set_status_message = None
        ...

The callbacks are set right before run() is called, and then reset. This way pickling is not affected, it's backward compatible, and at some point the try-except block can be refactored.

I just ran the tests and everything looks good.
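One caveat worth checking with this approach: if run() is a generator function (dynamic dependencies), calling it only creates the generator, and its body executes later while the worker iterates it. Resetting the callbacks right after the call would then detach them before the body runs. Below is a hedged sketch that keeps them attached until the generator is exhausted and detaches them in a finally block; StandInTask and attach_callbacks_and_run are hypothetical, and sending completed dependencies back into the generator is omitted.

```python
import types


class StandInTask:
    """Hypothetical task whose run() is a generator yielding dependencies."""
    set_status_message = None

    def run(self):
        self.set_status_message("starting")
        yield "some dependency"
        self.set_status_message("finished")


def attach_callbacks_and_run(task, status_message_callback):
    task.set_status_message = status_message_callback
    try:
        task_gen = task.run()
        if isinstance(task_gen, types.GeneratorType):
            # Drive the generator while the callback is still attached
            # (simplified: no dependency results are sent back in).
            return list(task_gen)
        return []
    finally:
        # Detach afterwards so the task no longer holds the callback.
        task.set_status_message = None


messages = []
task = StandInTask()
deps = attach_callbacks_and_run(task, messages.append)
print(deps, messages)  # ['some dependency'] ['starting', 'finished']
```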

Tarrasch · 2016-04-06T08:01:51Z

Sure. But maybe let's hear from @daveFNbuck first; perhaps there were other reasons he opted for the extra arg in run().

I'm very positive about this, though. In hindsight, it feels a bit hasty to have introduced that extra arg in run().

Also, of course you'll have to change the places in the code-base that uses run(tracking_url_callback) already, but it seems like your new suggested interface is pretty easy to use. :)

riga · 2016-04-06T09:14:19Z

Also, of course you'll have to change the places in the code-base that uses run(tracking_url_callback) already, but it seems like your new suggested interface is pretty easy to use. :)

Yep, already done that, mainly in hadoop, hadoop_jar, hive and scalding =) I'm preparing the commits right now.

Update: working on the failed tests ...

riga · 2016-04-06T12:58:19Z

I changed hadoop.JobTask.dump to disregard callbacks, as they're not needed in the deserialized job anyway. For example, the tracking URL is not set actively but parsed from the Hadoop job's stderr.
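A sketch of the "skip callbacks when dumping" idea: clear the callback attributes, pickle, then restore them so run() can keep using them. dump_without_callbacks and CALLBACK_ATTRS are hypothetical, types.SimpleNamespace stands in for the task, and the actual change in hadoop.JobTask.dump may differ.

```python
import pickle
from types import SimpleNamespace

# Hypothetical list of runtime-only callback attributes.
CALLBACK_ATTRS = ("set_tracking_url", "set_status_message")


def dump_without_callbacks(task):
    """Pickle a task with its callbacks temporarily cleared (sketch)."""
    saved = {name: getattr(task, name, None) for name in CALLBACK_ATTRS}
    try:
        for name in CALLBACK_ATTRS:
            setattr(task, name, None)
        return pickle.dumps(task)
    finally:
        # Restore the callbacks for the rest of run().
        for name, value in saved.items():
            setattr(task, name, value)


def make_callback():
    def callback(message):  # local function: not picklable on its own
        pass
    return callback


task = SimpleNamespace(n=1, set_tracking_url=make_callback(),
                       set_status_message=make_callback())
restored = pickle.loads(dump_without_callbacks(task))
print(restored.n, restored.set_status_message)  # 1 None
print(callable(task.set_status_message))  # True
```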

All tests pass now.

daveFNbuck · 2016-04-06T18:47:33Z

This looks like a good alternative to my solution, and more scalable. I see that the example attached here is of a progress percentage. If this is the main use case, I've been thinking it would be nice to add an optional progress bar that can be automatically displayed in the visualizer. That would be a bit better than having a popup to show a single number.

riga · 2016-04-06T18:57:48Z

it would be nice to add an optional progress bar that can be automatically displayed in the visualizer

Good idea. Maybe I can open a follow-up PR that implements a progress bar. However, I think there are more use cases than just reporting progress. We use it for important intermediate output and, to some extent, for a summary of final results. Maybe the GUI could parse the status message with a regexp (e.g. /^progress:\s(\d+)%.*$/i) to decide whether to show a progress bar or a popup.
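A sketch of that regexp-based decision, written in Python for consistency with the rest of this thread (the GUI itself would do this in JavaScript). PROGRESS_RE and classify_status_message are hypothetical. Note that a message in the format of the earlier example, "Progress: 10 / 100", would not match this pattern, so the message convention would need to be agreed on.

```python
import re

# Hypothetical convention: "Progress: 42% ..." drives a progress bar.
PROGRESS_RE = re.compile(r"^progress:\s(\d+)%.*$", re.IGNORECASE)


def classify_status_message(message):
    """Return ("progress", percent) or ("popup", message)."""
    match = PROGRESS_RE.match(message)
    if match:
        return ("progress", int(match.group(1)))
    return ("popup", message)


print(classify_status_message("Progress: 40% of files processed"))  # ('progress', 40)
print(classify_status_message("Progress: 10 / 100"))  # ('popup', 'Progress: 10 / 100')
```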

Tarrasch · 2016-04-07T02:08:22Z

test/worker_test.py

+                return self.has_run
+
+            def run(self):
+                if self.set_tracking_url is not None:


Wait. Does the user really have to do this check before using the tracking_url? Can't we somehow guarantee that it's always present?

Tarrasch · 2016-04-07T02:10:00Z

This looks very good. Thank you so much for doing this! Just see my inline comment/worry.

Other than that, this is good to merge, right?

riga · 2016-04-07T05:47:27Z

Can't we somehow guarantee that it's always present?

Yep, Worker._create_task_process is the only place where TaskProcesses are created, and the callbacks are always present/created here. I made some changes to reflect that. Of course, outside the run() scope, users will get an exception when a callback is used, but I think that's fine/wanted.

Tarrasch · 2016-04-07T06:22:01Z

@riga, ok, is this "fine to merge" now you think?

riga · 2016-04-07T06:22:17Z

Yes =)

Tarrasch · 2016-04-07T06:23:17Z

Ok. Let's wait for test results and then I'll merge. :)

riga · 2016-04-07T07:24:36Z

@Tarrasch Looks good, glad I could help.

This fixes a bug caused by the interaction between PR #1625 (Task status messages) and PR #1631 (Add explicit whitelist of RPC commands for luigid): the new status message methods were not whitelisted. To fix this, I simply added ``set_task_status_message`` and ``get_task_status_message`` to the whitelist.
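A minimal sketch of how an explicit RPC whitelist rejects methods that were not added to it, which is why the new status message endpoints had to be listed. RPC_METHODS, StandInScheduler and dispatch are hypothetical stand-ins; the method names mirror the ones above, but the signatures are assumed.

```python
# Hypothetical whitelist of scheduler methods callable over RPC.
RPC_METHODS = {"set_task_status_message", "get_task_status_message"}


class StandInScheduler:
    """Hypothetical scheduler stand-in; signatures are assumptions."""

    def __init__(self):
        self._messages = {}

    def set_task_status_message(self, task_id, status_message):
        self._messages[task_id] = status_message

    def get_task_status_message(self, task_id):
        return self._messages.get(task_id)


def dispatch(scheduler, method, **arguments):
    # Reject anything not explicitly whitelisted.
    if method not in RPC_METHODS:
        raise PermissionError("RPC method not allowed: %s" % method)
    return getattr(scheduler, method)(**arguments)


scheduler = StandInScheduler()
dispatch(scheduler, "set_task_status_message", task_id="A", status_message="hi")
print(dispatch(scheduler, "get_task_status_message", task_id="A"))  # hi
```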

Tarrasch · 2016-04-11T04:47:14Z

luigi/scheduler.py

@@ -870,6 +871,7 @@ def _serialize_task(self, task_id, include_deps=True, deps=None):
            'priority': task.priority,
            'resources': task.resources,
            'tracking_url': getattr(task, "tracking_url", None),
+            'status_message': task.status_message


This line is causing a stack-trace like this:

Traceback (most recent call last):
  File "/zserver/Python-3.4.3/lib/python3.4/site-packages/tornado/web.py", line 1443, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "/zserver/apps/luigi/code-for-luigid/luigi/server.py", line 101, in get
    result = getattr(self._scheduler, method)(**arguments)
  File "/zserver/apps/luigi/code-for-luigid/luigi/scheduler.py", line 986, in task_list
    serialized = self._serialize_task(task.id, False)
  File "/zserver/apps/luigi/code-for-luigid/luigi/scheduler.py", line 868, in _serialize_task
    'status_message': task.status_message

This happens for anyone using an old pickled state file.

Do we need to do anything to fix this?

I think you can copy the line above, i.e. use

getattr(task, "status_message", None) instead of task.status_message. Then, a couple of months later, we can change it back to task.status_message. Do you see how that'll work? Do you mind submitting a PR? (You can use the online editor.)
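Why the getattr fallback helps: default unpickling restores an instance's __dict__ without calling __init__, so a task record saved before this PR has no status_message attribute at all. A hedged sketch, with SchedulerTaskRecord and serialize_task as hypothetical stand-ins for the scheduler's internals; the old record is simulated by building the instance without __init__.

```python
class SchedulerTaskRecord:
    """Hypothetical stand-in for the scheduler's per-task state."""

    def __init__(self, task_id):
        self.id = task_id
        self.tracking_url = None
        self.status_message = None  # attribute introduced by this PR


def serialize_task(task):
    return {
        "id": task.id,
        "tracking_url": getattr(task, "tracking_url", None),
        # Tolerate records restored from a state file written before
        # status_message existed.
        "status_message": getattr(task, "status_message", None),
    }


# Simulate a record from an old state file: __init__ is never called,
# only the saved attributes are restored.
old_record = SchedulerTaskRecord.__new__(SchedulerTaskRecord)
old_record.__dict__.update({"id": "OldTask", "tracking_url": None})

print(hasattr(old_record, "status_message"))  # False
print(serialize_task(old_record)["status_message"])  # None
```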

I created PR #1645 for this issue.

michcio1234 · 2017-12-19T15:09:07Z

@riga @Tarrasch
Hey, sorry for pinging you on something this old. I don't want to open a new issue since I'm not sure if anything is wrong but...
where can I see a status message that I set using self.set_status_message('msg')? Can't find it in the scheduler GUI...

riga · 2017-12-19T15:47:17Z

Hi @michcio1234 ,
the status messages are visible in the scheduler GUI. You just have to click on the chat/message icon in the "Actions" column in the task table. If a task has no status message, that icon won't be visible.

michcio1234 · 2017-12-19T17:27:27Z

Then I'm probably doing something wrong since I can't see this icon. So it seems that setting a message is not as simple as stated in the documentation.
Okay, thank you anyway for your response. If I can't make it work, I'll open a new issue.

riga added 3 commits April 1, 2016 10:58

Add status messages to tasks, propagate to task process and scheduler.

89de74d

Display task status messages on scheduler GUI.

c3ca48b

Add test case for tasks status messages.

96dba20

riga mentioned this pull request Apr 1, 2016

Task status messages #1621

Closed

riga added 2 commits April 1, 2016 11:25

Remove status_message cache variable.

75afeb9

Add set_status_message to task docs.

0ceb18d

Tarrasch reviewed Apr 1, 2016

Add set_status_message ref to pydocs in doc string.

be18161

riga added 3 commits April 4, 2016 12:05

Add run_callbacks to TaskProcess, add status_message_callback.

de658d0

Update status message test to account for status_message_callback.

f0f389c

Update task status message section in docs.

d4b82bc

Tarrasch reviewed Apr 5, 2016

riga added 6 commits April 6, 2016 11:33

Refactor callback mechanism for Task.run, deprecate argument usage.

d8cdd85

Use callback mechanism in contrib packages.

5682085

Update docs.

e932582

Update test cases.

a7a564f

Fix missign warnings imports.

0d7026c

Skip callbacks in hadoop.JobTask dumping.

d2ff9d5

Tarrasch reviewed Apr 7, 2016

Reflect presence of callbacks in TaskProcess initialization.

8c36ba6

Account for TaskProcess signature in tests.

232de8e

Tarrasch merged commit f46efc1 into spotify:master Apr 7, 2016

riga mentioned this pull request Apr 7, 2016

Add status message endpoints to RPC handler list. #1635

Merged

Tarrasch reviewed Apr 11, 2016

riga mentioned this pull request Apr 11, 2016

Make status_message optional in task serialization. #1645

Merged
