Print an execution summary at end of task runs #1091

nicolehedblom · 2015-07-24T16:21:18Z

See docs in luigi/execution_summary.py for example of output.

erikbern · 2015-07-24T16:22:57Z

luigi/execution_summary.py

+            lines.append(row)
+            break
+        if len(tasks[0].get_params()) == 1:
+            row += "- " + str(len(tasks)) + " " + str(task_family) + "(" + tasks[0].get_params()[0][0] + "="


use formatting strings here
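Here is a sketch of the single-parameter row built with one format string instead of concatenation. The value collapsing is inferred from the sample output later in this thread: consecutive integers become `first...last`, and longer irregular lists are cut after four values. The function name and the `max_listed` parameter are hypothetical and not taken from the PR.

```python
def format_single_param_row(task_family, param_name, values, max_listed=4):
    """Hypothetical helper: render one summary row for tasks that differ in a single int parameter."""
    values = list(values)
    ordered = sorted(values)
    if len(ordered) > 1 and ordered == list(range(ordered[0], ordered[-1] + 1)):
        # Consecutive integers collapse to "first...last", e.g. num=5...19.
        shown = "{0}...{1}".format(ordered[0], ordered[-1])
    elif len(values) > max_listed:
        # Irregular values: keep the first few in their original order, then elide.
        shown = ",".join(str(v) for v in values[:max_listed]) + ",..."
    else:
        shown = ",".join(str(v) for v in values)
    # One format string replaces the chain of "+" and str() calls.
    return "- {0} {1}({2}={3})".format(len(values), task_family, param_name, shown)


print(format_single_param_row("examples.Bar", "num", range(5, 20)))
# The last three Fem values are made up; the sample only shows the first four.
print(format_single_param_row("examples.Fem", "num", [9, 10, 3, 20, 41, 7, 12]))
```

With a format string, the shape of the printed row is visible in a single literal. That makes it easier to keep in sync with the documented output.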

nicolehedblom · 2015-07-24T16:23:43Z

@Tarrasch @ulzha @plinton @sisidra

erikbern · 2015-07-24T16:24:09Z

luigi/execution_summary.py

+            if task.task_family in group_tasks[status]:
+                group_tasks[status][task.task_family].append(task)
+            else:
+                group_tasks[status][task.task_family] = []


can do group_tasks[status].setdefault(task.task_family, []).append(task) I think
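Here is a minimal sketch of the `setdefault` grouping idiom. It assumes each task exposes a `task_family` attribute; a namedtuple stands in for real Luigi tasks, and `group_by_family` is a hypothetical name. `dict.setdefault(key, default)` stores the default only when the key is missing and returns the stored value either way, so the first task of each family is appended too.

```python
from collections import namedtuple

# Hypothetical stand-in for a Luigi task; only task_family matters for grouping.
FakeTask = namedtuple("FakeTask", ["task_family", "name"])


def group_by_family(tasks_with_status):
    """Group (status, task) pairs into {status: {task_family: [tasks]}}."""
    group_tasks = {}
    for status, task in tasks_with_status:
        # Each setdefault creates the missing level on first use and returns
        # the existing dict/list afterwards, so no if/else branch is needed.
        group_tasks.setdefault(status, {}).setdefault(task.task_family, []).append(task)
    return group_tasks


grouped = group_by_family([
    ("completed", FakeTask("examples.Bar", "a")),
    ("completed", FakeTask("examples.Bar", "b")),
    ("failed", FakeTask("examples.Foo", "c")),
])
print(sorted(grouped))
```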

nicolehedblom · 2015-07-28T10:37:38Z

Can someone review my code from my latest commit? Thanks.
@Tarrasch @ulzha @plinton @sisidra

ulzha · 2015-07-29T09:36:34Z

examples/execution_summary_example.py

+
+    def requires(self):
+        yield MyExternal()
+        for i in range(1):


Why this range?

No reason. I think I added it by mistake. I'll remove it.

erikbern · 2015-07-29T12:34:25Z

This looks like a great addition to Luigi! Hope you're not dissuaded by all the comments; every code base has its conventions, and I generally think these reviews are a great way to learn from everyone's best practices.

nicolehedblom · 2015-07-29T14:48:18Z

@erikbern What's your opinion on printing out which external workers there were, if any, and which tasks they ran? I have that in the code now, but I don't know if it's informative or just in the way.

erikbern · 2015-07-29T17:55:30Z

Can you provide some sample output of what that looks like?

nicolehedblom · 2015-07-30T07:41:59Z

===== Luigi Execution Summary =====

Scheduled 57 tasks of which:

15 present dependencies were encountered:
- 15 examples.Bar(num=5...19)
42 were left pending, among these:
- 1 were missing external dependencies:
  - 1 MyExternal()
- 19 were being run by another worker:
  - 1 examples.Boom(num=0)
  - 5 examples.Dog(num=0...4)
  - examples.Cat(num=2, num2=10) and 3 other examples.Cat
  - 7 examples.Fem(num=9,10,3,20,...)
  - examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)
- 22 had missing external dependencies:
  - 1 examples.EntryPoint()
  - examples.Foo(num=101, num2=10) and 10 other examples.Foo
  - 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)
- 22 had dependencies that were being run by other worker:
  - 1 examples.EntryPoint()
  - examples.Foo(num=101, num2=10) and 10 other examples.Foo
  - 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)

The other workers were:
- Worker(salt=656332181, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=73930) ran 1 tasks
- Worker(salt=657882717, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=87489) ran 5 tasks
- Worker(salt=65235437, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=82359) ran 4 tasks
and 2 other workers

Did not run any tasks
This progress looks :| because there were missing tasks

===== Luigi Execution Summary =====

This is how it looks if it prints out only how many tasks each worker ran, which is how we currently have it. @erikbern
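The count-only worker section described above can be sketched as follows. The cap of three listed workers matches the sample, which shows three workers and then "and 2 other workers". The function name, the list of `(worker, count)` pairs used as input, and the `max_shown` parameter are assumptions, not the PR's code.

```python
def summarize_other_workers(worker_task_counts, max_shown=3):
    """Hypothetical helper: list up to max_shown workers with how many tasks each ran."""
    lines = ["The other workers were:"]
    for worker, count in worker_task_counts[:max_shown]:
        lines.append("- {0} ran {1} tasks".format(worker, count))
    # Summarize the remainder instead of printing every worker.
    hidden = len(worker_task_counts) - max_shown
    if hidden > 0:
        lines.append("and {0} other workers".format(hidden))
    return "\n".join(lines)


# Worker names and counts here are illustrative placeholders.
print(summarize_other_workers([("w1", 1), ("w2", 5), ("w3", 4), ("w4", 2), ("w5", 3)]))
```

Capping the list keeps the summary short when many workers share a scheduler. The per-worker task listing in the next comment trades that brevity for detail.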

nicolehedblom · 2015-07-30T07:57:17Z

===== Luigi Execution Summary =====

Scheduled 57 tasks of which:

15 present dependencies were encountered:
- 15 examples.Bar(num=5...19)
42 were left pending, among these:
- 1 were missing external dependencies:
  - 1 MyExternal()
- 19 were being run by another worker:
  - 1 examples.Boom(num=0)
  - 5 examples.Dog(num=0...4)
  - examples.Cat(num=2, num2=10) and 3 other examples.Cat
  - 7 examples.Fem(num=9,10,3,20,...)
  - examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)
- 22 had missing external dependencies:
  - 1 examples.EntryPoint()
  - examples.Foo(num=101, num2=10) and 10 other examples.Foo
  - 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)
- 22 had dependencies that were being run by other worker:
  - 1 examples.EntryPoint()
  - examples.Foo(num=101, num2=10) and 10 other examples.Foo
  - 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)

The other workers were:
* Worker(salt=656332181, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=73930) ran:
- 1 examples.Boom(num=0)
* Worker(salt=657882717, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=87489) ran:
- 3 examples.Dog(num=0,1,2)
- examples.Cat(num=2, num2=10) and examples.Cat(num=7, num2=3)
* Worker(salt=65235437, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=82359) ran:
- examples.Cat(num=3, num2=8) and examples.Cat(num=5, num2=37)
- examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)
and 2 other workers

Did not run any tasks
This progress looks :| because there were missing tasks

===== Luigi Execution Summary =====

This is how it would look if we also printed out which tasks each worker ran. @erikbern

leeeena · 2015-08-10T14:44:50Z

luigi/execution_summary.py

+
+
+def _get_statuses():
+    statuses = ["already_done", "completed", "failed", "still_pending", "still_pending_ext", "run_by_other_worker", "upstream_failure", "upstream_missing_dependency", "upstream_run_by_other_worker", "unknown_reason"]


The statuses are repeated in other parts of the code (_partition_tasks). They should only be hardcoded in one place. Using a class would have been a whole lot easier.

Also, maybe this should just be a list of reasons and comments that is looped over.

Tarrasch · 2015-08-10T15:56:43Z

Cool. I'll iterate on making this code a bit cleaner then.

Tarrasch · 2015-08-13T12:16:05Z

@lenaspotify @erikbern @ulzha want to review again? I cleaned up parts of the logic. :)

leeeena · 2015-08-18T08:57:52Z

luigi/execution_summary.py

+    return set_tasks
+
+
+def _dfs(set_tasks, current_task, visited):


Try not to use an acronym for this name.

leeeena · 2015-08-18T11:17:33Z

Overall, this code could be more readable. It could use more comments and rely more on classes instead of dicts and sets. I will let it pass because this is @nicolehedblom's first contribution, and considering that, I think she did great! I have reached out to her separately with some tips and tricks.

Considering that, LGTM.

Tarrasch · 2015-08-18T11:30:31Z

Well, the code doesn't follow all the nice OOP practices, but in some sense I think it's easier because it only uses primitive types. Thanks for the review! :)

erikbern · 2015-08-18T13:12:29Z

Yes, this is a great addition. There are various things that could be cleaned up, but overall that doesn't matter as much, since this is a separate module with a very small interface and it clearly has pretty high utility. Looking forward to more PRs!

Tarrasch · 2015-08-18T14:02:10Z

Yea :)

erikbern reviewed Jul 24, 2015

nicolehedblom force-pushed the execution_summary branch from 63f94d7 to 2d23a77 July 27, 2015 13:50

Tarrasch mentioned this pull request Jul 28, 2015

luigi hadoop streaming didn't get to run #1095

Closed

Tarrasch mentioned this pull request Jul 28, 2015

Make DateHourParameter subclass of DateParameter #1098

Merged

ulzha reviewed Jul 29, 2015

Tarrasch force-pushed the execution_summary branch 3 times, most recently from 641a9f0 to 793bee3 August 10, 2015 13:01

leeeena reviewed Aug 10, 2015

Tarrasch force-pushed the execution_summary branch from 46d2fde to cd10837 August 13, 2015 12:38

leeeena reviewed Aug 18, 2015

nicolehedblom and others added 2 commits August 18, 2015 11:50

Print an execution summary at end of task runs

c25e262

See docs in luigi/execution_summary.py for example of output. This squashed commit is joint work between Nicole and Arash. Signed-off-by: Arash Rouhani <arash@spotify.com>

Simplify execution summary logic

ef76e7c

Tarrasch force-pushed the execution_summary branch from cd10837 to ef76e7c August 18, 2015 11:15

Tarrasch added a commit that referenced this pull request Aug 18, 2015

Merge pull request #1091 from nicolehedblom/execution_summary

533d7ee

Print an execution summary at end of task runs

Tarrasch merged commit 533d7ee into spotify:master Aug 18, 2015

Tarrasch deleted the execution_summary branch August 18, 2015 12:45
