
Print an execution summary at end of task runs #1091

Merged: 2 commits merged on Aug 18, 2015


nicolehedblom
Contributor

See the docs in luigi/execution_summary.py for an example of the output.

lines.append(row)
break
if len(tasks[0].get_params()) == 1:
row += "- " + str(len(tasks)) + " " + str(task_family) + "(" + tasks[0].get_params()[0][0] + "="
Contributor

Use formatting strings here.
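The reviewer's suggestion might look like the sketch below. The helper name `format_single_param_row` and its arguments are hypothetical, and the excerpt above is cut off before the parameter value is rendered, so here the value is simply passed in as a preformatted string. The point is one `str.format` template in place of a chain of `+` and `str()` calls.

```python
def format_single_param_row(task_family, tasks_count, param_name, param_value):
    """Build a summary row such as '- 15 examples.Bar(num=5...19)'.

    Hypothetical helper: the name and argument shapes are illustrative only.
    """
    # One template instead of concatenating str() pieces with '+'.
    return "- {count} {family}({name}={value})".format(
        count=tasks_count,
        family=task_family,
        name=param_name,
        value=param_value,
    )


print(format_single_param_row("examples.Bar", 15, "num", "5...19"))
```

Named fields keep the template readable even as more pieces are added to the row.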

@nicolehedblom
Contributor Author

if task.task_family in group_tasks[status]:
    group_tasks[status][task.task_family].append(task)
else:
    group_tasks[status][task.task_family] = []
Contributor

Can do group_tasks[status].setdefault(task.task_family, []).append(task), I think.
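`setdefault` is a `dict` method that takes the key first, so it is called on the inner dict `group_tasks[status]`. Below is a minimal sketch, assuming a hypothetical `Task` namedtuple in place of real luigi tasks and a hypothetical `group_by_family` wrapper. It also sidesteps a problem in the excerpt as shown: its `else` branch creates an empty list but never appends the current task, so the first task of each family would be dropped.

```python
from collections import namedtuple

# Hypothetical stand-in for a luigi task; only task_family matters here.
Task = namedtuple("Task", ["task_family", "num"])


def group_by_family(tasks_by_status):
    """Map status -> {task_family: [tasks]} (illustrative helper)."""
    group_tasks = {}
    for status, tasks in tasks_by_status.items():
        group_tasks[status] = {}
        for task in tasks:
            # setdefault inserts [] the first time a family is seen and
            # returns the stored list, so the task is always appended.
            group_tasks[status].setdefault(task.task_family, []).append(task)
    return group_tasks


grouped = group_by_family({"already_done": [Task("Bar", 5), Task("Bar", 6), Task("Dog", 0)]})
print({family: len(ts) for family, ts in grouped["already_done"].items()})
```

`collections.defaultdict(list)` would be an equivalent choice for the inner dicts.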

@nicolehedblom
Contributor Author

Can someone review my code from my latest commit? Thanks.
@Tarrasch @ulzha @plinton @sisidra


def requires(self):
    yield MyExternal()
    for i in range(1):
Contributor

Why this range?

Contributor Author

No reason. I think I added it by mistake. I'll remove it.

@erikbern
Contributor

This looks like a great addition to Luigi! Hope you're not dissuaded by all the comments. Every code base has its conventions, and I generally think these reviews are a great way to learn from everyone's best practices.

@nicolehedblom
Contributor Author

@erikbern What's your opinion on printing which external workers there were, if any, and which tasks they ran? I have that in the code now, but I don't know whether it's informative or just in the way.

@erikbern
Contributor

Can you provide some sample output of what that looks like?

@nicolehedblom
Contributor Author

===== Luigi Execution Summary =====

Scheduled 57 tasks of which:

  • 15 present dependencies were encountered:
    • 15 examples.Bar(num=5...19)
  • 42 were left pending, among these:
    • 1 were missing external dependencies:
      • 1 MyExternal()
    • 19 were being run by another worker:
      • 1 examples.Boom(num=0)
      • 5 examples.Dog(num=0...4)
      • examples.Cat(num=2, num2=10) and 3 other examples.Cat
      • 7 examples.Fem(num=9,10,3,20,...)
      • examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)
    • 22 had missing external dependencies:
      • 1 examples.EntryPoint()
      • examples.Foo(num=101, num2=10) and 10 other examples.Foo
      • 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)
    • 22 had dependencies that were being run by other worker:
      • 1 examples.EntryPoint()
      • examples.Foo(num=101, num2=10) and 10 other examples.Foo
      • 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)

The other workers were:
- Worker(salt=656332181, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=73930) ran 1 tasks
- Worker(salt=657882717, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=87489) ran 5 tasks
- Worker(salt=65235437, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=82359) ran 4 tasks
and 2 other workers

Did not run any tasks
This progress looks :| because there were missing tasks

===== Luigi Execution Summary =====

This is how it looks if it prints only how many tasks each worker ran, which is how we currently have it. @erikbern
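The sample compresses parameter values: consecutive integers become a range (`num=5...19`, `num=0...4`), while other values are listed and truncated (`num=9,10,3,20,...`). The exact rule isn't stated in this thread, so the sketch below assumes one inferred from the samples: runs of more than three consecutive integers collapse (the later mockup lists `num=0,1,2` rather than collapsing it), otherwise at most four values are listed. `collapse_param_values` is a hypothetical name, and date ranges such as `1998-03-23...1998-04-01` are not handled.

```python
def collapse_param_values(values, max_listed=4):
    """Render integer parameter values the way the sample summary does.

    Assumed rule (inferred from the sample output, not taken from luigi):
    a run of more than three consecutive integers becomes 'first...last';
    otherwise up to max_listed values are listed, with ',...' if truncated.
    """
    ordered = sorted(values)
    consecutive = all(b - a == 1 for a, b in zip(ordered, ordered[1:]))
    if consecutive and len(ordered) > 3:
        return "{}...{}".format(ordered[0], ordered[-1])
    # Keep the caller's order when listing (the sample shows 9,10,3,20).
    shown = ",".join(str(v) for v in values[:max_listed])
    return shown + (",..." if len(values) > max_listed else "")


print(collapse_param_values(list(range(5, 20))))
print(collapse_param_values([9, 10, 3, 20, 7, 1, 2]))
print(collapse_param_values([0, 1, 2]))
```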

@nicolehedblom
Contributor Author

===== Luigi Execution Summary =====

Scheduled 57 tasks of which:

  • 15 present dependencies were encountered:
    • 15 examples.Bar(num=5...19)
  • 42 were left pending, among these:
    • 1 were missing external dependencies:
      • 1 MyExternal()
    • 19 were being run by another worker:
      • 1 examples.Boom(num=0)
      • 5 examples.Dog(num=0...4)
      • examples.Cat(num=2, num2=10) and 3 other examples.Cat
      • 7 examples.Fem(num=9,10,3,20,...)
      • examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)
    • 22 had missing external dependencies:
      • 1 examples.EntryPoint()
      • examples.Foo(num=101, num2=10) and 10 other examples.Foo
      • 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)
    • 22 had dependencies that were being run by other worker:
      • 1 examples.EntryPoint()
      • examples.Foo(num=101, num2=10) and 10 other examples.Foo
      • 10 examples.DateTask(date=1998-03-23...1998-04-01, num=5)

The other workers were:
* Worker(salt=656332181, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=73930) ran:
- 1 examples.Boom(num=0)
* Worker(salt=657882717, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=87489) ran:
- 3 examples.Dog(num=0,1,2)
- examples.Cat(num=2, num2=10) and examples.Cat(num=7, num2=3)
* Worker(salt=65235437, workers=1, host=Nicoles-MacBook-Air.local, username=nicolehedblom, pid=82359) ran:
- examples.Cat(num=3, num2=8) and examples.Cat(num=5, num2=37)
- examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)
and 2 other workers

Did not run any tasks
This progress looks :| because there were missing tasks

===== Luigi Execution Summary =====

This is how it would look if we also printed which tasks each worker ran. @erikbern
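This second layout groups task lines under each worker and, like the first, caps the list at three workers followed by "and N other workers". A sketch of that formatting, assuming a hypothetical `summarize_other_workers` helper that receives already-formatted task lines per worker; the cap of three is taken from the samples, not from luigi's code, and the worker labels are placeholders.

```python
def summarize_other_workers(worker_tasks, max_shown=3):
    """Format the 'other workers' section of the summary.

    worker_tasks: list of (worker_description, [task_summary_line, ...]).
    Hypothetical helper; the default cap mirrors the sample output.
    """
    lines = ["The other workers were:"]
    for worker, task_lines in worker_tasks[:max_shown]:
        lines.append("* {} ran:".format(worker))
        lines.extend("- {}".format(t) for t in task_lines)
    # Workers beyond the cap are only counted, not listed.
    hidden = len(worker_tasks) - max_shown
    if hidden > 0:
        lines.append("and {} other workers".format(hidden))
    return "\n".join(lines)


sample = [
    ("worker-1", ["1 examples.Boom(num=0)"]),
    ("worker-2", ["3 examples.Dog(num=0,1,2)"]),
    ("worker-3", ["examples.Hej(a=7, b=8) and examples.Hej(a=9, b=20)"]),
    ("worker-4", []),
    ("worker-5", []),
]
print(summarize_other_workers(sample))
```

Printing only counts, as in the first layout, would replace the inner `extend` with a single "ran N tasks" line.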

Tarrasch force-pushed the execution_summary branch 3 times, most recently from 641a9f0 to 793bee3 on August 10, 2015 13:01


def _get_statuses():
    statuses = ["already_done", "completed", "failed", "still_pending", "still_pending_ext", "run_by_other_worker", "upstream_failure", "upstream_missing_dependency", "upstream_run_by_other_worker", "unknown_reason"]

The statuses are repeated in other parts of the code (_partition_tasks). They should only be hardcoded in one place. Using a class would have been a whole lot easier.

Contributor

Also, maybe this should just be a list of reasons and comments that are looped over.
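The alternative in this comment, a single list of (status, heading) pairs that the printing code loops over, might look like the following. `_STATUS_HEADINGS` and `summary_lines` are hypothetical names; the headings are copied from the sample output earlier in this thread, and only the statuses that appear there are included, so the remaining ones would need their own pairs.

```python
# Hypothetical (status, heading) pairs; headings copied from the sample output.
_STATUS_HEADINGS = [
    ("already_done", "present dependencies were encountered"),
    ("still_pending_ext", "were missing external dependencies"),
    ("run_by_other_worker", "were being run by another worker"),
    ("upstream_missing_dependency", "had missing external dependencies"),
    ("upstream_run_by_other_worker", "had dependencies that were being run by other worker"),
]


def summary_lines(set_tasks):
    """Loop over the single list instead of hardcoding each status."""
    lines = []
    for status, heading in _STATUS_HEADINGS:
        tasks = set_tasks.get(status, ())
        if tasks:  # skip empty categories
            lines.append("* {} {}".format(len(tasks), heading))
    return lines


print(summary_lines({"already_done": {"Bar(5)", "Bar(6)"}, "failed": set()}))
```

Output order follows the list, not the input dict, so the summary layout is controlled in one place.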

@Tarrasch
Contributor

Cool. I'll iterate on making this code a bit cleaner then.

@Tarrasch
Contributor

@lenaspotify @erikbern @ulzha want to review again? I cleaned up parts of the logic. :)

return set_tasks


def _dfs(set_tasks, current_task, visited):

Try to give this a name that isn't an acronym.

nicolehedblom and others added 2 commits August 18, 2015 11:50
See docs in luigi/execution_summary.py for example of output.

This squashed commit is joint work between Nicole and Arash.

Signed-off-by: Arash Rouhani <arash@spotify.com>
@leeeena

leeeena commented Aug 18, 2015

Overall, this code could be improved and made more readable. It could use more comments and operate more with classes instead of dicts and sets. I will let it pass because this is @nicolehedblom's first contribution, and considering that, I think she did great! I have reached out to her separately with some tips and tricks.

Considering that, LGTM.

@Tarrasch
Contributor

Well, the code doesn't use all the nice OOP practices, but in some sense I think it's simpler because it only uses primitive types. Thanks for the review! :)

Tarrasch added a commit that referenced this pull request Aug 18, 2015
Print an execution summary at end of task runs
Tarrasch merged commit 533d7ee into spotify:master on Aug 18, 2015
Tarrasch deleted the execution_summary branch on August 18, 2015 12:45
@erikbern
Contributor

Yes, this is a great addition. There are various things that could be cleaned up, but overall that matters less since this is a separate module with a very small interface, and it clearly has pretty high utility. Looking forward to subsequent PRs!

@Tarrasch
Contributor

Yea :)


5 participants