Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: task graph html repr #99

Merged
merged 2 commits into from
Jan 15, 2024
Merged

feat: task graph html repr #99

merged 2 commits into from
Jan 15, 2024

Conversation

jokasimr
Copy link
Contributor

Fixes #70

@jokasimr
Copy link
Contributor Author

jokasimr commented Jan 11, 2024

Here's a suggestion for how the task graph repr could look:
Screenshot from 2024-01-11 17-42-10

The entries under "based on" lists only the inputs to the graph, not the intermediate results.

@jokasimr
Copy link
Contributor Author

Case with ParamTables:
Screenshot from 2024-01-11 18-05-56

Comment on lines 132 to 136
'<p>Task graph computes</p>\n',
_list_items(leafs),
'<p>based on</p>\n',
_list_max_n_then_hide(roots),
f'<p>using the scheduler {scheduler}.</p>',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say not a fan of the "narrative" formatting. How about:

Output keys: A, B, C
Scheduler: DaskScheduler
Providers:
....

For the dask scheduler, it would be very helpful if we can include which underlying dask scheduler (get function) is used, see here: https://scipp.github.io/sciline/generated/classes/sciline.scheduler.DaskScheduler.html.

Copy link
Contributor Author

@jokasimr jokasimr Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a bit of an experiment ;) I'll change it.

However I don't think the last column should be "Providers" but rather something like "Input parameters"?
Or do you think it would be better to list all providers in the task graph rather than only the inputs?

Once the user has selected what output they are interested in I think the intermediate providers are kind of uninteresting, while the inputs that contribute to a particular output might be interesting for debugging purposes / understanding what is needed to compute a particular quantitiy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the user has selected what output they are interested in I think the intermediate providers are kind of uninteresting

Maybe? We can only list the inputs for now.

@jokasimr
Copy link
Contributor Author

New style
Screenshot from 2024-01-12 10-47-09
Screenshot from 2024-01-12 10-47-24

)


def _list_max_n_then_hide(items: Sequence[str], n: int = 5, header: str = '') -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good function name 👍

{
escape(keyname(key))
for key, (_, requires) in self._graph.items()
if len(requires) == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a sufficient check for an input param. One can define providers without inputs. We should use the same check as in other places (Pipeline repr, graph visualization).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I'm not sure I understand. Is there a conceptual difference between a parameter and a provider without inputs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a parameter is typically something a user sets, i.e., it would typically be different every time the pipeline is used (and therefore seeing its value is valuable). A provider is more static and would typically not change (and we cannot get its value without calling it, which may be expensive).

if len(requires) == 0
}
)
scheduler = escape(self._scheduler.__class__.__name__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include the underlying dask scheduler, this is important information. Add a method to the schedulers? Not sure if __str__ or __repr__ would make sense for them, maybe?

Comment on lines 23 to 32
def keyname(key: Key) -> str:
if isinstance(key, Item):
return f'{keyname(key.tp)}({keyname(key.label[0].tp)})'
args = get_args(key)
if len(args):
parameters = ', '.join(map(keyname, args))
return f'{qualname(key)}[{parameters}]'
return qualname(key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this compare to what is used in visualize? Can we use the same code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think we can.

@jokasimr
Copy link
Contributor Author

New version displaying some more details about the scheduler
Screenshot from 2024-01-12 11-06-23
Screenshot from 2024-01-12 11-06-43

@jokasimr jokasimr merged commit 7951e09 into main Jan 15, 2024
5 checks passed
@jokasimr jokasimr deleted the html-repr-for-taskgraph branch January 15, 2024 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add _repr_html_ for TaskGraph
2 participants