-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Html repr for pipeline #78
Conversation
src/sciline/pipeline.py
Outdated
def _kind_of_provider(p: Callable[..., Any]) -> str: | ||
if _qualname(p) == f'{_qualname(Pipeline.__setitem__)}.<locals>.<lambda>': | ||
return 'parameter' | ||
if _qualname(p) == f'{_qualname(Pipeline.set_param_table)}.<locals>.<lambda>': | ||
return 'table' | ||
return 'function' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse this code in the visualization module?
@@ -750,3 +764,52 @@ def bind_and_call( | |||
if not return_tuple: | |||
return results[0] | |||
return results | |||
|
|||
def _repr_html_(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we adapt some of the docs notebooks to make use of the repr?
Right now the "Providers" table in its current shape doesn't seem very useful. The |
I think the repr is too large, it should be a much more compact table, a bit like the html repr of a scipp data array. |
Agree. Another option is to show the docstring in a tooltip when hovering the mouse. We also cannot use so much horizontal space, we should show params and providers in a single table, I think? We also should list the provider names, since this can take different values depending on how a pipeline is put together. |
Wouldn't the output of |
I think the motives for adding a html repr for the pipeline were:
Are there other? |
Sure we can do that but then we will use more vertical space. What's the problem with using the horizontal space? |
I think it serves two very different purposes. Compare, e.g., also Dask's |
For example that is does not fit into the docs. |
I am not even sure we want to split providers from params: In different configs of a pipeline we might get the same type from either a provider or a param. Have you considering having everything in a single table, alphabetically? |
That makes sense. I split them because some columns don't map well to parameters that have values, and vice versa for providers that are functions. For example the "Value" column doesn't really make sense for providers that are functions, and "Docs" or "Source" don't really exist for providers that are just values. But I think you're right, it is better to put them all in same table and just don't put anything in the column if the kind of the provider doesn't have that information. |
Here's a new version. It uses quite a lot of space, but on the other hand it is quite useful to be able to tell quickly what providers exist and are satisfied. One option is to make the list of concrete providers collapsed by default. |
I think I'd keep things simple for now and simply remove the |
src/sciline/pipeline.py
Outdated
{chr(10).join((f""" | ||
<tr> | ||
<th scope="row">{pretty_provider_name(p)}</th> | ||
<td scope="row">{provider_source(p)}</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have information of package/module/class
it's coming from?
Instead of the source file and number of line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes package/module/function_name makes more sense than file and line number. I'll see if I can fix that.
src/sciline/pipeline.py
Outdated
<tbody> | ||
{chr(10).join((f""" | ||
<tr> | ||
<th scope="row">{pretty_provider_name(p)}</th> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to have the names in bold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No not really
src/sciline/pipeline.py
Outdated
return f'{p["value"]}' | ||
|
||
return f''' | ||
<div style="display: flex; column-gap: 1em; align-items: flex-start;"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe use template instead of f-string here...?
Then it'll be much easier to update/fix them later.
Llike for the data group html wrapper: https://github.com/scipp/scipp/blob/main/src/scipp/visualization/formatting_datagroup_html.py
Or maybe later if we want to add more columns...!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the code needs a clean up. I'll collect the inline styles and remove excessive duplication and nested f-strings etc. I'm not sure that the refactor will involve templates, I'll look at the datagroup code but it might be overkill here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can also move this to a different file to not grow pipeline.py
too much?
src/sciline/pipeline.py
Outdated
providers = groupby( | ||
lambda p: p.kind, | ||
( | ||
ProviderDisplayData( | ||
origin, | ||
args, | ||
k := _kind_of_provider(value), | ||
value() if k != 'function' else value, | ||
) | ||
for origin, args, value in chain( | ||
providers_with_parameters, providers_without_parameters | ||
) | ||
), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this make one entry in the repr per row (and column) of a param table? Or am I misreading this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're misreading it. Take a look at the examples in the screenshots I've posted in this PR. Parameter Tables use their own built in html repr. #78 (comment)
However, it is quite messy to reconstruct the parameter tables from the _provider
and _subprovider
dicts, so it would be good if we could do that in a better way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had decided to combine everything into a single table? I had thought those screenshots were thus outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I see. The parameters were merged with the providers in a single table. But the parameter tables are still displayed separately. Do you think it'd be better to merge them in the larger table as well? I think that it might be more messy. The parameter tables are self contained and have a html repr already so I thought it sensible to use that and display them separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imho a user can display the param table themselves if they want to. The HTML repr of the pipeline should give info about the pipeline (i.e., mainly the structure), and as such I would consider each param table "column" as a single entity. That is, simply display a single summary line in the repr, instead of all the values. After all there might be thousands of rows in a param table? We must avoid making the pipeline's HTML repr unusable (or slow) in such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we swap the "Source" and "Value" columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said before, I would be worried about size and lag when using this with large param tables. I think we should not include full nested HTML reprs of each value. Listing the size should be a good start.
It is generally always easier to make things more complicated later, than to remove it (because then some users who learned to love and rely on a feature will become upset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it might be a problem, but I think it is a unusual problem and one that is difficult to protect against in general. Displaying the html repr of any DataArray might lag if it is large enough. I also think being able to view the values of all parameters in the workflow like this is quite useful.
We talked before about displaying a reduced view of variables like is done in datagroup. The issue with that approach in this case is that we don't want the sciline code to contain lots of logic about how to display scipp datatypes.
I'm not sure about what you mean by listing the size, values of parameters can have any type and they don't have a common concept of size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about what you mean by listing the size, values of parameters can have any type and they don't have a common concept of size.
I meant the number of rows of the param table.
So maybe were are talking about two different things now (because the discussion above switched back and forth)? I see I get this (no table of values):
Shouldn't the param-table info go into the source column, not the value column? Something like ParamTable(int, length=1000)
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it switched a bit back and forth and I understand it got confusing.
When we talked at my desk the other day we decided to only display the length of parameter tables. That's what's done now.
Now I thought we were talking about parameters that are not in tables. Just regular input values like QBins
or Filename[Sample]
. Those are currently displayed below a details tag in case they have a html repr. Otherwise they're just stringified.
Shouldn't the param-table info go into the source column, not the value column? Something like ParamTable(int, length=1000) maybe?
That sounds like a good idea. Maybe the source column should change name to "info" or something like that instead.
ddc9873
to
fe33e72
Compare
Can you post some screenshots of the latest implementation? Also, you have some merge conflicts to address. |
979978b
to
609456f
Compare
Can you also show one with a param table? |
|
||
|
||
def test_html_repr() -> None: | ||
pipeline = sl.Pipeline([make_int], params={float: 5.0}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add generic providers and a param table for this (or separate) test, just to make sure this doesn't break the repr?
Fixes #70