Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Html repr for pipeline #78

Merged
merged 13 commits into from
Jan 11, 2024
Merged

Html repr for pipeline #78

merged 13 commits into from
Jan 11, 2024

Conversation

jokasimr
Copy link
Contributor

@jokasimr jokasimr commented Dec 1, 2023

Fixes #70

Comment on lines 145 to 148
def _kind_of_provider(p: Callable[..., Any]) -> str:
if _qualname(p) == f'{_qualname(Pipeline.__setitem__)}.<locals>.<lambda>':
return 'parameter'
if _qualname(p) == f'{_qualname(Pipeline.set_param_table)}.<locals>.<lambda>':
return 'table'
return 'function'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse this code in the visualization module?

@@ -750,3 +764,52 @@ def bind_and_call(
if not return_tuple:
return results[0]
return results

def _repr_html_(self) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we adapt some of the docs notebooks to make use of the repr?

@jokasimr
Copy link
Contributor Author

jokasimr commented Dec 8, 2023

Example based on one of the tutorials in the sciline docs
Screenshot from 2023-12-08 16-58-19

@jokasimr
Copy link
Contributor Author

jokasimr commented Dec 8, 2023

This is what is looks like for the SANS2D workflow in esssans:
Screenshot from 2023-12-08 17-05-44

@jokasimr
Copy link
Contributor Author

jokasimr commented Dec 8, 2023

Right now the "Providers" table in its current shape doesn't seem very useful. The docs column doesn't really describe what the provider represents and that is what I think we would like to have in the table. The satisfied column is not very useful either because it is ? as soon as the inputs depends on a TypeVar. The satisfied column also doesn't handle ParameterTables well, it will display that a provider is unsatisfied if the requirements come from a parameter table even if the parameter table is available.

@nvaytet
Copy link
Member

nvaytet commented Dec 11, 2023

I think the repr is too large, it should be a much more compact table, a bit like the html repr of a scipp data array.
Vertical space is very valuable in jupyter notebooks, and scrolling past large outputs can become annoying.
If you want to include the descriptions from the docstrings, I would at least recommend using a show/hide mechanism.

@SimonHeybrock
Copy link
Member

I think the repr is too large, it should be a much more compact table, a bit like the html repr of a scipp data array. Vertical space is very valuable in jupyter notebooks, and scrolling past large outputs can become annoying. If you want to include the descriptions from the docstrings, I would at least recommend using a show/hide mechanism.

Agree. Another option is to show the docstring in a tooltip when hovering the mouse.

We also cannot use so much horizontal space, we should show params and providers in a single table, I think?

We also should list the provider names, since this can take different values depending on how a pipeline is put together.

@nvaytet
Copy link
Member

nvaytet commented Dec 11, 2023

Wouldn't the output of visualize be a better repr? What was the reason we wanted to have a table?

@jokasimr
Copy link
Contributor Author

jokasimr commented Dec 11, 2023

I think the motives for adding a html repr for the pipeline were:

  • having a more compact description
  • listing information not avaliable in the graphviz, values of parameters etc
  • listing the domain types that can be produced by the pipeline
  • to be able to display a pipeline without selecting a particular result
  • for very large pipelines inspecting the graph can become cumbersome
  • debugging incomplete pipelines with missing parts

Are there other?

@jokasimr
Copy link
Contributor Author

We also cannot use so much horizontal space, we should show params and providers in a single table, I think?

Sure we can do that but then we will use more vertical space. What's the problem with using the horizontal space?

@jokasimr
Copy link
Contributor Author

New version, docstring removed, added name and source of provider function instead
Screenshot from 2023-12-11 11-40-48

@SimonHeybrock
Copy link
Member

Wouldn't the output of visualize be a better repr? What was the reason we wanted to have a table?

I think it serves two very different purposes. Compare, e.g., also Dask's _repr_html_ vs. visualize.

@SimonHeybrock
Copy link
Member

We also cannot use so much horizontal space, we should show params and providers in a single table, I think?

Sure we can do that but then we will use more vertical space. What's the problem with using the horizontal space?

For example that is does not fit into the docs.

@SimonHeybrock
Copy link
Member

We also cannot use so much horizontal space, we should show params and providers in a single table, I think?

Sure we can do that but then we will use more vertical space. What's the problem with using the horizontal space?

I am not even sure we want to split providers from params: In different configs of a pipeline we might get the same type from either a provider or a param. Have you considering having everything in a single table, alphabetically?

@jokasimr
Copy link
Contributor Author

We also cannot use so much horizontal space, we should show params and providers in a single table, I think?

Sure we can do that but then we will use more vertical space. What's the problem with using the horizontal space?

I am not even sure we want to split providers from params: In different configs of a pipeline we might get the same type from either a provider or a param. Have you considering having everything in a single table, alphabetically?

That makes sense. I split them because some columns don't map well to parameters that have values, and vice versa for providers that are functions. For example the "Value" column doesn't really make sense for providers that are functions, and "Docs" or "Source" don't really exist for providers that are just values.

But I think you're right, it is better to put them all in same table and just don't put anything in the column if the kind of the provider doesn't have that information.

@jokasimr
Copy link
Contributor Author

jokasimr commented Dec 13, 2023

Here's a new version.
The satisfied column now displays all type var combinations that are possible (based on the constraints of the type var) and highlights the ones that are satisfied.

It uses quite a lot of space, but on the other hand it is quite useful to be able to tell quickly what providers exist and are satisfied. One option is to make the list of concrete providers collapsed by default.

The value column uses the scipp ._repr_html_() method.
Screenshot from 2023-12-13 16-56-54
Screenshot from 2023-12-13 16-57-12
Screenshot from 2023-12-13 16-57-58

@SimonHeybrock
Copy link
Member

The satisfied column now displays all type var combinations that are possible (based on the constraints of the type var) and highlights the ones that are satisfied.

It uses quite a lot of space, but on the other hand it is quite useful to be able to tell quickly what providers exist and are satisfied. One option is to make the list of concrete providers collapsed by default.

I think I'd keep things simple for now and simply remove the satisfied column.

@jokasimr jokasimr marked this pull request as ready for review December 14, 2023 13:20
{chr(10).join((f"""
<tr>
<th scope="row">{pretty_provider_name(p)}</th>
<td scope="row">{provider_source(p)}</th>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting some strange values for the paths in the source dropdowns:
Screenshot at 2023-12-15 11-25-51

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have information of package/module/class it's coming from?
Instead of the source file and number of line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes package/module/function_name makes more sense than file and line number. I'll see if I can fix that.

<tbody>
{chr(10).join((f"""
<tr>
<th scope="row">{pretty_provider_name(p)}</th>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to have the names in bold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No not really

return f'{p["value"]}'

return f'''
<div style="display: flex; column-gap: 1em; align-items: flex-start;">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe use template instead of f-string here...?
Then it'll be much easier to update/fix them later.
Llike for the data group html wrapper: https://github.com/scipp/scipp/blob/main/src/scipp/visualization/formatting_datagroup_html.py

Or maybe later if we want to add more columns...!

Copy link
Contributor Author

@jokasimr jokasimr Dec 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the code needs a clean up. I'll collect the inline styles and remove excessive duplication and nested f-strings etc. I'm not sure that the refactor will involve templates, I'll look at the datagroup code but it might be overkill here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can also move this to a different file to not grow pipeline.py too much?

Comment on lines 773 to 844
providers = groupby(
lambda p: p.kind,
(
ProviderDisplayData(
origin,
args,
k := _kind_of_provider(value),
value() if k != 'function' else value,
)
for origin, args, value in chain(
providers_with_parameters, providers_without_parameters
)
),
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this make one entry in the repr per row (and column) of a param table? Or am I misreading this?

Copy link
Contributor Author

@jokasimr jokasimr Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're misreading it. Take a look at the examples in the screenshots I've posted in this PR. Parameter Tables use their own built in html repr. #78 (comment)

However, it is quite messy to reconstruct the parameter tables from the _provider and _subprovider dicts, so it would be good if we could do that in a better way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we had decided to combine everything into a single table? I had thought those screenshots were thus outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I see. The parameters were merged with the providers in a single table. But the parameter tables are still displayed separately. Do you think it'd be better to merge them in the larger table as well? I think that it might be more messy. The parameter tables are self contained and have a html repr already so I thought it sensible to use that and display them separately.

Copy link
Member

@SimonHeybrock SimonHeybrock Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imho a user can display the param table themselves if they want to. The HTML repr of the pipeline should give info about the pipeline (i.e., mainly the structure), and as such I would consider each param table "column" as a single entity. That is, simply display a single summary line in the repr, instead of all the values. After all there might be thousands of rows in a param table? We must avoid making the pipeline's HTML repr unusable (or slow) in such cases.

Copy link
Member

@SimonHeybrock SimonHeybrock Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we swap the "Source" and "Value" columns?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said before, I would be worried about size and lag when using this with large param tables. I think we should not include full nested HTML reprs of each value. Listing the size should be a good start.

It is generally always easier to make things more complicated later, than to remove it (because then some users who learned to love and rely on a feature will become upset).

Copy link
Contributor Author

@jokasimr jokasimr Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it might be a problem, but I think it is a unusual problem and one that is difficult to protect against in general. Displaying the html repr of any DataArray might lag if it is large enough. I also think being able to view the values of all parameters in the workflow like this is quite useful.

We talked before about displaying a reduced view of variables like is done in datagroup. The issue with that approach in this case is that we don't want the sciline code to contain lots of logic about how to display scipp datatypes.

I'm not sure about what you mean by listing the size, values of parameters can have any type and they don't have a common concept of size.

Copy link
Member

@SimonHeybrock SimonHeybrock Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about what you mean by listing the size, values of parameters can have any type and they don't have a common concept of size.

I meant the number of rows of the param table.

So maybe were are talking about two different things now (because the discussion above switched back and forth)? I see I get this (no table of values):
image

Shouldn't the param-table info go into the source column, not the value column? Something like ParamTable(int, length=1000) maybe?

Copy link
Contributor Author

@jokasimr jokasimr Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it switched a bit back and forth and I understand it got confusing.

When we talked at my desk the other day we decided to only display the length of parameter tables. That's what's done now.

Now I thought we were talking about parameters that are not in tables. Just regular input values like QBins or Filename[Sample]. Those are currently displayed below a details tag in case they have a html repr. Otherwise they're just stringified.

Shouldn't the param-table info go into the source column, not the value column? Something like ParamTable(int, length=1000) maybe?

That sounds like a good idea. Maybe the source column should change name to "info" or something like that instead.

@SimonHeybrock
Copy link
Member

Can you post some screenshots of the latest implementation?

Also, you have some merge conflicts to address.

@jokasimr
Copy link
Contributor Author

Here are some screenshots:
One with collapsed fields and second with uncollapsed.
The value field is collapsed or not based on a cutoff for the string length.
Screenshot from 2024-01-10 11-34-34
Screenshot from 2024-01-10 11-35-27

@SimonHeybrock
Copy link
Member

Can you also show one with a param table?



def test_html_repr() -> None:
pipeline = sl.Pipeline([make_int], params={float: 5.0})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add generic providers and a param table for this (or separate) test, just to make sure this doesn't break the repr?

@jokasimr
Copy link
Contributor Author

Here is a screenshot for a case with parameter tables:
Screenshot from 2024-01-10 11-49-36

@jokasimr jokasimr merged commit 5b462b6 into main Jan 11, 2024
5 checks passed
@jokasimr jokasimr deleted the html-repr-for-pipeline branch January 11, 2024 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add _repr_html_ for TaskGraph
4 participants