Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pprint to represent DataContainers, not forcing HDFStubs #1057

Closed
wants to merge 2 commits into from

Conversation

niklassiemer
Copy link
Member

to_builtin got a new argument load_stubs=True such that loading HDFStubs can be avoided.

Might close #1031

to_builtin got a new argument load_stubs=True such that loading HDFStubs can be avoided.
@niklassiemer
Copy link
Member Author

With these changes, the DataContainer representation is better to read on the command line and in notebooks - at least in my opinion. However, it also slightly changes for subgroups, which are not shown as DataContainer any more:
For the DataContainer

dc = DataContainer({"a": 1, "b": 2, 'c':{'A':1}})

we get with the current main

DataContainer({'a': 1, 'b': 2, 'c': DataContainer({'A': 1})})

but with these changes we would get

DataContainer({"a": 1, "b": 2, 'c':{'A':1}})

Furthermore, it (currently) omits everything nested deeper than the first two levels by '...'.

@@ -451,50 +439,53 @@ def _read_only_error(cls):
"finished.".format(cls.__name__)
)

def to_builtin(self, stringify=False):
def to_builtin(self, stringify=False, load_stubs=True, _no_str_keys=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_stubs should have a short documentation even though it's fairly obvious. If I see correctly _no_str_keys is not used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, documentation is missing and I forgot to pass the _no_str_keys to the sub groups. However, it is used and switches off the key conversion such that the keys stay e.g. int if they are.

@pmrv
Copy link
Contributor

pmrv commented Mar 22, 2023

With these changes, the DataContainer representation is better to read on the command line and in notebooks - at least in my opinion. However, it also slightly changes for subgroups, which are not shown as DataContainer any more: For the DataContainer

dc = DataContainer({"a": 1, "b": 2, 'c':{'A':1}})

we get with the current main

DataContainer({'a': 1, 'b': 2, 'c': DataContainer({'A': 1})})

but with these changes we would get

DataContainer({"a": 1, "b": 2, 'c':{'A':1}})

Furthermore, it (currently) omits everything nested deeper than the first two levels by '...'.

I have a slight preference to keep the class name in the output to differentiate true dict members from DataContainer sub groups or their subclasses (e.g. Group from Sphinx). Omitting after a certain depth is ok with me.

@niklassiemer
Copy link
Member Author

I agree, however, the pprint does not naively work on the DataContainer with depth truncation. Therefore, I use the to_builtin under the hood, which of course converts to plain dict... I'll take another shot at it. Maybe I find a way to tell pprint to use the DataContainer directly.

name = self.__class__.__name__
plain = f"{name}({json.dumps(self.to_builtin(stringify=True), indent=2, default=str)})"
return "<pre>" + plain + "</pre>"
return self.to_builtin(stringify=True, load_stubs=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a bit the question: Do we want to not load the stubs in this case? Also the nice jupyter lab interface would not show the full container from the beginning but the stubs. To have the full view one would need to call to_builtin beforehand.

@pmrv
Copy link
Contributor

pmrv commented Mar 22, 2023

I found this and this, which suggests to subclass the PrettyPrinter. Your call if that's worth it. :')

Not forcing stubs is prefered, I think, since it might incur significant I/O.

@pmrv
Copy link
Contributor

pmrv commented Mar 22, 2023

See also the second answer from the SO link and this POTW. Those manage without subclassing, by doing __repr__ in certain way, but haven't looked at the details.

@pmrv
Copy link
Contributor

pmrv commented Mar 22, 2023

Last link I promise..

@stale
Copy link

stale bot commented May 21, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label May 21, 2023
@stale stale bot closed this Jun 10, 2023
@samwaseda samwaseda deleted the pp_dc branch August 15, 2024 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rendering DataContainers on an ipython shell
2 participants