Use pprint to represent DataContainers, not forcing HDFStubs #1057

niklassiemer · 2023-03-16T15:22:43Z

to_builtin got a new argument load_stubs=True such that loading HDFStubs can be avoided.

Might close #1031

niklassiemer · 2023-03-17T07:32:00Z

With these changes, the DataContainer representation is easier to read on the command line and in notebooks, at least in my opinion. However, the output also changes slightly for subgroups, which are no longer shown as DataContainer.
For the DataContainer

dc = DataContainer({"a": 1, "b": 2, 'c':{'A':1}})

we get with the current main

DataContainer({'a': 1, 'b': 2, 'c': DataContainer({'A': 1})})

but with these changes we would get

DataContainer({"a": 1, "b": 2, 'c':{'A':1}})

Furthermore, it (currently) replaces everything nested deeper than the first two levels with '...'.
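
For reference, the truncation described here is the behaviour of the `depth` argument in the standard library's pprint when applied to plain builtins. A minimal sketch, using a made-up dictionary rather than the PR's actual code:

```python
import pprint

# Made-up nested data; plain dicts only, no DataContainer involved.
nested = {"a": 1, "b": 2, "c": {"A": {"deep": {"deeper": 1}}}}

# depth=2 keeps the top level and one level below it;
# anything nested deeper is replaced by an ellipsis placeholder.
text = pprint.pformat(nested, depth=2)
print(text)  # {'a': 1, 'b': 2, 'c': {'A': {...}}}
```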

pmrv · 2023-03-22T09:47:53Z

pyiron_base/storage/datacontainer.py

@@ -451,50 +439,53 @@ def _read_only_error(cls):
            "finished.".format(cls.__name__)
        )

-    def to_builtin(self, stringify=False):
+    def to_builtin(self, stringify=False, load_stubs=True, _no_str_keys=False):


load_stubs should have a short documentation entry, even though it's fairly obvious. If I see correctly, _no_str_keys is not used.

Yes, the documentation is missing, and I forgot to pass _no_str_keys on to the subgroups. However, it is used: it switches off the key conversion so that keys keep their type, e.g. stay int if they are int.
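
To illustrate the two points raised here (documenting load_stubs and forwarding both flags to subgroups), here is a rough sketch. `MiniContainer` and `LazyStub` are hypothetical stand-ins, not pyiron_base classes, and the `stringify` argument is left out for brevity:

```python
class LazyStub:
    """Hypothetical placeholder for a value that is expensive to load."""

    def __init__(self, loader):
        self._loader = loader
        self.loaded = False

    def load(self):
        self.loaded = True
        return self._loader()


class MiniContainer:
    """Hypothetical nested container; dict values become sub-containers."""

    def __init__(self, data):
        self._data = {
            k: MiniContainer(v) if isinstance(v, dict) else v
            for k, v in data.items()
        }

    def to_builtin(self, load_stubs=True, _no_str_keys=False):
        """Convert to plain dicts.

        load_stubs: if False, leave LazyStub values unloaded (no I/O).
        _no_str_keys: if True, keep keys as they are instead of str(key).
        """
        result = {}
        for key, value in self._data.items():
            if isinstance(value, MiniContainer):
                # Forward both flags so subgroups behave like the top level.
                value = value.to_builtin(
                    load_stubs=load_stubs, _no_str_keys=_no_str_keys
                )
            elif isinstance(value, LazyStub) and load_stubs:
                value = value.load()
            result[key if _no_str_keys else str(key)] = value
        return result


stub = LazyStub(lambda: [1, 2, 3])
container = MiniContainer({1: "x", "sub": {2: stub}})
cheap = container.to_builtin(load_stubs=False, _no_str_keys=True)
still_unloaded = not stub.loaded
full = container.to_builtin()
```

In this sketch `cheap` keeps the int keys and the unloaded stub, while `full` has string keys and the loaded list.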

pmrv · 2023-03-22T09:50:13Z

I have a slight preference for keeping the class name in the output, to differentiate true dict members from DataContainer subgroups or their subclasses (e.g. Group from Sphinx). Omitting everything below a certain depth is OK with me.
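
One way to picture the requested output is a small hand-written recursive formatter (not pprint, and not what the PR implements) that truncates by depth but still prefixes dict subclasses with their class name. `Box` is a hypothetical stand-in for a DataContainer-like group:

```python
class Box(dict):
    """Hypothetical dict subclass standing in for a container group."""


def short_repr(obj, depth=2, _level=0):
    """Depth-limited repr that keeps class names of dict subclasses."""
    if not isinstance(obj, dict):
        return repr(obj)
    if _level >= depth:
        body = "{...}"  # truncate, but the class name is still added below
    else:
        items = ", ".join(
            f"{k!r}: {short_repr(v, depth, _level + 1)}" for k, v in obj.items()
        )
        body = "{" + items + "}"
    # Plain dicts print bare; subclasses are wrapped in their class name.
    return body if type(obj) is dict else f"{type(obj).__name__}({body})"


dc = Box({"a": 1, "b": 2, "c": Box({"A": 1}), "d": {"plain": Box({"x": 1})}})
text = short_repr(dc)
print(text)
# Box({'a': 1, 'b': 2, 'c': Box({'A': 1}), 'd': {'plain': Box({...})}})
```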

niklassiemer · 2023-03-22T17:40:20Z

I agree. However, pprint does not work out of the box on the DataContainer with depth truncation. Therefore, I use to_builtin under the hood, which of course converts everything to plain dicts... I'll take another shot at it. Maybe I can find a way to tell pprint to use the DataContainer directly.

niklassiemer · 2023-03-22T17:50:20Z

pyiron_base/storage/datacontainer.py

-        name = self.__class__.__name__
-        plain = f"{name}({json.dumps(self.to_builtin(stringify=True), indent=2, default=str)})"
-        return "<pre>" + plain + "</pre>"
+        return self.to_builtin(stringify=True, load_stubs=False)


This also raises the question: do we really want to skip loading the stubs in this case? The nice JupyterLab interface would then not show the full container from the start, only the stubs. To get the full view, one would need to call to_builtin beforehand.

pmrv · 2023-03-22T19:16:02Z

I found this and this, which suggest subclassing PrettyPrinter. Your call whether that's worth it. :')

Not forcing stubs is preferred, I think, since loading them might incur significant I/O.

pmrv · 2023-03-22T19:18:56Z

See also the second answer from the SO link and this POTW. Those manage without subclassing by writing __repr__ in a certain way, but I haven't looked at the details.

pmrv · 2023-03-22T19:20:12Z

Last link I promise..

stale · 2023-05-21T11:11:17Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

niklassiemer added 2 commits March 16, 2023 16:21

Use pprint to represent DataContainers, not forcing HDFStubs

483d904

to_builtin got a new argument load_stubs=True such that loading HDFStubs can be avoided.

Allow non-str keys in __repr__

7fa6a95

niklassiemer requested review from pmrv and jan-janssen March 17, 2023 07:46

pmrv reviewed Mar 22, 2023

niklassiemer commented Mar 22, 2023

stale bot added the stale label May 21, 2023

stale bot closed this Jun 10, 2023

samwaseda deleted the pp_dc branch August 15, 2024 06:44
