Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pretty print json for human readable files #2706

Merged
merged 3 commits into from
Jan 19, 2022
Merged

Conversation

berland
Copy link
Contributor

@berland berland commented Jan 12, 2022

Issue
Resolves #318 once more..

Approach
Add indent=4 to occurences of json.dump() whenever it writes to a file that a human might read.

There are some occurences of indent=1 already in the code base, not sure if consistency with that is important, or if we should try to be consistent with how black would format the same dictionaries in source code

Changed existing indent=1 into indent=4.

Considered to add sort_keys=True also, but skipped it, might confuse.

Occurences of json.dump() in tests are not touched.

Pre review checklist

  • Added appropriate labels

Adding labels helps the maintainers when writing release notes, see sections and the
corresponding labels here: https://github.com/equinor/ert/blob/main/.github/release.yml

@markusdregi
Copy link
Contributor

There are some occurences of indent=1 already in the code base, not sure if consistency with that
is important

I would say that is not important and that you should feel free to bump those to 4 as well in this PR...

@@ -198,4 +198,4 @@ def _dump_ok_file(self):

def _dump_status_json(self):
with open(self.STATUS_json, "w") as fp:
json.dump(self.status_dict, fp, indent=1)
json.dump(self.status_dict, fp, indent=4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a home made JSON parser somewhere. Just make sure that this file either is not read by that parser, or if it is, that the parser can deal with this indentation. 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping for the test to detect that 😁

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not even sure the C-layer reads this file anymore... 🤔 It definitively checks whether it exists (and removes it upon starting a forward model) to deduce running state, but I'm not sure it actually loads the file. It appears that this is all done in the Python layer... If so, there is no worries related to parsing it :)

@codecov-commenter
Copy link

codecov-commenter commented Jan 13, 2022

Codecov Report

Merging #2706 (837eb66) into main (19ba42c) will decrease coverage by 0.09%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2706      +/-   ##
==========================================
- Coverage   65.05%   64.95%   -0.10%     
==========================================
  Files         651      651              
  Lines       53907    53657     -250     
  Branches     4610     4732     +122     
==========================================
- Hits        35069    34854     -215     
+ Misses      17358    17323      -35     
  Partials     1480     1480              
Impacted Files Coverage Δ
ert/serialization/_serializer.py 90.90% <100.00%> (ø)
ert3/workspace/_workspace.py 89.88% <100.00%> (ø)
ert_shared/services/_base_service.py 93.20% <100.00%> (ø)
libres/lib/enkf/block_fs_driver.cpp 81.35% <0.00%> (-0.57%) ⬇️
res/enkf/config_keys.py 100.00% <0.00%> (ø)
libres/lib/python/init.cpp 0.00% <0.00%> (ø)
libres/lib/enkf/config_keys.cpp 0.00% <0.00%> (ø)
libres/lib/res_util/block_fs.cpp 53.38% <0.00%> (+0.14%) ⬆️
libres/lib/enkf/time_map.cpp 54.00% <0.00%> (+0.91%) ⬆️
res/enkf/enkf_fs_manager.py 94.00% <0.00%> (+1.00%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 19ba42c...837eb66. Read the comment docs.

@berland berland enabled auto-merge (rebase) January 13, 2022 11:54
@@ -31,6 +31,8 @@ async def decode_from_path(

class _json_serializer(Serializer):
def encode(self, obj: Any, *args: Any, **kwargs: Any) -> str:
if "indent" not in kwargs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move this into the function declaration as an optional argument and then afterwards passed explicitly to dumps. It should yield the same behaviour and maybe look a bit less wacky :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that worked! Looks better now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, pylint complained:

************* Module ert.serialization._serializer
ert/serialization/_serializer.py:33:4: W0221: Number of parameters was 4 in 'Serializer.encode' and is now 5 in overridden '_json_serializer.encode' method (arguments-differ)

So either go wacky, add pyllint exception, or add indent as a argument to all encode() implementations.

)

# Most compact json serialization:
assert json_serializer.encode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative method here would be to just drop all whitespaces both from obj_json and the result of json_serializer. It might be easier to read in the future and it removes the need to ever toggle indent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I got your question? There are multiple ways to reach a compact json serialization, one way is through the options to encode(). This test is basically testing that separators is passed on while we are overriding indent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply. I retract my point :)

@jondequinor
Copy link
Contributor

jondequinor commented Jan 14, 2022

An argument against this, is that raw JSON will use more space (a lot of and \n). Some of these indented JSON files will surely be large, and they will touch "hot" cloud storage, which is expensive.

Another argument against, it that transport will marginally be slowed down due to more stuff to compress.

Both of these arguments are also against the use of JSON at all, but that's beyond the scope of this PR.

Wouldn't cat some.json|jsonpp take care of the human readable part of this PR?

@berland
Copy link
Contributor Author

berland commented Jan 14, 2022

That is correct. jq . < somefile.json | less works on-prem, but is not as user-friendly as less somefile.json. The extra whitespace shuffled around might be acceptable until ert3 is mature enough for few users needing to debug in the terminal.

Avoiding whitespace for speed reasons is possibly premature optimization.

@jondequinor
Copy link
Contributor

jondequinor commented Jan 14, 2022

That is correct. jq . < somefile.json | less works on-prem, but is not as user-friendly as less somefile.json. The extra whitespace shuffled around might be acceptable until ert3 is mature enough for few users needing to debug in the terminal.

Fair enough.

@berland
Copy link
Contributor Author

berland commented Jan 14, 2022

Setting indent=1 as default may be a compromise though. indent=4 does not improve readability by much compared to 1.

@jondequinor
Copy link
Contributor

No, the debugging argument is good, and 4 spaces is the sweet spot.

Copy link
Contributor

@markusdregi markusdregi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are good arguments both ways, but this is also easy to change at any point based on new data. I suggest we merge this as is now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prettyprint status.json (whitespace)
4 participants