Add indent parameter to edn_format.dumps() #70

thiagokokada · 2019-12-15T03:23:09Z

This adds a pretty printer similar to json.dumps(indent=<int>). However, it does not follow Clojure formatting guidelines; instead, it formats the output in a way that is more familiar to users of other languages such as Python.

So it will convert this:

{"a": 1, "b": (1, 2, 3), "c": {"d": [1, 2, 3]}, "e": {1, 2, 3}}

To this:

Instead of this:

{:a 1,
 :b (1
     2
     3),
 :c {:d [1
         2
         3]},
 :e #{1
      2
      3}}

This should already be better than the status quo (that is, no pretty printer at all).

Should fix issue #39 (unless the author of the issue wanted a more Clojure-like pretty printer).

This is an alternative implementation of PR #64. It fixes all the issues found by @bfontaine, and the approach is also simpler. However, unlike the older approach, this one also changes the non-indent code path, so it may be slower (though I think the difference will be insignificant).
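For readers less familiar with the json.dumps(indent=<int>) behaviour that the description uses as its model, here is a minimal sketch using only the standard library. It shows the JSON side of the comparison only; the data literal is made up for illustration, and the exact text produced by edn_format.dumps with an indent is not reproduced here.

```python
import json

# Illustrative data only; not taken from the PR's test suite.
data = {"a": 1, "b": [1, 2, 3], "c": {"d": [1, 2, 3]}}

# Without indent, everything is emitted on a single line.
compact = json.dumps(data)

# With indent=2, each nesting level is placed on its own line
# and indented by two more spaces than its parent.
pretty = json.dumps(data, indent=2)

print(compact)
print(pretty)
```

Going by the PR title, the analogous call would be edn_format.dumps(data, indent=2), with EDN delimiters in place of the JSON ones.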

thiagokokada · 2019-12-15T03:46:42Z

Using the script from @bfontaine from PR #64 (just one run because I am lazy):

This branch without indent:

1203203 function calls (1024403 primitive calls) in 0.525 seconds

This branch with indent:

1203203 function calls (1024403 primitive calls) in 0.524 seconds

Master:

1119203 function calls (1005203 primitive calls) in 0.456 seconds

So yes, there is a slight overhead, but it does not seem that bad. WDYT, @swaroopch?
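The figures above have the shape of a cProfile summary line. The benchmark script from PR #64 is not shown in this thread, so the following is only a sketch of how such a line could be obtained, assuming cProfile was the tool. The workload function is a stand-in that calls json.dumps; a real measurement would call edn_format.dumps on the benchmark data instead.

```python
import cProfile
import io
import json
import pstats


def workload():
    # Stand-in workload (an assumption): the real benchmark would
    # serialize EDN data with edn_format.dumps instead.
    data = {"a": 1, "b": [1, 2, 3], "c": {"d": [1, 2, 3]}}
    for _ in range(1000):
        json.dumps(data, indent=2)


def profile_summary(func):
    """Run func under cProfile and return the textual report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


report = profile_summary(workload)
# The first non-empty line is the "N function calls ... in X seconds" summary.
print(report.strip().splitlines()[0])
```

A single run like this is noisy, so differences of a few hundredths of a second are best read as indicative only.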

bfontaine · 2019-12-15T21:22:43Z

Awesome! 🙌

FYI I generated 100,000 random EDN structures which I dumped with indent=2 then re-loaded to check nothing was lost in the roundtrip, and I haven’t had any issue like I had with the previous implementation. 👌
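The script used for this check is not included in the thread. A round-trip test of that kind could be sketched as below. The generator, the function names, and the use of json as the serializer are all assumptions made so the example runs with the standard library alone.

```python
import json
import random


def random_value(rng, depth=0):
    """Build a random nested structure of scalars, lists and dicts."""
    if depth >= 3 or rng.random() < 0.4:
        return rng.choice([rng.randint(-100, 100), "s%d" % rng.randint(0, 9), None])
    if rng.random() < 0.5:
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {"k%d" % i: random_value(rng, depth + 1) for i in range(rng.randint(0, 4))}


def count_roundtrip_failures(dumps, loads, runs, seed=0):
    """Dump and reload `runs` random values; count those that change."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        value = random_value(rng)
        if loads(dumps(value)) != value:
            failures += 1
    return failures


# json stands in for the serializer under test (an assumption).
failures = count_roundtrip_failures(
    lambda value: json.dumps(value, indent=2), json.loads, runs=1000
)
print(failures)
```

Passing edn_format's dumps and loads instead would also require generating values that compare equal after loading, which this sketch does not verify.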

swaroopch · 2019-12-17T00:19:00Z

> So yeah, a slight overhead, however it does not seem that bad. WDYT @swaroopch ?

@thiagokokada Will follow up this week.

swaroopch · 2019-12-19T22:01:08Z

@thiagokokada This is a really nice implementation! 👍 for writing the tests.

1. Can we please add docstrings to the functions indent_lines, udump, and dump that describe the parameters? For example, I had to read the PR twice to understand the difference between indent and indent_step :-)
2. Out of curiosity, do you think indent_lines would be faster using https://docs.python.org/3/library/io.html#io.StringIO instead of string concatenation?

Thank you!

thiagokokada · 2019-12-19T22:04:38Z

1. Sure, will do 👍
2. No, I don't think so. Building a list of strings and joining them should be really efficient, probably even more so than StringIO (the older implementation used string concatenation, which, yeah, is kind of slow): https://stackoverflow.com/q/4733693/2751730
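To make the three strategies under discussion concrete, here is a small sketch of each. These functions are hypothetical illustrations, not the code from the PR: their names and signatures are assumptions, and they only show that the approaches build the same string in different ways.

```python
import io


def indent_join(lines, prefix):
    """Collect the indented lines in a list and join them once."""
    return "\n".join([prefix + line for line in lines])


def indent_stringio(lines, prefix):
    """Write the indented lines into an in-memory text buffer."""
    buffer = io.StringIO()
    for i, line in enumerate(lines):
        if i:
            buffer.write("\n")
        buffer.write(prefix)
        buffer.write(line)
    return buffer.getvalue()


def indent_concat(lines, prefix):
    """Grow the result through repeated string concatenation."""
    result = ""
    for i, line in enumerate(lines):
        if i:
            result += "\n"
        result += prefix + line
    return result


lines = [":a 1", ":b 2", ":c 3"]
results = [f(lines, "  ") for f in (indent_join, indent_stringio, indent_concat)]
print(results[0])
```

Since all three return identical text, the choice comes down to speed and readability, and timeit.timeit from the standard library is one way to compare them on representative input.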

swaroopch · 2019-12-19T22:07:38Z

@thiagokokada

1. Thanks!
2. Got it, thanks for looking into it :-)

bfontaine · 2019-12-19T22:08:15Z

One minor optimization that may help would be to store indent_step * ' ' in a variable instead of re-computing it for every line. It might be worth trying with StringIO/cStringIO just to check.
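The suggested optimization can be sketched as follows. Both functions are hypothetical and their signatures are assumed; only the name indent_step comes from the thread. The point is simply that the prefix string is built once rather than once per line.

```python
def indent_lines_recomputed(lines, indent_step):
    """Build the prefix again for every line."""
    return [indent_step * " " + line for line in lines]


def indent_lines_hoisted(lines, indent_step):
    """Build the prefix once and reuse it for every line."""
    prefix = indent_step * " "
    return [prefix + line for line in lines]


sample = [":a 1", ":b 2"]
recomputed = indent_lines_recomputed(sample, 2)
hoisted = indent_lines_hoisted(sample, 2)
print(hoisted)
```

The two versions produce the same list, so any gain is purely in avoided work inside the loop.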

thiagokokada · 2019-12-20T02:06:09Z

I applied the small optimization from @bfontaine anyway and added the docstrings requested by @swaroopch.

Now, about StringIO: I am not going to run exhaustive tests, but this is what I got with it:

1203203 function calls (1024403 primitive calls) in 0.510 seconds

There really doesn't seem to be much difference. Actually, even with string concatenation (which should be slower) there isn't much difference in performance, at least on the benchmark from @bfontaine.

I think the current code is also more idiomatic Python and avoids an import, so I prefer it as it currently is. WDYT?

swaroopch · 2019-12-20T19:50:25Z

Thank you @thiagokokada !

thiagokokada mentioned this pull request Dec 15, 2019

Add indent parameter in edn_format.dumps() #64

Closed

thiagokokada added 2 commits December 15, 2019 00:26

Add indent parameter to edn_format.dumps()

8088708

Make seq() return an array instead of formatted string

1b4f7b0

thiagokokada added 2 commits December 19, 2019 22:55

Add docstrings to edn_dump.udump() and edn_dump.indent_lines()

eb31712

Some small optimizations to edn_dump.indent_lines()

0d2a2f6

Fix comments in edn_dumps.indent_lines()

bf3d6e8

swaroopch approved these changes Dec 20, 2019

swaroopch merged commit 7d98bcb into swaroopch:master Dec 20, 2019

swaroopch mentioned this pull request Dec 20, 2019

Feature request: Pretty printing #39

Closed

thiagokokada deleted the add-indent-for-dumps-reload branch December 22, 2019 01:00
