Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indent parameter to edn_format.dumps() #70

Merged
merged 5 commits into from
Dec 20, 2019
Merged

Add indent parameter to edn_format.dumps() #70

merged 5 commits into from
Dec 20, 2019

Conversation

thiagokokada
Copy link
Contributor

@thiagokokada thiagokokada commented Dec 15, 2019

This adds a pretty printer similar to json.dumps(indent=<int>). However, it does not follow Clojure formatting guidelines, instead formatting in a way more common to users from other languages like Python.

So it will convert this:

{"a": 1, "b": (1, 2, 3), "c": {"d": [1, 2, 3]}, "e": {1, 2, 3}}

To this:

{
  :a 1
  :b (
    1
    2
    3
  )
  :c {
    :d [
      1
      2
      3
    ]
  }
  :e #{
    1
    2
    3
  }
}

Instead of this:

{:a 1,
 :b (1
     2
     3),
 :c {:d [1
         2
         3]},
 :e #{1
      2
      3}}

This should be already better than the current status quo (that is, no pretty printer at all).

Should fix issue #39 (unless the author of the issue wanted a more Clojure-like pretty printer).

Alternative implementation of PR #64. It fixes all issues found by @bfontaine, and also this approach is simpler. However, different of the older approach this one also brings change in the non-indent flow, and it may be slower (however I think the difference will be insignificant).

@thiagokokada
Copy link
Contributor Author

Using the script from @bfontaine from PR #64 (just one run because I am lazy):

This branch without indent:

1203203 function calls (1024403 primitive calls) in 0.525 seconds

This branch with indent:

1203203 function calls (1024403 primitive calls) in 0.524 seconds

Master:

1119203 function calls (1005203 primitive calls) in 0.456 seconds

So yeah, a slightly overhead, however it does not seem that bad. WDYT @swaroopch ?

@bfontaine
Copy link
Collaborator

Awesome! 🙌

FYI I generated 100,000 random EDN structures which I dumped with indent=2 then re-loaded to check nothing was lost in the roundtrip, and I haven’t had any issue like I had with the previous implementation. 👌

@swaroopch
Copy link
Owner

So yeah, a slightly overhead, however it does not seem that bad. WDYT @swaroopch ?

@thiagokokada Will follow up this week.

@swaroopch
Copy link
Owner

swaroopch commented Dec 19, 2019

@thiagokokada This is a really nice implementation! 👍 for writing the tests.

  1. Can we please add docstrings to the functions indent_lines, udump, dump that describes the parameters? For example, I had to read the PR twice to understand what the difference between indent and indent_step :-)

  2. Out of curiosity, do you think indent_lines would be faster by using https://docs.python.org/3/library/io.html#io.StringIO vs. string concatenation?

Thank you!

@thiagokokada
Copy link
Contributor Author

  1. Sure, will do 👍
  2. No, I don't think so. Creating a array of strings and joining them should be really efficient, probably even more than StringIO (the older implementation used string concatenation, that yeah, it is kind slow): https://stackoverflow.com/q/4733693/2751730

@swaroopch
Copy link
Owner

@thiagokokada

  1. Thanks!
  2. Got it, thanks for looking into it :-)

@bfontaine
Copy link
Collaborator

One minor optimization that may help would be to store indent_step * ' ' in a variable instead of re-computing it for every line. It might be worth trying with StringIO/cStringIO just to check.

@thiagokokada
Copy link
Contributor Author

thiagokokada commented Dec 20, 2019

I applied the small optimization from @bfontaine anyway and add the docstrings asked by @swaroopch.

Now about the StringIO. I am not going to run exhaustive tests, however this is what I got with StringIO:

1203203 function calls (1024403 primitive calls) in 0.510 seconds

There really doesn't seem to have much difference. Actually even using string concat (that should be slower) there isn't much difference in performance, at least using the benchmark from @bfontaine.

I think the current code is more idiomatic Python too and it also avoids an import, so I prefer it as current it is. WDYT?

@swaroopch swaroopch merged commit 7d98bcb into swaroopch:master Dec 20, 2019
@swaroopch
Copy link
Owner

Thank you @thiagokokada !

@thiagokokada thiagokokada deleted the add-indent-for-dumps-reload branch December 22, 2019 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants