Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format.pp_print_custom_break, a more general break hint #2002

Merged
merged 2 commits into from Dec 8, 2018

Conversation

Projects
None yet
7 participants
@keleshev
Copy link
Contributor

commented Aug 23, 2018

Format.pp_print_custom_break is a more general break hint that can emit non-whitespace characters, for example, an additional trailing semicolon when a list of items is printed vertically.

Motivation

Sometimes when formatting code or data structures you might want to print different non-whitespace characters depending on the layout being horizontal or vertical. For example, you might want to add a trailing semicolon when printing an OCaml list vertically:

(* Horizontal *)
[foo; baz; bar]

(* Vertical *)
[
  foo;
  bar;
  baz; (* Note the additional semicolon *)
]

Another example use-case is when a separator in a language is optional. For example, in many languages semicolons are optional for separating statements if each statement is located on a separate line. So you might want to remove semicolons when printing a bunch of statements vertically:

# Horizontal
{foo; baz; bar}

# Vertical
{
  foo
  baz
  bar
}

Both examples can be implemented using the new Format.pp_print_custom_break. See the added test file for the implementations.

Implementation notes

The new break hint is implemented by introducing a new pretty-printing token Pp_custom_break. It is handled similarly to the existing Pp_break, with some modifications. If desirable, the Pp_break can be removed and pp_print_break can be re-implemented in terms of Pp_custom_break.

@keleshev keleshev force-pushed the keleshev:custom-break branch from b7a08af to 600b8c4 Aug 23, 2018

@keleshev keleshev closed this Aug 23, 2018

@keleshev keleshev reopened this Aug 23, 2018

@jberdine
Copy link

left a comment

The code looks good to me, though I have not tested it much. Best to refactor to remove repetition if the change is approved.

As a vote for utility, note that I made a similar addition here using a similar implementation, which is used extensively in ocamlformat.

Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml
@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

I think the addition is useful and the API looks good. I'm not completely convinced by the labels, but that's a minor point. I agree with @jberdine's remarks: you should definitely express everything that can be in term of this new more general primitive.

@jberdine This is slightly offtopic, but is there somewhere a list of your local improvements for format in ocamlformat's fork ? Integrating practical improvements to Format in the mainline version would definitely interest me.

@jberdine

This comment has been minimized.

Copy link

commented Aug 23, 2018

(@Drup, there is #1229, but it is slightly behind ocamlformat_suppport.Format_ and is embarrassingly in need of rebasing.)

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

@jberdine Oh, I missed that one. It seems to me that the current version superseeds your PR, except for "fits_or_breaks". Is that correct ?

@jberdine

This comment has been minimized.

Copy link

commented Aug 23, 2018

I'm not sure that pp_print_string_if_newline and pp_print_or_newline are expressible in terms of this PR. They are more similar to pp_print_if_newline in that they test if the current line has been broken, rather than being a break hint themselves. I think pp_print_pre_break is subsumed by this PR, but have not tested (I have been surprised by corners of Format behavior before).

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 23, 2018

@jberdine I've added a test that shows that pp_print_custom_break covers the pp_print_string_if_newline use-case from #1229. I hope to make sure that this PR is useful for ocamlformat, and not something that you will need work around!

I'll re-implement Pp_break (and related code) in terms of Pp_custom_break. By that time Pp_custom_break will probably be simply called Pp_break.

@jberdine

This comment has been minimized.

Copy link

commented Aug 23, 2018

Note that the main point of the "test if the current line has been broken" behavior of e.g. pp_print_string_if_newline is to enable the flexibility of separating the break hint and the conditionally formatted string, in particular in cases where there is a close_box between them.

@pierreweis

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

@jberdine could you explain a bit more? Perhaps, an example?

Edit: and most importantly, would there be cases in ocamlformat where pp_print_string_if_newline would do the job and pp_print_cusom_break would not be enough?

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

However, the philosophy of the
library is to provide a high level API to the programmer and thus to provide
formatting annotations directly in the format string arguments of *printf high
level functions. Could you explain how do you plan to provide this new breaks
in format strings ?

Currently, there is no special syntax for this break hint. But it works well with %t (from the test):

        fprintf ppf "[@;<0 2>@[<%s>%a@]%t]" box_type
          (list ~pp_sep (format_array box_type)) items
          (pp_print_custom_break ~no_break:"" ~yes_break:(";", ""))

Another way would be to provide a special syntax, for example {|@;<"" ";" "">|}. I'm interested to hear opinions about that.

In the three current tests/examples, the new break is used as 1) last break, 2) separator, 3) first break. I don't see it as a problem.

I will try to write more tests to see how the new breaks works with compacting boxes.

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

In the last commit I added tests for compacting boxes. (I will clean up the history once we are happy with how the code looks. I think the "emergent" commit history is valuable during review, but not after it is merged.)

I am quite happy with how the custom break works with compacting boxes (see tests):

[
  Foo;
  Baz;
  Bar;
  Qux;
  Quux;
  Quuz;
  Corge;
  Grault;
  Garply;
]
[
  Foo; Baz; Bar; Qux;
  Quux; Quuz; Corge;
  Grault; Garply;
]
[Foo; Baz; Bar; Qux; Quux; Quuz; Corge; Grault; Garply]

{
  Foo
  Baz
  Bar
  Qux
  Quux
  Quuz
  Corge
  Grault
  Garply
}
{
  Foo; Baz; Bar; Qux
  Quux; Quuz; Corge
  Grault; Garply
}
{Foo; Baz; Bar; Qux; Quux; Quuz; Corge; Grault; Garply}
@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

Nice work with the tests!

Regarding the syntax, @;<"" ";" ""> is going to be a bit painful, due to string escaping (also, format syntax generally doesn't use "), but the underlying idea sounds good to me, we just need to find an appropriate separator. Until now, format uses spaces as separator, which is obviously not appropriate here.

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

I think that this custom break is so rare in practice, that I would not give it a special syntax.

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

Regarding implementing pp_print_break in terms of pp_print_custom_break. I have done this in a separate branch:

keleshev@cbdcb91

There are some issues with that. pp_print_break uses state.pp_out_spaces callaback for same-line offset. If the offset is generalized to any string, then state.pp_out_string should be used instead. However, if so, then state.pp_out_spaces becomes an unused callback for all formatters! This is one breaking change.

Similar with new-line offset. In pp_print_break it uses state.pp_out_indent for printing all of the indentation + offset. However, if implemented with pp_print_custom_break the offset must be printed with state.pp_out_string. That's another breaking change.

Both could be fixed by peeking into the offset to check if it consists only of spaces, but that feels like a dirty hack to me. This also can influence performance, if we need to iterate over each char in offset each time.

Given the above, I lean towards the current implementation, which does not re-implement pp_print_break in terms of pp_print_custom_break.

@Drup

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

That's an interesting question. While it's possible in theory to have pp_out_space n not being equivalent to pp_out_string (String.create n ' '), I'm not sure anyone is doing that in the wild. In this case, doing that particular breaking change would be fine (given an appropriate removal path which neds to be discussed). pp_out_indent is still there and it's clearly the most useful of the two. I agree that analyzing the string does not sound like a good idea (if people want to do that, they can do it themselves in pp_out_string).

I think we need a bit of a survey of the ecosystem to know if we can make that simplification.

@pierreweis Do you have any remarks on using pp_out_string instead of pp_out_space for (generalized) breaks ?

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

I think we need a bit of a survey of the ecosystem to know if we can make that simplification.

How is something like that usually done?

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

FYI, state.out_indent was added in #615 in 4.06.0

@pierreweis

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2018

@pierreweis this sounds very interesting! I will try out this design. Good point about the negative offset—that's yet another reason why the current proposal is not a generalization of the existing break hint.

@jberdine

This comment has been minimized.

Copy link

commented Aug 24, 2018

@jberdine

This comment has been minimized.

Copy link

commented Aug 24, 2018

@keleshev keleshev force-pushed the keleshev:custom-break branch from b384c99 to 1b09abe Aug 29, 2018

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 29, 2018

I have implemented @pierreweis's design and am ready for review.

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2018

@rbonichon

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2018

@keleshev You have indeed handled the printing aspects of @pierreweis 's suggestion.
My reading is that he also mentions generalizing the format string formatting command, like @;<%s n %s, %s m %s> which could be specified as the current break hints.

In a way this respects a kind of (implicit) symmetry where you can use direct formatting functions or formatted format strings for you pretty-printing specifications (for example: pp_print_cut and @,).

I do not see that your proposal handles that aspect --- and this may actually make your pull request more involved, touching GADT's and all.
Would you be keen on adding that as well ?

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2018

My 2 cents: format strings are good as long as format remain relatively simple, but I don't see a strong need for creating complex syntax for advanced concepts. They are better served (in term or code readability, tooling, etc) by normal functions.

@jberdine

This comment has been minimized.

Copy link

commented Aug 31, 2018

I agree with @alainfrisch. I think format strings are great for short or simple uses, but I have found that they become hard to read, and are less flexible, for more complicated uses. And for code that primarily uses format strings, there a %a specifier can be used for the added custom break hints. If adding format strings for this is not too involved, it would be good to do. But I would hate to see this PR get blocked on it.

(I plan to look at this PR in detail, but probably won't have time until early next week.)

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Aug 31, 2018

It is intentional that there's no special syntax for the new break. I think it is too rare to warrant a special syntax, and it works well enough with %t. Example:

        fprintf ppf "[@;<0 2>@[<%s>%a@]%t]" box_type
          (list ~pp_sep (format_array box_type)) items
          (pp_print_custom_break ~no_break:("", 0, "") ~yes_break:(";", 0, ""))
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved Changes Outdated
Show resolved Hide resolved stdlib/format.mli
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved testsuite/tests/lib-format/pp_print_custom_break.ml Outdated

@jberdine jberdine referenced this pull request Sep 5, 2018

Closed

Rebase ocaml/ocaml#1229 #310

@gasche

gasche approved these changes Sep 5, 2018

Copy link
Member

left a comment

In order to limit uncertainty about the work being poured into this PR, I'm approving it (but I have not reviewed it myself) on the basis of:

  • the choice of API suggested by @pierreweis (thanks!)
  • the thorough implementation review by @jberdine (thanks!)

This should not be merged immediately, in particular there remain comments by @jberdine to address -- then rebase the PR, check the CI, etc. @keleshev, please say explicitly when you think that the PR has reached a mergeable state (including a cleaned-up commit history). And thanks!

@gasche

This comment has been minimized.

Copy link
Member

commented Oct 12, 2018

Gentle ping. There are conflicts in the Changes entry now, so this would need a rebase. @keleshev, are you interested in integrating @jberdine's feedback?

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Oct 15, 2018

@gasche yes, I'm planning to address all the excellent @jberdine suggestions. I hope that I will have time to do this during November.

@keleshev keleshev force-pushed the keleshev:custom-break branch 2 times, most recently from 7ca8a30 to 0f57dee Nov 23, 2018

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Nov 26, 2018

@jberdine I've addressed your suggestions. Can you have another look?

@jberdine

This comment has been minimized.

Copy link

commented Nov 29, 2018

@jberdine
Copy link

left a comment

Looks good. I have only one suggestion, and a few very minor comments.

Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated
Show resolved Hide resolved stdlib/format.ml Outdated

@keleshev keleshev force-pushed the keleshev:custom-break branch from 0f57dee to cd03a17 Dec 7, 2018

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Dec 7, 2018

@jberdine how about now?

@jberdine

This comment has been minimized.

Copy link

commented Dec 8, 2018

Thanks! This looks good to me.

@keleshev

This comment has been minimized.

Copy link
Contributor Author

commented Dec 8, 2018

@gasche this is ready to be merged, I think.

@gasche gasche merged commit 46a828c into ocaml:trunk Dec 8, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@gasche

This comment has been minimized.

Copy link
Member

commented Dec 8, 2018

Merged! Thanks everyone for your work.

@keleshev keleshev deleted the keleshev:custom-break branch Dec 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.