Skip to content

Format: add pp_print_substring and pp_print_substring_as#12133

Merged
Octachron merged 4 commits into
ocaml:trunkfrom
Octachron:format_substring
Feb 7, 2024
Merged

Format: add pp_print_substring and pp_print_substring_as#12133
Octachron merged 4 commits into
ocaml:trunkfrom
Octachron:format_substring

Conversation

@Octachron
Copy link
Copy Markdown
Member

While looking at Format implementation, I was intrigued by the discrepancy that Format requires that its low-level printing devices are able to output slice of strings, but Format never exposes this feature to end users.

This PR proposes to fix this asymmetry by adding two functions: pp_print_substring and pp_print_substring_as to the Format module. Along the way, it updates pp_print_text to use the new function pp_print_substring function.

Note that this PR is mostly a request for comments, I am not sure if the functions are that useful but the implementation was very straightforward.

@c-cube
Copy link
Copy Markdown
Contributor

c-cube commented Mar 23, 2023

This is one of these little paper cuts in the stdlib. I wish this had been added in 3.12 when I started using OCaml in earnest :).

Copy link
Copy Markdown
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is reasonable to expose this capability, and the implementation looks fine. Approved.

(If we wanted to implement Jules' idea of precision for %s format specifiers, we would also need something like that for efficient support in Format. The other thing we would need is a new case Acc_data_substring in the type camlinternalFormat.acc.

@Octachron
Copy link
Copy Markdown
Member Author

We might want at least a second opinion on the labels for the function (that were picked because they sounded good enough).

@dbuenzli
Copy link
Copy Markdown
Contributor

dbuenzli commented Apr 4, 2023

Personally I tend more to use first/last/len for specifyng subranges and start/last for specifying processes (e.g. find) operating in a subrange (no stop, it's ambiguous).

But if you want to be consistent with StringLabels these start should rather be pos.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 6, 2023

If I understand correctly, the recommendation is to use either ~first ~last, or ~first ~len, or ~pos ~len. @Octachron, which do you prefer?

@Octachron
Copy link
Copy Markdown
Member Author

Keeping the information that the first integer is the first character of the substring range à la ~first ~len seems more important to me than being consistent with StringLabels in this case.

@gasche
Copy link
Copy Markdown
Member

gasche commented Apr 25, 2023

Fine with me. This PR still needs a second approval.

Comment thread stdlib/format.mli Outdated
Comment on lines +238 to +244
val pp_print_substring : first:int -> len:int -> formatter -> string -> unit
val print_substring : first:int -> len:int -> string -> unit
(** [pp_print_substring ~first ~len ppf s] prints the substring of [s]
that starts at index [start] and stop at index [start+len] in the current
pretty-printing box.
@since 5.1
*)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring does not match the labels: start <-> first.

Personally, I didn't understand from the discussion why we don't use pos for the first label as in StringLabels. It seems needlessly noisy to introduce yet another convention of how to refer to the starting offset of a substring.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that we should be beholden to the label choice of StringLabels outside of the labelled variant of the standard library and pos is an uninformative and ambiguous label name.

But if you prefer I could use it; after all Format API is already barely usable without Fmt due to its extremely verbose naming convention.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, but for me pos means "position" and I have no doubt that it means the position of the (start of the) substring, and len its len. So i am fine with either pos and first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that String attributes a particular meaning to the term position (versus index, see the String module preamble). If pos is used one should check the function works when it is equal to String.length s.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this issue, so I'm happy to let @Octachron decide. In any case, the docstring should be corrected.

Comment thread stdlib/format.mli
Comment on lines +261 to +264
(** [pp_print_substring_as ~first ~len ppf len_as s] prints the substring of [s]
that starts at index [start] and stop at index [start+len] in the current
pretty-printing box as if it were of length [len_as].
@since 5.1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

@Octachron
Copy link
Copy Markdown
Member Author

I have updated the documentation and decided to go with the more uniform ~pos label (after checking that yes print_substring s ~pos:(String.length s) ~len:0 works as intended).

@Octachron
Copy link
Copy Markdown
Member Author

@nojb , are you fine with the current state?

@nojb
Copy link
Copy Markdown
Contributor

nojb commented Feb 2, 2024

@nojb , are you fine with the current state?

Yes!

The @since tags need to be updated, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants