Improve docs on some char boolean methods #61794
This is an attempt to improve some of the documentation on
I believe the uniformity and wording are improved, but perhaps not everyone agrees. You will find, for example, that I replace “is an alphabetic code point” with “has the
I link to the latest Unicode documentation since that seems easiest to maintain. However, it is not necessarily the standard implemented. Right now,
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @dtolnay (or someone else) soon.
If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.
Please see the contribution instructions for more information.
dtolnay left a comment
Thanks! I agree that documenting char::is_lowercase as "Returns true if this char is lowercase" is silly.
I think this change is mostly fine but I would like to try to improve clarity of the all-important first sentence on each method. First sentences are supposed to stand alone to the extent that if you only read the first sentence of each method you still get a reasonable understanding of the point of the method. Documentation writers commonly get this wrong by having the first sentence summarize the rest of the documentation of the method; if the summary can only make sense after reading and understanding the rest of the documentation then the first sentence fails to stand alone.
My feeling is that many of these new first sentences only make sense if the reader already understands the point of the method.
For example you have:
This sentence means nothing until after the reader learns what "White_Space property" refers to by reading the rest of the documentation.
As I see it there are 2 key things that a first sentence would need to communicate to characterize this method:
I think the sentence as written isn't effective at communicating either of these things. We can try to brainstorm some alternatives; I feel that something more in this direction would work better:
Another dimension to keep in mind about first sentences is that they ideally communicate the point of the method to the intended target audience for the method [1], in terms that make sense for that audience. The set of people who will want to call is_whitespace() is different and has different background knowledge compared to the set of people who will need to call something like is_grapheme_extended(), so there will tend to be differences in the character of the documentation as a result.
[1] Occasionally we extend this to also include the set of people who would be most likely to be misled into thinking that they want to call a method they shouldn't, i.e. "this is probably not what you want"-style documentation, but this is uncommon.
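To make the point about is_whitespace() concrete, here is a minimal sketch of the behavior a first sentence has to convey. It is only an illustration, not text proposed for the docs; the helper `whitespace_flags` is a hypothetical name introduced here and is not part of the PR or of the standard library.

```rust
// Hypothetical helper (not part of the PR or of std): reports, for each
// character of `s`, whether `char::is_whitespace` returns true.
fn whitespace_flags(s: &str) -> Vec<bool> {
    s.chars().map(char::is_whitespace).collect()
}

fn main() {
    // ASCII space and newline are classified as whitespace.
    assert!(' '.is_whitespace());
    assert!('\n'.is_whitespace());

    // So are non-ASCII characters that carry the Unicode White_Space
    // property, such as U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE.
    assert!('\u{A0}'.is_whitespace());
    assert!('\u{3000}'.is_whitespace());

    // U+200B ZERO WIDTH SPACE does not have the White_Space property,
    // despite its name, so it is not classified as whitespace.
    assert!(!'\u{200B}'.is_whitespace());

    // The same pattern applies to the other boolean methods discussed,
    // for example is_lowercase on non-ASCII letters.
    assert!('δ'.is_lowercase());
    assert!(!'中'.is_lowercase());

    println!("{:?}", whitespace_flags("a b\u{A0}c"));
}
```

The U+200B case shows why the wording matters: a reader who only knows the informal notion of "whitespace" cannot predict the result without knowing that the classification comes from a Unicode property.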
@dtolnay Thank you for the enthusiastic response. I'm more than happy to brainstorm alternative approaches.
I generally agree with your summary of the goals of the first sentence.
I concede that I may have erred more on the side of brevity than necessary. I left out “Unicode” since I think it's understood that
Considering your specific example:
I like the usage of “classified” better than the existing “satisfies” or my “has.” I would amend it slightly to:
I think it's better to avoid the possessive form when it's not necessary, and “the Unicode ... property” helps to imply that there's only one and that it's standardized.
Another option would be to re-emphasize the code point aspect:
I feel that
Also, you left out “this
dtolnay left a comment
Sounds good, “the Unicode ... property” works for me. "Code point" instead of "character" is fine too, will leave that up to you. I would lean toward "character" because the type's documentation introduces it as "The
I think Unicode is important to mention by name in some form. I would not expect readers to deduce that "the