Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<ctype.h> functions for AsciiExt #39658

Closed
zackw opened this issue Feb 8, 2017 · 30 comments
Closed

<ctype.h> functions for AsciiExt #39658

zackw opened this issue Feb 8, 2017 · 30 comments
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@zackw
Copy link
Contributor

zackw commented Feb 8, 2017

As discussed at https://internals.rust-lang.org/t/improvements-to-asciiext/4689 I have added is_ascii_* equivalents of most (not quite all) of C's ctype.h's isxxx functions to the AsciiExt trait.

The new functions are

  • is_ascii_alphabetic
  • is_ascii_uppercase
  • is_ascii_lowercase
  • is_ascii_alphanumeric
  • is_ascii_digit
  • is_ascii_hexdigit
  • is_ascii_punctuation
  • is_ascii_graphic
  • is_ascii_whitespace (matches the Infra Standard definition of ASCII whitespace, not POSIX)
  • is_ascii_control

They are implemented for char, u8, str, and [u8], and, for backward compatibility with external trait implementations, have default implementations that call unimplemented!().

PR to follow, but I needed to file this bug in order to get an issue number to put in the stability annotations.

zackw added a commit to zackw/rust that referenced this issue Feb 8, 2017
 * `is_ascii_alphabetic`
 * `is_ascii_uppercase`
 * `is_ascii_lowercase`
 * `is_ascii_alphanumeric`
 * `is_ascii_digit`
 * `is_ascii_hexdigit`
 * `is_ascii_punctuation`
 * `is_ascii_graphic`
 * `is_ascii_whitespace`
 * `is_ascii_control`

This addresses issue rust-lang#39658.
@frewsxcv frewsxcv added the A-libs label Feb 9, 2017
@alexcrichton alexcrichton added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Feb 13, 2017
frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 14, 2017
Add equivalents of C's <ctype.h> functions to AsciiExt.

 * `is_ascii_alphabetic`
 * `is_ascii_uppercase`
 * `is_ascii_lowercase`
 * `is_ascii_alphanumeric`
 * `is_ascii_digit`
 * `is_ascii_hexdigit`
 * `is_ascii_punctuation`
 * `is_ascii_graphic`
 * `is_ascii_whitespace`
 * `is_ascii_control`

This addresses issue rust-lang#39658.

Lightly tested on x86-64-linux.  tidy complains about the URLs in the documentation making lines too long, I don't know what to do about that.
frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 14, 2017
Add equivalents of C's <ctype.h> functions to AsciiExt.

 * `is_ascii_alphabetic`
 * `is_ascii_uppercase`
 * `is_ascii_lowercase`
 * `is_ascii_alphanumeric`
 * `is_ascii_digit`
 * `is_ascii_hexdigit`
 * `is_ascii_punctuation`
 * `is_ascii_graphic`
 * `is_ascii_whitespace`
 * `is_ascii_control`

This addresses issue rust-lang#39658.

Lightly tested on x86-64-linux.  tidy complains about the URLs in the documentation making lines too long, I don't know what to do about that.
frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 14, 2017
Add equivalents of C's <ctype.h> functions to AsciiExt.

 * `is_ascii_alphabetic`
 * `is_ascii_uppercase`
 * `is_ascii_lowercase`
 * `is_ascii_alphanumeric`
 * `is_ascii_digit`
 * `is_ascii_hexdigit`
 * `is_ascii_punctuation`
 * `is_ascii_graphic`
 * `is_ascii_whitespace`
 * `is_ascii_control`

This addresses issue rust-lang#39658.

Lightly tested on x86-64-linux.  tidy complains about the URLs in the documentation making lines too long, I don't know what to do about that.
frewsxcv added a commit to frewsxcv/rust that referenced this issue Feb 14, 2017
Add equivalents of C's <ctype.h> functions to AsciiExt.

 * `is_ascii_alphabetic`
 * `is_ascii_uppercase`
 * `is_ascii_lowercase`
 * `is_ascii_alphanumeric`
 * `is_ascii_digit`
 * `is_ascii_hexdigit`
 * `is_ascii_punctuation`
 * `is_ascii_graphic`
 * `is_ascii_whitespace`
 * `is_ascii_control`

This addresses issue rust-lang#39658.

Lightly tested on x86-64-linux.  tidy complains about the URLs in the documentation making lines too long, I don't know what to do about that.
@SimonSapin
Copy link
Contributor

Stabilize?

I just wrote if let b'A' ... b'Z' = b {, it’s not too bad but might have looked nicer as if b.is_ascii_uppercase() {

@zackw
Copy link
Contributor Author

zackw commented May 1, 2017

There was a request over in #39659 to deprecate AsciiExt and convert all the methods to intrinsics on str, [u8], etc. but I never got around to it and don't have time in the near future either. I dunno how a plan to do that would affect the stabilization schedule.

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jul 22, 2017
@LukasKalbertodt
Copy link
Member

What's the status here? Could I try to "deprecate AsciiExt and convert all the methods to intrinsics on str, [u8]", create a PR and it might be accepted? Or are there other plans? If there is nothing except "lack of time" blocking this, I might be able to help.

@zackw
Copy link
Contributor Author

zackw commented Aug 14, 2017 via email

@LukasKalbertodt
Copy link
Member

Right now AsciiExt is implemented for &[u8] and u8. When copying all AsciiExt methods to the individual types, shall I also copy them for &[u8] and u8? I kinda doubt those implementations are really useful. And it also doesn't make sense not to implement it for u16 (e.g. an array of utf16 codepoints), too.

What do you think?

@SimonSapin
Copy link
Contributor

I’ve definitely used is_ascii, to_ascii_lowercase, and eq_ignore_ascii_case on [u8] and u8.

@SimonSapin
Copy link
Contributor

CC @alexcrichton / @rust-lang/libs for FCP to stabilize as inherent methods: #44042

@aturon
Copy link
Member

aturon commented Oct 3, 2017

@rfcbot fcp merge

@rfcbot
Copy link

rfcbot commented Oct 3, 2017

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

Concerns:

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Oct 3, 2017
@dtolnay
Copy link
Member

dtolnay commented Oct 3, 2017

The semantics of e.g. str::is_ascii_hexdigit as meaning every byte is a hex digit is a bit of a stretch. Now that these are inherent methods rather than a common extension trait across char and str, we should consider that we no longer need all the methods across all the types.

  • Is "A" an ASCII hex digit? Sure.
  • Is "AA" an ASCII hex digit? Not obvious to me.
  • Is "" an ASCII hex digit?

Would it be bad to require the user to be more explicit in some of these cases?

s.bytes().all(|b| b.is_ascii_hexdigit())

@rfcbot concern str and [u8] methods

@alexcrichton
Copy link
Member

I think I agree with @dtolnay, and given #44042 I think we should "soon" have the ability to implement that relatively easily?

@alexcrichton
Copy link
Member

Discussed at the libs triage meeting today, we're going to scope this issue to be blocked on #44042 and then we'll stabilize these methods only on the u8 and char types.

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Nov 29, 2017
@codeyash
Copy link

Any update here?

@SimonSapin
Copy link
Contributor

Rust 1.24 was recently released with these methods available as inherent methods of u8 and char. For example: https://doc.rust-lang.org/std/primitive.char.html#method.is_ascii_alphabetic

I think we can now deprecate the AsciiExt trait. (And, a release cycle or two after that, perhaps remove its unstable methods.) This issue should probably be re-opened to track this.

@nagisa nagisa reopened this Feb 22, 2018
@nagisa
Copy link
Member

nagisa commented Feb 22, 2018

Reopened as per request on IRC.

@zbraniecki
Copy link

Is it still the plan to get at least is_ascii_alphabetic, is_ascii_uppercase ,is_ascii_lowercase, is_ascii_digit and is_ascii_alphanumeric on str? It's a bit verbose to have to create an iterator over chars for each string.

@LukasKalbertodt
Copy link
Member

@zbraniecki We decided that we prefer the explicit version of s.chars().all(|c| c.is_foo()) (see comment). I'd doubt that those methods will be added to str and [u8] anytime soon...

@SimonSapin
Copy link
Contributor

Deprecation PR for the AsciiExt trait: #49109

SimonSapin added a commit to SimonSapin/rust that referenced this issue Mar 21, 2018
The trait and some of its methods are stable and will remain.
Some of the newer methods are unstable and can be removed later.

Fixes rust-lang#39658
kennytm added a commit to kennytm/rust that referenced this issue Mar 21, 2018
…excrichton

Deprecate the AsciiExt trait in favor of inherent methods

The trait and some of its methods are stable and will remain.
Some of the newer methods are unstable and can be removed later.

Fixes rust-lang#39658
kennytm added a commit to kennytm/rust that referenced this issue Mar 22, 2018
…excrichton

Deprecate the AsciiExt trait in favor of inherent methods

The trait and some of its methods are stable and will remain.
Some of the newer methods are unstable and can be removed later.

Fixes rust-lang#39658
@LukasKalbertodt
Copy link
Member

Shouldn't this issue also track "removing of unstable AsciiExt methods"? Otherwise we might forget to remove those. Or was there already a PR doing that which I missed?

@SimonSapin
Copy link
Contributor

I don’t know how useful it is to track such features individually. We can occasionally do a pass on the entire standard library looking for items that are both unstable and have been deprecated for a while.

@LukasKalbertodt
Copy link
Member

@SimonSapin Ah, I missed the fact that the unstable methods were also marked deprecated. That makes sense then!

@BartMassey
Copy link
Contributor

I'm disappointed that there is now no officially-supported version of is_ascii_lowercase() and friends for str. This was functionality I was using, and it's annoying to go back and write out fancy iterators everywhere or provide my own crate for that functionality. I understand moving this stuff out of AsciiExt, but could we please have the full functionality back?

@SimonSapin
Copy link
Contributor

@BartMassey IMO str::is_ascii_lowercase might have an unexpected behavior. On char, the same method tells whether the character is in the range a to z, whether it is an ASCII lower-case letter. The "simple" generalization of that to strings is: is this string made entirely of ASCII lower-case letters?

However, what one might be looking for, and not realize that str::is_ascii_lowercase is different is: if I were to call to_ascii_lowercase, would I obtain the same string? which really means does this string not contain any upper-case letter?

Forcing users to write s.chars().all(something something) makes it apparent what a call actually does.

(Maybe char::is_ascii_lowercase (respectively uppercase) should be named something like is_ascii_lowercase_alphabetic instead…)

@BartMassey
Copy link
Contributor

That is an interesting point. There are certainly a number of fine distinctions that can be drawn here: one might reasonably expect that str::is_ascii_lowercase() might return true if and only if the target contained only lowercase alphabetic ASCII characters, for example.

That said, I still feel like the AsciiExt crate already defined a behavior for this function, and that the right thing to do would be to preserve that functionality across the move, at least for now. It feels a bit iffy to me to remove existing functionality without going through the RFC process (as far as I can tell?) in the interest of protecting users from using functions wrong. Do we have evidence that people have routinely misunderstood or misused is_ascii_lowercase()? Its documentation seemed pretty clear to me.

There's an underlying problem here in that anything we do to make ASCII more of a first-class citizen is a step toward encouraging folks to routinely write ASCII-only programs. Nobody wants that.

Having said that, tutorials and examples almost always use and pretty much depend on ASCII: things like Exercism and the Advent Of Code series have caused me an occasional "any other language" kind of feeling because the exercises just assume quality ASCII support. Asking newer users to understand how to write the appropriate iterators to complete these exercises seems to me like it might have its own set of issues.

I don't know. I guess I just need to identify or write a crate that has good support for ASCII-only strings and chars and start using it in situations where that's what I'm working with. AsciiExt was never that, and so maybe removing it makes a hole that can be filled with something better. (A quick survey of the ascii crate looks promising, although I'd still have to add the string functions, looks like. It's been downloaded quite a lot, which might indicate a real demand for this kind of thing.)

For now, s.chars().all(|c| c.is_ascii() && !c.is_ascii_uppercase()) it is, I guess.

@SimonSapin
Copy link
Contributor

to remove existing functionality without going through the RFC process

For what it’s worth the addition of AsciiExt::to_ascii_lowercase never went through the RFC process in the first place.

For now, s.chars().all(|c| c.is_ascii() && !c.is_ascii_uppercase()) it is, I guess.

Interesting, because this behavior is neither <str as AsciiExt>::is_ascii_lowercase (that’s self.bytes().all(|b| b.is_ascii_lowercase())) nor what I had in mind for the alternative behavior (.bytes().all(|b| !b.is_ascii_uppercase()), without checking that the string is entirely ASCII).

Anyway, it’s always possible to add more methods but a closed issue may not be the best place to propose them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests