Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added initial stubs for unicode case folding support #5822

Closed

Conversation

Kimundi
Copy link
Member

@Kimundi Kimundi commented Apr 10, 2013

No description provided.

@graydon
Copy link
Contributor

graydon commented Apr 10, 2013

It's not clear to me that we want to pull case folding (or "more unicode logic in general") into native rust code, vs. absorbing a dependency on an external unicode library. Every time this comes up I mention http://www.gnu.org/software/libunistring/ as a possible contender, but I balk on trying to figure out the implications of depending on LGPL code. I should probably talk to a licensing-and-law expert to clarify.

@Kimundi
Copy link
Member Author

Kimundi commented Apr 10, 2013

Yeah, would be nice if a good implementation could get integrated.

@kud1ing
Copy link

kud1ing commented Apr 11, 2013

GHC have replaced LGPLed libgmp: http://hackage.haskell.org/trac/ghc/wiki/ReplacingGMPNotes

"Even though [GHC] is essentially a "free software" license (BSD3), according to paragraph 2 of the LGPL, GHC must be distributed under the terms of the LGPL! "

Currently it's also an issue for LGPLed Qt on iOS: http://qt-project.org/wiki/Licensing-talk-about-mobile-platforms

@kud1ing
Copy link

kud1ing commented Apr 11, 2013

I don't know whether D's Unicode module is correct or complete, but they do not seem to depend on anything thirdparty: https://github.com/D-Programming-Language/phobos/blob/master/std/uni.d
The code is probably generated.

Maybe we should go on an extending unicode.rs with more generated code?
I consider out-of-the-box Unicode support important, even for a systems programming language.

@graydon
Copy link
Contributor

graydon commented Apr 13, 2013

I agree. I have some fixes to the unicode script in a wip branch of my own (mostly moving from switch statements to static tables now that we support those) but it doesn't cover the "slightly more than basic" stuff like normalization forms, collation and case folding. Also locales and message formatting.

We discussed a bit more at the end of the workweek and concluded ... in an unsatisfying place: that libICU is the only non-LGPL thing we can use as a project dependency, but that it's too big and clunky to make a dependency of libcore, and should be an external package managed by rustpkg when it's in shape (possibly one we consider part of the canonical / standard package set that mozilla supports).

This is roughly where we started from -- not able to commit to wanting "full" unicode (libICU is over 20mb of binary footprint) so thinking in terms of libcore containing a "partial" unicode library, like the one it currently has (possibly plus some more functions). Where to draw the line remains ... confusing to me, to say the least. There are a lot of parts to unicode.

Where would you draw the line?

@Kimundi
Copy link
Member Author

Kimundi commented Apr 13, 2013

Those are hard questions, and I don't know how to solve them, but I know that I'd like to know what happens to this pull request ;) Should I remove the parts that give possible unicode support, making it ascii only, or?...

@thestinger
Copy link
Contributor

@Kimundi: I don't think they should be implemented on char if they don't have Unicode support, because it represents a ucs4 code point.

The libc functions like toupper aren't actually ascii, and they have half-working locale support depending on the platform. There are actual ascii versions included in glib because the implementation-defined libc ones just aren't reliable.

@Kimundi
Copy link
Member Author

Kimundi commented Apr 15, 2013

Would it be okay to remove the locale stuff, and add '_ascii' suffixes to the functions to signify that they only do their work on [A-Za-z] ? I'd imagine that they'd mostly just used for ascii values anyway, and this way the functions don't promise more then the can do.

@thestinger
Copy link
Contributor

They're not going to do the expected thing all the time though, for example if the locale is Turkish.

@Kimundi
Copy link
Member Author

Kimundi commented Apr 16, 2013

@thestinger : I'm talking about explicit 'map [A-Z] <-> [a-z] functions that don't use libc for that.

@Kimundi
Copy link
Member Author

Kimundi commented Apr 17, 2013

Closing this for now, unicode issues are still undecided, and ascii values should be done differently.

@Kimundi Kimundi closed this Apr 17, 2013
flip1995 pushed a commit to flip1995/rust that referenced this pull request Aug 28, 2020
This includes a workaround of the issue rust-lang#5822,
the cause of this little mistake.
flip1995 pushed a commit to flip1995/rust that referenced this pull request Sep 3, 2021
Downgrade option_if_let_else to nursery

I believe that this lint's loose understanding of ownership (rust-lang#5822, rust-lang#6737) makes it unsuitable to be enabled by default in its current state, even as a pedantic lint.

Additionally the lint has known problems with type inference (rust-lang#6137), though I may be willing to consider this a non-blocker in isolation if it weren't for the ownership false positives.

A fourth false positive involving const fn: rust-lang#7567.

But on top of these, for me the biggest issue is I basically fully agree with rust-lang/rust-clippy#6137 (comment). In my experience this lint universally makes code worse even when the resulting code does compile.

---

changelog: remove [`option_if_let_else`] from default set of enabled lints
flip1995 pushed a commit to flip1995/rust that referenced this pull request Sep 3, 2021
Fix `option_if_let_else`

fixes: rust-lang#5822
fixes: rust-lang#6737
fixes: rust-lang#7567

The inference from rust-lang#6137 still exists so I'm not sure if this should be moved from the nursery. Before doing that though I'd almost want to see this split into two lints. One suggesting `map_or` and the other suggesting `map_or_else`.

`map_or_else` tends to have longer expressions for both branches so it doesn't end up much shorter than a match expression in practice. It also seems most people find it harder to read. `map_or` at least has the terseness benefit of being on one line most of the time, especially when the `None` branch is just a literal or path expression.

changelog: `break` and `continue` statments local to the would-be closure are allowed in `option_if_let_else`
changelog: don't lint in const contexts  in `option_if_let_else`
changelog: don't lint when yield expressions are used  in `option_if_let_else`
changelog: don't lint when the captures made by the would-be closure conflict with the other branch  in `option_if_let_else`
changelog: don't lint when a field of a local is used when the type could be pontentially moved from  in `option_if_let_else`
changelog: in some cases, don't lint when scrutinee expression conflicts with the captures of the would-be closure  in `option_if_let_else`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants