deprecate Unicode functions that will be moved to crates.io #24428
rust-highfive assigned huonw Apr 14, 2015
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override)
rust-highfive assigned alexcrichton and unassigned huonw Apr 14, 2015
kwantam force-pushed the kwantam:deprecate_unicode_fns branch from 1e709bf to c596b9a Apr 14, 2015
alexcrichton reviewed Apr 14, 2015
@@ -161,6 +161,9 @@ enum DecompositionType {
/// External iterator for a string decomposition's characters.
///
/// For use with the `std::iter` module.
#[allow(deprecated)]
#[deprecated(reason = "use the crates.io `unicode-decomp` library instead",
             since = "1.0.0-nightly-20150415")]
alexcrichton
Apr 14, 2015
Member
These `since` tags should all be "1.0.0" for now (i.e. they're applicable for the 1.0.0 release)
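For readers unfamiliar with the attribute under review, here is a minimal sketch of a deprecation carrying a plain release version in `since`, as requested above. The function names `legacy_width` and `total_width` are hypothetical and not part of this PR. The diff uses the `reason` key of the in-tree attribute as it existed at the time; this sketch assumes the stable `#[deprecated]` attribute, which spells that key `note`.

```rust
/// Hypothetical stand-in for a function that is slated to move to crates.io.
/// `since` names the release the deprecation applies to, not a nightly date.
#[deprecated(since = "1.0.0", note = "use the crates.io `unicode-width` library instead")]
pub fn legacy_width(c: char) -> usize {
    // Placeholder logic only: NUL takes no column, everything else takes one.
    if c == '\0' { 0 } else { 1 }
}

/// A caller that still depends on the old function silences the lint locally,
/// mirroring the `#[allow(deprecated)]` line in the diff.
#[allow(deprecated)]
pub fn total_width(s: &str) -> usize {
    s.chars().map(legacy_width).sum()
}
```

Calling `legacy_width` without the `#[allow(deprecated)]` opt-out should produce a compiler warning that includes the `note` text.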
kwantam force-pushed the kwantam:deprecate_unicode_fns branch 2 times, most recently from 1bf8bd0 to 7aec8a0 Apr 14, 2015
But why? I think the unicode crate is the right place to hold these features.
kwantam force-pushed the kwantam:deprecate_unicode_fns branch from 7aec8a0 to d14a96c Apr 15, 2015
I'm also interested to understand the rationale for breaking these into fine-grained crates, rather than building out
The two things that convinced me each function should be in its own crate are:
I suppose one other point in favor is that there are already a few small crates out there that provide their own small bits of Unicode functionality, so an omnibus crate would end up either duplicating their functionality or requiring people to pull in multiple crates anyway.
kwantam force-pushed the kwantam:deprecate_unicode_fns branch from d14a96c to 516795d Apr 15, 2015
@liigo the idea of moving these out has been discussed in #24402, #24340, #15628, and rust-lang/rfcs#1054. In short, the notion is that there is no particular reason to have these in the standard library, and on the other hand, including them is a burden both to libstd and to libunicode, because of the stability guarantees that libstd wants to provide to users.
I don't think grapheme/width management should move out of libunicode, since they're strongly related to Unicode. How about moving out the whole of libunicode? cc @alexcrichton
I think maybe you all are talking about different things: One is: should these three new The other is: should this functionality be in a crate distributed with rustc? (As opposed to crates.io.) The As @kwantam said, removing them from But there is some compiler usage where this PR simply adds
(We could theoretically offer a
@SimonSapin I agree regarding renaming libunicode. I was going to suggest libcore_unicode: the non-deprecated functionality really is being used in libcore, libcollections, and libstd, so calling it librustc_unicode might be suggestive of a narrower set of uses than is actually the case. Regarding the use of If there's general consensus on a rename, I can rename libunicode to whatever name we decide (libcore_unicode and librustc_unicode have been suggested so far), and then leave behind a dummy libunicode that re-exports everything from lib{core,rustc}_unicode with a
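As a rough illustration of the "leave a dummy crate behind" idea in the comment above, the sketch below models the old and new crates as two modules. It assumes the intent is for users of the old path to keep compiling while seeing a deprecation warning. All names here (`shim_demo`, `is_whitespace`, `old_caller`) are hypothetical, and the shim uses per-item forwarding functions rather than a literal `pub use` re-export so that each item carries its own deprecation notice.

```rust
/// Wrapper module so the sketch is self-contained; hypothetical name.
pub mod shim_demo {
    /// Stands in for the renamed in-tree crate.
    pub mod rustc_unicode {
        /// Placeholder for one of the non-deprecated functions.
        pub fn is_whitespace(c: char) -> bool {
            c.is_whitespace()
        }
    }

    /// Stands in for the dummy crate left behind under the old name.
    pub mod unicode {
        /// Forwards to the new location and flags the old path as deprecated.
        #[deprecated(since = "1.0.0", note = "renamed: use `rustc_unicode::is_whitespace`")]
        pub fn is_whitespace(c: char) -> bool {
            super::rustc_unicode::is_whitespace(c)
        }
    }

    /// A caller that has not migrated yet; it opts out of the lint locally.
    #[allow(deprecated)]
    pub fn old_caller(c: char) -> bool {
        unicode::is_whitespace(c)
    }
}
```

The trade-off with forwarding functions is extra boilerplate per item; a glob re-export is shorter but gives less control over which items warn.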
I don’t have an opinion on the new name,
If that’s an option, I’d rather have it in this PR than
I'm ok not handling graphemes and friends in width calculations for the compiler, hardwiring to 1 seems like it's definitely fine for now. I don't have a super strong opinion on one crate vs many crates, but I might err on the side of small crates for now as it pushes back on the idea of a "dumping ground" for unicode-related functionality and as @huonw suggested we can always have our own facade crate if necessary. I would also be fine renaming libunicode in-tree, and it's also probably fine to not have much of a deprecation strategy as it looks like very few crates are still using it and it's unstable to start out with. I would recommend
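To make the "hardwiring to 1" trade-off concrete, here is a minimal sketch of a caret-column calculation that treats every `char` as one column wide. The helper names `char_width` and `caret_column` are hypothetical and are not the functions used in librustc_driver or libsyntax.

```rust
/// Width of a character under the hardwired scheme: always one column,
/// regardless of whether a terminal would render it wide or zero-width.
pub fn char_width(_c: char) -> usize {
    1
}

/// Column at which a caret would be drawn for the byte offset `byte_pos`
/// within `line`, summing the width of every character before that offset.
pub fn caret_column(line: &str, byte_pos: usize) -> usize {
    line.char_indices()
        .take_while(|&(i, _)| i < byte_pos)
        .map(|(_, c)| char_width(c))
        .sum()
}
```

For ASCII and most Latin text this gives the same answer as a width table. For a line such as `日本 x`, the caret for `x` (byte offset 7) lands at column 3, whereas a terminal that renders each CJK character two columns wide would show `x` at column 5. This is the kind of change in pretty-printed output the PR description warns about.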
@alexcrichton would you prefer I leave the feature name the same (
I think leaving the same feature name is fine, e.g. "unicode support in general"
kwantam force-pushed the kwantam:deprecate_unicode_fns branch from 516795d to 9952769 Apr 15, 2015
I'm on board with the proposed changes.
kwantam force-pushed the kwantam:deprecate_unicode_fns branch 2 times, most recently from d1341f9 to 503533c Apr 15, 2015
bors added a commit that referenced this pull request Apr 17, 2015
bors added a commit that referenced this pull request Apr 17, 2015
Not sure how to proceed here. Don't have a mac on which to try and reproduce, and I find it slightly hard to believe that a segfault building libserialize has anything to do with this PR.
@bors: retry
Ah, I believe this was spurious.
Thanks!
bors added a commit that referenced this pull request Apr 18, 2015
bors merged commit 29d1252 into rust-lang:master Apr 18, 2015
bors referenced this pull request Apr 18, 2015
Closed: Make the `width` function of `char` return `u32` #23539
kwantam deleted the kwantam:deprecate_unicode_fns branch Apr 18, 2015
I suppose it’s a bit late now, but shouldn’t deprecating/removing an entire crate + a bunch of commonly-used methods require an RFC, even if it’s marked as unstable? I know there’s been a lot of discussion about this elsewhere, but I thought that any decent-sized change should still go through the full RFC process instead of informally reaching a decision through discussion all about GitHub.
P1start referenced this pull request Apr 19, 2015
Closed: Bad span computations with unicode characters, should be handling them as graphemes #8706
In fairness, at this point nothing's been removed from libunicode, only deprecated (well, and renamed). We can still put things through the RFC process if that's deemed necessary. It's possible that we could leave the width-related functions behind a rustc_private feature gate to keep #8706 closed. It's not clear to me how critical that bug is.
@P1start unfortunately requiring an RFC for nearly any modification to libstd is probably infeasible, especially when it comes to unstable APIs. We've long thought that these APIs would move out of the standard library at some point as they've stuck out as not quite belonging for some time now, but we may not have communicated that clearly enough. Right now we don't have a great story for external libraries in the rust-lang organization (and elsewhere) in terms of evolving their API, possibly coming back into libstd, etc. I would expect an RFC if these libraries are to be re-included, but for now I wouldn't expect an RFC to move unstable features out into external crates.
Hm, I'm unclear why the compiler's use of these functions was removed: isn't the point of having the
@huonw I would personally be somewhat uncomfortable having to maintain these tables in the standard library for private usage by the compiler when they don't really come up in practice that often, so to slate them for deletion the compiler was hardwired to char == 1-wide slot on the screen.
kwantam commented Apr 14, 2015
This patch
librustc_driver and libsyntax. This may change pretty-printed
output from these modules in cases involving wide or combining
characters used in filenames, identifiers, etc.
The following functions are marked deprecated:
--> use unicode-width crate
--> use unicode-segmentation crate
char.compose(), char.decompose_canonical(), char.decompose_compatible(),
char.canonical_combining_class():
--> use unicode-normalization crate