Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate Unicode functions that will be moved to crates.io #24428

Merged
merged 1 commit into from Apr 18, 2015

Conversation

Projects
None yet
9 participants
@kwantam
Copy link
Contributor

kwantam commented Apr 14, 2015

This patch

  1. renames libunicode to librustc_unicode,
  2. deprecates several pieces of libunicode (see below), and
  3. removes references to deprecated functions from
    librustc_driver and libsyntax. This may change pretty-printed
    output from these modules in cases involving wide or combining
    characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

  1. char.width() and str.width():
    --> use unicode-width crate
  2. str.graphemes() and str.grapheme_indices():
    --> use unicode-segmentation crate
  3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
    char.compose(), char.decompose_canonical(), char.decompose_compatible(),
    char.canonical_combining_class():
    --> use unicode-normalization crate
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Apr 14, 2015

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 14, 2015

@rust-highfive rust-highfive assigned alexcrichton and unassigned huonw Apr 14, 2015

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 14, 2015

Crates are up on crates.io:

(cc @aturon carrying over from previous discussion in now-closed #24402)

@kwantam kwantam force-pushed the kwantam:deprecate_unicode_fns branch from 1e709bf to c596b9a Apr 14, 2015

@@ -161,6 +161,9 @@ enum DecompositionType {
/// External iterator for a string decomposition's characters.
///
/// For use with the `std::iter` module.
#[allow(deprecated)]
#[deprecated(reason = "use the crates.io `unicode-decomp` library instead",
since = "1.0.0-nightly-20150415")]

This comment has been minimized.

@alexcrichton

alexcrichton Apr 14, 2015

Member

These since tags should all be "1.0.0" for now (e.g. they're applicable for the 1.0.0 release)

@kwantam kwantam force-pushed the kwantam:deprecate_unicode_fns branch 2 times, most recently from 1bf8bd0 to 7aec8a0 Apr 14, 2015

@liigo

This comment has been minimized.

Copy link
Contributor

liigo commented Apr 15, 2015

@kwantam kwantam force-pushed the kwantam:deprecate_unicode_fns branch from 7aec8a0 to d14a96c Apr 15, 2015

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Apr 15, 2015

I'm also interested to understand the rationale for breaking these into fine-grained crates, rather than building out libunicode's functionality. Could you elaborate? cc @SimonSapin

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 15, 2015

The two things that convinced me each function should be in its own crate are:

  1. Some of these functions can get by with no_std, which makes them more generally useful (though you might argue that this is less useful nowadays).
  2. Each piece of functionality requires its own relatively large set of Unicode tables. Separating them means that these can be brought into a project piecemeal rather than all at once. (TBQH I'm not sure the extent to which rustc does link time optimization, but in the best case LTO would render this concern moot, at least for optimized builds.)

I suppose one other point in favor is that there are already a few small crates out there that provide their own small bits of Unicode functionality, so an omnibus crate would end up either duplicating their functionality or requiring people to pull in multiple crates anyway.

@kwantam kwantam force-pushed the kwantam:deprecate_unicode_fns branch from d14a96c to 516795d Apr 15, 2015

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 15, 2015

@liigo the idea of moving these out has been discussed in #24402, #24340, #15628, and rust-lang/rfcs#1054 .

In short, the notion is that there is no particular reason to have these in the standard library, and on the other hand, including them is a burden both to libstd and to libunicode, because of the stability guarantees that libstd wants to provide to users.

@liigo

This comment has been minimized.

Copy link
Contributor

liigo commented Apr 15, 2015

@kwantam

the idea of moving these out has been discussed in #24402, #24340, #15628, and rust-lang/rfcs#1054

I don't think grapheme/width management should move out of libunicode, since there're strong related to Unicode. How about moves out the whole libunicode? cc @alexcrichton

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Apr 15, 2015

I think maybe you all are talking about different things:

One is: should these three new unicode-* crates on crates.io be a single unicode crate on crates.io? I personally tend to prefer multiple smaller single-purpose crates over fewer larger crates containing multiple loosely-related features, but this is debatable.

The other is: should this functionality be in a crate distributed with rustc? (As opposed to crates.io.) The unicode crate perhaps should be renamed rustc_unicode, if only to clarify what we’re talking about. It serves two purpose (with some overlap): provide functionality used in the compile itself, and provide functionality to be re-exported in std.

As @kwantam said, removing them from std (in favor of crates.io) has been discussed before.

But there is some compiler usage where this PR simply adds #[allow(deprecated)]. I think there should be a better path forward for the compiler, IMO “deprecated” means we intend to remove this at some point. Perhaps the required functionality could stay non-depracated in rustc_unicode (where using it directly would require #![feature(rustc_private)] and only the std/collections wrappers would be deprecated.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Apr 15, 2015

One is: should these three new unicode-* crates on crates.io be a single unicode crate on crates.io? I personally tend to prefer multiple smaller single-purpose crates over fewer larger crates containing multiple loosely-related features, but this is debatable.

(We could theoretically offer a unicode crate that just reexports functionality from a variety of unicode-* crates, to have our cake and eat it too.)

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 15, 2015

@SimonSapin I agree regarding renaming libunicode. I was going to suggest libcore_unicode: the non-deprecated functionality really is being used in libcore, libcollections, and libstd, so calling it librustc_unicode might be suggestive of a narrower set of uses than is actually the case.

Regarding the use of width() in librustc_driver and libsyntax: there are a couple spots where this is done for pretty printing. I argued in #24402 that these can just be removed, because they will almost never impact correctness of the pretty printed output. We can also just leave the width tables and functionality behind a feature gate if there's objection to removing them.

If there's general consensus on a rename, I can rename libunicode to whatever name we decide (libcore_unicode and librustc_unicode have been suggested so far), and then leave behind a dummy libunicode that re-exports everything from lib{core,rustc}_unicode with a #deprecated tag to push people toward fixing their code.

@SimonSapin

This comment has been minimized.

Copy link
Contributor

SimonSapin commented Apr 15, 2015

I don’t have an opinion on the new name, rustc_unicode was just a straw man proposal.

I argued in #24402 that these can just be removed, because they will almost never impact correctness of the pretty printed output.

If that’s an option, I’d rather have it in this PR than #[allow(deprecated)] outside of tests.

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 15, 2015

If that’s an option, I’d rather have it in this PR than #[allow(deprecated)] outside of tests.

👍 if everyone else is onboard.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Apr 15, 2015

I'm ok not handling graphemes and friends in width calculations for the compiler, hardwiring to 1 seems like it's definitely fine for now.

I don't have a super strong opinion on one crate vs many crates, but I might err on the side of small crates for now as it pushes back on the idea of a "dumping ground" for unicode-related functionality and as @huonw suggested we can always have our own facade crate if necessary.

I would also be fine renaming libunicode in-tree, and it's also probably fine to not have much of a deprecation strategy as it looks like very few crates are still using it and it's unstable to start out with. I would recommend rustc_unicode to align with rustc_bitflags.

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 15, 2015

@alexcrichton would you prefer I leave the feature name the same (unicode) or update it to match the name of the crate?

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Apr 15, 2015

I think leaving the same feature name is fine, e.g. "unicode support in general"

@kwantam kwantam force-pushed the kwantam:deprecate_unicode_fns branch from 516795d to 9952769 Apr 15, 2015

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 15, 2015

👍 PR updated as discussed.

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Apr 15, 2015

I'm on board with the proposed changes.

@kwantam kwantam force-pushed the kwantam:deprecate_unicode_fns branch 2 times, most recently from d1341f9 to 503533c Apr 15, 2015

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⌛️ Testing commit 29d1252 with merge b0e8298...

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⛄️ The build was interrupted to prioritize another pull request.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⌛️ Testing commit 29d1252 with merge 7616f19...

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⛄️ The build was interrupted to prioritize another pull request.

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⌛️ Testing commit 29d1252 with merge b7661c9...

bors added a commit that referenced this pull request Apr 17, 2015

Auto merge of #24428 - kwantam:deprecate_unicode_fns, r=alexcrichton
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⛄️ The build was interrupted to prioritize another pull request.

bors added a commit that referenced this pull request Apr 17, 2015

Auto merge of #24428 - kwantam:deprecate_unicode_fns, r=alexcrichton
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate
@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

⌛️ Testing commit 29d1252 with merge c9069fe...

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 17, 2015

💔 Test failed - auto-mac-64-nopt-t

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 17, 2015

Not sure how to proceed here. Don't have a mac on which to try and reproduce, and I find it slightly hard to believe that a segfault building libserialize has anything to do with this PR.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Apr 17, 2015

@bors: retry

Ah I believe this was spurious

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 17, 2015

Thanks!

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Apr 18, 2015

⌛️ Testing commit 29d1252 with merge 7a5754b...

bors added a commit that referenced this pull request Apr 18, 2015

Auto merge of #24428 - kwantam:deprecate_unicode_fns, r=alexcrichton
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from
   librustc_driver and libsyntax. This may change pretty-printed
   output from these modules in cases involving wide or combining
   characters used in filenames, identifiers, etc.

The following functions are marked deprecated:

1. char.width() and str.width():
   --> use unicode-width crate

2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate

3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(),
   char.compose(), char.decompose_canonical(), char.decompose_compatible(),
   char.canonical_combining_class():
   --> use unicode-normalization crate

@bors bors merged commit 29d1252 into rust-lang:master Apr 18, 2015

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details

@kwantam kwantam deleted the kwantam:deprecate_unicode_fns branch Apr 18, 2015

@P1start

This comment has been minimized.

Copy link
Contributor

P1start commented Apr 19, 2015

I suppose it’s a bit late now, but shouldn’t deprecating/removing an entire crate + a bunch of commonly-used methods require an RFC, even if it’s marked as unstable? I know there’s been a lot of discussion about this elsewhere, but I thought that any decent-sized change should still go through the full RFC process instead of informally reaching a decision through discussion all about GitHub.

@kwantam

This comment has been minimized.

Copy link
Contributor Author

kwantam commented Apr 19, 2015

In fairness, at this point nothing's been removed from libunicode, only deprecated (well, and renamed). We can still put things through the RFC process if that's deemed necessary.

It's possible that we could leave the width-related functions behind a rustc_private feature gate to keep #8706 closed. It's not clear to me how critical that bug is.

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Apr 19, 2015

@P1start unfortunately requiring an RFC for nearly any modification to libstd is probably infeasible, especially when it comes to unstable APIs. We've long thought that these APIs would move out of the standard library at some point as they've stuck out as not quite belonging for some time now, but we may not have communicated that clearly enough.

Right now we don't have a great story for external libraries in the rust-lang organization (and elsewhere) in terms of evolving their API, possibly coming back into libstd, etc. I would expect an RFC if these libraries are to be re-included, but for now I wouldn't expect an RFC to move unstable features out into external crates.

@huonw

This comment has been minimized.

Copy link
Member

huonw commented Apr 19, 2015

Hm, I'm unclear why the compiler's use of these functions was removed: isn't the point of having the rustc_unicode crate to provide things that the compiler needs. Could you clarify? (i.e. yes, I think we should use the rustc_private feature gate.)

@alexcrichton

This comment has been minimized.

Copy link
Member

alexcrichton commented Apr 20, 2015

@huonw I would personally be somewhat uncomfortable having to maintain these tables in the standard library for private usage by the compiler when they don't really come up in practice that often, so to slate them for deletion the compiler was hardwired to char == 1-wide slot on the screen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.