Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use more specific panic message for &str slicing errors #38066

Merged
merged 1 commit into from Jan 4, 2017

Conversation

@bluss
Copy link
Member

commented Nov 29, 2016

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

It reports the first error it finds in:

  1. begin out of bounds
  2. end out of bounds
  3. begin <= end violated
  4. begin not char boundary
  5. end not char boundary.

Example:

&"abcαβγ"[..4]

thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

Fixes #38052

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Nov 29, 2016

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss bluss force-pushed the bluss:string-slice-error branch 2 times, most recently from c29b730 to 46a26f2 Nov 29, 2016

@durka

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2016

Should we show all the problems instead of just the first one?

@durka
durka approved these changes Nov 29, 2016
let ellipsis = if truncated { "[...]" } else { "" };

// 1. out of bounds
let index = if !s.is_char_boundary(begin) { begin } else { end };

This comment has been minimized.

Copy link
@durka

durka Nov 29, 2016

Contributor

This is confusing to me. It seems like you choose which index is going to be in the message before checking which one is out of bounds.

This comment has been minimized.

Copy link
@bluss

bluss Nov 29, 2016

Author Member

is_char_boundary is false if the index is out of bounds.

This comment has been minimized.

Copy link
@bluss

bluss Nov 29, 2016

Author Member

ok, it should be confusing, because it's not really right.

while !s.is_char_boundary(char_start) {
char_start -= 1;
}
let ch = s[char_start..].chars().next().unwrap();

This comment has been minimized.

Copy link
@durka

durka Nov 29, 2016

Contributor

This can panic if the string is empty or if char_start is at the end. I think I've convinced myself this can't happen, but at least a comment would help, or just to be defensive

if let Some(ch) = s[char_start..].chars().next() {
    let char_range = char_start .. char_start + ch.len_utf8();
    panic!("index {} is not a char boundary; it is inside {:?} (bytes {:?}) of `{}`{}",
           index, ch, char_range, s_trunc, ellipsis);
} else {
    panic!("index {} is not a char boundary in `{}`{}", index, s_trunc, ellipsis);
}

This comment has been minimized.

Copy link
@bluss

bluss Nov 29, 2016

Author Member

char_start is a character boundary, and then using it for slicing is ok. It could not be larger than len, that case has already been handled. It could not be equal to len, then it would not be an error in the first place.

while !s.is_char_boundary(char_start) {
char_start -= 1;
}
let ch = s[char_start..].chars().next().unwrap();

This comment has been minimized.

Copy link
@durka

durka Nov 29, 2016

Contributor

Also, if the problematic index is beyond MAX_DISPLAY_LENGTH, the preview isn't going to be very useful. It could instead show the region of the string from max(0, index - MAX_DISPLAY_LENGTH/2) .. min(s.len(), index + MAX_DISPLAY_LENGTH/2) with ellipses.

This comment has been minimized.

Copy link
@bluss

bluss Nov 29, 2016

Author Member

I feel like this is unnecessary, it just wants to identify the string somehow.

@killercup
Copy link
Member

left a comment

Nice work! Handling the different cases with custom messages makes this much better.

As mentioned in #38052 I'd like to try and convey that indexing a string (slice) is done using byte offsets -- and not character offsets. I think this is one of those messages people will see this in failing tests and then have a hard time recognizing why this went wrong; and then have an even harder time to find a solution like the unicode-segmentation crate. UTF-8 is hard when people don't think about it.

I know putting all that in the panic message is not a solution; but maybe you can change it to "slicing at byte 4 which is not a char boundary" or something like that?

(By the way, is there a lint to catch statically known wrong string slices?)

// 1. out of bounds
if begin > s.len() || end > s.len() {
let oob_index = if begin > s.len() { begin } else { end };
panic!("index {} is out of bounds of `{}`{}", oob_index, s_trunc, ellipsis);

This comment has been minimized.

Copy link
@killercup

killercup Nov 29, 2016

Member

Can you add a test for this as well?

This comment has been minimized.

Copy link
@bluss

bluss Nov 29, 2016

Author Member

I'll change the existing test to match more of the panic message

@bluss bluss force-pushed the bluss:string-slice-error branch from 9bc3cd4 to 95e8c41 Nov 29, 2016

@durka

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2016

@killercup maybe just s/index/byte index/ in the panic messages.

@bluss

This comment has been minimized.

Copy link
Member Author

commented Nov 29, 2016

I like byte index or something like that. I've gotten into the habit of saying byte offset too, but don't know which is best. char_indices says indices, and other docs say .. what? 😄

Use more specific panic message for &str slicing errors
Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

Example:

    &"abcαβγ"[..4]

    thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
    a char boundary; it is inside `α` (bytes 3..5) of `abcαβγ`'

@bluss bluss force-pushed the bluss:string-slice-error branch from 9c659ea to d83fff3 Nov 30, 2016

@steveklabnik

This comment has been minimized.

Copy link
Member

commented Jan 3, 2017

ping @sfackler ; you're the reviewer on this PR but haven't commented. Should we move it to someone else?

@bluss

This comment has been minimized.

Copy link
Member Author

commented Jan 3, 2017

Oh sorry, I also forgot about this. Thank you steveklabnik!

@sfackler

This comment has been minimized.

Copy link
Member

commented Jan 3, 2017

Oops, missed this!

I'm a bit worried about including the entire string in the panic message. I can see that getting pretty incomprehensible for larger strings. Maybe just 20 or 30 characters of context around the index?

@bluss

This comment has been minimized.

Copy link
Member Author

commented Jan 3, 2017

This PR keeps the previous behaviour that the string piece shown is truncated to 256 bytes or less. That was just some arbitrary limit that I picked in a previous that PR introduced some truncation at all.

@sfackler

This comment has been minimized.

Copy link
Member

commented Jan 3, 2017

Oh great, my mistake!

@sfackler

This comment has been minimized.

Copy link
Member

commented Jan 3, 2017

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

📌 Commit d83fff3 has been approved by sfackler

@bors

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

⌛️ Testing commit d83fff3 with merge 9954355...

bors added a commit that referenced this pull request Jan 3, 2017
Auto merge of #38066 - bluss:string-slice-error, r=sfackler
Use more specific panic message for &str slicing errors

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

It reports the first error it finds in:

1. begin out of bounds
2. end out of bounds
3. begin <= end violated
3. begin not char boundary
5. end not char boundary.

Example:

    &"abcαβγ"[..4]

    thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
    a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

Fixes #38052
@bors

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

💔 Test failed - status-travis

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Jan 3, 2017

@bors

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2017

⌛️ Testing commit d83fff3 with merge 4682271...

bors added a commit that referenced this pull request Jan 3, 2017
Auto merge of #38066 - bluss:string-slice-error, r=sfackler
Use more specific panic message for &str slicing errors

Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.

It reports the first error it finds in:

1. begin out of bounds
2. end out of bounds
3. begin <= end violated
3. begin not char boundary
5. end not char boundary.

Example:

    &"abcαβγ"[..4]

    thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
    a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

Fixes #38052
@bors

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: sfackler
Pushing 4682271 to master...

@bors bors merged commit d83fff3 into rust-lang:master Jan 4, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Mar 20, 2017
jperkin
Update lang/rust to 1.16.0. Changes since 1.15.1:
Version 1.16.0 (2017-03-16)
===========================

Language
--------

* Lifetimes in statics and consts default to `'static`. [RFC 1623]
* [The compiler's `dead_code` lint now accounts for type aliases][38051].
* [Uninhabitable enums (those without any variants) no longer permit wildcard
  match patterns][38069]
* [Clean up semantics of `self` in an import list][38313]
* [`Self` may appear in `impl` headers][38920]
* [`Self` may appear in struct expressions][39282]

Compiler
--------

* [`rustc` now supports `--emit=metadata`, which causes rustc to emit
  a `.rmeta` file containing only crate metadata][38571]. This can be
  used by tools like the Rust Language Service to perform
  metadata-only builds.
* [Levenshtein based typo suggestions now work in most places, while
  previously they worked only for fields and sometimes for local
  variables][38927]. Together with the overhaul of "no
  resolution"/"unexpected resolution" errors (#[38154]) they result in
  large and systematic improvement in resolution diagnostics.
* [Fix `transmute::<T, U>` where `T` requires a bigger alignment than
  `U`][38670]
* [rustc: use -Xlinker when specifying an rpath with ',' in it][38798]
* [`rustc` no longer attempts to provide "consider using an explicit
  lifetime" suggestions][37057]. They were inaccurate.

Stabilized APIs
---------------

* [`VecDeque::truncate`]
* [`VecDeque::resize`]
* [`String::insert_str`]
* [`Duration::checked_add`]
* [`Duration::checked_sub`]
* [`Duration::checked_div`]
* [`Duration::checked_mul`]
* [`str::replacen`]
* [`str::repeat`]
* [`SocketAddr::is_ipv4`]
* [`SocketAddr::is_ipv6`]
* [`IpAddr::is_ipv4`]
* [`IpAddr::is_ipv6`]
* [`Vec::dedup_by`]
* [`Vec::dedup_by_key`]
* [`Result::unwrap_or_default`]
* [`<*const T>::wrapping_offset`]
* [`<*mut T>::wrapping_offset`]
* `CommandExt::creation_flags`
* [`File::set_permissions`]
* [`String::split_off`]

Libraries
---------

* [`[T]::binary_search` and `[T]::binary_search_by_key` now take
  their argument by `Borrow` parameter][37761]
* [All public types in std implement `Debug`][38006]
* [`IpAddr` implements `From<Ipv4Addr>` and `From<Ipv6Addr>`][38327]
* [`Ipv6Addr` implements `From<[u16; 8]>`][38131]
* [Ctrl-Z returns from `Stdin.read()` when reading from the console on
  Windows][38274]
* [std: Fix partial writes in `LineWriter`][38062]
* [std: Clamp max read/write sizes on Unix][38062]
* [Use more specific panic message for `&str` slicing errors][38066]
* [`TcpListener::set_only_v6` is deprecated][38304]. This
  functionality cannot be achieved in std currently.
* [`writeln!`, like `println!`, now accepts a form with no string
  or formatting arguments, to just print a newline][38469]
* [Implement `iter::Sum` and `iter::Product` for `Result`][38580]
* [Reduce the size of static data in `std_unicode::tables`][38781]
* [`char::EscapeDebug`, `EscapeDefault`, `EscapeUnicode`,
  `CaseMappingIter`, `ToLowercase`, `ToUppercase`, implement
  `Display`][38909]
* [`Duration` implements `Sum`][38712]
* [`String` implements `ToSocketAddrs`][39048]

Cargo
-----

* [The `cargo check` command does a type check of a project without
  building it][cargo/3296]
* [crates.io will display CI badges from Travis and AppVeyor, if
  specified in Cargo.toml][cargo/3546]
* [crates.io will display categories listed in Cargo.toml][cargo/3301]
* [Compilation profiles accept integer values for `debug`, in addition
  to `true` and `false`. These are passed to `rustc` as the value to
  `-C debuginfo`][cargo/3534]
* [Implement `cargo --version --verbose`][cargo/3604]
* [All builds now output 'dep-info' build dependencies compatible with
  make and ninja][cargo/3557]
* [Build all workspace members with `build --all`][cargo/3511]
* [Document all workspace members with `doc --all`][cargo/3515]
* [Path deps outside workspace are not members][cargo/3443]

Misc
----

* [`rustdoc` has a `--sysroot` argument that, like `rustc`, specifies
  the path to the Rust implementation][38589]
* [The `armv7-linux-androideabi` target no longer enables NEON
  extensions, per Google's ABI guide][38413]
* [The stock standard library can be compiled for Redox OS][38401]
* [Rust has initial SPARC support][38726]. Tier 3. No builds
  available.
* [Rust has experimental support for Nvidia PTX][38559]. Tier 3. No
  builds available.
* [Fix backtraces on i686-pc-windows-gnu by disabling FPO][39379]

Compatibility Notes
-------------------

* [Uninhabitable enums (those without any variants) no longer permit wildcard
  match patterns][38069]
* In this release, references to uninhabited types can not be
  pattern-matched. This was accidentally allowed in 1.15.
* [The compiler's `dead_code` lint now accounts for type aliases][38051].
* [Ctrl-Z returns from `Stdin.read()` when reading from the console on
  Windows][38274]
* [Clean up semantics of `self` in an import list][38313]

[37057]: rust-lang/rust#37057
[37761]: rust-lang/rust#37761
[38006]: rust-lang/rust#38006
[38051]: rust-lang/rust#38051
[38062]: rust-lang/rust#38062
[38062]: rust-lang/rust#38622
[38066]: rust-lang/rust#38066
[38069]: rust-lang/rust#38069
[38131]: rust-lang/rust#38131
[38154]: rust-lang/rust#38154
[38274]: rust-lang/rust#38274
[38304]: rust-lang/rust#38304
[38313]: rust-lang/rust#38313
[38314]: rust-lang/rust#38314
[38327]: rust-lang/rust#38327
[38401]: rust-lang/rust#38401
[38413]: rust-lang/rust#38413
[38469]: rust-lang/rust#38469
[38559]: rust-lang/rust#38559
[38571]: rust-lang/rust#38571
[38580]: rust-lang/rust#38580
[38589]: rust-lang/rust#38589
[38670]: rust-lang/rust#38670
[38712]: rust-lang/rust#38712
[38726]: rust-lang/rust#38726
[38781]: rust-lang/rust#38781
[38798]: rust-lang/rust#38798
[38909]: rust-lang/rust#38909
[38920]: rust-lang/rust#38920
[38927]: rust-lang/rust#38927
[39048]: rust-lang/rust#39048
[39282]: rust-lang/rust#39282
[39379]: rust-lang/rust#39379
[`<*const T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset
[`<*mut T>::wrapping_offset`]: https://doc.rust-lang.org/std/primitive.pointer.html#method.wrapping_offset
[`Duration::checked_add`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_add
[`Duration::checked_div`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_div
[`Duration::checked_mul`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_mul
[`Duration::checked_sub`]: https://doc.rust-lang.org/std/time/struct.Duration.html#method.checked_sub
[`File::set_permissions`]: https://doc.rust-lang.org/std/fs/struct.File.html#method.set_permissions
[`IpAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv4
[`IpAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.IpAddr.html#method.is_ipv6
[`Result::unwrap_or_default`]: https://doc.rust-lang.org/std/result/enum.Result.html#method.unwrap_or_default
[`SocketAddr::is_ipv4`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv4
[`SocketAddr::is_ipv6`]: https://doc.rust-lang.org/std/net/enum.SocketAddr.html#method.is_ipv6
[`String::insert_str`]: https://doc.rust-lang.org/std/string/struct.String.html#method.insert_str
[`String::split_off`]: https://doc.rust-lang.org/std/string/struct.String.html#method.split_off
[`Vec::dedup_by_key`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by_key
[`Vec::dedup_by`]: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup_by
[`VecDeque::resize`]:  https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.resize
[`VecDeque::truncate`]: https://doc.rust-lang.org/std/collections/vec_deque/struct.VecDeque.html#method.truncate
[`str::repeat`]: https://doc.rust-lang.org/std/primitive.str.html#method.repeat
[`str::replacen`]: https://doc.rust-lang.org/std/primitive.str.html#method.replacen
[cargo/3296]: rust-lang/cargo#3296
[cargo/3301]: rust-lang/cargo#3301
[cargo/3443]: rust-lang/cargo#3443
[cargo/3511]: rust-lang/cargo#3511
[cargo/3515]: rust-lang/cargo#3515
[cargo/3534]: rust-lang/cargo#3534
[cargo/3546]: rust-lang/cargo#3546
[cargo/3557]: rust-lang/cargo#3557
[cargo/3604]: rust-lang/cargo#3604
[RFC 1623]: https://github.com/rust-lang/rfcs/blob/master/text/1623-static.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.