Make `str` a `struct str([u8])` #2692

varkor · 2019-04-23T13:04:22Z

This was previously attempted in rust-lang/rust#19612, but since rust-lang/rust#45225 some of the issues are no longer relevant, so it may be feasible to try this again.

See also Make bool an enum.

The text was updated successfully, but these errors were encountered:

SimonSapin · 2019-04-23T13:28:19Z

rust-lang/rust#19612 (comment)

Allows for inherent methods on str -- but note, it does not allow us to do away with extension traits because some of the operators must be defined in liballoc.

These days we do have inherent methods on str, both in libcore and in liballoc, through lang items that allow these impl blocks despite usual impl coherence rules. If str becomes a struct in libcore, it will still need at least a lang item for liballoc methods.

I have presented too much Rust fandom in the past. Time for some hate! (I still love Rust, but I need an archive for the things I am sad about it) (Title is inspired by http://phpsadness.com/) (Some of these are unpopular opinions, quite many people think differently) ## Standard library ### Primitives #### Primitives are not really primitive [Why is `bool` not an `enum Bool { False, True }`?](rust-lang/rfcs#348) [Why is `str` not a `struct Str([u8]);`?](rust-lang/rfcs#2692) #### Conversion between integers is not explicit `as` conversions do not explicitly state whether and what is lost. For example, `u8 as u16` is perfectly fine (expand size), but `u16 as u8` is a truncation, `u16 as i16` might overflow, `u16 as f32` is perfectly fine (lossless), `u32 as f32` is lossy, `usize as u32` and `u64 as usize` are lossy depending on the platform... On the other hand, while we can use `.try_into()` on the numbers, it is unclear what the impacts are, and handling the error is more challenging if panic is not desired. It is difficult to find (if it even exists) the *named* safe conversion method from the standard library documentation. To solve this problem, I published the [xias](https://docs.rs/xias) crate, which exposes a blanket impl on primitive types to facilitate explicitly explained conversions. ### Collections #### `.len()` and `.is_empty()` Clippy [recommends](https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty) that types implementing `.len()` should also implement `.is_empty()`. This much sounds like `.clone()` and `.clone_from()`, which should use a default trait impl. Why wasn't `.len()` a trait anyway? ## General syntax ### Operators #### The `!` operator ```rs if !matches!(value, Enum::Variant) { panic!("Do you think value matches Enum::Variant or not here, \ if you didn't know Rust?"); } ``` An alternative is to use `condition.not()`, but that requires `use std::ops::Not` everywhere. What about `condition == false`? [Thanks so much clippy.](https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison) #### `.await` No, don't get me wrong, I think `future.await` is much better than `await future`. Not a fan of cargo cult. But `future.await` looks as if it is a field, especially if the syntax highlighter is not aware of it (e.g. in outdated syntax highlighting libraries). Why can't we make it *look like* a postfix macro, like `future.await!` or `future.await!()`? ### Pattern matching #### `let` has mixed use in assignment and conditions If you didn't know Rust, you're probably confused why we use `=` instead of `==` here ```rs if let Enum::Variant = value { // ... } ``` This is getting even more confusing with the let...else syntax ```rs let Ok(socket) = Socket::bind(addr) else { anyhow::bail!("Cannot bind socket to {addr}"); }; ``` Guess what, `expr else { ... }` itself is not a condition. You actually group it as `let pat = expr` `else` `{...}` instead. #### Ambiguous new bindings ```rs pub fn validate(input_byte: u8) -> u8 { const FORBIDDEN: u8 = 42; match input_byte { FORBIDDEN => panic!("input byte cannot be {FORBIDDEN}"), value => return value, } } fn main() { dbg!(validate(41)); dbg!(validate(42)); } ``` ``` [src/main.rs:11] validate(41) = 41 thread 'main' panicked at 'input byte cannot be 42', src/main.rs:5:22 ``` `FORBIDDEN` is a constant but `value` is a new binding? how is this even supposed to work? #### Use of `|` in patterns ```rs fn convert_bitmask(input: u8) -> Option<bool> { match x { Some(1 | 2 | 4) => Some(true), Some(..8) => Some(false), _ => None, } } ``` Didn't learn Rust before? Guess what the above means: - "returns true if x is 1 or 2 or 4" - "returns true if x is 7 (1 bitwise-OR 2 bitwise-OR 4)" #### `_` and `_var` mean different things You may have noticed that you can suppress the unused variable warning with both `let _ =` and `let _var =`. Golang prefers the former (the latter doesn't work in Go), but they actually mean different things in Rust. `_var` is just a normal binding name (local variable name in the case of `let _var =`), similar to `var`, except it suppresses some warnings about unused identifiers. `_` is the pattern that matches everything and binds the value to nothing. Since there is no binding, the value is dropped immediately. This causes a significant difference when it comes to RAII: ```rs { let _guard = mutex.lock(); println!("mutex is still locked here"); } println!("mutex is unlocked here because `_guard` is dropped"); ``` ```rs { let _ = mutex.lock(); println!("mutex is already unlocked here because `_` drops immediately"); } ``` Fortunately, this is usually not a problem, because it is rare to acquire an unused RAII object (although it is still used in some rare occasions). ## Testing ### Difficulty to unit test procedural macros Procedural macros are typically tested in the form of integration tests, where the user writes example code to use the procedural macro and check if it compiles correctly. However, this means the whole program is tested together, and breaking any single part would cause all tests to fail. This is particularly troublesome when errors caused by macro output are hard to debug. It doesn't make sense to unit test particular functions that generate `TokenStream` output, because that would imply writing test cases that repeat every token in the generated output, and modifying or even reformatting (e.g. adding trailing commas) any part of the output would require updating all unit tests. We want to unit test the logic to ensure that each unit contributes what it is expected to contribute, not to unit test the exact shape of the generated code. Another option is to have `#[cfg(test)] #[proc_macro*] pub fn` macros that expose the internal functions, but this suffers from two problems. Firstly, this involves additional code to parse an additional TokenStream to serve as the input arguments, and sometimes inapplicable when the macro generates a special, incomplete part of the syntax tree like match arms, struct fields, trait bounds, other non-item non-statement ASTs, or a combination of multiple ASTs (e.g. multiple non-exhaustive match arms). Secondly, this only works when the integration test is located in the same package as the procedural macro. This is often not the case when the procedural macro references some code in another closely coupled crate, e.g. `derive@serde_derive::Deserialize` depends on `trait@serde::Deserialize`. This can be solved by setting a dependent as a dev-dependency, but such a solution would not be scalable. ### Inflexible panic tests Unit tests currently allow `#[should_panic]` to assert that a unit test should panic. `#[should_panic(expected = "substring")]` additionally asserts that the panic message must contain `substring`. This is not ideal for two reasons. Firstly, substrings may not be safe as some parts of the panic message remain untested. Secondly, the panic message is an implementation detail of the panic but not the subject to be tested. The goal is to test that a certain code panics for a known reason, not to test what it shows to the user. For example, `panic!("Cannot insert entry into string map because {:?} is not one of String or &'static str", ty)`, where `ty: std::any::TypeId`, cannot be tested properly, because we cannot make sure that both `insert entry into the string map` and `is not one of String or &'static str` appear in the panic message since `TypeId as fmt::Debug` has inconsistent output (which is perfectly fine because it should not be relied on). This also means we cannot test which `TypeId` the panic involves (although a crate like this should probably attempt to take the `any::type_name` for slightly more consistent output). Coauthored-by: xqwtxon <xqwtxon@hotmail.com>

This wiki records some of my unpopular opinions. Since this wiki is expected to be somewhat unorganized, there is no index page. See the sidebar if you want to continue browsing. I have presented too much Rust fandom in the past. Time for some hate! (I still love Rust, but I need an archive for the things I am sad about it) (Title is inspired by http://phpsadness.com/) (Some of these are unpopular opinions, quite many people think differently) ## Standard library ### Primitives #### Primitives are not really primitive [Why is `bool` not an `enum Bool { False, True }`?](rust-lang/rfcs#348) [Why is `str` not a `struct Str([u8]);`?](rust-lang/rfcs#2692) #### Conversion between integers is not explicit `as` conversions do not explicitly state whether and what is lost. For example, `u8 as u16` is perfectly fine (expand size), but `u16 as u8` is a truncation, `u16 as i16` might overflow, `u16 as f32` is perfectly fine (lossless), `u32 as f32` is lossy, `usize as u32` and `u64 as usize` are lossy depending on the platform... On the other hand, while we can use `.try_into()` on the numbers, it is unclear what the impacts are, and handling the error is more challenging if panic is not desired. It is difficult to find (if it even exists) the *named* safe conversion method from the standard library documentation. To solve this problem, I published the [xias](https://docs.rs/xias) crate, which exposes a blanket impl on primitive types to facilitate explicitly explained conversions. ### Collections #### `.len()` and `.is_empty()` Clippy [recommends](https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty) that types implementing `.len()` should also implement `.is_empty()`. This much sounds like `.clone()` and `.clone_from()`, which should use a default trait impl. Why wasn't `.len()` a trait anyway? ## General syntax ### Operators #### The `!` operator ```rs if !matches!(value, Enum::Variant) { panic!("Do you think value matches Enum::Variant or not here, \ if you didn't know Rust?"); } ``` An alternative is to use `condition.not()`, but that requires `use std::ops::Not` everywhere. What about `condition == false`? [Thanks so much clippy.](https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison) #### `.await` No, don't get me wrong, I think `future.await` is much better than `await future`. Not a fan of cargo cult. But `future.await` looks as if it is a field, especially if the syntax highlighter is not aware of it (e.g. in outdated syntax highlighting libraries). Why can't we make it *look like* a postfix macro, like `future.await!` or `future.await!()`? ### Pattern matching #### `let` has mixed use in assignment and conditions If you didn't know Rust, you're probably confused why we use `=` instead of `==` here ```rs if let Enum::Variant = value { // ... } ``` This is getting even more confusing with the let...else syntax ```rs let Ok(socket) = Socket::bind(addr) else { anyhow::bail!("Cannot bind socket to {addr}"); }; ``` Guess what, `expr else { ... }` itself is not a condition. You actually group it as `let pat = expr` `else` `{...}` instead. #### Ambiguous new bindings ```rs pub fn validate(input_byte: u8) -> u8 { const FORBIDDEN: u8 = 42; match input_byte { FORBIDDEN => panic!("input byte cannot be {FORBIDDEN}"), value => return value, } } fn main() { dbg!(validate(41)); dbg!(validate(42)); } ``` ``` [src/main.rs:11] validate(41) = 41 thread 'main' panicked at 'input byte cannot be 42', src/main.rs:5:22 ``` `FORBIDDEN` is a constant but `value` is a new binding? how is this even supposed to work? #### Use of `|` in patterns ```rs fn convert_bitmask(input: u8) -> Option<bool> { match x { Some(1 | 2 | 4) => Some(true), Some(..8) => Some(false), _ => None, } } ``` Didn't learn Rust before? Guess what the above means: - "returns true if x is 1 or 2 or 4" - "returns true if x is 7 (1 bitwise-OR 2 bitwise-OR 4)" #### `_` and `_var` mean different things You may have noticed that you can suppress the unused variable warning with both `let _ =` and `let _var =`. Golang prefers the former (the latter doesn't work in Go), but they actually mean different things in Rust. `_var` is just a normal binding name (local variable name in the case of `let _var =`), similar to `var`, except it suppresses some warnings about unused identifiers. `_` is the pattern that matches everything and binds the value to nothing. Since there is no binding, the value is dropped immediately. This causes a significant difference when it comes to RAII: ```rs { let _guard = mutex.lock(); println!("mutex is still locked here"); } println!("mutex is unlocked here because `_guard` is dropped"); ``` ```rs { let _ = mutex.lock(); println!("mutex is already unlocked here because `_` drops immediately"); } ``` Fortunately, this is usually not a problem, because it is rare to acquire an unused RAII object (although it is still used in some rare occasions). ## Testing ### Difficulty to unit test procedural macros Procedural macros are typically tested in the form of integration tests, where the user writes example code to use the procedural macro and check if it compiles correctly. However, this means the whole program is tested together, and breaking any single part would cause all tests to fail. This is particularly troublesome when errors caused by macro output are hard to debug. It doesn't make sense to unit test particular functions that generate `TokenStream` output, because that would imply writing test cases that repeat every token in the generated output, and modifying or even reformatting (e.g. adding trailing commas) any part of the output would require updating all unit tests. We want to unit test the logic to ensure that each unit contributes what it is expected to contribute, not to unit test the exact shape of the generated code. Another option is to have `#[cfg(test)] #[proc_macro*] pub fn` macros that expose the internal functions, but this suffers from two problems. Firstly, this involves additional code to parse an additional TokenStream to serve as the input arguments, and sometimes inapplicable when the macro generates a special, incomplete part of the syntax tree like match arms, struct fields, trait bounds, other non-item non-statement ASTs, or a combination of multiple ASTs (e.g. multiple non-exhaustive match arms). Secondly, this only works when the integration test is located in the same package as the procedural macro. This is often not the case when the procedural macro references some code in another closely coupled crate, e.g. `derive@serde_derive::Deserialize` depends on `trait@serde::Deserialize`. This can be solved by setting a dependent as a dev-dependency, but such a solution would not be scalable. ### Inflexible panic tests Unit tests currently allow `#[should_panic]` to assert that a unit test should panic. `#[should_panic(expected = "substring")]` additionally asserts that the panic message must contain `substring`. This is not ideal for two reasons. Firstly, substrings may not be safe as some parts of the panic message remain untested. Secondly, the panic message is an implementation detail of the panic but not the subject to be tested. The goal is to test that a certain code panics for a known reason, not to test what it shows to the user. For example, `panic!("Cannot insert entry into string map because {:?} is not one of String or &'static str", ty)`, where `ty: std::any::TypeId`, cannot be tested properly, because we cannot make sure that both `insert entry into the string map` and `is not one of String or &'static str` appear in the panic message since `TypeId as fmt::Debug` has inconsistent output (which is perfectly fine because it should not be relied on). This also means we cannot test which `TypeId` the panic involves (although a crate like this should probably attempt to take the `any::type_name` for slightly more consistent output). Co-authored-by: xqwtxon <xqwtxon@hotmail.com>

jonas-schievink added A-primitive Primitive types related proposals & ideas A-string Proposals relating to strings. T-lang Relevant to the language team, which will review and decide on the RFC. labels Nov 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `str` a `struct str([u8])` #2692

Make `str` a `struct str([u8])` #2692

varkor commented Apr 23, 2019

SimonSapin commented Apr 23, 2019 •

edited

Loading

Make str a struct str([u8]) #2692

Make str a struct str([u8]) #2692

Comments

varkor commented Apr 23, 2019

SimonSapin commented Apr 23, 2019 • edited Loading

Make `str` a `struct str([u8])` #2692

Make `str` a `struct str([u8])` #2692

SimonSapin commented Apr 23, 2019 •

edited

Loading