Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make str a struct str([u8]) #2692

Open
varkor opened this issue Apr 23, 2019 · 1 comment
Open

Make str a struct str([u8]) #2692

varkor opened this issue Apr 23, 2019 · 1 comment
Labels
A-primitive Primitive types related proposals & ideas A-string Proposals relating to strings. T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@varkor
Copy link
Member

varkor commented Apr 23, 2019

This was previously attempted in rust-lang/rust#19612, but since rust-lang/rust#45225 some of the issues are no longer relevant, so it may be feasible to try this again.

See also Make bool an enum.

@SimonSapin
Copy link
Contributor

SimonSapin commented Apr 23, 2019

rust-lang/rust#19612 (comment)

Allows for inherent methods on str -- but note, it does not allow us to do away with extension traits because some of the operators must be defined in liballoc.

These days we do have inherent methods on str, both in libcore and in liballoc, through lang items that allow these impl blocks despite usual impl coherence rules. If str becomes a struct in libcore, it will still need at least a lang item for liballoc methods.

@jonas-schievink jonas-schievink added A-primitive Primitive types related proposals & ideas A-string Proposals relating to strings. T-lang Relevant to the language team, which will review and decide on the RFC. labels Nov 19, 2019
Reinfy added a commit to Reinfy/Pair-Extraordinaire that referenced this issue Aug 8, 2022
I have presented too much Rust fandom in the past.
Time for some hate!

(I still love Rust, but I need an archive for the things I am sad about it)

(Title is inspired by http://phpsadness.com/)

(Some of these are unpopular opinions, quite many people think differently)

## Standard library
### Primitives
#### Primitives are not really primitive
[Why is `bool` not an `enum Bool { False, True }`?](rust-lang/rfcs#348)
[Why is `str` not a `struct Str([u8]);`?](rust-lang/rfcs#2692)

#### Conversion between integers is not explicit
`as` conversions do not explicitly state whether and what is lost.
For example, `u8 as u16` is perfectly fine (expand size),
but `u16 as u8` is a truncation,
`u16 as i16` might overflow,
`u16 as f32` is perfectly fine (lossless),
`u32 as f32` is lossy,
`usize as u32` and `u64 as usize` are lossy depending on the platform...

On the other hand, while we can use `.try_into()` on the numbers,
it is unclear what the impacts are,
and handling the error is more challenging if panic is not desired.
It is difficult to find (if it even exists) the *named* safe conversion method
from the standard library documentation.

To solve this problem, I published the [xias](https://docs.rs/xias) crate,
which exposes a blanket impl on primitive types to facilitate explicitly explained conversions.

### Collections
#### `.len()` and `.is_empty()`
Clippy [recommends](https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty)
that types implementing `.len()` should also implement `.is_empty()`.
This much sounds like `.clone()` and `.clone_from()`, which should use a default trait impl.
Why wasn't `.len()` a trait anyway?

## General syntax
### Operators
#### The `!` operator
```rs
if !matches!(value, Enum::Variant) {
    panic!("Do you think value matches Enum::Variant or not here, \
        if you didn't know Rust?");
}
```

An alternative is to use `condition.not()`, but that requires `use std::ops::Not` everywhere.

What about `condition == false`? [Thanks so much clippy.](https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison)

#### `.await`
No, don't get me wrong, I think `future.await` is much better than `await future`.
Not a fan of cargo cult.

But `future.await` looks as if it is a field,
especially if the syntax highlighter is not aware of it
(e.g. in outdated syntax highlighting libraries).
Why can't we make it *look like* a postfix macro,
like `future.await!` or `future.await!()`?

### Pattern matching
#### `let` has mixed use in assignment and conditions
If you didn't know Rust, you're probably confused why we use `=` instead of `==` here

```rs
if let Enum::Variant = value {
    // ...
}
```

This is getting even more confusing with the let...else syntax

```rs
let Ok(socket) = Socket::bind(addr) else {
    anyhow::bail!("Cannot bind socket to {addr}");
};
```

Guess what, `expr else { ... }` itself is not a condition. You actually group it as `let pat = expr` `else` `{...}` instead.

#### Ambiguous new bindings
```rs
pub fn validate(input_byte: u8) -> u8 {
    const FORBIDDEN: u8 = 42;
    
    match input_byte {
        FORBIDDEN => panic!("input byte cannot be {FORBIDDEN}"),
        value => return value,
    }
}

fn main() {
    dbg!(validate(41));
    dbg!(validate(42));
}
```

```
[src/main.rs:11] validate(41) = 41
thread 'main' panicked at 'input byte cannot be 42', src/main.rs:5:22
```

`FORBIDDEN` is a constant but `value` is a new binding? how is this even supposed to work?

#### Use of `|` in patterns
```rs
fn convert_bitmask(input: u8) -> Option<bool> {
    match x {
        Some(1 | 2 | 4) => Some(true),
        Some(..8) => Some(false),
        _ => None,
    }
}
```

Didn't learn Rust before? Guess what the above means:

- "returns true if x is 1 or 2 or 4"
- "returns true if x is 7 (1 bitwise-OR 2 bitwise-OR 4)"

#### `_` and `_var` mean different things
You may have noticed that you can suppress the unused variable warning with both `let _ =` and `let _var =`.
Golang prefers the former (the latter doesn't work in Go), but they actually mean different things in Rust.

`_var` is just a normal binding name (local variable name in the case of `let _var =`),
similar to `var`, except it suppresses some warnings about unused identifiers.

`_` is the pattern that matches everything and binds the value to nothing.
Since there is no binding, the value is dropped immediately.

This causes a significant difference when it comes to RAII:

```rs
{
    let _guard = mutex.lock();
    println!("mutex is still locked here");
}
println!("mutex is unlocked here because `_guard` is dropped");
```

```rs
{
    let _ = mutex.lock();
    println!("mutex is already unlocked here because `_` drops immediately");
}
```

Fortunately, this is usually not a problem, because it is rare to acquire an unused RAII object
(although it is still used in some rare occasions).

## Testing
### Difficulty to unit test procedural macros
Procedural macros are typically tested in the form of integration tests,
where the user writes example code to use the procedural macro
and check if it compiles correctly.
However, this means the whole program is tested together,
and breaking any single part would cause all tests to fail.
This is particularly troublesome when errors caused by macro output are hard to debug.

It doesn't make sense to unit test particular functions that generate `TokenStream` output,
because that would imply writing test cases that repeat every token in the generated output,
and modifying or even reformatting (e.g. adding trailing commas) any part of the output
would require updating all unit tests.
We want to unit test the logic to ensure that
each unit contributes what it is expected to contribute,
not to unit test the exact shape of the generated code.

Another option is to have `#[cfg(test)] #[proc_macro*] pub fn` macros that expose the internal functions,
but this suffers from two problems.
Firstly, this involves additional code to parse an additional TokenStream to serve as the input arguments,
and sometimes inapplicable when the macro generates a special, incomplete part of the syntax tree
like match arms, struct fields, trait bounds, other non-item non-statement ASTs,
or a combination of multiple ASTs (e.g. multiple non-exhaustive match arms).
Secondly, this only works when the integration test is located in the same package as the procedural macro.
This is often not the case when the procedural macro references some code in another closely coupled crate,
e.g. `derive@serde_derive::Deserialize` depends on `trait@serde::Deserialize`.
This can be solved by setting a dependent as a dev-dependency,
but such a solution would not be scalable.

### Inflexible panic tests
Unit tests currently allow `#[should_panic]` to assert that a unit test should panic.
`#[should_panic(expected = "substring")]` additionally asserts that the panic message must contain `substring`.
This is not ideal for two reasons.
Firstly, substrings may not be safe as some parts of the panic message remain untested.
Secondly, the panic message is an implementation detail of the panic but not the subject to be tested.
The goal is to test that a certain code panics for a known reason, not to test what it shows to the user.
For example, `panic!("Cannot insert entry into string map because {:?} is not one of String or &'static str", ty)`,
where `ty: std::any::TypeId`, cannot be tested properly,
because we cannot make sure that both `insert entry into the string map`
and `is not one of String or &'static str` appear in the panic message
since `TypeId as fmt::Debug` has inconsistent output
(which is perfectly fine because it should not be relied on).
This also means we cannot test which `TypeId` the panic involves
(although a crate like this should probably attempt to take the `any::type_name` for slightly more consistent output).

Coauthored-by: xqwtxon <xqwtxon@hotmail.com>
Reinfy added a commit to Reinfy/Pair-Extraordinaire that referenced this issue Aug 8, 2022
This wiki records some of my unpopular opinions.

Since this wiki is expected to be somewhat unorganized, there is no index page. See the sidebar if you want to continue browsing.

I have presented too much Rust fandom in the past.
Time for some hate!

(I still love Rust, but I need an archive for the things I am sad about it)

(Title is inspired by http://phpsadness.com/)

(Some of these are unpopular opinions, quite many people think differently)

## Standard library
### Primitives
#### Primitives are not really primitive
[Why is `bool` not an `enum Bool { False, True }`?](rust-lang/rfcs#348)
[Why is `str` not a `struct Str([u8]);`?](rust-lang/rfcs#2692)

#### Conversion between integers is not explicit
`as` conversions do not explicitly state whether and what is lost.
For example, `u8 as u16` is perfectly fine (expand size),
but `u16 as u8` is a truncation,
`u16 as i16` might overflow,
`u16 as f32` is perfectly fine (lossless),
`u32 as f32` is lossy,
`usize as u32` and `u64 as usize` are lossy depending on the platform...

On the other hand, while we can use `.try_into()` on the numbers,
it is unclear what the impacts are,
and handling the error is more challenging if panic is not desired.
It is difficult to find (if it even exists) the *named* safe conversion method
from the standard library documentation.

To solve this problem, I published the [xias](https://docs.rs/xias) crate,
which exposes a blanket impl on primitive types to facilitate explicitly explained conversions.

### Collections
#### `.len()` and `.is_empty()`
Clippy [recommends](https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty)
that types implementing `.len()` should also implement `.is_empty()`.
This much sounds like `.clone()` and `.clone_from()`, which should use a default trait impl.
Why wasn't `.len()` a trait anyway?

## General syntax
### Operators
#### The `!` operator
```rs
if !matches!(value, Enum::Variant) {
    panic!("Do you think value matches Enum::Variant or not here, \
        if you didn't know Rust?");
}
```

An alternative is to use `condition.not()`, but that requires `use std::ops::Not` everywhere.

What about `condition == false`? [Thanks so much clippy.](https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison)

#### `.await`
No, don't get me wrong, I think `future.await` is much better than `await future`.
Not a fan of cargo cult.

But `future.await` looks as if it is a field,
especially if the syntax highlighter is not aware of it
(e.g. in outdated syntax highlighting libraries).
Why can't we make it *look like* a postfix macro,
like `future.await!` or `future.await!()`?

### Pattern matching
#### `let` has mixed use in assignment and conditions
If you didn't know Rust, you're probably confused why we use `=` instead of `==` here

```rs
if let Enum::Variant = value {
    // ...
}
```

This is getting even more confusing with the let...else syntax

```rs
let Ok(socket) = Socket::bind(addr) else {
    anyhow::bail!("Cannot bind socket to {addr}");
};
```

Guess what, `expr else { ... }` itself is not a condition. You actually group it as `let pat = expr` `else` `{...}` instead.

#### Ambiguous new bindings
```rs
pub fn validate(input_byte: u8) -> u8 {
    const FORBIDDEN: u8 = 42;
    
    match input_byte {
        FORBIDDEN => panic!("input byte cannot be {FORBIDDEN}"),
        value => return value,
    }
}

fn main() {
    dbg!(validate(41));
    dbg!(validate(42));
}
```

```
[src/main.rs:11] validate(41) = 41
thread 'main' panicked at 'input byte cannot be 42', src/main.rs:5:22
```

`FORBIDDEN` is a constant but `value` is a new binding? how is this even supposed to work?

#### Use of `|` in patterns
```rs
fn convert_bitmask(input: u8) -> Option<bool> {
    match x {
        Some(1 | 2 | 4) => Some(true),
        Some(..8) => Some(false),
        _ => None,
    }
}
```

Didn't learn Rust before? Guess what the above means:

- "returns true if x is 1 or 2 or 4"
- "returns true if x is 7 (1 bitwise-OR 2 bitwise-OR 4)"

#### `_` and `_var` mean different things
You may have noticed that you can suppress the unused variable warning with both `let _ =` and `let _var =`.
Golang prefers the former (the latter doesn't work in Go), but they actually mean different things in Rust.

`_var` is just a normal binding name (local variable name in the case of `let _var =`),
similar to `var`, except it suppresses some warnings about unused identifiers.

`_` is the pattern that matches everything and binds the value to nothing.
Since there is no binding, the value is dropped immediately.

This causes a significant difference when it comes to RAII:

```rs
{
    let _guard = mutex.lock();
    println!("mutex is still locked here");
}
println!("mutex is unlocked here because `_guard` is dropped");
```

```rs
{
    let _ = mutex.lock();
    println!("mutex is already unlocked here because `_` drops immediately");
}
```

Fortunately, this is usually not a problem, because it is rare to acquire an unused RAII object
(although it is still used in some rare occasions).

## Testing
### Difficulty to unit test procedural macros
Procedural macros are typically tested in the form of integration tests,
where the user writes example code to use the procedural macro
and check if it compiles correctly.
However, this means the whole program is tested together,
and breaking any single part would cause all tests to fail.
This is particularly troublesome when errors caused by macro output are hard to debug.

It doesn't make sense to unit test particular functions that generate `TokenStream` output,
because that would imply writing test cases that repeat every token in the generated output,
and modifying or even reformatting (e.g. adding trailing commas) any part of the output
would require updating all unit tests.
We want to unit test the logic to ensure that
each unit contributes what it is expected to contribute,
not to unit test the exact shape of the generated code.

Another option is to have `#[cfg(test)] #[proc_macro*] pub fn` macros that expose the internal functions,
but this suffers from two problems.
Firstly, this involves additional code to parse an additional TokenStream to serve as the input arguments,
and sometimes inapplicable when the macro generates a special, incomplete part of the syntax tree
like match arms, struct fields, trait bounds, other non-item non-statement ASTs,
or a combination of multiple ASTs (e.g. multiple non-exhaustive match arms).
Secondly, this only works when the integration test is located in the same package as the procedural macro.
This is often not the case when the procedural macro references some code in another closely coupled crate,
e.g. `derive@serde_derive::Deserialize` depends on `trait@serde::Deserialize`.
This can be solved by setting a dependent as a dev-dependency,
but such a solution would not be scalable.

### Inflexible panic tests
Unit tests currently allow `#[should_panic]` to assert that a unit test should panic.
`#[should_panic(expected = "substring")]` additionally asserts that the panic message must contain `substring`.
This is not ideal for two reasons.
Firstly, substrings may not be safe as some parts of the panic message remain untested.
Secondly, the panic message is an implementation detail of the panic but not the subject to be tested.
The goal is to test that a certain code panics for a known reason, not to test what it shows to the user.
For example, `panic!("Cannot insert entry into string map because {:?} is not one of String or &'static str", ty)`,
where `ty: std::any::TypeId`, cannot be tested properly,
because we cannot make sure that both `insert entry into the string map`
and `is not one of String or &'static str` appear in the panic message
since `TypeId as fmt::Debug` has inconsistent output
(which is perfectly fine because it should not be relied on).
This also means we cannot test which `TypeId` the panic involves
(although a crate like this should probably attempt to take the `any::type_name` for slightly more consistent output).

Co-authored-by: xqwtxon <xqwtxon@hotmail.com>
Reinfy added a commit to Reinfy/Pair-Extraordinaire that referenced this issue Aug 8, 2022
This wiki records some of my unpopular opinions.

Since this wiki is expected to be somewhat unorganized, there is no index page. See the sidebar if you want to continue browsing.
I have presented too much Rust fandom in the past.
Time for some hate!

(I still love Rust, but I need an archive for the things I am sad about it)

(Title is inspired by http://phpsadness.com/)

(Some of these are unpopular opinions, quite many people think differently)

## Standard library
### Primitives
#### Primitives are not really primitive
[Why is `bool` not an `enum Bool { False, True }`?](rust-lang/rfcs#348)
[Why is `str` not a `struct Str([u8]);`?](rust-lang/rfcs#2692)

#### Conversion between integers is not explicit
`as` conversions do not explicitly state whether and what is lost.
For example, `u8 as u16` is perfectly fine (expand size),
but `u16 as u8` is a truncation,
`u16 as i16` might overflow,
`u16 as f32` is perfectly fine (lossless),
`u32 as f32` is lossy,
`usize as u32` and `u64 as usize` are lossy depending on the platform...

On the other hand, while we can use `.try_into()` on the numbers,
it is unclear what the impacts are,
and handling the error is more challenging if panic is not desired.
It is difficult to find (if it even exists) the *named* safe conversion method
from the standard library documentation.

To solve this problem, I published the [xias](https://docs.rs/xias) crate,
which exposes a blanket impl on primitive types to facilitate explicitly explained conversions.

### Collections
#### `.len()` and `.is_empty()`
Clippy [recommends](https://rust-lang.github.io/rust-clippy/master/index.html#len_without_is_empty)
that types implementing `.len()` should also implement `.is_empty()`.
This much sounds like `.clone()` and `.clone_from()`, which should use a default trait impl.
Why wasn't `.len()` a trait anyway?

## General syntax
### Operators
#### The `!` operator
```rs
if !matches!(value, Enum::Variant) {
    panic!("Do you think value matches Enum::Variant or not here, \
        if you didn't know Rust?");
}
```

An alternative is to use `condition.not()`, but that requires `use std::ops::Not` everywhere.

What about `condition == false`? [Thanks so much clippy.](https://rust-lang.github.io/rust-clippy/master/index.html#bool_comparison)

#### `.await`
No, don't get me wrong, I think `future.await` is much better than `await future`.
Not a fan of cargo cult.

But `future.await` looks as if it is a field,
especially if the syntax highlighter is not aware of it
(e.g. in outdated syntax highlighting libraries).
Why can't we make it *look like* a postfix macro,
like `future.await!` or `future.await!()`?

### Pattern matching
#### `let` has mixed use in assignment and conditions
If you didn't know Rust, you're probably confused why we use `=` instead of `==` here

```rs
if let Enum::Variant = value {
    // ...
}
```

This is getting even more confusing with the let...else syntax

```rs
let Ok(socket) = Socket::bind(addr) else {
    anyhow::bail!("Cannot bind socket to {addr}");
};
```

Guess what, `expr else { ... }` itself is not a condition. You actually group it as `let pat = expr` `else` `{...}` instead.

#### Ambiguous new bindings
```rs
pub fn validate(input_byte: u8) -> u8 {
    const FORBIDDEN: u8 = 42;
    
    match input_byte {
        FORBIDDEN => panic!("input byte cannot be {FORBIDDEN}"),
        value => return value,
    }
}

fn main() {
    dbg!(validate(41));
    dbg!(validate(42));
}
```

```
[src/main.rs:11] validate(41) = 41
thread 'main' panicked at 'input byte cannot be 42', src/main.rs:5:22
```

`FORBIDDEN` is a constant but `value` is a new binding? how is this even supposed to work?

#### Use of `|` in patterns
```rs
fn convert_bitmask(input: u8) -> Option<bool> {
    match x {
        Some(1 | 2 | 4) => Some(true),
        Some(..8) => Some(false),
        _ => None,
    }
}
```

Didn't learn Rust before? Guess what the above means:

- "returns true if x is 1 or 2 or 4"
- "returns true if x is 7 (1 bitwise-OR 2 bitwise-OR 4)"

#### `_` and `_var` mean different things
You may have noticed that you can suppress the unused variable warning with both `let _ =` and `let _var =`.
Golang prefers the former (the latter doesn't work in Go), but they actually mean different things in Rust.

`_var` is just a normal binding name (local variable name in the case of `let _var =`),
similar to `var`, except it suppresses some warnings about unused identifiers.

`_` is the pattern that matches everything and binds the value to nothing.
Since there is no binding, the value is dropped immediately.

This causes a significant difference when it comes to RAII:

```rs
{
    let _guard = mutex.lock();
    println!("mutex is still locked here");
}
println!("mutex is unlocked here because `_guard` is dropped");
```

```rs
{
    let _ = mutex.lock();
    println!("mutex is already unlocked here because `_` drops immediately");
}
```

Fortunately, this is usually not a problem, because it is rare to acquire an unused RAII object
(although it is still used in some rare occasions).

## Testing
### Difficulty to unit test procedural macros
Procedural macros are typically tested in the form of integration tests,
where the user writes example code to use the procedural macro
and check if it compiles correctly.
However, this means the whole program is tested together,
and breaking any single part would cause all tests to fail.
This is particularly troublesome when errors caused by macro output are hard to debug.

It doesn't make sense to unit test particular functions that generate `TokenStream` output,
because that would imply writing test cases that repeat every token in the generated output,
and modifying or even reformatting (e.g. adding trailing commas) any part of the output
would require updating all unit tests.
We want to unit test the logic to ensure that
each unit contributes what it is expected to contribute,
not to unit test the exact shape of the generated code.

Another option is to have `#[cfg(test)] #[proc_macro*] pub fn` macros that expose the internal functions,
but this suffers from two problems.
Firstly, this involves additional code to parse an additional TokenStream to serve as the input arguments,
and sometimes inapplicable when the macro generates a special, incomplete part of the syntax tree
like match arms, struct fields, trait bounds, other non-item non-statement ASTs,
or a combination of multiple ASTs (e.g. multiple non-exhaustive match arms).
Secondly, this only works when the integration test is located in the same package as the procedural macro.
This is often not the case when the procedural macro references some code in another closely coupled crate,
e.g. `derive@serde_derive::Deserialize` depends on `trait@serde::Deserialize`.
This can be solved by setting a dependent as a dev-dependency,
but such a solution would not be scalable.

### Inflexible panic tests
Unit tests currently allow `#[should_panic]` to assert that a unit test should panic.
`#[should_panic(expected = "substring")]` additionally asserts that the panic message must contain `substring`.
This is not ideal for two reasons.
Firstly, substrings may not be safe as some parts of the panic message remain untested.
Secondly, the panic message is an implementation detail of the panic but not the subject to be tested.
The goal is to test that a certain code panics for a known reason, not to test what it shows to the user.
For example, `panic!("Cannot insert entry into string map because {:?} is not one of String or &'static str", ty)`,
where `ty: std::any::TypeId`, cannot be tested properly,
because we cannot make sure that both `insert entry into the string map`
and `is not one of String or &'static str` appear in the panic message
since `TypeId as fmt::Debug` has inconsistent output
(which is perfectly fine because it should not be relied on).
This also means we cannot test which `TypeId` the panic involves
(although a crate like this should probably attempt to take the `any::type_name` for slightly more consistent output).

Co-authored-by: xqwtxon <xqwtxon@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-primitive Primitive types related proposals & ideas A-string Proposals relating to strings. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

3 participants