Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Value mangling #61486

Open
4 of 6 tasks
varkor opened this issue Jun 3, 2019 · 18 comments
Open
4 of 6 tasks

Value mangling #61486

varkor opened this issue Jun 3, 2019 · 18 comments
Assignees
Labels
A-const-generics Area: const generics (parameters and arguments) C-enhancement Category: An issue proposing an enhancement or a PR with one. F-adt_const_params `#![feature(adt_const_params)]` requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@varkor
Copy link
Member

varkor commented Jun 3, 2019

For const generics, we will eventually allow arbitrary structural_match values as generic arguments. Thus, we need to figure out how to mangle them. At the moment, only unsigned integers are supported.

min_const_generics

adt_const_params

  • Tuples and arrays
  • Complex data types

cc @eddyb @yodaldevoid

@varkor varkor added the A-const-generics Area: const generics (parameters and arguments) label Jun 3, 2019
@eddyb
Copy link
Member

eddyb commented Jun 3, 2019

cc #60705

@est31
Copy link
Member

est31 commented Jun 3, 2019

At the moment, only unsigned integers are supported.

bools work too, don't they? #61383 (comment)

@varkor
Copy link
Member Author

varkor commented Jun 3, 2019

@est31: the current implementation of const generics supports more types than the mangle supports, which is something we need to address.

@est31
Copy link
Member

est31 commented Jun 3, 2019

@varkor I see. As you are special casing bools already in #61409 do you think they should be allowed in the initial version of const generics, too? Mangling schemes for them are straightforward.

@varkor
Copy link
Member Author

varkor commented Jun 3, 2019

It's unclear what "initial version of const generics" means at this point. We're not ready to talk about stabilisation yet. Hopefully we'll be able to discuss proper mangling soon. I know @eddyb already has some ideas.

@yodaldevoid
Copy link
Contributor

Quoting @eddyb from Discord channel #compiler for my own benefit if nothing else:

unsigned integers is really what people want, IMO. but I have an idea of how to encode arbitrary ADTs into the symbol mangling, so it shouldn't be too bad

(basically something like std::fmt::Formatter::debug_{list,struct,tuple}, and the struct/tuple forms can take a path that's the path of the variant in an enum, for enum values)

lol I didn't finish my thought: you'd encode Some(0i32) as something like TNvINt...6OptioniE4Somei0_E
that is, ...::Option<i32>::Some(0i32)

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 9, 2019
@Centril Centril added F-const_generics `#![feature(const_generics)]` requires-nightly This issue requires a nightly compiler in some way. labels Aug 6, 2019
@oli-obk
Copy link
Contributor

oli-obk commented Jan 24, 2020

Such printing should probably share the walking logic with the const validator (https://github.com/rust-lang/rust/blob/master/src/librustc_mir/interpret/validity.rs), because both need to do the same thing.

@eddyb
Copy link
Member

eddyb commented Mar 14, 2020

These tests fail with -Zsymbol-mangling-version=v0:

    [ui] ui/const-generics/fn-const-param-call.rs
    [ui] ui/const-generics/issues/issue-61432.rs
    [ui] ui/const-generics/issues/issue-62579-no-match.rs
    [ui] ui/const-generics/raw-ptr-const-param-deref.rs
    [ui] ui/const-generics/slice-const-param.rs

Maybe we should start adding revisions that use -Zsymbol-mangling-version=v0 to all const generic tests and FIXMEs pointing to this issue for the ones that don't work?


More specifically, what I did to test this was replace Legacy with V0 here:

symbol_mangling_version: SymbolManglingVersion = (SymbolManglingVersion::Legacy,

@eddyb
Copy link
Member

eddyb commented Mar 14, 2020

I've just noticed a potential issue with the syntax:
We support p after the "basic type" (currently only unsigned integers).

As a type, p means "placeholder" and is printed as _.
This is useful for e.g. a static inside an impl method, because it doesn't depend on the impls's parameters, so we end up with e.g. <Foo<_>>::foo::BAR.

But for a constant integer, we encode a similar situation (e.g. <ArrayVec<_, _>>::foo::BAR) using, for the const generic arg, jp (j is the type usize) - so more like _: usize than just _.
(For completeness, 42: usize is encoded as j2a_ - i.e. we use the value in hex)

Now, jp vs just p (which is not supported as a constant right now) is harmless, even if a bit longer, and the ability to print _: usize in verbose mode is neat.

But if we start including ADTs, we might need, for const generics of type (usize, usize), to encode TjpjpE, i.e. (_: usize, _: usize), printed as (_, _) by default.
This is starting to get weird, but it's still manageable, the real problem is enums.

How would you encode an erased Option<usize>? Normal values of that type would be rely on either Option::<usize>::Some or Option::<usize>::None, but for placeholders we can't choose.


The good news is we can still add a general placeholder, either just p as a constant, or p followed by its type (if we want to be able to print a type).
However, since the type of a constant must be monomorphic (or maybe in the future, determined by type parameters in scope), we don't have to encode it.

Speaking of which, the current syntax is also redundant in that we print all the integers the same, so we don't really need to encode which one it is, just that the hex nibbles should be printed as an integer.

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 20, 2020
ty/print: pretty-print constant aggregates (arrays, tuples and ADTs).

Oddly enough, we don't have any UI tests showing this off in types, only `mir-opt` tests.
However, the pretty form should show up in the test output diff of rust-lang#71018, if this PR is merged first.

<hr/>

Examples of before/after:
|`Option<bool>`|
|:-:|
|`{transmute(0x01): std::option::Option<bool>}`|
| :sparkles: ↓↓↓ :sparkles: |
|`std::option::Option::<bool>::Some(true)`|

| `RawVec<u32>` |
|:-:|
| `ByRef { alloc: Allocation { bytes: [4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], relocations: Relocations(SortedMap { data: [] }), undef_mask: UndefMask { blocks: [65535], len: Size { raw: 16 } }, size: Size { raw: 16 }, align: Align { pow2: 3 }, mutability: Not, extra: () }, offset: Size { raw: 0 } }: alloc::raw_vec::RawVec::<u32>`|
| :sparkles: ↓↓↓ :sparkles: |
|`alloc::raw_vec::RawVec::<u32> { ptr: std::ptr::Unique::<u32> { pointer: {0x4 as *const u32}, _marker: std::marker::PhantomData::<u32> }, cap: 0usize, alloc: std::alloc::Global }`|

<hr/>

This PR is a prerequisite for rust-lang#61486, *sort of*, in that we need to be able to pretty-print values in order to even consider how we might mangle them.
We still don't have pretty-printing for constants of reference types, @oli-obk has the necessary support logic in a PR but I didn't want to interfere with that.

<hr/>

Each commit should be reviewed separately, as I've fixed a couple deficiencies along the way.

r? @oli-obk cc @rust-lang/wg-mir-opt @varkor @yodaldevoid
@varkor
Copy link
Member Author

varkor commented Sep 13, 2020

We're permitting signed integers for min_const_generics, so we need to at least handle those.

@varkor
Copy link
Member Author

varkor commented Oct 22, 2020

Signed integers and char are now supported: #77554, so marking this as unblocking.

@michaelwoerister
Copy link
Member

@eddyb I'm trying to understand the issue you mentioned above. Can you give a concrete example that is not covered by the <const> = <type> "p" production? Right now I don't understand what the specific problem is.

@eddyb
Copy link
Member

eddyb commented Apr 30, 2021

@michaelwoerister It's been resolved already (as part of #77554):

if let Some(bits) = val {
// We only print the type if the const can be evaluated.
self = ct.ty.print(self)?;
let _ = write!(self.out, "{}{:x}_", if neg { "n" } else { "" }, bits);
} else {
// NOTE(eddyb) despite having the path, we need to
// encode a placeholder, as the path could refer
// back to e.g. an `impl` using the constant.
self.push("p");
}

That p doesn't have a type before it. So we're now using the "improved" approach I was describing in that comment.

EDIT: the grammar should be updated too.

@eddyb
Copy link
Member

eddyb commented Apr 30, 2021

@michaelwoerister Actually, to answer the more interesting question: <const> = <type> "p" is wrong and largely unworkable - <const> cannot start with <type> without severely overcomplicating the encoding of ADT constants.

So the current implementation only supports <const> starting with <basic-type>. And ideally we'd reuse the non-basic type constructor letters for values. Say, Ai1_i2_i3_E for [1, 2, 3].

Before I realized we'd more or less have to encode "a tree of ADT constructors" for more interesting values, I still thought we could generally do <type> {<hex-nibble>} "_" to encode arbitrary data, but that's flawed.

There are too many factors that make it almost impossible, and also quite undesireable, to have demanglers understand how data is laid out. So only bool, char & integers (and maybe strs in the future) can use raw bytes.

@eddyb
Copy link
Member

eddyb commented Jul 13, 2021

In #87082 (comment) I wrote this, which seems like it would be useful to have in this thread:

We only want to ever allow type-level constants (such as those passed to const params) to be one of:

  1. integer leaves (iN, uN, and also bool and char)
  • if we allow raw pointers, they would only be allowed here, with integer addresses
  • maybe str and [u8] would also go in here for efficiency, unsure yet about that
  1. ADTs of other type-level constants (struct, enum, tuples, arrays, slices, etc.)
  • no C unions, since we can't guarantee we can describe them as pure values
  • any user-defined ADTs have to have be "structurally matchable" (i.e. auto-derived PartialEq+Eq)
  • arguably this includes &T too - even if they are normally thought of as indirection, at the type-level &value acts as a newtype of value, as &T has structural equality that makes it isomorphic to T

(cc @oli-obk - did I miss anything?)

We already have the leaves, what's lacking in the mangling is the second part (the ADTs) - it's not that much design space, since e.g. enum variants can be treated as just structs (i.e. we encode the path to the variant and pretend it's a struct path) - the tricky parts how much aesthetic detail we put in, like whether we encode field names, how differently to encode tuple structs (make them like regular structs or like tuples?), whether we add a special case for str to avoid encoding it as an array, etc. etc.

@eddyb
Copy link
Member

eddyb commented Jul 16, 2021

First attempt at previously suggested constant value mangling approaches, is now up at #87194 (including grammar).

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 26, 2021
…woerister,oli-obk

rustc_symbol_mangling: support structural constants and &str in v0.

This PR should unblock rust-lang#85530 (except for float `const` generics, which AFAIK should've never worked).
(cc `@tmiasko` could the rust-lang#85530 (comment) failures be retried with a quick crater "subset" run of this PR + changing the default to `v0`? Just to make sure I didn't miss anything other than the floats)

The encoding is the one suggested before in e.g. rust-lang#61486 (comment), tho this PR won't by itself finish rust-lang#61486, before closing that we'd likely want to move to `@oli-obk's` "valtrees" (i.e. rust-lang#83234 and other associated work).

<hr>

**EDITs**:
1. switched unit/tuple/braced-with-named-fields `<const-fields>` prefixes from `"u"`/`"T"`/`""` to `"U"`/`"T"`/`"S"` to avoid the ambiguity reported by `@tmiasko` in rust-lang#87194 (comment).

2. `rustc-demangle` PR: rust-lang/rustc-demangle#55

3. RFC amendment PR: rust-lang/rfcs#3161
    * also removed the grammar changes included in that PR, from this description

4. added tests (temporarily using my fork of `rustc-demangle`)

<hr>

r? `@michaelwoerister`
@lcnr
Copy link
Contributor

lcnr commented Jun 24, 2022

to my knowledge this is now integrated in v0 mangling? @eddyb can this be closed?

@lcnr lcnr added F-adt_const_params `#![feature(adt_const_params)]` and removed F-const_generics `#![feature(const_generics)]` labels Jun 24, 2022
@eddyb
Copy link
Member

eddyb commented Jun 24, 2022

An implementation is complete, but I would block this on the RFC PR getting merged (see its discussion):

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const-generics Area: const generics (parameters and arguments) C-enhancement Category: An issue proposing an enhancement or a PR with one. F-adt_const_params `#![feature(adt_const_params)]` requires-nightly This issue requires a nightly compiler in some way. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants