How to determine hygienic context for "non-atomic" code fragments #50122

Closed
petrochenkov opened this issue Apr 20, 2018 · 7 comments
Labels
A-macros-2.0 Area: Declarative macros 2.0 (#39412) C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@petrochenkov
Contributor

Reminder: a syntactic (hygienic) context is formally a "chain of expansions" and informally "the place where something is actually written". For example, in this code:

macro m($b: expr) {
    a + $b
}

let x = m!(b);

a's context is inside the macro m, and b's context is outside of it.
With Macros 2.0 hygiene, names are resolved in the locations where they are "actually written".
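
A minimal sketch of the same idea on stable Rust, assuming macro_rules! in place of the unstable macro syntax. macro_rules! is hygienic only for local variables and labels, which is enough to show the two contexts here. The name demo and the values 10 and 1 are illustrative, not from the original example.

```rust
// Stable analogue of `macro m($b: expr) { a + $b }` using `macro_rules!`.
macro_rules! m {
    ($b:expr) => {{
        // This `a` is written inside the macro, so its context is the
        // macro definition.
        let a = 10;
        a + $b
    }};
}

fn demo() -> i32 {
    // This `a` is written at the call site, so its context is outside `m`.
    let a = 1;
    // `$b` refers to the call-site `a`; the macro's own `a` does not capture it.
    m!(a)
}
```

The two identifiers are spelled the same but resolve independently, because each is looked up where it was written.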

For "atomic" tokens like identifiers or punctuation the context is unambiguous, but complex entities like expressions or types can be assembled from tokens introduced in different contexts. Consider, for example, this code, which ultimately expands into println!("Hello world!"):

macro context_parens($name: tt, $bang: tt, $args: tt) {
    $name $bang ( $args )
}

macro context_hello($name: tt, $bang: tt) {
    context_parens!($name, $bang, "Hello world!")
}

macro context_bang($name: tt) {
    context_hello!($name, !)
}

macro context_println() {
    context_bang!(println)
}

fn main() {
    context_println!();
}

So, what is the "call site" context of the println macro in this case?
Where should we resolve identifiers with call-site hygiene for macros invoked like this?
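
For experimenting on stable, here is a sketch of the same chain written with macro_rules!. It assumes format in place of println (so the result can be inspected) and renames the outermost macro to context_format; both changes are mine. macro_rules! does not use definition-site hygiene for macro names, so this only reproduces how the invocation is stitched together from four expansions, not the resolution question itself.

```rust
// Supplies the parentheses around the arguments.
macro_rules! context_parens {
    ($name:tt, $bang:tt, $args:tt) => {
        $name $bang ( $args )
    };
}

// Supplies the string literal argument.
macro_rules! context_hello {
    ($name:tt, $bang:tt) => {
        context_parens!($name, $bang, "Hello world!")
    };
}

// Supplies the `!` token.
macro_rules! context_bang {
    ($name:tt) => {
        context_hello!($name, !)
    };
}

// Supplies the macro name (`format` here, `println` in the original).
macro_rules! context_format {
    () => {
        context_bang!(format)
    };
}

fn hello() -> String {
    // Ends up as `format!("Hello world!")`, with the name, the `!`, the
    // argument and the parentheses each written in a different macro.
    context_format!()
}
```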

Contexts of "non-atomic" entities matter for reasons other than determining call-site hygiene. For example, in Struct { field1, field2, ..rest } the fields fieldN with N > 2 are checked for privacy in the context of the ..rest fragment, but that fragment may itself be a Frankenstein's monster assembled from pieces with different contexts.


Proposed solution:

For each complex entity, figure out and document an atomic entity that is "essential" to it and that serves as the source of its hygienic context.

For example, for binary operator expressions the context may be determined by the context of the operator: context($a + $b) = context(+). For the "remaining fields" fragment ..$rest mentioned above, the context may be determined by the context of .., and so on.

I'm... not sure what that essential atomic token would be for macro invocations, probably ! for bang macros and [] for attribute macros.
(Note that paired delimiters like (), [] and {} always have the same context in a pair).
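
To make the proposed rule concrete, here is a toy model. It is not compiler code: Ctxt, Token, Fragment and characteristic_ctxt are hypothetical names invented for this sketch, and the choice of ! for bang macros follows the tentative guess above.

```rust
/// Hypothetical stand-in for a hygienic context (a chain of expansions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ctxt(u32);

/// An "atomic" token: its context is unambiguous.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct Token {
    text: &'static str,
    ctxt: Ctxt,
}

/// A few "complex" entities mentioned in this issue.
#[allow(dead_code)]
#[derive(Debug)]
enum Fragment {
    Atom(Token),
    /// `$lhs op $rhs`
    Binary { lhs: Box<Fragment>, op: Token, rhs: Box<Fragment> },
    /// `..$base` in a struct expression
    StructRest { dotdot: Token, base: Box<Fragment> },
    /// `path ! (args)`
    BangCall { path: Token, bang: Token, args: Vec<Token> },
}

/// The context of a complex entity is the context of its "essential" token.
fn characteristic_ctxt(fragment: &Fragment) -> Ctxt {
    match fragment {
        Fragment::Atom(token) => token.ctxt,
        Fragment::Binary { op, .. } => op.ctxt,
        Fragment::StructRest { dotdot, .. } => dotdot.ctxt,
        Fragment::BangCall { bang, .. } => bang.ctxt,
    }
}

/// Models `a + $b` from `macro m($b: expr) { a + $b }` invoked as `m!(b)`.
fn a_plus_b() -> Fragment {
    let inside_m = Ctxt(1); // assumed id for "inside the macro m"
    let call_site = Ctxt(0); // assumed id for "outside of the macro"
    Fragment::Binary {
        lhs: Box::new(Fragment::Atom(Token { text: "a", ctxt: inside_m })),
        op: Token { text: "+", ctxt: inside_m },
        rhs: Box::new(Fragment::Atom(Token { text: "b", ctxt: call_site })),
    }
}
```

In this model characteristic_ctxt(&a_plus_b()) is the context of +, that is, the inside of m, regardless of where the operands came from.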

@petrochenkov petrochenkov added the A-macros-2.0 Area: Declarative macros 2.0 (#39412) label Apr 20, 2018
@CAD97
Contributor

CAD97 commented Apr 20, 2018

mod a {
    const A: i32 = 3;
    macro a() { A }
}
mod b {
    const B: i32 = 2;
    macro b() { B }
}
mod sum {
    macro sum($lhs, $rhs) { $lhs!() + $rhs!() }
}
fn whatami() -> i32 {
    use a::a;
    use b::b;
    use sum::sum;
    sum!(a, b)
}

I think it's pretty clear that should work and expand to

fn whatami() -> i32 {
    ::a::A + ::b::B
}

Does it do so with your proposed rule? Is my intuition wrong and this shouldn't make sense? Is there a different way of composing macro imports and call locations such that it would break?

My intuition is that if I name something at the call site and the macro uses it as an item, it resolves to what that name means at the call site. If I name something at the macro def site, it resolves to whatever that name would mean had it been used in a function at that source location.

@petrochenkov
Contributor Author

petrochenkov commented Apr 21, 2018

@CAD97
Yes, the example should work (after a couple of added pubs and : idents) and the intuition is correct.
The example never makes use of hygiene contexts of "complex" entities though, only contexts of "atomic" identifiers (e.g. a, b), so this issue doesn't apply.

@arielb1
Contributor

arielb1 commented Apr 25, 2018

@petrochenkov

So I thought the most natural solution would be to use the call-site of the path that invoked the println. On the other hand, that would screw with users, because there would be no way to receive a macro from another syntax context and expand it in your own syntax context.

The second option that I currently think is natural is to use the "span of the expression": the syntax context in which the relevant expression was constructed (is this not a valid concept in some case? this would be context_parens in your example), and also to extend that idea to all cases (including e.g. the privacy of the misc fields in an ExprStruct).

The above might, however, create some confusion (e.g. privacy in method calls: if the method name and the method call come from different scopes, what is determined by the method name and what by the method call?).

The main difference between the "span of expression" and "span of characteristic token" is how you pass around expansion sites, which is by either CPS-passing a macro that constructs the expression, or using a "magic" characteristic token.

@arielb1
Contributor

arielb1 commented Apr 25, 2018

In "span of expression", you can use

// in mod A

macro foo($expand: ident) {
    $expand!(println!("foo"));
}

// in mod B

macro expand($macro:ident ! $args:tt) {
    $macro ! $args
}
foo!(expand); // this expands `println!("foo")` here.

While in "span of characteristic token", you use

// in mod A

macro foo($bang: tt) {
    println $bang ("foo")
}

// in mod B

foo!(!); // this expands `println!("foo")` here.
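
Both plumbing styles can be tried on stable with macro_rules!. This is only a sketch of how the expansion site gets passed around: it assumes format in place of println so the result can be checked, and the names foo_cps, foo_bang, via_expression and via_token are made up for the example. macro_rules! cannot observe the hygiene difference between the two rules.

```rust
// Style 1 ("span of expression"): the caller passes a macro that
// constructs the invocation in the caller's own context.
macro_rules! foo_cps {
    ($expand:ident) => {
        $expand!(format!("foo"))
    };
}

// Written by the caller: re-assembles `name ! args` into an invocation.
macro_rules! expand {
    ($mac:ident ! $args:tt) => {
        $mac ! $args
    };
}

// Style 2 ("span of characteristic token"): the caller passes the `!`
// token, which carries the caller's context.
macro_rules! foo_bang {
    ($bang:tt) => {
        format $bang ("foo")
    };
}

fn via_expression() -> String {
    foo_cps!(expand)
}

fn via_token() -> String {
    foo_bang!(!)
}
```

Either way the caller decides where format!("foo") is considered to be expanded; the difference is whether it hands over a whole macro or a single token.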

@nikomatsakis
Contributor

Hmm, @petrochenkov's initial thought of a "span of characteristic token" is indeed what I initially expected, but I see the appeal of @arielb1's "span of the expression" as well. This obviously affects the question in #50376 as well (which concerns use paths).

I wanted to step back a second and try to establish what our goals are. From my perspective, we should be shooting for two things:

  • Rules we can explain, of course.
  • Rules that mean that "normal macros" behave as expected with respect to hygiene:
    • as a side effect, this should ensure that they work across editions, and that can be a useful guideline.

I was trying to think about the various things that might be tied ultimately to hygiene:

I am wondering whether these distinct uses might introduce competing demands (e.g., perhaps one gives "more natural" results with the span-of-expr approach vs span-of-characteristic-token). I suppose we must also consider macro and macro_rules! somewhat distinctly.

@petrochenkov
Contributor Author

petrochenkov commented May 30, 2018

Yes, @arielb1's suggestion is the primary alternative, but I think the "characteristic token" is preferable.

@nikomatsakis

I wanted to step back a second and try to establish what our goals are.

Simplicity, first of all! This covers "can teach", "can specify", and implementation complexity.

Unless we are trying to assign the context to something that is never actually written in the source code, like #50376, our choice should almost never matter, because for a + b the context of + and the "concatenation context" are almost always the same.

In this sense the characteristic token context is simpler to implement and explain, because it's something "real" and visible rather than an abstract point during the expansion process.
"Normal macros" should rarely care about the distinction.
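
A toy comparison of the two rules, as a sketch only. Ctxt, BinaryExpr and both functions are hypothetical, and the model assumes a single "concatenation context" per expression, which the original discussion does not guarantee.

```rust
/// Hypothetical stand-in for a hygienic context.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ctxt(u32);

/// A binary expression reduced to the two facts the rules look at.
#[derive(Debug, Clone, Copy)]
struct BinaryExpr {
    /// Where the operator token was written.
    op_ctxt: Ctxt,
    /// Assumed: the expansion in which the pieces were joined together.
    assembled_in: Ctxt,
}

/// "Span of characteristic token": take the operator's context.
fn by_characteristic_token(expr: BinaryExpr) -> Ctxt {
    expr.op_ctxt
}

/// "Span of expression": take the context where the expression was built.
fn by_span_of_expression(expr: BinaryExpr) -> Ctxt {
    expr.assembled_in
}

/// `macro m($b: expr) { a + $b }`: `+` is written and joined inside `m`.
fn normal_macro() -> BinaryExpr {
    BinaryExpr { op_ctxt: Ctxt(1), assembled_in: Ctxt(1) }
}

/// `+` arrives from the call site but is joined with `b` inside the macro.
fn frankenstein_macro() -> BinaryExpr {
    BinaryExpr { op_ctxt: Ctxt(0), assembled_in: Ctxt(1) }
}
```

For normal_macro() the two functions return the same context; they only diverge for fragments like frankenstein_macro(), where the operator is passed in from somewhere else.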

(Note that a + b has two "concatenation contexts", so we also need to decide which of them is the "primary concatenation context", and this is not so obvious for cases like

macro m($($a_plus: tt)*) {
    $($a_plus)* b
}

m!(a +)

)

Others?

Privacy.
Currently, fields inside of ..rest in Struct { a, b, ..rest } should be checked in the context of ..rest.
In the future, Enhanced™ Type Privacy should also use expression contexts to avoid checking "implementation details" of macros while still checking their "outputs".

@XAMPPRocky XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 25, 2018
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 16, 2023
Do not provide suggestions when the spans come from expanded code that doesn't point at user code

Hide invalid proc-macro suggestions and track spans
coming from proc-macros pointing at attribute.

Effectively, unless the proc-macro keeps user spans,
suggestions will not be produced for the code they
produce.

r? `@ghost`

Fix rust-lang#107113, fix rust-lang#107976, fix rust-lang#107977, fix rust-lang#108748, fix rust-lang#106720, fix rust-lang#90557.

Could potentially address rust-lang#50141, rust-lang#67373, rust-lang#55146, rust-lang#78862, rust-lang#74043, rust-lang#88514, rust-lang#83320, rust-lang#91520, rust-lang#104071. CC rust-lang#50122, rust-lang#76360.
@petrochenkov
Contributor Author

petrochenkov commented Jun 24, 2024

Update: the compiler eventually converged on the "span of expression" approach, because it's something natural to implement when you don't know anything about this issue.
It's not necessarily good, but that's something we'll have to live with, most likely.

So I'm going to close this issue in favor of #126763 which is supposed to give spans to "complex" code fragments in a more systematic way.

Without "characteristic tokens", contexts for fragments built from tokens with "heterogeneous" spans, coming from unrelated macros, will not be well defined.
In practice they will gravitate towards the context of the first span node (in terms of #126763).
Such fragments can be generated primarily by proc macros (or complex declarative macros).
