How to determine hygienic context for "non-atomic" code fragments #50122

Open
petrochenkov opened this Issue Apr 20, 2018 · 6 comments

petrochenkov commented Apr 20, 2018

Reminder: Syntactic/hygienic context is formally a "chain of expansions" and informally "the place where something is actually written". For example, in this code

macro m($b: expr) {
    a + $b
}

let x = m!(b);

a's context is inside the macro m, while b's context is outside of the macro.
With Macro 2.0 hygiene, names are resolved in the locations where they are "actually written".
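
A stable-Rust analogue may help make this concrete. The sketch below uses macro_rules! instead of the unstable macro items, so it only demonstrates hygiene for local variables; the names add_to_inner and demo are made up for illustration.

```rust
// Assumption: `macro_rules!` stands in for the unstable `macro` items used in
// this thread. It is hygienic for local variables only, which is all this shows.
macro_rules! add_to_inner {
    ($b:expr) => {{
        // This `a` is written inside the macro, so it carries the macro's context.
        let a = 10;
        // `$b` was written by the caller and keeps the caller's context.
        a + $b
    }};
}

fn demo() -> i32 {
    let a = 1;
    // The argument `a` resolves to the caller's `a` (1), not the macro's `a` (10).
    add_to_inner!(a)
}
```

Calling demo() adds the macro's own a to the caller's a, so the two identically named variables never collide.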

For "atomic" tokens like identifiers or punctuation signs the context is unambiguous, but complex entities like expressions or types can be combined from tokens introduced in different contexts. Consider, for example, this code, which ultimately expands into println!("Hello world!"):

macro context_parens($name: tt, $bang: tt, $args: tt) {
    $name $bang ( $args )
}

macro context_hello($name: tt, $bang: tt) {
    context_parens!($name, $bang, "Hello world!")
}

macro context_bang($name: tt) {
    context_hello!($name, !)
}

macro context_println() {
    context_bang!(println)
}

fn main() {
    context_println!();
}

So, what is the "call site" context of the println macro in this case?
Where should we resolve identifiers with call-site hygiene for macros invoked like this?
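
For readers who want to try the chain on stable Rust, here is a rough macro_rules! rendition. It is only a sketch: it swaps println for format so the result can be inspected, the names context_format and greeting are hypothetical, and macro_rules! does not track call-site context the way Macro 2.0 would, so it illustrates the token assembly rather than the resolution question.

```rust
// Each macro contributes one piece of the final invocation `format!("Hello world!")`.
macro_rules! context_parens {
    // Contributes the parentheses and assembles the invocation.
    ($name:tt, $bang:tt, $args:tt) => { $name $bang ( $args ) };
}

macro_rules! context_hello {
    // Contributes the string literal.
    ($name:tt, $bang:tt) => { context_parens!($name, $bang, "Hello world!") };
}

macro_rules! context_bang {
    // Contributes the `!` token.
    ($name:tt) => { context_hello!($name, !) };
}

macro_rules! context_format {
    // Contributes the macro name (assumption: `format` instead of `println`).
    () => { context_bang!(format) };
}

fn greeting() -> String {
    context_format!()
}
```

Every token of the final invocation is written in a different macro body, which is exactly why "the context of the invocation" has no obvious answer.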

Contexts of "non-atomic" entities matter for several reasons other than determining call-site hygiene. For example, in Struct { field1, field2, ..rest } the fields fieldN where N > 2 are checked for privacy in the context of the ..rest fragment, but that fragment may itself be a Frankenstein's monster combined from pieces with different contexts.
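
The privacy check on the remaining fields can be seen without any macros. The following is a minimal sketch with hypothetical names (config, Settings, build); it only shows that the place where ..rest is written decides whether the unnamed private fields are accessible.

```rust
mod config {
    #[derive(Default)]
    pub struct Settings {
        pub width: u32,
        pub height: u32,
        depth: u32, // private to `config`
    }

    impl Settings {
        pub fn with_width(width: u32) -> Settings {
            // Accepted: the `..` fragment is written inside `config`,
            // where the private field `depth` is accessible.
            Settings { width, ..Settings::default() }
        }

        pub fn depth(&self) -> u32 {
            self.depth
        }
    }
}

fn build() -> config::Settings {
    // Writing the same update syntax here is rejected by the compiler,
    // because `depth` is private and `..` would have to copy it:
    // config::Settings { width: 3, ..config::Settings::default() }
    config::Settings::with_width(3)
}
```

Once the .. fragment can come out of a macro, "where it is written" is the hygienic context this issue is asking about.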


Proposed solution:

For each complex entity figure out and document an atomic entity that is "essential" for that complex entity and that serves as a source of hygienic context for the complex entity.

For example, for binary operator expressions the context may be determined by the context of the operator: context($a + $b) = context(+), for the "remaining fields" fragment ..$rest mentioned above the context may be determined by the context of .., etc.

I'm... not sure what that essential atomic token would be for macro invocations, probably ! for bang macros and [] for attribute macros.
(Note that paired delimiters like (), [] and {} always have the same context in a pair).
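
To make the proposal concrete, here is a toy model of the rule. Nothing in it corresponds to actual compiler data structures; Ctxt, Token, Entity and context are hypothetical names, and the choice of ! for bang macros follows the tentative suggestion above.

```rust
/// Hypothetical stand-in for a hygienic context (a "chain of expansions").
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Ctxt(u32);

/// An atomic token: its context is unambiguous.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
struct Token {
    text: &'static str,
    ctxt: Ctxt,
}

/// A few complex entities, each with one "essential" atomic token.
#[allow(dead_code)]
enum Entity {
    Atom(Token),
    Binary { lhs: Box<Entity>, op: Token, rhs: Box<Entity> },
    RestFields { dotdot: Token, base: Box<Entity> },
    BangMacroCall { name: Token, bang: Token },
}

/// The context of a complex entity is the context of its essential token.
fn context(entity: &Entity) -> Ctxt {
    match entity {
        Entity::Atom(token) => token.ctxt,
        Entity::Binary { op, .. } => op.ctxt,             // context($a + $b) = context(+)
        Entity::RestFields { dotdot, .. } => dotdot.ctxt, // context(..$rest) = context(..)
        Entity::BangMacroCall { bang, .. } => bang.ctxt,  // tentative: context of `!`
    }
}

/// Models `a + $b` from the first example: `a` and `+` are written inside `m`.
fn example() -> Ctxt {
    let inside_m = Ctxt(1);
    let outside_m = Ctxt(0);
    let sum = Entity::Binary {
        lhs: Box::new(Entity::Atom(Token { text: "a", ctxt: inside_m })),
        op: Token { text: "+", ctxt: inside_m },
        rhs: Box::new(Entity::Atom(Token { text: "b", ctxt: outside_m })),
    };
    context(&sum)
}
```

In this model the operands never influence the answer, which is what makes the rule easy to document per entity.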

CAD97 commented Apr 20, 2018

mod a {
    const A: i32 = 3;
    macro a() { A }
}
mod b {
    const B: i32 = 2;
    macro b() { B }
}
mod sum {
    macro sum($lhs, $rhs) { $lhs!() + $rhs!() }
}
fn whatami() -> i32 {
    use a::a;
    use b::b;
    use sum::sum;
    sum!(a, b)
}

I think it's pretty clear that should work and expand to

fn whatami() -> i32 {
    ::a::A + ::b::B
}

Does it do so with your proposed rule? Is my intuition wrong and this shouldn't make sense? Is there a different way of composing macro imports and call locations such that it would break?

My intuition is that if I name something at the call site and the macro uses it as an item, it resolves to what it is at the call site. If I name something at the macro def site, it resolves to whatever that name would mean if it had been used in a function at that source location.

petrochenkov commented Apr 21, 2018

@CAD97
Yes, the example should work (after a couple of added pubs and : idents) and the intuition is correct.
The example never makes use of hygiene contexts of "complex" entities though, only contexts of "atomic" identifiers (e.g. a, b), so this issue doesn't apply.

arielb1 commented Apr 25, 2018

@petrochenkov

So I thought the most natural solution would be to use the call-site of the path that invoked the println. On the other hand, that would screw with users, because there would be no way to receive a macro from another syntax context and expand it in your own syntax context.

The second option that I currently think is natural is to use the "span of the expression": the syntax context in which the relevant expression was constructed (is this not a valid concept in some cases? it would be context_parens in your example), and also to extend that idea to all cases (including e.g. the privacy of the misc fields in an ExprStruct).

The above might however create some confusion (e.g. privacy in method calls - if the method name and the method call come from different scopes: what is determined by the method name, and what is determined by the method call?).

The main difference between the "span of expression" and "span of characteristic token" is how you pass around expansion sites, which is by either CPS-passing a macro that constructs the expression, or using a "magic" characteristic token.
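
The two candidate rules can be compared on the context_println chain from the top of the issue with a small model. This is only an illustration under assumed names (Site, Invocation and the two functions are hypothetical); it records where each piece of the final println invocation was written and where the pieces were put together.

```rust
/// Places where tokens of the final invocation could have been written.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Site {
    Main,
    ContextPrintln,
    ContextBang,
    ContextHello,
    ContextParens,
}

/// Where each piece of `println!("Hello world!")` comes from.
#[allow(dead_code)]
struct Invocation {
    name_written_in: Site,
    bang_written_in: Site,
    args_written_in: Site,
    assembled_in: Site,
}

/// "Span of expression": the context in which the pieces were combined.
fn span_of_expression(call: &Invocation) -> Site {
    call.assembled_in
}

/// "Span of characteristic token": the context of `!` for a bang macro.
fn span_of_characteristic_token(call: &Invocation) -> Site {
    call.bang_written_in
}

/// The chain from the original example.
fn println_chain() -> Invocation {
    Invocation {
        name_written_in: Site::ContextPrintln,
        bang_written_in: Site::ContextBang,
        args_written_in: Site::ContextHello,
        assembled_in: Site::ContextParens,
    }
}
```

Under this model the first rule points at context_parens and the second at context_bang, so the two rules only disagree when the characteristic token is passed in from elsewhere.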

arielb1 commented Apr 25, 2018

In "span of expression", you can use

// in mod A

macro foo($expand: ident) {
    $expand!(println!("foo"));
}

// in mod B

macro expand($macro:ident ! $args:tt) {
    $macro ! $args
}
foo!(expand); // this expands `println!("foo")` here.

While in "span of characteristic token", you use

// in mod A

macro foo($bang: tt) {
    println $bang ("foo")
}

// in mod B

foo!(!); // this expands `println!("foo")` here.
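
Both styles can be written today with macro_rules!, which may help when comparing them. This is a sketch with assumed names (foo_cps, foo_bang, via_callback, via_token), and it uses format instead of println so the result is observable. Under current macro_rules! rules the two behave identically; the difference discussed here only shows up once the invocation's call-site context is tracked.

```rust
// "Span of expression" style: the caller supplies a macro that assembles
// the invocation, so the assembly happens in the caller's own macro.
macro_rules! expand {
    ($mac:ident ! $args:tt) => { $mac ! $args };
}

macro_rules! foo_cps {
    ($expand:ident) => { $expand!(format!("foo")) };
}

// "Span of characteristic token" style: the caller supplies only the `!`.
macro_rules! foo_bang {
    ($bang:tt) => { format $bang ("foo") };
}

fn via_callback() -> String {
    foo_cps!(expand)
}

fn via_token() -> String {
    foo_bang!(!)
}
```

The callback style needs an extra helper macro at every use site, while the token style needs the callee to accept a bare punctuation token as an argument.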

nikomatsakis commented May 30, 2018

Hmm, @petrochenkov's initial thought of a "span of characteristic token" is indeed what I initially expected, but I see the appeal of @arielb1's "span of the expression" as well. This obviously affects the question in #50376 as well (which concerns use paths).

I wanted to step back a second and try to establish what our goals are. From my perspective, we should be shooting for two things:

  • Rules we can explain, of course.
  • Rules that mean that "normal macros" behave as expected with respect to hygiene:
    • as a side effect, this should ensure that they work across editions, and that can be a useful guideline.

I was trying to think about the various things that might be tied ultimately to hygiene:

  • Name resolution, of course
    • In the context of a use, this also affects the "global context" (see #50376)
  • Overflow behavior of + etc (checked or unchecked)
    • overflow checks can currently be enabled either globally or per crate; but what should happen for macros?
  • Method name resolution
    • what traits are in scope?
  • Are we in an unsafe section?
  • Others?

I am wondering whether these distinct uses might introduce competing demands (e.g., perhaps one gives "more natural" results with the span-of-expr approach vs span-of-characteristic-token). I suppose we must also consider macro and macro_rules! somewhat distinctly.
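
Two of the items above can be observed with today's macro_rules!, where neither unsafety nor the set of traits in scope is hygienic. The sketch below uses made-up names (read_raw, describe, shapes, Describe) and says nothing about what macro items should do; it only shows the current call-site-driven behavior.

```rust
mod shapes {
    pub trait Describe {
        fn describe(&self) -> String;
    }

    impl Describe for u8 {
        fn describe(&self) -> String {
            format!("u8:{}", self)
        }
    }
}

// The dereference is written in the macro, but whether it is allowed
// depends on an `unsafe` block at the use site.
macro_rules! read_raw {
    ($ptr:expr) => { *$ptr };
}

// The method call is written in the macro, but the trait providing the
// method has to be in scope where the macro is invoked.
macro_rules! describe {
    ($value:expr) => { $value.describe() };
}

fn read_through_macro(value: &i32) -> i32 {
    let ptr = value as *const i32;
    // SAFETY: `ptr` was just derived from a valid reference.
    unsafe { read_raw!(ptr) }
}

fn describe_with_trait_in_scope() -> String {
    // Removing this import makes the macro's method call fail to resolve.
    use shapes::Describe;
    describe!(5u8)
}
```

Both macros depend on something the caller wrote around the invocation, which is the kind of behavior a hygiene rule for complex entities would have to either preserve or change.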

petrochenkov commented May 30, 2018

Yes, @arielb1's suggestion is the primary alternative, but I think the "characteristic token" is preferable.

@nikomatsakis

I wanted to step back a second and try to establish what our goals are.

Simplicity, first of all! This covers "can teach", "can specify", and also implementation complexity.

Unless we are trying to assign the context to something that is never actually written in the source code, as in #50376, our choice should almost never matter, because for a + b the context of + and the "concatenation context" are almost always the same.

In this sense the characteristic token context is simpler to implement and explain, because it's something "real" and visible rather than an abstract point during the expansion process.
"Normal macros" should rarely care about the distinction.

(Note that a + b has two "concatenation contexts", and we also need to decide which of them is the "primary concatenation context"; this is not so obvious for cases like

macro m($($a_plus: tt)*) {
    $($a_plus)* b
}

m!(a +)

)
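
A runnable macro_rules! variant of this case is sketched below. The names append_b and sum_with_macro_b are hypothetical, and the macro declares its own b so that the example compiles on its own.

```rust
macro_rules! append_b {
    // Accept an arbitrary token sequence such as `a +`.
    ($($a_plus:tt)*) => {{
        // `b` is written in the macro body.
        let b = 2;
        // `a` and `+` come from the caller; only `b` and the act of
        // concatenating the pieces belong to the macro.
        $($a_plus)* b
    }};
}

fn sum_with_macro_b() -> i32 {
    let a = 1;
    append_b!(a +)
}
```

Here the operator is written at the call site while the full expression a + b only comes into existence inside the macro, so the two rules would assign it different contexts.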

Others?

Privacy.
Currently, fields inside of ..rest in Struct { a, b, ..rest } should be checked in the context of ..rest.
In the future, Enhanced™ Type Privacy should also use expression contexts to avoid checking "implementation details" of macros while still checking their "outputs".
