Introduce proc_macro::Span::source_text #55780

Open
wants to merge 1 commit into
base: master
from

10 participants
@ogoffart
Contributor

ogoffart commented Nov 8, 2018

A function to extract the actual source behind a Span.

Background: I would like to use syn in a build.rs script to parse Rust code and extract part of the source code. However, syn only gives access to proc_macro2::Span, and I would like to get the source code behind that.
I opened an issue on the proc_macro2 bug tracker for this feature (alexcrichton/proc-macro2#110), and @alexcrichton said the feature should first go upstream in proc_macro. So here it is!

Since most of the Span API is unstable anyway, this is guarded by the same proc_macro_span feature as everything else.
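
To make the intended behaviour concrete, here is a minimal sketch in plain Rust. It is a toy model and not the proc_macro implementation: `ByteSpan` and the free function `source_text` are hypothetical names, and a span is assumed to be a half-open byte range into a single source string. Like the proposed method, the sketch returns an `Option` and yields `None` when the range does not map onto real text.

```rust
/// Hypothetical stand-in for a span: a half-open byte range into one file.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ByteSpan {
    pub lo: usize,
    pub hi: usize,
}

/// Returns the text behind `span`, whitespace and comments included.
/// Yields `None` when the range is inverted, out of bounds, or splits a
/// UTF-8 character (`str::get` performs all three checks).
pub fn source_text(source: &str, span: ByteSpan) -> Option<String> {
    source.get(span.lo..span.hi).map(str::to_owned)
}
```

Calling it with a span that covers `a - b /* note */` gives back exactly that text, while a span that runs past the end of the string gives `None`.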

@rust-highfive

Collaborator

rust-highfive commented Nov 8, 2018

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@nikomatsakis

Contributor

nikomatsakis commented Nov 12, 2018

So this code seems fine, but I'm not sure from a procedural and stability point of view what is the best way to handle this.

@nikomatsakis

Contributor

nikomatsakis commented Nov 12, 2018

@ogoffart

Contributor

ogoffart commented Nov 12, 2018

One doubt I had was whether we should return None, instead of the text of the macro call, for spans belonging to the call site (see the reemit! example in the test).

@alexcrichton

Member

alexcrichton commented Nov 12, 2018

This seems like a reasonable API addition to me and one that we'll want in the long haul. If any procedural macro has whitespace-sensitive parsing associated with it then accessing the source text via means like this is intended to be the main way to actually do the parsing.

I don't think we're on track to stabilize this in the near term, but as a long-term addition I think we'll want this, which to me means it's fine to land it unstable for now in proc_macro.

@dtolnay

Member

dtolnay commented Nov 12, 2018

We might want to strip comments. What do others think? I can get on board with whitespace-sensitive macro DSLs such as languages that differentiate between a-b and a - b. But I would like macros to be forced to use /// and /** */ for any assignment of meaning to text within comments, with // and /* */ guaranteed to be meaningless.

@alexcrichton

Member

alexcrichton commented Nov 12, 2018

I could go either way on comments personally, but one aspect of omitting comments that may be a bit odd is that the difference between a span's byte positions could be very different from the length of the source text, due to comment removal.

@dtolnay

Member

dtolnay commented Nov 12, 2018

Good call. We could sub out the comment with spaces.
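
A rough sketch of what "subbing out the comment with spaces" could look like, written as a standalone function over a source string. This illustrates the idea only and is not code from the PR: `blank_comments` and `fill` are hypothetical names, outer doc comments (`///` and `/** */`) are kept as suggested above, and string and character literals are deliberately not recognized, so a `//` inside a string would be blanked too.

```rust
/// Overwrites every byte except newlines with a space.
fn fill(bytes: &mut [u8]) {
    for b in bytes {
        if *b != b'\n' {
            *b = b' ';
        }
    }
}

/// Replaces `//` and `/* */` comments with spaces, byte for byte, so the
/// result has the same byte length as the input. Outer doc comments are kept.
/// Simplification: string and char literals are not recognized.
pub fn blank_comments(src: &str) -> String {
    let bytes = src.as_bytes();
    let mut out = bytes.to_vec();
    let mut i = 0;
    while i + 1 < bytes.len() {
        if bytes[i] == b'/' && bytes[i + 1] == b'/' {
            // A line comment runs up to, but not including, the newline.
            let end = bytes[i..]
                .iter()
                .position(|&b| b == b'\n')
                .map_or(bytes.len(), |p| i + p);
            // `///` is a doc comment, `////` is not.
            let is_doc = bytes.get(i + 2) == Some(&b'/') && bytes.get(i + 3) != Some(&b'/');
            if !is_doc {
                fill(&mut out[i..end]);
            }
            i = end;
        } else if bytes[i] == b'/' && bytes[i + 1] == b'*' {
            // Block comments nest in Rust, so track the depth.
            let mut depth = 1;
            let mut j = i + 2;
            while j < bytes.len() && depth > 0 {
                if bytes[j..].starts_with(b"/*") {
                    depth += 1;
                    j += 2;
                } else if bytes[j..].starts_with(b"*/") {
                    depth -= 1;
                    j += 2;
                } else {
                    j += 1;
                }
            }
            // `/**` opens a doc comment unless it is `/**/` or `/***`.
            let is_doc = bytes.get(i + 2) == Some(&b'*')
                && !matches!(bytes.get(i + 3), Some(&b'*') | Some(&b'/'));
            if !is_doc {
                fill(&mut out[i..j]);
            }
            i = j;
        } else {
            i += 1;
        }
    }
    // Comment delimiters are ASCII and whole comments are overwritten,
    // so no multi-byte character is ever left half-replaced.
    String::from_utf8(out).expect("blanking whole comments preserves UTF-8")
}
```

Because every comment byte becomes one space, byte offsets of the surrounding tokens are unchanged, which addresses the mismatch between span length and text length raised above.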

@alexcrichton

Member

alexcrichton commented Nov 12, 2018

Seems plausible to me!

@nikomatsakis

Contributor

nikomatsakis commented Nov 14, 2018

I ... I don't know. If we're going to give the source text, I'm inclined to just give the source text, and let macros do weird things with comments. Let the market decide. =)

e.g., sometimes people add "pre and post conditions" in the form of specially formatted comments. That seems not terrible to me.

@ogoffart

Contributor

ogoffart commented Nov 15, 2018

I think we should preserve the comments.

As a use case, the main reason I'm making this change is for the cpp crate, which extracts C++ code. And people use comments in C++ to annotate things for static analyzers. (For example, gcc's -Wimplicit-fallthrough warning understands the /* falls through */ comments in the code.)
(I know that Rust and C++ have different lexing rules regarding comments, but I assume developers can cope with that.)

Another use case would be to print snippets of the code while compiling, for better diagnostics. We want the comments in this case.

@nikomatsakis

Contributor

nikomatsakis commented Nov 16, 2018

@ogoffart interesting. Makes sense to me.

@ogoffart

Contributor

ogoffart commented Nov 17, 2018

What should I do now?

@ogoffart

Contributor

ogoffart commented Nov 21, 2018

Requested changes done.

@Centril

Contributor

Centril commented Nov 22, 2018

I'm worried about giving guarantees to users about whitespace and comments because that forces alternative Rust compiler implementations into preserving such things rather than just throwing such things away permanently during lexing. In other words, should we give a guarantee, this effectively forces all Rust compilers to use a certain compilation model and makes that part of the specification.

If this was not a guarantee but rather "at the compiler's option, you may get whitespace and comments..." then I'd be less worried.

@ogoffart

Contributor

ogoffart commented Nov 23, 2018

That's why it returns an Option. If the compiler does not have access to the actual source code, it can return None.

@Centril

Contributor

Centril commented Nov 23, 2018

@ogoffart Ah; I thought

It only returns a result if the span corresponds to real source code.

referred only to getting None when the code was produced by macros and such...

Can we clarify in the documentation that compilers are not required to give you the actual source code, even in cases where it's not produced by macros?

@petrochenkov

Contributor

petrochenkov commented Nov 23, 2018

It would be good to somehow document this as unstable, "best effort" and restricted to "for diagnostics only".
If the macro succeeds then the observable result should only rely on tokens, but not on this text.

@Centril

Contributor

Centril commented Nov 23, 2018

@petrochenkov Yeah; "best effort" / "for diagnostics only" sounds like appropriate wording; thank you <3.

@roblabla

Contributor

roblabla commented Nov 23, 2018

My specific use-case is a power_assert macro. I want an assertion macro that has the following output:

thread '<main>' panicked at 'assertion failed: bar.val == bar.foo.val
power_assert!(bar.val == bar.foo.val)
              |   |   |  |   |   |
              |   3   |  |   |   2
              |       |  |   Foo { val: 2 }
              |       |  Bar { val: 3, foo: Foo { val: 2 } }
              |       false
              Bar { val: 3, foo: Foo { val: 2 } }
', examples/normal.rs:26

In order to do this, I get the span of the full expression (bar.val == bar.foo.val), and then the span of each internal component. By looking at the Span::start(), I am able to place the labels at the correct position (basically, component.start().column - full.start().column will give me the column the expression starts at within the full expression).

For this to work, Span::start() and the string I print out need to match.

If this was not a guarantee but rather "at the compiler's option, you may get whitespace and comments..." then I'd be less worried.

If we don't get whitespace and comments, then we run the risk of having Span::start() become out of sync with the raw text, breaking the above functionality if a comment was put inside the assert macro.
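
The column arithmetic described above can be sketched without any compiler API. This is a simplified illustration, not the power_assert implementation: `marker_line` is a hypothetical helper, and the columns are assumed to be plain integers in the same unit as the printed text (for example, values read from `Span::start().column`).

```rust
/// Builds the line of `|` markers printed under an expression.
/// `full_col` is the column where the whole expression starts;
/// `part_cols` are the starting columns of its components.
pub fn marker_line(full_col: usize, part_cols: &[usize]) -> String {
    // Offsets relative to the start of the full expression; components that
    // would start before it are skipped rather than underflowing.
    let offsets: Vec<usize> = part_cols
        .iter()
        .filter_map(|&col| col.checked_sub(full_col))
        .collect();
    let width = offsets.iter().max().map_or(0, |&m| m + 1);
    let mut line = vec![' '; width];
    for off in offsets {
        line[off] = '|';
    }
    line.into_iter().collect()
}
```

For `bar.val == bar.foo.val` starting at column 14 (just after `power_assert!(`), component columns 14, 18, 22, 25, 29 and 33 produce the marker row shown in the output above. The markers only line up if the printed text has the same layout as the original source, which is why stripped whitespace or comments would break it.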

@ogoffart

Contributor

ogoffart commented Nov 23, 2018

@roblabla: do you take into account the fact that the column is in UTF-8 bytes?

   /* 🐘 */  power_assert!(normalize("🐘") /* Éléphant emoji */ == "Éléphant" );

In order to do that, you indeed need to know exactly what is in the comments (how many bytes correspond to how many code points). I guess this should be computed with UnicodeWidthStr::width(...).
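
A small standard-library sketch of the byte-versus-code-point gap, assuming (as stated above) that the reported column is a byte offset into the line. `byte_col_to_char_col` is a hypothetical helper. It counts Unicode scalar values only; the terminal display width is a third quantity that the standard library does not compute.

```rust
/// Converts a byte offset within `line` into a count of Unicode scalar
/// values. Returns `None` if the offset is out of range or falls inside
/// a multi-byte character.
pub fn byte_col_to_char_col(line: &str, byte_col: usize) -> Option<usize> {
    line.get(..byte_col).map(|prefix| prefix.chars().count())
}
```

In a line that begins with `/* 🐘 */ power_assert!(`, the macro name starts at byte 11 but at code point 8, because the emoji occupies four bytes. Padding the marker row by the byte column would shift every label three positions to the right.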

@Centril

Contributor

Centril commented Nov 23, 2018

@roblabla

If we don't get whitespace and comments, then we run the risk of having Span::start() become out of sync with the raw text, breaking the above functionality if a comment was put inside the assert macro.

Can you not have some fallback such that the power_assert! macro just gives less good "diagnostics" when Span::source_text() returns None? It seems to me that you'll have to handle that anyway if power_assert! is used inside a macro (the macro wouldn't be very good if you couldn't...)? Is there some difference in terms of correctness if None is returned here?
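
One way to picture such a fallback, as a sketch only: `Rendered` and `render_expr` are hypothetical names, and `token_text` stands for whatever string the macro can rebuild from its tokens alone. The result differs only in diagnostic quality, never in whether the macro succeeds.

```rust
/// What the macro will print, and whether column markers can be trusted.
#[derive(Debug, PartialEq)]
pub struct Rendered {
    pub expr: String,
    pub aligned: bool,
}

/// Prefers the original source text; falls back to text rebuilt from tokens.
pub fn render_expr(source_text: Option<String>, token_text: &str) -> Rendered {
    match source_text {
        // Exact source: span columns match this text, so markers line up.
        Some(text) => Rendered { expr: text, aligned: true },
        // No source available: still print the expression, but skip markers.
        None => Rendered { expr: token_text.to_owned(), aligned: false },
    }
}
```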

@ogoffart

Contributor

ogoffart commented Nov 23, 2018

I added a note that this should not be relied upon, and is only there for diagnostics.

@bors

Contributor

bors commented Nov 30, 2018

☔️ The latest upstream changes (presumably #49219) made this pull request unmergeable. Please resolve the merge conflicts.

@ogoffart ogoffart force-pushed the ogoffart:span_source_text branch from 31fc090 to e88b0d9 Dec 1, 2018

@ogoffart

Contributor

ogoffart commented Dec 1, 2018

Rebased.

@ogoffart

Contributor

ogoffart commented Dec 6, 2018

Ping.

(Also cc @eddyb as the rebase touches the proc_macro protocol)

@eddyb

Member

eddyb commented Dec 6, 2018

cc #56474

Also, I'll take a look, but adding methods to the proc_macro protocol should be straight-forward and uneventful most of the time (assuming you only add a single line to src/libproc_macro/bridge per method, and make no other changes).

@eddyb

eddyb approved these changes Dec 6, 2018

@ogoffart

Contributor

ogoffart commented Dec 7, 2018

adding methods to the proc_macro protocol should be straight-forward and uneventful most of the time

Just wondering if there was no versioning to do, as it seems calling a non-existing function would make rustc panic in an unreachable!.
But I suppose then that using a proc_macro crate that is newer than rustc is not possible.

@eddyb

Member

eddyb commented Dec 7, 2018

@ogoffart I have considered having some sort of version-like sanity check, but it can be fully automatic, because the problem can only appear at stage0, if somehow the locally built libproc_macro is used with the downloaded compiler (which the build system should prevent anyway).

The expectation is that libproc_macro's built from the same source are ABI-compatible, despite being built by different Rust compilers, so anyone modifying libproc_macro's source shouldn't have to worry about compatibility at all.
