Diagnostic Improvements #1600

jsmall-zzz · 2020-11-09T22:10:02Z

When a source location is references a token that is not directly part of a input source file, Slang may not give very helpful diagnostics as to the origin of the error. A somewhat contrived example is in token-paste.slang

#define SOME %
#define THING %

#define A SOME
#define B THING

#define PASTE2(x, y) x##y
#define PASTE(x, y) PASTE2(x, y)

PASTE(A, B)

Slang outputs

token paste(1): error 20001: unexpected '%', expected identifier
tests/diagnostics/token-paste-location.slang(10): note: see token pasted location

That line 10 is the line with the ## macro paste.

Clang produces the following ...

<source>:11:1: error: pasting formed '%%', an invalid preprocessing token
PASTE(A, B)
^
<source>:8:21: note: expanded from macro 'PASTE'
#define PASTE(x, y) PASTE2(x, y)
                    ^
<source>:7:23: note: expanded from macro 'PASTE2'
#define PASTE2(x, y) x##y
                      ^
<source>:11:1: error: expected unqualified-id
<source>:8:21: note: expanded from macro 'PASTE'
#define PASTE(x, y) PASTE2(x, y)
                    ^
<source>:7:22: note: expanded from macro 'PASTE2'
#define PASTE2(x, y) x##y
                     ^
<source>:1:14: note: expanded from macro 'x'
#define SOME %

Gcc displays

<source>:1:14: error: pasting "%" and "%" does not give a valid preprocessing token
    1 | #define SOME %
      |              ^
<source>:7:22: note: in definition of macro 'PASTE2'
    7 | #define PASTE2(x, y) x##y
      |                      ^
<source>:11:1: note: in expansion of macro 'PASTE'
   11 | PASTE(A, B)
      | ^~~~~
<source>:4:11: note: in expansion of macro 'SOME'
    4 | #define A SOME
      |           ^~~~
<source>:11:7: note: in expansion of macro 'A'
   11 | PASTE(A, B)
      |       ^
<source>:1:14: error: expected unqualified-id before '%' token
    1 | #define SOME %
      |              ^
<source>:7:22: note: in definition of macro 'PASTE2'
    7 | #define PASTE2(x, y) x##y
      |                      ^
<source>:11:1: note: in expansion of macro 'PASTE'
   11 | PASTE(A, B)
      | ^~~~~
<source>:4:11: note: in expansion of macro 'SOME'
    4 | #define A SOME
      |           ^~~~
<source>:11:7: note: in expansion of macro 'A'
   11 | PASTE(A, B)
      |       ^

These errors are better at explaining what the error is, as well as how it happened through various macro applications.

There's a few things that we could improve in Slang

As well as listing the line number, we could list the contents of the line.
** Additionally underneath the line, we could display a caret (^) showing where in the line the location is
The macro paste must produce a valid token which %% isn't - Clang states this, Slang says 'unexpected %, expected identifier' - which doesn't seem quite right even
After the error Clang lists the macro expansions that led to the problem - Slang can only go as far as noting where the ## occurred in the source, which may not be very relevant to the specific problem.

So to number the suggested improvements

List the source line underneath the location, along with a caret. It might be nice to have an option to limit the line length, so if the line is too long it only displays the part around where the error occurred. If we had this feature, the max length should be configurable, and switch off-able.
Improvements around reporting of problems with a macro paste - might be as simple as better identifying the issue here
Tracking and display of macro expansion

The first two points are relatively simple to implement. The 3rd point probably deserves some discussion.

The most obvious first stab at implementing the macro expansion feature would be for each macro expansion, to write the tokens from the macro into a SourceFile, and then create a SourceView that uses it. Then use the SourceLocs for the start of each token to identify each token in the output. That stored in the SourceFile there could be a reference to macro that was expanded. Moreover this could also contain information about what tokens are from the macro, and what are from which parameter. There might also require some mechanism to indicate what tokens are due to a paste, and if so of what.

This might be a reasonable first stab at this feature, but a few observations

Every macro expansion will require multiple objects and allocations

At a minimum a SourceView, SourceFile, the buffer holding all the tokens, and other buffers to describe the token origin, source locations

Every parameter token, will need to hold the SourceLoc of it's origin
The SourceManager system arguably assumes some kind of immutability of the source

On point 3 SourceLocs are currently 32 bits. That every expansion will consume a new range of SourceLocs.

For a compilation when using a source file for a request, we could check to see if the timestamp had changed and reload it (and use new SourceLocs). Currently we don't do this - but it might be better overall behavior.

That a SourceManager is associated with a Linkage. That a Linkage may be involved with multiple compilations. That even though the Source may be static - the macro expansions cannot be assumed to be, and so each CompileRequest will expand macros and consume SourceLocs.

What to do?

Transitory Macro Expansion Source Manager

When doing a request, we could create a SourceManager that parented the manger that held source. The idea being that after the compilation we throw away this 'macro expansion' SourceManager. This doesn't quite work though, because if new Source is loaded, it can only be loaded into the 'macro expansion' SourceManager to avoid a clash of SourceLoc runs, and so would repeatedly reload 'static' source.

This could be worked around by having the macro expansion SourceManager take a range that we define cannot be used by it's parent 'source' manager. We could do this by say making 31 bits available of SourceLoc for Source and macro expansion, and use the top bit to determine which is which.

64 bit source Locs

Another tack would be to use 64 bit SourceLoc and perhaps 'not worry' - as such a clash then becomes very unlikely in most typical usage scenarios. Also whilst it is attractive from a simplicity point of view, all of the 'unused' macro expansions in effect become garbage. That garbage not only consumes memory, but increasing compute in terms of remapping back to Source. This is leaving aside the extra memory consumption of doubling the SourceLoc size.

For both these reasons it seems worth the extra effort to avoid this expansion.

SourceFile/View for a macro expansion?

Not only does this require 2 objects when ideally one would suffice, neither of them are really set up to hold the information desired - the macro used, how each token is produced, the SourceLocs of their source etc.

Therefore it may make sense to have a base class that say 'SourceView' and 'MacroExpansionView' (or whatever) derive from.

Serialization

There are already limitation around how SourceLocs are handled with serialization. That during serialization the easiest thing to do would be to replace any 'transitory' SourceLocs (ie from macro pasting) with the nearest source file SourceLoc. If we use bit 31 to identify 'transitory' or actual source, we have a fast way to determine these locations.

Other Details

We might not have to use up a SourceLoc range that is the size of all the text of the tokens concatenated, perhaps we use a SourceLoc range which uses one value for each Token. Doing so provides a more simple mapping from the SourceLoc to say the information about the token (it's then just a simple index).

The text was updated successfully, but these errors were encountered:

natduca · 2023-12-01T19:35:10Z

We're doing a bug bankrupcy --- please file a new bug or reopen if you see a specific thing we should do.

jsmall-zzz mentioned this issue Nov 10, 2020

Include hierarchy output #1595

Merged

natduca closed this as not planned Won't fix, can't repro, duplicate, stale Dec 1, 2023

natduca added the inactive:dropped closed to clear backlog label Dec 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diagnostic Improvements #1600

Diagnostic Improvements #1600

jsmall-zzz commented Nov 9, 2020

natduca commented Dec 1, 2023

Diagnostic Improvements #1600

Diagnostic Improvements #1600

Comments

jsmall-zzz commented Nov 9, 2020

What to do?

Transitory Macro Expansion Source Manager

64 bit source Locs

SourceFile/View for a macro expansion?

Serialization

Other Details

natduca commented Dec 1, 2023