Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diagnostic Improvements #1600

Closed
jsmall-zzz opened this issue Nov 9, 2020 · 1 comment
Closed

Diagnostic Improvements #1600

jsmall-zzz opened this issue Nov 9, 2020 · 1 comment
Labels
inactive:dropped closed to clear backlog

Comments

@jsmall-zzz
Copy link
Contributor

When a source location is references a token that is not directly part of a input source file, Slang may not give very helpful diagnostics as to the origin of the error. A somewhat contrived example is in token-paste.slang

#define SOME %
#define THING %

#define A SOME
#define B THING

#define PASTE2(x, y) x##y
#define PASTE(x, y) PASTE2(x, y)

PASTE(A, B)

Slang outputs

token paste(1): error 20001: unexpected '%', expected identifier
tests/diagnostics/token-paste-location.slang(10): note: see token pasted location

That line 10 is the line with the ## macro paste.

Clang produces the following ...

<source>:11:1: error: pasting formed '%%', an invalid preprocessing token
PASTE(A, B)
^
<source>:8:21: note: expanded from macro 'PASTE'
#define PASTE(x, y) PASTE2(x, y)
                    ^
<source>:7:23: note: expanded from macro 'PASTE2'
#define PASTE2(x, y) x##y
                      ^
<source>:11:1: error: expected unqualified-id
<source>:8:21: note: expanded from macro 'PASTE'
#define PASTE(x, y) PASTE2(x, y)
                    ^
<source>:7:22: note: expanded from macro 'PASTE2'
#define PASTE2(x, y) x##y
                     ^
<source>:1:14: note: expanded from macro 'x'
#define SOME %

Gcc displays

<source>:1:14: error: pasting "%" and "%" does not give a valid preprocessing token
    1 | #define SOME %
      |              ^
<source>:7:22: note: in definition of macro 'PASTE2'
    7 | #define PASTE2(x, y) x##y
      |                      ^
<source>:11:1: note: in expansion of macro 'PASTE'
   11 | PASTE(A, B)
      | ^~~~~
<source>:4:11: note: in expansion of macro 'SOME'
    4 | #define A SOME
      |           ^~~~
<source>:11:7: note: in expansion of macro 'A'
   11 | PASTE(A, B)
      |       ^
<source>:1:14: error: expected unqualified-id before '%' token
    1 | #define SOME %
      |              ^
<source>:7:22: note: in definition of macro 'PASTE2'
    7 | #define PASTE2(x, y) x##y
      |                      ^
<source>:11:1: note: in expansion of macro 'PASTE'
   11 | PASTE(A, B)
      | ^~~~~
<source>:4:11: note: in expansion of macro 'SOME'
    4 | #define A SOME
      |           ^~~~
<source>:11:7: note: in expansion of macro 'A'
   11 | PASTE(A, B)
      |       ^

These errors are better at explaining what the error is, as well as how it happened through various macro applications.

There's a few things that we could improve in Slang

  • As well as listing the line number, we could list the contents of the line.
    ** Additionally underneath the line, we could display a caret (^) showing where in the line the location is
  • The macro paste must produce a valid token which %% isn't - Clang states this, Slang says 'unexpected %, expected identifier' - which doesn't seem quite right even
  • After the error Clang lists the macro expansions that led to the problem - Slang can only go as far as noting where the ## occurred in the source, which may not be very relevant to the specific problem.

So to number the suggested improvements

  1. List the source line underneath the location, along with a caret. It might be nice to have an option to limit the line length, so if the line is too long it only displays the part around where the error occurred. If we had this feature, the max length should be configurable, and switch off-able.
  2. Improvements around reporting of problems with a macro paste - might be as simple as better identifying the issue here
  3. Tracking and display of macro expansion

The first two points are relatively simple to implement. The 3rd point probably deserves some discussion.

The most obvious first stab at implementing the macro expansion feature would be for each macro expansion, to write the tokens from the macro into a SourceFile, and then create a SourceView that uses it. Then use the SourceLocs for the start of each token to identify each token in the output. That stored in the SourceFile there could be a reference to macro that was expanded. Moreover this could also contain information about what tokens are from the macro, and what are from which parameter. There might also require some mechanism to indicate what tokens are due to a paste, and if so of what.

This might be a reasonable first stab at this feature, but a few observations

  1. Every macro expansion will require multiple objects and allocations
  • At a minimum a SourceView, SourceFile, the buffer holding all the tokens, and other buffers to describe the token origin, source locations
  1. Every parameter token, will need to hold the SourceLoc of it's origin
  2. The SourceManager system arguably assumes some kind of immutability of the source

On point 3 SourceLocs are currently 32 bits. That every expansion will consume a new range of SourceLocs.

For a compilation when using a source file for a request, we could check to see if the timestamp had changed and reload it (and use new SourceLocs). Currently we don't do this - but it might be better overall behavior.

That a SourceManager is associated with a Linkage. That a Linkage may be involved with multiple compilations. That even though the Source may be static - the macro expansions cannot be assumed to be, and so each CompileRequest will expand macros and consume SourceLocs.

What to do?

Transitory Macro Expansion Source Manager

When doing a request, we could create a SourceManager that parented the manger that held source. The idea being that after the compilation we throw away this 'macro expansion' SourceManager. This doesn't quite work though, because if new Source is loaded, it can only be loaded into the 'macro expansion' SourceManager to avoid a clash of SourceLoc runs, and so would repeatedly reload 'static' source.

This could be worked around by having the macro expansion SourceManager take a range that we define cannot be used by it's parent 'source' manager. We could do this by say making 31 bits available of SourceLoc for Source and macro expansion, and use the top bit to determine which is which.

64 bit source Locs

Another tack would be to use 64 bit SourceLoc and perhaps 'not worry' - as such a clash then becomes very unlikely in most typical usage scenarios. Also whilst it is attractive from a simplicity point of view, all of the 'unused' macro expansions in effect become garbage. That garbage not only consumes memory, but increasing compute in terms of remapping back to Source. This is leaving aside the extra memory consumption of doubling the SourceLoc size.

For both these reasons it seems worth the extra effort to avoid this expansion.

SourceFile/View for a macro expansion?

Not only does this require 2 objects when ideally one would suffice, neither of them are really set up to hold the information desired - the macro used, how each token is produced, the SourceLocs of their source etc.

Therefore it may make sense to have a base class that say 'SourceView' and 'MacroExpansionView' (or whatever) derive from.

Serialization

There are already limitation around how SourceLocs are handled with serialization. That during serialization the easiest thing to do would be to replace any 'transitory' SourceLocs (ie from macro pasting) with the nearest source file SourceLoc. If we use bit 31 to identify 'transitory' or actual source, we have a fast way to determine these locations.

Other Details

We might not have to use up a SourceLoc range that is the size of all the text of the tokens concatenated, perhaps we use a SourceLoc range which uses one value for each Token. Doing so provides a more simple mapping from the SourceLoc to say the information about the token (it's then just a simple index).

@natduca
Copy link

natduca commented Dec 1, 2023

We're doing a bug bankrupcy --- please file a new bug or reopen if you see a specific thing we should do.

@natduca natduca closed this as not planned Won't fix, can't repro, duplicate, stale Dec 1, 2023
@natduca natduca added the inactive:dropped closed to clear backlog label Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
inactive:dropped closed to clear backlog
Projects
None yet
Development

No branches or pull requests

2 participants