Replies: 3 comments 12 replies
-
Thanks, @yassere for writing all these questions down. I've mainly taken a look at Roslyn and Swift and want to give examples of their APIs. This might become useful when discussing the different approaches. Roslyn / Swift APIsQuery basedprivate static void AnalyzeTree(SyntaxTreeAnalysisContext context)
{
var tree = context.Tree;
var root = tree.GetRoot(context.CancellationToken);
foreach (var ifStmt in root.DescendantNodes().OfType<IfStatementSyntax>())
{
if (ifStmt.Statement is EmptyStatementSyntax)
{
context.ReportDiagnostic(Diagnostic.Create(Rule, ifStmt.GetLocation()));
}
}
}
VisitorRead-only tree visitor public class Test: CSharpSyntaxVisitor<Signal>
{
public override Signal VisitIfStatement(IfStatementSyntax node)
{
base.VisitIfStatement(node);
if (node.Statement is EmptyStatementSyntax)
{
return Signal.REPORT;
}
return Signal.CONTINUE;
}
}
Open Questions
RewriterCreates a new tree private class Rewriter : CSharpSyntaxRewriter
{
private readonly SemanticDocument _document;
private readonly Func<SyntaxNode, bool> _predicate;
private readonly CancellationToken _cancellationToken;
public Rewriter(
SemanticDocument document,
Func<SyntaxNode, bool> predicate,
CancellationToken cancellationToken)
{
_document = document;
_predicate = predicate;
_cancellationToken = cancellationToken;
}
private ExpressionSyntax SimplifyInvocation(InvocationExpressionSyntax invocation)
{
var expression = invocation.Expression;
if (expression is MemberAccessExpressionSyntax memberAccess)
{
var symbolMap = SemanticMap.From(_document.SemanticModel, memberAccess.Expression, _cancellationToken);
var anySideEffects = symbolMap.AllReferencedSymbols.Any(s =>
s.Kind is SymbolKind.Method or SymbolKind.Property);
if (anySideEffects)
{
var annotation = WarningAnnotation.Create("Warning: Expression may have side effects. Code meaning may change.");
expression = expression.ReplaceNode(memberAccess.Expression, memberAccess.Expression.WithAdditionalAnnotations(annotation));
}
}
return expression.Parenthesize()
.WithAdditionalAnnotations(Formatter.Annotation);
}
public override SyntaxNode VisitSimpleLambdaExpression(SimpleLambdaExpressionSyntax node)
{
if (CanSimplify(_document, node, _cancellationToken))
{
var invocation = TryGetInvocationExpression(node.Body);
if (invocation != null)
{
return SimplifyInvocation(invocation);
}
}
return base.VisitSimpleLambdaExpression(node);
}
}
Personal PreferenceI personally very much like Roslyn's "query based" API because I'm not very used to writing and thinking in visitors and it often requires storing state on the visitor, not allowing you to keep all state local to your function. I also very much like that the Rewritter isn't changing the tree in place but instead produces a new one. It avoids many of the visibility issues (part of because it's visiting depth-first). |
Beta Was this translation helpful? Give feedback.
-
ValidationRoslyn uses a generated
public override SyntaxNode? VisitAliasQualifiedName(AliasQualifiedNameSyntax node)
=> node.Update((IdentifierNameSyntax?)Visit(node.Alias) ?? throw new ArgumentNullException("alias"), VisitToken(node.ColonColonToken), (SimpleNameSyntax?)Visit(node.Name) ?? throw new ArgumentNullException("name")); |
Beta Was this translation helpful? Give feedback.
-
If we're considering not using the mutable API of rowan, we may want to have something like SyntaxAnnotation and TrackNodes from Roslyn so we can have a way to keep track of specific nodes across transformations. |
Beta Was this translation helpful? Give feedback.
-
These are some notes and open questions regarding syntax tree transforms and designing the infrastructure equivalent to the original Rome compiler. In building the new higher-performance version of Rome, we may choose to make different tradeoffs compared to the original version (referred to below as RomeJS). For context, here are some (simplified) characteristics of how syntax tree transformations work in RomeJS:
path.parent
.ancestryPaths
but the nodes in those paths might be stale and are not necessarily ancestors of the current node. In other words,path.node
might not belong to the same tree aspath.ancestryPaths[1].node
. This is a drawback of the prior optimization.For the new version of Rome, we've chosen to use a concrete syntax tree (CST) which losslessly represents source code and allows for fine-grained control of syntax transformations. For operations where a CST would be cumbersome, the syntax tree could be lowered into a more suitable intermediate representation (IR).
The CST is built using a forked version of
rowan
which has a convenient traversal API with methods likeparent()
,ancestors()
,siblings()
, andchildren()
available directly on syntax nodes. The entire CST is reachable from any node. This property relies on having parent pointers on syntax nodes.Questions
What sort of transforms should operate on a CST?
When should the compiler perform additional traversals?
Should the compiler call multiple independent visitors during a single traversal?
During traversal, should visitors receive a live view of the syntax tree?
ancestryPaths
) is passed to subsequent visitors during the same traversal.How should visitors report diagnostics and perform transformations?
Should lint rules have a way to avoid computing auto-fixes when they aren't necessary?
Should it be possible for a single transform to replace multiple independent nodes?
Do CST transforms need to retain source mapping information?
Should there be a way for a visitor to signal that it doesn't care about certain subtrees?
Should there be a lint rule API that isn't based on visiting nodes?
If, when, and how to validate transformed syntax trees?
This is not a comprehensive document. There are certainly more questions to discuss, and it's likely we'll want to use code examples to explore these questions.
Beta Was this translation helpful? Give feedback.
All reactions