C# Roslyn

Louis Kueh edited this page Jan 17, 2019 · 4 revisions

Compilers build a deep understanding of the code they process.

However, that knowledge is unavailable to anyone except the people who implement the compiler.

There is increasing reliance on IDE features such as IntelliSense:

  • Find all references
  • Go to definition

Code analysis tools that improve code quality need access to more of the deep code knowledge that compilers possess.

Roslyn opens up the black box: it exposes this knowledge through APIs that tools and applications can use for code-related tasks.

Opportunities for

  • meta-programming
  • code generation

Compiler pipeline

  • Parse Phase - source is tokenized and parsed into syntax that follows the language grammar
  • Declaration Phase - declarations from source and imported metadata are analyzed to form symbols
  • Bind Phase - identifiers in code are matched to symbols
  • Emit Phase - all information built up by the compiler is emitted as an assembly

The API exposes the information available at each stage of the pipeline.

  • The parsing phase is exposed as a syntax tree
  • The declaration phase as a hierarchical symbol table
  • The binding phase as a model that exposes the result of the compiler’s semantic analysis
  • The emit phase as an API that produces IL byte codes.
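As a rough, language-neutral illustration of "each phase exposed as an API", the sketch below uses Python's standard library rather than Roslyn itself (Roslyn is a .NET API and is not shown here). Python's `ast`, `symtable`, and `compile` loosely correspond to the parse, declaration, and emit phases; the binding phase has no direct counterpart in this sketch, and the sample source string is an assumption made for the example.

```python
import ast
import symtable

# Hypothetical one-line program used only for this illustration.
SOURCE = "total = 1 + 2\n"

# Parse phase analogue: source text -> syntax tree.
tree = ast.parse(SOURCE)
assign = tree.body[0]

# Declaration phase analogue: a symbol table built from the same source.
table = symtable.symtable(SOURCE, "<demo>", "exec")
declared = sorted(table.get_identifiers())

# Emit phase analogue: compile the tree to a bytecode object and run it.
code = compile(tree, "<demo>", "exec")
namespace = {}
exec(code, namespace)

print(type(assign.value).__name__)  # kind of the right-hand-side expression node
print(declared)                     # names declared in the module
print(namespace["total"])           # result of running the emitted code
```

The point is only that each stage hands back an inspectable object (tree, symbol table, compiled code) instead of hiding it inside the compiler.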

Compiler APIs

Syntactic and semantic information is exposed at each phase of the compiler pipeline.

Diagnostic APIs

Diagnostics cover syntax, semantic, and definite-assignment errors. User-defined analyzers can be plugged into a compilation.

Integrates well with MSBuild/Visual Studio tools.

Workspace API

Organizes information about the project into a single object model. Allows direct access to the compiler models without needing to parse files or configure and manage project dependencies.

Surfaces common APIs such as Find All References, formatting, and code generation.

Working with Syntax

Syntax trees represent the lexical and syntactic structure of the source code.

  • Allow tools (IDEs, analysis tools) to see and process the syntactic structure of source code in a user's project
  • Enable tools to modify and rearrange source code

Syntax Trees

Syntax trees are the structure used for:

  • Compilation
  • Code analysis
  • Binding
  • Refactoring

Every part of the source code is identified and categorized into well-known structural language elements.

  • Hold all source information in full fidelity (text, lexical tokens, whitespace, errors)

Lexical analysis (lexing, or tokenization) is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an assigned and thus identified meaning).

  • Two-way street: a tree can round-trip back to the exact text it was parsed from, and creating a modified tree is in effect a way to edit the text.
  • Immutable and thread-safe.
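The definition above can be made concrete with a minimal tokenizer sketch. This is not Roslyn's lexer: the token kinds (`IntegerLiteral`, `Identifier`, `Operator`, `Whitespace`) and the tiny expression language are assumptions chosen for illustration.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str  # the assigned meaning, e.g. "IntegerLiteral"
    text: str  # the exact characters that were matched


# Hypothetical token kinds for a tiny expression language.
TOKEN_SPEC = [
    ("IntegerLiteral", r"\d+"),
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Operator", r"[+\-*/=]"),
    ("Whitespace", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{kind}>{rx})" for kind, rx in TOKEN_SPEC))


def tokenize(source: str) -> List[Token]:
    """Convert a character sequence into a sequence of classified tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("x = 40 + 2")
for token in tokens:
    if token.kind != "Whitespace":
        print(token.kind, repr(token.text))
```

Because whitespace is kept as tokens rather than discarded, joining the `text` of every token reproduces the input exactly, which is the same full-fidelity idea the syntax tree relies on.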

A syntax tree is a tree data structure, made of nodes, tokens and trivia.

Syntax Nodes

Represent syntactic constructs such as:

  • Declarations
  • Statements
  • Clauses
  • Expressions

Nodes are non-terminal: they always have other nodes and tokens as children. For example, the BinaryExpressionSyntax node class has three additional properties specific to binary operators: Left, OperatorToken, and Right. The type of Left and Right is ExpressionSyntax, and the type of OperatorToken is SyntaxToken.
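The shape of such a node can be sketched with a small model. The classes below only borrow Roslyn's names for readability; they are hypothetical Python stand-ins, not the real .NET types, and `text_of` is a helper invented for this example. Frozen dataclasses are used to mimic the immutability of syntax trees: an "edit" builds a new node and leaves the original untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SyntaxToken:
    kind: str
    text: str


@dataclass(frozen=True)
class ExpressionSyntax:
    """Base class for expression nodes in this toy model."""


@dataclass(frozen=True)
class LiteralExpressionSyntax(ExpressionSyntax):
    token: SyntaxToken


@dataclass(frozen=True)
class BinaryExpressionSyntax(ExpressionSyntax):
    left: ExpressionSyntax       # child node
    operator_token: SyntaxToken  # child token
    right: ExpressionSyntax      # child node


def text_of(node: ExpressionSyntax) -> str:
    """Hypothetical helper: rebuild source text by walking the node's children."""
    if isinstance(node, LiteralExpressionSyntax):
        return node.token.text
    return f"{text_of(node.left)} {node.operator_token.text} {text_of(node.right)}"


one = LiteralExpressionSyntax(SyntaxToken("IntegerLiteral", "1"))
two = LiteralExpressionSyntax(SyntaxToken("IntegerLiteral", "2"))
expr = BinaryExpressionSyntax(one, SyntaxToken("Plus", "+"), two)

# Editing means creating a new node; the original stays as it was.
edited = replace(expr, operator_token=SyntaxToken("Minus", "-"))

print(text_of(expr))
print(text_of(edited))
```

Since no node can change after construction, unchanged subtrees (here `one` and `two`) can be shared safely between the old and new trees, which is also what makes the structure safe to read from several threads.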

Syntax Tokens

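The smallest syntactic fragments of code: the terminals of the grammar, such as keywords, identifiers, literals, and punctuation. Tokens are never parents of other nodes or tokens.

  • E.g. an integer literal token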

Syntax Trivia

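Parts of the source text that are largely insignificant for understanding the code, e.g. whitespace, comments, and preprocessor directives.

  • Attached to tokens through their LeadingTrivia and TrailingTrivia collections.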
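How trivia hangs off tokens, and why that makes the tree round-trippable, can be sketched as follows. This is a toy model, not Roslyn's lexer: the `Token` class, the `lex` function, and the attachment rule (trivia on the same line after a token, up to and including the newline, is trailing; everything else becomes leading trivia of the next token) are simplifying assumptions for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Every character falls into exactly one of these groups, so nothing is lost.
PIECE = re.compile(
    r"(?P<comment>//[^\n]*)|(?P<newline>\n)|(?P<space>[^\S\n]+)|(?P<token>\w+|[^\w\s])"
)


@dataclass(frozen=True)
class Token:
    leading_trivia: Tuple[str, ...]
    text: str
    trailing_trivia: Tuple[str, ...]

    def to_full_string(self) -> str:
        return "".join(self.leading_trivia) + self.text + "".join(self.trailing_trivia)


def lex(source: str) -> Tuple[List[Token], List[str]]:
    """Return tokens with attached trivia, plus any trivia left at end of file."""
    tokens: List[Token] = []
    pending: List[str] = []  # trivia waiting to become leading trivia
    text = None              # text of the token currently being built
    leading: List[str] = []
    trailing: List[str] = []
    same_line = False        # True while still on the current token's line
    for m in PIECE.finditer(source):
        if m.lastgroup == "token":
            if text is not None:
                tokens.append(Token(tuple(leading), text, tuple(trailing)))
            leading, text, trailing = pending, m.group(), []
            pending, same_line = [], True
        elif same_line:
            trailing.append(m.group())
            same_line = m.lastgroup != "newline"
        else:
            pending.append(m.group())
    if text is not None:
        tokens.append(Token(tuple(leading), text, tuple(trailing)))
    return tokens, pending


source = "// add\nx = 1 + 2 // sum\n"
tokens, leftover = lex(source)
round_trip = "".join(t.to_full_string() for t in tokens) + "".join(leftover)

print([t.text for t in tokens])
print(tokens[0].leading_trivia)
print(tokens[-1].trailing_trivia)
print(round_trip == source)
```

Because comments and whitespace are stored rather than thrown away, concatenating every token with its trivia reproduces the original text character for character.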
