* MCS: The Ximian C# compiler
MCS began as an experiment to learn the features of C# by
writing a large C# program. MCS is currently able to parse C#
programs and create an internal tree representation of the
program. MCS can parse itself.
Work is progressing quickly on various fronts in the C#
compiler. Recently I started using the System.Reflection API
to load system type definitions instead of populating the
types by hand in the compiler, and I dropped my internal Type
representation in favor of the CLI's System.Type.
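As an illustration of loading system type definitions through System.Reflection, here is a minimal sketch. It is not MCS code: the class and method names (TypeLookupSketch, LookupCoreType) are made up for this example, and it assumes the type of interest lives in the same assembly as System.Object.

```csharp
using System;
using System.Reflection;

public static class TypeLookupSketch
{
    // Hypothetical helper: resolves a fully qualified name against the
    // assembly that defines System.Object. Returns null if the name is
    // not found there.
    public static Type LookupCoreType(string fullName)
    {
        Assembly corlib = typeof(object).Assembly;
        return corlib.GetType(fullName);
    }

    public static void Main()
    {
        Type t = LookupCoreType("System.Int32");

        // The compiler can now ask System.Type for everything it would
        // otherwise have to record itself.
        Console.WriteLine(t.FullName);          // System.Int32
        Console.WriteLine(t.BaseType.FullName); // System.ValueType
        foreach (Type iface in t.GetInterfaces())
            Console.WriteLine("implements " + iface.FullName);
    }
}
```

With this approach the compiler keeps no table of its own for types such as System.Int32; base types, interfaces and members all come from the System.Type object.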
** Phases of the compiler
The compiler has a number of phases:
<ul>
* Lexical analyzer: hand-coded lexical analyzer that
provides tokens to the parser.
* The Parser: the parser is implemented using Jay (a
port of Berkeley Yacc to Java, which I ported to C#).
The parser does minimal work and syntax checking,
and only constructs a parse tree.
Each language element gets its own class. The code
convention is to use a capitalized name for the
language element. So a C# class and its associated
information is kept in a "Class" class, a "struct"
in a "Struct" class and so on. Statements derive
from the "Statement" class, and expressions from the
"Expr" class.
* Parent class resolution: before the actual code
generation, we need to resolve the parents and
interfaces for interface, class and struct
definitions.
* Semantic analysis: since a single top-down pass over a
C# program cannot determine what identifiers actually
mean, we have to postpone this decision until the above
steps are finished.
* Code generation: nothing done so far, but I do not
expect this to be hard, as I will just use
System.Reflection.Emit to generate the code.
</ul>
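To give an idea of what code generation with System.Reflection.Emit looks like, here is a small sketch that emits a static method adding two integers and then calls it. It is not taken from MCS (which has no code generator yet); the names EmitSketch, BuildAdd, "Adder" and "Add" are invented for the example, and it assumes a runtime that provides AssemblyBuilder.DefineDynamicAssembly.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Emits the equivalent of:
    //   public class Adder { public static int Add (int a, int b) { return a + b; } }
    // and returns the MethodInfo for Add.
    public static MethodInfo BuildAdd()
    {
        var name = new AssemblyName("EmitSketchAssembly");
        AssemblyBuilder asm =
            AssemblyBuilder.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run);
        ModuleBuilder mod = asm.DefineDynamicModule("EmitSketchModule");
        TypeBuilder type = mod.DefineType("Adder", TypeAttributes.Public);

        MethodBuilder method = type.DefineMethod(
            "Add",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            new[] { typeof(int), typeof(int) });

        // The CIL is a stack machine: push both arguments, add, return.
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);

        Type created = type.CreateType();
        return created.GetMethod("Add");
    }

    public static void Main()
    {
        MethodInfo add = BuildAdd();
        Console.WriteLine(add.Invoke(null, new object[] { 2, 3 })); // 5
    }
}
```

A real code generator would walk the tree produced by the earlier phases and emit one such sequence of opcodes per statement and expression.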
<a name="tasks">
** Current pending tasks
Simple tasks:
<ul>
* Array declarations are currently being ignored.
* PInvoke declarations are not supported.
* Pre-processing is not supported.
* Attribute declarations and passing are currently ignored.
* The compiler does not pass around line/column information from the tokenizer for error reporting.
* Jay does not work correctly with `error'
productions, making parser errors hard to pinpoint. It
would be best to port the Bison-to-Java compiler into
a Bison-to-C# compiler (bjepson@oreilly.com
might have more information).
</ul>
Critical tasks:
<ul>
* Resolve "base" classes and "base" interfaces for
classes, structs and interfaces.
Once this is done, we can actually do the semantic
analysis, because otherwise we do not know who our
parents are.
</ul>
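One way to approach the base class and base interface resolution described above is to visit the declarations so that every parent is resolved before the types that derive from it. The following is only a sketch under simplifying assumptions: types are identified by plain strings, any name not declared in the program is treated as an external, already-resolved type, and the names ParentResolutionSketch and ResolveOrder are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ParentResolutionSketch
{
    // 'parents' maps each declared type name to the names of its base
    // class and base interfaces. Returns the declared names ordered so
    // that every parent comes before its children.
    public static List<string> ResolveOrder(IDictionary<string, string[]> parents)
    {
        var order = new List<string>();
        var state = new Dictionary<string, int>(); // 1 = in progress, 2 = done
        foreach (string name in parents.Keys)
            Visit(name, parents, state, order);
        return order;
    }

    static void Visit(string name, IDictionary<string, string[]> parents,
                      Dictionary<string, int> state, List<string> order)
    {
        if (!parents.ContainsKey(name))
            return; // external type, assumed to be resolved already

        int s;
        state.TryGetValue(name, out s); // s stays 0 if not seen yet
        if (s == 2)
            return;
        if (s == 1)
            throw new InvalidOperationException(
                "Circular base definition involving " + name);

        state[name] = 1;
        foreach (string parent in parents[name])
            Visit(parent, parents, state, order);
        state[name] = 2;
        order.Add(name);
    }

    public static void Main()
    {
        var declared = new Dictionary<string, string[]> {
            { "Button",     new[] { "Control", "IClickable" } },
            { "Control",    new[] { "Widget" } },
            { "Widget",     new[] { "System.Object" } },
            { "IClickable", new string[0] }
        };
        // Prints the names with each parent ahead of its children.
        Console.WriteLine(string.Join(", ", ResolveOrder(declared)));
    }
}
```

The "in progress" state also catches illegal programs in which a class ends up deriving from itself.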
Interesting and Fun hacks to the compiler:
<ul>
* Finishing the JB port from Java to C#. If you are
interested in working on this, please contact Brian
Jepson (bjepson at oreilly d-o-t com).
More on JB at: <a href="http://www.cs.colorado.edu/~dennis/software/jb.html">
http://www.cs.colorado.edu/~dennis/software/jb.html</a>
JB will allow us to move from the Berkeley Yacc
based Jay to a Bison-based compiler (better error
reporting and recovery).
* Semantic Analysis: Return path coverage and
initialization-before-use coverage are two great
features of C# that help reduce the number of bugs
in applications. Implementing them is an interesting hack.
* TypeRefManager. This currently exists only in its infancy.
* Enum resolution: this is another fun hack, as enum members can be
defined in terms of other members of the same enum, even ones declared
later (<tt>enum X { a = b + 1, b = 5 }</tt>).
</ul>
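The enum case above can be handled by evaluating members on demand and remembering the results, so that <tt>a</tt> triggers the evaluation of <tt>b</tt> even though <tt>b</tt> is declared later. The sketch below is a deliberately reduced model, not the MCS implementation: a member is either a literal or "another member plus a constant", implicit values (previous member plus one) are left out, and the names EnumResolutionSketch, Member and Resolve are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class EnumResolutionSketch
{
    // Reference == null means the member is the literal Offset;
    // otherwise its value is "Reference + Offset".
    public sealed class Member
    {
        public readonly string Reference;
        public readonly int Offset;
        public Member(string reference, int offset)
        {
            Reference = reference;
            Offset = offset;
        }
    }

    public static Dictionary<string, int> Resolve(IDictionary<string, Member> members)
    {
        var values = new Dictionary<string, int>();
        var inProgress = new HashSet<string>();
        foreach (string name in members.Keys)
            Evaluate(name, members, values, inProgress);
        return values;
    }

    static int Evaluate(string name, IDictionary<string, Member> members,
                        Dictionary<string, int> values, HashSet<string> inProgress)
    {
        int known;
        if (values.TryGetValue(name, out known))
            return known; // already computed

        Member m;
        if (!members.TryGetValue(name, out m))
            throw new KeyNotFoundException("Unknown enum member " + name);
        if (!inProgress.Add(name))
            throw new InvalidOperationException(
                "Circular enum definition involving " + name);

        int value = m.Reference == null
            ? m.Offset
            : Evaluate(m.Reference, members, values, inProgress) + m.Offset;

        inProgress.Remove(name);
        values[name] = value;
        return value;
    }

    public static void Main()
    {
        // Models: enum X { a = b + 1, b = 5 }
        var x = new Dictionary<string, Member> {
            { "a", new Member("b", 1) },
            { "b", new Member(null, 5) }
        };
        Dictionary<string, int> values = Resolve(x);
        Console.WriteLine("a = " + values["a"] + ", b = " + values["b"]); // a = 6, b = 5
    }
}
```

The set of members being evaluated is what rejects a truly circular definition such as <tt>enum Y { a = b, b = a }</tt>.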
** Questions and Answers
Q: Why not write a C# front-end for GCC?
A: I wanted to learn about C#, and writing this compiler was an
exercise toward that goal. The resulting compiler is highly
object-oriented, which has led to a very nice, easy to follow and
simple implementation of the compiler.
I found that the design of this compiler is very similar to
Guavac's implementation.
Targeting the CIL/MSIL byte codes would require re-architecting
GCC, as GCC is mostly designed to be used for register machines.
The GCC Java engine that generates Java byte codes cheats: it does
not use the GCC backend; it has a special backend just for Java, so
you cannot really generate Java bytecodes from the other languages
supported by GCC.
Q: If your C# compiler is written in C#, how do you plan on getting
this working in a non-Microsoft environment?
A: We will do this through an implementation of the CLI Virtual
Execution System for Unix (our JIT engine).
Q: Do you use Bison?
A: No, currently I am using Jay, which is a port of Berkeley Yacc to
Java that I later ported to C#. This means that error recovery is
not as nice as I would like, and for some reason error
productions are not being caught.
In the future I want to port one of the Bison/Java ports to C# for
the parser.
You might also want to look at the <a href="faq.html#gcc">GCC</a>
section of the main FAQ.