* XML Classes
** Abstract
The XML library is used by several areas of Mono, such as ADO.NET and XML
Digital Signature (xmldsig). This page covers System.Xml.dll and related
tools. It does not cover classes that live in other assemblies, such as
XmlDataDocument.
Note that corlib currently has its own XML parser class (Mono.Xml.MiniParser).
System.Xml.dll is almost feature complete, so I write this document mainly
for bugs and improvement hints.
** System.Xml namespace
*** Document Object Model (Core)
The DOM implementation is finished, and it scores better than MS.NET on the
NIST DOM test suite (the tests were ported by Mainsoft hackers and are part
of our unit tests).
*** Xml Writer
Here XmlWriter is almost the same thing as XmlTextWriter. If you want to see
other implementations, check XmlNodeWriter.cs and DTMXPathDocumentWriter.cs
in the System.XML sources.
XmlTextWriter is complete, though it looks a bit slower than MS.NET (I
tried 1.1).
*** XmlResolver
Currently XmlTextReader uses the XmlResolver it was given. If none was
supplied, it uses XmlUrlResolver. XmlResolver is used to load external DTDs,
imported XSL stylesheets, schemas and so on.
XmlSecureResolver, which was introduced in MS .NET Framework 1.1, is basically
implemented, but it requires the CAS (code access security) feature. We need
to fix up this class once the ongoing CAS effort is working.
You might also be interested in an improved <a href="http://codeblogs.ximian.com/blogs/benm/archives/000039.html">XmlCachingResolver</a> by Ben Maurer.
If even a one-time download is not acceptable, you can use <a href="http://primates.ximian.com/~atsushi/XmlStoredResolver.cs">this one</a>.
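The difference between a caching resolver and a stored resolver can be
sketched in a few lines. The following is only an illustration in Python,
not the code behind either link: the class names, the "URI in, bytes out"
resolver signature and the example URIs are all assumptions made for this
sketch.

```python
from typing import Callable, Dict

# Hypothetical resolver signature: absolute URI in, raw entity bytes out.
Resolver = Callable[[str], bytes]


class CachingResolver:
    """Wraps another resolver and fetches each URI at most once."""

    def __init__(self, inner: Resolver) -> None:
        self._inner = inner
        self._cache: Dict[str, bytes] = {}

    def __call__(self, uri: str) -> bytes:
        if uri not in self._cache:
            # First request: delegate to the wrapped resolver (one download).
            self._cache[uri] = self._inner(uri)
        return self._cache[uri]


class StoredResolver:
    """Serves only pre-registered content and never calls out."""

    def __init__(self, stored: Dict[str, bytes]) -> None:
        self._stored = dict(stored)

    def __call__(self, uri: str) -> bytes:
        try:
            return self._stored[uri]
        except KeyError:
            raise LookupError(f"no stored entity for {uri!r}") from None
```

The caching variant still needs the network the first time a DTD or schema
is requested, while the stored variant fails fast for anything that was not
registered up front.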
*** XmlNameTable
NameTable itself is implemented, but it should actually be used in more
classes. It only pays off when both of the names being compared are in the
table; then they can simply be compared using ReferenceEquals(). We have done
this where it seemed possible, e.g. in XmlNamespaceManager (in .NET 2.0
methods; if the build is not NET_2_0, it will be used internally).
NameTable also needs performance improvement. Optimization hacks are
welcome.
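The idea behind a name table can be shown with a small sketch. This is a
Python illustration of string interning in general, not Mono's NameTable
code; the class and method names here are assumptions chosen for the
example.

```python
class NameTable:
    """Minimal interning table: equal names map to one shared object."""

    def __init__(self) -> None:
        self._names = {}

    def add(self, name: str) -> str:
        # Store the name on first sight, then always hand back the stored
        # instance, so later comparisons can be identity checks.
        return self._names.setdefault(name, name)

    def get(self, name: str):
        # Returns the stored instance, or None if the name was never added.
        return self._names.get(name)


table = NameTable()
# Build the same name twice at run time, as a parser would from its buffer.
first = table.add("".join(["xs:", "element"]))
second = table.add("".join(["xs:", "element"]))
# Both results are the same object, so an identity test is sufficient.
same_object = first is second
```

The identity test is only valid when both operands came out of the same
table, which is why the table has to be shared by every class that compares
names.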
*** Xml Stream Reader
When a document is pure ASCII, we don't need to care which encoding is in use.
However, XmlTextReader must honor the encoding specified in the XML
declaration. So we have an internal XmlStreamReader class (and, currently, an
XmlInputStream class, which may disappear since XmlStreamReader is enough to
handle this problem).
There used to be some problems in these classes when reading from a network
stream (especially on Linux). They might already be fixed by some network
stream bugfixes.
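To show what "being aware of the encoding in the XML declaration" involves,
here is a simplified sketch in Python. It is not how XmlStreamReader is
written; the function names are assumptions, and the sketch deliberately
ignores UTF-32 and UTF-16 input without a byte order mark.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration that is
# readable as ASCII (true for UTF-8 and the ISO-8859 family, for example).
_DECL_ENCODING = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(head: bytes) -> str:
    """Guess the encoding from the first bytes of an XML document."""
    # A byte order mark takes priority over the declaration.
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    match = _DECL_ENCODING.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: fall back to UTF-8.
    return "utf-8"


def decode_xml(data: bytes) -> str:
    """Decode a whole document using the detected encoding."""
    return data.decode(detect_xml_encoding(data[:256]))
```

A reader has to do this on raw bytes before any character decoding starts,
which is why it cannot be left to a general purpose text reader.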
*** XML Reader
XmlTextReader, XmlNodeReader and XmlValidatingReader are almost finished.
<ul>
* All OASIS conformance tests pass, as they do with Microsoft's
implementation. Some W3C tests fail, but the results look better.
* Entity expansion and its well-formedness check are incomplete.
Divided content models are incorrectly allowed, and the entity's
Base URI is handled incorrectly, so some DTDs fail.
* I won't add any XDR support to XmlValidatingReader. (I have never
seen XDR used anywhere other than Microsoft's BizTalk Server 2000,
and now they have the 2002 version with XML Schema support.)
</ul>
XmlTextReader and XmlValidatingReader should be faster than they are now.
Currently XmlTextReader looks nearly twice as slow as MS.NET, and
XmlValidatingReader (which uses this slow XmlTextReader) looks nearly three
times slower. (Note that XmlValidatingReader is not slow by itself. It
delegates to a schema validating reader and a DTD validating reader.)
**** Some Advantages
The design of Mono's XmlValidatingReader is radically different from
that of Microsoft's implementation. Under MS.NET, the DTD content validation
engine is in fact a simple replacement of the XML Schema validation engine.
Mono's DTD validation is designed to be fully separate and validates the way
a normal XML parser does. For example, Mono allows non-deterministic DTDs.
Another advantage of this XmlValidatingReader is support for *any* XmlReader.
Microsoft supports only XmlTextReader (this bug will be fixed in VS 2005,
in the shape of XmlFactory).
<del>I added an extra support interface named "IHasXmlParserContext", which is
considered in XmlValidatingReader.ResolveEntity(). </del><ins>This is now
an internal interface.</ins> Microsoft failed to design XmlReader to be
subtree-pluggable (i.e. usable as a wrapper around another XmlReader),
since an XmlParserContext should be supplied for DTD information (without
it, e.g., entity references cannot be expanded) and for the namespace manager.
(In .NET 2.0, Microsoft added something similar to IHasXmlParserContext,
named IXmlNamespaceResolver, but it still does not provide DTD information.)
We also have a RELAX NG validating reader. See mcs/class/Commons.Xml.Relaxng.
** System.Xml.Schema
*** Summary
Basically it is complete. We can compile complex and simple types, refer to
external schemas, extend or restrict other types, and use substitution groups.
You can check how complete (or incomplete) the current schema validation
engine is by using the standalone test module
(see mcs/class/System.XML/Test/System.Xml.Schema/standalone_tests).
At least on my box, msxsdtest fails only 30 cases with the bugfixed catalog.
This score is better than that of Microsoft's implementation.
*** Schema Object Model
Completed, except for some things to be fixed:
<ul>
* Facet support needs to be completed. Some facets are currently missing.
David Sheldon has recently been making several fixes to them.
* ContentTypeParticle for a pointless xs:choice is incomplete
(fixing it gave rise to other bugs in compilation.
Interestingly, MS.NET also fails around here, so it might be
in the nature of the ContentTypeParticle design).
* Some derivation by restriction (DBR) handling is incorrect.
</ul>
*** Validating Reader
The XML Schema validation feature is (currently) implemented in
Mono.Xml.Schema.XsdValidatingReader, which is used internally by
XmlValidatingReader.
Basically this is implemented and its feature set is almost complete,
but I have only tested the validation feature itself. So we have to write
more tests for properties, methods, and events (validation errors).
** System.Xml.Serialization
Lluis rules ;-)
Well, in fact XmlSerializer is almost finished and is in the bugfix phase.
However, we would appreciate more tests. Please try
<ul>
* System.Web.Services to invoke SOAP services.
* xsd.exe and wsdl.exe to create classes.
</ul>
And if you find any problems, please file them in bugzilla.
Lluis also built an interesting standalone test system, placed under
mcs/class/System.Web.Services/Test/standalone.
You might also be interested in genxs, which enables you to create a custom
XML serializer. This is not included in Microsoft.NET.
See <a
href="http://primates.ximian.com/~lluis/blog/archives/000120.html">here</a>
and the manpages for details. The code is in mcs/tools/genxs.
** System.Xml.XPath and System.Xml.Xsl
There are two XSLT implementations. The historical one is based on libxslt
(aka Unmanaged XSLT). Now we use the fully implemented, managed XSLT by
default. To use Unmanaged XSLT, set the MONO_UNMANAGED_XSLT environment
variable (any value is acceptable).
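As a small illustration of switching engines per process, the following
Python helper builds an environment for a child process. The helper name is
an assumption made for this sketch, and the value "1" is arbitrary, since
any value is acceptable.

```python
import os
from typing import Dict, Mapping, Optional


def xslt_env(unmanaged: bool,
             base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment that selects the XSLT engine."""
    env = dict(os.environ if base is None else base)
    if unmanaged:
        # Any value is acceptable; "1" is just a readable choice.
        env["MONO_UNMANAGED_XSLT"] = "1"
    else:
        # Managed XSLT is the default, so make sure the variable is absent.
        env.pop("MONO_UNMANAGED_XSLT", None)
    return env


# Possible use (transform.exe is a hypothetical program name):
#   subprocess.run(["mono", "transform.exe"], env=xslt_env(True))
```

Setting the variable for a single child process like this makes it easy to
run the same stylesheet through both engines and compare the results.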
As for Managed XSLT, we support msxsl:script.
It would be nice if we could support <a href="http://www.exslt.org/">EXSLT</a>.
<a href="http://msdn.microsoft.com/WebServices/default.aspx?pull=/library/en-us/dnexxml/html/xml05192003.asp">Microsoft has tried to do some of them</a>,
but it is not good code, since it depends on internal concrete derivatives of
the XPathNodeIterator class.
In general, .NET's "extension objects" (including msxsl:script) are not
useful for returning node-sets (the MS XSLT implementation rejects a plainly
overridden XPathNodeIterator and accepts only its own hidden classes. The
same is true in Mono, though the classes are different), so if we support
EXSLT, it has to be done inside our System.Xml.dll. Volunteers are welcome.
Our managed XSLT implementation is slower than MS XSLT for some kinds of
stylesheets, and faster for others.
** System.Xml and ADO.NET v2.0
Microsoft released the second beta version of .NET Framework 2.0 with
Visual Studio 2005 alpha version. They are only available as MSDN
_subscriber_ downloads (i.e. they are not publicly downloadable yet). The
framework contains several new classes.
There are two assemblies related to System.Xml v2.0: System.Xml.dll and
System.Data.SqlXml.dll (here I treat sqlxml.dll as part of System.Xml v2.0,
but note that it is also one of the ADO.NET 2.0 features). There are several
namespaces, such as MS.Internal.Xml and System.Xml. Note that this .NET
Framework is a pre-release version, so they are subject to change.
System.Xml 2.0 contains several features such as:
<ul>
* new XPathNavigator and XPathDocument
* XML Query
* XmlAdapter
* XSLT IL generator (similar to Apache XSLTC) - it is for
internal use
</ul>
Tim Coleman has started ADO.NET 2.0 related work. I have no plan to implement
the System.Xml v2.0 classes right now and won't touch them immediately,
but I will start in some months. If any of you want to try this frontier,
we welcome your effort.
*** New XPathNavigator
The System.Xml v2.0 implementation will start with the new XPathDocument and
XPathNavigator implementations (they are called XPathDocument2 and
XPathNavigator2, and they are very different from the existing ones). First,
the document structure and basic navigation features will be implemented.
Next, the XPath2 engine should be implemented (XPathNavigator2 looks very
different from XPathNavigator).
There are also some trivial tasks such as schema validation (we have
<a href="http://www24.brinkster.com/ginga/XPathDocumentReader.cs.txt">
XPathDocumentReader</a>, which just wraps XPathNavigator, and our
XmlValidatingReader can accept any XmlReader).
*** XML Query
XML Query is a new XML data manipulation language (well, at least new to the
.NET world). It is similar to SQL, but intended to manipulate and to
support XML. It is similar to XSLT, but extended to support new features
such as XML Schema based datatypes.
The XML Query implementation can be found mainly in the System.Xml.Query and
MS.Internal.Xml.Query namespaces. Note that they are in
System.Data.SqlXml.dll.
The MSDN documentation says that there are two kinds of API for XML Query:
the High Level API and the Low Level API. As of this beta version, the Low
Level API is described as not released yet (though it may be the
MS.Internal.Xml.* classes). However, the High Level API will have to be
implemented on top of the Low Level API. The MS.Internal.Xml related code
appears to have interesting class structures, so it would be nice to start
learning about them (and I will). There also appear to be IL generator
classes, but it might be difficult to start from them.
*** System.Data.Mapping
System.Data.Mapping and System.Data.Mapping.RelationalSchema are the
namespaces for mapping support between databases and XML. This is at
the stubbing phase (still incomplete).
*** XmlAdapter
XmlAdapter is used to support XML based query and update using the (new)
XPathDocument and XPathNavigator. This class is designed to bring ADO.NET
and System.Xml together. It connects to databases and queries data in XML
shape into an XPathDocument, using the mapping schema above. This must be
done after several other classes, such as XPathDocument and MappingSchema.
** Miscellaneous Class Libraries
*** RELAX NG
I implemented an experimental RelaxngValidatingReader. It is still not
complete: for example, some simplification steps (see RELAX NG spec
chapter 4, especially 4.17-19) and some constraints (especially 7.3) are
missing. See mcs/class/Commons.Xml.Relaxng/README for details.
It supports custom datatype handling. Right now, you can use XML Schema
datatypes ( http://www.w3.org/2001/XMLSchema-datatypes ) as well
as the RELAX NG default datatypes (as used in relaxng.rng).
Commons.Xml.Relaxng.dll also has RELAX NG Compact Syntax support.
See the Commons.Xml.Relaxng.Rnc.RncParser class.
I am planning improvements (friendlier error messages, and even
object mapping), but they won't happen before the Mono 1.0 release.
** Tools
*** xsd.exe
See <a href="ado-net.html">ADO.NET page</a>.
Microsoft has another class that infers an XmlSchemaCollection from an
XmlReader (Microsoft.XsdInference). It may be useful, but implementing it
won't be so easy.
** Miscellaneous
*** Mutual assembly dependency
Sometimes I hear complaints about the mutual dependency between System.dll
and System.Xml.dll: System.dll references System.Xml.dll (e.g.
System.Configuration.ConfigXmlDocument extends XmlDocument), while
System.Xml.dll references System.dll (e.g. XmlUrlResolver.ResolveUri takes
System.Uri). Since these appear in public method signatures, we cannot get
rid of the mutual references.
Nowadays System.Xml.dll is built against an incomplete System.dll (lacking
the System.Xml dependent classes such as ConfigXmlDocument). The full
System.dll is built after System.Xml.dll is done.
Note that you still need System.dll to run mcs.