[Request] Online validation #299

Closed
Merg1255 opened this Issue Mar 6, 2012 · 46 comments

4 participants

@Merg1255

Hi! Is there a way a tool that validates xml schemas could be transported to JS?

Thanks

@kripken
Owner

If you find one, and make a little (C/C++) testcase I can work against, I'll try to compile it :)

@Merg1255

well, is php included in emscripten? bit.ly/7BUvST
http://php.net/manual/en/domdocument.schemavalidate.php
the php link also includes some example code.

@Merg1255

is the above information good for the compilation?
We could search for other libraries if you want.

@jterrace

I'd recommend libxml2

@Merg1255

libxml has not implemented the very feature of this topic, xml schema validation, so i'd suggest working with one of the libraries provided above.

@jterrace

huh? libxml has xml schema validation built into the library. In fact, php's schema validation uses libxml. You should try compiling libxml to JS with emscripten.

@Merg1255

"A partial implementation of XML Schemas Part 1: Structure is being worked on but it would be far too early to make any conformance statement about it at the moment."

does this mean that php's xml validation is "partial" as the above sentence says?..

@Merg1255

it's weird that php uses this library while the website says that. Also, there's this library, http://lxml.de/validation.html#xmlschema

@jterrace

lxml is a python library---which also uses libxml

@Merg1255

yep, i know :) i'm just including it, and let kripken decide which one is good for compilation

@kripken
Owner

I don't know anything about XML libraries ;) If you guys can tell me which is the right one to try to compile that would be better. And also include a testcase.

@jterrace

libxml is definitely the library to use

@Merg1255

yep, use libxml if it's easier for you to work in c. if python is better, use the lxml. :)
here are some pages that you might find helpful:

http://knol2share.blogspot.com/2009/05/validate-xml-against-xsd-in-c.html
http://www.acooke.org/cute/libxml2Cre0.html (includes code from: http://wiki.njh.eu/XML-Schema_validation_with_libxml2)
http://xmlsoft.org/tutorial/xmltutorial.pdf (the sample documents)
http://libxmlplusplus.sourceforge.net/ (points to examples that also catch exceptions)

let me know if you need any hints on xml :)

@kripken
Owner

Well, I built libxml here

https://github.com/kripken/xml.js

and I have

$ ls -al ./libxml2-2.7.8/.libs/libxml2.so.2.7.8 
-rwxrwxr-x 1 alon alon 4510028 2012-03-15 10:33 ./libxml2-2.7.8/.libs/libxml2.so.2.7.8

but I have no idea how to test it...

@jterrace

Check out this page:
http://xmlsoft.org/xmldtd.html

Specifically, the section "How to validate". Maybe you can build the xmllint program?

@Merg1255

hahah kripken I think I understand what you mean :)) no worries, we'll guide you through.

The idea is simple: schema validation means you have an xml file (.xml) and you validate it against a schema file (most commonly .xsd). The schema file is a set of rules telling the parser what the xml file should look like: which elements it must have, which attributes, etc. It's a bit like checking html markup, because xhtml is xml.

Let's see a testcase. Browse here: http://knol2share.blogspot.com/2009/05/validate-xml-against-xsd-in-c.html
Download the files test.xml and test.xsd from the links there; these are the xml file and its schema.

Now, the c code needed for it to work is the file xmlvalidation.c, where you can see you set the two files and try to validate them. So I think if you build this c file (which includes a header file found here https://github.com/kripken/xml.js/tree/master/libxml2-2.7.8/include/libxml), we will be able to test this case in JS and html.

EDIT: the same page mentions an alternative way:
FYI
you can also validate XML against XSD using the following command:
xmllint --noout --schema test.xsd test.xml

@Merg1255

here's the list of many xml files and their corresponding xsds: https://github.com/kripken/xml.js/tree/master/libxml2-2.7.8/test/schemas

also, this is a tiny python example: https://github.com/kripken/xml.js/blob/master/libxml2-2.7.8/python/tests/schema.py
so, that's what we expect from a validator, set the two files and if there are any errors, display the corresponding lines of the xml file and the message of the validator.

let us know for any news.

@kripken
Owner

Is all you want here a "yes/no" answer whether some XML validates against a schema? Should be simple to make that interface accessible to JS. Or is there some more complex API that is necessary?

@Merg1255

well, even having a "no/yes" is good. But the ideal would be to be able to display where the error has occurred, like on this website: http://www.corefiling.com/opensource/schemaValidate.html . You can check how this system works if you take a sample xml file and its xsd validation file (from the examples above), and modify the xml file so that you can see where the validation error occurred.

If that helps you, php which uses libxml can easily do this as shown here: http://www.php.net/manual/en/domdocument.schemavalidate.php#62032.

@kripken
Owner

I compiled xmllint. Looks like it works:

$ js xmllint.js --noout --schema test.xsd test.xml
test.xml validates
$ js xmllint.js --noout --schema test.xsd test.xsd
test.xsd:1: element schema: Schemas validity error : Element '{http://www.w3.org/2001/XMLSchema}schema': No matching global declaration available for the validation root.
test.xsd fails to validate

Validating is expected to fail in the second case because we load the schema and try to validate it. But this shows you can get error messages too.

Looks like all that is left is to make a nice JS wrapper around this. Anyone want to try?

@Merg1255

me! me! :-)
just give me a little first steps how to.

@kripken
Owner

Basically, we would want to write a nice JS wrapper that, when called with a schema and an xml file as JS strings,

  1. Set those files' data into the emscripten filesystem as test.xml and test.xsd. See the Tutorial bit on files and https://github.com/kripken/emscripten/wiki/Filesystem-Guide
  2. Set Module.print to a function that saves the print output for us.
  3. Run the command.
  4. Analyze the print output we saved. If it says it validated, return true. Otherwise parse the errors a bit.

Overall it would look like

function validateXML(xml, schema) {
  var Module = {
    preRun: function() {
      // set up files
    },
    print: function() {
      // set up print capture
    }
  };

  [.. the compiled xmllint..]

  // code to process the captured output
}

You can see examples of all those things in tests/runner.py, for example search for preRun to see how it sets up files etc.

@kripken
Owner

We would also need to set Module.arguments to ['--noout', '--schema', 'test.xsd', 'test.xml'] - so it works exactly as if run on the commandline.
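
The print capture and the arguments could be wired up roughly like this. This is a minimal sketch, not code from xml.js: `buildXmllintModule` and the `captured` array are hypothetical names, and routing `printErr` into the same array is an assumption, in case xmllint writes its diagnostics to stderr.

```javascript
// Hypothetical helper: builds the settings object that the compiled xmllint
// would read from a variable named Module. Not taken from xml.js itself.
function buildXmllintModule(setUpFiles) {
  var captured = [];
  function capture(text) {
    captured.push(text);
  }
  return {
    captured: captured, // everything xmllint prints ends up here
    preRun: setUpFiles, // caller-supplied function that writes test.xml / test.xsd
    arguments: ['--noout', '--schema', 'test.xsd', 'test.xml'],
    print: capture,
    // Assumption: diagnostics may go to stderr, which emscripten hands to printErr.
    printErr: capture
  };
}

var Module = buildXmllintModule(function () { /* set up files here */ });
Module.print('test.xml validates'); // simulate one line of xmllint output
console.log(Module.captured);
```

The compiled xmllint would be included after `Module` is defined, and `captured` read once it has finished running.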

@Merg1255

kripken, i'm getting the error "FS.createDataFile is not a function". Actually, i've set it to output to the console, like this:

    function validateXML(xml, schema) {
      var Module = {
        preRun: function() {
          FS.createDataFile('/', 'test.xml', xml, true, true);
          FS.createDataFile('/', 'test.xsd', schema, true, true);
        },
        arguments: ['--noout', '--schema', 'test.xsd', 'test.xml'],
        print: function(text) {
          console.log(text);
        }
      };
    }

but whenever I include the compiled xmllint, either with a src script or inline, it just executes and prints the default xmllint output to the console. It's as if the settings are not being applied. What should I change?

@kripken
Owner

That looks ok. Maybe some other problem in the entire file, can you link to it?

@Merg1255

here it is: https://gist.github.com/47804b9a6b6ea7d417f0
note that i've moved a line from xmllint.js so that we dont have to place the entire compiled code inside our script. With this config, it should be able to run and notify us about the validation.

In the command console, you can set Module.arguments, so you can run

gY(Module.arguments = ['--version']);

which will show the libxml version. But it doesnt seem to be getting other arguments, plus FS.createDataFile is undefined.
Should the code be able to run with cwrap? it's a bit strange.

@kripken
Owner

(Hmm it might be easier to do this with a non-closure compiled/minified version.)

createDataFile requires binary data, that is, an array of values in 0-255, and not a JS string. cwrap fixes that, so it would help here, but it wraps compiled C code and not a new function as here, so I don't think it would work.

Edit: Looks like we already expose Module.intArrayFromString, which converts a string to an array of numbers. So using that here should work.
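
To illustrate the string-to-bytes step, here is a minimal sketch. `toByteArray` is a hypothetical stand-in for `Module.intArrayFromString` (it is not the emscripten implementation, and UTF-8 is a choice made for this example), and `writeInputFiles` takes the filesystem object as a parameter only so the snippet can be tried outside the compiled code.

```javascript
// Stand-in for Module.intArrayFromString: turns a JS string into an array
// of byte values (0-255), encoded as UTF-8.
function toByteArray(text) {
  return Array.from(new TextEncoder().encode(text));
}

// `fs` is whatever object exposes createDataFile; inside the compiled
// xmllint code that would be FS. Passing it in is a choice made for this sketch.
function writeInputFiles(fs, xml, schema) {
  fs.createDataFile('/', 'test.xml', toByteArray(xml), true, true);
  fs.createDataFile('/', 'test.xsd', toByteArray(schema), true, true);
}
```

Inside `preRun` this would be called as `writeInputFiles(FS, xml, schema)`, assuming `FS` is visible in that scope.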

@Merg1255

the thing about createDataFile is not that it doesnt handle the input you give it, but it doesnt exist inside the code. same thing for Module.intArrayFromString (is it inside the code?). Any suggestions on how to proceed?..

@kripken
Owner

Closure minifies the names. We can work with a non-closure build here, that would be simpler. Is that good enough for what you want for xml.js? (closure makes the code smaller and faster)

@Merg1255

sure :-) just let us have a normal version, so we know which functions we are working with, and then we can use a js compression tool to minify it once we make it work.

isn't it strange though that the minified version can accept one argument, but not more? i'd like to see what the normal version will do.

@Merg1255

after a few more checks in the minified version, it seems that if we add more parameters it does pass a sort of argument, but it says that this option is not defined (xmllint output). any news on this?

@kripken
Owner

Minification stuff is sometimes annoying... had to debug a small issue with it now. But I pushed a working optimized build now to xml.js. To test it, open test.html there.

Currently it just returns the text that xmllint output. We should parse the output a bit, if it says it validated we should return true, otherwise a list of errors or an exception. Maybe we should return an object with properties succeeded yes/no, and errors if they existed? Pull requests to xml.js welcome.
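
A minimal sketch of that parsing step, assuming the captured output arrives as an array with one entry per line and looks like the xmllint output shown earlier in this thread. `parseXmllintOutput` and the shape of the returned object are hypothetical, not part of xml.js.

```javascript
// Turns captured xmllint output lines into { succeeded, errors }.
// Error lines are assumed to look like "file:line: message".
function parseXmllintOutput(lines) {
  var errors = [];
  var sawValidates = false;
  lines.forEach(function (line) {
    var m = /^(.+?):(\d+): (.*)$/.exec(line);
    if (m) {
      errors.push({ file: m[1], line: Number(m[2]), message: m[3] });
    } else if (/ validates$/.test(line)) {
      sawValidates = true;
    }
  });
  // Only report success if xmllint said so and no error line was seen.
  return { succeeded: sawValidates && errors.length === 0, errors: errors };
}

var result = parseXmllintOutput(['test.xml validates']);
console.log(result.succeeded, result.errors.length);
```

Returning an object rather than throwing keeps the caller free to decide how to present the errors; other output formats from xmllint would need more patterns.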

@Merg1255

marvelous, i've checked it with some complex files and it catches all exceptions. Great work kripken!! :-)
the minified xml.js file is 2mb, and zipped it's 500+kb, so it will be great when your other project about opening zip files is ready. ;)

one question: what is pre.js?

@kripken
Owner

pre.js is some code that needs to be optimized with the compiled code (used during building, so you can ignore it for just using the final JS file). I'll write a blogpost about this tomorrow to explain more.

About compression,

https://github.com/kripken/lzma.js

already works and gives better compression than gzip. It's GPL though, so I'll probably do zlib too eventually.

@Merg1255

i've actually moved the Module object outside of xmllint.js so it's easier to just pass an object to the function and modify it in the html page, so for this project embedding pre.js might not be needed (it works fine as it is).

about compression,

  • a workers version might be better, at least for compression
  • could you add an example of accessing data inside a zip file?

thanks!

@kripken
Owner

How does using a zip file work with libxml? It required libz to build. Maybe it already works? Or is a different API call needed? Alternatively you can just unzip in JS and send that to xml.js.

@Merg1255

hm... i'll try some things about it and let you know.

@syssgx

hello everyone! kripken I saw your twitter post today and found this tool that I had been looking for on the net :)
good job to all participants, this is a nice project.

hey, I can create a wrapper for this one, for the github pages. Would you be interested in it?

syssgx

@Merg1255

yes! since i'm busy with other things right now, that would be great!

@syssgx

I have a working site ready, it's almost complete with a UI. Shall I make a pull request for it kripken?

@syssgx

Hi kripken!
A JS xml validation system is online here: http://syssgx.github.com/xml.js/
a few more things to change and it's ready.
let me know your views. :)

syssgx

@kripken
Owner

Works great, @syssgx ! Nice work :) Adding a link from the project page.

@syssgx

thank you @kripken :)

@syssgx

A new modified version has come online. If you upload a customized file to xml.js, there is now an option to set the filenames used in the file system.

@kripken
Owner

Very nice :)

Any chance you'd want to make a nice demo page for some other emscripten stuff? All the demo pages I made are ugly, for example http://syntensity.com/static/sql.html ;)

@syssgx

Thanks for the suggestion kripken :)
The thing is that there are currently some things I have to do, related to xml.js. I'll let you know if it's possible.

syssgx

@Merg1255 Merg1255 closed this Apr 1, 2012