Something is too slow... #11
Oh, and, just for kicks, the same code implemented in PHP:
is even faster still: 1.703u 0.290s 0:01.99 100.0% 3963+129477k 0+0io 0pf+0w
The issue is that yajl-tcl takes the result of yajl's parse and produces a straight left-to-right output that looks like this:

map_open map_key glossary map_open map_key title string {example glossary} map_key GlossDiv map_open map_key title string S map_key GlossList map_open map_key GlossEntry map_open map_key ID string SGML map_key SortAs string SGML map_key GlossTerm string {Standard Generalized Markup Language} map_key Acronym string SGML map_key Abbrev string {ISO 8879:1986} map_key GlossDef map_open map_key para string {A meta-markup language, used to create markup languages such as DocBook.} map_key GlossSeeAlso array_open string GML string XML array_close map_close map_key GlossSee string markup map_close map_close map_close map_close map_close

yajl::json2dict is written in Tcl and does a considerable amount of manipulation of that parse to produce the dict. That's the source of the slowness. For sure.

To speed it up to be reasonably competitive (I don't know if it would be faster or slower than json-c), yajl-tcl would need a new "parse2dict" C method added to yajltcl_yajlObjectObjCmd in generic/yajltcl.c that would directly build the dict using the Tcl C calls for manipulating dicts such as Tcl_DictObjPut or Tcl_DictObjPutKeyList.
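To make the extra work concrete, here is a minimal pure-Tcl sketch of folding such a left-to-right event list into a nested dict. It is not the actual yajl::json2dict code: the proc name events2value is hypothetical, and it handles only the event names that appear in the sample above (map_open, map_key, string, array_open and their closers). Arrays are returned as plain Tcl lists, which is an assumption about the desired output shape.

```tcl
# Hypothetical helper, not part of yajl-tcl: consume one value from a flat
# event list starting at index $idxVar and return it as a nested Tcl value.
proc events2value {events idxVar} {
    upvar 1 $idxVar i
    set ev [lindex $events $i]
    incr i
    switch -- $ev {
        string {
            set v [lindex $events $i]
            incr i
            return $v
        }
        map_open {
            set d [dict create]
            while {[lindex $events $i] ne "map_close"} {
                # Each entry is: map_key <name> followed by one value.
                if {[lindex $events $i] ne "map_key"} {
                    error "expected map_key at index $i"
                }
                set k [lindex $events [expr {$i + 1}]]
                incr i 2
                dict set d $k [events2value $events i]
            }
            incr i  ;# consume map_close
            return $d
        }
        array_open {
            set l [list]
            while {[lindex $events $i] ne "array_close"} {
                lappend l [events2value $events i]
            }
            incr i  ;# consume array_close
            return $l
        }
        default {
            error "unhandled event \"$ev\" at index [expr {$i - 1}]"
        }
    }
}

set events {map_open map_key title string {example glossary} map_key tags array_open string GML string XML array_close map_close}
set pos 0
set d [events2value $events pos]
puts [dict get $d title]
puts [dict get $d tags]
```

Every event costs several Tcl-level list and dict operations here, which is the kind of overhead a C method calling Tcl_DictObjPut directly would avoid.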
Oh, I see. Well, the Pure TCL implementation currently in tcllib has the "excuse" of being, well, pure TCL. But, if compiling is already required for yajl-tcl, then, perhaps, it should be doing everything in C?
In the interests of benchmarking in the meantime, could you, perhaps, rewrite the Tcl code-snippet I posted to use only the C-methods of yajl-tcl to extract the users-subtree of the parsed JSON? That would make it easier to separate the parsing performance from the dictionary-creating performance... Thanks!
We originally wrote yajl-tcl to generate JSON quickly. We added parsing later, and the output of the parse is a direct analogue to what's fed to the generator. So the first use of the yajl-tcl parser was to take some JSON that we desired to generate the equivalent of and produce the parse stream that could then be modified to create the matching JSON output with values substituted as desired. So the parser was effectively a tool to help with generation.

I am personally not a huge fan of dicts. To me they are often either too much or too little or, sometimes, both.

As to whether everything "should" be done in C, it's a function of need, and desire. I'm pretty sure I see how to do it. I'm somewhat interested in doing it to satisfy my curiosity as to how it will turn out. It could, for that matter, be coded to produce a hierarchy of namespaces with arrays as Tcl arrays, which might be kind of cool.

If you want to just see how fast the raw parse is, try something like

    package require yajltcl
    # Create a yajl object named yajlparser
    yajl create yajlparser
    foreach f $argv {
        set fd [open $f]
        # Raw left-to-right parse only; no dict is built
        set d [yajlparser parse [read $fd]]
        close $fd
    }

It won't produce a dict but it will produce the left-to-right parse I referred to earlier.
The yajl::json2dict method was added just to make yajl-tcl a drop-in replacement for applications already using json::json2dict, so its primary goal was just to be interface-compatible and faster than that. Making yajl::json2dict even faster by rewriting it in pure C would be an excellent and welcome improvement, however.
All right, well, I added a pure C "parse2dict" method to the yajltcl object. It's on the master branch. I've only tested it a little bit but it produces a character-for-character identical parse of the contents of playpen/foo.json as ::yajl::json2dict. Timing the two routines parsing a variable containing the contents of that file, parse2dict is 38X faster than ::yajl::json2dict. After we gain confidence in the code we can update ::yajl::json2dict to use it.
The test cases run by tests/dict.tcl show a difference on null values:

Expected: moo cow pig oink rabbit null
Actual: moo cow pig oink rabbit null
Actual2: moo cow pig oink rabbit {{}}
Input: {"moo": "cow", "pig": "oink", "rabbit" : null}
FAILED
For speed comparison purposes, here is the relative timing difference of one of my tests: tcllib took 11272571 clicks |
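For anyone wanting to reproduce a click-based comparison like this, one possible way to take the measurement is sketched below. The helper name clicks_taken is hypothetical, and the empty loop is only a stand-in workload so the snippet runs without yajl-tcl installed. Note that the resolution of clock clicks is system-dependent, so the numbers are only comparable on the same machine.

```tcl
# Hypothetical helper: run a script in the caller's scope and return the
# number of clock clicks it took.
proc clicks_taken {script} {
    set start [clock clicks]
    uplevel 1 $script
    expr {[clock clicks] - $start}
}

# Stand-in workload; replace with the parser call being measured.
set n [clicks_taken {
    for {set i 0} {$i < 100000} {incr i} {}
}]
puts "loop took $n clicks"
```

With yajl-tcl loaded and the JSON text in a variable, the stand-in script would be swapped for the call under test, for example {yajlparser parse2dict $json} versus {::yajl::json2dict $json}.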
The figures certainly look impressive -- as does the overnight turn-around... Will test here soon. Thank you!
Hello! I needed to parse a collection of large JSON files, and the performance of the pure-Tcl json::json2dict was unsatisfactory.
Unaware of yajl-tcl I wrote my own -- which uses json-c for the actual heavy-lifting the way you are using yajl.
Only after I was done did it occur to me to search for existing C-implementations of JSON-parsing -- and I found yours.
I then compared the performance and now have the following numbers. All tests used the following script on 12 JSON-files (total of over 60Mb):
This is, actually, what I needed to do -- extract the "users" part of all JSON-files, collect all such users into an array and print the array at the end.
The performance, as reported by tcsh's time-command (note the times and the memory-use):
As you can see, the json-c based implementation is dramatically faster than both the Pure TCL and the yajl based ones, even if it uses some more memory than the latter. I doubt there is anything magic about my code -- the performance differences are likely attributable to the differences in the underlying JSON-parsers (json-c vs. yajl).
Maybe, json-c is using a hash-table, where yajl (or you?) are using a regular array? This would explain the higher memory use...
In any case, this is something you may wish to investigate closer.
Edit: tests were done with tclsh8.6 as provided by the FreeBSD lang/tcl86 port on FreeBSD-9.2/i386.