Performances? #172
If you could write a test directly using the JSONLD-Java APIs, I could help. The numbers from my testing over 8000 synthetic/random triples are roughly the following, and seem to match up with your numbers:
Testing compaction is difficult without concrete test data from you, i.e., the JSON-LD context you are using and the exact triples. There may be ways of improving performance slightly, but in the end the complexity of the JSON-LD algorithms may still limit performance. By comparison, none of the other RDF serialisations involve any algorithms/transformations, other than having to keep everything in memory to pretty-print Turtle/RDF-XML, so you should prefer them to JSON-LD if you are concerned about performance and need to serialise arbitrary RDF triples.

If you have a known schema, you could generate JSON (that is valid JSON-LD) directly using the Jackson (or another JSON library) APIs much faster by hand. The complexity is all in the JSON-LD APIs themselves, so avoiding them will help. One major difficulty with JSON is the requirement that everything be in memory before serialising if you want concise documents, which is why JSON is only really used for API results, which tend to be very small, while other uses favour streaming-friendly formats.
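To make the "known schema" suggestion concrete, here is a minimal sketch of emitting valid JSON-LD by hand for a fixed schema, bypassing the JSON-LD API entirely. All names here (`HandRolledJsonLd`, `person`, the schema.org context) are illustrative assumptions, not part of any library:

```java
// Sketch: with a known, fixed schema you can emit valid JSON-LD directly,
// so no expansion/compaction algorithms ever run. Illustrative only; real
// code would also escape the input strings properly.
public class HandRolledJsonLd {

    // Builds a tiny JSON-LD document for a hypothetical "Person" schema
    // with a compile-time @context.
    public static String person(String id, String name) {
        return "{"
                + "\"@context\":{\"name\":\"http://schema.org/name\"},"
                + "\"@id\":\"" + id + "\","
                + "\"name\":\"" + name + "\""
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(person("http://example.org/alice", "Alice"));
    }
}
```

The output is a plain string that any JSON-LD processor can still expand, but producing it costs only string concatenation.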
Hi,

With a file similar to the one I was using in my previous experiment (which I enclose here, "slow.jsonld"), I get the following results (exported from Jena, with a small hack to choose the output form), for model.size() 7559. Again, there is a very big difference with the Turtle performance, but the point is that the relative time of compaction is not the important factor here: if I understand correctly, the expanded output format corresponds to the basic one, and compacting begins with the same operations as the ones done for the expanded format.

But I also noticed that with other files, results are very different, much better. I investigated the differences, and I found that the factor that slows things down in the previous file is that several nodes have a property with a lot of values. I removed these statements from the file, and here is what I got, for model.size() 1177. Wow, that is fast! Of course there are far fewer triples (1177 vs 7559), but the gain in time is clearly not proportional to the reduction in size: were it the case, the time for EXPANDED should be ~25 ms, not just 4 ms. So I suspected that this could be related to some iteration over a list, and I found where this happens in the code: JsonLdAPI.fromRDF, at line 1857:
mergeValue contains the following test, which ensures that a given value is not added twice:
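For readers without the source at hand, the check follows this general pattern (a simplified sketch, not the exact jsonld-java code): every call scans the whole list of existing values before appending, so adding n values to a single property costs O(n²) comparisons overall.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the mergeValue/deepContains pattern described above
// (not the exact jsonld-java source). Before appending, the whole list of
// existing values is scanned, so inserting n values is quadratic overall.
public class MergeValueSketch {

    static boolean deepContains(List<Object> values, Object value) {
        for (Object v : values) {           // linear scan on every insert
            if (v.equals(value)) {
                return true;
            }
        }
        return false;
    }

    public static void mergeValue(List<Object> values, Object value) {
        if (!deepContains(values, value)) { // O(n) membership test ...
            values.add(value);              // ... guarding a single add
        }
    }

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>();
        mergeValue(values, "a");
        mergeValue(values, "a");            // duplicate, ignored
        mergeValue(values, "b");
        System.out.println(values.size());  // 2
    }
}
```

With a few values per property this is harmless; with thousands of values on one property, as in the "slow" file, the scans dominate the whole serialization.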
We're in the case where deepContains is called, and deepContains iterates over the items in values; hence the poor performance with my file. To check this, I modified line 1857 in JsonLdAPI to call a modified version of mergeValue, a "laxMergeValue" that doesn't verify whether the value is already in values before adding it. Here is the time that I get with the first file (the big "slow" one):

JSON-LD/pretty EXPANDED : 9 ms

The same time as Turtle for the expanded format! But is it OK to use this "laxMergeValue" instead of mergeValue at line 1857 of JsonLdAPI? Well, I'll leave that to the people who know the code, but I think it could be, as it seems to be about adding the triple's value for the predicate to the list of values of the property predicate of the node subject. Anyway, I am sure that it is possible to fix this iteration over the items of a list, which can have a very negative impact on the performance of the API.

Best Regards, fps
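A common way to keep the duplicate check while avoiding the quadratic cost (again a sketch under my own assumptions, not the actual patch that went into the release) is to mirror the list in a hash set, making each membership test O(1) on average:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of one way to keep the duplicate check but drop the O(n) scan:
// mirror the values list in a HashSet. Illustrative only; not the actual
// fix released in jsonld-java.
public class FastMergeValue {
    private final List<Object> values = new ArrayList<>();
    private final Set<Object> seen = new HashSet<>();

    public void mergeValue(Object value) {
        if (seen.add(value)) {   // add() returns false for duplicates
            values.add(value);   // insertion order preserved in the list
        }
    }

    public List<Object> values() {
        return values;
    }

    public static void main(String[] args) {
        FastMergeValue m = new FastMergeValue();
        m.mergeValue("a");
        m.mergeValue("a");       // duplicate, ignored in O(1)
        m.mergeValue("b");
        System.out.println(m.values().size()); // 2
    }
}
```

This relies on the values having consistent equals/hashCode, which holds for the Map/List trees that JSON parsers typically produce; if deepContains implements a looser structural equality, the set key would need to mirror that.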
Thanks for looking into it further. I will see what I can do about that check (hopefully without breaking the spec!)
I can't seem to replicate your results locally, as removing the deepContains call doesn't seem to have any effect. I can't remove the entire if statement and just always add values as that breaks at least 19 of the conformance tests. Can you open a pull request with your proposed changes and I will see if I can replicate it with your version of the code. |
Released jsonld-java-0.8.3 with this fix in it; it should be on Maven Central in a few hours.
Hi,

Working with Jena, I ran some tests to compare performance when serializing RDF data. As it turns out, JSON-LD serialization seems to be rather slow: roughly 20 times slower than Turtle.

Here are my results, outputting a graph of 8000 triples (several runs, after warming everything up, writing to /dev/null):
JSON-LD/pretty : 237 ms
JSON-LD/flat : 225 ms
RDF/XML/pretty : 104 ms
RDF/XML/plain : 54 ms
Turtle/blocks : 11 ms
Turtle/flat : 13 ms
Turtle/pretty : 11 ms
N-Triples/utf-8 : 6 ms
The backend of my service uses RDF (Jena). Almost a quarter of a second to return a typical result is too slow.

Are there ways to improve this?
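For reference, the measurement loop behind the numbers above has essentially the following shape: warm up, then time repeated writes to a discarding sink. This is a sketch; the Runnable stands in for the real Jena write call (e.g. a model write in the chosen format), which is not shown here:

```java
import java.io.OutputStream;

// Sketch of the benchmark shape used for the timings above: warm up the
// JIT, then time repeated writes to a discarding sink (the equivalent of
// /dev/null). The Runnable is a placeholder for the real serialization.
public class SerializationBench {

    // Returns the average wall-clock milliseconds per measured run.
    public static long timeMillis(Runnable write, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            write.run();                        // warm-up, not measured
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            write.run();
        }
        return (System.nanoTime() - start) / 1_000_000 / runs;
    }

    public static void main(String[] args) {
        OutputStream sink = OutputStream.nullOutputStream();
        long ms = timeMillis(() -> {
            try {
                sink.write(new byte[1024]);     // placeholder for the writer
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }, 5, 10);
        System.out.println(ms + " ms/run");
    }
}
```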