
Performances? #172

Closed

fpservant opened this issue Apr 10, 2016 · 5 comments

@fpservant

Hi,
working with Jena, I ran some tests comparing serialization performance for RDF data. As it turns out, JSON-LD serialization seems to be rather slow: roughly 20 times slower than Turtle.

Here are my results, outputting a graph of 8000 triples (several runs, after warming everything up, writing to /dev/null):
JSON-LD/pretty : 237 ms
JSON-LD/flat : 225 ms
RDF/XML/pretty : 104 ms
RDF/XML/plain : 54 ms
Turtle/blocks : 11 ms
Turtle/flat : 13 ms
Turtle/pretty : 11 ms
N-Triples/utf-8 : 6 ms

The backend of my service uses RDF (Jena). Almost a quarter of a second to return a typical result is too slow.

Are there ways to improve it?

@ansell
Member

ansell commented Apr 23, 2016

If you could write a test directly using the JSONLD-Java APIs, I could help.

The numbers from my testing over 8000 synthetic/random triples are approximately the following, which seem to roughly match up with yours:

  • RDF triples to JSON-LD in-memory (RDFDataset -> Map<String, Object>) : 60ms
  • JSON-LD in-memory to String (non-pretty-print): 30ms
  • JSON-LD in-memory to String (pretty-print): 40ms
  • JSON-LD expansion (Map<String, Object> to Map<String, Object>): 60ms

Testing compaction is difficult without concrete test data from you, i.e., the JSON-LD context you are using and the exact triples.

There may be ways of improving performance slightly, but in the end the complexity of the JSON-LD algorithms may still limit it. By comparison, none of the other RDF serialisations involve algorithms/transformations, other than having to hold everything in memory to pretty-print Turtle/RDF-XML. If you are concerned about performance and need to serialise arbitrary RDF triples, you should prefer those formats over JSON-LD.

If you have a known schema, you could generate JSON (that is valid JSON-LD) directly by hand, much faster, using the Jackson (or another JSON library) APIs. The complexity is all in the JSON-LD APIs themselves, so avoiding them will help. One major difficulty with JSON is the requirement that everything be in memory before serialising if you want concise documents, which is why JSON is really only used for API results that tend to be very small, while other use cases rely on streaming-friendly formats.
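To illustrate the suggestion above, here is a minimal sketch of emitting valid JSON-LD directly for a known, fixed schema, bypassing the JSON-LD algorithms entirely. It uses only the JDK for self-containment (a real implementation would likely use Jackson's `JsonGenerator`); the class name, method, and the schema.org context are illustrative, not from this issue:

```java
// Sketch: for a fixed schema, build a valid JSON-LD document as a plain
// JSON string, skipping the JSON-LD API entirely. Hypothetical names.
public class JsonLdByHand {

    // Minimal JSON string escaping, sufficient for this illustration.
    static String esc(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Emits {"@context":{...},"@id":...,"name":...} for one node.
    static String buildPerson(String id, String name) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"@context\":{\"name\":\"http://schema.org/name\"},");
        sb.append("\"@id\":\"").append(esc(id)).append("\",");
        sb.append("\"name\":\"").append(esc(name)).append("\"}");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildPerson("http://example.org/p1", "Alice"));
    }
}
```

Because the structure is fixed, no expansion, compaction, or deduplication work is done at serialization time.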

@ansell ansell closed this as completed Apr 23, 2016
@fpservant
Author

fpservant commented May 2, 2016

Hi,
Thanks for the reply. I ran some more tests, measuring the times for the different kinds of output, and I think I have found some interesting results. It turns out that the results vary widely depending on the content of the file: the file I have been working with has something special about it that explains a large part of the performance problem I pointed out. I think a change can be made in the code that would solve it.

With a file similar to the one I used in my previous experiment (enclosed here as "slow.jsonld"),

jsonldperfs.zip

I get the following results (export from Jena, with a small hack to choose the output form):

model.size() 7559
JSON-LD/pretty EXPANDED:162 ms
JSON-LD/pretty COMPACTED:186 ms
JSON-LD/pretty FLATTENED:339 ms
JSON-LD/flat EXPANDED:162 ms
JSON-LD/flat COMPACTED:184 ms
JSON-LD/flat FLATTENED:337 ms
RDF/XML/pretty:89 ms
RDF/XML/plain:51 ms
Turtle/blocks:8 ms
Turtle/flat:12 ms
Turtle/pretty:9 ms
N-Triples/utf-8:2 ms

Again, a very big difference from the Turtle performance. But the point is that the relative cost of compaction is not the important factor here: if I understand correctly, the expanded output format corresponds to the basic one, and compaction begins with the same operations as those done for the expanded format.

But I also noticed that with other files, results are very different, and much better. I investigated the differences, and I found that the factor slowing things down in the previous file is that several nodes have a property with a lot of values. I removed these statements from the file, and here is what I got:

model.size() 1177
JSON-LD/pretty EXPANDED:4 ms
JSON-LD/pretty COMPACTED:12 ms
JSON-LD/pretty FLATTENED:15 ms
JSON-LD/flat EXPANDED:4 ms
JSON-LD/flat COMPACTED:11 ms
JSON-LD/flat FLATTENED:15 ms
RDF/XML/pretty:21 ms
RDF/XML/plain:11 ms
Turtle/blocks:1 ms
Turtle/flat:2 ms
Turtle/pretty:3 ms
N-Triples/utf-8:1 ms

Wow, that is fast! Of course, there are far fewer triples (1177 vs 7559), but the gain in time is clearly not proportional to the reduction in size: were that the case, the time for EXPANDED should be ~25 ms, not just 4 ms!

So I suspected that this could be related to some iteration over a list, and I found where this happens in the code: JsonLdAPI.fromRDF, at line 1857:

                // 3.5.6+7)
                JsonLdUtils.mergeValue(node, predicate, value);

mergeValue contains the following test, which ensures that a given value is not added twice:

        if ("@list".equals(key)
                || (value instanceof Map && ((Map<String, Object>) value).containsKey("@list"))
                || !deepContains(values, value)) {
            values.add(value);
        }

We're in the case where deepContains is called, and deepContains iterates over the items in values; hence the poor performance with my file. To check this, I modified line 1857 in JsonLdAPI to call a modified version of mergeValue, a "laxMergeValue" that doesn't verify whether value is already in values before adding it.
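Reconstructed from the description above, the change might look roughly like this; "laxMergeValue" is the hypothetical name used in this comment, not part of the released library, and the surrounding class is just scaffolding for the sketch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "laxMergeValue" described above: the same list-append that
// JsonLdUtils.mergeValue performs, but without the deepContains scan, so
// each add is O(1) instead of O(n) in the number of existing values.
public class LaxMerge {

    @SuppressWarnings("unchecked")
    static void laxMergeValue(Map<String, Object> obj, String key, Object value) {
        List<Object> values =
                (List<Object>) obj.computeIfAbsent(key, k -> new ArrayList<>());
        // No duplicate check: values appearing twice in the input
        // will appear twice in the output.
        values.add(value);
    }

    public static void main(String[] args) {
        Map<String, Object> node = new LinkedHashMap<>();
        laxMergeValue(node, "http://example.org/p", "v1");
        laxMergeValue(node, "http://example.org/p", "v1"); // duplicate kept
        System.out.println(node);
    }
}
```

The trade-off is exactly the one discussed below: duplicates in the input are no longer collapsed, which is why the conformance-test impact matters.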

Here is the time that I get with the first file (the big "slow" one):

JSON-LD/pretty EXPANDED:9 ms
JSON-LD/pretty COMPACTED:37 ms
JSON-LD/pretty FLATTENED:193 ms
JSON-LD/flat EXPANDED:8 ms
JSON-LD/flat COMPACTED:31 ms
JSON-LD/flat FLATTENED:187 ms
RDF/XML/pretty:88 ms
RDF/XML/plain:49 ms
Turtle/blocks:9 ms
Turtle/flat:12 ms
Turtle/pretty:10 ms
N-Triples/utf-8:3 ms

Same time as Turtle for the expanded format!

But is it OK to use this "laxMergeValue" instead of mergeValue at line 1857 of JsonLdAPI? Well, I'll leave that to the people who know the code, but I think it could be, as it seems to be about adding the triple

node predicate value

to the list of values of the property predicate of the subject node. Anyway, I am sure that it is possible to fix this iteration over the items of a list, which can have a very negative impact on the performance of the API.

Best Regards,

fps

@ansell
Member

ansell commented May 2, 2016

Thanks for looking into it further. I will see what I can do about that check (hopefully without breaking the spec!).

@ansell ansell reopened this May 2, 2016
@ansell
Member

ansell commented May 6, 2016

I can't seem to replicate your results locally, as removing the deepContains call doesn't seem to have any effect. I can't remove the entire if statement and always add values, as that breaks at least 19 of the conformance tests. Can you open a pull request with your proposed changes, and I will see if I can replicate them with your version of the code?
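One general technique for keeping the duplicate check without the linear scan (a sketch under my own assumptions, not necessarily the change that shipped in 0.8.3) is to maintain a HashSet alongside the value list. The java.util maps and lists implement content-based equals/hashCode, so the set performs a deep comparison in O(1) expected time per add:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: preserve "no duplicate values" semantics with an auxiliary
// HashSet, replacing the deepContains scan with an O(1) membership test.
// Illustrative only; the actual fix in jsonld-java may differ.
public class DedupValues {
    private final List<Object> values = new ArrayList<>();
    private final Set<Object> seen = new HashSet<>();

    void add(Object value) {
        if (seen.add(value)) { // returns false if a deep-equal value exists
            values.add(value);
        }
    }

    List<Object> values() {
        return values;
    }

    public static void main(String[] args) {
        DedupValues dv = new DedupValues();
        Map<String, Object> v = new HashMap<>();
        v.put("@value", "x");
        dv.add(v);
        dv.add(new HashMap<>(v)); // deep-equal duplicate, ignored
        System.out.println(dv.values().size()); // prints 1
    }
}
```

A caveat for this approach: it assumes the values stored are not mutated after insertion, since mutating a map already in the HashSet would break its hashing.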

@ansell
Member

ansell commented May 18, 2016

Released jsonld-java-0.8.3 with this fix in it; it should be on Maven Central in a few hours.
