Speed improvements for big json with many nested levels (my json is 500 000 character) #131

Closed
sekretar opened this Issue Apr 1, 2015 · 33 comments

@sekretar
sekretar commented Apr 1, 2015

I can't find any Node.js examples showing how to use this module asynchronously.
I see in the code that async is supported, but I can't find a way to use it.

Any tips?

@edi9999
Member
edi9999 commented Apr 1, 2015

The docxtemplater specific code should be fast enough to not need to be async.

However, loading a zip can be slow in the browser, and jszip, the library docxtemplater depends on, will make it possible to do this asynchronously in future versions (see Stuk/jszip#195).

I will update docxtemplater once jszip supports this.

@edi9999 edi9999 closed this Apr 1, 2015
@edi9999
Member
edi9999 commented Apr 1, 2015

In the code sample I give in the readme:

fs=require('fs')
Docxtemplater = require('docxtemplater');

//This can be done asynchronously
content = fs
    .readFileSync(__dirname+"/input.docx","binary")

//This will be async in future versions (eg probably Docxtemplater.load)
doc=new Docxtemplater(content);

//This will stay sync
doc.setData({
    "first_name":"Hipp",
    "last_name":"Edgar",
    "phone":"0652455478",
    "description":"New Website"
});

//This will very probably stay sync
//apply them (replace all occurrences of {first_name} by Hipp, ...)
doc.render();


var buf = doc.getZip() //getZip returns an instance of jszip, so if jszip supplies a method to generate Asynchronously, this code will be async
             .generate({type:"nodebuffer"});

// This can be made async
fs.writeFileSync(__dirname+"/output.docx",buf);
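
The parts marked "can be done asynchronously" above could be sketched as follows. This is only a minimal sketch, assuming a Node.js version that provides `fs.promises`. The `renderFile` helper and its `transform` parameter are hypothetical names, not part of docxtemplater; `transform` stands in for the synchronous steps (constructing the document, `setData`, `render`, `getZip().generate`).

```javascript
const fs = require('fs');

// Hypothetical helper (not a docxtemplater API): reads the template without
// blocking the event loop, applies a synchronous transform, and writes the
// result asynchronously. Resolves with the number of bytes written.
async function renderFile(inputPath, outputPath, transform) {
  // Same "binary" encoding as the readFileSync call in the sample above
  const content = await fs.promises.readFile(inputPath, 'binary');
  // The transform itself stays synchronous and must return a Buffer
  const output = transform(content);
  await fs.promises.writeFile(outputPath, output);
  return output.length;
}
```

Only the file reads and writes become non-blocking here; a slow `render()` inside `transform` would still occupy the event loop for its full duration.
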
@sekretar
sekretar commented Apr 2, 2015

Thank you Ed.,

It takes 2-3 minutes to generate my docx file (almost 600 pages).
I have JSON with almost 500,000 characters and 4 nested levels. That data set is rendered into a table (see image):
[image: clipboard01]

This is the code I use. I see nothing wrong here:

                fs=require("fs");
                Docxtemplater = require("../node_modules/docxtemplater");
                if(exists){
                    content = fs.readFileSync(file,"binary")
                }
                else{
                    content = fs.readFileSync(defaultTemplate,"binary")
                }
                doc=new Docxtemplater(content);
                doc.setData(jDataset2);
                doc.render();
                var buf = doc.getZip().generate({type:"nodebuffer"});
                fs.writeFileSync(req.session.fileUploadPath + "policyprint.docx",buf); 

Question:

Maybe I can render this on the client instead of the server? Any suggestions?

@edi9999
Member
edi9999 commented Apr 3, 2015

OK, I suggest you measure the time taken by each method call you make. Something like:

var now = new Date();

fs.readFileSync ... // the call being measured

console.log(new Date() - now); // elapsed time in milliseconds

Can you then post the results, so we can see which method takes how much time?

Also, does your docx contain only text, or other media as well (images, ...)?
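
The per-call measurement suggested above can be wrapped in a small helper. This is a minimal sketch; `timed` is a hypothetical name, and the summing loop is just a stand-in workload in place of calls such as `doc.render()`.

```javascript
// Hypothetical helper: runs fn synchronously, logs how long it took,
// and returns both fn's result and the elapsed time in milliseconds.
function timed(label, fn) {
  const start = Date.now();
  const result = fn();
  const elapsed = Date.now() - start;
  console.log(label + ': ' + elapsed + ' ms');
  return { result: result, elapsed: elapsed };
}

// Stand-in workload; in practice fn would wrap one docxtemplater call
const sum = timed('sum', function () {
  let total = 0;
  for (let i = 0; i < 1e6; i++) total += i;
  return total;
});
```

Wrapping each step (read, `setData`, `render`, `generate`, write) in its own `timed` call gives one log line per step.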

@sekretar
sekretar commented Apr 8, 2015

Hi,
I didn't measure the time, but it is 2-3 minutes for sure. Anyway, the same thing happens when I switch to rendering this on the web page (client side).

And no, there are no media files.

@edi9999 edi9999 added the question label Apr 8, 2015
@edi9999
Member
edi9999 commented Apr 8, 2015

What I meant in my last post was to ask you to measure how long each line of code takes. I strongly suspect jszip and fs.readFileSync / fs.writeFileSync to be the most time-consuming.

The setTags method will be instant, and I suspect render to be relatively fast no matter the docx size

@sekretar
sekretar commented Apr 9, 2015

OK, this is my JS code and I've added some measurement. Results are after //

I get almost the same issue with Node.js.

$.post('/xxx_SoaExport',
    null,
    function(result){
        //show JSON length
        console.log((JSON.stringify(result)).length) //RESULT: 457596
        //set data
        var start = new Date().getTime();
        doc.setData(result);
        var end = new Date().getTime();
        var time = end - start;
        console.log('doc.setData(result): ' + time); //RESULT: doc.setData(result): 0
        //render
        start = new Date().getTime();
        doc.render();
        end = new Date().getTime();
        time = end - start;
        console.log('doc.render: ' + time); //RESULT: doc.render: 139060
        //get zip
        start = new Date().getTime();
        out=doc.getZip().generate({type:"blob"})
        end = new Date().getTime();
        time = end - start;
        console.log('out=doc.getZip().generate: ' + time); //RESULT: out=doc.getZip().generate: 1749
        saveAs(out,"output.docx");
        commonFunctions.hideLoaderDivSecond();
    }
);

@edi9999 edi9999 reopened this Apr 9, 2015
@edi9999
Member
edi9999 commented Apr 9, 2015

OK, I didn't expect the render method to take 139 seconds.

Can you send me your docx via email (the address is on my profile) so that I can see what is taking so much time?

@sekretar
sekretar commented Apr 9, 2015

I've sent you a piece of the JSON and my docx template.

@edi9999
Member
edi9999 commented Apr 13, 2015

Hi, I have tried to find out where the slow code is located using dtrace (http://blog.nodejs.org/2012/04/25/profiling-node-js/).

But it seems dtrace is not working on my machine. I have tried other methods without any success.

I thought the bottleneck was the creation of the sub-xmltemplater instances (when there are loops, many instances of xmltemplater are created, whereas we could reuse one instance multiple times, since it has the same content and only the tags differ). However, changing this had no impact on the rendering time for your example, so for now I have no further leads.

@sekretar

OK Ed,

thank you for your time.

I will still use this :)

@edi9999 edi9999 changed the title from async example to Speed improvements for big json with many nested levels (my json is 500 000 character) Apr 22, 2015
@edi9999 edi9999 added the enhancement label Jul 27, 2015
@edi9999
Member
edi9999 commented Nov 11, 2015

docxtemplater v1.1.0 has been released with some speed improvements (here is the full changelog: https://github.com/open-xml-templating/docxtemplater/blob/master/CHANGELOG.md).

Can you update and tell me whether it works faster now?

@edi9999 edi9999 closed this Nov 15, 2015
@sculver-affirma

I am experiencing the exact same issue; it looks like none of the updates since then have fixed it for large JSON objects. Unfortunately, render() takes a very long time and sometimes doesn't finish; it also freezes the browser UI.

@edi9999 edi9999 reopened this Mar 3, 2016
@sculver-affirma

json-test.txt
The attached JSON object (3505 lines) takes about 4 minutes to render(). The template file is also attached.
template.docx

@edi9999
Member
edi9999 commented Mar 3, 2016

Thanks for sending a sample, I will reproduce it to find the source of the latency.

I would first like to find out whether the rendering time is linear in the size of the input (if that is not the case, there is still a bug). If the algorithm is indeed linear, it will be time to add a compilation step before the rendering of the document (which is the direction I saw the library going in the long term anyway).
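
As an illustration of what a compilation step before rendering can look like in general, here is a minimal sketch on a plain string template. It is not docxtemplater's implementation; `compile` and `render` are hypothetical functions, and the second data row is made up for the example.

```javascript
// Parse a template such as "{first_name} {last_name}" into a list of
// literal-text and tag parts. This is done once per template.
function compile(template) {
  const parts = [];
  const re = /\{(\w+)\}/g;
  let last = 0;
  let match;
  while ((match = re.exec(template)) !== null) {
    if (match.index > last) {
      parts.push({ type: 'text', value: template.slice(last, match.index) });
    }
    parts.push({ type: 'tag', value: match[1] });
    last = re.lastIndex;
  }
  if (last < template.length) {
    parts.push({ type: 'text', value: template.slice(last) });
  }
  return parts;
}

// Fill the precompiled parts from a data object; no parsing happens here.
// Tags missing from the data render as an empty string.
function render(parts, data) {
  const out = [];
  for (const part of parts) {
    if (part.type === 'text') {
      out.push(part.value);
    } else {
      out.push(data[part.value] !== undefined ? String(data[part.value]) : '');
    }
  }
  return out.join('');
}

const compiled = compile('{first_name} {last_name}: {phone}');
const rows = [
  { first_name: 'Hipp', last_name: 'Edgar', phone: '0652455478' },
  { first_name: 'Jane', last_name: 'Doe' } // made-up row without a phone
];
const rendered = rows.map(function (row) { return render(compiled, row); });
```

The point of the split is that a loop over many data rows pays the parsing cost once instead of once per row.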

@edi9999
Member
edi9999 commented Mar 3, 2016

Are you using version 2.0 ?

@sculver-affirma

Thanks. Yes version 2.0

@edi9999
Member
edi9999 commented Mar 4, 2016

The render function takes 4 seconds with Node on my machine. However, I see high RAM consumption (700 MB). Maybe the RAM consumption explains the extreme slowness in the browser, since browsers are limited in the memory a process can use.

The loop time seems to be non-linear after testing multiple values; I'll see what can be done.
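
One way to check linearity from a handful of timings is sketched below. It is a minimal sketch: `growthRatios` is a hypothetical helper, and the numbers are made up for illustration; they are not measurements from this issue.

```javascript
// For each pair of consecutive measurements, divide the growth in time by
// the growth in input size. A ratio near 1 suggests linear behaviour;
// a ratio well above 1 suggests the time grows faster than the input.
function growthRatios(measurements) {
  const ratios = [];
  for (let i = 1; i < measurements.length; i++) {
    const prev = measurements[i - 1];
    const cur = measurements[i];
    const sizeGrowth = cur.size / prev.size;
    const timeGrowth = cur.ms / prev.ms;
    ratios.push(timeGrowth / sizeGrowth);
  }
  return ratios;
}

// Made-up timings: doubling the input doubles the time (linear)
const linear = growthRatios([
  { size: 100, ms: 10 }, { size: 200, ms: 20 }, { size: 400, ms: 40 }
]);
// Made-up timings: doubling the input quadruples the time (quadratic)
const quadratic = growthRatios([
  { size: 100, ms: 10 }, { size: 200, ms: 40 }, { size: 400, ms: 160 }
]);
```

Real timings are noisy, so each size would need several runs before the ratios mean much.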

@sculver-affirma

I'm using IE11 with 8 GB of RAM. It's been tested on 4 different machines and all are experiencing the same issue.

@edi9999
Member
edi9999 commented Mar 4, 2016

Just for testing, can you try another browser, like Firefox or Chrome?

@sculver-affirma

Just tested it in Chrome; it takes just a few seconds. IE 10 and IE 11 take minutes.

@sculver-affirma

Should the script be compatible with IE?

@edi9999
Member
edi9999 commented Mar 11, 2016

What do you mean ?

@sculver-affirma

It seems the issue occurs only in Internet Explorer, and since many users use IE, it seems like a bug. I also tried IE Edge, and it takes minutes to render there too.

@edi9999
Member
edi9999 commented Mar 12, 2016

No, the issue affects all JS engines: Node, Chrome, Firefox, ... The problem is just more visible in IE.

Anyway, I think I found the bottleneck. I will publish a new version over the weekend.

@edi9999
Member
edi9999 commented Mar 12, 2016

I created a new version, 2.1. It should be fixed now.

@edi9999
Member
edi9999 commented Mar 15, 2016

Has this solved your issue @sculver-affirma ?

@sculver-affirma

We've scheduled to test the new script today. Will let you know. thanks.

@sculver-affirma

Looks like that did the trick; much faster now. thank you for your help.

@edi9999
Member
edi9999 commented Mar 16, 2016

Just so I know: how much faster? I'd like to write a blog article about the optimisations.

@sculver-affirma

Before, it was taking between 1 and 5 minutes to render, depending on the size of the data. Now it takes about 3-10 seconds.

@edi9999
Member
edi9999 commented Jul 3, 2016

I wrote a blog post about the speed gain : http://javascript-ninja.fr/optimizing-speed-in-node-js/

@edi9999 edi9999 closed this Jul 3, 2016
@edi9999
Member
edi9999 commented Aug 20, 2016

Hi, I'm asking docxtemplater users to send their docx and data (in an anonymised form) so that I can create integration tests and these issues don't appear again. Can you help? See #244
