Speed improvements for big json with many nested levels (my json is 500 000 character) #131

Closed
sekretar opened this Issue Apr 1, 2015 · 33 comments

sekretar commented Apr 1, 2015

I can't find any Node.js examples of how to use this module asynchronously.
I see in the code that async is supported, but I can't find a way to use it.

Any tips?

edi9999 commented Apr 1, 2015

The docxtemplater-specific code should be fast enough not to need to be async.

However, loading a zip can be slow in the browser, and the library docxtemplater depends on (jszip) will be usable asynchronously in future versions (see Stuk/jszip#195).

I will update docxtemplater once jszip supports this.

@edi9999 edi9999 closed this Apr 1, 2015

edi9999 commented Apr 1, 2015

In the code sample I give in the readme:

var fs = require('fs');
var Docxtemplater = require('docxtemplater');

// This can be done asynchronously
var content = fs.readFileSync(__dirname + "/input.docx", "binary");

// This will be async in future versions (e.g. probably Docxtemplater.load)
var doc = new Docxtemplater(content);

// This will stay sync
doc.setData({
    "first_name": "Hipp",
    "last_name": "Edgar",
    "phone": "0652455478",
    "description": "New Website"
});

// This will very probably stay sync
// Apply the data (replace all occurrences of {first_name} by Hipp, ...)
doc.render();

// getZip returns an instance of jszip, so if jszip supplies a method
// to generate asynchronously, this code will be async
var buf = doc.getZip().generate({type: "nodebuffer"});

// This can be made async
fs.writeFileSync(__dirname + "/output.docx", buf);
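The comments in the sample say that the file read and write can be made asynchronous while the rendering stays synchronous. A minimal sketch of that split, assuming plain Node.js callbacks, might look like the following. The helper name `renderFileAsync` and its `renderSync` parameter are hypothetical and not part of docxtemplater.

```javascript
const fs = require("fs");

// Hypothetical helper, not part of docxtemplater: reads the template and
// writes the result with non-blocking file I/O, while the rendering step
// (the caller-supplied renderSync function) stays synchronous.
function renderFileAsync(inputPath, outputPath, renderSync, callback) {
    fs.readFile(inputPath, "binary", function (readErr, content) {
        if (readErr) {
            return callback(readErr);
        }
        let output;
        try {
            // Synchronous step: still blocks the event loop while it runs.
            output = renderSync(content);
        } catch (renderErr) {
            return callback(renderErr);
        }
        // The callback receives an error, or nothing on success.
        fs.writeFile(outputPath, output, callback);
    });
}
```

Here `renderSync` could wrap the `new Docxtemplater(content)`, `setData`, `render` and `getZip().generate({type: "nodebuffer"})` calls from the sample above and return the resulting buffer. Only the file access becomes non-blocking; a slow `render()` still blocks while it runs.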

sekretar commented Apr 2, 2015

Thank you Ed,

It takes 2-3 minutes to generate my docx file (almost 600 pages).
I have JSON with almost 500,000 characters and 4 nested levels. That data set is rendered into a table (see image).
[image: clipboard01]

This is the code I use. I see nothing wrong here:

var fs = require("fs");
var Docxtemplater = require("../node_modules/docxtemplater");
var content;
if (exists) {
    content = fs.readFileSync(file, "binary");
} else {
    content = fs.readFileSync(defaultTemplate, "binary");
}
var doc = new Docxtemplater(content);
doc.setData(jDataset2);
doc.render();
var buf = doc.getZip().generate({type: "nodebuffer"});
fs.writeFileSync(req.session.fileUploadPath + "policyprint.docx", buf);

Question:

Could I render this on the client instead of on the server? Any suggestions?

edi9999 commented Apr 3, 2015

OK, I suggest you measure the time of each method call you make. Something like:

var now = new Date();

fs.readFileSync ...

console.log(new Date() - now); // elapsed time in milliseconds

Can you then post the results, so we can see which method takes how much time?

Also, does your docx contain only text, or other media as well (images, ...)?
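To avoid repeating the start/end bookkeeping for every call, the measurement can be wrapped in a small function. This is only a sketch; the name `timeStep` is hypothetical and it assumes each step is synchronous.

```javascript
// Hypothetical helper: runs a synchronous step, logs how long it took,
// and returns the step's result together with the elapsed milliseconds.
function timeStep(label, step) {
    const start = Date.now();
    const result = step();
    const elapsedMs = Date.now() - start;
    console.log(label + ": " + elapsedMs + " ms");
    return { result: result, elapsedMs: elapsedMs };
}

// Example usage, assuming `doc` and `data` exist as in the code earlier
// in this thread:
// timeStep("setData", function () { doc.setData(data); });
// timeStep("render", function () { doc.render(); });
// timeStep("generate", function () {
//     return doc.getZip().generate({type: "nodebuffer"});
// });
```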

sekretar commented Apr 8, 2015

Hi,
I didn't measure the time precisely, but it is 2-3 minutes for sure. The same thing happens when I switch to rendering this on the web page (client side).

And no, there are no media files.

@edi9999 edi9999 added the question label Apr 8, 2015

edi9999 commented Apr 8, 2015

What I meant in my last post was to ask you to measure how long each line of code takes. I strongly suspect jszip and fs.readFileSync / fs.writeFileSync to be the most time-consuming.

The setData method will be instant, and I suspect render to be relatively fast no matter the docx size.

sekretar commented Apr 9, 2015

OK, this is my JS code with some measurements added. The results (in milliseconds) are in the comments after //.

I get almost the same issue with Node.js.

$.post('/xxx_SoaExport', null, function (result) {
    // Show JSON length
    console.log(JSON.stringify(result).length); // RESULT: 457596

    // Set data
    var start = new Date().getTime();
    doc.setData(result);
    var end = new Date().getTime();
    var time = end - start;
    console.log('doc.setData(result): ' + time); // RESULT: doc.setData(result): 0

    // Render
    start = new Date().getTime();
    doc.render();
    end = new Date().getTime();
    time = end - start;
    console.log('doc.render: ' + time); // RESULT: doc.render: 139060

    // Get zip
    start = new Date().getTime();
    var out = doc.getZip().generate({type: "blob"});
    end = new Date().getTime();
    time = end - start;
    console.log('out=doc.getZip().generate: ' + time); // RESULT: out=doc.getZip().generate: 1749

    saveAs(out, "output.docx");
    commonFunctions.hideLoaderDivSecond();
});

@edi9999 edi9999 reopened this Apr 9, 2015

edi9999 commented Apr 9, 2015

OK, I didn't expect the render method to take 139 seconds.

Can you send me your docx by email (the address is on my profile) so that I can see what is taking so much time?

sekretar commented Apr 9, 2015

I've sent you a piece of the JSON and my docx template.

edi9999 commented Apr 13, 2015

Hi, I have tried to find where the slow code is located using dtrace: http://blog.nodejs.org/2012/04/25/profiling-node-js/

But it seems dtrace is not working on my machine. I have tried other methods without any success.

I thought the bottleneck was the creation of the subxmltemplater (when there are loops, many instances of xmltemplater are created, whereas we could reuse one instance multiple times, since it has the same content, just different tags). However, this didn't have any impact on the generation time for your example, so for now I don't know what else to try.

sekretar commented Apr 13, 2015

OK Ed,

thank you for your time.

I will still use this :)

@edi9999 edi9999 changed the title from async example to Speed improvements for big json with many nested levels (my json is 500 000 character) Apr 22, 2015

@edi9999 edi9999 added the enhancement label Jul 27, 2015

edi9999 commented Nov 11, 2015

docxtemplater v1.1.0 has been released with some speed improvements (here is the full changelog: https://github.com/open-xml-templating/docxtemplater/blob/master/CHANGELOG.md).

Can you update and tell me whether it is faster now?

@edi9999 edi9999 closed this Nov 15, 2015

sculver-affirma commented Mar 3, 2016

I am experiencing the exact same issue; it looks like the updates since then have not fixed it for large JSON objects. Unfortunately, render() takes forever and sometimes doesn't finish; it also freezes the browser UI.

@edi9999 edi9999 reopened this Mar 3, 2016

sculver-affirma commented Mar 3, 2016

The attached JSON object (json-test.txt, 3505 lines) takes about 4 minutes to render(). I have also attached the template file (template.docx).

edi9999 commented Mar 3, 2016

Thanks for sending a sample; I will reproduce it to find the source of the latency.

I would first like to find out whether the rendering time is linear in the size of the input (if it is not, there is still a bug). If the algorithm is indeed linear, it will be time to add a compilation step before the rendering of the document (which is where I saw the library going in the long term anyway).
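One way to check linearity, sketched here under assumptions, is to time the render at several input sizes and fit the exponent k in time ≈ c · size^k on a log-log scale. The helper name `growthExponent` and the sample numbers in the comments are hypothetical; they are not measurements from this thread.

```javascript
// Hypothetical helper: estimates k in time ~ c * size^k using a
// least-squares line fit on (log size, log time).
// Requires at least two samples with distinct sizes, and positive
// size and timeMs values.
function growthExponent(samples) {
    const xs = samples.map(function (s) { return Math.log(s.size); });
    const ys = samples.map(function (s) { return Math.log(s.timeMs); });
    const n = samples.length;
    const meanX = xs.reduce(function (a, b) { return a + b; }, 0) / n;
    const meanY = ys.reduce(function (a, b) { return a + b; }, 0) / n;
    let numerator = 0;
    let denominator = 0;
    for (let i = 0; i < n; i++) {
        numerator += (xs[i] - meanX) * (ys[i] - meanY);
        denominator += (xs[i] - meanX) * (xs[i] - meanX);
    }
    return numerator / denominator;
}

// Illustrative, made-up samples: time quadruples when size doubles.
// const k = growthExponent([
//     { size: 1000, timeMs: 50 },
//     { size: 2000, timeMs: 200 },
//     { size: 4000, timeMs: 800 }
// ]);
// A k close to 1 suggests linear growth; a k close to 2 suggests quadratic.
```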

edi9999 commented Mar 3, 2016

Are you using version 2.0?

sculver-affirma commented Mar 3, 2016

Thanks. Yes, version 2.0.

edi9999 commented Mar 4, 2016

The render function takes 4 seconds with Node on my machine. However, I see high RAM consumption (700 MB). Maybe the RAM consumption explains the extreme slowness in browsers, since their memory is limited per process.

The loop time seems to be non-linear after testing multiple values; I'll see what can be done.

sculver-affirma commented Mar 4, 2016

I'm using IE11 with 8 GB of RAM. It's been tested on 4 different machines, and all show the same issue.

edi9999 commented Mar 4, 2016

Just for testing, can you try another browser, such as Firefox or Chrome?

sculver-affirma commented Mar 7, 2016

Just tested it in Chrome; it takes just a few seconds. IE 10 and IE 11 take minutes.

sculver-affirma commented Mar 11, 2016

Should the script be compatible with IE?

edi9999 commented Mar 11, 2016

What do you mean?

sculver-affirma commented Mar 11, 2016

It seems the issue occurs only in Internet Explorer, and since many users use IE, it seems like a bug. I also tried Microsoft Edge, and it takes minutes to render there too.

edi9999 commented Mar 12, 2016

No, the issue affects all JS engines (Node, Chrome, Firefox, ...). The problem is just more visible in IE.

Anyway, I think I found the bottleneck; I will publish a new version over the weekend.

edi9999 commented Mar 12, 2016

I created a new version, 2.1. It should be fixed now.

edi9999 commented Mar 15, 2016

Has this solved your issue, @sculver-affirma?

sculver-affirma commented Mar 15, 2016

We're scheduled to test the new script today. Will let you know. Thanks.

sculver-affirma commented Mar 15, 2016

Looks like that did the trick; much faster now. Thank you for your help.

edi9999 commented Mar 16, 2016

Just to know: how much faster? I'd like to write a blog article about the optimisations.

sculver-affirma commented Mar 16, 2016

Before, it was taking between 1 and 5 minutes to render, depending on the size of the data. Now it takes about 3-10 seconds.

edi9999 commented Jul 3, 2016

I wrote a blog post about the speed gain: http://javascript-ninja.fr/optimizing-speed-in-node-js/

@edi9999 edi9999 closed this Jul 3, 2016

edi9999 commented Aug 20, 2016

Hi, I'm asking docxtemplater users to send their docx and data (in anonymised form) to create integration tests so that these issues don't appear again. Can you help? See #244.
