Performance problem since html2pdf 4.0.3 => 5.0.1 #230

Open
michauk opened this issue Sep 19, 2017 · 43 comments

@michauk

michauk commented Sep 19, 2017

Hello there,
I upgraded from html2pdf 4.0.3 to the latest 5.* a few days ago on a standard Debian server (PHP 5.6...), to prepare for the Debian upgrade (from Jessie to Stretch, I'm a bit late). I think I read somewhere that I had to upgrade html2pdf to 5 to get it running on PHP 7. In any case, it is surely a good thing to upgrade this tool along with the distro.

I encountered a big performance problem I didn't have before.

Mainly, my PDF is a [sometimes] big table (many rows, a few columns) with basic stuff in it (amount, date, reference...)
With html2pdf 5, I need 1.5-2 seconds to generate that table (1 page) for only ~70 lines. Time increases exponentially with the number of lines (the server is a big one with no performance problems; I tune and monitor it myself).

  • Approx. 40 seconds for 194 lines.
  • Timeout after 10 minutes for ~1000 lines.

Before the upgrade, approx. 3000 lines took a few seconds (maybe 5 or 10, I don't remember, but it was reasonable).

I started to debug this and reduced a copy of my code to just this table with no PHP/mysql processing, I just copied/pasted my table content. I activated setDebugMode to check.

Maybe I missed something simple (or my HTML code is dirty?), but I can't find the cause. I tried removing every style attribute; it's a bit faster, but still slow and still exponentially slower as the table grows.
If anyone has any idea, I'd appreciate it.

Here is an example with only this table content (+ a footer), with the style attributes (I put them on every TD, which is maybe a bad idea), and the same example without any style at all.
You can copy/paste some TR/TD to generate hundreds of lines and check the processing time.

examples.zip

Regards,
Jacques M.

@michauk
Author

michauk commented Sep 20, 2017

OK, I modified "exemple07a.php" in /examples/res/ to increase the number of lines. Same thing: the time becomes crazy. I also tried many tables (with one line each) instead of one table with many rows, and got the same crazy processing time.

@michauk
Author

michauk commented Sep 20, 2017

OK I switched back to v4.03 and won't change it unless I'm forced to.
I tested html2pdf 4.03 with PHP7, it's working.
With the 4.03, I can create a table with 500 lines in approx 13 seconds.
I can create 10 tables, each containing 500 lines, in approx 3 minutes, provided you increase the php time limit + memory to 512MB.
And with PHP7, it just took 13 seconds.
Now I just hope the 4.03 will still work for a long time...

@spipu
Owner

spipu commented Jan 20, 2018

I will check this on the latest version, thanks for the report.

@spipu spipu added the bug label Jan 20, 2018
@layebaARD

Hi, I would like to generate a PDF but I do not understand the latest version.
Could I have a link that would allow me to download version 4.03?

@jomofcw

jomofcw commented Apr 18, 2018

Hi there,

I'm running into the same problem that @michauk described.
Is there any plan to fix it?
I'm downgrading to v4.* while waiting for the fix.

Thanks for your work.

@jomofcw

jomofcw commented May 24, 2018

Hello,

Sorry to spam about it, but this issue really needs a fix.
V5 seems to be great, but as long as this issue exists it is unusable, sadly.
I can help if test cases are needed.

Thanks for your work.

@verbunden

We had a similar problem. For us it was because the call to "TCPDF::_destroy" took too long. The reason was 2 million files in the temporary folder that "TCPDF::_destroy" searches, left there by a failure in the session cleaning process. Once the folder was cleaned, everything worked fine.

PHP v7.0.29-1~dotdeb+8.1 (Debian 3.16.51-3+deb8u1 x86_64)

@spipu
Owner

spipu commented Aug 1, 2018

Hi,
I just created a performance tool to track down the problem.
You can find it on the performance branch: https://github.com/spipu/html2pdf/blob/performance/performance/full.php

I need some metrics from different environments. Who can run it?

@spipu
Owner

spipu commented Aug 1, 2018

My metrics:
1|66|11538
5|115|11539
10|172|11541
25|353|11546
50|740|11552
75|1101|12047
100|1503|12504
250|4589|15622
500|23645|21000
750|68102|26003
1000|124105|31748
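
For anyone comparing runs, a rough way to read these figures is to estimate the growth order from the larger rows. The sketch below is only an illustration: the thread never names the columns, so treating the first as the step size and the second as a duration is an assumption, and `logLogSlope` is a hypothetical helper (it needs PHP 7.1+ for the array destructuring).

```php
<?php
// Larger rows of the metrics quoted above.
// ASSUMPTION: column 1 is the step size, column 2 a duration.
$metrics = [
    [100, 1503],
    [250, 4589],
    [500, 23645],
    [750, 68102],
    [1000, 124105],
];

/**
 * Least-squares slope of log(y) against log(x).
 * A slope near 1 suggests linear growth, near 2 quadratic growth.
 */
function logLogSlope(array $points): float
{
    $n = count($points);
    $sumX = $sumY = $sumXY = $sumXX = 0.0;
    foreach ($points as [$x, $y]) {
        $lx = log($x);
        $ly = log($y);
        $sumX += $lx;
        $sumY += $ly;
        $sumXY += $lx * $ly;
        $sumXX += $lx * $lx;
    }
    return ($n * $sumXY - $sumX * $sumY) / ($n * $sumXX - $sumX * $sumX);
}

$slope = logLogSlope($metrics);
printf("estimated exponent: %.2f\n", $slope);
```

For these rows the estimate comes out close to 2, which would point to roughly quadratic rather than truly exponential growth, if the column interpretation above is right.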

@jomofcw

jomofcw commented Aug 1, 2018

Hello,

To avoid wasting time, can you provide some deployment instructions, please?
I'll test it ASAP.

@spipu
Owner

spipu commented Aug 1, 2018

sure

git clone https://github.com/spipu/html2pdf.git -b performance  html2pdf_performance
cd html2pdf_performance
composer install --no-dev
cd performance
php ./full.php

@jomofcw

jomofcw commented Aug 1, 2018

Thanks !

Sorry, another problem :/.

root@myWebServer:/myPath/html2pdf-performance# git clone git@github.com:spipu/html2pdf.git -b performance html2pdf_performance
Cloning into 'html2pdf_performance'...
Warning: Permanently added the RSA host key for IP address '192.30.253.113' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

@Airthee

Airthee commented Aug 1, 2018

Hi, these are my metrics:

1|2249|11541
5|701|11542
10|904|11543
25|1573|11548
50|2718|11557
75|3905|12052
100|5256|12509
250|14134|15627
500|52411|21004
750|129720|26008
1000|244713|31752

I'm on Ubuntu for Windows (WSL), I run the script with PHP 5.6 (production version).

@spipu
Owner

spipu commented Aug 1, 2018

@jomofcw I fixed the instructions.

@jomofcw

jomofcw commented Aug 1, 2018

It's OK, thanks ^^.

So here are my metrics (PHP 7 on Debian with a near-default configuration):
1|87|10493
5|107|10494
10|133|10495
25|217|10500
50|376|10504
75|556|10747
100|768|11204
250|2959|14322
500|42322|19699
750|135246|24702
1000|267831|30447

@KevinF-tech

Hello,

My metrics (Debian 9.5 - PHP 7.0.30):

1|131|10168
5|379|10169
10|691|10171
25|1649|10176
50|3531|10180
75|5125|10480
100|7302|10953
250|18326|14179
500|45333|19721
750|86438|24890
1000|145040|30799

@Lorendex

PHP7.0 Ubuntu 16.04
1|76|10506
5|90|10507
10|116|10508
25|194|10513
50|322|10517
75|481|10761
100|679|11218
250|2118|14336
500|13066|19714
750|32667|24717
1000|62140|30462

PHP7.2 Ubuntu 16.04
1|70|10506
5|97|10507
10|116|10508
25|194|10513
50|343|10517
75|516|10761
100|685|11218
250|2136|14336
500|13033|19714
750|33688|24717
1000|69630|30462

PHP7.1.6 Windows 10
1|94|9165
5|119|9166
10|156|9167
25|271|9172
50|474|9327
75|708|9711
100|979|10056
250|3250|12372
500|23620|16538
750|69842|20382
1000|144159|24846

@will2877

Is there any news on this issue?
I seem to have the same problem, but I don't have shell access to the host.
It works fine on my local XAMPP installation but becomes very slow once I upload it to my hosted server.

Local PHP: 7.2.9
Remote PHP: 7.2.11

Thanks in advance!

@jacobdo2

jacobdo2 commented Jan 5, 2019

Any news on this issue?

@S-K-P

S-K-P commented Jan 25, 2019

With html2pdf 5.2.1

Same problem as @michauk, but like @DiisMami it works fine on local installation with:

  • Wamp
  • Windows 10
  • PHP 7.2.14

It doesn't work fine on web host with:

  • CentOS 7
  • PHP 7.2.14

My metrics on local:
1|90|10513
5|87|10514
10|109|10515
25|174|10520
50|301|10524
75|449|10786
100|614|11243
250|1937|14361
500|13403|19739
750|38670|24742
1000|82348|30487

My metrics on web host:
1|149|14092
5|214|14242
10|314|14404
25|650|14951
50|1289|15878
75|2095|16667
100|3012|17664
250|18764|23276
500|139009|32879
750|415472|43881
1000|took too much time

PS: sorry if my english is bad

@meritel

meritel commented Feb 26, 2019

Hello,
After many tests, I found a real performance cost in the way the object is cloned in the function createSubHTML().

// clone the sub object
 $subHtml = clone self::$_subobj;

Every function that calls createSubHTML() stores its return value in a variable $sub, and often destroys it (the $sub) after using it. (Your function _destroySubHTML doesn't work here, so I did the unset($sub) manually.)

Each time createSubHTML() is called, I measure a delay of 10 to 20 ms, depending on our server load.
With my HTML file, I counted 843 calls to createSubHTML()/_destroySubHTML(), which take 9 947.4 ms (~10 s).

The complete generation of my PDF (1 page, some nested HTML tables) takes 11 202.8 ms (~11 s).

(sorry for my english, i'm FR)

@meritel

meritel commented Mar 1, 2019

Hello,

I finally found a way to solve the problem. The problem is not so much that the whole object is cloned, but the __destruct methods of every class linked to this object, which run when the clone is removed by the garbage collector.
When you make the clone and return it to the function that called createSubHTML(), it ends up in the $sub variable.
When this variable is destroyed, the clone is destroyed too, and the magic __destruct methods of all the classes are called.
The one in tcpdf.php is really slow. By disabling the glob block (see below), I go from ~11 000 ms to ~800 ms.
(In the PDF I'm trying to generate, this code runs 843 times, so 843 times it searches the whole tmp dir for files to clean up, using the glob function [pattern search]. Our server already has something that cleans up this dir, so I disabled this part of the code.)

public function _destroy($destroyall=false, $preserve_objcopy=false) {
		[...]
		/* 
		//This part is slowing down html2PDF
		if ($destroyall AND !$preserve_objcopy) {
			// remove all temporary files
			$tmpfiles = glob(K_PATH_CACHE.'__tcpdf_'.$this->pdf->file_id.'_*');
			if (!empty($tmpfiles)) {
				array_map('unlink', $tmpfiles);
			}
		}
		*/
		[...]
	}
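
To make the mechanism described above easier to reproduce outside html2pdf, here is a minimal stand-alone sketch. `ScanningResource` and the `__demo_` file prefix are hypothetical and only imitate the behaviour described in this comment (one directory scan per destroyed instance); this is not TCPDF's actual code. The count of 843 is taken from the figure quoted above.

```php
<?php
// Hypothetical stand-in for an object whose destructor scans a directory.
final class ScanningResource
{
    public static $scans = 0; // number of destructor scans run so far
    private $dir;
    private $fileId;

    public function __construct(string $dir, string $fileId)
    {
        $this->dir = $dir;
        $this->fileId = $fileId;
    }

    public function __destruct()
    {
        // One directory scan per destroyed instance, clones included.
        glob($this->dir . '/__demo_' . $this->fileId . '_*');
        self::$scans++;
    }
}

$original = new ScanningResource(sys_get_temp_dir(), 'abc');

$start = microtime(true);
for ($i = 0; $i < 843; $i++) {
    $sub = clone $original; // the clone keeps the same $fileId
    unset($sub);            // last reference gone, __destruct() runs here
}
$elapsedMs = (microtime(true) - $start) * 1000;

printf("%d scans in %.1f ms\n", ScanningResource::$scans, $elapsedMs);
```

The elapsed time depends heavily on how many files sit in the scanned directory, which would be consistent with the report above about a temporary folder holding millions of files.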

@meritel

meritel commented Mar 1, 2019

By the way, this file cleaning should be executed ONCE, at the real end of the script that uses HTML2PDF. So I overrode the __destruct and _destroy functions in MyPDF.php, with the glob-based removal commented out (see my previous post).

In Html2Pdf.php, I added the temp file cleaning to the magic __destruct() of Html2Pdf, like this:

	static protected $_tmpFilesAreCleaned   = false;	// flag : file cleaning is done

	public function __destruct() {
		if($this->_isSubPart || self::$_tmpFilesAreCleaned) return; 
		// remove all temporary files
		$tmpfiles = glob(K_PATH_CACHE.'__tcpdf_*');
		if (!empty($tmpfiles)) {
			self::$_tmpFilesAreCleaned = true;
			array_map('unlink', $tmpfiles);
		}
	}

Hope this will help ;)

@KevinF-tech

Thanks, it works well! Could you open a PR?

@meritel

meritel commented Mar 4, 2019

Done. Instead of modifying tcpdf.php, I changed MyPDF.php, overriding tcpdf's __destruct magic method.

@meritel

meritel commented Mar 5, 2019

Be careful with the code above: if you run more than one instance of html2pdf, one instance could delete the files of another while that second instance has not finished yet.
I'll modify my PR accordingly soon.
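
One way to avoid that collision is to scope the cleanup to files carrying the instance's own identifier. The sketch below only illustrates the idea and is not the content of the PR: `ScopedTempCleaner`, its methods, and the `__demo_` prefix are hypothetical names, and `random_bytes` requires PHP 7.

```php
<?php
// Hypothetical helper, not part of html2pdf or TCPDF: it deletes only the
// temporary files that carry this instance's own identifier.
final class ScopedTempCleaner
{
    private $dir;
    private $fileId;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/\\');
        // Unique per instance, so two cleaners never share a prefix.
        $this->fileId = bin2hex(random_bytes(8));
    }

    /** Creates an empty temp file owned by this instance; returns its path. */
    public function createTempFile(string $suffix): string
    {
        $path = $this->dir . '/__demo_' . $this->fileId . '_' . $suffix;
        touch($path);
        return $path;
    }

    /** Removes this instance's files only; returns how many were deleted. */
    public function cleanup(): int
    {
        $files = glob($this->dir . '/__demo_' . $this->fileId . '_*');
        if ($files === false) {
            return 0;
        }
        $deleted = 0;
        foreach ($files as $file) {
            if (is_file($file) && unlink($file)) {
                $deleted++;
            }
        }
        return $deleted;
    }
}

$first = new ScopedTempCleaner(sys_get_temp_dir());
$second = new ScopedTempCleaner(sys_get_temp_dir());
$keep = $second->createTempFile('img');
$first->createTempFile('img');

$deleted = $first->cleanup();
$secondFileSurvived = file_exists($keep);
printf("deleted %d, other instance's file kept: %s\n", $deleted, $secondFileSurvived ? 'yes' : 'no');
$second->cleanup();
```

Compared with a single `__tcpdf_*` pattern, the per-instance prefix keeps the glob narrow and leaves files of concurrently running instances alone.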

@meritel

meritel commented Mar 5, 2019

modified. PR done.

@Tofandel
Contributor

#456

@Tofandel
Contributor

It's mostly a tcpdf issue then, so I'd recommend making a PR there.

@meritel

meritel commented Mar 22, 2019

I don't think it's a tcpdf issue... Html2pdf makes a lot of recursive instantiations of the tcpdf class, and that's the real problem... And tcpdf has to clean up its variables once the instance is destroyed.
tcpdf was not designed for thousands of recursive instantiations/destructions...

@Tofandel
Contributor

Tofandel commented Mar 22, 2019

Okay, I see what's happening: the ID of the cloned object is the same as the original's, so it tries to clean up the same ID over and over again. It's still an issue within the scope of tcpdf, but it can be hotfixed here as well.
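
A minimal sketch of one possible guard for that situation: mark clones in `__clone()` so that only the original instance runs the cleanup. `GuardedResource` is a hypothetical class written for this illustration; it is not what the tcpdf PR or the html2pdf hotfix necessarily does.

```php
<?php
// Hypothetical class for illustration; not TCPDF's or html2pdf's actual code.
final class GuardedResource
{
    public static $cleanups = 0; // how many times the costly cleanup ran
    private $fileId;
    private $isClone = false;

    public function __construct(string $fileId)
    {
        $this->fileId = $fileId;
    }

    public function __clone()
    {
        // The clone inherits $fileId, so record that it does not own the files.
        $this->isClone = true;
    }

    public function __destruct()
    {
        if ($this->isClone) {
            return; // leave the cleanup to the original instance
        }
        // Stand-in for the expensive temp-file cleanup for $this->fileId.
        self::$cleanups++;
    }
}

$original = new GuardedResource('abc');
for ($i = 0; $i < 843; $i++) {
    $sub = clone $original;
    unset($sub); // destructor runs, but skips the cleanup
}
unset($original); // the only destructor call that performs the cleanup

printf("cleanups run: %d\n", GuardedResource::$cleanups);
// prints "cleanups run: 1"
```

The trade-off is that a clone which really does create its own temporary files would no longer clean them up, so this only fits the case where clones share the original's files.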

@Tofandel
Contributor

Tofandel commented Mar 22, 2019

I made a PR on tcpdf. You can hotfix this in the MyPDF class in your PR and revert the rest.

@jomofcw

jomofcw commented May 14, 2019

Any news about this, please? This is the only thing that prevents me from using html2pdf v5 :'(.

@Tofandel
Contributor

Tofandel commented May 20, 2019

I'm personally switching to https://github.com/mpdf/mpdf and advise everybody here to do the same; it's very easy to switch from one to the other.
This lib is old with no activity, the support is terrible, and don't get me started on the code quality.

@citystrolch

I have tested meritel's core hack and it brings a massive performance boost indeed, thank you. I suggest to implement this, however, I know that the task is probably first of all with tcpdf, I'll try and suggest it there, too.

@Tofandel
Contributor

FYI: My tcpdf PR has been merged

@citystrolch

Thanks @Tofandel - that means future installations of html2pdf as well as tcpdf should have it included right? (sorry to ask like a beginner, but in terms of github I am...)

@Tofandel
Contributor

It was also included in the new release, it means that if you run
composer update
The version of tcpdf will be updated and you will get the performance improvement

@ggedde

ggedde commented Oct 18, 2019

I am having the same issue with V5.2.1. I have reverted back to 4.03.
33 page document with 17 tables with about 40 items in each table
5.2.1 = 22 seconds
4.03 = 6 seconds

I will have to check out MPDF someday, but for now 4.03 is working good, I just really wanted to use the end_last_page tag which is not available on 4.03, but not a huge deal.

@michauk
Author

michauk commented Oct 18, 2019

Hey,
2 years have passed since my 1st message, and it's still a problem.
Maybe I'll consider moving to something like "onlyoffice docbuilder". It seems it could replace both the php->pdf and php->xls tools.
For the moment this good ol' 4.03 release (actually I moved to 4.6.1 to be able to use different PHP libs with composer) is still doing the job. I think I had a warning about a "Countable" object when I upgraded to PHP 7.3 (Debian Buster), so I modified the code a bit. I can't remember if it was in this lib or another.
Regards,

@ggedde

ggedde commented Oct 18, 2019

Yeah, 4.6.1 was better but still not as fast as 4.03
5.2.1 = 22 seconds
4.6.1 = 9 seconds
4.03 = 6 seconds

Yeah, I had to fix the countable issue too.

@Olofu

Olofu commented Nov 17, 2020

My Metrics on shared web hosting

cPanel Version | 86.0 (build 30)
PHP Version 7.4
Architecture | x86_64
Operating System | linux

1|62|10495
5|79|10496
10|101|10497
25|174|10502
50|308|10506
75|452|10514
100|620|10960
250|2019|14076
500|5791|19450
750|11295|24449
1000|19189|30191

@Albob

Albob commented Jun 1, 2022

Hello and thanks for the lib. I also find it quite slow. Here are my performance measurements for html2pdf v5.2.4, if this helps:

Steps: 1, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000
Try by Steps: 10

1|212|9412
5|198|9413
10|284|9414
25|547|9419
50|979|9423
75|1506|9644
100|2090|10010
250|5622|12452
500|14838|16823
750|29104|20876
1000|54589|25549

My CPU is an Intel Core i7-1065G7

php --version
PHP 7.4.1 (cli) (built: Jan 20 2020 22:21:57) ( ZTS Visual C++ 2017 x86 )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Xdebug v2.9.1, Copyright (c) 2002-2020, by Derick Rethans
