Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: The "window-status" is causing the "process.waitFor()" to block indefinitely. #131

Merged
merged 5 commits into from Dec 7, 2023

Conversation

xweiba
Copy link
Contributor

@xweiba xweiba commented Dec 6, 2023

wkhtmltopdf uses a rather old version of WebKit, which often has trouble with up-to-date JavaScript code.

When using window-status and a SyntaxError: Parse error occurs in JavaScript, it can result in the failure of setting the window-status. The wkhtmltopdf process will remain in a waiting state indefinitely. In such cases, a timeout is needed to handle this situation.

…dowStatusTimeout to prevent the thread from being blocked indefinitely in case of a SyntaxError: Parse error in JavaScript, which may cause the failure of setting the window-status.
Copy link
Owner

@jhonnymertz jhonnymertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xweiba, thanks for the PR, really appreciate the contribution! Added some comments and questions to make sure I understood the issue. Let me know what you think and how can I help you with this.

* wkHTMLtoPDF uses a rather old version of WebKit, which often has trouble with up-to-date JavaScript code.
* When using window-status and a SyntaxError: Parse error occurs in JavaScript, it can result in the failure of setting the window-status. The wkhtmltopdf process will remain in a waiting state indefinitely. In such cases, a timeout is needed to handle this situation.
* */
private int windowStatusTimeout = 30;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the already existing timeout field for this purpose (right above this field)? Do you think they are different in some sense?

If we can keep only one "timeout" field would be better, and may suggestion would be using the same existing timeout field for this. We can also increase the default timeout field value to 30s, if you think 10s is too low for a default value. Regarding the 30 seconds, is it just a best guess? Could we keep 10s? This is configurable anyway via setTimeout method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very sorry, I had a busy day at work today, and I now have time to respond. 🙏🙏🙏

Regarding windowStatusTimeout, I think I need to explain the scenario in which this issue arises:

When you add the --window-status parameter in the command, wkhtmltopdf considers the current page to be loaded only after you have completed the assignment window.status = "complete" in JavaScript. It typically occurs in situations where the page content is generated asynchronously. Failure to add this parameter can result in incomplete rendering of the page content.

If you have added this parameter in the command but, for some reason, the page has not completed the assignment, wkhtmltopdf will think that the current page has not loaded and will continue to wait. This is the reason why process.waitFor() is blocked.

This issue is often caused by developers unfamiliar with the features of wkhtmltopdf. When new features or dependencies are added, developers may use ES6 or syntax not supported by wkhtmltopdf's WebKit, leading to a SyntaxError: Parse error warning during the parsing of JavaScript. This interruption in normal JavaScript execution prevents the completion of the assignment. Since it is a warning rather than an error, wkhtmltopdf does not exit but continues to wait.

Therefore, I believe it is a timeout caused by the window.status parameter, but it does fall within the range of a runtime timeout.

Regarding the 30s, in our use case, there are often concurrent processing situations, and 10s may not be sufficient. However, it is configurable, so I think there is no need to modify it.

You are right, and I will remove this parameter. I apologize for any confusion it may have caused you. 🙏🙏🙏

My English is not very good; it was generated by OpenAI's translation. Please forgive any inaccuracies.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @xweiba no need to apologize! I'm from Brazil, so my English is not good neither =D
Anyway, I really appreciate your contribution and explanation of the issue. Now I fully understand the issue and I'm confident we can merge it. Thank you for this!

Comment on lines 338 to 348
if (command.contains("window-status")) {
logger.debug("Waiting for window-status waitFor: {}s...", this.windowStatusTimeout);
if (!process.waitFor(this.windowStatusTimeout, TimeUnit.SECONDS)) {
// 关闭进程
process.destroy();
logger.error("Error generating pdf by window-status timeout, command: {}", command);
throw new RuntimeException("window-status timeout");
}
} else {
process.waitFor();
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (command.contains("window-status")) {
logger.debug("Waiting for window-status waitFor: {}s...", this.windowStatusTimeout);
if (!process.waitFor(this.windowStatusTimeout, TimeUnit.SECONDS)) {
// 关闭进程
process.destroy();
logger.error("Error generating pdf by window-status timeout, command: {}", command);
throw new RuntimeException("window-status timeout");
}
} else {
process.waitFor();
}
if (!process.waitFor(this.timeout, TimeUnit.SECONDS)) {
process.destroy();
logger.error("PDF generation failed by defined timeout of {}s, command: {}", timeout, command);
throw new RuntimeException("PDF generation timeout by user. Try to increase the timeout.");
}

Following the idea of using the already existing timeout field, we can use directly process.waitFor(this.timeout, TimeUnit.SECONDS). As we already have the concept of having a timeout, we should use it wherever possible to keep usage expectations consistent.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we will need some unit tests to cover this behavior, could you add it please? We could set a really low timeout time to force the error and check if the exception is thrown.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I've rewritten it and added a unit test to cover this behavior.

Copy link
Owner

@jhonnymertz jhonnymertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xweiba thanks again for submitting the PR and helping with the maintenance of this code. PR looks good to me, merging it now.

Thanks for the detailed description of the issue and solution, I'll use it to improve the documentation. I'll do my best to generate a new version of the library, including your fix, today. Let you know.

@jhonnymertz jhonnymertz merged commit bac1394 into jhonnymertz:master Dec 7, 2023
1 check passed
@jhonnymertz
Copy link
Owner

@xweiba version 1.3.0-RELEASE is now available including your fix, thanks again for contributing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants