-
-
Notifications
You must be signed in to change notification settings - Fork 3k
Fixes #3019 - Better import from LO / OO / soffice #3783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Updating
This change is meant to ease using LibreOffice as converter. When LibreOffice converts a file, it adds some classes to the <title> tag. This is a quick & dirty way of matching the <title> and comment it out independently on the classes that are set on it.
|
Crashes the import process. Etherpas itself survives. No logs. See #3019 (comment). |
|
Ugh, libreoffice get this wrong. If libreoffice isn't fully installed IE you only install libreoffice-common it fails.
is required. What should happen is instead of Returning |
|
When using NodeJS spawn When using CLI Works |
|
I dunno tbh, this fix is fine as far how it's designed, the problem is with nodejs not properly passing the arguments to I tried hardcoding and @muxator can you take a look and sanity check my work on this? If you set |
|
Bump @muxator for help :D |
|
@JohnMcLear I confirm your problem. When invoked directly on the shell this command works flawlessly: The same command wrapped in a // file: test-libreoffice.js
const { spawn } = require('child_process');
//const ls = spawn('ls', ['-lh', '/usr']);
const ls = spawn('libreoffice6.4', [
'--headless',
'--convert-to',
'html:"XHTML Writer File:UTF8"',
'--outdir',
'.',
'--invisible',
'--nologo',
'--nolockcheck',
'--writer',
'/tmp/oneline.odt',
]);
ls.stdout.on('data', (data) => {
console.log(`stdout: ${data}`);
});
ls.stderr.on('data', (data) => {
console.log(`stderr: ${data}`);
});
ls.on('close', (code) => {
console.log(`child process exited with code ${code}`);
});Result: Tested on nodejs 12.13.1. I'll try to profile the LibreOffice execution. |
|
I'm glad you sanity checked this, glad it's not me being an idiot! Thanks! Please let me know if you solve and how, this one pissed me off good :D |
|
This is not nodejs specific. The same program above, converted in python, gives the exact same result: #!/usr/bin/env python3
#
# file: /tmp/test.py
import subprocess
args = [
'libreoffice6.4',
'--headless',
'--convert-to',
'html:"XHTML Writer File:UTF8"',
'--outdir',
'.',
'/tmp/oneline.odt',
'--invisible',
'--nologo',
'--nolockcheck',
'--writer',
]
result = subprocess.run(args, capture_output=True)
print('stdout:', result.stdout)
print('stderr:', result.stderr)When executing: |
|
Aaaand this is the reason for which everyone always needs to have a good diff tool at hand. Here's the difference between the output of the succeeding (directly on the shell) and the failing (js via -convert /tmp/oneline.odt -> /tmp/oneline.html using filter : XHTML Writer File:UTF8
+convert /tmp/oneline.odt -> /tmp/oneline.html using filter : "XHTML Writer File:UTF8"Do you see that in the js (& python) execution there are two double quotes more? Now, in hindsight, it is easy to understand: the double quotes are needed only when executing via the shell, because the shell needs escaping for spaces. # bash requires the double quotes BECAUSE of the spaces, to convey
# that "html:XHTML Writer File:UTF8" is really a single parameter
#
--convert-to "html:XHTML Writer File:UTF8" When we explicitly pass parameters to a subprocess via Leave out the double quotes and it should work. On my toy Javascript & python programs it worked. |
|
woop nice will add to my list of things to do 2mrw. Thanks! |
|
After looking at this snippet I think that, we might be having one more problem. LibreOffice exits with a 0 exit code even when the conversion fails. It did when we were passing a bad parameter. If Etherpad just checks for the process exit code it may mistake a failed conversion for a successful one. We should check for something else in addition.
|
|
|
Lol I just looked and I already WAS leaving out the double quotes btw :D gonna test in isolation now to generate some logic that works/fails. The below works: var spawn = require("child_process").spawn; |
|
Yea, I'm fucking stumped. Sanity check for me..
Yet running this isolated: Works fine... Can you confirm? Am I losing my mind? |
I am going to try it now. |
|
Confirmed the issue. What nodejs version are you running? Is it 12? Edit: the error happens on both node 10 and 12, but on 12 it is harder to spot |
|
I have a suspect. To see what happens, you'll have to run Etherpad under nodejs 10 (on nodejs 12 a deprecated feature was removed and you won't see anything useful in the logs). See #3834 and #3841. The
|
|
I think that the line that needs to be changed is this one: etherpad-lite/src/node/utils/LibreOffice.js Line 105 in e625168
But I really would like to see this become more robust (without encoding information in the temporary file name). Some more monkey patching on that file would be almost immoral. 😭 |
|
How the funk did I not see that?! I'm running v13.10.1 and that might explain why but wow, what a crazy situation tho.. Solution, pass the |
|
Yea, agreed. This whole import/export pass to third party binaries is a mess. Gonna fix this, make it stable then will do a good review for future versions. |
|
In the next commit, we are going to change the conversion method to "html:XHTML Writer File:UTF8". Without this change, that conversion method name would end up in the extension of the temporary file that is created as an intermediate step. In this way, the file extensione will always stay ".html". No functional changes, hopefully. Only the extension of the temporary file should change.
This yields better conversion results, but requires the previous change, otherwise there would have been difficulties in locating the temporary file name.
|
Right, let's get this moving. Please merge this PR as it is now and then I can keep moving forward with changes. I'm going to hit problems with contentcollector.js and my list work really soon! |
1351c22 to
2b83227
Compare
|
Merged with some reordering of the commits. Thanks. |
Discussion here: #3019
I don't like the
returnin contentcollector.js -- I'm +1 a better method.This fixes copy/paste and import too, in theory. Not fully tested.