
docx conversion on debian does not work | Libreoffice crash #784

Closed
AlexKvrlp opened this issue Jan 24, 2024 · 36 comments
Labels
documentation Documentation update request

Comments

@AlexKvrlp

AlexKvrlp commented Jan 24, 2024

Hi,

Thank you for such a great tool! It works fine in my development environments (Ubuntu on WSL and Ubuntu).

But on our production system I only get an "Internal Server Error".


  _____     __           __
 / ___/__  / /____ ___  / /  ___ _______ _
/ (_ / _ \/ __/ -_) _ \/ _ \/ -_) __/ _ '/
\___/\___/\__/\__/_//_/_.__/\__/_/  \_, /
                                   /___/

A Docker-powered stateless API for PDF files.
Version: 8.0.2
-------------------------------------------------------
[SYSTEM] modules: api chromium libreoffice libreoffice-api libreoffice-pdfengine logging pdfcpu pdfengines pdftk prometheus qpdf webhook
[SYSTEM] pdfengines: libreoffice-pdfengine pdfcpu pdftk qpdf
[SYSTEM] prometheus: collecting metrics
[SYSTEM] libreoffice-api: LibreOffice ready to start
[SYSTEM] api: server listening on port 3000
[SYSTEM] chromium: Chromium ready to start
{"level":"debug","ts":1706091729.0305188,"logger":"api.formslibreofficeconvert","msg":"form fields: map[]","trace":"d7073679-33f7-4943-9731-65a26ed76780"}
{"level":"debug","ts":1706091729.0306876,"logger":"api.formslibreofficeconvert","msg":"form files: map[dummy.docx:/tmp/51178158-7b89-4f40-89d4-7a55af6d0faa/615b0125-c4cd-4770-a33f-dd8677df2123/dummy.docx]","trace":"d7073679-33f7-4943-9731-65a26ed76780"}
{"level":"debug","ts":1706091729.030958,"logger":"api.formslibreofficeconvert","msg":"process lock acquired","trace":"d7073679-33f7-4943-9731-65a26ed76780"}
{"level":"debug","ts":1706091729.031098,"logger":"libreoffice-api.libreoffice","msg":"start process"}
{"level":"debug","ts":1706091729.0316796,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin","msg":"start unix process: /usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/709497b2-2c99-483f-95f0-bddeea39f0bb/10ade9be-20e8-4259-9505-d3ef4fa8a6c5 --accept=socket,host=127.0.0.1,port=42351,tcpNoDelay=1;urp;StarOffice.ComponentContext"}
{"level":"debug","ts":1706091729.1251678,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin.stderr","msg":"terminate called after throwing an instance of 'std::runtime_error'"}
{"level":"debug","ts":1706091729.1252174,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin.stderr","msg":"  what():  osl::Thread::create failed"}
{"level":"debug","ts":1706091729.1448293,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin","msg":"unix process already killed"}
{"level":"debug","ts":1706091729.1448631,"logger":"api.formslibreofficeconvert","msg":"process lock released","trace":"d7073679-33f7-4943-9731-65a26ed76780"}
{"level":"debug","ts":1706091729.1452055,"logger":"api.formslibreofficeconvert","msg":"'/tmp/51178158-7b89-4f40-89d4-7a55af6d0faa/615b0125-c4cd-4770-a33f-dd8677df2123' context's working directory removed","trace":"d7073679-33f7-4943-9731-65a26ed76780"}
{"level":"error","ts":1706091729.1452672,"logger":"api","msg":"convert to PDF: process first start: start process: execute LibreOffice: unix process error: wait for unix process: signal: aborted (core dumped)","trace":"d7073679-33f7-4943-9731-65a26ed76780","remote_ip":"172.17.0.1","host":"localhost:3000","uri":"/forms/libreoffice/convert","method":"POST","path":"/forms/libreoffice/convert","referer":"","user_agent":"GuzzleHttp/7","status":500,"latency":117286886,"latency_human":"117.286886ms","bytes_in":174496,"bytes_out":21}

Our production server runs Debian with the 4.9.0-19-amd64 kernel. The Docker version is 19.03.12.

Following the documentation, I checked the resources, but there are no limits set. I think 32 GB of memory should be enough. ;-)

So I tried to track down the problem. If I start the container with --libreoffice-auto-start enabled, the container does not start.


  _____     __           __
 / ___/__  / /____ ___  / /  ___ _______ _
/ (_ / _ \/ __/ -_) _ \/ _ \/ -_) __/ _ '/
\___/\___/\__/\__/_//_/_.__/\__/_/  \_, /
                                   /___/

A Docker-powered stateless API for PDF files.
Version: 8.0.2
-------------------------------------------------------
[SYSTEM] modules: api chromium libreoffice libreoffice-api libreoffice-pdfengine logging pdfcpu pdfengines pdftk prometheus qpdf webhook
[SYSTEM] chromium: Chromium ready to start
[SYSTEM] pdfengines: libreoffice-pdfengine pdfcpu pdftk qpdf
{"level":"debug","ts":1706091982.3399339,"logger":"libreoffice-api.libreoffice","msg":"start process"}
[SYSTEM] prometheus: collecting metrics
{"level":"debug","ts":1706091982.3410268,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin","msg":"start unix process: /usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/7469d30f-d126-4c27-94ab-e54f9de264be/34df5b55-398d-4cbd-bfa8-705e9c4c5e8f --accept=socket,host=127.0.0.1,port=43647,tcpNoDelay=1;urp;StarOffice.ComponentContext"}
{"level":"debug","ts":1706091982.3984182,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin.stderr","msg":"terminate called after throwing an instance of 'std::runtime_error'"}
{"level":"debug","ts":1706091982.3984692,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin.stderr","msg":"  what():  osl::Thread::create failed"}
[FATAL] starting libreoffice-api: launch supervisor: start process: execute LibreOffice: unix process error: wait for unix process: signal: aborted (core dumped)
{"level":"debug","ts":1706091982.4568202,"logger":"libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin","msg":"unix process already killed"}

There are some tickets that suggest increasing --libreoffice-start-timeout. I increased it to 400 seconds, but still no luck.

Next, I checked the LibreOffice version with docker exec -it gotenberg libreoffice --version and got "ERROR 4 forking process".

To me, it seems that LibreOffice is unable to start, which is why no conversion is possible. But how do I fix it?

At the moment, I'm out of ideas on how to get Gotenberg running.

I would really appreciate any hints. :-)

@gulien
Collaborator

gulien commented Jan 24, 2024

Hello @AlexKvrlp,

Did you provide enough CPU/memory to your Gotenberg instance?

@AlexKvrlp
Author

AlexKvrlp commented Jan 24, 2024

Hi @gulien,

I think so. There are no resource limits on the container.
From docker info:

[...]
 Kernel Version: 4.9.0-19-amd64
 Operating System: Debian GNU/Linux 9 (stretch)
 OSType: linux
 Architecture: x86_64
 CPUs: 6
 Total Memory: 31.33GiB
[...]

Edit:
Some additional information: the problem appears with Gotenberg version 7, too. Other Docker containers have been running on this machine for years without any problems.

@JocoLabs

JocoLabs commented Jan 24, 2024

Edit for clarity: the system works, but fails randomly (many times a day on v8 vs. almost never on v7.4.2).

Hi,

I have been busy trying different combinations of settings while trying to avoid posting an issue... but after seeing this, I'm tossing in a +1.

I had two instances running on 7.4.2 with the default settings. I MIGHT see a single 503 error each day, and most of the time no errors at all. I couldn't use any version past 7.4.2 because I would get many more 503 errors.

Anyway, I just upgraded to v8, thinking that the old issue was probably resolved by now, but no.
Each VM running an instance has 12 GB of RAM and 8 cores (Xeons).
I have increased the startup timeout to 90s and the API timeout to as high as 500s. The 503 error still appears.
I have tried combinations of auto-start on/off, as well as "start after" set to 0 and other values.
Nada. I even increased the instance count to 4 just to see if it was load (despite it being the same load as last week on 7.4.2).

I would like to avoid going back to 7.4.2, but if that's what ends up happening, I'll be OK with it.

If you need some logs, what's the best way to get them?

Thanks!

@AlexKvrlp
Author

AlexKvrlp commented Jan 25, 2024

I don't think our two issues have the same cause. In my case, it runs perfectly on system A but fails completely on system B.
Tested with versions 7 and 8. Works on version 7. Fails on version 7.10.0.

Edit:
corrected which versions fail

@JocoLabs

In that case, sorry to hijack the thread. However, if you would humor me (just in case), maybe try out the version I mentioned?
docker.io/gotenberg/gotenberg:7.4.2

If it still doesn't work, I'll remove my comments to clean out the clutter.

@AlexKvrlp
Author

Thanks a lot @JocoLabs! 7.4.2 works!
I will try to find the first non-working version and report it here.

Again: THANKS

@AlexKvrlp
Author

AlexKvrlp commented Jan 25, 2024

I'm sorry. The information above, that it fails on version 7, was not correct.
The last working version is 7.9.2. It starts failing with version 7.10.0.

@gulien
Collaborator

gulien commented Jan 25, 2024

Hey guys, looks like your issues are somehow related to #763.

TL;DR:

IMO, LibreOffice versions 7.5 and 7.6 are somehow the culprit.

What's weird is that on my end, I have zero issues, locally or on my demo instance. The issue does not seem very common; otherwise a lot of people would complain about it 🤔

Anyway, could you try:

FROM gotenberg/gotenberg:8.0.2

USER root

# Replace LibreOffice from bookworm-backports with the version shipped in bookworm (7.4).
RUN DEBIAN_FRONTEND=noninteractive apt-get remove -y -qq libreoffice &&\
    DEBIAN_FRONTEND=noninteractive apt-get autoremove -y -qq &&\
    apt-get update -qq &&\
    DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends libreoffice &&\
    libreoffice --version

USER gotenberg

docker build -t "gotenberg/gotenberg:libreoffice-bookworm" -f Dockerfile .

This replaces the existing LibreOffice version (from bookworm-backports) with LibreOffice 7.4 (from bookworm).

Could you test if it works better in your case?
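To confirm which LibreOffice release an image actually ships before testing it, the first line printed by libreoffice --version can be checked against the 7.5/7.6 releases suspected above. A minimal Go sketch, assuming the output starts with "LibreOffice <major>.<minor>..." (the exact format may vary by build); `libreOfficeVersion` and `isSuspectVersion` are hypothetical helpers, not part of Gotenberg:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// libreOfficeVersion parses the major and minor numbers from output such as
// "LibreOffice 7.4.7.2 40(Build:2)". Assumed format, see the lead-in.
func libreOfficeVersion(output string) (major, minor int, err error) {
	fields := strings.Fields(output)
	if len(fields) < 2 || fields[0] != "LibreOffice" {
		return 0, 0, fmt.Errorf("unexpected version output: %q", output)
	}
	parts := strings.Split(fields[1], ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version string: %q", fields[1])
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// isSuspectVersion flags 7.5.x and 7.6.x, the releases suspected in this thread.
func isSuspectVersion(major, minor int) bool {
	return major == 7 && (minor == 5 || minor == 6)
}
```

Output that is not a version line at all, such as the "ERROR 4 forking process" message reported above, is rejected with an error rather than parsed.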

@AlexKvrlp
Author

Sorry, but I failed to build the image. Here is the output:

docker build -t "gotenberg/gotenberg:libreoffice-bookworm" -f Dockerfile .
Sending build context to Docker daemon  421.4kB
Step 1/4 : FROM gotenberg/gotenberg:8.0.2
8.0.2: Pulling from gotenberg/gotenberg
Digest: sha256:50ddab5dd5fbd152b55497ee7978efc34f7e00e4326b70140401859155cf1f8f
Status: Downloaded newer image for gotenberg/gotenberg:8.0.2
 ---> 6db2663d606a
Step 2/4 : USER root
 ---> Running in 04a37de0cd2e
Removing intermediate container 04a37de0cd2e
 ---> e869cbb970cc
Step 3/4 : RUN DEBIAN_FRONTEND=noninteractive apt-get remove libreoffice &&    DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends libreoffice &&    libreoffice --version
 ---> Running in 1dee0d24bdee
Reading package lists...
Building dependency tree...
Reading state information...
The following packages were automatically installed and are no longer required:
  coinor-libcbc3 coinor-libcgl1 coinor-libclp1 coinor-libcoinmp1v5
  coinor-libcoinutils3v5 coinor-libosi1v5 fonts-opensymbol iso-codes
  libabsl20220623 libabw-0.1-1 libblas3 libboost-filesystem1.74.0
  libboost-iostreams1.74.0 libboost-locale1.74.0 libboost-thread1.74.0
  libbox2d2 libcap2-bin libcdr-0.1-1 libclucene-contribs1v5 libclucene-core1v5
  libcolamd2 libdw1 libe-book-0.1-1 libelf1 libeot0 libepubgen-0.1-1
  libetonyek-0.1-1 libexttextcat-2.0-0 libexttextcat-data libfreehand-0.1-1
  libgfortran5 libgpgme11 libgpgmepp6 libgstreamer-plugins-base1.0-0
  libgstreamer1.0-0 libharfbuzz-icu0 libhunspell-1.7-0 libhyphen0 libice6
  liblangtag-common liblangtag1 liblapack3 libltdl7 libmhash2 libmspub-0.1-1
  libmwaw-0.3-3 libmythes-1.2-0 libnumbertext-1.0-0 libnumbertext-data
  libodfgen-0.1-1 libopenjp2-7 liborc-0.4-0 libpagemaker-0.0-0 libpoppler126
  libpython3.11 libquadmath0 libqxp-0.0-0 libraptor2-0 librasqal3 librdf0
  libreoffice-base libreoffice-base-core libreoffice-base-drivers
  libreoffice-calc libreoffice-common libreoffice-core libreoffice-draw
  libreoffice-impress libreoffice-math libreoffice-report-builder-bin
  libreoffice-style-colibre libreoffice-uiconfig-base
  libreoffice-uiconfig-calc libreoffice-uiconfig-common
  libreoffice-uiconfig-draw libreoffice-uiconfig-impress
  libreoffice-uiconfig-math libreoffice-uiconfig-writer libreoffice-writer
  librevenge-0.0-0 libsm6 libstaroffice-0.0-0 libsuitesparseconfig5
  libuno-cppu3 libuno-cppuhelpergcc3-3 libuno-purpenvhelpergcc3-3 libuno-sal3
  libuno-salhelpergcc3-3 libunwind8 libvisio-0.1-1 libwpd-0.10-10 libwpg-0.3-3
  libwps-0.4-4 libx11-xcb1 libxmlsec1 libxmlsec1-nss libxslt1.1 libyajl2
  libzmf-0.0-0 libzxing2 lp-solve python3-uno ucf uno-libs-private ure
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  libreoffice
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 68.6 kB disk space will be freed.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c DEBIAN_FRONTEND=noninteractive apt-get remove libreoffice &&    DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends libreoffice &&    libreoffice --version' returned a non-zero code: 1

@gulien
Collaborator

gulien commented Jan 25, 2024

@AlexKvrlp I've updated the Dockerfile, could you try again?

@JocoLabs

Thanks for the Dockerfile, I'll give that a shot. I knew something changed between the versions, I just wasn't sure which part (and I was trying to avoid creating an issue if no one else was affected).

Lastly, reading over other issues, it almost seems like a race condition with the locks, GC, or something else. As I mentioned, even with the API and LibreOffice startup timeouts at 500s, still no go... my guess is that the race condition hits and it's stuck in a deadlock until a timeout catches it (in my experience, the timeout that hits is always the API timeout, never the startup one).

Anyway, thanks for this. I will test when I can, as the only way for me is to put it into production to mirror the environment that causes the issue (don't wait up for me).

@gulien
Collaborator

gulien commented Jan 25, 2024

A race condition would cause a Go panic AFAIK 🤔. Regarding the lock, that's a possibility, but IMO it should happen way more often, unless the conditions are rare.

To clarify, does it happen only on startup? Or on restart? Or on conversion requests?

@gulien
Collaborator

gulien commented Jan 25, 2024

Also, do you have the complete error message?

@gulien
Collaborator

gulien commented Jan 25, 2024

@AlexKvrlp it looks like your error is happening on the LibreOffice first start:

if err != nil && exitCode != 81 {

No idea why, but I wonder if it is related to the LibreOffice version.

@JocoLabs

I just switched the log level to warn and tweaked the timeout on each instance so I can see which one broke. Lastly, of my four instances, two are running the vanilla version, and two are running the version with the older LibreOffice, as per the Dockerfile above.

I'll keep you posted.

@AlexKvrlp
Author

@AlexKvrlp I've updated the Dockerfile, could you try again?

Many thanks for your help!! Sadly the build failed again:

docker build -t "gotenberg/gotenberg:libreoffice-bookworm" -f Dockerfile .
Sending build context to Docker daemon  421.4kB
Step 1/4 : FROM gotenberg/gotenberg:8.0.2
 ---> 6db2663d606a
Step 2/4 : USER root
 ---> Using cache
 ---> e869cbb970cc
Step 3/4 : RUN DEBIAN_FRONTEND=noninteractive apt-get remove -y -qq libreoffice &&    DEBIAN_FRONTEND=noninteractive apt-get autoremove -y -qq &&    apt-get update -qq &&    DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends libreoffice &&    libreoffice --version
 ---> Running in 01636fac857f
(Reading database ... 25469 files and directories currently installed.)
Removing libreoffice (4:7.6.4~rc1-1~bpo12+1) ...
E: Problem executing scripts DPkg::Post-Invoke 'rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true'
E: Sub-process returned an error code
The command '/bin/sh -c DEBIAN_FRONTEND=noninteractive apt-get remove -y -qq libreoffice &&    DEBIAN_FRONTEND=noninteractive apt-get autoremove -y -qq &&    apt-get update -qq &&    DEBIAN_FRONTEND=noninteractive apt-get install -y -qq --no-install-recommends libreoffice &&    libreoffice --version' returned a non-zero code:

@gulien
Collaborator

gulien commented Jan 26, 2024

@AlexKvrlp @JocoLabs I've pushed the image gulnap/gotenberg:libreoffice-bookworm.

@AlexKvrlp
Author

@AlexKvrlp @JocoLabs I've pushed the image gulnap/gotenberg:libreoffice-bookworm.

Thanks, I fired up gulnap/gotenberg:libreoffice-bookworm. Unfortunately no improvement. :-/


Unable to find image 'gulnap/gotenberg:libreoffice-bookworm' locally
libreoffice-bookworm: Pulling from gulnap/gotenberg
2f44b7a888fa: Already exists
ed52e3819bbf: Already exists
1d7bbbbdb055: Already exists
365e0cae6d98: Already exists
b14c03b0063d: Already exists
093c4372632f: Already exists
bbf40e348c50: Already exists
c9f7353269e2: Already exists
02cf4053ea6d: Already exists
4f4fb700ef54: Already exists
1880097bcf21: Pull complete
Digest: sha256:a2ce773ef5f667580186d86916d7547839ef96ed042d67bb0d389b0494c91c20
Status: Downloaded newer image for gulnap/gotenberg:libreoffice-bookworm

  _____     __           __
 / ___/__  / /____ ___  / /  ___ _______ _
/ (_ / _ \/ __/ -_) _ \/ _ \/ -_) __/ _ '/
\___/\___/\__/\__/_//_/_.__/\__/_/  \_, /
                                   /___/

A Docker-powered stateless API for PDF files.
Version: 8.0.2
-------------------------------------------------------
[SYSTEM] modules: api chromium libreoffice libreoffice-api libreoffice-pdfengine logging pdfcpu pdfengines pdftk prometheus qpdf webhook
[SYSTEM] pdfengines: libreoffice-pdfengine pdfcpu pdftk qpdf
[SYSTEM] libreoffice-api: LibreOffice ready to start
[SYSTEM] chromium: Chromium ready to start
[SYSTEM] prometheus: collecting metrics
[SYSTEM] api: server listening on port 3000
2024/01/26 08:49:33.341 ERROR   api     convert to PDF: process first start: start process: execute LibreOffice: unix process error: wait for unix process: signal: aborted (core dumped)       {"trace": "95de9b24-0f7c-4a09-b363-822be141f2a9", "remote_ip": "172.17.0.1", "host": "localhost:3000", "uri": "/forms/libreoffice/convert", "method": "POST", "path": "/forms/libreoffice/convert", "referer": "", "user_agent": "GuzzleHttp/7", "status": 500, "latency": 1061861068, "latency_human": "1.061861068s", "bytes_in": 176309, "bytes_out": 21}

@gulien
Collaborator

gulien commented Jan 26, 2024

Are you using the same version of Docker locally and on your production server?
Anyway, I wonder if LibreOffice, in your case and for whatever reason, does not require a first start.

In my case it is working as expected:


  _____     __           __               
 / ___/__  / /____ ___  / /  ___ _______ _
/ (_ / _ \/ __/ -_) _ \/ _ \/ -_) __/ _ '/
\___/\___/\__/\__/_//_/_.__/\__/_/  \_, / 
                                   /___/

A Docker-powered stateless API for PDF files.
Version: 8.0.2
-------------------------------------------------------
[SYSTEM] modules: api chromium libreoffice libreoffice-api libreoffice-pdfengine logging pdfcpu pdfengines pdftk prometheus qpdf webhook 
2024/01/26 09:16:06.678 DEBUG   libreoffice-api.libreoffice     start process
[SYSTEM] pdfengines: libreoffice-pdfengine pdfcpu pdftk qpdf
2024/01/26 09:16:06.678 DEBUG   libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin start unix process: /usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/b37f8243-f34b-4139-9582-1bae51c8a322/9a7337ef-b43d-4ec8-879a-8398927ed189 --accept=socket,host=127.0.0.1,port=35721,tcpNoDelay=1;urp;StarOffice.ComponentContext
[SYSTEM] prometheus: collecting metrics
[SYSTEM] chromium: Chromium ready to start
2024/01/26 09:16:07.018 DEBUG   libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin unix process already killed
2024/01/26 09:16:07.018 DEBUG   libreoffice-api.libreoffice     got exit code 81, e.g., LibreOffice first start
2024/01/26 09:16:07.019 DEBUG   libreoffice-api.libreoffice.usrliblibreofficeprogramsoffice.bin start unix process: /usr/lib/libreoffice/program/soffice.bin --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/b37f8243-f34b-4139-9582-1bae51c8a322/9a7337ef-b43d-4ec8-879a-8398927ed189 --accept=socket,host=127.0.0.1,port=35721,tcpNoDelay=1;urp;StarOffice.ComponentContext
2024/01/26 09:16:07.019 DEBUG   libreoffice-api.libreoffice     waiting for the LibreOffice socket to be available...
2024/01/26 09:16:07.960 DEBUG   libreoffice-api.libreoffice     LibreOffice socket available
2024/01/26 09:16:07.960 DEBUG   libreoffice-api.libreoffice     process successfully started
[SYSTEM] libreoffice-api: LibreOffice automatically started
[SYSTEM] api: server listening on port 3000
Docker version 24.0.6, build ed223bc

No idea what's happening there.

@AlexKvrlp
Author

OK, this could be the problem. For development I'm using 24.0.6 too; production runs 19.03.12. But does the Docker version matter?

@gulien
Collaborator

gulien commented Jan 26, 2024

🤷‍♂️ I don't know to be honest.

But your local setup and mine use a Docker version that is a few major releases ahead of your production version. I'm not familiar with the Docker versioning scheme, but I guess it might have some impact.

@AlexKvrlp
Author

OK, I'll ask our server providers to update Docker to 24.0.6. I'll let you know if it helped, but it could take a few days until I get a response.

Thanks again

@AlexKvrlp
Author

Are you using the same version of Docker locally and on your production server? Anyway, I wonder if LibreOffice, in your case and for whatever reason, does not require a first start.

@gulien I'm very sorry, but I only now noticed that you didn't run any conversion here. For me, the error only ever occurs when I convert a DOCX file or start the container with the --libreoffice-auto-start argument.

I will report back once the Docker upgrade is done.

@gulien
Collaborator

gulien commented Jan 26, 2024

Yes, but I started Gotenberg with the --libreoffice-auto-start argument.

@AlexKvrlp
Author

Oh yes, sorry, I missed "[SYSTEM] libreoffice-api: LibreOffice automatically started".

@AlexKvrlp
Author

I got a quick response from our server provider. We are using Debian 9, and the highest Docker version available there is 19.03.15.
But the problem still exists with 19.03.15.
I will try to reproduce the error on another system and will report back as soon as I have a result.

@JocoLabs

JocoLabs commented Jan 27, 2024

Hi,

In all instances (using your backport and vanilla), the logged error is
convert to PDF: context deadline exceeded.
The timeout is already 90s... I just increased it to 120s to rule that out (not having enough time; 120s should be more than enough for our small PDF conversion jobs).

I did see something on SO about someone testing routes in Go: if too many requests came in, it would throw that error almost instantly... it seemed related to the context timeout being tied to ALL of them, and not just a single one.

At busy times of day, I could send enough documents to the API that it might be hitting that 90s for everything in the queue (and not 90s per item).

I don't know Go well enough to see if that is the case with your code.

Lastly, I can fall back to 7.4.2; I don't want to take up too much of your time if I'm an edge case.

EDIT
Actually... reviewing the curl logs of the failed calls, the upload content seems too small. I will take a look at the files being sent... maybe this is a wild goose chase and something else is wrong on my end.

@gulien
Collaborator

gulien commented Jan 27, 2024

@JocoLabs are you using the LibreOffice module in « stateless » mode in version 7.4.2 (i.e., a dedicated LibreOffice instance per conversion)?

In the current version, Gotenberg runs only one LibreOffice instance, with a lock mechanism ensuring that one and only one conversion is done at a time.

If a lot of requests come in, some of them may not be able to acquire the lock for a conversion before timing out.

You can check the queue size of LibreOffice with https://gotenberg.dev/docs/routes#metrics-route.

I'd suggest increasing your number of Gotenberg instances to mitigate this issue.

@JocoLabs

JocoLabs commented Jan 27, 2024

@gulien I was unaware of how LibreOffice was running in any version. I just had it running with out-of-the-box settings and saw next to zero issues.

If 7.4.2 was running stateless and 8.x is stateful, is there a flag to make it stateless in 8.x?

I am doing a final trial of four instances with timeouts at 500s. If I see enough failures, I will just roll back.
The math doesn't make sense: going from two instances with out-of-the-box settings and next to no errors to something that requires additional instances and still has daily errors.

Either way, thanks again for the effort with this.

EDIT:
I also updated my code to retry on 503... that may help as well.

@gulien
Copy link
Collaborator

gulien commented Jan 27, 2024

No stateless mode now.

The main difference is that a single conversion is a lot faster. Also, instead of infinitely scaling LibreOffice instances inside a container, with the risk of resource starvation, it is now up to your infrastructure to handle the scaling of Gotenberg instances. My point being: it is now easier to define a strategy that fits your needs, because most Docker orchestrators know how to scale containers up/down (and it's cheap).

@JocoLabs

JocoLabs commented Jan 27, 2024

I understand your reasoning. Thanks for the input. I will just roll back, because each of my servers has 64 cores and 256 GB of memory at my disposal. I can just tune Docker to have no limits, which is much easier than configuring a whole orchestration setup (which will end up using the same amount of resources anyway).

Thanks again, and hopefully AlexKvrlp gets his stuff fixed up.

EDIT:
I will spin up a Prometheus instance so I can see what my load really looks like (a lot of these conversions are small 25k DOCX files).

@JocoLabs

Dumb question (because I have limited knowledge on the topic). Thinking about your auto-scaling comments: if auto-scaling is based on resource usage, and you only run a single LibreOffice instance doing one conversion at a time (managing your own internal work queue), what would actually trigger the orchestrator to start new instances?

Lastly, because I'm getting off topic now: if there is further discussion here (maybe others are silent but having a similar issue), would it be helpful to start a new issue for this topic (scaling)?

@gulien
Collaborator

gulien commented Jan 29, 2024

I mean, in your case, you may just spin up more instances by default. The best option is to check gotenberg_libreoffice_requests_queue_size over the course of a day and find out the best number of instances to handle all your scenarios. The same metric may be used to auto-scale your number of instances, but that may be overkill for your use case.
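To sample that gauge over a day, one option is to scrape the metrics route periodically and pull out the single value. A minimal Go sketch that parses Prometheus text output, assuming the gauge is exposed without labels; `queueSize` is a hypothetical helper, and fetching the metrics from the route documented at the link above is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// queueSize extracts the value of an unlabelled sample such as
// gotenberg_libreoffice_requests_queue_size from Prometheus text output.
func queueSize(metrics, name string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "#") {
			continue // HELP / TYPE comments
		}
		fields := strings.Fields(line)
		// "<name> <value>", optionally followed by a timestamp.
		if len(fields) >= 2 && fields[0] == name {
			return strconv.ParseFloat(fields[1], 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("metric %q not found", name)
}
```

Recording the value every minute or so and looking at the daily peaks gives a rough basis for picking a fixed instance count before considering auto-scaling.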

@AlexKvrlp
Author

I got a quick response from our server provider. We are using Debian 9, and the highest Docker version available there is 19.03.15. But the problem still exists with 19.03.15. I will try to reproduce the error on another system and will report back as soon as I have a result.

I have first results:

  • on a freshly installed Debian 9 (stretch) VM, Gotenberg fails with the same error as in our production environment.
  • on a freshly installed Debian 10 (buster) VM, there is no error, even when the Docker components have the same versions as on the Debian 9 machine.

So this error occurs only on Debian 9 (stretch). I will try to find a workaround and will report it here if there is one. If not, that's OK for me too; we will upgrade the production OS ASAP. Maybe a note in the documentation would be an option.
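As a stopgap, a deployment script could warn about this host setup before starting Gotenberg. A hypothetical Go sketch that reads the standard /etc/os-release KEY=value format and flags Debian 9; the function names are made up, and it only detects the host combination reported in this thread, it does not fix anything.

```go
package main

import (
	"bufio"
	"strings"
)

// parseOSRelease reads KEY=value pairs in the /etc/os-release format,
// stripping optional double quotes around values.
func parseOSRelease(content string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		out[key] = strings.Trim(value, `"`)
	}
	return out
}

// isDebianStretch reports whether the host matches the setup reported to
// fail in this thread (Debian 9 "stretch").
func isDebianStretch(osRelease string) bool {
	kv := parseOSRelease(osRelease)
	return kv["ID"] == "debian" && kv["VERSION_ID"] == "9"
}
```

On the host, the input would be the contents of /etc/os-release, read with os.ReadFile.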

@gulien
Collaborator

gulien commented Jan 29, 2024

Thanks for the details @AlexKvrlp 👍

@gulien gulien added documentation Documentation update request and removed bug Something isn't working pdf-engines libreoffice maybe labels Jan 29, 2024
@gulien
Collaborator

gulien commented Feb 11, 2024

Documentation has been updated 👍

@gulien gulien closed this as completed Feb 11, 2024

3 participants