This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

fix(tika): adapt to Gotenberg 7 API
This commit adapts to the latest breaking changes from Gotenberg 7.
It also freezes the usage of the Gotenberg server to v7.x. Doing
this prevents further breaking changes from leaking into our code base.

* refs #1250
Tooa committed Aug 27, 2021
1 parent cd43bc1 commit 2dcacae
Showing 6 changed files with 15 additions and 15 deletions.
4 changes: 2 additions & 2 deletions docker/compose/docker-compose.postgres-tika.yml
@@ -75,10 +75,10 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998

gotenberg:
- image: thecodingmachine/gotenberg
+ image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
- DISABLE_GOOGLE_CHROME: 1
+ CHROMIUM_DISABLE_ROUTES: 1

tika:
image: apache/tika
4 changes: 2 additions & 2 deletions docker/compose/docker-compose.sqlite-tika.yml
@@ -64,10 +64,10 @@ services:
PAPERLESS_TIKA_ENDPOINT: http://tika:9998

gotenberg:
- image: thecodingmachine/gotenberg
+ image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
- DISABLE_GOOGLE_CHROME: 1
+ CHROMIUM_DISABLE_ROUTES: 1

tika:
image: apache/tika
6 changes: 3 additions & 3 deletions docs/configuration.rst
@@ -402,7 +402,7 @@ Tika settings
#############

Paperless can make use of `Tika <https://tika.apache.org/>`_ and
- `Gotenberg <https://thecodingmachine.github.io/gotenberg/>`_ for parsing and
+ `Gotenberg <https://gotenberg.dev/>`_ for parsing and
converting "Office" documents (such as ".doc", ".xlsx" and ".odt"). If you
wish to use this, you must provide a Tika server and a Gotenberg server,
configure their endpoints, and enable the feature.
@@ -444,10 +444,10 @@ requires are as follows:
# ...
gotenberg:
- image: thecodingmachine/gotenberg
+ image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
- DISABLE_GOOGLE_CHROME: 1
+ CHROMIUM_DISABLE_ROUTES: 1
tika:
image: apache/tika
12 changes: 6 additions & 6 deletions docs/troubleshooting.rst
@@ -101,22 +101,22 @@ You may experience these errors when using the optional TIKA integration:

.. code::
- requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/convert/office
+ requests.exceptions.HTTPError: 504 Server Error: Gateway Timeout for url: http://gotenberg:3000/forms/libreoffice/convert
- Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 10 seconds.
+ Gotenberg is a server that converts Office documents into PDF documents and has a default timeout of 30 seconds.
When conversion takes longer, Gotenberg raises this error.

- You can increase the timeout by configuring an environment variable for gotenberg (see also `here <https://thecodingmachine.github.io/gotenberg/#environment_variables.default_wait_timeout>`__).
+ You can increase the timeout by configuring an environment variable for Gotenberg (see also `here <https://gotenberg.dev/docs/modules/api#properties>`__).
If using docker-compose, this is achieved by the following configuration change in the ``docker-compose.yml`` file:

.. code:: yaml
gotenberg:
- image: thecodingmachine/gotenberg
+ image: gotenberg/gotenberg:7
restart: unless-stopped
environment:
- DISABLE_GOOGLE_CHROME: 1
- DEFAULT_WAIT_TIMEOUT: 30
+ CHROMIUM_DISABLE_ROUTES: 1
+ API_PROCESS_TIMEOUT: 60
Permission denied errors in the consumption directory
#####################################################
2 changes: 1 addition & 1 deletion scripts/start_services.sh
@@ -1,4 +1,4 @@
docker run -p 5432:5432 -e POSTGRES_PASSWORD=password -v paperless_pgdata:/var/lib/postgresql/data -d postgres:13
docker run -d -p 6379:6379 redis:latest
- docker run -p 3000:3000 -d thecodingmachine/gotenberg
+ docker run -p 3000:3000 -d gotenberg/gotenberg:7
docker run -p 9998:9998 -d apache/tika
2 changes: 1 addition & 1 deletion src/paperless_tika/parsers.py
@@ -67,7 +67,7 @@ def parse(self, document_path, mime_type, file_name=None):
def convert_to_pdf(self, document_path, file_name):
pdf_path = os.path.join(self.tempdir, "convert.pdf")
gotenberg_server = settings.PAPERLESS_TIKA_GOTENBERG_ENDPOINT
- url = gotenberg_server + "/convert/office"
+ url = gotenberg_server + "/forms/libreoffice/convert"

self.log("info", f"Converting {document_path} to PDF as {pdf_path}")
files = {"files": (file_name or os.path.basename(document_path),
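
Note on the route change in `convert_to_pdf`: the sketch below is not part of this commit. It only illustrates, under stated assumptions, how the conversion URL is assembled from the configured endpoint and the route. The helper name `libreoffice_convert_url` and the constants `OLD_ROUTE` and `NEW_ROUTE` are hypothetical; the two route strings and the example endpoint `http://gotenberg:3000` are taken from the diff. Unlike the plain concatenation in the commit, this version also drops a trailing slash from the endpoint.

```python
# Hypothetical helper, not part of the commit: builds the conversion URL
# for a Gotenberg endpoint such as "http://gotenberg:3000".

# Route strings as they appear in the diff (before and after this commit).
OLD_ROUTE = "/convert/office"
NEW_ROUTE = "/forms/libreoffice/convert"


def libreoffice_convert_url(endpoint: str, route: str = NEW_ROUTE) -> str:
    """Join the configured endpoint and the conversion route.

    A trailing slash on the endpoint is removed first, so the result
    never contains a double slash before the route.
    """
    return endpoint.rstrip("/") + route


if __name__ == "__main__":
    # Endpoint value assumed from the troubleshooting example in the diff.
    print(libreoffice_convert_url("http://gotenberg:3000"))
```

Keeping the route in one constant means a future Gotenberg route change would touch a single line, which is the same kind of isolation the commit aims for by pinning the image to `gotenberg/gotenberg:7`.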