Add document file name to API #108

bauerj · 2020-12-07T17:27:25Z

I have found one change in the API that I perceive as a regression. The old API used to contain a file_name attribute for documents. This is gone now. I used the file extension to choose which viewer to use (PNG files don't work in Adobe Reader, for example).

This means that currently only PDF documents can be viewed in Paperless App, if Paperless-NG is the backend.

jonaswinkler · 2020-12-07T18:50:24Z

The endpoints for previews and downloads correctly report filenames and content type as part of the response (Content-Disposition and Content-Type). I thought having the filename in the document object would be redundant. The fetch view of OG paperless does this as well.
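As an illustration of reading those two headers on the client side, here is a minimal sketch using only the Python standard library. The helper name `file_info_from_headers` and the sample header values are assumptions made up for this example; they are not taken from Paperless or the Paperless App.

```python
from email.message import Message


def file_info_from_headers(headers):
    """Return (filename, content_type) from a dict of HTTP response headers.

    Hypothetical helper: it reuses the stdlib MIME header parser, which
    understands the `filename` parameter of Content-Disposition.
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_filename() reads the filename parameter of Content-Disposition;
    # get_content_type() returns the lowercased media type.
    return msg.get_filename(), msg.get_content_type()


# Sample headers (assumed values, for illustration only).
sample = {
    "Content-Disposition": 'inline; filename="scan.png"',
    "Content-Type": "image/png",
}
filename, content_type = file_info_from_headers(sample)
```

A client could then choose a viewer from either the file extension or the content type, at the cost of needing a response from the download or preview endpoint first.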

I can add these again. However, Paperless generates PDF documents with embedded text from images and stores both the original image and the PDF document in its media folder. The API prefers to serve the PDF over the original, if available. There's an option that forces Paperless to serve originals, which I need to document. The response headers always reflect what's being served.

Some background: I really wanted Paperless to be able to add selectable text to image-only documents (both PDF documents without text and pure images), and since I didn't really feel comfortable with overwriting the original documents, I've decided to keep both. This solution also allows users to retroactively add text layers to image-only documents, and keep track of which documents this has already been done for. Also: no issues when the PDF libraries decide to fail. Also: no issues when users want to move away from Paperless, since the original documents are still there.

If I were to add filenames again, each document would have both an original_file_name, and maybe an archived_file_name, if one is available. Should we do that?

I really need to get this documented.

bauerj · 2020-12-07T19:26:37Z

I guess I could implement that by parsing the Content-Disposition header, but that would mean I would have to either:

1. Make a request to the download endpoint, even if the file is already downloaded, just to find out the type. Of course, this would dramatically increase the time it takes to open a document.
2. Maintain a mapping of document -> file type locally. I have tried not to store any metadata locally to prevent it from getting out of sync with the server.

I guess solution number 2 is okay but I would prefer your proposal of including original_file_name and archived_file_name as that would be cleaner.
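To make the proposal concrete, here is a rough sketch of how a client might pick a viewer from the two proposed fields without an extra request. Only the field names `original_file_name` and `archived_file_name` come from this thread; the `VIEWERS` table, the `pick_viewer` function, the `serve_original` flag, and the assumption that `archived_file_name` is missing or null when no archive version exists are all hypothetical.

```python
import os

# Hypothetical mapping from file extension to a viewer identifier.
VIEWERS = {
    ".pdf": "pdf_viewer",
    ".png": "image_viewer",
    ".jpg": "image_viewer",
    ".jpeg": "image_viewer",
}


def pick_viewer(document, serve_original=False):
    """Choose a viewer for a document dict as returned by the API.

    Assumption: the server serves the archived PDF when one exists,
    unless originals are explicitly requested.
    """
    archived = document.get("archived_file_name")
    if archived and not serve_original:
        name = archived
    else:
        name = document["original_file_name"]
    extension = os.path.splitext(name)[1].lower()
    # Fall back to handing the file to an external app for unknown types.
    return VIEWERS.get(extension, "external_app")


doc = {"original_file_name": "scan.PNG", "archived_file_name": "scan.pdf"}
viewer = pick_viewer(doc)
```

With this approach no metadata has to be stored locally, since the file names arrive with the document object itself.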

jonaswinkler · 2020-12-07T20:54:16Z

Well, that's done. I'm sorry for making this somewhat more complicated; however, these changes were required for the new OCR mechanisms.

bauerj · 2020-12-07T20:57:03Z

Thanks! No worries, this seems to be straightforward 😊

bauerj mentioned this issue Dec 7, 2020

Paperless-NG compatibility bauerj/paperless_app#18

Closed

jonaswinkler pushed a commit that referenced this issue Dec 7, 2020

added filenames to the API #108

87fa118

bauerj closed this as completed Dec 7, 2020

tribut pushed a commit to tribut/paperless-ng that referenced this issue Feb 20, 2022

Amend instructions to add a new language (jonaswinkler#108)

fe5293b

Missing required editing of `src-ui/src/app/app.module.ts`
