This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Add document file name to API #108

Closed
bauerj opened this issue Dec 7, 2020 · 4 comments

@bauerj
Contributor

bauerj commented Dec 7, 2020

I have found one change in the API that I perceive as a regression. The old API used to contain a file_name attribute for documents, which is gone now. I used the file extension to choose which viewer to use (PNG files don't work in Adobe Reader, for example).

This means that currently only PDF documents can be viewed in Paperless App, if Paperless-NG is the backend.

@jonaswinkler
Owner

The endpoints for previews and downloads correctly report filenames and content type as part of the response (Content-Disposition and Content-Type). I thought having the filename in the document object would be redundant. The fetch view of OG paperless does this as well.

I can add these again. However, Paperless generates PDF documents with embedded text from images and stores both the original image and the PDF document in its media folder. The API prefers to serve the PDF over the original, if available. There's an option that forces paperless to serve originals, which I need to document. The response headers always reflect what's being served.
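For clients that want to rely on the response headers mentioned above, a minimal sketch of reading the file name and content type might look like this. It assumes the headers are already available as a plain dict; the helper name `describe_served_file` and the sample values in the usage note are made up for illustration and are not taken from the Paperless code base.

```python
from email.message import Message
from typing import Dict, Optional, Tuple


def describe_served_file(headers: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (file name, content type) from HTTP response headers.

    `describe_served_file` is a hypothetical helper, not part of any
    Paperless API. It reuses the standard library's MIME header parser.
    """
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    # get_filename() reads the `filename` parameter of Content-Disposition
    # and returns None when the header or the parameter is absent.
    file_name = msg.get_filename()
    # get_content_type() returns the lowercased type without parameters.
    content_type = msg.get_content_type()
    return file_name, content_type
```

Called with `{"Content-Disposition": 'attachment; filename="scan.pdf"', "Content-Type": "application/pdf"}`, this yields the name and type of whatever the server actually sent, which may be the archived PDF rather than the original image.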

Some background: I really wanted paperless to be able to add selectable text to image-only documents (both PDF documents without text and pure images), and since I didn't really feel comfortable with overwriting the original documents, I decided to keep both. This solution also allows users to retroactively add text layers to image-only documents, and to keep track of which documents this has already been done for. Also: no issues when the PDF libraries decide to fail. Also: no issues when users want to move away from paperless; the original documents are still there.

If I were to add filenames again, each document would have an original_file_name and, if one is available, an archived_file_name. Should we do that?

I really need to get this documented.

@bauerj
Contributor Author

bauerj commented Dec 7, 2020

I guess I could implement that by parsing the Content-Disposition header, but that would mean I would have to either

  1. Make a request to the download endpoint, even if the file is already downloaded, just to find out the type. Of course this would dramatically increase the time it takes to open a document.
  2. Maintain a mapping of document -> file type locally. I tried not to store any metadata locally to prevent getting it out-of-sync with the server.

I guess solution number 2 is okay but I would prefer your proposal of including original_file_name and archived_file_name as that would be cleaner.
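With the two proposed fields, viewer selection could work from the document object alone, with no extra request. A rough sketch, assuming the document arrives as a dict carrying `original_file_name` and an optional `archived_file_name` (the field names come from the proposal above; the `VIEWERS` table, the viewer names, and the choice to prefer the archived name because the API prefers serving the PDF are all assumptions):

```python
import os
from typing import Optional

# Hypothetical mapping from file extension to a viewer identifier.
VIEWERS = {
    ".pdf": "pdf_viewer",
    ".png": "image_viewer",
    ".jpg": "image_viewer",
    ".jpeg": "image_viewer",
}


def served_file_name(document: dict) -> Optional[str]:
    """Guess which file the API would serve for this document.

    Assumption: the archived PDF is served when it exists, otherwise
    the original file is served.
    """
    return document.get("archived_file_name") or document.get("original_file_name")


def pick_viewer(document: dict) -> str:
    """Choose a viewer from the file extension, case-insensitively."""
    name = served_file_name(document)
    if not name:
        return "unknown"
    extension = os.path.splitext(name)[1].lower()
    return VIEWERS.get(extension, "generic_viewer")
```

If the server is configured to serve originals instead, the preference in `served_file_name` would need to be reversed.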

jonaswinkler pushed a commit that referenced this issue Dec 7, 2020
@jonaswinkler
Owner

Well, that's done. I'm sorry for making this somewhat more complicated; however, these changes were required for the new OCR mechanisms.

@bauerj
Contributor Author

bauerj commented Dec 7, 2020

Thanks! No worries, this seems to be straightforward 😊

@bauerj bauerj closed this as completed Dec 7, 2020
tribut pushed a commit to tribut/paperless-ng that referenced this issue Feb 20, 2022
Missing required editing of `src-ui/src/app/app.module.ts`