Remove extr.c? #242

Open
hwhw opened this issue Oct 21, 2014 · 10 comments

Comments

@hwhw
Member

hwhw commented Oct 21, 2014

extr.c is code written by @tigran123 to build a mupdf-based attachment extractor. That was used with kindlevncviewer, but with koreader, we currently do not support such attachments.

My proposal would be to remove the code since now it's a bit distracting.
On the other hand, there's no real need to do so other than keeping our code footprint small.

I would offer to introduce an "attachment API" based on the extr.c code into the mupdf interface, so that, in the long run, we could add extraction (and probably attachment listing) functionality to KoReader, too.

@tigran123
Member

The two pieces of code, extr.c and pdfattach, complement each other and were written to work around Amazon's limitation of rejecting non-PDF/mobi files submitted to their cloud storage, which is accessible directly from a Kindle device. Namely, one would take a DjVu (or mp3 or whatever) and use pdfattach to attach it (multiple files supported) to a PDF file, then send this PDF to Amazon Cloud and access it from the Kindle. Then, using kindlepdfviewer, one would extract the files needed and view them. This is the whole purpose of it.

The pdfattach utility is quite simple (almost trivial), but extr.c is, imho, a nice illustration of how to access PDF objects directly (using mupdf) and autonomously. IMHO, it should be removed after such an "attachment API" is actually implemented in Koreader, not before.

PS. I assume you meant kindlepdfviewer when you wrote kindlevncviewer as this has nothing to do with VNC.

@hwhw
Member Author

hwhw commented Oct 21, 2014

Yes, of course I meant KPV. OK, I agree. We should have the API for accessing attachments - since that is a useful feature in any case, I guess. I'm not sure if we should actively promote the Amazon Cloud storage, errm, feature, but that's a different issue.

@tigran123
Member

Btw, in case someone looks at the actual source code of extr.c, I should mention that the obvious memory leak in the function save_attachments() is intentional (strdup(3) is called, but no free(3)), because this is a utility designed to be executed and exited, thus destroying its address space on termination. There is no free(3) for strdup(3) because it would slow down the program unnecessarily. But if the function is copied "as is" into a long-lived program like koreader, then the memory leak should be fixed first, obviously. Otherwise, each save-attachment operation would leak a tiny bit of memory.
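
For illustration, here is a minimal sketch of the strdup(3)/free(3) pairing that a long-lived program would need. It is not the code from extr.c: the function name sanitized_name(), its parameters, and the idea of replacing '/' in the name are all assumptions made up for this example. It only shows where the free(3) would go.

```c
#define _POSIX_C_SOURCE 200809L  /* makes strdup(3) visible under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT taken from extr.c: copies an attachment name,
 * replaces path separators so the result is usable as a plain file name,
 * writes it into out, and frees the temporary copy before returning.
 * Returns 0 on success, -1 on allocation failure or if out is too small. */
static int sanitized_name(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);

    char *copy = strdup(name);   /* heap allocation, like the one discussed above */
    if (copy == NULL)
        return -1;

    for (char *p = copy; *p != '\0'; p++)
        if (*p == '/')
            *p = '_';

    int n = snprintf(out, outlen, "%s", copy);

    free(copy);                  /* the call a run-and-exit tool can get away without */

    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

The copy is released on every return path after a successful allocation, so calling this repeatedly (say, once per saved attachment) does not grow the heap.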

@chrox
Member

chrox commented Oct 21, 2014

I'm not sure if extr.c can still be compiled with Mupdf 1.5. It hasn't been compiled since Mupdf 1.4.

@tigran123
Member

It uses the standard pdf_load_page()/pdf_open_stream()/pdf_dict_gets() interface that is unlikely to change in a million years, let alone in a minor revision upgrade from 1.4 to 1.5. Having said that, I haven't checked whether it still compiles or not.

Now, if you really must remove these two utilities, please go ahead and do it. I have created a separate repository here:

https://github.com/tigran123/pdf-attach-extract

So if you need them in the future as a reference when writing an attachment display/extraction API in koreader, you can always refer to the above repository.

@benoit-pierre
Contributor

benoit-pierre commented May 29, 2024

It does not (compile): the API has changed.

@Frenzie
Member

Frenzie commented May 29, 2024

It's probably not too hard to update; the API mostly just added an extra pointer or two here and there. (Except for the highlights; iirc that changed quite significantly but that's not relevant here.) I wouldn't necessarily rush to delete it, but it's worth noting that it's inspired by what used to be called mupdfshow, now pdfshow https://github.com/ArtifexSoftware/mupdf/blob/6d4ff647eaaa70b35813f31fb5204ea7b668b9e9/source/tools/pdfshow.c

@tigran123
Member

Wow, I expected the API not to change in a million years, but it did in just 12 :)
But then again, when I left the research on neural networks in the early 1990s and switched to Linux kernel development I honestly did not expect that 30 years later I would be chatting to a very intelligent LLM, nor that I would write a chat system myself: http://sigmaai.zapto.org :)

@Frenzie
Member

Frenzie commented May 29, 2024

I did some archeology:
koreader/kindlepdfviewer#487
koreader/kindlepdfviewer#488

So there used to be functionality to press Alt+S to save all attachments on the current page into the directory.

@tigran123
Member

Besides, removing some code I wrote from an opensource project may affect my free access to GitHub Copilot, which I have discovered only the day before yesterday and now am using all the time when working on my Sigma AI project :)
