Cannot open uncompressed PDF file #3677

abdulbadii · 2023-11-05T17:54:13Z

I need to edit an uncompressed PDF file, but Geany suddenly seems to ignore it completely, i.e. it cannot open that uncompressed PDF file.
Does anyone know what is actually going on?

dolik-rce · 2023-11-05T18:12:06Z

My guess would be a problem with encoding. Did you check the status tab in the bottom panel? It should tell you if there is some issue with the file being opened.

abdulbadii · 2023-11-05T18:30:16Z

There's a message:
"File 'some.PDF' does not look like a text file or the file encoding not supported"

Until now I thought Geany was capable of opening files in any code page.

Any viable workaround?

elextr · 2023-11-05T23:54:49Z

Some points:

Geany will not load any file which contains NUL bytes after converting the encoding to UTF-8
There is no guaranteed way to detect encoding of a file
PDFs can contain image data which can contain NUL bytes and is not text that can be converted to UTF-8
The message "File 'some.PDF' does not look like a text file or the file encoding not supported" means that one of the above occurred for every encoding Geany knows.
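To make the points above concrete, here is a minimal Python sketch of the rule as described: try candidate encodings and reject the file if every one either fails to decode or produces text containing NUL. It is not Geany's actual code (Geany is written in C and knows many more encodings); the function name `guess_text_encoding` and the three-entry candidate list are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch of the loading rule described above; NOT Geany's code.
# The candidate list is an assumption, Geany tries many more encodings.
CANDIDATE_ENCODINGS = ["utf-8", "cp1252", "iso-8859-1"]


def guess_text_encoding(data: bytes):
    """Return the first candidate encoding giving NUL-free text, else None."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        if "\x00" in text:
            continue  # decoded text still contains NUL: treated as binary
        return encoding
    # Corresponds to "does not look like a text file or the file encoding
    # not supported".
    return None
```

Since `iso-8859-1` accepts every byte value, in this sketch it is the NUL check alone that makes binary data such as PDF image streams fall through to `None`.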

b4n · 2023-11-06T13:58:02Z

Any viable workaround ?

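Viable I don't know, but I heard of some people working around NUL bytes by replacing them with a placeholder and back after editing. Something like `sed -i 's/\x00/%%NUL%%/g' file && geany -i file && sed -i 's/%%NUL%%/\x00/g' file` (`-i` makes Geany start a new instance instead of handing the file to an already running one, so the second `sed` only runs once you close it), or along those lines. Something like that should work, but I never used it myself.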
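The same placeholder round trip can be sketched in Python, with one safety check the `sed` one-liner lacks: it refuses to run if the substitution would not be reversible, for example because the file already contains the placeholder. The helper names are hypothetical and the placeholder is simply the one from the `sed` example; this assumes the editor saves the file back in the encoding it was opened with.

```python
# Hypothetical helpers mirroring the sed round trip; any byte string that
# does not already occur in the file would work as the placeholder.
PLACEHOLDER = b"%%NUL%%"


def restore_nuls(data: bytes) -> bytes:
    """Turn every placeholder back into a NUL byte (run after editing)."""
    return data.replace(PLACEHOLDER, b"\x00")


def hide_nuls(data: bytes) -> bytes:
    """Replace every NUL byte with the placeholder (run before editing)."""
    hidden = data.replace(b"\x00", PLACEHOLDER)
    # If the placeholder (or part of it next to a NUL) already occurs in the
    # data, restoring would corrupt the file, so refuse instead.
    if restore_nuls(hidden) != data:
        raise ValueError("placeholder clashes with existing data")
    return hidden
```

Usage would be reading the file in binary mode, writing `hide_nuls(content)` back, editing, then applying `restore_nuls` the same way. Even with NULs hidden, the other bytes still have to survive whatever encoding the editor picks.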

elextr · 2023-11-07T00:37:29Z

Still, the image data may not be convertible to UTF-8; it's just a sequence of bytes, not in any encoding.

kugel- · 2023-11-07T06:22:30Z

base64 then, the solution to just about anything?

elextr · 2023-11-07T06:36:02Z

base64 then, the solution to just about anything?

:-)

Well, base 64 would be a good solution if the iconv encoding converters recognised the image data inside the PDF and converted it to base 64, instead of just crapping out when the random bytes in the image do not make a valid encoding. (Image data or any other binary data that can be embedded in a PDF.)

If base 64 or any other NUL-free format is allowed in PDFs, maybe the OP could use one of the PDF converters to convert the file to it before editing it in Geany.
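PDF does define ASCII-only stream filters (`ASCIIHexDecode` and `ASCII85Decode`), so binary streams can in principle be stored without NUL bytes. The sketch below only shows the encoding step for `ASCIIHexDecode`, where each byte becomes two hex digits and `>` marks end-of-data; the function names are hypothetical. A real conversion would also have to add `/ASCIIHexDecode` to the stream's `/Filter` entry (listed first if other filters are present) and update `/Length` and the xref offsets, which this sketch does not do.

```python
def ascii_hex_encode(data: bytes) -> bytes:
    """Encode binary stream data in the form read by ASCIIHexDecode.

    Output is plain ASCII hex digits (wrapped, since ASCIIHexDecode ignores
    whitespace) terminated by '>', so it never contains NUL bytes.
    """
    hex_digits = data.hex().upper().encode("ascii")
    lines = [hex_digits[i:i + 64] for i in range(0, len(hex_digits), 64)]
    return b"\n".join(lines) + b">"


def ascii_hex_decode(encoded: bytes) -> bytes:
    """Inverse of ascii_hex_encode, useful to check the round trip."""
    body = encoded.split(b">", 1)[0]
    # bytes.fromhex skips ASCII whitespace such as the line breaks above.
    return bytes.fromhex(body.decode("ascii"))
```

The price is that such streams roughly double in size, and the hex text is no easier to edit meaningfully than the raw bytes were.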

But probably the best answer is for the OP to use an actual PDF editor, not try to edit PDF content in a text editor.

b4n · 2023-11-07T14:41:05Z

Still the image data may not be convertible to UTF-8, its just a sequence of bytes, not any encoding.

UTF-8 no, but many encodings are actually "a sequence of bytes", which is the reason why choosing the right one when opening is so hard (basically, it's a guessing game if you don't already know the answer). Any stream of bytes is gonna be valid in at least one of the ISO-8859-* encodings; the only reason we don't accept them is that we refuse NUL bytes.
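A quick Python check of this point: Python's `iso-8859-1` codec maps each of the 256 byte values to a code point, so any byte string decodes and converts to UTF-8 without error. A 0x00 byte, however, becomes U+0000 and therefore a NUL byte again in the UTF-8 output, which is exactly what gets such data rejected. The helper name is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Read arbitrary bytes as ISO-8859-1 and re-encode them as UTF-8.

    Never raises: every byte value has an ISO-8859-1 mapping in Python's
    codec. Byte 0x00 maps to U+0000, which UTF-8 encodes as 0x00 again.
    """
    return data.decode("iso-8859-1").encode("utf-8")


all_bytes = bytes(range(256))
converted = latin1_to_utf8(all_bytes)
# 128 ASCII bytes stay 1 byte each, 128 high bytes become 2 bytes each.
print(len(converted), b"\x00" in converted)  # 384 True
```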

elextr · 2023-11-07T14:51:51Z

Ok, ... convertible to UTF-8 with no NULs ...

Picky picky mumble mumble... :-)

b4n · 2023-11-07T14:54:10Z

Yes, any ISO-8859-* stream with no NULs is convertible to UTF-8 with no NULs. Or am I missing something? I don't think any of ISO-8859-* has things Unicode cannot represent :)

elextr · 2023-11-07T14:58:21Z

Yes, but image data is very likely to contain NUL bytes, which those encodings convert to NULs IIRC. My point is that the text in the PDF will be convertible by some encoding just fine, but embedded images run through the same converter will produce garbage and likely NULs. The iconv converters don't know about embedded images, so they can't skip them.

b4n · 2023-11-07T15:05:49Z

Sure, it won't be a convenient experience, and if the non-binary data is not single-byte encoded it's unlikely to be really usable, as even stripping/replacing the NULs will not make the data valid in that multi-byte encoding if it has stricter rules than "any byte goes anywhere" (like UTF-8).
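To illustrate the difference between a strict multi-byte encoding and a single-byte one: even after every NUL is stripped, arbitrary binary bytes are usually invalid UTF-8, while ISO-8859-1 still accepts them. The sample bytes are hypothetical, JPEG-style bytes of the kind a `DCTDecode` image stream would contain, and `decodes_as` is a hypothetical helper.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical JPEG-style sample bytes, as found in embedded image data.
image_chunk = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"
no_nuls = image_chunk.replace(b"\x00", b"")  # strip the NULs entirely
# 0xFF can never appear in UTF-8, but is an ordinary ISO-8859-1 character.
print(decodes_as(no_nuls, "utf-8"), decodes_as(no_nuls, "iso-8859-1"))  # False True
```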

Anyway, any solution for editing binary data is gonna be sub-optimal if it isn't specialized for that type of data. Geany understands binary data that represents text; anything else it doesn't. Even real hex editors are usually a pain if they don't have specific support for the format, but they still make it possible to do some useful things sometimes.

elextr · 2023-11-07T15:16:42Z

So to summarise, for PDF files use a PDF editor, for image files use an image editor, for pure text files use Geany.

eht16 · 2023-12-20T12:51:35Z

I propose to close this; it's highly unlikely Geany will ever become a PDF editor. Or is there any reason for keeping this open?

b4n · 2023-12-20T13:02:04Z

Yeah it's probably fine to close.

FWIW, I have code on top of my encoding PRs for opening binary files, but that's limited to the loading and encoding management, not adapting all the code to work with NULs. Still, search seems to work fairly well @elextr 😉
Anyway, I'm not sure we're gonna merge it, as it's not 100% trivial and doesn't necessarily add a lot of value if most features are half-broken; however, it could help with viewing broken log files or fixing a tiny corruption.
