Cannot open uncompressed PDF file #3677

Open
abdulbadii opened this issue Nov 5, 2023 · 15 comments

Comments

@abdulbadii

abdulbadii commented Nov 5, 2023

I need to edit an uncompressed PDF file, but Geany suddenly seems to ignore it completely, i.e. it cannot open that uncompressed PDF file.
Does anyone know what is actually going on?

@dolik-rce
Contributor

My guess would be a problem with encoding. Did you check the status tab in the bottom panel? It should tell you if there is some issue with the file being opened.

@abdulbadii
Author

There's a message:
"File 'some.PDF' does not look like a text file or the file encoding not supported"

Until now I thought Geany was capable of opening all code pages.

Any viable workaround?

@elextr
Member

elextr commented Nov 5, 2023

Some points:

  1. Geany will not load any file which contains NUL bytes after converting the encoding to UTF-8
  2. There is no guaranteed way to detect encoding of a file
  3. PDFs can contain image data which can contain NUL bytes and is not text that can be converted to UTF-8
  4. The message "File 'some.PDF' does not look like a text file or the file encoding not supported" means one of the above occurred for every encoding Geany knows.
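
The rule described in the points above can be sketched in a few lines of Python. This is a rough illustration only, not Geany's actual loading code: the function name `decode_as_text` and the two-entry candidate list are assumptions made for the example.

```python
from typing import Optional, Sequence, Tuple


def decode_as_text(
    data: bytes,
    encodings: Sequence[str] = ("utf-8", "iso-8859-1"),  # assumed candidate list
) -> Optional[Tuple[str, str]]:
    """Return (text, encoding) for the first candidate that converts
    cleanly and yields no NUL characters, or None if every candidate fails."""
    for enc in encodings:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding, try the next one.
            continue
        if "\x00" in text:
            # Converted, but the result contains NULs, so it is rejected.
            continue
        return text, enc
    return None


print(decode_as_text(b"%PDF-1.4\n"))         # plain ASCII text is accepted
print(decode_as_text(b"\x89PNG\x00\x01"))    # binary data with a NUL is rejected
```

When the function returns `None` for every candidate, that corresponds to the "does not look like a text file" situation in point 4.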

@b4n
Member

b4n commented Nov 6, 2023

> Any viable workaround?

Viable I don't know, but I've heard of people working around NUL bytes by replacing them with a placeholder and converting back after editing. Something like `sed -i 's/\x00/%%NUL%%/g' file && geany file && sed -i 's/%%NUL%%/\x00/g' file` (GNU sed), or along those lines. Something like that should work, but I never used it myself.
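
The same placeholder idea can be sketched in Python, which makes the round trip easier to check. The names `PLACEHOLDER`, `hide_nuls` and `restore_nuls` are hypothetical, chosen for this example; the sketch only deals with NUL bytes and does nothing about bytes that are invalid in the encoding the editor picks.

```python
PLACEHOLDER = b"%%NUL%%"  # same marker as in the sed example


def hide_nuls(data: bytes) -> bytes:
    """Replace every NUL byte with the placeholder before editing."""
    if PLACEHOLDER in data:
        # Restoring would be ambiguous if the marker already occurs in the file.
        raise ValueError("placeholder already present in the data")
    return data.replace(b"\x00", PLACEHOLDER)


def restore_nuls(data: bytes) -> bytes:
    """Turn the placeholder back into NUL bytes after editing."""
    return data.replace(PLACEHOLDER, b"\x00")


original = b"stream\x00\x01\x00endstream"
hidden = hide_nuls(original)
assert b"\x00" not in hidden
assert restore_nuls(hidden) == original
```

The guard in `hide_nuls` is a design choice for the sketch: without it, a file that already contains the marker text would come back with extra NUL bytes.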

@elextr
Member

elextr commented Nov 7, 2023

Still, the image data may not be convertible to UTF-8; it's just a sequence of bytes, not any encoding.

@kugel-
Member

kugel- commented Nov 7, 2023

base64 then, the solution to just about anything?

@elextr
Member

elextr commented Nov 7, 2023

> base64 then, the solution to just about anything?

:-)

Well, base 64 would be a good solution, if iconv encoding converters recognised the image data inside the PDF and converted it to base 64 instead of just crapping out when the random bytes in the image do not make a valid encoding. (Image or any other binary that can be embedded in PDF).

If base 64 or any other no-NULs format is allowed in PDFs maybe the OP could use one of the PDF converters to do so before editing it in Geany.

But probably the best answer is for the OP to use an actual PDF editor, not try to edit PDF content in a text editor.
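
For the base64 idea raised above, a minimal Python sketch shows why it sidesteps the NUL problem: the encoded form is plain ASCII and decodes back to the exact original bytes. The helper names `to_editable_text` and `from_editable_text` are made up for this example, and the sketch encodes a whole byte string; it does not locate or rewrite binary streams inside a PDF.

```python
import base64


def to_editable_text(data: bytes) -> str:
    """Encode arbitrary bytes as base64, which is ASCII-only and NUL-free."""
    return base64.b64encode(data).decode("ascii")


def from_editable_text(text: str) -> bytes:
    """Decode base64 text back to the original bytes."""
    return base64.b64decode(text)


blob = bytes(range(256))            # every possible byte value, including NUL
text = to_editable_text(blob)
assert "\x00" not in text
assert from_editable_text(text) == blob
```

The encoded bytes are opaque in the editor, so this only helps when the binary part is left alone and the edits are made elsewhere.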

@b4n
Member

b4n commented Nov 7, 2023

> Still the image data may not be convertible to UTF-8, its just a sequence of bytes, not any encoding.

UTF-8 no, but many encodings are actually "a sequence of bytes", which is the reason why choosing the right one when opening is so hard (basically, it's a guessing game if you don't already know the answer). Any stream of bytes is gonna be valid in one of the ISO-8859-* encodings; the only reason we don't accept them is that we refuse NUL bytes.

@elextr
Member

elextr commented Nov 7, 2023

Ok, ... convertible to UTF-8 with no NULs ...

Picky picky mumble mumble... :-)

@b4n
Member

b4n commented Nov 7, 2023

Yes, any ISO-8859-* stream with no NULs is convertible to UTF-8 with no NULs. Or am I missing something? I don't think any of ISO-8859-* has things Unicode cannot represent :)
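
This claim is easy to check for ISO-8859-1 with a short Python sketch. It uses Python's built-in codecs rather than iconv, which is an assumption of the example, and `latin1_to_utf8` is a hypothetical helper name.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret bytes as ISO-8859-1 and re-encode them as UTF-8.

    ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same
    value, so decoding never fails, whatever the input is.
    """
    return data.decode("iso-8859-1").encode("utf-8")


no_nul_input = bytes(range(1, 256))       # every byte value except NUL
converted = latin1_to_utf8(no_nul_input)
assert b"\x00" not in converted           # no NULs in, no NULs out

# A NUL byte in the input survives the conversion unchanged.
assert latin1_to_utf8(b"a\x00b") == b"a\x00b"
```

The second assertion is the case discussed for embedded images: the conversion itself succeeds, but the NUL bytes are still there afterwards.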

@elextr
Member

elextr commented Nov 7, 2023

Yes, but image data is very likely to have NUL bytes, which those encodings convert to NULs IIRC. My point is that the text in the PDF will be convertible by some encoding just fine, but embedded images run through the same converter will come out as garbage and likely NULs. The iconv converters don't know about embedded images, so they can't skip them.

@b4n
Member

b4n commented Nov 7, 2023

Sure, it won't be a convenient experience, and if the non-binary data is not single-byte encoded it's unlikely to be really usable: even stripping or replacing the NULs will not make the file convertible from that multi-byte encoding if it has stricter rules than "any byte goes anywhere" (as UTF-8 does).

Anyway, any solution for editing binary data is gonna be sub-optimal if not specialized for that type of data. Geany understands binary data that represents text; anything else it doesn't. Even real hex editors are usually a pain if they don't have specific support for the format -- but still, they sometimes let you do useful things.

@elextr
Member

elextr commented Nov 7, 2023

So to summarise, for PDF files use a PDF editor, for image files use an image editor, for pure text files use Geany.

@eht16
Member

eht16 commented Dec 20, 2023

I propose closing this; it's highly unlikely Geany will ever become a PDF editor. Or is there any reason to keep this open?

@b4n
Member

b4n commented Dec 20, 2023

Yeah it's probably fine to close.

FWIW, I have code on top of my encoding PRs for opening binary files, but that's limited to the loading and encoding management, not adapting all code to work with NULs. Yet, search seems to work fairly well @elextr 😉
Anyway, I'm not sure we're gonna merge it, as it's not 100% trivial and doesn't necessarily add a lot of value if most features are half-broken -- however it could help with viewing broken log files or fixing a tiny corruption.


6 participants