Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate encodings and localisation #39

Open
RReverser opened this issue Sep 29, 2022 · 12 comments
Open

Investigate encodings and localisation #39

RReverser opened this issue Sep 29, 2022 · 12 comments

Comments

@RReverser
Copy link
Collaborator

I've been looking a bit into how libgphoto2 deals with encodings, and found that it actually respects system localisation + some undocumented APIs for overriding localisation/encodings. What worries me there is that I'm not certain that non-English strings are guaranteed to be encoded in UTF-8 by libgphoto2, in which case perhaps we don't have a right to return them as String, or perhaps we at least have to customize encoding upon startup to enforce UTF-8 or something... As I said, I only started looking at this and there's a bit of uncertainty here. Now that I added test framework, I think it would be helpful to try and verify those assumptions and check what happens with non-ASCII characters.

Originally posted by @RReverser in #27 (comment)

@maxicarlos08
Copy link
Owner

maxicarlos08 commented Oct 1, 2022

I don't have much C experience (and less with the locales, only heard bad things about it), but isn't there a function like setlocale we can use to just set the locale to C to get consistent results regardless of the operating's system locale?

Sorry if what I'm saying makes no sense at all :/

@maxicarlos08
Copy link
Owner

maxicarlos08 commented Oct 1, 2022

Wait nevermind, libgphoto2 already does that: https://github.com/gphoto/libgphoto2/blob/master/ChangeLog#L2249

@RReverser
Copy link
Collaborator Author

Yeah it does, as well as it has an internal function for overriding the encoding specifically. All I'm saying is that it's worth testing and adding whatever's necessary to make sure that Unicode strings from camera are correctly parsed and converted to UTF-8 that Rust expects regardless of LC_ALL=... environment variable or any other settings (I suspect right now if user sets LC_ALL to a non-UTF8 encoding, and there are non-ASCII strings, we'll get either corruption or runtime panics).

@RReverser
Copy link
Collaborator Author

If you have camera at hand and it allows settings some strings (e.g. Canon allows to set copyright author name via Digital Photo Professional), it would be good to set that to e.g. Chinese characters, overriding LC_ALL to some non-UTF8 encoding and checking how those characters are parsed by gphoto2-rs.

@maxicarlos08
Copy link
Owner

I added an example to test this, unfortunately my camera (a Nikon D3400) does not seem to have any text that I can freely set.

If you have a camera where you can set a text, just adjust the CONFIG_KEY in the example to the widget name.

@RReverser
Copy link
Collaborator Author

I added an example to test this, unfortunately my camera (a Nikon D3400) does not seem to have any text that I can freely set.

It should have the same copyright field (https://www.dummies.com/article/home-auto-hobbies/photography/add-copyright-notice-images-nikon-d5600-245525/), but seems like it can be set only from the camera, which likely means no Unicode chars.

@RReverser
Copy link
Collaborator Author

Actually, one other thing you could do is just create a file with unicode chars via file manager when camera is connected, and then checking how it's reported via CameraFS?

@maxicarlos08
Copy link
Owner

Actually, one other thing you could do is just create a file with unicode chars via file manager when camera is connected, and then checking how it's reported via CameraFS?

My camera sadly only exposes images to libgphoto2 it has taken and match the naming format :(

I suspect right now if user sets LC_ALL to a non-UTF8 encoding, and there are non-ASCII strings, we'll get either corruption or runtime panics

Will we? AFAICT we are using String::from_utf8_lossy wherever we get a string from libgphoto2, so invalid UTF-8 should not cause a pash

@RReverser
Copy link
Collaborator Author

so invalid UTF-8 should not cause a pash

Those I count as corruption :P

My camera sadly only exposes images to libgphoto2 it has taken and match the naming format :(

Hm, are you saying that manually added files via file manager are not visible via CameraFS? Fascinating.

@RReverser
Copy link
Collaborator Author

Anyway, I'll add some unicode chars to the virtual camera to make this easier to verify.

@maxicarlos08
Copy link
Owner

Anyway, I'll add some unicode chars to the virtual camera to make this easier to verify.

WDYM? I haven't found any writable TextWidget on the Virtual camera. You don't seem to be able to create or upload files/directories on the virtual camera (the storage access type is RoDelete)

@RReverser
Copy link
Collaborator Author

I mean in libgphoto2 itself :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants