Invalid utf-8 in EDID decode Extra descriptor #352

Closed
digitaltrails opened this issue Nov 19, 2023 · 5 comments

@digitaltrails

The output from ddcutil detect --verbose may sometimes include invalid UTF-8 byte sequences. This tripped up the Python code in vdu_controls:
vdu_controls/pull/49

In this case, it appears that decoding the EDID has yielded invalid UTF-8. I dealt with the above issue inside vdu_controls by filtering/escaping the offending character sequence.

Should ddcutil/libddcutil handle decode errors by removing, replacing, or escaping the offending characters? Or is it a problem for the client side? This is not an urgent issue, but just something to consider when you have some spare time.

The reason I raise this now is that working on ddcutil-dbus-server caused me to reconsider where the issue should be handled. I've discovered that C-g-dbus and python-dasbus are both capable of just passing the bad UTF-8 along to the client. For the moment, I've added a sanitize_utf8() to ddcutil-dbus-server which replaces invalid UTF-8 with ?. (My sanitize implementation relies on the GLib g_utf8_validate() function, so I guess it's not suitable for non-GLib code.)
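For readers on the Python client side, the same idea can be sketched without GLib. This is only an illustration, not the code in ddcutil-dbus-server: the function name is borrowed from the comment above, and the `replacement` parameter is an assumption.

```python
def sanitize_utf8(raw: bytes, replacement: str = "?") -> str:
    """Decode raw bytes as UTF-8, substituting `replacement` for invalid sequences.

    Illustrative sketch only; not the GLib-based implementation mentioned above.
    """
    # errors="replace" turns each undecodable sequence into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Swap U+FFFD for the chosen replacement. Caveat: a U+FFFD that was
    # legitimately encoded in the input is replaced as well.
    return text.replace("\ufffd", replacement)


if __name__ == "__main__":
    print(sanitize_utf8(b"ab\x80cd"))
```

The caveat in the comment matters only if the input can legitimately contain U+FFFD, which seems unlikely for EDID text.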

@digitaltrails
Author

I should add that if EDID text fields are always supposed to decode to ASCII and not UTF-8, perhaps any invalid ASCII characters could just be filtered out or replaced.

@rockowitz
Owner

rockowitz commented Nov 20, 2023

As per the VESA ENHANCED EXTENDED DISPLAY IDENTIFICATION DATA STANDARD (Defines EDID Structure Version 1, Revision 4) Release A, Revision 2 September 25, 2006, the contents of the Alphanumeric Data String Descriptor Definition (tag #FE), which ddcutil refers to as the Extra Descriptor, consist of ASCII characters. The character set is also documented as code page 437 in the EDID Display Descriptors section of the Wikipedia EDID page.
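To make the descriptor concrete, here is a rough sketch of pulling the raw text bytes out of one 18-byte display descriptor. It assumes the usual EDID 1.4 layout (three zero bytes, the tag byte, one reserved byte, then 13 bytes of text terminated by x0A and padded with spaces). The function name is hypothetical and this is not ddcutil's code.

```python
from typing import Optional


def extract_text_descriptor(descriptor: bytes, tag: int = 0xFE) -> Optional[bytes]:
    """Return the raw text bytes of an 18-byte display descriptor, or None.

    Hypothetical helper for illustration; assumes the EDID 1.4 descriptor layout.
    """
    if len(descriptor) != 18:
        raise ValueError("a display descriptor is exactly 18 bytes")
    # Bytes 0-2 are zero for display descriptors; byte 3 holds the tag.
    if descriptor[0:3] != b"\x00\x00\x00" or descriptor[3] != tag:
        return None
    payload = descriptor[5:18]  # 13 bytes of text
    # Text ends at the first line feed (x0A); the remainder is space padding.
    text = payload.split(b"\x0a", 1)[0]
    return text.rstrip(b" ")


if __name__ == "__main__":
    sample = b"\x00\x00\x00\xfe\x00" + b"AB\x80" + b"\x0a" + b" " * 9
    print(extract_text_descriptor(sample))
```

The helper deliberately returns bytes rather than a string, so the decision about how to treat values above x7F stays with the caller.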

This descriptor is reported in the EDID summary for purely informational purposes. Usually it is blank. If we are occasionally seeing non-ASCII characters in the string, the simplest solution would be to simply not report the field.

@rockowitz
Owner

Further, slightly more awake comments. What the EDID spec considers ASCII is defined in its Appendix E - ASCII Reference Tables. Table ASCII-II, which defines code points x80..xFF, indeed appears to be identical to Microsoft Code Page 437, as stated on the Wikipedia page. In that table, x80 is defined as C-Cedilla. The string in question is not UTF-8 encoded: in UTF-8, x80 can only appear as the 2nd, 3rd or 4th byte of a multi-byte character encoding. In ISO 8859-1, x80 is undefined and in a range reserved for control characters.
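The three readings of x80 can be checked with Python's standard codecs. This is a small illustrative sketch; the helper names are made up for the example. Note that Python's "latin-1" codec maps x80 to the C1 control code point U+0080 rather than rejecting it.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_continuation_byte(value: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (value & 0xC0) == 0x80


byte = b"\x80"

# Code page 437 assigns x80 to C-Cedilla.
as_cp437 = byte.decode("cp437")

# Python's latin-1 codec yields U+0080, a control character (category Cc).
as_latin1 = byte.decode("latin-1")

if __name__ == "__main__":
    print(unicodedata.name(as_cp437))
    print(unicodedata.category(as_latin1))
    print(is_valid_utf8(byte), is_continuation_byte(byte[0]))
```

A lone x80 is a continuation byte with no lead byte before it, which is why a strict UTF-8 decoder rejects it.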

In any event, the x80 is probably just someone's sloppiness or an attempt at a clever encoding that only their software understood. ddcutil could emit a character sequence such as "" for characters in the range x80..xFF, but as I noted, for our purposes the Extra Descriptor is just a curiosity and not worth a lot of effort. On the other hand, if a character in the range x80..xFF appeared in a model name or serial number, that would be problematic.
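A possible shape for such an escaping step, sketched in Python rather than ddcutil's C: every byte in the range x80..xFF is replaced by a short marker containing its hex value. The function name and the `[xHH]` marker format are placeholders chosen for this example, since the exact sequence proposed above did not survive in the text.

```python
def escape_high_bytes(raw: bytes, template: str = "[x{:02X}]") -> str:
    """Render raw bytes as text, escaping values in the range x80..xFF.

    Illustrative only: the marker format is an assumption, not ddcutil's output.
    """
    parts = []
    for value in raw:
        if value >= 0x80:
            # Outside 7-bit ASCII: emit a marker carrying the two-digit hex value.
            parts.append(template.format(value))
        else:
            parts.append(chr(value))
    return "".join(parts)


if __name__ == "__main__":
    print(escape_high_bytes(b"AB\x80"))
```

Because the result contains only 7-bit characters, it is always valid UTF-8, so a client that decodes the output strictly cannot be tripped up.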

As it happens, there was a time I swam in the character encoding swamp. I attended several of the early Unicode Technical Committee meetings on behalf of the Research Libraries Group for which I did work at the time. Their cross-library database allowed for cataloging materials using most of the world's scripts (Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, and Korean, IIRC). The notable exception was Devanagari.

@digitaltrails
Author

Because I've dealt with this in vdu_controls, I don't have any issues with what the ddcutil command produces.

For ddcutil-dbus-server I'm currently filtering text fields to remove bad UTF-8, but I will simplify that to filter non-printable ASCII.
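A filter of that kind can be very small. The following is a sketch only, with a hypothetical function name, and it is not the ddcutil-dbus-server code: it keeps bytes in the printable ASCII range x20..x7E and drops everything else.

```python
def keep_printable_ascii(raw: bytes) -> str:
    """Drop every byte outside the printable ASCII range x20..x7E.

    Illustrative sketch; control characters and all bytes >= x80 are removed.
    """
    return "".join(chr(value) for value in raw if 0x20 <= value <= 0x7E)


if __name__ == "__main__":
    print(keep_printable_ascii(b"AB\x80\x0a C"))
```

Dropping bytes is simpler than validating UTF-8, at the cost of silently losing any CP437 characters a manufacturer did intend.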

> As it happens, there was a time I swam in the character encoding swamp. I attended several of the early Unicode Technical Committee meetings on behalf of the Research Libraries Group for which I did work at the time. Their cross-library database allowed for cataloging materials using most of the world's scripts (Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, and Korean, IIRC). The notable exception was Devanagari.

Down here in NZ, many of my clients had not even progressed beyond uppercase EBCDIC/ASCII.

@rockowitz
Owner

I have modified the EDID report output to replace any character having value 127 with the string "", where HH is the hex value of the character.
