Invalid utf-8 in EDID decode Extra descriptor #352

digitaltrails · 2023-11-19T23:04:35Z

The output from ddcutil detect --verbose may sometimes include invalid UTF-8 byte sequences. This tripped up the Python code in vdu_controls:
vdu_controls/pull/49

In this case, it appears that decoding the EDID has yielded invalid UTF-8. I dealt with the above issue inside vdu_controls by filtering/escaping the offending character sequence.

Should ddcutil/libddcutil handle decode errors by removing, replacing, or escaping the offending characters? Or is it a problem for the client side? This is not an urgent issue, but just something to consider when you have some spare time.

The reason I raise this now is that working on ddcutil-dbus-server caused me to reconsider where the issue should be handled. I've discovered that C GDBus and python-dasbus are both capable of just passing the bad UTF-8 along to the client. For the moment, I've added a sanitize_utf8() to ddcutil-dbus-server which replaces invalid UTF-8 with ?. (My sanitize implementation relies on the GLib g_utf8_validate() function, so I guess it's not suitable for non-GLib code.)
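As a rough illustration of the replace-with-? approach described above, here is a minimal Python sketch. It is not the ddcutil-dbus-server implementation (which, per the comment, is built on g_utf8_validate()); the function name is borrowed from the comment, and the sample bytes are made up for the example.

```python
def sanitize_utf8(raw: bytes, replacement: str = "?") -> str:
    """Decode bytes as UTF-8, substituting `replacement` for undecodable bytes.

    Illustrative sketch only; not the GLib-based code mentioned in the thread.
    """
    # errors="replace" turns each undecodable sequence into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Caveat: a U+FFFD that was validly encoded in the input is also replaced.
    return text.replace("\ufffd", replacement)


# Hypothetical descriptor text containing a stray 0x80 byte.
sample = b"ABC\x80DEF"
cleaned = sanitize_utf8(sample)
```

Valid multi-byte UTF-8 passes through untouched; only bytes that cannot be decoded are replaced.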

digitaltrails · 2023-11-20T00:11:26Z

I should add that if EDID text fields are always supposed to decode to ASCII and not UTF-8, perhaps any invalid ASCII characters could just be filtered out or replaced.

rockowitz · 2023-11-20T10:00:12Z

As per the VESA ENHANCED EXTENDED DISPLAY IDENTIFICATION DATA STANDARD (Defines EDID Structure Version 1, Revision 4) Release A, Revision 2 September 25, 2006, the contents of the Alphanumeric Data String Descriptor Definition (tag #FE), which ddcutil refers to as the Extra Descriptor, consist of ASCII characters. The encoding is also documented as code page 437 in the EDID Display Descriptors section of the Wikipedia EDID page.

This descriptor is reported in the EDID summary for purely informational purposes. Usually it is blank. If we are occasionally seeing non-ASCII characters in the string, the simplest solution would be to not report the field.

rockowitz · 2023-11-20T17:57:19Z

Further, slightly more awake comments. What the EDID spec considers as ASCII is defined in its Appendix E - ASCII Reference Tables. Table ASCII-II, which defines code points x80..xFF, indeed appears to be identical to Microsoft Code Page 437, as stated in the Wikipedia article. In that table, x80 is defined as C-Cedilla. The string in question is not UTF-8 encoded: in UTF-8, x80 can appear only as the 2nd, 3rd, or 4th byte of a multi-byte sequence. In ISO 8859-1, x80 is undefined and in a range reserved for control characters.
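The three readings of byte x80 mentioned above can be checked with Python's built-in codecs. This is only a small sketch; the variable names are made up for the example, and Python's cp437 codec stands in for the spec's Table ASCII-II on the assumption that the two match as described.

```python
raw = b"\x80"

# Code page 437: x80 is capital C with cedilla (U+00C7).
cp437_char = raw.decode("cp437")

# ISO 8859-1: Python's latin-1 codec maps x80 to U+0080,
# which sits in the C1 control range rather than being a printable character.
latin1_char = raw.decode("latin-1")

# UTF-8: x80 is a continuation byte, so on its own it fails to decode.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The same character, correctly encoded as UTF-8, takes two bytes.
utf8_bytes = cp437_char.encode("utf-8")
```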

In any event, the x80 is probably just someone's sloppiness or an attempt at a clever encoding that only their software understood. ddcutil could emit a character sequence such as "" for characters in the range x80..xFF, but as I noted, for our purposes the Extra Descriptor is just a curiosity and not worth a lot of effort. On the other hand, if a character in the range x80..xFF appeared in a model name or serial number, that would be problematic.

As it happens, there was a time I swam in the character encoding swamp. I attended several of the early Unicode Technical Committee meetings on behalf of the Research Libraries Group for which I did work at the time. Their cross-library database allowed for cataloging materials using most of the world's scripts (Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, and Korean, IIRC). The notable exception was Devanagari.

digitaltrails · 2023-11-20T23:04:20Z

Because I've dealt with this in vdu_controls, I don't have any issues with what the ddcutil command produces.

For ddcutil-dbus-server I'm currently filtering text fields to remove bad UTF-8, but I will simplify that to filter non-printable ASCII.
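The simpler filter described above, keeping only printable ASCII, might look something like the following Python sketch. The function name and the sample bytes are hypothetical, and this is not code from ddcutil-dbus-server.

```python
def keep_printable_ascii(raw: bytes) -> str:
    """Return only the bytes in the printable ASCII range 0x20..0x7E, as text."""
    return "".join(chr(b) for b in raw if 0x20 <= b <= 0x7E)


# Hypothetical field with a stray 0x80 byte and a trailing line feed.
filtered = keep_printable_ascii(b"ABC\x80DEF\n")
```

Because the check works byte by byte, it needs no UTF-8 validation at all, which fits the observation that these fields are not UTF-8 encoded in the first place.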

> As it happens, there was a time I swam in the character encoding swamp. I attended several of the early Unicode Technical Committee meetings on behalf of the Research Libraries Group for which I did work at the time. Their cross-library database allowed for cataloging materials using most of the world's scripts (Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, and Korean, IIRC). The notable exception was Devanagari.

Down here in NZ, many of my clients had not even progressed beyond uppercase EBCDIC/ASCII.

rockowitz · 2023-11-22T04:07:20Z

I have modified the EDID report output to replace any character having a value greater than 127 with the string "", where HH is the hex value of the character.
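For readers who want to see the general idea, a hex-escaping pass over the bytes could be sketched in Python as below. This is not ddcutil's C code: the function name is made up, and the "[xHH]" placeholder format is an assumption chosen for the example, not necessarily what ddcutil prints.

```python
def escape_high_bytes(raw: bytes, template: str = "[x{:02X}]") -> str:
    """Replace each byte above 127 with a placeholder built from its hex value.

    The default template is a hypothetical format chosen for this example.
    """
    parts = []
    for b in raw:
        if b > 127:
            parts.append(template.format(b))  # e.g. 0x80 -> "[x80]"
        else:
            parts.append(chr(b))  # 7-bit values pass through unchanged
    return "".join(parts)


# Hypothetical descriptor bytes with a stray 0x80.
escaped = escape_high_bytes(b"ABC\x80")
```

Unlike dropping the field or filtering the byte, this keeps the original value visible to whoever reads the report.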

rockowitz added bug Fixed enhancement labels Nov 22, 2023

rockowitz closed this as completed Jan 23, 2024
