Invalid utf-8 in EDID decode Extra descriptor #352
I should add that if EDID text fields are always supposed to decode to ASCII and not UTF-8, perhaps any invalid ASCII characters could simply be filtered out or replaced.
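A minimal sketch of the "filter out or replace" option, in Python. `filter_ascii` is a hypothetical helper and is not part of ddcutil or vdu_controls. Treating printable ASCII 0x20..0x7E as the allowed set is an assumption.

```python
from typing import Optional


def filter_ascii(raw: bytes, replacement: Optional[str] = "?") -> str:
    """Keep printable ASCII (0x20..0x7E) and drop or replace every other byte.

    replacement=None drops offending bytes. Any other string is substituted
    for each one. This assumes the EDID LF terminator and padding were already
    stripped, because 0x0A would otherwise count as an offending byte.
    """
    out = []
    for b in raw:  # iterating over bytes yields ints
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))
        elif replacement is not None:
            out.append(replacement)
    return "".join(out)


print(filter_ascii(b"AB\x80C"))        # AB?C
print(filter_ascii(b"AB\x80C", None))  # ABC
```

Dropping keeps the output clean. Replacing at least signals that something was there.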
As per the VESA ENHANCED EXTENDED DISPLAY IDENTIFICATION DATA STANDARD (Defines EDID Structure Version 1, Revision 4), Release A, Revision 2, September 25, 2006, the contents of the Alphanumeric Data String Descriptor (tag #FE), which ddcutil refers to as the Extra Descriptor, consist of ASCII characters. This is also documented as code page 437 in the EDID Display Descriptors section of the Wikipedia EDID page. This descriptor is reported in the EDID summary for purely informational purposes, and it is usually blank. If we are occasionally seeing non-ASCII characters in the string, the simplest solution would be not to report the field at all.
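The sketch below shows, in rough Python, how the tag #FE text could be pulled out of the 128-byte EDID base block and decoded as code page 437. The layout details are my reading of the EDID 1.4 display-descriptor format:

* descriptor offsets 54, 72, 90 and 108
* a `00 00 00 tag 00` header
* a 13-byte payload ending in LF

`alphanumeric_string` is hypothetical and is not how ddcutil's C parser is written.

```python
from typing import Optional

DESCRIPTOR_OFFSETS = (54, 72, 90, 108)  # four 18-byte descriptors in the base block
TAG_ALPHANUMERIC = 0xFE                 # Alphanumeric Data String ("Extra Descriptor")


def alphanumeric_string(edid: bytes) -> Optional[str]:
    """Return the text of the first tag 0xFE descriptor, decoded as code page 437."""
    if len(edid) < 128:
        raise ValueError("need at least the 128-byte EDID base block")
    for off in DESCRIPTOR_OFFSETS:
        desc = edid[off:off + 18]
        # Display descriptors start with 00 00 00, followed by the tag byte.
        if desc[0:3] == b"\x00\x00\x00" and desc[3] == TAG_ALPHANUMERIC:
            data = desc[5:18]                 # 13 bytes of payload
            data = data.split(b"\x0a", 1)[0]  # text ends at LF; the rest is padding
            return data.decode("cp437").rstrip(" ")
    return None
```

With a base block whose first descriptor is `00 00 00 FE 00` followed by `41 42 80 43 44 0A` and space padding, this returns `ABÇCD`.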
Some further, slightly more awake, comments. What the EDID spec considers to be ASCII is defined in its Appendix E, ASCII Reference Tables. Table ASCII-II, which defines code points x80..xFF, does appear to be identical to Microsoft Code Page 437, as stated on the Wikipedia page. In that table, x80 is defined as C-Cedilla. The string in question is not UTF-8 encoded. In UTF-8, x80 could only appear as the 2nd, 3rd or 4th byte of a multi-byte character. In ISO 8859-1, x80 is undefined and lies in a range reserved for control characters. In any event, the x80 is probably just someone's sloppiness, or an attempt at a clever encoding that only their software understood. ddcutil could emit a character sequence such as "" for characters in the range x80..xFF. However, as I noted, for our purposes the Extra Descriptor is just a curiosity and not worth a lot of effort. On the other hand, if a character in the range x80..xFF appeared in a model name or serial number, that would be problematic.

As it happens, there was a time I swam in the character encoding swamp. I attended several of the early Unicode Technical Committee meetings on behalf of the Research Libraries Group, for which I did work at the time. Their cross-library database allowed for cataloging materials using most of the world's scripts (Latin, Cyrillic, Hebrew, Arabic, Chinese, Japanese, and Korean, IIRC). The notable exception was Devanagari.
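The following Python lines make the x80 argument concrete:

* A UTF-8 continuation byte has the bit pattern 10xxxxxx, so a UTF-8 decoder rejects a lone x80.
* Code page 437 maps x80 to Ç.
* A genuine UTF-8 Ç is the two-byte sequence C3 87.

The helper names are hypothetical.

```python
def is_utf8_continuation(b: int) -> bool:
    # Continuation bytes match 10xxxxxx, i.e. 0x80..0xBF.
    return (b & 0xC0) == 0x80


def utf8_valid(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = b"\x80"
print(is_utf8_continuation(raw[0]))     # True
print(utf8_valid(raw))                  # False: a continuation byte cannot start a character
print(raw.decode("cp437"))              # Ç (code page 437, as in EDID Appendix E)
print("Ç".encode("utf-8"))              # b'\xc3\x87'
print(hex(ord(raw.decode("latin-1"))))  # 0x80, which Python maps to the C1 control U+0080
```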
Because I've dealt with this in For
Down here in NZ, many of my clients had not even progressed beyond uppercase EBCDIC/ASCII.
I have modified the EDID report output so that any character whose value is greater than 127 is replaced by the string "", where HH is the hex value of the character.
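A minimal Python sketch of this kind of hex escaping follows. The exact escape string ddcutil emits is not reproduced here. The default format `\xHH` and the helper `escape_high_bytes` are hypothetical stand-ins.

```python
def escape_high_bytes(raw: bytes, fmt: str = "\\x{:02X}") -> str:
    """Pass bytes 0x00..0x7F through as ASCII and replace each byte > 127
    with fmt.format(byte). With the default format, 0x80 becomes "\\x80".

    The default escape format is a hypothetical stand-in, not the string
    ddcutil actually emits.
    """
    return "".join(chr(b) if b <= 0x7F else fmt.format(b) for b in raw)


print(escape_high_bytes(b"AB\x80C"))              # AB\x80C
print(escape_high_bytes(b"AB\x80C", "<{:02x}>"))  # AB<80>C
```

Escaping rather than dropping keeps the original byte value visible in the report. This is useful when diagnosing an odd EDID.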
The output from `ddcutil detect --verbose` may sometimes include invalid UTF-8 characters. This tripped up the Python code in vdu_controls: vdu_controls/pull/49
In this case, it appears that decoding the EDID yielded invalid UTF-8. I dealt with the above issue inside vdu_controls by filtering/escaping the offending character sequence.
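The following is a minimal sketch of handling this on the client side, in Python. It is not the code from vdu_controls/pull/49. `decode_lenient` and `ddcutil_detect_verbose` are hypothetical helpers. The sketch relies on Python's standard `backslashreplace` error handler, so a stray byte such as 0x80 shows up as `\x80` instead of raising UnicodeDecodeError.

```python
import subprocess


def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, turning each undecodable byte into a \\xHH escape."""
    return raw.decode("utf-8", errors="backslashreplace")


def ddcutil_detect_verbose() -> str:
    # Capture raw bytes (no text=True) so that decoding happens here,
    # under our control, rather than failing inside subprocess.
    raw = subprocess.run(["ddcutil", "detect", "--verbose"],
                         capture_output=True, check=True).stdout
    return decode_lenient(raw)


print(decode_lenient(b"Extra descriptor: AB\x80C"))  # Extra descriptor: AB\x80C
```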
Should ddcutil/libddcutil handle decode errors by removing, replacing, or escaping the offending characters? Or is it a problem for the client side? This is not an urgent issue, but just something to consider when you have some spare time.
The reason I raise this now is that working on ddcutil-dbus-server caused me to reconsider where the issue should be handled. I've discovered that C-g-dbus and python-dasbus are both capable of simply passing the bad UTF-8 along to the client. For the moment, I've added a sanitize_utf8() function to ddcutil-dbus-server, which replaces invalid UTF-8 with `?`. (My sanitize implementation relies on the GLib g_utf8_validate() function, so I guess it's not suitable for non-GLib code.)
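The sanitize_utf8() in ddcutil-dbus-server is C code built on GLib's g_utf8_validate(), and that code is not reproduced here. For non-GLib clients, a custom codec error handler can give roughly the same "replace invalid UTF-8 with ?" behavior in Python. Both the handler name `question_mark` and the function `replace_invalid_utf8` are hypothetical. The comment above does not say whether the C version emits one `?` per byte or one per invalid sequence. This sketch emits one `?` per span reported by Python's decoder.

```python
import codecs


def _replace_with_question_mark(err: UnicodeError) -> tuple:
    # The codec calls this for each undecodable span. Substitute one '?'
    # and resume decoding just past the offending bytes.
    if isinstance(err, UnicodeDecodeError):
        return ("?", err.end)
    raise err


codecs.register_error("question_mark", _replace_with_question_mark)


def replace_invalid_utf8(raw: bytes) -> str:
    """Decode raw as UTF-8, keeping valid sequences and replacing invalid ones with '?'."""
    return raw.decode("utf-8", errors="question_mark")


print(replace_invalid_utf8(b"AB\x80C"))           # AB?C
print(replace_invalid_utf8("Ç".encode("utf-8")))  # Ç
```

Unlike the ASCII-only filtering discussed earlier, this keeps valid multi-byte characters intact and only touches byte sequences that are not valid UTF-8.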