Check if byte strings are properly encoded in UTF-8 #30424
@cachedout Well, some 3rd-party packagers enter, say, German umlauts or other non-ASCII characters (Kanji?), and those symbols aren't properly converted or weren't properly stored (wrong editor, locale, etc.). This often happens in the "Description" and "Summary" fields of a package. We also found a few such packages in our codebase. This PR helps to hunt them all down, prevent them in the future, and ask the maintainers to fix them. A typical API user, however, will just get the description as is (only without these symbols).
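To illustrate the idea, here is a minimal sketch rather than the actual code in this PR. The function name `check_utf8` and the sample byte strings are hypothetical, and I'm assuming the field value arrives as raw bytes:

```python
def check_utf8(value):
    """Return (text, ok) for a package field value.

    ok is False when the bytes are not valid UTF-8; in that case the
    undecodable bytes are dropped, so the caller still gets a usable string.
    (Hypothetical helper for illustration only.)
    """
    if isinstance(value, str):
        # Already decoded text, nothing to check.
        return value, True
    try:
        return value.decode("utf-8"), True
    except UnicodeDecodeError:
        # Badly encoded field: strip the offending bytes and flag it so
        # the package can be reported to its maintainer.
        return value.decode("utf-8", errors="ignore"), False


# A description stored as proper UTF-8 passes the check unchanged.
good = check_utf8(b"Gr\xc3\xbc\xc3\x9fe")   # ("Grüße", True)

# The same word stored as Latin-1 fails the check and loses the umlauts.
bad = check_utf8(b"Gr\xfc\xdfe")            # ("Gre", False)
```

The `ok` flag is what would let us log or report the offending package, while the returned text is what an API user would see.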
I hope it makes sense.