
Check if byte strings are properly encoded in UTF-8 #30424

Merged
cachedout merged 5 commits into saltstack:2015.8 from isbm/isbm-zypper-utf-8-errors on Jan 19, 2016

@isbm
Contributor

isbm commented Jan 18, 2016

Sometimes third-party packages contain broken UTF-8 strings. These make the JSON output crash (although output to STDOUT and YAML works). This PR fixes the problem and logs an error identifying the bad package.
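The PR's diff is not shown on this page, so the following is only a minimal sketch of the approach described above, written for Python 3: decode each byte string as UTF-8, and if decoding fails, log an error naming the package and fall back to a decode that drops the invalid bytes. The helper name `to_utf8_text` and its parameters are hypothetical and are not Salt's actual API.

```python
import json
import logging

log = logging.getLogger(__name__)


def to_utf8_text(value, package_name, field):
    """Return `value` as text, logging packages whose bytes are not UTF-8.

    Hypothetical helper for illustration; not Salt's actual API.
    """
    if not isinstance(value, bytes):
        # Already text (or None): nothing to decode.
        return value
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Name the offending package so its maintainer can be asked to fix it.
        log.error(
            "Package %s: %s field is not valid UTF-8 (%s)",
            package_name, field, exc,
        )
        # Drop only the undecodable bytes so callers still get usable text.
        return value.decode("utf-8", errors="ignore")


# b"M\xfcller" is "Müller" saved as Latin-1, so the 0xFC byte is invalid UTF-8.
summary = to_utf8_text(b"M\xfcller", "example-pkg", "summary")
print(json.dumps({"summary": summary}))  # {"summary": "Mller"}
```

Dropping the bad bytes means a JSON consumer still receives the rest of the field; using `errors="replace"` instead would leave a U+FFFD marker where each bad byte was, which can make the damage easier to spot.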

@cachedout
cachedout Jan 19, 2016

Contributor

Thanks for sending this in @isbm

Could you please clarify what you mean by "broken UTF-8"? If those strings are simply not UTF-8 and the underlying package manager supports them, it seems like we should try to support them as well. Thoughts?

@isbm
isbm Jan 19, 2016

Contributor

@cachedout Well, some third-party packagers enter, say, German umlauts or other non-ASCII characters (Kanji?), and those characters are not properly converted or stored (wrong editor, locale, etc.). This often happens in the "Description" and "Summary" fields of a package. We also found a few such packages in our own codebase. This PR helps us hunt them all down, prevent them in the future, and ask the maintainers to fix them. A typical API user, however, will simply get the description as is (just without those characters).

I hope that makes sense.
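To illustrate the kind of audit described above (finding every package whose "Summary" or "Description" is not valid UTF-8), here is a hypothetical Python 3 sketch. It assumes package metadata is available as a dict mapping package names to raw byte fields; that layout and the function name `find_non_utf8_fields` are illustrative assumptions, not how Salt or zypper actually store data.

```python
def find_non_utf8_fields(packages, fields=("summary", "description")):
    """Return sorted (package, field) pairs whose raw bytes are not valid UTF-8.

    `packages` is an assumed layout: {name: {field: bytes, ...}, ...}.
    """
    bad = []
    for name, info in packages.items():
        for field in fields:
            raw = info.get(field)
            if not isinstance(raw, bytes):
                # Missing or already-decoded fields are skipped.
                continue
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                bad.append((name, field))
    return sorted(bad)


# "Grüße" saved by an editor or locale using Latin-1 instead of UTF-8.
latin1_summary = "Grüße".encode("latin-1")  # b'Gr\xfc\xdfe'
packages = {
    "good-pkg": {"summary": "Grüße".encode("utf-8"), "description": b"plain ASCII"},
    "bad-pkg": {"summary": latin1_summary, "description": b"ok"},
}
print(find_non_utf8_fields(packages))  # [('bad-pkg', 'summary')]
```

The same umlaut text is valid when encoded as UTF-8 and invalid when encoded as Latin-1, which is the "wrong editor, locale" case: the characters are legitimate, but the stored bytes are not UTF-8.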

@cachedout
cachedout Jan 19, 2016

Contributor

@isbm Sounds good. Thanks for the clarification.

cachedout added a commit that referenced this pull request Jan 19, 2016

Merge pull request #30424 from isbm/isbm-zypper-utf-8-errors
Check if byte strings are properly encoded in UTF-8

@cachedout cachedout merged commit 05ad3dc into saltstack:2015.8 Jan 19, 2016

2 of 5 checks passed

default Merged build finished.
jenkins/salt-pr-rs-cent7-n Salt PR - RS CentOS 7 #11404 — FAILURE
jenkins/salt-pr-rs-ubuntu14.04-n Salt PR - RS Ubuntu 14 #8891 — FAILURE
jenkins/salt-pr-clone Salt PR - Clone Repository #12821 — SUCCESS
jenkins/salt-pr-lint-n Salt PR - Code Lint #12509 — SUCCESS