Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-48923] Core should use UTF-8 by default #3231

Closed
wants to merge 2 commits into from
Closed

[JENKINS-48923] Core should use UTF-8 by default #3231

wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Jan 13, 2018

See JENKINS-48923.

This started off as a follow-up to PR #3210 (see #3210 (comment) and PR #3224), but it appeared that fixing the one use of the default charset could be problematic, as there are several other uses of the default charset.

So, I have attempted to change the charset that Jenkins uses by default to UTF-8 across core.

This should be considered an experimental, work-in-progress PR.

Proposed changelog entries

  • Major Enhancement: Switched to UTF-8 by default

Submitter checklist

  • JIRA issue is well described
  • Changelog entry appropriate for the audience affected by the change (users or developer, depending on the change). Examples
    * Use the Internal: prefix if the change has no user-visible impact (API, test frameworks, etc.)
  • Appropriate autotests or explanation to why this change has no tests
  • For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@daniel-beck
@oleg-nenashev

@ghost ghost mentioned this pull request Jan 13, 2018
4 tasks
@oleg-nenashev
Copy link
Member

Although I generally I agree with the change, it may have a serious impact on instances using another encoding, because they may improperly read previously saved files after the migration. It needs to be really well justified and tested if we want to merge it

@oleg-nenashev oleg-nenashev added the needs-more-reviews Complex change, which would benefit from more eyes label Jan 17, 2018
// It's not possible to retrieve the charset that the writer is using;
// however, for all uses of this constructor, the writer is an instance
// of StringWriter, so it's okay to assume UTF-8.
this(new WriterOutputStream(w), StandardCharsets.UTF_8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WriterOutputStream constructor needs to specify the encoding too, I think, or you will get junk.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Added.

@@ -1697,15 +1709,14 @@ protected final void execute(@Nonnull RunExecution job) {
long start = System.currentTimeMillis();

try {
Computer computer = Computer.currentComputer();
if (computer != null) {
setCharset(computer.getDefaultCharset());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is what I think is really wrong. We should change the charset of all builds to be UTF-8, unconditionally, so that miscellaneous messages can always be safely written to the log file; but make sure that anything which copies external process output (namely, Launcher) transcodes that output on either the master or remote side.

Cf. my corresponding changes for Pipeline in jenkinsci/workflow-job-plugin#27 and jenkinsci/durable-task-plugin#29, with integration test (ShellStepTest.encoding) in jenkinsci/workflow-durable-task-step-plugin#21.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure you are right. Ideally only UTF-8 would be used, though I was hesitant to go through and rip out anything relating to charsets. Right now I have changed the implementations of Computer.getDefaultCharset() to return StandardCharsets.UTF_8, so effectively this is what is being done, but without removing the scaffolding just yet.

@jglick jglick added the work-in-progress The PR is under active development, not ready to the final review label Jan 19, 2018
* @since TODO
*/
public final void setCharset(Charset charset) {
this.charsetInstance = charset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not enough. onLoad must also restore charsetInstance based on the persisted charset, or getCharset will wind up always returning UTF-8 after a restart.

@oleg-nenashev oleg-nenashev added the stalled The PR is reasonable and might be merged, but it is no longer active. It can be taken over by other label Feb 3, 2019
@oleg-nenashev
Copy link
Member

This PR needs to address comments && there is a serious merge conflict there.
Daniel Trebbien has deleted his GitHub account, so I would not expect actions in this PR. Closing for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-more-reviews Complex change, which would benefit from more eyes stalled The PR is reasonable and might be merged, but it is no longer active. It can be taken over by other work-in-progress The PR is under active development, not ready to the final review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants