[JENKINS-48923] Core should use UTF-8 by default #3231

ghost · 2018-01-13T22:48:54Z

This started off as a follow-up to PR #3210 (see #3210 (comment) and PR #3224), but it became apparent that fixing that one use of the default charset could be problematic, because core relies on the default charset in several other places.

So, I have attempted to change the charset that Jenkins uses by default to UTF-8 across core.

This should be considered an experimental, work-in-progress PR.

Proposed changelog entries

Major Enhancement: Switched to UTF-8 by default

Submitter checklist

JIRA issue is well described
Changelog entry appropriate for the audience affected by the change (users or developer, depending on the change). Examples
* Use the Internal: prefix if the change has no user-visible impact (API, test frameworks, etc.)
Appropriate autotests or explanation to why this change has no tests
For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@daniel-beck
@oleg-nenashev

oleg-nenashev · 2018-01-14T21:24:12Z

Although I generally agree with the change, it may have a serious impact on instances that use another encoding, because they may misread previously saved files after the migration. It needs to be really well justified and tested if we want to merge it.
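To make the migration risk concrete, here is a minimal stdlib-only sketch, not taken from the PR. It assumes a file that was saved under ISO-8859-1 and is later read as UTF-8; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LegacyEncodingDemo {
    /** Encodes text as it would have been saved, then decodes it with a possibly different charset. */
    static String saveThenRead(String text, Charset savedAs, Charset readAs) {
        byte[] onDisk = text.getBytes(savedAs);   // bytes as written before the migration
        return new String(onDisk, readAs);        // bytes as interpreted after the migration
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        // Same charset on both sides: the text survives.
        System.out.println(saveThenRead(original, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));
        // Saved as ISO-8859-1, read as UTF-8: the non-ASCII byte is not valid UTF-8
        // and is replaced during decoding.
        System.out.println(saveThenRead(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));
    }
}
```

Pure ASCII content is unaffected in this scenario, so the problem would only show up on instances whose saved files contain non-ASCII characters.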

jglick · 2018-01-19T19:20:11Z

core/src/main/java/hudson/util/StreamTaskListener.java

+        // It's not possible to retrieve the charset that the writer is using;
+        // however, for all uses of this constructor, the writer is an instance
+        // of StringWriter, so it's okay to assume UTF-8.
+        this(new WriterOutputStream(w), StandardCharsets.UTF_8);


The WriterOutputStream constructor needs to specify the encoding too, I think, or you will get junk.

Thanks! Added.
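A small illustration of the "junk" concern, written against the JDK only rather than hudson.util.WriterOutputStream. It assumes the listener side encodes characters as UTF-8 and the byte-to-character bridge decodes them with some other charset; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

final class BridgeCharsetDemo {
    /** Encodes text through a UTF-8 PrintStream, then decodes the bytes with bridgeCharset. */
    static String throughBridge(String text, Charset bridgeCharset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(bytes, true, "UTF-8")) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
        // Stand-in for the bridge: turn the bytes back into characters.
        return new String(bytes.toByteArray(), bridgeCharset);
    }
}
```

With a UTF-8 bridge the text comes back unchanged; with an ISO-8859-1 bridge each multi-byte UTF-8 sequence is split into separate characters, which is why both ends need to name the same charset.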

jglick · 2018-01-19T19:26:08Z

core/src/main/java/hudson/model/Run.java

@@ -1697,15 +1709,14 @@ protected final void execute(@Nonnull RunExecution job) {
            long start = System.currentTimeMillis();

            try {
+                Computer computer = Computer.currentComputer();
+                if (computer != null) {
+                    setCharset(computer.getDefaultCharset());


This part is what I think is really wrong. We should change the charset of all builds to be UTF-8, unconditionally, so that miscellaneous messages can always be safely written to the log file; but make sure that anything which copies external process output (namely, Launcher) transcodes that output on either the master or remote side.

Cf. my corresponding changes for Pipeline in jenkinsci/workflow-job-plugin#27 and jenkinsci/durable-task-plugin#29, with integration test (ShellStepTest.encoding) in jenkinsci/workflow-durable-task-step-plugin#21.

I am sure you are right. Ideally only UTF-8 would be used, though I was hesitant to go through and rip out anything relating to charsets. Right now I have changed the implementations of Computer.getDefaultCharset() to return StandardCharsets.UTF_8, so effectively this is what is being done, but without removing the scaffolding just yet.
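The transcoding suggested in this thread could look roughly like the sketch below. It is not the Launcher code and uses only the JDK: it assumes the process output arrives as bytes in a known source charset and re-encodes it as UTF-8 before it reaches the log. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcoder {
    /** Re-encodes bytes from the given source charset into UTF-8. */
    static byte[] toUtf8(byte[] processOutput, Charset source) {
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(processOutput), source);
             Writer writer = new OutputStreamWriter(log, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);   // decoded chars are re-encoded as UTF-8
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log.toByteArray();
    }
}
```

Decoding with a Reader rather than converting fixed-size byte chunks avoids splitting a multi-byte character across buffer boundaries.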

jglick · 2018-02-09T20:06:03Z

core/src/main/java/hudson/model/Run.java

+     * @since TODO
+     */
+    public final void setCharset(Charset charset) {
+        this.charsetInstance = charset;


Not enough. onLoad must also restore charsetInstance based on the persisted charset, or getCharset will wind up always returning UTF-8 after a restart.
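A simplified sketch of the point about onLoad, not the actual Run class. It assumes the charset name is the persisted field and the Charset object is a transient cache that comes back null after a reload; RunRecord and simulateReload are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RunRecord {
    /** Persisted form: the charset name as it would be stored on disk. */
    private String charset;
    /** Cached instance; not persisted, so it is null right after a reload. */
    private transient Charset charsetInstance;

    void setCharset(Charset c) {
        this.charset = c.name();
        this.charsetInstance = c;
    }

    Charset getCharset() {
        // Falls back to UTF-8 when nothing has restored the cached instance.
        return charsetInstance != null ? charsetInstance : StandardCharsets.UTF_8;
    }

    /** Restores the cached instance from the persisted name. */
    void onLoad() {
        if (charset != null) {
            charsetInstance = Charset.forName(charset);
        }
    }

    /** Mimics a restart: only the persisted field survives. */
    static RunRecord simulateReload(RunRecord saved) {
        RunRecord reloaded = new RunRecord();
        reloaded.charset = saved.charset;
        return reloaded;
    }
}
```

In this sketch a record saved with ISO-8859-1 reports UTF-8 after simulateReload until onLoad runs, which is the behavior the comment warns about.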

oleg-nenashev · 2019-02-03T01:01:31Z

The review comments on this PR still need to be addressed, and there is a serious merge conflict.
Daniel Trebbien has deleted his GitHub account, so I would not expect any further action on this PR. Closing for now.

[JENKINS-48923] Core should use UTF-8 by default

32a6bbf

ghost mentioned this pull request Jan 13, 2018

Assume queueFile is UTF-8-encoded #3224

Closed

4 tasks

oleg-nenashev added the needs-more-reviews Complex change, which would benefit from more eyes label Jan 17, 2018

jglick reviewed Jan 19, 2018

jglick added the work-in-progress The PR is under active development, not ready to the final review label Jan 19, 2018

Pass StandardCharsets.UTF_8 to the WriterOutputStream constructor

e22276b

jglick requested changes Feb 9, 2018

jglick mentioned this pull request Feb 9, 2018

[JEP-206] Always use UTF-8 for the main Pipeline log file jenkinsci/workflow-job-plugin#89

Merged

jglick mentioned this pull request Jul 31, 2018

[JENKINS-52692,JENKINS-38313] - External task logging API #3557

Closed

4 tasks

oleg-nenashev added the stalled The PR is reasonable and might be merged, but it is no longer active. It can be taken over by other label Feb 3, 2019

oleg-nenashev closed this Feb 3, 2019
