Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-48923] Core should use UTF-8 by default #3231

Closed
wants to merge 2 commits into from

Conversation

2 participants
@ghost
Copy link

commented Jan 13, 2018

See JENKINS-48923.

This started off as a follow-up to PR #3210 (see #3210 (comment) and PR #3224), but it appeared that fixing the one use of the default charset could be problematic, as there are several other uses of the default charset.

So, I have attempted to change the charset that Jenkins uses by default to UTF-8 across core.

This should be considered an experimental, work-in-progress PR.

Proposed changelog entries

  • Major Enhancement: Switched to UTF-8 by default

Submitter checklist

  • JIRA issue is well described
  • Changelog entry appropriate for the audience affected by the change (users or developer, depending on the change). Examples
    * Use the Internal: prefix if the change has no user-visible impact (API, test frameworks, etc.)
  • Appropriate autotests or explanation to why this change has no tests
  • For dependency updates: links to external changelogs and, if possible, full diffs

Desired reviewers

@daniel-beck
@oleg-nenashev

@ghost ghost referenced this pull request Jan 13, 2018

Closed

Assume queueFile is UTF-8-encoded #3224

0 of 4 tasks complete
@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Jan 14, 2018

Although I generally I agree with the change, it may have a serious impact on instances using another encoding, because they may improperly read previously saved files after the migration. It needs to be really well justified and tested if we want to merge it

// It's not possible to retrieve the charset that the writer is using;
// however, for all uses of this constructor, the writer is an instance
// of StringWriter, so it's okay to assume UTF-8.
this(new WriterOutputStream(w), StandardCharsets.UTF_8);

This comment has been minimized.

Copy link
@jglick

jglick Jan 19, 2018

Member

The WriterOutputStream constructor needs to specify the encoding too, I think, or you will get junk.

This comment has been minimized.

Copy link
@ghost

ghost Jan 21, 2018

Author

Thanks! Added.

@@ -1697,15 +1709,14 @@ protected final void execute(@Nonnull RunExecution job) {
long start = System.currentTimeMillis();

try {
Computer computer = Computer.currentComputer();
if (computer != null) {
setCharset(computer.getDefaultCharset());

This comment has been minimized.

Copy link
@jglick

jglick Jan 19, 2018

Member

This part is what I think is really wrong. We should change the charset of all builds to be UTF-8, unconditionally, so that miscellaneous messages can always be safely written to the log file; but make sure that anything which copies external process output (namely, Launcher) transcodes that output on either the master or remote side.

Cf. my corresponding changes for Pipeline in jenkinsci/workflow-job-plugin#27 and jenkinsci/durable-task-plugin#29, with integration test (ShellStepTest.encoding) in jenkinsci/workflow-durable-task-step-plugin#21.

This comment has been minimized.

Copy link
@ghost

ghost Jan 21, 2018

Author

I am sure you are right. Ideally only UTF-8 would be used, though I was hesitant to go through and rip out anything relating to charsets. Right now I have changed the implementations of Computer.getDefaultCharset() to return StandardCharsets.UTF_8, so effectively this is what is being done, but without removing the scaffolding just yet.

* @since TODO
*/
public final void setCharset(Charset charset) {
this.charsetInstance = charset;

This comment has been minimized.

Copy link
@jglick

jglick Feb 9, 2018

Member

Not enough. onLoad must also restore charsetInstance based on the persisted charset, or getCharset will wind up always returning UTF-8 after a restart.

@oleg-nenashev

This comment has been minimized.

Copy link
Member

commented Feb 3, 2019

This PR needs to address comments && there is a serious merge conflict there.
Daniel Trebbien has deleted his GitHub account, so I would not expect actions in this PR. Closing for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.