Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Console log not being rendered properly (Encoding issue) #3442

Closed
jyotisingh opened this issue May 2, 2017 · 18 comments · Fixed by #3486
Closed

Console log not being rendered properly (Encoding issue) #3442

jyotisingh opened this issue May 2, 2017 · 18 comments · Fixed by #3486
Milestone

Comments

@jyotisingh
Copy link
Contributor

Issue Type
  • Bug Report
Summary
Environment

Could reproduce this on build.gocd.io. Not reproducible on my laptop.

Steps to Reproduce
  1. Copy the attached console.log.txt file to the artifact location of any of the completed jobs on the server.
  2. Rename the file to console.log. Amend file permissions so that 'go' user has access to the file(if applicable).
  3. Refresh the job details page, switch to console tab
Expected Results

Expected the tree structure to show up

Actual Results

See some gibberish instead
screen shot 2017-05-02 at 11 58 22 am

console.log.txt

@ketan
Copy link
Member

ketan commented May 2, 2017

Could reproduce this on build.gocd.io. Not reproducible on my laptop.

Very much reproducible — https://build.gocd.io/go/tab/build/detail/gocd-docs.go.cd-master/9/Build/1/build_job

@marques-work
Copy link
Member

I've tried without success to reproduce this locally. I suspect it will never happen on MacOS X as the default is always UTF-8. We must be changing the encoding before sending the response because the file is valid UTF-8 (according to iconv). I'll dig into this some more.

@ketan
Copy link
Member

ketan commented May 9, 2017

I've tried without success to reproduce this locally. I suspect it will never happen on MacOS X as the default is always UTF-8. We must be changing the encoding before sending the response because the file is valid UTF-8 (according to iconv). I'll dig into this some more.

That's werid. I was seeing junk chars instead of the fancy npm tree on the URL I pasted in the comment before. I suspect something changed with chrome. Next time, I'll be sure to grab a screenshot if I come across this.

@marques-work
Copy link
Member

marques-work commented May 10, 2017

@ketan there might be something more to it. I'm on Chrome Version 57.0.2987.133 (64-bit). I see junk characters with the URL, but not when I download the raw output and drop it into my dev instance...

Exactly the same file, same browser, but different platform. I still think there's an underlying platform encoding issue.

@ketan
Copy link
Member

ketan commented May 10, 2017

Kidding me — I see junk on that page now.

screen shot 2017-05-10 at 3 36 00 pm.

If it helps, here are the request/response headers: Clearly mentioning the encoding as utf-8.

screen shot 2017-05-10 at 3 36 54 pm

I'm using chrome beta and the user-agent header contains the version Chrome/59.0.3071.36.

To be doubly sure, I did a "copy as curl" and here's what I see in my terminal —

screen shot 2017-05-10 at 3 41 36 pm

I'm beginning to suspect that the server is doing something funny. I've dumped the output of "copy as curl" in a gist at https://gist.github.com/ketan/4cfaa5e2a797e216ff44854197569d9d.

@ketan
Copy link
Member

ketan commented May 10, 2017 via email

@marques-work
Copy link
Member

@ketan no, I just did a save from the browser. This sounds very Heisenbergian :). I'll try that.

@marques-work
Copy link
Member

@ketan, well, actually I realized that I can't try it - I don't think I have access to the machines.

@marques-work
Copy link
Member

@ketan Also interesting

XHR response, requesting log as an artifact:

$j.ajax({url: "https://build.gocd.io/go/files/gocd-docs.go.cd-master/9/Build/1/build_job/cruise-output/console.log", type: "GET"}).done(console.log)
... truncated ...

&2|20:06:11.087 npm WARN prefer global npm@2.14.1 should be installed with -g
&1|20:06:12.274 gocd-docs@1.0.0 /go/pipelines/gocd-docs.go.cd-master
&1|20:06:12.275 ├─┬ gitbook-cli@0.3.6 
&1|20:06:12.275 │ ├── bash-color@0.0.3 
&1|20:06:12.275 │ ├─┬ commander@2.8.1 
&1|20:06:12.275 │ │ └── graceful-readlink@1.0.1 
&1|20:06:12.276 │ ├─┬ fs-extra@0.24.0 
&1|20:06:12.276 │ │ ├── graceful-fs@4.1.11 
&1|20:06:12.276 │ │ ├── jsonfile@2.4.0 
&1|20:06:12.276 │ │ ├── path-is-absolute@1.0.1 
&1|20:06:12.277 │ │ └─┬ rimraf@2.6.1 
&1|20:06:12.277 │ │   └─┬ glob@7.1.1 
&1|20:06:12.277 │ │     ├── fs.realpath@1.0.0 
&1|20:06:12.278 │ │     ├─┬ inflight@1.0.6 
&1|20:06:12.278 │ │     │ └── wrappy@1.0.2 
&1|20:06:12.278 │ │     ├─┬ minimatch@3.0.3 
&1|20:06:12.278 │ │     │ └─┬ brace-expansion@1.1.7 
&1|20:06:12.279 │ │     │   ├── balanced-match@0.4.2 
&1|20:06:12.279 │ │     │   └── concat-map@0.0.1 
&1|20:06:12.279 │ │     └── once@1.4.0 
&1|20:06:12.279 │ ├── lodash@3.10.1 
&1|20:06:12.280 │ ├─┬ npm@2.14.1 
&1|20:06:12.280 │ │ ├── abbrev@1.0.7 
&1|20:06:12.280 │ │ ├── ansi@0.3.0 
&1|20:06:12.280 │ │ ├── ansicolors@0.3.2 
&1|20:06:12.281 │ │ ├── ansistyles@0.1.3 
&1|20:06:12.281 │ │ ├── archy@1.0.0 
&1|20:06:12.281 │ │ ├── async-some@1.0.2 
&1|20:06:12.281 │ │ ├── block-stream@0.0.8 
&1|20:06:12.282 │ │ ├── char-spinner@1.0.1 
&1|20:06:12.282 │ │ ├── chmodr@1.0.1 
&1|20:06:12.282 │ │ ├── chownr@1.0.1 
&1|20:06:12.282 │ │ ├─┬ cmd-shim@2.0.1 
&1|20:06:12.283 │ │ │ └── graceful-fs@3.0.8 
&1|20:06:12.283 │ │ ├─┬ columnify@1.5.2 
&1|20:06:12.283 │ │ │ ├─┬ strip-ansi@3.0.0 
&1|20:06:12.284 │ │ │ │ └── ansi-regex@2.0.0 

XHR response, using the polling request:

$j.ajax({url: "https://build.gocd.io/go/files/gocd-docs.go.cd-master/9/Build/1/build_job/cruise-output/console.log?startLineNumber=0", type: "GET"}).done(console.log)
... truncated ...

&2|20:06:11.087 npm WARN prefer global npm@2.14.1 should be installed with -g
&1|20:06:12.274 gocd-docs@1.0.0 /go/pipelines/gocd-docs.go.cd-master
&1|20:06:12.275 ��������� gitbook-cli@0.3.6 
&1|20:06:12.275 ��� ��������� bash-color@0.0.3 
&1|20:06:12.275 ��� ��������� commander@2.8.1 
&1|20:06:12.275 ��� ��� ��������� graceful-readlink@1.0.1 
&1|20:06:12.276 ��� ��������� fs-extra@0.24.0 
&1|20:06:12.276 ��� ��� ��������� graceful-fs@4.1.11 
&1|20:06:12.276 ��� ��� ��������� jsonfile@2.4.0 
&1|20:06:12.276 ��� ��� ��������� path-is-absolute@1.0.1 
&1|20:06:12.277 ��� ��� ��������� rimraf@2.6.1 
&1|20:06:12.277 ��� ���   ��������� glob@7.1.1 
&1|20:06:12.277 ��� ���     ��������� fs.realpath@1.0.0 
&1|20:06:12.278 ��� ���     ��������� inflight@1.0.6 
&1|20:06:12.278 ��� ���     ��� ��������� wrappy@1.0.2 
&1|20:06:12.278 ��� ���     ��������� minimatch@3.0.3 
&1|20:06:12.278 ��� ���     ��� ��������� brace-expansion@1.1.7 
&1|20:06:12.279 ��� ���     ���   ��������� balanced-match@0.4.2 
&1|20:06:12.279 ��� ���     ���   ��������� concat-map@0.0.1 
&1|20:06:12.279 ��� ���     ��������� once@1.4.0 
&1|20:06:12.279 ��� ��������� lodash@3.10.1 
&1|20:06:12.280 ��� ��������� npm@2.14.1 
&1|20:06:12.280 ��� ��� ��������� abbrev@1.0.7 
&1|20:06:12.280 ��� ��� ��������� ansi@0.3.0 
&1|20:06:12.280 ��� ��� ��������� ansicolors@0.3.2 
&1|20:06:12.281 ��� ��� ��������� ansistyles@0.1.3 
&1|20:06:12.281 ��� ��� ��������� archy@1.0.0 
&1|20:06:12.281 ��� ��� ��������� async-some@1.0.2 
&1|20:06:12.281 ��� ��� ��������� block-stream@0.0.8 
&1|20:06:12.282 ��� ��� ��������� char-spinner@1.0.1 
&1|20:06:12.282 ��� ��� ��������� chmodr@1.0.1 
&1|20:06:12.282 ��� ��� ��������� chownr@1.0.1 
&1|20:06:12.282 ��� ��� ��������� cmd-shim@2.0.1 
&1|20:06:12.283 ��� ��� ��� ��������� graceful-fs@3.0.8 
&1|20:06:12.283 ��� ��� ��������� columnify@1.5.2 
&1|20:06:12.283 ��� ��� ��� ��������� strip-ansi@3.0.0 
&1|20:06:12.284 ��� ��� ��� ��� ��������� ansi-regex@2.0.0 
&1|20:06:12.284 ��� ��� ��� ��������� wcwidth@1.0.0 
&1|20:06:12.284 ��� ��� ���   ��������� defaults@1.0.2 

Grabbing the log as an artifact doesn't do any string interpretation, whereas hitting the console log polling endpoint does this:

            // ConsoleService.java - getConsoleOut() method, lines 67 - 75
            BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
            String consoleLine;
            while (null != (consoleLine = reader.readLine())) {
                if (lineNumber >= startingLine) {
                    builder.append(consoleLine);
                    builder.append(FileUtil.lineSeparator());
                }
                lineNumber++;
            }

InputStreamReader must be choosing an encoding other than UTF-8, which likely yields the garbled characters. Of course, once it becomes a String in Java, it's UTF-8, which is why we see that in the response headers, after it's turned to shit :).

I would propose changing the line to this:

BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, Charset.forName("UTF-8")));

so that these characters are preserved. I do realize there was some prior discussing about forcing UTF-8, so maybe it's time to revisit?

@marques-work
Copy link
Member

My guess is that InputSreamReader is resolving ISO-8859-1 as the default for the platform somehow.

@marques-work
Copy link
Member

Update: Barrow scp'd the console log to me and everything looks fine there, so it's written to the filesystem properly.

@ketan
Copy link
Member

ketan commented May 11, 2017

so it's written to the filesystem properly.

I believe it was you who changed it in d33ffcd. So the console log on the agent is read in the default charset, and is treated as a binary until it is written to the disk on the server.

@ketan
Copy link
Member

ketan commented May 11, 2017

Reopening for @rajiesh or @ankitsri11 to take a quick stab.

@ketan ketan reopened this May 11, 2017
@ketan
Copy link
Member

ketan commented May 11, 2017

I do realize there was some prior discussing about forcing UTF-8, so maybe it's time to revisit

There are only 2 options here —

  • the opinionated one — use UTF-8 for reading console log stream from STDOUT/ERR on agent, stream it as UTF-8 to the server and stream it as UTF-8 to the browser.
  • the configurable one — read console log stream from STDOUT/ERR on agent in the platform local encoding, send the log to the server in the same encoding, have the server remember the encoding that was used. While streaming the logs to the browser, have the server recall what encoding the log was sent in and send the logs in the same encoding.

The first one, obviously is a lot simpler to implement and will likely work for most of the users with no issues whatsoever.

Right now, there's an impedance mismatch, between the encoding on the agent and on the server, which may cause issues and errors in case the encodings are incompatible.

@marques-work
Copy link
Member

I believe it was you who changed it in d33ffcd.

Gotta love it when I introduce hard to pin bugs :). Thanks for merging the PR. I see that prior to that change, StringEntity, without a specified charset appears to default to ISO-8859-1.

Regarding the 2 options, I'd fall into the first category -- that of the UTF-8 opinion.

@BrewRunner
Copy link

Is this being resourced? It appears to be related to an issue that I raised last week (which is preventing me from upgrading to any newer releases of GoCD). In my case, the console log no longer updates upon encountering non-ASCII characters.

#3926

@arvindsv
Copy link
Member

arvindsv commented Oct 18, 2017

@BrewRunner No one is currently looking at #3926. @ankitsri11 might take a look at reproducing both #3930 and #3926 in the coming days, but not sure when at this point.

ketan added a commit to ketan/gocd that referenced this issue Nov 30, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 1, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 4, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 4, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 4, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 7, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 10, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 11, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 21, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Dec 21, 2017
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Feb 6, 2018
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Feb 6, 2018
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Feb 6, 2018
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Feb 6, 2018
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Feb 6, 2018
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
ketan added a commit to ketan/gocd that referenced this issue Feb 7, 2018
As part of commit 7fc200b, the
`Content-Type` header was set to `text/plain; charset=utf-8`. However
the `HttpServletResponse`'s `getWriter()` was returning a writer bound
to the system default character encoding. We now force the encoding of
the writer to `utf-8`.

Fixes gocd#3926, gocd#3442
@ketan
Copy link
Member

ketan commented Feb 7, 2018

Closing as dup of #3926

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants