Output from processing-java badly encoded for Windows Command Prompt #1633

Closed
Antony74 opened this Issue Feb 18, 2013 · 7 comments

4 participants

Antony74 commented Feb 18, 2013

This follows on from #1456 - Output from processing-java does not seem to be UTF-8 encoded. Very similar problem, only it was OS X before, and Windows Command Prompt this time.

D:\>mkdir example

D:\>echo Hamster x; >example\example.pde

D:\>processing-java --build --sketch=D:\example --output=output --force
example.pde:1:0:1:0: Cannot find a class or type named ÔÇ£HamsterÔÇØ

Expected output:
example.pde:1:0:1:0: Cannot find a class or type named "Hamster"

Version:
processing-2.0b7, and also yesterday's Revision bbb1627.

benfry Feb 18, 2013

Member

At a glance, that looks like your command prompt is not UTF-8 encoded. The version from Git should be writing UTF-8 data properly, though this may mean that UTF-8 isn't a good default for Windows.

boubpopsyteam Feb 18, 2013

Contributor

The default charset for the Windows CMD prompt is "PC Code Page 850", which is indeed not UTF-8.
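
To illustrate why the reported output looks the way it does, here is a minimal Java sketch (the class and method names are hypothetical, not part of Processing). It encodes the curly-quoted message as UTF-8 and decodes the bytes as code page 850, which Java exposes as the charset "IBM850" on a full JDK. The result is the string ÔÇ£HamsterÔÇØ, matching the garbled text in the report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Processing.
class MojibakeDemo {
    // Encode the text as UTF-8, then decode those bytes with another charset.
    // This mimics a console whose code page is not UTF-8.
    static String misdecode(String text, String consoleCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // Left and right double quotation marks around the type name.
        String message = "\u201CHamster\u201D";
        // Assumes the "IBM850" charset is available (it ships with a full JDK).
        System.out.println(misdecode(message, "IBM850"));
    }
}
```

Each curly quote is three bytes in UTF-8 (E2 80 9C and E2 80 9D), and code page 850 shows each byte as a separate character.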

benfry Feb 18, 2013

Member

That's locale-specific (on a Japanese system it will be different, for instance).

Out of curiosity, how difficult is it to change the prompt to use UTF-8?

boubpopsyteam Feb 18, 2013

Contributor

My locale is fr-FR, which uses ISO-8859-15, but the default for the cmd prompt is CP 850 ...

CP 65001 seems to be the "UTF-8 like" code page: you can output UTF-8 text, but many basic tools like 'more' no longer work ...

PowerShell seems to provide some way of using UTF-8 but still has the BOM issue.

Antony74 Feb 19, 2013

I'm sticking with my original recommendation/fix for this (41677ee). Here are a couple of examples of what other people have said about this issue:

"Thus the default messages, in order to viewable in the widest array of configurations, shells, and environments, needs to stick to the ISO 646 character repertoire."

http://www.inter-locale.com/whitepaper/cli-i18n.html#messages

"In the C locale, the output of GNU programs should stick to plain ASCII for quotation characters in messages to users: preferably 0x22 (‘"’) or 0x27 (‘'’) for both opening and closing quotes."

http://www.gnu.org/prep/standards/standards.html#Character-Set
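
As a rough illustration of the "stick to plain ASCII quotes" approach, the sketch below replaces typographic quotation marks with 0x22 and 0x27 before a message is printed. It is an assumption about what such a fix could look like, not the actual change in 41677ee, and the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Processing.
class AsciiQuotes {
    // Replace typographic quotation marks with their plain ASCII equivalents.
    static String toAsciiQuotes(String message) {
        return message
            .replace('\u201C', '"')   // left double quotation mark
            .replace('\u201D', '"')   // right double quotation mark
            .replace('\u2018', '\'')  // left single quotation mark
            .replace('\u2019', '\''); // right single quotation mark
    }

    public static void main(String[] args) {
        String raw = "Cannot find a class or type named \u201CHamster\u201D";
        System.out.println(toAsciiQuotes(raw));
    }
}
```

Because the result contains only ASCII characters, it displays the same way in every ASCII-compatible console code page, including CP 850, CP932 and UTF-8.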

@benfry benfry added the help wanted label Nov 20, 2014

satoshiokita Mar 6, 2016

Contributor

I don't know the history of this problem, but the Windows Command Prompt and the PowerShell prompt do not use UTF-8 encoding, so on Windows we should use the system default setting.

    // Turns out the output goes as MacRoman or something else useless.
    // http://code.google.com/p/processing/issues/detail?id=1418
    try {
      if (Platform.isWindows()) {
        // Windows consoles are not UTF-8: use the system default encoding.
        systemOut = new PrintStream(System.out, true);
        systemErr = new PrintStream(System.err, true);
      } else {
        systemOut = new PrintStream(System.out, true, "UTF-8");
        systemErr = new PrintStream(System.err, true, "UTF-8");
      }
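
The fragment above depends on Processing's own fields and its Platform class. For readers who want to try the idea in isolation, here is a self-contained sketch under stated assumptions: the class name, the `wrap` helper and the `os.name` check are hypothetical stand-ins, and the Charset-based PrintStream constructor requires Java 10 or later.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch, not Processing's actual code.
class ConsoleStreams {
    // Stand-in for Platform.isWindows(), based on the os.name system property.
    static boolean isWindows() {
        return System.getProperty("os.name").toLowerCase().startsWith("windows");
    }

    // Wrap a stream with the platform default charset on Windows, UTF-8 elsewhere.
    static PrintStream wrap(OutputStream out, boolean windows) {
        if (windows) {
            return new PrintStream(out, true); // default charset of the JVM
        }
        return new PrintStream(out, true, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        PrintStream systemOut = wrap(System.out, isWindows());
        systemOut.println("Cannot find a class or type named \u201CHamster\u201D");
    }
}
```

Note that from Java 18 onward the JVM default charset is UTF-8 unless configured otherwise, so the Windows branch only picks up the system encoding on older Java versions or when `file.encoding` is set accordingly.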

"how difficult is it to change the prompt to use UTF-8?"

Japanese users do not commonly change the prompt encoding, but there is a 'chcp' command for it.

My test environment:

  • Windows 10 Pro 64-bit (Japanese)
  • I did not test on Mac OS X

CP932 is the Windows encoding for Japanese.
https://en.wikipedia.org/wiki/Code_page_932

Test: Command Prompt

Processing output is UTF-8 and my command prompt is CP932.
[Screenshot 001]

Test: PowerShell prompt

"PowerShell seems to provide some way of using UTF-8 but still has the BOM issue."

Yes. We should use the [System.IO.File]::WriteAllLines method.

[Screenshot 002]

benfry May 8, 2016

Member

@satoshiokita Thanks for tracking down the fix. Mac OS X (and Linux?) had the opposite problem, so the code was changed during the 2.0 release cycle to write UTF-8. We'll add this code so that it works properly (the old way) on Windows.

Fix incorporated for 3.1.

@benfry benfry closed this May 8, 2016