This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Win + Russian locale + child_process.exec = ??? #2190

Closed
indutny opened this issue Nov 25, 2011 · 20 comments
indutny commented Nov 25, 2011

First, let's try running dir in the Windows command line:

U:\Fido\Soft\_Sources\node>dir
 Том в устройстве U имеет метку USBSTICK
 Серийный номер тома: 467B-0CBE

 Содержимое папки U:\Fido\Soft\_Sources\node

24.11.2011  15:12    <DIR>          .
24.11.2011  15:12    <DIR>          ..
25.11.2011  09:37               522 index.js
               1 файлов            522 байт
               2 папок  17 412 767 744 байт свободно

Then let's run the following script in the same terminal:

var fs = require('fs'); // file system

require('child_process').exec('dir', function(err, outstr){
   fs.createWriteStream('testfile.txt', {
      flags: 'w',
      encoding: 'binary'
   }).write(outstr);
});

The contents of testfile.txt will be:

 ??? ? ?????? U ????? ????? USBSTICK
 ?????? ????? ??: 467B-0CBE

 ???????? ????? U:\Fido\Soft\_Sources\node

24.11.2011  15:12    <DIR>          .
24.11.2011  15:12    <DIR>          ..
25.11.2011  09:36               522 index.js
               1 ????            522 ????
               2 ?????  17?412?767?744 ???? ??????

Translated and copied from http://habrahabr.ru/qa/13851/

Mithgol commented Nov 26, 2011

The issue is that Node.js always expects UTF-8 output from a child process, but Windows with a Russian locale defaults to CP866.
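To make the mismatch concrete, here is a minimal sketch that is not part of the original report. It takes the three CP866 bytes for the word «Том» from the dir output above and decodes them twice: once as UTF-8, and once with a small hand-written table. `decodeCyrillicCp866` is a hypothetical helper that covers only ASCII and the Cyrillic letters, and `Buffer.from` assumes a reasonably recent Node.js.

```javascript
// Hypothetical helper: decodes ASCII and the Cyrillic letters of CP866.
// Every other byte (box drawing, symbols) becomes U+FFFD in this sketch.
function decodeCyrillicCp866(buf) {
  var out = '';
  for (var i = 0; i < buf.length; i++) {
    var b = buf[i];
    if (b < 0x80) {
      out += String.fromCharCode(b);                    // ASCII
    } else if (b <= 0xAF) {
      out += String.fromCharCode(0x0410 + (b - 0x80));  // А..Я, а..п
    } else if (b >= 0xE0 && b <= 0xEF) {
      out += String.fromCharCode(0x0440 + (b - 0xE0));  // р..я
    } else {
      out += '\uFFFD';                                  // not mapped here
    }
  }
  return out;
}

// The word "Том" encoded in codepage 866.
var cp866Bytes = Buffer.from([0x92, 0xAE, 0xAC]);

var misread = cp866Bytes.toString('utf8');      // what a UTF-8 decoder makes of it
var decoded = decodeCyrillicCp866(cp866Bytes);  // what the bytes actually mean

console.log(JSON.stringify(misread), JSON.stringify(decoded));
```

The bytes are not valid UTF-8 sequences, so the UTF-8 reading contains replacement characters instead of letters, which is consistent with the question marks in the file.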

Workaround

If you only need some UTF-8 output from a Windows shell command, the above issue can be easily worked around by prepending «chcp 65001» to the shell command. This changes the codepage of the child process's output to the UTF-8 that Node.js expects:

var forker = require('child_process');
var fs = require('fs'); // file system

forker.exec('chcp 65001 | dir', function(err, outstr){
   fs.createWriteStream('testfile.txt', {
      flags: 'w',
      encoding: 'binary'
   }).write(outstr);
});

You then get the correct output in «testfile.txt».

Workaround issues

Issue 1

You still get garbage in the console, and this cannot be worked around. For example, the following JavaScript

var clog = console.log;
clog('\nRunning under Node.js version ' + process.versions.node + ' on ' + process.arch +
    '-type processor, ' + process.platform + ' platform.');

var forker = require('child_process');
var fs = require('fs'); // file system

forker.exec('chcp 65001 | dir', function(err, outstr){
   fs.createWriteStream('testfile.txt', {
      flags: 'w',
      encoding: 'binary'
   }).write(outstr);
   clog('\n' + outstr);
});

will produce a correct «testfile.txt» and the following output:

console output screenshot

That's because, somewhere deep inside, Node.js expects that UTF-8 output can always be handled by the console, though the actual Windows console with a Russian locale expects CP866 output.

That is probably another Node.js issue — separate from expecting UTF-8 output from a shell command.

Issue 2

It does not help if you run «chcp 65001» before starting Node.js in the same Windows console, intending to change the console's codepage. Quite the contrary, it has severe undesired effects on keyboard input, and the console output remains garbage anyway.

That is a known misfeature of Windows XP: «chcp 65001» does change the codepage of shell command's output, but does not affect the console. Even the output of chcp itself becomes garbage.

yet another console output screenshot

Mithgol commented Nov 26, 2011

I have just discovered that both of the two issues of the given workaround are present only when raster fonts are chosen for the Windows console (in its properties).

But if «Lucida Console» vector fonts are chosen in the properties of the Windows console, then the console output of Node.js looks correct in any codepage, see the following screenshot:

screenshot

(Though the «'chcp 65001 | dir'» workaround is still necessary and the original issue does not stand corrected.)

Mithgol commented Nov 26, 2011

If the «'chcp 65001 | dir'» workaround is not present, then the results of the script (running in console with «Lucida Console» fonts) depend on the console's codepage:

var clog = console.log;
clog('\nRunning under Node.js version ' + process.versions.node + ' on ' +
   process.arch + '-type processor, ' + process.platform + ' platform.');

require('child_process').exec('dir', function(err, outstr){
   clog('\n' + outstr);
});

screenshot

And the default codepage is CP866.

Mithgol commented Nov 26, 2011

I've made yet another test in default Windows console (with raster fonts and cp866):

screenshot

It seems that direct output of Unicode strings is fine if the corresponding characters are within CP866.

I wonder why the chcp 65001 | dir output was so garbled in the raster console then.

All that suddenly becomes more interesting. I should probably open another Node.js issue.

Mithgol commented Nov 26, 2011

Not a Node.js issue, but rather a workaround issue: «chcp 65001» actually affects not only the child process, but also somehow affects the current console! So we need to «chcp 866» back (in a console with raster fonts) before we output anything.

Totally unexpected!

But I finally have a more correct workaround for the original issue:

var clog = console.log;
clog('\nRunning under Node.js version ' + process.versions.node + ' on ' +
   process.arch + '-type processor, ' + process.platform + ' platform.');

var forker = require('child_process');
var fs = require('fs'); // file system

forker.exec('chcp 65001 | dir', function(err, outstr){
   fs.createWriteStream('testfile.txt', {
      flags: 'w',
      encoding: 'binary'
   }).write(outstr);
   forker.exec('chcp 866', function(){
      clog('\n' + outstr);
   });
});

screenshot

The only remaining problem with this workaround is that the two .exec() calls are asynchronous, so the console remains in an abnormal mode for a few ticks. Nothing can be done about that unless an .execSync(…) method is introduced (#1167).
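Later Node.js releases do provide child_process.execSync, so the gap between the two calls can be closed. The following is only a sketch under assumptions, not something tested in this thread: `wrapWithCodepage` and `runInUtf8` are hypothetical helpers, the `&` separator and `>nul` redirection are ordinary cmd.exe syntax, and codepages 65001 and 866 are the ones discussed above. Whether dir behaves in every console configuration the way it does with the pipe form is not confirmed here.

```javascript
var execSync = require('child_process').execSync;

// Hypothetical helper: wraps a command so that the codepage is switched,
// the command runs, and the codepage is restored, all inside one cmd.exe.
function wrapWithCodepage(command, tempCodepage, restoreCodepage) {
  return 'chcp ' + tempCodepage + ' >nul & ' + command +
         ' & chcp ' + restoreCodepage + ' >nul';
}

// Runs the wrapped command synchronously and returns its stdout as a string.
function runInUtf8(command) {
  return execSync(wrapWithCodepage(command, 65001, 866), { encoding: 'utf8' });
}

// Only attempt the real call on Windows; chcp does not exist elsewhere.
if (process.platform === 'win32') {
  console.log(runInUtf8('dir'));
}
```

Because the switch and the restore happen inside a single blocking call, the script never writes to the console while the codepage is 65001.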

Mithgol commented Nov 26, 2011

I have opened issue #2196 to report the separate problem discovered in my search of a workaround for #2190.

piscisaureus commented

@Mithgol
What if we just set the console codepage to UTF-8 when node starts - I guess that would solve the problem, right?

Another remark: node does not actually write UTF-8 to the console. It writes UTF-16, because that is what Windows uses internally. I think that when you pick a raster font, Windows tries to convert back from Unicode to CP866, and something goes wrong there.

Finally, I do not really care about raster font support. Modern Windows versions pick a TrueType font by default anyway.

piscisaureus commented

Another solution could be to use cmd /u /c instead of cmd /c to run commands. But this affects only the output of internal cmd commands and not other programs, so I think it would be very confusing for users.
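For illustration, the cmd /u idea could look roughly like the sketch below. It assumes that an internal command such as dir, run under cmd /u, writes UTF-16LE to the pipe; `decodeUtf16le` and `execCmdUnicode` are hypothetical helper names, not Node.js APIs.

```javascript
var spawn = require('child_process').spawn;

// Joins the raw chunks first, so a UTF-16 code unit that was split across
// two chunks is still decoded correctly.
function decodeUtf16le(chunks) {
  return Buffer.concat(chunks).toString('utf16le');
}

// Hypothetical helper: runs an internal cmd.exe command with /u and passes
// the decoded output to the callback.
function execCmdUnicode(command, callback) {
  var chunks = [];
  var done = false;
  function finish(err, text) {
    if (done) return;          // make sure the callback fires only once
    done = true;
    callback(err, text);
  }
  var child = spawn('cmd', ['/u', '/c', command]);
  child.stdout.on('data', function (chunk) { chunks.push(chunk); });
  child.on('error', function (err) { finish(err); });
  child.on('close', function (code) {
    var err = code === 0 ? null : new Error('cmd exited with code ' + code);
    finish(err, decodeUtf16le(chunks));
  });
}

// Only attempt the real call on Windows.
if (process.platform === 'win32') {
  execCmdUnicode('dir', function (err, text) {
    if (err) throw err;
    console.log(text);
  });
}
```

As noted above, this helps only with cmd builtins; an external program started the same way still writes in whatever encoding it chooses.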

Mithgol commented Nov 28, 2011

@piscisaureus

> What if we just set the console codepage to UTF-8 when node starts - I guess that would solve the problem, right?

No, that would not — unless the console fonts are set to vector fonts (“Lucida Console” in Windows XP, probably also “Consolas” in later Windows) beforehand. The raster fonts of Windows console would not play well with UTF-8 encoding (aka codepage 65001). On Windows XP, any Russian text in raster console becomes garbled with that codepage (even the output of chcp command itself), as I have already demonstrated above:

screenshot

And by “any text” I mean “even keyboard input”.

On Windows 7, there is no garbage in 65001, but there are no visible characters either when you type Russian in a console with raster fonts. And a funny thing: the chcp output is in English.

> I think that when you pick a raster font, Windows tries to convert back from Unicode to CP866, and something goes wrong there.

Yes, that's a Windows bug and not a Node.js problem. The problem is, however, that Node.js always expects UTF-8 input from require('child_process').exec, and sometimes it needs to expect CP866.

(Ideally issue #1772 should be fixed and then require('child_process').exec augmented to expect any arbitrary encoding, because Russian locale is not the only one.)

> Finally, I do not really care about raster font support. Modern Windows versions pick a TrueType font by default anyway.

I am not that sure. I've just checked a coworker's Windows 7 workstation and it has raster fonts set in console's settings. The fellow says it has always been like that.

piscisaureus commented

@Mithgol The problem is that currently we do not support encodings other than utf8 and utf16. And I don't really want to bake all conceivable character encodings into node-core. However maybe we can add an option to exec/spawn to convert data from the current locale to utf8 - that would probably solve the problem.

Can you confirm that it is possible to write Cyrillic characters to the console using console.log, by writing the appropriate Unicode codepoints?
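As a rough idea of what such a conversion option could do, here is a userland sketch rather than a proposal for the actual core API. `decodeOutput` and `execWithEncoding` are hypothetical names. The sketch relies on the `encoding: 'buffer'` option of exec and on the global `TextDecoder`, both of which exist only in Node.js versions much newer than the ones discussed in this issue, and it assumes a build with full ICU data so that the 'ibm866' label is available.

```javascript
var exec = require('child_process').exec;

// Decodes raw child output using a caller-supplied encoding label.
function decodeOutput(buf, encodingLabel) {
  return new TextDecoder(encodingLabel).decode(buf);
}

// Hypothetical wrapper: like exec(), but stdout and stderr are decoded from
// `encodingLabel` instead of being assumed to be UTF-8.
function execWithEncoding(command, encodingLabel, callback) {
  exec(command, { encoding: 'buffer' }, function (err, stdout, stderr) {
    callback(err,
             decodeOutput(stdout, encodingLabel),
             decodeOutput(stderr, encodingLabel));
  });
}

// Only attempt the real call on Windows; 'ibm866' matches the Russian
// console default described in this issue.
if (process.platform === 'win32') {
  execWithEncoding('dir', 'ibm866', function (err, stdout) {
    if (err) throw err;
    console.log(stdout);
  });
}
```

The caller still has to know which codepage the child uses; detecting the current locale automatically is left out of this sketch.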

Mithgol commented Nov 28, 2011

@piscisaureus

Yes, I can confirm.

It is currently possible to write Cyrillic characters to the Windows console (with both raster and “Lucida Console” fonts), if the console works in its default encoding (which is CP866 for consoles in Windows with a Russian locale), by using console.log and providing the appropriate Unicode codepoints.

I have already tested it (two days ago) and confirmed with one of the above screenshots; I'll repeat the same image here:

screenshot

This one screenshot was made in raster mode, but node -e "console.log('\u0420\u0443\u0441\u044C')" in vector mode (i.e. with “Lucida Console” fonts) makes exactly the same word.

piscisaureus commented

chcp 65001 is totally unsupported in Windows, so I do not want to use that. If I enter chcp 65001 & dir in a directory that has non-ANSI file names, I get random errors.

Mithgol commented Dec 2, 2011

Have you tried the script given above (which contains workarounds for both this issue and #2196)?

var clog = console.log;
clog('\nRunning under Node.js version ' + process.versions.node + ' on ' +
   process.arch + '-type processor, ' + process.platform + ' platform.');

var forker = require('child_process');
var fs = require('fs'); // file system

forker.exec('chcp 65001 | dir', function(err, outstr){
   fs.createWriteStream('testfile.txt', {
      flags: 'w',
      encoding: 'binary'
   }).write(outstr);
   forker.exec('chcp 866', function(){
      clog('\n' + outstr);
   });
});

If you get any errors (random or not), what are they, what do they look like?

ry commented Dec 9, 2011

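Just use raw data with child_process.spawn:

// dir is a cmd.exe builtin, so it has to be run through cmd /c
var c = require('child_process').spawn('cmd', ['/c', 'dir']);
// pipe the raw bytes to the file without any decoding
var f = require('fs').createWriteStream('testfile.txt');
c.stdout.pipe(f);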

piscisaureus commented

@Mithgol I get "the system cannot write to the selected device". It happens only if the console font is a raster font, though.

indutny closed this as completed Jun 21, 2012

Mithgol commented Jun 22, 2012

Why is this issue closed?

indutny commented Jun 22, 2012

Well, I think you've received a lot of comments and hints, and there was no activity for 7 months. Should I reopen it? Why?

Mithgol commented Jun 22, 2012

Thank you for explaining.

thorn0 commented Nov 11, 2015

The comments here just describe possible workarounds, but the issue is still (Node v5.0.0) there. If Windows is supposed to be a first-class citizen in the Node world, this should be fixed.

ChALkeR commented Nov 11, 2015

@thorn0 This is an archive repo, and the last comment on the issue was more than three years ago.

If you think that this is a Node.js (not a setup) problem that still exists with recent Node.js, open an issue at https://github.com/nodejs/node (if it's not there already).
