Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

caml_sys_isatty: detect Cygwin/MSYS for better -color heuristic #1406

Merged
merged 16 commits into from Nov 23, 2017

Conversation

Projects
None yet
5 participants
@nojb
Copy link
Contributor

nojb commented Oct 6, 2017

This PR improves the heuristic used to decide whether to use colors for warnings & errors by detecting the case when the native Windows port of OCaml is being used under Cygwin (which I would guess is fairly common).

@dra27 will know if we need to add some #ifdef to deal with the case of Windows XP and other strange creatures that may have trouble with GetFileInformationByHandleEx.

Of course, feel free to ignore this PR until after 4.06!

@gasche
Copy link
Member

gasche left a comment

Seems rather safe to me.

if (! GetFileInformationByHandleEx(h, FileNameInfo, buffer, sizeof(buffer) - sizeof(WCHAR)))
return 0;

nameinfo->FileName[nameinfo->FileNameLength / sizeof(WCHAR)] = 0;

This comment has been minimized.

@gasche

gasche Oct 6, 2017

Member

Just for my culture: the cygwin code you link to uses L'\0' instead of 0 to terminate widechar strings, are the two equivalent?

This comment has been minimized.

@nojb

nojb Oct 6, 2017

Author Contributor

Yes.

/* check if this could be a msys pty pipe ('msys-XXXX-ptyN-XX')
or a cygwin pty pipe ('cygwin-XXXX-ptyN-XX') */
if ((!wcsstr(nameinfo->FileName, L"msys-") &&
!wcsstr(nameinfo->FileName, L"cygwin-")) || !wcsstr(nameinfo->FileName, L"-pty"))

This comment has been minimized.

@gasche

gasche Oct 6, 2017

Member

Personally I would find

  (wcsstr(.., L"msys-") || wcsstr(..., L"cygwin-"))
  && wcsstr(.., L"-pty")

more readable.

This comment has been minimized.

@nojb

nojb Oct 6, 2017

Author Contributor

Agree, will fix.

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 6, 2017

Not checked the detail, but I think you need to change the requested windows level (it's the VER define - I can't remember the precise name) to request Windows Vista. It would be advisable to load the symbol indirectly (as for CreateSymbolicLink, for example) - though at the moment I've dealing somewhat reactively rather than pre-emptively with Windows XP support!

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 8, 2017

The intent of this PR sounds right, and I'm incompetent to comment on the implementation.

A cosmetic suggestion: while you're at it, could you move the _isatty call to the caml_win32_detect_msys_tty function you're introducing, and rename that function to caml_win32_isatty or something clearer? This way all the Windows-specific stuff (incl _isatty) is in file win32.c, as it should be.

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 8, 2017

Also, I'm disturbed at the quantity of pull requests that suggest to do something for the Windows console and other terminals: the present PR, but also #1408 , and even #1399 . Let's make sure we understand where we are going before merging all that stuff.

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 8, 2017

@xavierleroy - on "where we're going", #1399 I think was slightly misconceived (and has been closed as a result). The change here and in #1408 should be seen as trying to maintain parity between the OSes. This one moves towards consistent handling of coloured output (again, this trick is used in other projects) and there's another one in the pipeline - excuse pun - for the Windows Console itself. Amongst other things, #1408 brings feature parity for #1231, whose motivation was to allow UTF-8 to display characters instead of sequences.

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 8, 2017

I'd mentioned privately with @nojb, but I'll put it here for discussion. I'm wondering if it would be worth exposing this functionality in the Unix library. For Windows, there are two concepts for isatty:

  1. Some kind of interactive thing (so input coming from a user or output going to something a user can "see")
  2. Specifically a Windows Console

I'm wondering if we can add a function Unix.isaconsole which on Unix is just a (pointless) synonym for Unix.isatty but on Windows returns true strictly if the given descriptor is a Windows console. So we have Unix.isaconsole iff Unix.isatty.

@gasche

This comment has been minimized.

Copy link
Member

gasche commented Oct 8, 2017

Why would OCaml users care about the difference? Is the idea to move some of the console-specific logic currently in the runtime to userspace? (Or to make it easier for brave Windows-side maintainers to experiment outside the distribution?)

@nojb

This comment has been minimized.

Copy link
Contributor Author

nojb commented Oct 8, 2017

@xavierleroy Here is a less technical but maybe a little more conceptual explanation of what is going on and the reason for these Console-related PRs. Mainly directed at those who are not very familiar with how Windows works (like me; so apologies in advance for any inaccuracies!).

The first thing to keep in mind is that there is a fundamental difference between Unix and Windows in their handling of the "Console" (by which I mean what you get when you open up some kind of terminal and type things in it). For Unix systems there is basically no difference between writing & reading from the Console and writing & reading from a file: both are streams of bytes. However, in Windows the "Console" is not a stream of bytes but a two dimensional array of (wide) characters.

As a corollary, this means that anything that gets sent to the Console needs to be translated at some point into a sequence of wide characters, which means that 1) it has to have an encoding and 2) that encoding needs to be made known to the OS so that suitable re-encoding into wide characters (a.k.a. UTF-16) can take place. Regarding 1) #1200 has made it much more likely that whatever is being sent to the Console is UTF-8 encoded. Point 2) is typically done by calling the Windows API SetConsoleOutputCP or executing chcp 65001 before starting your program. This needs to be done so that one can print UTF-8 strings using write or printf.

The problem is the existence of several bugs in Windows in the handling of the above situation, some of which have only been fixed rather recently. This means that today, the native Windows port of trunk running on the Command Prompt (cmd.exe) has trouble writing and reading Unicode (and it can even segfault in one case). Since it is rather sad to have modified the whole runtime system to support Unicode on Windows and then have it basically unusable through the Console, @dra27 put a lot of energy into working around these bugs. This is what #1408 is about.

One way to sidestep the need to deal with the encoding points above is to directly read and write wide characters. Then no re-encoding is necessary and things will mostly just work. This is in a sense the most "native" way to talk to the Console on Windows and is done by calling ReadConsole and WriteConsole. It is by using these functions to implement the basic I/O primitives of the runtime system (for suitable versions of Windows) that #1408 works around the Windows bugs.

Lastly, it is also important to keep in mind that this discussion, and the modifications of #1408, are only relevant when writing & reading from the Console. As soon as we redirect those handles into, say, files we are back into the land of streams of bytes and the situation is very much like in Unix. Note that this includes the case of the native Windows ports under Cygwin since Cygwin redirects those channels behind the scenes to implement their emulation layer. This means, in particular, that you can try out the native Windows port of trunk under Cygwin today and input/output of Unicode works perfectly (as perfectly as it works under Unix).

So, TL; DR:

  • #1408 Works around Windows bugs in most (all?) versions of Windows so that I/O of Unicode works correctly on cmd.exe
  • #1416 does "2)" (communicating the encoding to the OS) for those versions of Windows where this works correctly.
  • #1406 (this PR) improves the -color heuristic by detecting the situation of the native Windows port running under Cygwin, as described in the last paragraph. Not really related to any Unicode changes.
@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Oct 9, 2017

Thanks, @nojb, for the high-level overview of the ongoing Win32 console work. Now I can express my remaining concern more clearly.

  • @dra27 recently implemented Unix.isatty for Win32 using GetConsoleMode and eschewing _isatty.
  • This PR is about caml_sys_isatty, a primitive that is not made available in the Caml libraries, not even as Unix.isatty. It is using _isatty and you're adding other cases of console-like output.

So, we end up with two isatty functions whose behaviors and implementations have completely diverged. Isn't this a case of the right hand ignoring the left hand? Couldn't we do better?

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Oct 9, 2017

@nojb and my Slack history testifies that we definitely aren't ignoring each other 😄

I haven't (yet) reviewed this PR in detail - the use of _isatty on Windows is a bug

@nojb nojb force-pushed the nojb:fix_sys_isatty_for_cygwin branch from e1247d0 to 47fffdc Oct 9, 2017

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Oct 11, 2017

@nojb nojb force-pushed the nojb:fix_sys_isatty_for_cygwin branch from 47fffdc to 2dcb122 Oct 24, 2017

@nojb nojb force-pushed the nojb:fix_sys_isatty_for_cygwin branch from e55159c to 0f57ee8 Nov 18, 2017

@xavierleroy
Copy link
Contributor

xavierleroy left a comment

Minor stylistic comments below. Globally I'm in favor of merging.

@@ -619,7 +618,7 @@ CAMLprim value caml_sys_isatty(value chan)

fd = (Channel(chan))->fd;
#ifdef _WIN32
ret = Val_bool(_isatty(fd));
ret = Val_bool(caml_win32_isatty(fd));
/* https://msdn.microsoft.com/en-us/library/f4s0ddew.aspx */

This comment has been minimized.

@xavierleroy

xavierleroy Nov 18, 2017

Contributor

It looks like this URL is no longer relevant; why not delete the comment?

This comment has been minimized.

@nojb

nojb Nov 18, 2017

Author Contributor

Done.

);

static int caml_win32_ismsystty(int fd)
{

This comment has been minimized.

@xavierleroy

xavierleroy Nov 18, 2017

Contributor

Looks like this function could (should?) take a HANDLE as argument and not a file descriptor, saving a _getosfhandle call that is redundant with the one done in caml_win32_isatty.

This comment has been minimized.

@nojb

nojb Nov 18, 2017

Author Contributor

Indeed, well spotted!

return (hFile != INVALID_HANDLE_VALUE &&
GetFileType(hFile) == FILE_TYPE_CHAR &&
GetConsoleMode(hFile, &lpMode)) || caml_win32_ismsystty(fd);
}

This comment has been minimized.

@xavierleroy

xavierleroy Nov 18, 2017

Contributor

I had to read this twice to make sure the && and || were in the right places. What about:

if (hFile == INVALD_HANDLE_VALUE) return 0;
if (GetFileType(hFile) == FILE_TYPE_CHAR && GetConsoleMode(hFile, &lpMode)) return 1;
return caml_win32_ismsystty(h);

This comment has been minimized.

@nojb

nojb Nov 18, 2017

Author Contributor

Yes, the suggested version is indeed clearer.

@xavierleroy xavierleroy modified the milestones: 4.07-or-later, 4.07 Nov 18, 2017

@nojb

This comment has been minimized.

Copy link
Contributor Author

nojb commented Nov 19, 2017

Amended as suggested by the reviewer.

Quick summary of recent changes:

I went ahead and merged the implementation of Unix.isatty under Windows and caml_sys_isatty (the primitive used to decide whether compiler colors should be used). Concretely, this means that Unix.isatty now returns true for the native Windows ports when passed a file descriptor connected to Cygwin's terminal.

Technically this is a breaking change (and is marked as such in Changes) but the doc string of Unix.isatty was always a bit ambiguous on this point (it reads: Return true if the given file descriptor refers to a terminal or console window, false otherwise).

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 21, 2017

No need to worry about changing the semantics of Unix.isatty under Windows, because as you said the intended semantics has never been clear to begin with.

Under Unix, the traditional meaning of isatty is "supports the system calls for terminal I/O such as tcsetattr and the like". Historically it refers to serial lines; more recently, to pseudo-ttys. The more modern meaning of isatty is "there is a terminal-style display and keyboard underneath, and a human user interacting with all that". Under Windows, tcsetattr and the like are not implemented in the OCaml interface, and nobody cares about RS-232 serial lines any longer, so it's OK to stick to the modern meaning.

@xavierleroy
Copy link
Contributor

xavierleroy left a comment

Looks good to me!

@xavierleroy

This comment has been minimized.

Copy link
Contributor

xavierleroy commented Nov 21, 2017

@dra27: would you like to review and give the final green light?

@dra27
Copy link
Contributor

dra27 left a comment

The only "important" change what I'm fairly sure is a missing bounds check.

#1321 fell foul of my usual trick of including information in commit messages which should be in code. If you choose not to refactor caml_win32_isatty as suggested, please do add comments explaining why we don't use _isatty

Changes Outdated
@@ -20,6 +20,11 @@ Working version

### Other libraries:

* GPR#1406: Unix.isatty now returns true in the native Windows ports when the
passed a file descriptor connected to Cygwin's terminal. In particular,
compiler colors for the native Windows ports now work under Cygwin.

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

Possibly worth a tiny clarification to the text of this: it's a "file descriptor connected to a Cygwin PTY" and worth noting that it also means it works "under Cygwin / MSYS2"

@@ -914,3 +918,62 @@ void caml_restore_win32_terminal(void)
if (startup_codepage != 0)
SetConsoleOutputCP(startup_codepage);
}

/* Detect if a named pipe corresponds to MSYS/Cygwin tty: see

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

Largely cosmetic, but MSYS is a fork of Cygwin, so I'd say "Detect if a named pipe corresponds to a Cygwin/MSYS2 pty"!


/* Detect if a named pipe corresponds to MSYS/Cygwin tty: see
https://github.com/mirror/newlib-cygwin/blob/master/winsup/cygwin/dtable.cc#L932

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

Either refer to the function name (handle_to_fn) or reference a specific commit - it's good to have a link to the source of this, but the file may change (in this case, s/master/00e9bf2 would solve it)

if (! pGetFileInformationByHandleEx(hFile, FileNameInfo, buffer, sizeof(buffer) - sizeof(WCHAR)))
return 0;

nameinfo->FileName[nameinfo->FileNameLength / sizeof(WCHAR)] = L'\0';

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

This needs a bounds check, however unlikely overflow may be.

This comment has been minimized.

@nojb

nojb Nov 23, 2017

Author Contributor

I am passing the maximum buffer size (sizeof(buffer) - sizeof(WCHAR)) to the GetFileInformationByHandleEx call. Do I understand correctly that you want the bounds check in case the system call returns something erroneous?

This comment has been minimized.

@dra27

dra27 Nov 23, 2017

Contributor

Ah, I'm sorry - I'd missed the subtlety of the - sizeof(WCHAR) ... please could that be commented on the call (something like "GetFileInformationByHandleEx doesn't null-terminate the string, so reduce the declared buffer length to allow for adding one.")

DWORD dwBufferSize
);

static int caml_win32_ismsystty(HANDLE hFile)

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

Can this be caml_win32_is_cygwin_pty (both change to Cygwin, which I think is clearer as we don't "officially" support MSYS2, and also that we're really talking about PTYs, not TTYs here).

@@ -17,6 +17,10 @@

/* Win32-specific stuff */

/* Enable Windows Vista functions */

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

This could be explicitly "FILE_INFO_BY_HANDLE_CLASS and FILE_NAME_INFO are only available from Windows Vista onwards"


hModKernel32 = GetModuleHandle(L"KERNEL32.DLL");
pGetFileInformationByHandleEx =
(tGetFileInformationByHandleEx)GetProcAddress(hModKernel32, "GetFileInformationByHandleEx");

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

FWIW, the CreateSymbolicLink test updates a global (although for some reason I chose to use a second global there, rather than just initialising it to INVALID_HANDLE_VALUE) rather than attempting to locate the function each time https://github.com/ocaml/ocaml/blob/trunk/otherlibs/win32unix/symlink.c#L51-L55. But efficiency doesn't really matter for this test, I guess!

This comment has been minimized.

@nojb

nojb Nov 23, 2017

Author Contributor

Good point; switched to a static variable.

if (GetFileType(hFile) == FILE_TYPE_CHAR && GetConsoleMode(hFile, &lpMode))
return 1;

return caml_win32_ismsystty(hFile);

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

I don't particularly "like" the fact GetFileType is queried twice: why not do that once here:

CAMLexport in caml_win32_isatty(int fd)
{
  DWORD lpMode;
  HANDLE hFile = (HANDLE)_get_osfhandle(fd);

  if (hFile == INVALID_HANDLE_VALUE)
    return 0;

  switch (GetFileType(hFile)) {
  case FILE_TYPE_CHAR:
    /* Both console handles and the NUL device are FILE_TYPE_CHAR,
       The NUL device returns FALSE for a GetConsoleMode call. _isatty
       incorrectly only uses GetFileType (see GPR#1321). */
    return GetConsoleMode(hFile, &lpMode);
  case FILE_TYPE_PIPE:
    /* Cygwin PTYs are implemented using named pipes */
    return caml_win32_is_cygwin_pty(hFile);
  default:
    return 0;
  }
}
FILE_INFO_BY_HANDLE_CLASS FileInformationClass,
LPVOID lpFileInformation,
DWORD dwBufferSize
);

This comment has been minimized.

@dra27

dra27 Nov 22, 2017

Contributor

Just for brevity, function pointer definitions don't need parameter names (cf. https://github.com/ocaml/ocaml/blob/trunk/otherlibs/win32unix/symlink.c#L31)

nojb added some commits Nov 23, 2017

Changes Outdated
@@ -20,6 +20,11 @@ Working version

### Other libraries:

* GPR#1406: Unix.isatty now returns true in the native Windows ports when the

This comment has been minimized.

@dra27

dra27 Nov 23, 2017

Contributor

Erroneous "the" at the end of the line!

@@ -914,3 +919,64 @@ void caml_restore_win32_terminal(void)
if (startup_codepage != 0)
SetConsoleOutputCP(startup_codepage);
}

/* Detect if a named pipe corresponds to a Cygwin/MSYS pty: see

This comment has been minimized.

@dra27

dra27 Nov 23, 2017

Contributor

Stray blank?

{
char buffer[1024];
FILE_NAME_INFO * nameinfo = (FILE_NAME_INFO *) buffer;
static tGetFileInformationByHandleEx pGetFileInformationByHandleEx = NULL;

This comment has been minimized.

@dra27

dra27 Nov 23, 2017

Contributor

You could initialise this to INVALID_HANDLE_VALUE...

FILE_NAME_INFO * nameinfo = (FILE_NAME_INFO *) buffer;
static tGetFileInformationByHandleEx pGetFileInformationByHandleEx = NULL;

if (pGetFileInformationByHandleEx == NULL)

This comment has been minimized.

@dra27

dra27 Nov 23, 2017

Contributor

... and compare that here instead of NULL (means that the lookup only happens once, even on Windows XP)... but it's up to you if you want to use keypresses making Windows XP execution "faster" 😉

This comment has been minimized.

@nojb

nojb Nov 23, 2017

Author Contributor

Sure, done.

@nojb

This comment has been minimized.

Copy link
Contributor Author

nojb commented Nov 23, 2017

Thanks for the careful review @dra27! I think I addressed all the points raised.

@dra27

dra27 approved these changes Nov 23, 2017

@dra27

This comment has been minimized.

Copy link
Contributor

dra27 commented Nov 23, 2017

Lovely, thank you - let's wait for the CI to confirm everything's still in order and then merge (I presume you're happy for me to squash this one?)

@nojb

This comment has been minimized.

Copy link
Contributor Author

nojb commented Nov 23, 2017

Yes, sure. Thanks!

@dra27 dra27 merged commit e86e762 into ocaml:trunk Nov 23, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.