Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode parts of env are lost when running createProcess #152

Closed
joeyh opened this issue Sep 9, 2019 · 5 comments · Fixed by #153

Comments

@joeyh
Copy link
Contributor

commented Sep 9, 2019

In a non-unicode locale, such as LANG=C, running createProcess with an env that contains a unicode character, such as '¡' results in it being stripped out of the value that is seen by the child proccess; only the ascii characters remain.

A program demonstrating the bug is:

import System.Process
import System.Environment

main = do
v <- lookupEnv "FOO"
case v of
	Just foo -> print $ "FOO is set to: " ++ foo
	Nothing -> do
		let e = [("FOO", "¡foo!")]
		print $ "running child process with environment " ++ show e
		self <- getExecutablePath
		let p = (proc self []) { env = Just e }
		(_, _, _, pid) <- createProcess p
		_ <- waitForProcess pid
		return ()

This program execs itself, so needs to be compiled, not run in ghci. On linux, built with process-1.6.5.0 and ghc-8.6.5, it behaves like this:

# LANG=en_US.utf8 ./foo
"running child process with environment [(\"FOO\",\"\\161foo!\")]"
"FOO is set to: \56514\56481foo!"
# LANG=C ./foo
"running child process with environment [(\"FOO\",\"\\161foo!\")]"
"FOO is set to: foo!"

Using strace -vf foo shows that the non-ascii characters do not make it to exec:

[pid 4335] execve("/home/joey/foo", ["/home/joey/foo"], ["FOO=foo!"] <unfinished ...>

This was discovered affecting a program that sets GIT_INDEX_FILE to a path when running git; if the path happens to contain unicode, this results in an corrupted path being passed to git.

@joeyh

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2019

withCEnvironment uses Foreign.C.String.withCString, which has this implementation:

withCString :: String -> (CString -> IO a) -> IO a
withCString s f = getForeignEncoding >>= \enc -> GHC.withCString enc s f

getForeignEncoding is described as "The Unicode encoding of the current locale, but where undecodable bytes are replaced with their closest visual match." So I think that is what is stripping some characters (and perhaps changing others).

Using getFileSystemEncoding instead should avoid the problem. I notice that its haddock actually says it's "used to decode and encode command line arguments and environment variables on non-Windows platforms". (emphasis mine) And indeed, System.Posix.Process.executeFile does use it to encode the environment (via System.Posix.Internals.withFilePath).

@joeyh

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2019

My test case above is a little bit off, because the way "¡" is encoded causes withFilePath to throw an exception, "recoverEncode: invalid argument (invalid character)" (when LANG=C). (Similarly, this program crashes when LANG=C: main = writeFile "¡foo!" "bar")

A better/more realistic test case is this, where the unicode value is read in from the system in some way, and so ghc's encoding layer is able to get it encoded in a way that would normally round-trip back out.

import System.Process
import System.Directory
import System.Environment

main = do
  v <- lookupEnv "FOO"
  case v of
    Just foo -> print $ "FOO is set to: " ++ foo
    Nothing -> do
	pwd <- getCurrentDirectory
	let e = [("FOO", pwd ++ "/index")]
	print $ "running child process with environment " ++ show e
	self <- getExecutablePath
	let p = (proc self []) { env = Just e }
	(_, _, _, pid) <- createProcess p
	_ <- waitForProcess pid
	return ()

This improved test case shows the bug in process, when run in a directory with unicode in its name:

joey@darkstar:~>cd ¡foo!
joey@darkstar:~/¡foo!>LANG=C ../foo
"running child process with environment [(\"FOO\",\"/home/joey/\\56514\\56481foo! /index\")]"
"FOO is set to: /home/joey/foo!/index"

And I've verified that this patch to process makes the test case work the same with LANG=C as it does with LANG set to a unicode locale.

diff --git a/System/Process/Posix.hs b/System/Process/Posix.hs
index 5432f41..cdcc77c 100644
--- a/System/Process/Posix.hs
+++ b/System/Process/Posix.hs
@@ -93,7 +93,7 @@ translateInternal str
 withCEnvironment :: [(String,String)] -> (Ptr CString  -> IO a) -> IO a
 withCEnvironment envir act =
   let env' = map (\(name, val) -> name ++ ('=':val)) envir
-  in withMany withCString env' (\pEnv -> withArray0 nullPtr pEnv act)
+  in withMany withFilePath env' (\pEnv -> withArray0 nullPtr pEnv act)
@snoyberg

This comment has been minimized.

Copy link
Collaborator

commented Sep 10, 2019

I'm very much opposed to the environment-variable-locale-detection logic we use throughout base, so I'm sympathetic to this change. It's a bit of a surprise to me that "file system encoding" applies to environment variables like this, but your round trip test case is pretty convincing.

My biggest concern would be regressions, or even changes in behavior. However, I'm having a hard time thinking of a use case that would be broken by this change. Can you think of anything?

And just confirming, though it seems pretty clear: there is no Windows impact from this patch, right?

I'm overall in favor of this change, with the caveat about changes in behavior above. If we can't come up with any such issues, I'd be happy to receive a PR.

Final question: does this apply to command line arguments as well?

@joeyh

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2019

@snoyberg

This comment has been minimized.

Copy link
Collaborator

commented Sep 11, 2019

OK, then I'm on board with a PR for this if you'd like to submit one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.