
Reading environment variables in windows using the right encoding

guillep committed Nov 12, 2018
1 parent 7452f17 commit ecb9f52fc072b05f1036e4d2e254dda575048199
@@ -50,15 +50,11 @@ FileSystemResolver >> resolve: aSymbol [

{ #category : #resolving }
FileSystemResolver >> resolveString: aString [
| decoded fs |
"The argument string is actually a byte array encoded differently on each platform.
We are transforming it to an image string.
We assume for now that the string is utf8 encoded."
decoded := aString asByteArray utf8Decoded.
| fs |
fs := FileSystem disk.
^ FileReference
fileSystem: fs
path: (fs pathFromString: decoded)
path: (fs pathFromString: aString)
]

{ #category : #resolving }
@@ -3,6 +3,22 @@ I represent the user environment variables. See `man environ` for more details.
Get access using:
Smalltalk os environment
Low level API
- getEnv: aVariableName
Gets the value of an environment variable called `aVariableName`
It is the system's responsibility to manage the encoding.
Rationale: A common denominator for all platforms providing an already decoded string, because Windows does not (compared to *nix systems) provide an encoded byte representation of the value. Windows has instead its own wide string representation.
- getEnvRaw: anEncodedVariableName
Gets the value of an environment variable called `anEncodedVariableName` already encoded.
It is the user's responsibility to encode and decode the argument and return values in the encoding of their preference.
Rationale: Some systems may want to have the liberty to use different encodings, or even to put binary data in the variables.
- getEnv: aVariableName encoding: anEncoding
Gets the value of an environment variable called `aVariableName` using `anEncoding` to encode/decode arguments and return values.
Rationale: *xes could use different encodings
"
Class {
#name : #OSEnvironment,
@@ -94,11 +110,16 @@ OSEnvironment >> associationsDo: aBlock [

{ #category : #accessing }
OSEnvironment >> at: aKey [
"Gets the value of an environment variable called `aKey`
It is the system's responsibility to manage the encoding."
^ self at: aKey ifAbsent: [ KeyNotFound signalFor: aKey ]
]

{ #category : #accessing }
OSEnvironment >> at: aKey ifAbsent: aBlock [
"Gets the value of an environment variable called `aKey`.
Execute aBlock if absent.
It is the system's responsibility to manage the encoding."
^ (self getEnv: aKey) ifNil: aBlock
]

@@ -129,30 +150,83 @@ OSEnvironment >> at: aKey put: aValue [
^ self setEnv: aKey value: aValue
]

{ #category : #private }
OSEnvironment >> basicGetEnvRaw: encodedVariableName [

"PRIVATE: This primitive call works on Strings, while the correct way to manage encodings is with raw data.
Use me through #getEnvRaw: to correctly marshall data."

"Gets the value of an environment variable called `anEncodedVariableName` already encoded but in ByteString form."

<primitive: 'primitiveGetenv' module: '' error: ec>
ec ifNil: [ ^self getEnvViaFFI: encodedVariableName ].
self primitiveFail
]

{ #category : #private }
OSEnvironment >> basicGetEnvViaFFI: arg1 [

"PRIVATE: This FFI call works on Strings, while the correct way to manage encodings is with raw data.
Use me through #getEnvViaFFI: to correctly marshall data."

"This method calls the Standard C Library getenv() function.
The name of the argument (arg1) should fit decompiled version."

^ self ffiCall: #( String getenv (String arg1) ) module: LibC
]

{ #category : #private }
OSEnvironment >> defaultEncoding [

^ ZnCharacterEncoder utf8
]

{ #category : #enumeration }
OSEnvironment >> do: aBlock [

^self valuesDo: aBlock
]

{ #category : #accessing }
OSEnvironment >> getEnv: varName [
{ #category : #private }
OSEnvironment >> getEnv: aVariableName [
"Gets the value of an environment variable called `aVariableName`
It is the system's responsibility to manage the encoding.
Rationale: A common denominator for all platforms providing an already decoded string, because Windows does not (compared to *nix systems) provide an encoded byte representation of the value. Windows has instead its own wide string representation."
^ self getEnv: aVariableName encoding: self defaultEncoding
]

{ #category : #private }
OSEnvironment >> getEnv: aVariableName encoding: anEncoding [
"Gets the value of an environment variable called `aVariableName` using `anEncoding` to encode/decode arguments and return values.
Rationale: *xes could use different encodings"

| rawValue |
rawValue := self getEnvRaw: (aVariableName encodeWith: anEncoding).
^ rawValue ifNotNil: [ rawValue decodeWith: anEncoding ]
]

{ #category : #private }
OSEnvironment >> getEnvRaw: encodedVariableName [

"This method calls the Standard C Library getenv() function."
"OSEnvironment current getEnv: 'HOME' "
"Gets the value of an environment variable called `anEncodedVariableName` already encoded.
It is the user's responsibility to encode and decode the argument and return values in the encoding of their preference.
Rationale: Some systems may want to have the liberty to use different encodings, or even to put binary data in the variables."

<primitive: 'primitiveGetenv' module: '' error: ec>
ec ifNil: [ ^self getEnvViaFFI: varName ].
ec == #'bad argument' ifTrue: [
varName isString
ifFalse: [ ^self getEnv: varName asString ] ].
self primitiveFail
"This method calls the primitiveGetenv primitive and falls back into FFI if not available."

"OSEnvironment current getEnvRaw: 'HOME' utf8Encoded"

| rawValue |
rawValue := self basicGetEnvRaw: encodedVariableName asString.
^ rawValue ifNotNil: [ rawValue asByteArray ].
]

{ #category : #private }
OSEnvironment >> getEnvViaFFI: arg1 [
"This method calls the Standard C Library getenv() function. The name of the argument (arg1) should fit decompiled version."
^ self ffiCall: #( String getenv (String arg1) ) module: LibC
OSEnvironment >> getEnvViaFFI: encodedString [

"The FFI call works on Strings, while the correct way to manage encodings is with raw data.
Transform back and forth from byte arrays to strings and vice versa to maintain the correct behaviour"
^ (self basicGetEnvViaFFI: encodedString asString) asByteArray
]

{ #category : #testing }
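
The three access levels described in the OSEnvironment class comment could be exercised roughly as follows. This is a minimal sketch, not part of the commit: it only uses selectors that appear in the diff above, and 'HOME' is merely an example variable name that may be absent on some systems (in which case the getters answer nil).

```smalltalk
"Usage sketch of the three access levels; 'HOME' is an illustrative name only."
| env decoded raw rawDecoded explicit |
env := Smalltalk os environment.

"1. Decoded access: the system chooses the encoding
    (defaultEncoding answers UTF-8 in this commit)."
decoded := env getEnv: 'HOME'.

"2. Raw access: the caller encodes the name and decodes the
    returned ByteArray, which may also carry binary data."
raw := env getEnvRaw: 'HOME' utf8Encoded.
rawDecoded := raw ifNotNil: [ raw utf8Decoded ].

"3. Explicit encoding applied to both the name and the value."
explicit := env getEnv: 'HOME' encoding: ZnCharacterEncoder utf8.
```

On a system where the variable is stored as UTF-8, all three routes would be expected to yield the same string; the raw variant is the one to reach for when the value is not text at all.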
@@ -19,60 +19,25 @@ Win32Environment >> environmentStrings [

{ #category : #accessing }
Win32Environment >> getEnv: aVariableName [

<todo>
"The primitive on Windows is currently broken (2017-08-05) and instead of failing it can return nil.
"The primitive on Windows currently uses the ascii version of the Windows API.
In such case try to get the value of the environment variable using FFI."

| result |

result := super getEnv: aVariableName.
^ result ifNil: [self getEnvViaFFI: aVariableName ]
^ self getEnvViaFFI: aVariableName
]

{ #category : #private }
Win32Environment >> getEnv: arg1 buffer: arg2 size: arg3 [
"If the function succeeds, the return value is the number of characters stored in the buffer pointed to by aBuffer, not including the terminating null character.
If aBuffer is not large enough to hold the data, the return value is the buffer size, in characters, required to hold the string and its terminating null character and the contents of aBuffer are undefined.
If the function fails, the return value is zero. If the specified environment variable was not found in the environment block, GetLastError returns ERROR_ENVVAR_NOT_FOUND.
Important note: arguments of this method are named like the decompiler would use it so it could be used
in the startup process for the case no source file is found.
arg1 : a name as string representing the environment variable
arg2 : the buffer
arg3 : an integer with the size of the buffer
"
^ self ffiCall: #( int GetEnvironmentVariableA ( String arg1, char *arg2, int arg3 ) ) module: #Kernel32
]

{ #category : #private }
Win32Environment >> getEnvSize: arg1 [
"
Return the buffer size of the given environment variable.
Important note: arguments of this method are named like the decompiler would use
it so it could be used in the startup process for the case no source file is
found.
Win32Environment >> getEnvViaFFI: aVariableName [
| name buffer return |

arg1 : a name as string representing the environment variable
name := aVariableName asWin32WideString.
buffer := Win32WideString new: 500.
return := OSPlatform current getEnvironmentVariable: name into: buffer size: 500.

"
^ self ffiCall: #( int GetEnvironmentVariableA ( String arg1, nil, 0 ) ) module: #Kernel32
]

{ #category : #private }
Win32Environment >> getEnvViaFFI: aVariableName [
| valueSize buffer |
valueSize := self getEnvSize: aVariableName.
valueSize = 0
ifTrue: [ ^ nil ].
buffer := ByteArray new: valueSize.
(self getEnv: aVariableName buffer: buffer size: valueSize) = (valueSize - 1)
ifFalse: [ ^ nil ].
^ buffer allButLast asString
return = 0
ifTrue: [ self error: 'Error while getting environment variable ', aVariableName ].
return > 500
ifTrue: [ self error: 'Not enough buffer space' ].

^ buffer asString
]

{ #category : #enumeration }
@@ -0,0 +1,108 @@
"
I represent a Win32 wide string, supporting non-ascii characters.
I manage the conversion between Pharo strings and Windows strings.
(Win32WideString fromString: 'âùö') asString = 'âùö'
"
Class {
#name : #Win32WideString,
#superclass : #FFIExternalObject,
#category : #'System-Platforms-Windows'
}

{ #category : #'instance creation' }
Win32WideString class >> fromByteArray: byteArray [
^ self new
handle: byteArray;
yourself
]

{ #category : #'instance creation' }
Win32WideString class >> fromHandle: handle [
^ self new
handle: handle;
yourself
]

{ #category : #'instance creation' }
Win32WideString class >> fromString: aString [
| r wideString anUTF8String codepage |
wideString := self new: aString size.
anUTF8String := aString utf8Encoded asString.
codepage := 65001.

r := OSPlatform current
multiByteToWideCharacterCodepage: codepage
flags: 0
input: anUTF8String
inputLen: anUTF8String size + 1
output: wideString
outputLen: wideString byteSize.

r = 0 ifTrue: [ self error: 'Error while transforming utf8 string ', aString, ' using codepage ', codepage asString ].

^ wideString
]

{ #category : #'instance creation' }
Win32WideString class >> new: size [
^ self new
handle: (ByteArray new: (size + 1) * 2);
yourself.

]

{ #category : #converting }
Win32WideString >> asString [
| out r codepage |

codepage := 65001.
out := ByteArray new: (self size * 4) + 1.

r := OSPlatform current
wideCharacterToMultiByteCodepage: codepage
flags: 0
input: self
inputLen: self size + 1
output: out
outputLen: out size.

r = 0 ifTrue: [ self error: 'Error while transforming windows wide string using codepage ', codepage asString ].

^ (out first: r - 1) utf8Decoded
]

{ #category : #converting }
Win32WideString >> asWin32WideString [
^ self.
]

{ #category : #accessing }
Win32WideString >> byteSize [
^ self handle isExternalAddress
ifTrue: [ (self size + 1) * 2 ]
ifFalse: [ self handle size ]
]

{ #category : #printing }
Win32WideString >> printOn: aStream [
aStream
nextPutAll: 'a ' ;
nextPutAll: self class name;
nextPut: $(;
print: self asString;
nextPut: $)
]

{ #category : #accessing }
Win32WideString >> size [
| size pos |
size := 0.
pos := 1.

[ (self handle unsignedByteAt: pos) = 0 and: [ (self handle unsignedByteAt: pos + 1) = 0 ] ]
whileFalse: [ size := size + 1.
pos := pos + 2 ].

^ size
]
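
The scan in `Win32WideString >> size` can be tried on a plain ByteArray, without any Windows call. The snippet below is an illustrative sketch: the byte values are an assumed UTF-16LE encoding of 'ab' followed by the two-byte null terminator, and `at:` stands in for `unsignedByteAt:` on the handle.

```smalltalk
"Count 16-bit code units up to the two-byte null terminator.
 The bytes are a hand-written UTF-16LE sample: $a, $b, terminator."
| bytes size pos |
bytes := #[97 0 98 0 0 0].
size := 0.
pos := 1.
[ (bytes at: pos) = 0 and: [ (bytes at: pos + 1) = 0 ] ]
	whileFalse: [
		size := size + 1.
		pos := pos + 2 ].
size "evaluates to 2"
```

The answer is a count of 16-bit units rather than bytes, which is why `byteSize` computes `(self size + 1) * 2` for external addresses to account for the terminator.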
@@ -31,6 +31,18 @@ WinPlatform >> family [
^#Windows
]

{ #category : #'library path' }
WinPlatform >> ffiLibraryName [

^ #Kernel32
]

{ #category : #'environment-variables' }
WinPlatform >> getEnvironmentVariable: lpName into: lpBuffer size: nSize [
"Primitive to obtain an environment variable using windows Wide Strings"
^ self ffiCall: #(ulong GetEnvironmentVariableW(Win32WideString lpName, Win32WideString lpBuffer, ulong nSize))
]

{ #category : #accessing }
WinPlatform >> getPwdViaFFI: buffer size: bufferSize [
"This method calls the Standard C Library getcwd() function. The name of the argument (arg1) should fit decompiled version. This method is used in getting the current working directory. getcwd is preferred over pwd because getcwd takes care of re-initialization of environment variables, whereas pwd needs implicit re-initialization.
@@ -69,6 +81,12 @@ WinPlatform >> menuShortcutString [
^ 'ctrl'
]

{ #category : #'string-manipulation' }
WinPlatform >> multiByteToWideCharacterCodepage: codepage flags: flags input: input inputLen: inputLen output: output outputLen: outputLen [

^self ffiCall: #(int MultiByteToWideChar(uint codepage, ulong flags, String input, int inputLen, Win32WideString output, int outputLen ))
]

{ #category : #accessing }
WinPlatform >> virtualKey: virtualKeyCode [
"Win32Platform virtualKey: $C charCode"
@@ -80,3 +98,16 @@ WinPlatform >> virtualKey: virtualKeyCode [
^(#($a nil $c $d nil $f $g nil nil nil nil $l $m $n nil $p nil nil $s nil nil $v nil $x nil $z)
at: virtualKeyCode-64) ifNotNil: [:char | char charCode]
]

{ #category : #'string-manipulation' }
WinPlatform >> wideCharacterToMultiByteCodepage: codepage flags: flags input: input inputLen: inputLen output: output outputLen: outputLen [
^self ffiCall: #(int WideCharToMultiByte(uint codepage,
ulong flags,
Win32WideString input,
int inputLen,
String output,
int outputLen,
0,
0
))
]
