
Win32 shlex #1064

Merged
merged 7 commits into from over 2 years ago

3 participants

Jörgen Stenarson Thomas Kluyver Fernando Perez
Jörgen Stenarson
Collaborator

Suggested more complete fix for issue #592: use ctypes to call a Windows API function that does shlex-like splitting.

Had to comment out the unicode strings in test_arg_split to get the tests to run (see #1063).
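For context, a minimal sketch of what "shlex-like splitting" means here, using only the standard library. It uses `shlex.split` directly rather than the project's own `arg_split` helper, so treat it as an illustration of the two quoting behaviours, not as the code in this PR.

```python
import shlex

line = 'something "with quotes"'

# Non-POSIX mode keeps the quote characters inside the token.
non_posix = shlex.split(line, posix=False)

# POSIX mode strips the quotes and keeps the quoted text as one argument.
posix = shlex.split(line, posix=True)

print(non_posix)
print(posix)
```

The two results differ only in whether the quotes survive, which is the same difference the POSIX and win32 tests in this PR expect.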

Thomas Kluyver
Collaborator

Note that we've got modules in utils named _process_win32 and _process_posix. That's probably the place to put platform-specific logic like this.

IPython/utils/tests/test_process.py
((5 lines not shown))
 def test_arg_split():
     """Ensure that argument lines are correctly split like in a shell."""
     tests = [['hi', ['hi']],
              [u'hi', [u'hi']],
              ['hello there', ['hello', 'there']],
-             [u'h\N{LATIN SMALL LETTER A WITH CARON}llo', [u'h\N{LATIN SMALL LETTER A WITH CARON}llo']],
+#             [u'h\N{LATIN SMALL LETTER A WITH CARON}llo', [u'h\N{LATIN SMALL LETTER A WITH CARON}llo']],
Thomas Kluyver Collaborator

I'd rather not comment out this test. I don't think it's essential to use a name escape: it should work with a \u escape.
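A quick way to check that the two escapes are interchangeable, written for Python 3 (where the `u` prefix is optional). The variable names are only illustrative.

```python
import unicodedata

# The same string written with a name escape and with a \u escape.
by_name = "h\N{LATIN SMALL LETTER A WITH CARON}llo"
by_code_point = "h\u01cello"

same = by_name == by_code_point
char_name = unicodedata.name(by_code_point[1])

print(same, char_name)
```

Both literals produce the identical string, so swapping the escape does not change what the test exercises.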

Jörgen Stenarson Collaborator
jstenar added a note November 28, 2011
Thomas Kluyver
Collaborator

Do we need a pure-python fallback? Is there any situation in which all this ctypes stuff could fail?

Fernando Perez
Owner

@takluyver, you're our unicode guru; is this good to go or does it need any further work?

Thomas Kluyver
Collaborator

Well, it doesn't really change anything on the posix side (I'm assuming Jörgen just copied and pasted the arg_split function for the posix version, I haven't checked it character-for-character). The windows ctypes stuff doesn't mean much to me.

@jstenar: I'd just like to clarify: is there any version of Windows, any locale, or other setting, under which it could fail? I know that ctypes code can cause segfaults if it goes wrong, and I can't test it here. Also, is there any way to access this functionality through a library like pywin32 - even if we have this as a fallback when that's not installed? I'm a bit wary of relying on ctypes code.

Jörgen Stenarson
Collaborator

As far as I know, CommandLineToArgvW does not depend on the locale. It is, however, only available from Windows 2000 onward. I could add a try/except guard to catch the case where it is missing and fall back to the old posix implementation (by copying it, because I can't import _process_posix on Windows).

I could not find this function in pywin32.
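For readers unfamiliar with the ctypes side, here is a hedged sketch of how such a call can be declared in Python 3. It is not the code from this PR: the function name `win32_arg_split` is hypothetical, and the sketch relies on the fact that ctypes reads a foreign function's prototype from the attributes `argtypes` and `restype`. It has not been exercised on every Windows version.

```python
import ctypes
import sys


def win32_arg_split(commandline):
    """Split a command line with CommandLineToArgvW (Windows only)."""
    if sys.platform != "win32":
        raise OSError("CommandLineToArgvW is only available on Windows")
    from ctypes import wintypes

    # Load private DLL handles so the prototypes set here stay local.
    shell32 = ctypes.WinDLL("shell32")
    kernel32 = ctypes.WinDLL("kernel32")

    to_argv = shell32.CommandLineToArgvW
    to_argv.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int)]
    to_argv.restype = ctypes.POINTER(wintypes.LPWSTR)

    local_free = kernel32.LocalFree
    local_free.argtypes = [wintypes.HLOCAL]
    local_free.restype = wintypes.HLOCAL

    # An empty string makes the API return the path of the executable.
    if not commandline.strip():
        return []

    argc = ctypes.c_int(0)
    argv = to_argv(commandline.lstrip(), ctypes.byref(argc))
    if not argv:
        raise ctypes.WinError()
    try:
        # Copy the strings out before the buffer is released.
        return [argv[i] for i in range(argc.value)]
    finally:
        local_free(argv)
```

Declaring the return type as a pointer matters on 64-bit Windows, where the default integer return type would be too narrow to hold the address.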

Thomas Kluyver
Collaborator
Fernando Perez
Owner

We're definitely not worrying about Windows 98! XP and newer is more than enough of a cutoff, I think.

Jörgen Stenarson
Collaborator

I moved the posix version of arg_split to _process_common and use that as a fallback in _process_win32 if CommandLineToArgvW is not available. That way we will have a fallback and not just crash.
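The fallback idea can be sketched in isolation as follows. The helper names `lookup_win32_function` and `shlex_arg_split` are hypothetical and only mirror the approach described above: probe for the Windows function, and use a pure-Python splitter when the probe fails.

```python
import ctypes
import shlex


def shlex_arg_split(s, posix=False):
    """Pure-Python splitter built on shlex, used when no native call exists."""
    lex = shlex.shlex(s, posix=posix)
    lex.whitespace_split = True
    return list(lex)


def lookup_win32_function(dll_name, func_name):
    """Return the named Windows API function, or None if it cannot be found."""
    try:
        # ctypes.windll does not exist off Windows, and a missing export
        # also surfaces as AttributeError, so one handler covers both.
        return getattr(getattr(ctypes.windll, dll_name), func_name)
    except AttributeError:
        return None


native = lookup_win32_function("shell32", "CommandLineToArgvW")
backend = "CommandLineToArgvW" if native is not None else "shlex"

print(backend)
print(shlex_arg_split('hello "big world"'))
```

Catching only `AttributeError` keeps the guard narrow: other failures still surface instead of being silently replaced by the fallback.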

Fernando Perez
Owner

Great, thanks! @takluyver, this is looking pretty baked out then, right? If both you and @jstenar are happy with it, merge away! @jstenar, thanks for the work :)

ps - double-check that any issues supposed to be closed do get closed, I've seen github recently flake out and not close issues listed in commits.

Thomas Kluyver takluyver merged commit c72bbc7 into from November 30, 2011
Thomas Kluyver takluyver closed this November 30, 2011
Thomas Kluyver
Collaborator

Great. Tested and merged. Thanks!

25  IPython/utils/_process_common.py
@@ -15,6 +15,7 @@
 # Imports
 #-----------------------------------------------------------------------------
 import subprocess
+import shlex
 import sys
 
 from IPython.utils import py3compat
@@ -143,3 +144,27 @@ def getoutputerror(cmd):
         return '', ''
     out, err = out_err
     return py3compat.bytes_to_str(out), py3compat.bytes_to_str(err)
+
+
+def arg_split(s, posix=False):
+    """Split a command line's arguments in a shell-like manner.
+
+    This is a modified version of the standard library's shlex.split()
+    function, but with a default of posix=False for splitting, so that quotes
+    in inputs are respected."""
+
+    # Unfortunately, python's shlex module is buggy with unicode input:
+    # http://bugs.python.org/issue1170
+    # At least encoding the input when it's unicode seems to help, but there
+    # may be more problems lurking.  Apparently this is fixed in python3.
+    is_unicode = False
+    if (not py3compat.PY3) and isinstance(s, unicode):
+        is_unicode = True
+        s = s.encode('utf-8')
+    lex = shlex.shlex(s, posix=posix)
+    lex.whitespace_split = True
+    tokens = list(lex)
+    if is_unicode:
+        # Convert the tokens back to unicode.
+        tokens = [x.decode('utf-8') for x in tokens]
+    return tokens
5  IPython/utils/_process_posix.py
@@ -23,7 +23,7 @@
 
 # Our own
 from .autoattr import auto_attr
-from ._process_common import getoutput
+from ._process_common import getoutput, arg_split
 from IPython.utils import text
 from IPython.utils import py3compat
 
@@ -192,3 +192,6 @@ def system(self, cmd):
 # programs think they are talking to a tty and produce highly formatted output
 # (ls is a good example) that makes them hard.
 system = ProcessHandler().system
+
+
+
30  IPython/utils/_process_win32.py
@@ -18,11 +18,15 @@
 # stdlib
 import os
 import sys
+import ctypes
 
+from ctypes import c_int, POINTER
+from ctypes.wintypes import LPCWSTR, HLOCAL
 from subprocess import STDOUT
 
 # our own imports
 from ._process_common import read_no_interrupt, process_handler
+from . import py3compat
 from . import text
 
 #-----------------------------------------------------------------------------
@@ -146,3 +150,29 @@ def getoutput(cmd):
     if out is None:
         out = ''
     return out
+
+try:
+    CommandLineToArgvW = ctypes.windll.shell32.CommandLineToArgvW
+    CommandLineToArgvW.arg_types = [LPCWSTR, POINTER(c_int)]
+    CommandLineToArgvW.res_types = [POINTER(LPCWSTR)]
+    LocalFree = ctypes.windll.kernel32.LocalFree
+    LocalFree.res_type = HLOCAL
+    LocalFree.arg_types = [HLOCAL]
+    
+    def arg_split(commandline, posix=False):
+        """Split a command line's arguments in a shell-like manner.
+
+        This is a special version for windows that use a ctypes call to CommandLineToArgvW
+        to do the argv splitting. The posix paramter is ignored.
+        """
+        #CommandLineToArgvW returns path to executable if called with empty string.
+        if commandline.strip() == "":
+            return []
+        argvn = c_int()
+        result_pointer = CommandLineToArgvW(py3compat.cast_unicode(commandline.lstrip()), ctypes.byref(argvn))
+        result_array_type = LPCWSTR * argvn.value
+        result = [arg for arg in result_array_type.from_address(result_pointer)]
+        retval = LocalFree(result_pointer)
+        return result
+except AttributeError:
+    from ._process_common import arg_split
30  IPython/utils/process.py
@@ -22,9 +22,10 @@
 
 # Our own
 if sys.platform == 'win32':
-    from ._process_win32 import _find_cmd, system, getoutput, AvoidUNCPath
+    from ._process_win32 import _find_cmd, system, getoutput, AvoidUNCPath, arg_split
 else:
-    from ._process_posix import _find_cmd, system, getoutput
+    from ._process_posix import _find_cmd, system, getoutput, arg_split
+
 
 from ._process_common import getoutputerror
 from IPython.utils import py3compat
@@ -103,31 +104,6 @@ def pycmd2argv(cmd):
         else:
             return [sys.executable, cmd]
 
-
-def arg_split(s, posix=False):
-    """Split a command line's arguments in a shell-like manner.
-
-    This is a modified version of the standard library's shlex.split()
-    function, but with a default of posix=False for splitting, so that quotes
-    in inputs are respected."""
-
-    # Unfortunately, python's shlex module is buggy with unicode input:
-    # http://bugs.python.org/issue1170
-    # At least encoding the input when it's unicode seems to help, but there
-    # may be more problems lurking.  Apparently this is fixed in python3.
-    is_unicode = False
-    if (not py3compat.PY3) and isinstance(s, unicode):
-        is_unicode = True
-        s = s.encode('utf-8')
-    lex = shlex.shlex(s, posix=posix)
-    lex.whitespace_split = True
-    tokens = list(lex)
-    if is_unicode:
-        # Convert the tokens back to unicode.
-        tokens = [x.decode('utf-8') for x in tokens]
-    return tokens
-
-
 def abbrev_cwd():
     """ Return abbreviated version of cwd, e.g. d:mydir """
     cwd = os.getcwdu().replace('\\','/')
22  IPython/utils/tests/test_process.py
@@ -62,16 +62,32 @@ def test_find_cmd_fail():
     nt.assert_raises(FindCmdError,find_cmd,'asdfasdf')
 
     
+@dec.skip_win32
 def test_arg_split():
     """Ensure that argument lines are correctly split like in a shell."""
     tests = [['hi', ['hi']],
              [u'hi', [u'hi']],
              ['hello there', ['hello', 'there']],
-             [u'h\N{LATIN SMALL LETTER A WITH CARON}llo', [u'h\N{LATIN SMALL LETTER A WITH CARON}llo']],
+             # \u01ce == \N{LATIN SMALL LETTER A WITH CARON}
+             # Do not use \N because the tests crash with syntax error in
+             # some cases, for example windows python2.6.
+             [u'h\u01cello', [u'h\u01cello']],
              ['something "with quotes"', ['something', '"with quotes"']],
              ]
     for argstr, argv in tests:
         nt.assert_equal(arg_split(argstr), argv)
+    
+@dec.skip_if_not_win32
+def test_arg_split_win32():
+    """Ensure that argument lines are correctly split like in a shell."""
+    tests = [['hi', ['hi']],
+             [u'hi', [u'hi']],
+             ['hello there', ['hello', 'there']],
+             [u'h\u01cello', [u'h\u01cello']],
+             ['something "with quotes"', ['something', 'with quotes']],
+             ]
+    for argstr, argv in tests:
+        nt.assert_equal(arg_split(argstr), argv)
 
 
 class SubProcessTestCase(TestCase, tt.TempFileMixin):
@@ -100,6 +116,10 @@ def test_getoutput(self):
     def test_getoutput_quoted(self):
         out = getoutput('python -c "print (1)"')
         self.assertEquals(out.strip(), '1')
+
+    #Invalid quoting on windows
+    @dec.skip_win32
+    def test_getoutput_quoted2(self):
         out = getoutput("python -c 'print (1)'")
         self.assertEquals(out.strip(), '1')
         out = getoutput("python -c 'print (\"1\")'")