update parallel code for py3k #552

Merged
merged 6 commits into from Jul 3, 2011

Projects

None yet

3 participants

@minrk
Member
minrk commented Jul 3, 2011

This is primarily tweaks of bytes/unicode, but other fixes include:

  • some integer division
  • added co_kwonlyargcount to code objects
  • a few places to handle map/range being objects

non-copying numpy is disabled on py3k, because arrays are not reconstructed properly on the other side (see #478). They are just pickled instead for now.

With these changes, almost all parallel tests pass on my Python 3.2. Notable exceptions are: sync_imports, and MUX engine death for some reason. But in general, everything behaves as expected.

I made the changes in my branch of ipython-py3k here, and backported the changes to master.

@minrk minrk update parallel code for py3k
This is primarily tweaks of bytes/unicode, but other fixes include:

* some integer division
* added co_kwonlyargcount to code objects
* a few places to handle map/range being objects

non-copying numpy is disabled on py3k, because arrays are not reconstructed properly on the other side. This is because pyzmq
always receives a byte array (memoryview.itemsize=1), which confuses numpy.

With these changes, almost all parallel tests pass.  Notable exceptions are: sync_imports, and MUX engine death.
a82262e
@takluyver takluyver and 1 other commented on an outdated diff Jul 3, 2011
IPython/parallel/util.py
@@ -101,6 +101,12 @@ class ReverseDict(dict):
# Functions
#-----------------------------------------------------------------------------
+def ensure_bytes(s):
+ """ensure that an object is bytes"""
+ if isinstance(s, unicode):
+ s = s.encode(sys.getdefaultencoding(), 'replace')
@takluyver
takluyver Jul 3, 2011 Member

So on Python 2, this function should never be called with code points > 128?

@minrk
minrk Jul 3, 2011 Member

That's true. It's only being used on UUIDs, which will be ascii when all is going well, but if they are from miscreant ZMQ sockets, they will not be ascii, and we don't want invalid connections to raise errors in the Hub.

@takluyver
takluyver Jul 3, 2011 Member

Do we know where the error will show up if the UUIDs do contain bytes above 128? Will it be obvious what the problem is?

@takluyver takluyver commented on an outdated diff Jul 3, 2011
IPython/parallel/client/client.py
@@ -48,6 +49,11 @@ from .asyncresult import AsyncResult, AsyncHubResult
from IPython.core.profiledir import ProfileDir, ProfileDirError
from .view import DirectView, LoadBalancedView
+if sys.version_info[0] >= 3:
+ # xrange is used in a coupe 'isinstance' tests in py2
@takluyver
takluyver Jul 3, 2011 Member

couple

(I've spent too long proofreading stuff - I find it hard to refrain from picking up on typos...)

@takluyver takluyver and 1 other commented on an outdated diff Jul 3, 2011
IPython/parallel/controller/heartmonitor.py
if heart_id is None:
- heart_id = str(uuid.uuid4())
+ heart_id = ensure_bytes(uuid.uuid4())
@takluyver
takluyver Jul 3, 2011 Member

uuid.uuid4() returns an uuid.UUID object, which is not a subclass of any string type. I think we need to do ensure_bytes(str(uuid.uuid4())).

@minrk
minrk Jul 3, 2011 Member

bytes(uuid) and str(uuid) return different values ( str(uuid) returns b64-encoded value ). I'll make this change, though.

@takluyver
takluyver Jul 3, 2011 Member

bytes(uuid) is a TypeError on Python 3. I assume we're using the characters of str(uuid) on both platforms, which looks like it's hex. The alternative is the uuid.bytes value, which is bytes on both platforms, but less readable.

@minrk
minrk Jul 3, 2011 Member

Ah, I didn't realize that python3 did the wrong thing. I'll do the safe thing, then.

On Jul 3, 2011, at 10:46, takluyverreply@reply.github.com wrote:

    if heart_id is None:
  •        heart_id = str(uuid.uuid4())
    
  •        heart_id = ensure_bytes(uuid.uuid4())
    

bytes(uuid) is a TypeError on Python 3. I assume we're using the characters of str(uuid) on both platforms, which looks like it's hex. The alternative is the uuid.bytes value, which is bytes on both platforms, but less readable.

Reply to this email directly or view it on GitHub:
https://github.com/ipython/ipython/pull/552/files#r56872

@takluyver takluyver and 1 other commented on an outdated diff Jul 3, 2011
IPython/parallel/controller/hub.py
@@ -562,8 +562,9 @@ class Hub(SessionFactory):
return
record = init_record(msg)
msg_id = record['msg_id']
- record['engine_uuid'] = queue_id
- record['client_uuid'] = client_id
+ # Unicode in records
+ record['engine_uuid'] = queue_id.decode('utf8', 'replace')
+ record['client_uuid'] = client_id.decode('utf8', 'replace')
@takluyver
takluyver Jul 3, 2011 Member

Can these uuids ever contain non-ascii characters?

@minrk
minrk Jul 3, 2011 Member

yes, if they are not generated UUIDs.

@takluyver
takluyver Jul 3, 2011 Member

Will it still work correctly in that case if we're replacing characters we can't decode with utf-8? Should we use an encoding where any sequence of bytes is valid, like cp1252? Or can it only ever be valid utf-8?

@minrk
minrk Jul 3, 2011 Member

These are only unicode objects that we have encoded and passed over the wire. I don't think JSON can support cp1252. That said, I'm enforcing that this be ascii for now, and I'll address unicode cases later when I have more time to think about it.

@takluyver takluyver commented on the diff Jul 3, 2011
IPython/parallel/controller/hub.py
@@ -1179,7 +1182,7 @@ class Hub(SessionFactory):
'io' : io_dict,
}
if rec['result_buffers']:
- buffers = map(str, rec['result_buffers'])
+ buffers = map(bytes, rec['result_buffers'])
@takluyver
takluyver Jul 3, 2011 Member

What happens to these? Code reading suggests that the list gets sent over ZMQ by the get_results method below. Is this not packing them into JSON, which therefore requires str rather than bytes?

@minrk
minrk Jul 3, 2011 Member

They do not get packed into JSON, they are sent raw without copying over the network.

@takluyver
takluyver Jul 3, 2011 Member

Just tested this - on Python 3, bytes(x) is a TypeError if x is a string (i.e. unicode), and on Python 2, bytes(x)/str(x) is a UnicodeEncodeError if x is unicode with any non-ascii codepoint. So this map will only work if the buffers are already in bytes. Should it be ensure_bytes instead?

@minrk
minrk Jul 3, 2011 Member

No.

These are buffer/memoryview objects, so bytes is the right thing.

-MinRK

On Jul 3, 2011, at 10:57, takluyverreply@reply.github.com wrote:

@@ -1179,7 +1182,7 @@ class Hub(SessionFactory):
'io' : io_dict,
}
if rec['result_buffers']:

  •        buffers = map(str, rec['result_buffers'])
    
  •        buffers = map(bytes, rec['result_buffers'])
    

Just tested this - on Python 3, bytes(x) is a TypeError if x is a string (i.e. unicode), and on Python 2, bytes(x)/str(x) is a UnicodeEncodeError if x is unicode with any non-ascii codepoint. So this map will only work if the buffers are already in bytes. Should it be ensure_bytes instead?

Reply to this email directly or view it on GitHub:
https://github.com/ipython/ipython/pull/552/files#r56875

@takluyver
takluyver Jul 3, 2011 Member

OK, that makes sense, then.

@takluyver takluyver commented on the diff Jul 3, 2011
IPython/parallel/controller/scheduler.py
@@ -204,7 +208,7 @@ class TaskScheduler(SessionFactory):
try:
idents,msg = self.session.feed_identities(msg)
except ValueError:
- self.log.warn("task::Invalid Message: %r"%msg)
+ self.log.warn("task::Invalid Message: %r",msg)
@takluyver
takluyver Jul 3, 2011 Member

I assume something downstream now handles substitution, since I see you've made this change below as well.

@minrk
minrk Jul 3, 2011 Member

I learned that we should actually always do this, because it means string formatting does not happen unless the message is actually to be printed. It is particularly valuable for large debug messages.

@fperez
fperez Jul 3, 2011 Member

I didn't know that, neat trick. We should be on the lookout for this pattern in more places.

@takluyver takluyver and 1 other commented on an outdated diff Jul 3, 2011
IPython/utils/codeutil.py
@@ -23,6 +23,7 @@ __docformat__ = "restructuredtext en"
# Imports
#-------------------------------------------------------------------------------
+import sys
import new, types, copy_reg
def code_ctor(*args):
@takluyver
takluyver Jul 3, 2011 Member

While we're on this, can code_ctor use types.CodeType rather than new.code? new is completely gone in Python 3, and 2to3 doesn't change it automatically.

@minrk
minrk Jul 3, 2011 Member

sure thing.

@takluyver
Member

Thanks for doing this, Min. Apart from those couple of things (the point about uuid4 is the only one that I'm confident needs to be changed), that all looks sensible. It passes the test suite on Python 2 on my machine. I get 8 errors when it's ported to Python 3, which is substantially better than before.

We should also update the what's new section of the docs, where I'd said that the parallel machinery wasn't working in Python 3 for the 0.11 release. We probably want to say that there are some known issues, but users can start trying it out.

@fperez fperez commented on an outdated diff Jul 3, 2011
IPython/parallel/util.py
@@ -101,6 +101,12 @@ class ReverseDict(dict):
# Functions
#-----------------------------------------------------------------------------
+def ensure_bytes(s):
@fperez
fperez Jul 3, 2011 Member

Suggestion: call it asbytes instead? I'm thinking of following the patter in numpy, where asarray is what we use everywhere to ensure that if something comes in as a list/tuple we get an array out, but if it's an array it stays unmodified. Just a suggestion, not a major sticking point.

@fperez
Member
fperez commented Jul 3, 2011

Note, I just pulled this PR and ran the test suite on ubuntu 10.10, python 2.6, and got two failures:

======================================================================
ERROR: test_map (IPython.parallel.tests.test_lbview.TestLoadBalancedView)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/tests/test_lbview.py", line 56, in test_map
    r = self.view.map_sync(f, data)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 342, in map_sync
    return self.map(f,*sequences,**kwargs)
  File "", line 2, in map
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 68, in spin_after
    ret = f(self, *args, **kwargs)
  File "", line 2, in map
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 46, in save_ids
    ret = f(self, *args, **kwargs)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 1046, in map
    return pf.map(*sequences)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/remotefunction.py", line 206, in map
    ret = self.__call__(*sequences)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/remotefunction.py", line 174, in __call__
    if sys.version_info[0] >= 3:
NameError: global name 'sys' is not defined

======================================================================
ERROR: test_map (IPython.parallel.tests.test_view.TestView)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/tests/test_view.py", line 238, in test_map
    r = view.map_sync(f, data)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 342, in map_sync
    return self.map(f,*sequences,**kwargs)
  File "", line 2, in map
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 68, in spin_after
    ret = f(self, *args, **kwargs)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/view.py", line 577, in map
    return pf.map(*sequences)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/remotefunction.py", line 206, in map
    ret = self.__call__(*sequences)
  File "/home/fperez/usr/lib/python2.6/site-packages/IPython/parallel/client/remotefunction.py", line 174, in __call__
    if sys.version_info[0] >= 3:
NameError: global name 'sys' is not defined

----------------------------------------------------------------------
Ran 107 tests in 62.799s

FAILED (SKIP=10, errors=2)

The fix is trivial, just adding import sys at the top of remotefunction.py makes the test suite pass.

@fperez fperez commented on an outdated diff Jul 3, 2011
IPython/parallel/controller/sqlitedb.py
@@ -54,7 +60,11 @@ def _convert_dict(ds):
if ds is None:
return ds
else:
- return extract_dates(json.loads(ds))
+ if isinstance(ds, bytes):
+ # If I understand the sqlite doc correctly, this will always be utf8
+ ds = ds.decode('utf8')
+ d = json.loads(ds)
+ return extract_dates(d)
@fperez
fperez Jul 3, 2011 Member

No need to make the extra d variable here, a single extract_dates(json.loads(ds)) is just as clear and saves both code and a variable allocation.

@fperez fperez commented on an outdated diff Jul 3, 2011
IPython/parallel/tests/clienttest.py
@@ -41,9 +41,11 @@ def crash():
if sys.platform.startswith('win'):
import ctypes
ctypes.windll.kernel32.SetErrorMode(0x0002);
-
- co = types.CodeType(0, 0, 0, 0, b'\x04\x71\x00\x00',
- (), (), (), '', '', 1, b'')
+ args = [ 0, 0, 0, 0, b'\x04\x71\x00\x00', (), (), (), '', '', 1, b'']
+ if sys.version_info[0] >= 3:
+ args.insert(1, 0)
@fperez
fperez Jul 3, 2011 Member

I'm sure this is correct, but it's a bit opaque. A comment indicating why this extra argument is necessary would be useful (changes in the bytecode structure in py3?).

@fperez
Member
fperez commented Jul 3, 2011

Other than my (small) inline comments, this looks pretty good to me, and I'd be happy to see it in 0.11 so that Thomas could also perhaps cut out a 3k version of IPython, either with 0.11 or soon thereafter.

@minrk
Member
minrk commented Jul 3, 2011

One of the things I have to address is whether identities can be non-ASCII. The default behavior is that they always are, but they don't technically have to be. If I do enforce that, a few things will be cleaner.

On Jul 3, 2011, at 12:36, fperezreply@reply.github.com wrote:

Other than my (small) inline comments, this looks pretty good to me, and I'd be happy to see it in 0.11 so that Thomas could also perhaps cut out a 3k version of IPython, either with 0.11 or soon thereafter.

Reply to this email directly or view it on GitHub:
#552 (comment)

@fperez
Member
fperez commented Jul 3, 2011

On Sun, Jul 3, 2011 at 12:44 PM, minrk
reply@reply.github.com
wrote:

One of the things I have to address is whether identities can be non-ASCII.  The default behavior is that they always are, but they don't technically have to be.  If I do enforce that, a few things will be cleaner.

What do you see as the problem with making this a requirement? This
is fairly low-level stuff, so I'm OK imposing the ascii restriction at
this level, if it pays off in simplifying certain things (and I say
that as someone whose name doesn't fit in ascii :)

@minrk
Member
minrk commented Jul 3, 2011

Okay, I'll do that for now, and make a more careful pass at a later point to relax the restriction.

On Jul 3, 2011, at 12:47, fperezreply@reply.github.com wrote:

On Sun, Jul 3, 2011 at 12:44 PM, minrk
reply@reply.github.com
wrote:

One of the things I have to address is whether identities can be non-ASCII. The default behavior is that they always are, but they don't technically have to be. If I do enforce that, a few things will be cleaner.

What do you see as the problem with making this a requirement? This
is fairly low-level stuff, so I'm OK imposing the ascii restriction at
this level, if it pays off in simplifying certain things (and I say
that as someone whose name doesn't fit in ascii :)

Reply to this email directly or view it on GitHub:
#552 (comment)

@fperez
Member
fperez commented Jul 3, 2011

On Sun, Jul 3, 2011 at 12:55 PM, minrk
reply@reply.github.com
wrote:

Okay, I'll do that for now, and make a more careful pass at a later point to relax the restriction.

Great, thanks. With this and #504 going in, we're probably ready to cut the RC.

minrk added some commits Jul 3, 2011
@minrk minrk fix sys import in remotefunction 503ae62
@minrk minrk enforce ascii identities in parallel code
This should be temporary

also use uuid.bytes to get a binary uuid in the HeartMonitor
79b3544
@minrk minrk cleanup per review
* remove extraneous 'd' variable in sqlitedb._convert_dict
* don't use removed 'new' module in codeutil
* add note about 3k-added kwonlyargcount in crash test
* fix 'coupe' typo in client.client comment
* rename ensure_bytes asbytes
9830094
@minrk minrk fix handling of unicode in KV loader
CaselessStrEnum had to be updated to typecheck basestring instead of str

tests included
48b0ab3
@takluyver

If we're using .bytes, it can be bytes above 128, and need not be valid utf-8 either. I don't know if this matters for the heart_id, I just wanted to check, because I'd assumed we were using the hex characters from str(uuid.uuid4()).

Owner
minrk replied Jul 3, 2011

heart ids are different, because the heartmonitor is purely low-level zmq, and never gets serialized by bytes-phobic packages, so there's no reason to use horribly inefficient unicode objects.

@minrk minrk merged commit 2e9f113 into ipython:master Jul 3, 2011
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment