Weekly Report VIII (July 16~July 22)
The major progress I've made this week is continuing the migration of dpkt to Python 3. There is now an initial version that runs on Python 3. Except for a few specific modules (which need more time to investigate), most modules work properly and pass all the tests on both Python 2 and 3. The following is a checklist for the migration job.
Summary: 66 modules total, 63 modules done, 2 modules pending, 1 module in progress.
The key points and modifications made during the migration are described below.
1. chr and ord built-in functions
In Python 2, ord(c) takes a string of length one and returns an integer: the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string. chr(i), in turn, returns a string of one character whose ASCII code is the integer i.
In Python 2, using both functions is straightforward. For example, we have the following code snippet.
l = buf.split(chr(IAC))
However, in Python 3, most of the time dpkt deals with data of type bytes. The call above therefore fails when buf is of type bytes, because chr() returns str and bytes.split() does not accept a str separator. To solve this problem, we updated the code as follows to support both Python 2 and 3.
if sys.version_info < (3,):
    l = buf.split(chr(IAC))
else:
    l = buf.split(struct.pack("B", IAC))
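As a minimal sketch of what the Python 3 branch does, the snippet below builds the one-byte separator with struct.pack, which returns a byte string on both Python 2 and 3. The helper name to_byte, the IAC value of 255 (the Telnet "Interpret As Command" byte) and the sample buffer are assumptions for illustration, not code taken from dpkt.

```python
import struct

IAC = 255  # assumed value for illustration (Telnet "Interpret As Command")


def to_byte(i):
    """Return a one-byte byte string for an integer in range(256).

    struct.pack("B", i) yields str on Python 2 and bytes on Python 3,
    so the result always matches the type of a raw packet buffer.
    """
    return struct.pack("B", i)


buf = b"abc\xffdef\xffghi"       # made-up sample buffer
parts = buf.split(to_byte(IAC))  # the same call works on Python 2 and 3
print(parts)
```

Because struct.pack("B", ...) behaves the same way on both versions, the version check could arguably be dropped in favor of this single call; whether that is preferable is a style decision for the project.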
Similarly, for the ord function, we could have the following snippet,
o = ord(w[0])
where w is a string and o is an integer. In Python 3, however, indexing a bytes object already yields an integer, so the call to ord is no longer needed (and ord raises a TypeError when given an integer).
For extensibility, we added a compatible module to the project. It provides functions that behave the same way on Python 2 and 3. Currently there is only one function, namely a wrapper around ord. Please see the implementation below.
if sys.version_info < (3,):
    def compatible_ord(char):
        return ord(char)
else:
    def compatible_ord(char):
        return char
Using the compatible module, a contributor only needs to modify the client code as follows.
o = compatible.compatible_ord(w[0])
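The following self-contained sketch shows the wrapper in action. It repeats the definition from above so that it runs on its own; the sample data in w is made up.

```python
import sys

if sys.version_info < (3,):
    def compatible_ord(char):
        # Python 2: indexing a str gives a one-character str.
        return ord(char)
else:
    def compatible_ord(char):
        # Python 3: indexing bytes already gives an int.
        return char

w = b"\x01\x02"                 # made-up sample data
o = compatible_ord(w[0])        # integer value of the first byte
same = ord(w[0:1])              # slicing keeps a length-one byte string
print(o, same)
```

The last line hints at a possible alternative: slicing with w[0:1] returns a length-one byte string on both versions, and ord accepts that on both, so no wrapper is needed when slicing is acceptable at the call site.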
2. StringIO and BytesIO
From What’s New In Python 3.0:
The StringIO and cStringIO modules are gone. Instead, import the io module and use io.StringIO or io.BytesIO for text and data respectively.
Thus, wherever the project uses StringIO or cStringIO, we need to update the code to make it compatible with both Python 2 and Python 3. See the sample code below.
try:
    import StringIO
    fobj = StringIO.StringIO(data)
except ImportError:
    import io
    fobj = io.BytesIO(data)
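A possible simplification, sketched under the assumption that only Python 2.6 and later need to be supported: the io module also exists there, and io.BytesIO accepts byte strings on both versions, so the fallback import may not be necessary. The sample payload is made up.

```python
import io

data = b"\x01\x02\x03\x04"  # made-up sample payload

fobj = io.BytesIO(data)     # works on Python 2.6+ and on Python 3
header = fobj.read(2)       # read the first two bytes
rest = fobj.read()          # read whatever remains
print(header, rest)
```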
3. next method of the iterator
In Python 2 iterators have a .next() method you use to get the next value from the iterator. For instance,
>>> i = iter(range(5))
>>> i.next()
0
>>> i.next()
1
In Python 3 this special method has been renamed to .__next__() to be consistent with the naming of special attributes elsewhere in Python. However, we should generally not call it directly, but instead use the built-in next() function. This function is also available from Python 2.6. Here is an example.
for _ in range(cnt):
    try:
        ts, pkt = next(iter(self))
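To illustrate the portable pattern, here is a small sketch of an iterator class that works on both versions: it defines __next__ and aliases next to it for Python 2, while callers always go through the built-in next(). The Countdown class is hypothetical and not part of dpkt.

```python
class Countdown(object):
    """Hypothetical iterator yielding n, n-1, ..., 1."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):           # Python 3 iterator protocol
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

    next = __next__               # Python 2 iterator protocol


it = Countdown(3)
first = next(it)                  # built-in next() works on 2.6+ and 3
remaining = list(it)              # consumes the rest of the iterator
done = next(it, None)             # a default avoids StopIteration when exhausted
print(first, remaining, done)
```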
4. Format string
Sometimes we use format strings in the project. Below is an example.
s = '%s\x00%s\x00' % (self.filename, self.mode)
However, recall that in Python 3 we deal with bytes data rather than str. A str format string does not work properly there: formatting bytes values with %s embeds their repr (for example b'...') and produces a str instead of bytes. The above code needs to be updated as follows.
if sys.version_info < (3,):
    s = '%s\x00%s\x00' % (self.filename, self.mode)
else:
    s = self.filename + b'\x00' + self.mode + b'\x00'
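One way to avoid the version branch is sketched below, assuming that filename and mode are already byte strings: concatenating with bytes literals behaves the same on Python 2.6+ and on Python 3. The function name pack_request and the sample values are hypothetical.

```python
def pack_request(filename, mode):
    """Return filename and mode, each followed by a NUL byte.

    Both arguments are assumed to be byte strings (str on Python 2,
    bytes on Python 3).
    """
    return filename + b"\x00" + mode + b"\x00"


s = pack_request(b"boot.img", b"octet")  # made-up sample values
print(repr(s))
```

Note that from Python 3.5 on, bytes objects support %-formatting again (PEP 461), so a bytes template such as b'%s\x00%s\x00' would also work there; it does not work on Python 3.0 to 3.4.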
5. Dictionary methods
In Python 2 dictionaries have the methods iterkeys(), itervalues() and iteritems() that return iterators instead of lists. In Python 3 the standard keys(), values() and items() return dictionary views, which are iterable, so the iterator variants become pointless and are removed.
In our project, we can use try / except to support both Python 2 and 3.
try:
    values = d.itervalues()
except AttributeError:
    values = d.values()
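The same idea can be wrapped in a small helper so that call sites stay short. This is only a sketch: the helper name itervalues and the sample dictionary are assumptions for illustration.

```python
def itervalues(d):
    """Return an iterator over the values of d on Python 2 and 3."""
    try:
        return d.itervalues()      # Python 2
    except AttributeError:
        return iter(d.values())    # Python 3: wrap the view in an iterator


d = {"a": 1, "b": 2}               # made-up sample data
total = sum(itervalues(d))
print(total)
```

Calling d.values() directly also works on both versions; the only cost is that Python 2 builds a full list, which matters mainly for large dictionaries.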
Test related
Currently the build always fails because of our solution to the metaclass problem. See the following snapshot.
- How to deal with this problem?
- Should we configure travis-ci to add a Python 3 build?
- Should we add some test cases to increase the test coverage?
Metaclasses
Should we update the http.py and sip.py modules in the same way we solved dpkt.py?
Also there is an alternative solution on https://wiki.python.org/moin/PortingToPy3k/BilingualQuickRef#metaclasses.
Syntax for creating instances with different metaclasses is very different between Python 2 and 3. Use the ability to call type instances as a way to portably create such instances. For example (from the flufl.enum package):
# Define the Enum class using metaclass syntax compatible with both Python 2
# and Python 3.
Enum = EnumMetaclass(str('Enum'), (), {
    '__doc__': 'The public API Enum class.',
    })
Here EnumMetaclass is the metaclass (duh!) and Enum is the class you're creating which has the custom metaclass. You pass in the base classes (of which there are none, hence the empty tuple) and the dictionary of attributes for the class you're creating.
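The same technique can be sketched with a toy metaclass. RegistryMeta, Base and Packet below are hypothetical names chosen for illustration; they are not the classes used in dpkt.

```python
registry = []


class RegistryMeta(type):
    """Toy metaclass that records the name of every class it creates."""

    def __new__(mcs, name, bases, namespace):
        cls = super(RegistryMeta, mcs).__new__(mcs, name, bases, namespace)
        registry.append(name)
        return cls


# Calling the metaclass directly avoids both the Python 2 `__metaclass__`
# attribute and the Python 3 `metaclass=` keyword.
Base = RegistryMeta(str("Base"), (object,), {})


class Packet(Base):
    # Subclasses inherit the metaclass from Base on both versions.
    pass


print(registry)
```

With this arrangement only the single line that creates Base needs the unusual call syntax; every other class is written as an ordinary subclass.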
As discussed with Kiran, this week and next week are two "critical" weeks. Hopefully we'll have an initial working dpkt package on Python 3 by the end of next week. Some tasks still need to be finished during the next week.
- Finish the pending module migration (see the table at the very beginning of the report for more details).
- Set up tox on the local machine and make sure the package passes all the test cases on Python 2 and 3.
- Maybe update the travis-ci automatic build check of the project.
- Solve the problems and address the comments Kiran and other contributors give.
- Finish writing the migration notes.
- Update the code based on the review.