Weekly Report VIII (July 16~July 22)

Done this week

The major progress this week was continuing the migration of dpkt to Python 3. There is now an initial version that runs on Python 3. Except for a few specific modules (which need more time to investigate), most modules work properly and pass all the tests on both Python 2 and 3. The following is a checklist for the migration job.

Summary: 66 modules total, 63 modules done, 2 modules pending, 1 module in progress.

Specifically, there are some key points and modifications during the migration.

1. chr and ord built-in functions

Given a string of length one, ord(c) returns an integer representing the Unicode code point of the character when the argument is a unicode object, or the value of the byte when the argument is an 8-bit string. chr(i), in turn, returns a string of one character whose ASCII code is the integer i.

In Python 2, the usage of both functions is straightforward. For example, we have the following code snippet.

l = buf.split(chr(IAC))

However, note that in Python 3 dpkt deals with data of type bytes most of the time. It is therefore incorrect to split buf with the str returned by chr() when buf is of type bytes. To solve this problem, we update the code as follows to support both Python 2 and 3.

if sys.version_info < (3,):
    l = buf.split(chr(IAC))
else:
    l = buf.split(struct.pack("B", IAC))
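As a possible simplification (a sketch, not the change made in dpkt): struct.pack("B", ...) returns a str on Python 2 and bytes on Python 3, so the same call could serve both versions without the version branch. The helper name split_on_byte and the sample buffer are hypothetical; IAC = 255 is the Telnet value, restated here only so the block is self-contained.

```python
import struct

IAC = 255  # Telnet "Interpret As Command" byte, restated for illustration


def split_on_byte(buf, value):
    """Split a byte string on a single byte given as an integer (hypothetical helper)."""
    # struct.pack("B", value) yields str on Python 2 and bytes on Python 3,
    # which matches the type of a raw buffer on either version.
    return buf.split(struct.pack("B", value))


parts = split_on_byte(b"ab\xffcd\xffef", IAC)
```

Whether dropping the branch is preferable depends on how readable the team finds struct.pack at the call sites.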

Similarly, for the ord function, we could have the following snippet,

o = ord(w[0])

where w is a string and o is an integer. In Python 3, however, indexing a bytes object already yields an integer, so calling ord is no longer needed.

For extensibility, we added a compatible module to the project to hold functions that behave the same on Python 2 and 3. Currently there is only one function, namely compatible_ord. Please see the implementation below.

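import sys

if sys.version_info < (3,):
    def compatible_ord(char):
        # Python 2: indexing a str yields a one-character str.
        return ord(char)
else:
    def compatible_ord(char):
        # Python 3: indexing bytes already yields an int.
        return char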

With the compatible module, contributors only need to modify the client code as follows.

o = compatible.compatible_ord(w[0])
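One caveat worth noting: on Python 3, slicing (w[0:1]) still returns a bytes object, while indexing (w[0]) returns an int. The following is a sketch of a more defensive variant that accepts either form; it is not the implementation used in the project, and the name safe_ord is hypothetical.

```python
def safe_ord(char):
    """Return the integer value of a single byte (hypothetical variant)."""
    # Python 3 indexing of bytes gives an int, which is returned as-is.
    if isinstance(char, int):
        return char
    # Python 2 str elements and one-byte slices on either version go through ord().
    return ord(char)


w = b"\x01\x02"
from_index = safe_ord(w[0])    # int on Python 3, one-character str on Python 2
from_slice = safe_ord(w[0:1])  # one-byte string on both versions
```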

2. StringIO and BytesIO

From What’s New In Python 3.0:

The StringIO and cStringIO modules are gone. Instead, import the io module and use io.StringIO or io.BytesIO for text and data respectively.

Thus in the project, every time we use StringIO or cStringIO, we need to update the code to make it compatible with both Python 2 and Python 3. See the sample code below.

try:
    import StringIO
    fobj = StringIO.StringIO(data)
except ImportError:
    import io
    fobj = io.BytesIO(data)
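Since the io module is also available from Python 2.6, a single import may be enough when the data is binary. This is a sketch of that alternative, assuming Python 2.6 or later is the minimum supported version; the sample data is made up for illustration.

```python
import io

data = b"\x00\x01\x02\x03"  # sample bytes, for illustration only

# io.BytesIO accepts a byte string on both Python 2.6+ and Python 3.
fobj = io.BytesIO(data)
header = fobj.read(2)  # first two bytes
rest = fobj.read()     # remaining bytes
```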

3. next method of the iterator

In Python 2 iterators have a .next() method you use to get the next value from the iterator. For instance,

>>> i = iter(range(5))
>>> i.next()
0
>>> i.next()
1

In Python 3 this special method has been renamed to .__next__() to be consistent with the naming of special attributes elsewhere in Python. However, we should generally not call it directly, but instead use the built-in next() function. This function is also available from Python 2.6. Here is an example.

for _ in range(cnt):
    try:
        ts, pkt = next(iter(self))
    except StopIteration:
        # The iterator is exhausted, so stop reading.
        break
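For classes that implement the iterator protocol themselves, one common pattern is to define __next__ and alias it to next so that the built-in next() works on both versions. The Countdown class below is a hypothetical example, not code from dpkt.

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    # Python 2 looks up .next(), so alias it to the Python 3 name.
    next = __next__


it = Countdown(3)
first = next(it)      # built-in next() works on Python 2.6+ and Python 3
remaining = list(it)  # consumes the rest of the iterator
```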

4. Format string

Sometimes we use format strings in the project. Below is an example.

s = '%s\x00%s\x00' % (self.filename, self.mode)

However, recall that in Python 3 we deal with bytes rather than str. A str format string would embed the repr of each bytes value (for example b'...') instead of the raw bytes, so it does not work properly in Python 3. The above code needs to be updated as follows.

if sys.version_info < (3,):
    s = '%s\x00%s\x00' % (self.filename, self.mode)
else:
    s = self.filename + b'\x00' + self.mode + b'\x00'
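Byte concatenation behaves the same on Python 2 and 3, so the concatenation form could arguably be used on both without the version check. Here is a minimal sketch, assuming the fields are already byte strings; the helper name pack_cstrings and the sample values are hypothetical.

```python
def pack_cstrings(*fields):
    """Join byte strings, terminating each one with a NUL byte (hypothetical helper)."""
    return b"".join(field + b"\x00" for field in fields)


s = pack_cstrings(b"boot.img", b"octet")
```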

Issues that need discussion

Dictionary methods

In Python 2 dictionaries have the methods iterkeys(), itervalues() and iteritems() that return iterators instead of lists. In Python 3 the standard keys(), values() and items() return dictionary views, which are iterable without building a list, so the iterator variants become pointless and are removed.

In our project, we can use try / except to support both Python 2 and 3.

try:
    values = d.itervalues()
except AttributeError:
    values = d.values()
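If this pattern shows up in many places, it could be wrapped in a small helper, for instance in the compatible module. The sketch below is one possible shape; the function name itervalues is hypothetical and not part of the project yet.

```python
def itervalues(d):
    """Return an iterator over the values of d on Python 2 and 3 (hypothetical helper)."""
    try:
        # Python 2: avoids building an intermediate list.
        return d.itervalues()
    except AttributeError:
        # Python 3: values() returns a view, so wrap it in an iterator.
        return iter(d.values())


total = sum(itervalues({"a": 1, "b": 2, "c": 3}))
```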

Test related

Currently the build always fails due to our solution to the metaclass problem. See the following snapshot.

How should we deal with this problem?
Should we configure travis-ci to add a Python 3 build?
Should we add some test cases to increase the test coverage?

Metaclasses

Should we update the http.py and sip.py modules in the same way we solved dpkt.py?

Also there is an alternative solution on https://wiki.python.org/moin/PortingToPy3k/BilingualQuickRef#metaclasses.

Syntax for creating instances with different metaclasses is very different between Python 2 and 3. Use the ability to call type instances as a way to portably create such instances. For example (from the flufl.enum package):

# Define the Enum class using metaclass syntax compatible with both Python 2
# and Python 3.
Enum = EnumMetaclass(str('Enum'), (), {
    '__doc__': 'The public API Enum class.',
})

Here EnumMetaclass is the metaclass (duh!) and Enum is the class you're creating which has the custom metaclass. You pass in the base classes (of which there are none, hence the empty tuple) and the dictionary of attributes for the class you're creating.
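To make the technique concrete, here is a small self-contained sketch of creating a class by calling its metaclass, which runs unchanged on Python 2 and 3. RegistryMeta, Base and Child are hypothetical names chosen for illustration and are unrelated to the metaclasses in dpkt.

```python
class RegistryMeta(type):
    """Hypothetical metaclass that records every class it creates."""

    registry = {}

    def __new__(mcs, name, bases, attrs):
        cls = super(RegistryMeta, mcs).__new__(mcs, name, bases, attrs)
        mcs.registry[name] = cls
        return cls


# Call the metaclass directly instead of using version-specific syntax.
Base = RegistryMeta(str("Base"), (object,), {
    "__doc__": "Base class created portably through its metaclass.",
})


class Child(Base):
    """Subclasses inherit the metaclass from Base."""
```

Subclasses are then written with the ordinary class statement, so only the base class needs the unusual construction.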

General Plan

Plan for next week

As discussed with Kiran, this week and next week are two "critical" weeks. Hopefully we'll have an initial working dpkt package on Python 3 by the end of next week. There are still some tasks that need to be finished during the next week.

Finish the pending module migration (see the table at the very beginning of the report for more details).
Set up tox on the local machine and make sure the package passes all the test cases on Python 2 and 3.
Maybe update the travis-ci automatic build check of the project.
Address the problems and comments raised by Kiran and other contributors.
Finish writing the migration notes.
Update the code based on the review.
