Python runtime version check is slow #804

Closed
@dgelessus

Description

The code generated by ksc to check the version of the Python runtime is quite slow. To measure this, I compiled the spec common/bytes_with_io.ksy, which parses the entire input as a single byte array, so there's practically no overhead from type definitions. The current ksc 0.9 snapshot (ce18dc1, kaitai-io/kaitai_struct_compiler@f725faf) generates the following Python code:

# This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild

from pkg_resources import parse_version
import kaitaistruct
from kaitaistruct import KaitaiStruct, KaitaiStream, BytesIO


if parse_version(kaitaistruct.__version__) < parse_version('0.9'):
    raise Exception("Incompatible Kaitai Struct Python API: 0.9 or later is required, but you have %s" % (kaitaistruct.__version__))

class BytesWithIo(KaitaiStruct):
    """Helper type to work around Kaitai Struct not providing an `_io` member for plain byte arrays.
    """
    def __init__(self, _io, _parent=None, _root=None):
        self._io = _io
        self._parent = _parent
        self._root = _root if _root else self
        self._read()

    def _read(self):
        self.data = self._io.read_bytes_full()

With this code, running python3 -c 'import bytes_with_io' takes about 260 ms (on average, on my machine). For comparison, if I remove the version check and related import:

--- bytes_with_io.py
+++ bytes_with_io.py
@@ -1,13 +1,9 @@
 # This is a generated file! Please edit source .ksy file and use kaitai-struct-compiler to rebuild
 
-from pkg_resources import parse_version
 import kaitaistruct
 from kaitaistruct import KaitaiStruct, KaitaiStream, BytesIO
 
 
-if parse_version(kaitaistruct.__version__) < parse_version('0.9'):
-    raise Exception("Incompatible Kaitai Struct Python API: 0.9 or later is required, but you have %s" % (kaitaistruct.__version__))
-
 class BytesWithIo(KaitaiStruct):
     """Helper type to work around Kaitai Struct not providing an `_io` member for plain byte arrays.
     """

I get times of around 40 ms. Starting Python and running nothing takes 38 ms, which means that the KS code (runtime and generated code) takes only 2 ms, while import pkg_resources and the version check take 220 ms! (Practically all of this time comes from import pkg_resources, not from the actual version comparison.) This is a noticeable delay, especially for small scripts that don't process much data, which often take less than 200 ms to execute without the version check. So I experimented with some possible solutions for reducing the time needed for the version check:

  • Removing the version check entirely. This is obviously not a good solution, but it is very fast. 😛
  • Parsing __version__ using packaging.version.Version. The behavior is identical to pkg_resources.parse_version, because parse_version actually uses packaging.version internally. The disadvantage is that it requires a dependency on the packaging library. (pkg_resources has its own internal copy of packaging, so even if pkg_resources is available, import packaging might not work.) This version check takes about 20 ms (the entire command takes 60 ms). Edit: I just noticed that this isn't actually an option for us. Even if we make the next version of kaitaistruct depend on packaging, older versions won't have that dependency, so packaging might not be installed when using an older runtime, and the check would fail at the import instead of reporting the outdated runtime.
  • Parsing __version__ by hand into a tuple of ints and comparing that (something like tuple(int(part) for part in kaitaistruct.__version__.split(".")) < (0, 9)). This requires no external dependencies, but can only handle simple version strings like 0.9.1. Something like 0.9.1.dev, which is allowed in Python version strings, won't work, because int("dev") raises a ValueError. This version check has no noticeable performance impact (the entire command takes 40 ms).
  • Adding a new kaitaistruct.__version_info__ attribute that stores the version number as a tuple of ints, which can be compared. This is similar to the previous solution, except that it doesn't require any parsing in the generated code, and doesn't need to handle non-numeric version parts (those parts can be left out of __version_info__). The disadvantage is that it requires a change to the kaitaistruct runtime. This is not a big issue though - as long as __version__ stays, older version checks will continue to work, and new version checks need to handle old runtimes that don't have __version_info__ (something like getattr(kaitaistruct, "__version_info__", (0, 8)) < (0, 9)). This version check has no noticeable performance impact (the entire command takes 40 ms).

I personally prefer the last solution, because it's the simplest, has the least runtime overhead, and doesn't require any external dependencies. But if anyone has arguments for the other solutions, or a better idea that I haven't thought of, let me know.
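To make the last two options concrete, here is a minimal sketch of both checks. It does not import the real runtime: old_runtime and new_runtime are hypothetical stand-ins for the kaitaistruct module, and the helper names parse_simple_version and runtime_too_old are made up for illustration. The (0, 8) fallback is the assumption from the list above that any runtime without __version_info__ predates 0.9.

```python
import types

REQUIRED = (0, 9)


def parse_simple_version(version):
    """Hand-parsing option: turn a purely numeric version like "0.9.1" into a tuple of ints.

    int() raises ValueError for non-numeric parts such as "dev".
    """
    return tuple(int(part) for part in version.split("."))


def runtime_too_old(runtime, required=REQUIRED):
    """__version_info__ option: compare tuples directly, with no parsing.

    Runtimes that lack the attribute are assumed to be 0.8 or older.
    """
    return getattr(runtime, "__version_info__", (0, 8)) < required


# Hypothetical stand-ins for the imported kaitaistruct module.
old_runtime = types.SimpleNamespace(__version__="0.8")
new_runtime = types.SimpleNamespace(__version__="0.9", __version_info__=(0, 9))

for runtime in (old_runtime, new_runtime):
    if runtime_too_old(runtime):
        print("too old:", runtime.__version__)
    else:
        print("ok:", runtime.__version__)
```

In generated code, the check would presumably collapse to a single getattr comparison followed by the existing raise, so no helper functions would be needed there.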

(based on my Gitter messages: https://gitter.im/kaitai_struct/Lobby?at=5f372b993e6ff00c28939f29)
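For anyone who wants to reproduce the timings, a rough sketch of one way to average the wall-clock time of python3 -c '...' over several runs follows. This is not necessarily how the numbers above were obtained; the helper name average_run_time and the default of 20 runs are assumptions for illustration.

```python
import subprocess
import sys
import time


def average_run_time(code, runs=20):
    """Return the mean wall-clock time in seconds of running `python -c code`.

    Each run starts a fresh interpreter, so the result includes startup time.
    """
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # check=True makes a failing import raise instead of being timed silently.
        subprocess.run([sys.executable, "-c", code], check=True)
        total += time.perf_counter() - start
    return total / runs
```

Calling average_run_time("pass") gives the bare interpreter startup time, and average_run_time("import bytes_with_io") (run from the directory containing the generated module) gives the total with the import. The difference between the two is the cost attributable to the KS code and the version check. On Python 3.7 and later, python3 -X importtime -c 'import bytes_with_io' also prints a per-module breakdown of import times.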
