Another stab at high performance I/O #77

@mossblaser

This is an attempt at addressing the shortcomings of #63's (PR #73) approach.

Unfortunately, it appears quite easy to hit Python's limits when doing I/O. Having hit them, however, it occurs to me that we can split our consideration of the problem in two:

  • Complex and performance-insensitive (e.g. boot, diagnostics, control, even app loading)
  • Simple and performance-sensitive: bulk transfers (i.e. read/write)

There are two exceptions to this dichotomy that I am aware of: get_machine() and the checking parts of app loading. These commands currently spew large numbers of requests to poll the machine and as a result are relatively bad for performance. When discussing these commands ST has pulled the face you would expect and seemed to be open to the idea that these would eventually become primitive commands the machine would handle itself. As a result I will ignore these for now.

The tasks in the complex/performance-insensitive category are already adequately serviced by a Python implementation such as the one in master, and it seems this will remain the case indefinitely, particularly once the exceptions noted above are resolved. Indeed, once performance is no longer a concern, I cannot see any legitimate reason why a concurrent implementation would be a benefit -- the present blocking interface should suffice.

Bulk transfers, however, are liable to stress the Python implementation. Indeed, as #73 demonstrates, performance improvements are possible simply by using windowing and also by using parallel Ethernet links. I propose that reads/writes should be made a special case:

  • For minimal, Python-only functionality, the existing implementation in master should remain
  • An optional C-based I/O subsystem targeted solely at bulk, parallel reads/writes should be available which can be transparently enabled (in the same way as numpy).
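
To make "transparently enabled" concrete, here is a minimal sketch of the usual optional-accelerator pattern: try to import a fast backend and silently fall back to the pure-Python path if it is missing. The module name `hypothetical_fast_io`, its `write` attribute and the stand-in `pure_python_write` function are all assumptions made up for illustration; nothing like them exists in the code base yet.

```python
import importlib


def pure_python_write(address, data):
    """Stand-in for the existing blocking write in master (illustrative only)."""
    return ("python", address, len(data))


def select_write(accel_module_name="hypothetical_fast_io"):
    """Return the accelerated write if the optional module imports, else the fallback."""
    try:
        accel = importlib.import_module(accel_module_name)
    except ImportError:
        # Optional C subsystem not installed: keep the Python-only behaviour.
        return pure_python_write
    # Fall back as well if the module exists but lacks the expected function.
    return getattr(accel, "write", pure_python_write)


# Callers use `write` without needing to know which backend was chosen.
write = select_write()
```

With this shape, callers never see which backend is in use, so the C subsystem stays strictly optional.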

I propose that this C I/O subsystem exist as a separate, pure-C (or C++?) library which can easily be used in isolation (e.g. it could be reused in future distributed machine loading tasks). I'm not even convinced that dedicated Python bindings are desirable: the interface should be straightforward enough not to warrant anything more than ctypes.
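
As a rough idea of how little glue ctypes would need, the sketch below loads a shared library if it is installed and declares the signature of one entry point. The library name `hypothetical_bulkio`, the function `bulk_write` and its C signature are invented for illustration; the real interface is still to be designed.

```python
import ctypes
import ctypes.util


def load_bulk_io(lib_name="hypothetical_bulkio"):
    """Return a ctypes handle to the (hypothetical) library, or None if not found."""
    path = ctypes.util.find_library(lib_name)
    if path is None:
        # Library not installed: the caller should use the Python implementation.
        return None
    lib = ctypes.CDLL(path)
    # Assumed C prototype (hypothetical):
    #   int bulk_write(const char *hostname, uint32_t address,
    #                  const uint8_t *data, size_t length);
    lib.bulk_write.argtypes = [
        ctypes.c_char_p,   # hostname
        ctypes.c_uint32,   # start address
        ctypes.c_char_p,   # data buffer (passed as bytes)
        ctypes.c_size_t,   # number of bytes to write
    ]
    lib.bulk_write.restype = ctypes.c_int
    return lib
```

A caller would check for `None` and otherwise invoke something like `lib.bulk_write(b"hostname", 0x1000, data, len(data))`, again assuming that hypothetical signature.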
