Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
We now bind the dynamically created methods to their class instead of generating them for each instance separately. Now, they should get created only once, at module load, and from then on get the same treatment as regularly defined methods. This should lead to better support for serialization by pickle, by ensuring that the methods are an integral part of the class, and thereby always available to instances, also after unpickling. Furthermore, this brings significant speedups and reduced memory usage for pydot objects. Notes on pickle: - This commit prevents errors after unpickling like the following: AttributeError: 'Dot' object has no attribute 'write_png' AttributeError: 'Dot' object has no attribute 'get_bgcolor' - The alternative of recreating the methods during unpickling by letting `__setstate__` or `__reduce__` call `__init__()` again was not chosen because it seems fundamentally wrong [[1]] [[2]] [[3]]. - The other alternative of serializing the instance methods along with the rest of the instance was not chosen, because: - It also feels fundamentally wrong. Why serialize methods that should never differ between instances to begin with? - Pickle refuses to work on the instance-bound methods, even after lambdas are changed to functions. This could be seen by temporarily removing `__getstate__`: AttributeError: Can't pickle local object 'Dot.__init__.<locals>.new_method' AttributeError: Can't pickle local object 'Common.create_attribute_methods.<locals>.func' - Even if an alternative to `pickle` like `dill` is used, the serializing of the methods means slower pickling and increased size of the representations. - Moving these methods from the instance to the class also eliminates the need for the current custom `__getstate__` and `__setstate__` methods. They will be removed in a later commit. [1]: https://docs.python.org/3.9/library/pickle.html#pickling-class-instances [2]: https://stackoverflow.com/questions/50308214/python-3-alternatives-for-getinitargs#comment87633434_50308545 [3]: https://nedbatchelder.com/blog/202006/pickles_nine_flaws.html#h_init_isnt_called Notes on performance: - Module load time increased between 1.5 to 2.0ms, which is only 2% on the load time reported by bash's `time`, but 60% on the `self` for `pydot` reported by `python -X importtime`. - Time saved on each object instantiation, according to `timeit`: 0.1 to 0.3ms, depending on the class, but for all it translates into a further 74 to 78% drop relative to pydot 1.4.1. - The more objects in a graph, the sooner there will be a net time saving. I estimate the break-even point to be around 5 to 15 objects. The speedups are most noticeable when building graphs from the ground up programmatically, like `test_names_of_a_thousand_nodes` from the test suite, which cuts its run time by almost 300ms or 92%. - Memory savings reported by `pympler.asizeof()` show the sizes of Dot, Edge and Node instances are cut by 20, 9 and 7 KiB respectively, which translates to 87 to 90% reductions. - Measured using: Celeron N3350, Linux 4.19 amd64, Debian 10.6, CPython 3.7.3, timeit, -X importtime, dill 0.3.2, Pympler 0.9. Further notes: - As part of the dynamic creation of each method, its function is now also renamed (`__name__` and `__qualname__`) to the name it would have had if it had been defined in the class itself. This prevents that a temporary, generic name shows up in tracebacks. - The binding to the class is performed from class decorators. Some alternatives that were considered: - Metaclasses: Got it to work, but more complex than class decorators and may cause metaclass conflicts for the subclassing user because of their inheritance rules. Many authors suggest to use decorators instead of metaclasses where possible. [[4]] [[5]] - `__init_subclass__`: Got it to work from class `Common`, but the problem is that we are not customizing all its subclasses in the same way: Subclass `Dot` needs output format methods, some other subclasses need DOT attribute getters/setters, and some others do not need anything at all. This can be solved by adding arguments to the `__init_subclass__` signature and/or inspection of the subclass, plus the necessary logic to switch between the different cases then. Or, to prevent that, by overhauling the class hierarchy and adding some new classes between `Common` and the subclasses. In both cases, class decorators seem a lot simpler in the end. [4]: https://www.python.org/dev/peps/pep-3129/#rationale [5]: https://realpython.com/python-metaclasses/#is-this-really-necessary - `create_attributes_methods()`, which creates the `get_*`/`set_*` methods, is transformed to a decorator factory: It now takes only the set of DOT attributes and returns the true decorator function that will add exactly that set to the class. It can be used as a parameterized class decorator [[6]] [[7]] or from a metaclass. - Since it does not need to be inherited by subclasses anymore, it is moved from class `Common` to module-level. - An alternative to using a parameterized decorator was to let the decorator determine which set of DOT attributes to apply based on inspection of the class. But switching based on the class name string seems fragile. Switching by type is difficult for custom subclasses because `issubclass()` does not work when classes are still being created. Reading the set from a class attribute can work, as `create_format_methods()` shows, though I wonder if it will work from a metaclass `__new__()` for example, which runs much earlier in the class creation process. Also, it raises questions about the meaning of such attribute in subclasses that already inherit all the methods created for their base class. For example, `Cluster` needs to override the set with its own while `Subgraph` keeps the set of its parent. Also, the sets of DOT attributes are currently kept as global constants and would then either have to be moved to the classes (API change) or kept in two places. So, in this case, the parameterized solution seemed to require the least modifications and provide an easy way to pass custom DOT attributes for further subclassing. [6]: https://stackoverflow.com/questions/681953/how-to-decorate-a-class/44556596#44556596 [7]: https://stackoverflow.com/questions/5929107/decorators-with-parameters (This commit is part of a series of changes in how the `get_*`, `set_*`, `create_*` and `write_*` methods are dynamically created.)
- Loading branch information