Dyn methods: Bind to class
We now bind the dynamically created methods to their class instead of
generating them for each instance separately. Now, they should get
created only once, at module load, and from then on get the same
treatment as regularly defined methods.

This should lead to better support for serialization by pickle, by
ensuring that the methods are an integral part of the class, and
thereby always available to instances, also after unpickling.

Furthermore, this brings significant speedups and reduced memory usage
for pydot objects.

Notes on pickle:

- This commit prevents errors after unpickling like the following:

      AttributeError: 'Dot' object has no attribute 'write_png'
      AttributeError: 'Dot' object has no attribute 'get_bgcolor'

- The alternative of recreating the methods during unpickling by
  letting `__setstate__` or `__reduce__` call `__init__()` again was
  not chosen because it seems fundamentally wrong [[1]] [[2]] [[3]].

- The other alternative of serializing the instance methods along with
  the rest of the instance was not chosen, because:

  - It also feels fundamentally wrong. Why serialize methods that
    should never differ between instances to begin with?

  - Pickle refuses to work on the instance-bound methods, even after
    lambdas are changed to functions. This could be seen by temporarily
    removing `__getstate__`:

        AttributeError: Can't pickle local object 'Dot.__init__.<locals>.new_method'
        AttributeError: Can't pickle local object 'Common.create_attribute_methods.<locals>.func'

  - Even if an alternative to `pickle` like `dill` is used, the
    serializing of the methods means slower pickling and increased size
    of the representations.

- Moving these methods from the instance to the class also eliminates
  the need for the current custom `__getstate__` and `__setstate__`
  methods. They will be removed in a later commit.

[1]: https://docs.python.org/3.9/library/pickle.html#pickling-class-instances
[2]: https://stackoverflow.com/questions/50308214/python-3-alternatives-for-getinitargs#comment87633434_50308545
[3]: https://nedbatchelder.com/blog/202006/pickles_nine_flaws.html#h_init_isnt_called
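
The unpickling failure described above can be reproduced with a small
sketch. It is not pydot code: the class names, the `_StateMixin` helper
and `restore_without_init()` are hypothetical, and the helper only
mimics what pickle does by default (rebuild the instance from its saved
state without calling `__init__()`), so that the example does not
depend on where the classes are importable from.

```python
class _StateMixin:
    """Hypothetical stand-in for a custom __getstate__/__setstate__ pair."""

    def __getstate__(self):
        # Only the data dictionary is saved, not the instance methods.
        return dict(self.obj_dict)

    def __setstate__(self, state):
        self.obj_dict = state


class MethodsOnInstance(_StateMixin):
    """Old approach: the getter is created in __init__, per instance."""

    def __init__(self):
        self.obj_dict = {'attributes': {'bgcolor': 'white'}}

        def get_bgcolor():
            return self.obj_dict['attributes']['bgcolor']
        self.get_bgcolor = get_bgcolor


class MethodsOnClass(_StateMixin):
    """New approach: the getter is an ordinary attribute of the class."""

    def __init__(self):
        self.obj_dict = {'attributes': {'bgcolor': 'white'}}

    def get_bgcolor(self):
        return self.obj_dict['attributes']['bgcolor']


def restore_without_init(obj):
    """Mimic default unpickling: new instance, restore state, no __init__."""
    cls = type(obj)
    clone = cls.__new__(cls)
    clone.__setstate__(obj.__getstate__())
    return clone
```

A restored `MethodsOnInstance` has lost `get_bgcolor`, because that
function only ever lived in the instance that ran `__init__()`. A
restored `MethodsOnClass` still has it, because the method is looked up
on the class.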

Notes on performance:

- Module load time increased by 1.5 to 2.0 ms, which is only 2% of
  the load time reported by bash's `time`, but 60% of the `self` time
  for `pydot` reported by `python -X importtime`.
- Time saved on each object instantiation, according to `timeit`: 0.1
  to 0.3 ms, depending on the class. For all classes this translates
  into a further 74 to 78% drop relative to pydot 1.4.1.
- The more objects in a graph, the sooner there will be a net time
  saving. I estimate the break-even point to be around 5 to 15 objects.
  The speedups are most noticeable when building graphs from the ground
  up programmatically, like `test_names_of_a_thousand_nodes` from the
  test suite, which cuts its run time by almost 300ms or 92%.
- Memory savings reported by `pympler.asizeof()` show the sizes of Dot,
  Edge and Node instances are cut by 20, 9 and 7 KiB respectively,
  which translates to 87 to 90% reductions.
- Measured using: Celeron N3350, Linux 4.19 amd64, Debian 10.6, CPython
  3.7.3, timeit, -X importtime, dill 0.3.2, Pympler 0.9.
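
A rough way to observe the instantiation effect is sketched below. The
toy classes, the count of 50 attributes and the helper
`seconds_per_instance()` are assumptions made for illustration. They
are not the benchmark used for the numbers above, and absolute timings
will differ per machine.

```python
import timeit

# Hypothetical attribute names, standing in for a set of DOT attributes.
ATTRS = ['attr{}'.format(i) for i in range(50)]


class PerInstance:
    """Creates one getter per attribute on every instantiation."""

    def __init__(self):
        self.attributes = {}
        for attr in ATTRS:
            def getter(a=attr):
                return self.attributes.get(a)
            setattr(self, 'get_' + attr, getter)


class PerClass:
    """Getters are bound to the class once, at module load."""

    def __init__(self):
        self.attributes = {}


for attr in ATTRS:
    def getter(self, a=attr):
        return self.attributes.get(a)
    setattr(PerClass, 'get_' + attr, getter)


def seconds_per_instance(cls, number=2000):
    """Average wall-clock time of one `cls()` call, in seconds."""
    return timeit.timeit(cls, number=number) / number
```

Comparing `seconds_per_instance(PerInstance)` with
`seconds_per_instance(PerClass)` gives an impression of the
per-instance cost. `vars(PerInstance())` also shows why the instances
are larger: every getter is stored in the instance dictionary.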

Further notes:

- As part of the dynamic creation of each method, its function is
  now also renamed (`__name__` and `__qualname__`) to the name it would
  have had if it had been defined in the class itself. This prevents a
  temporary, generic name from showing up in tracebacks.

- The binding to the class is performed from class decorators. Some
  alternatives that were considered:
  - Metaclasses: Got it to work, but more complex than class decorators
    and may cause metaclass conflicts for the subclassing user because
    of their inheritance rules. Many authors suggest using decorators
    instead of metaclasses where possible. [[4]] [[5]]
  - `__init_subclass__`: Got it to work from class `Common`, but the
    problem is that we are not customizing all its subclasses in the
    same way: Subclass `Dot` needs output format methods, some other
    subclasses need DOT attribute getters/setters, and some others do
    not need anything at all. This can be solved by adding arguments to
    the `__init_subclass__` signature and/or inspection of the
    subclass, plus the necessary logic to switch between the different
    cases then. Or, to prevent that, by overhauling the class hierarchy
    and adding some new classes between `Common` and the subclasses. In
    both cases, class decorators seem a lot simpler in the end.

[4]: https://www.python.org/dev/peps/pep-3129/#rationale
[5]: https://realpython.com/python-metaclasses/#is-this-really-necessary
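
As a minimal sketch of the two notes above (renaming plus binding from
a class decorator), assuming hypothetical names `bind_as_method`,
`add_greeters` and `Demo` that do not exist in pydot:

```python
def bind_as_method(cls, name, func):
    """Rename `func` and attach it to `cls` as if it were defined there."""
    func.__name__ = name
    func.__qualname__ = '{}.{}'.format(cls.__name__, name)
    setattr(cls, name, func)


def add_greeters(cls):
    """Class decorator: runs once, when the class statement is executed."""
    for word in ('hello', 'goodbye'):
        # The default argument freezes the current loop value.
        def func(self, w=word):
            return '{} from {}'.format(w, type(self).__name__)
        bind_as_method(cls, 'say_' + word, func)
    return cls


@add_greeters
class Demo:
    pass
```

Without the renaming step, both generated functions would report the
qualified name `add_greeters.<locals>.func`. After it,
`Demo.say_hello.__qualname__` reads `Demo.say_hello`, like a regularly
defined method.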

- `create_attribute_methods()`, which creates the `get_*`/`set_*`
  methods, is transformed to a decorator factory: It now takes only the
  set of DOT attributes and returns the true decorator function that
  will add exactly that set to the class. It can be used as a
  parameterized class decorator [[6]] [[7]] or from a metaclass.
  - Since it does not need to be inherited by subclasses anymore, it is
    moved from class `Common` to module-level.
  - An alternative to using a parameterized decorator was to let the
    decorator determine which set of DOT attributes to apply based on
    inspection of the class. But switching based on the class name
    string seems fragile. Switching by type is difficult for custom
    subclasses because `issubclass()` does not work when classes are
    still being created. Reading the set from a class attribute can
    work, as `create_format_methods()` shows, though I wonder if it
    will work from a metaclass `__new__()` for example, which runs much
    earlier in the class creation process. Also, it raises questions
    about the meaning of such an attribute in subclasses that already
    inherit all the methods created for their base class. For example,
    `Cluster` needs to override the set with its own while `Subgraph`
    keeps the set of its parent. Also, the sets of DOT attributes are
    currently kept as global constants and would then either have to be
    moved to the classes (API change) or kept in two places. So, in
    this case, the parameterized solution seemed to require the fewest
    modifications and provide an easy way to pass custom DOT attributes
    for further subclassing.

[6]: https://stackoverflow.com/questions/681953/how-to-decorate-a-class/44556596#44556596
[7]: https://stackoverflow.com/questions/5929107/decorators-with-parameters
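
The decorator-factory pattern and its effect on subclasses can be
sketched as follows. All names here (`create_accessors`, `DemoGraph`,
`DemoSubgraph`, `DemoCluster` and the two attribute sets) are
hypothetical stand-ins, and the storage is simplified to a plain
`attributes` dictionary. This is not the pydot implementation.

```python
def create_accessors(attribute_names):
    """Decorator factory: returns a class decorator for one attribute set."""
    def decorator(cls):
        for attr in attribute_names:
            def setter(self, value, a=attr):
                self.attributes[a] = value

            def getter(self, a=attr):
                return self.attributes.get(a)

            for prefix, func in (('set_', setter), ('get_', getter)):
                func.__name__ = prefix + attr
                func.__qualname__ = cls.__name__ + '.' + func.__name__
                setattr(cls, func.__name__, func)
        return cls
    return decorator


# Hypothetical attribute sets, kept as module-level constants.
GRAPH_ATTRS = {'bgcolor', 'rankdir'}
CLUSTER_ATTRS = {'bgcolor', 'pencolor'}


class Base:
    def __init__(self):
        self.attributes = {}


@create_accessors(GRAPH_ATTRS)
class DemoGraph(Base):
    pass


class DemoSubgraph(DemoGraph):
    """Undecorated: keeps the accessors inherited from DemoGraph."""


@create_accessors(CLUSTER_ATTRS)
class DemoCluster(DemoGraph):
    """Decorated again: gets accessors for its own set as well."""
```

Here `DemoSubgraph` needs no decorator at all, while `DemoCluster`
receives `set_pencolor`/`get_pencolor` on top of what it inherits. A
further subclass could pass its own set in the same way.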

(This commit is part of a series of changes in how the `get_*`,
`set_*`, `create_*` and `write_*` methods are dynamically created.)
peternowee committed Oct 26, 2020
1 parent 57a8adc commit b675868
Showing 2 changed files with 115 additions and 74 deletions.
20 changes: 20 additions & 0 deletions ChangeLog
@@ -1,5 +1,25 @@
# `pydot` changelog

2.0.0 (unreleased)
------------------

- Object instantiation times are reduced by over 95%. Module import
time has slightly increased, but is only one-off and should be offset
by the gains after around 5 to 15 pydot objects. Object memory sizes
are reduced by over 85%. (#242)

API:

API (minor change or few affected):
- `create_attribute_methods()`, which adds getter and setter methods
for DOT attributes to a class, is moved out of class `Common` and
works in two steps now: First pass it the set of attribute names and
it will return a second function that can be called on the class. The
two steps can be combined in a parameterized class decorator:
`@create_attribute_methods(my_dot_attribute_set)`. (#242)
- `Dot.formats` changes from an instance attribute to a class
attribute. (#242)


1.4.2 (unreleased)
------------------
169 changes: 95 additions & 74 deletions pydot.py
@@ -411,6 +411,86 @@ def graph_from_incidence_matrix(matrix, node_prefix='', directed=False):
    return graph


def add_function_to_class(cls, name, func):
    """Bind a function object as a method to a class

    Rename the provided function (both its __name__ and __qualname__)
    and bind it as a method to the provided class so that it is treated
    the same as methods defined in the class itself.

    @param cls: class to bind the method to
    @param name: intended name of the function as a string
    @param func: function object to be bound
    """
    func.__name__ = name
    if hasattr(func, '__qualname__'):
        func.__qualname__ = '.'.join([cls.__name__, func.__name__])
    setattr(cls, func.__name__, func)


def create_attribute_methods(obj_attributes):
    """Create function to add DOT attribute getters and setters to a class.

    This function is a decorator factory that returns a function
    create_attribute_methods_decorator that creates getter and setter
    methods (get_'name' and set_'name') for the DOT attributes in
    obj_attributes and binds them to the class it is called on. The
    returned function can be used as a class decorator.

    @param obj_attributes: set of DOT attribute name strings
    """
    def create_attribute_methods_decorator(cls):
        """Add DOT attribute getters and setters to a class.

        This function creates getter and setter methods (get_'name' and
        set_'name') for a set of DOT attributes and binds these methods
        to class cls. Refer to the call of create_attribute_methods()
        to know for which specific set.

        @param cls: class to which to bind the created methods
        """
        for attr in obj_attributes:

            # Generate all the Setter methods.
            #
            def func(self, x, a=attr):
                self.obj_dict['attributes'].__setitem__(a, x)
            add_function_to_class(cls, 'set_{}'.format(attr), func)

            # Generate all the Getter methods.
            #
            def func(self, a=attr): # pylint: disable=function-redefined
                return self.__get_attribute__(a)
            add_function_to_class(cls, 'get_{}'.format(attr), func)

        return cls

    return create_attribute_methods_decorator


def create_format_methods(cls):
    """Decorator for class Dot to define its format output methods.

    This decorator function automatically creates all the methods
    enabling the creation of output in any of the supported formats.
    """
    for frmt in cls.formats:
        def func(self, f=frmt, prog='dot', encoding=None):
            """Refer to docstring of method `create`."""
            return self.create(format=f, prog=prog, encoding=encoding)
        add_function_to_class(cls, 'create_{fmt}'.format(fmt=frmt),
                              func)

    for frmt in cls.formats+['raw']:
        def func(self, path, f=frmt, prog='dot', encoding=None):
            """Refer to docstring of method `write.`"""
            self.write(path, format=f, prog=prog, encoding=encoding)
        add_function_to_class(cls, 'write_{fmt}'.format(fmt=frmt),
                              func)

    return cls


class Common(object):
"""Common information to several classes.
@@ -526,25 +606,6 @@ def get_sequence(self):
        return self.obj_dict['sequence']


    def create_attribute_methods(self, obj_attributes):

        #for attr in self.obj_dict['attributes']:
        for attr in obj_attributes:

            # Generate all the Setter methods.
            #
            def func(x, a=attr):
                self.obj_dict['attributes'].__setitem__(a, x)
            setattr(self, 'set_'+attr, func)

            # Generate all the Getter methods.
            #
            def func(a=attr): # pylint: disable=function-redefined
                return self.__get_attribute__(a)
            setattr(self, 'get_'+attr, func)



class Error(Exception):
"""General error handling class.
"""
@@ -563,7 +624,7 @@ def __str__(self):
return self.value



@create_attribute_methods(NODE_ATTRIBUTES)
class Node(Common):
"""A graph node.
@@ -616,8 +677,6 @@ def __init__(self, name = '', obj_dict = None, **attrs):
self.obj_dict['name'] = quote_if_necessary(name)
self.obj_dict['port'] = port

self.create_attribute_methods(NODE_ATTRIBUTES)

def __str__(self):
return self.to_string()

@@ -687,7 +746,7 @@ def to_string(self):
return node + ';'



@create_attribute_methods(EDGE_ATTRIBUTES)
class Edge(Common):
"""A graph edge.
@@ -735,7 +794,6 @@ def __init__(self, src='', dst='', obj_dict=None, **attrs):
self.obj_dict[ 'sequence' ] = None
else:
self.obj_dict = obj_dict
self.create_attribute_methods(EDGE_ATTRIBUTES)

def __str__(self):
return self.to_string()
@@ -875,9 +933,7 @@ def to_string(self):
return ' '.join(edge) + ';'





@create_attribute_methods(GRAPH_ATTRIBUTES)
class Graph(Common):
"""Class representing a graph in Graphviz's dot language.
@@ -914,7 +970,6 @@ class Graph(Common):
graph_instance.obj_dict['attributes']['fontname']
"""


def __init__(self, graph_name='G', obj_dict=None,
graph_type='digraph', strict=False,
suppress_disconnected=False, simplify=False, **attrs):
@@ -950,8 +1005,6 @@ def __init__(self, graph_name='G', obj_dict=None,
self.set_parent_graph(self)


self.create_attribute_methods(GRAPH_ATTRIBUTES)

def __str__(self):
return self.to_string()

@@ -1628,8 +1681,7 @@ def __init__(self, graph_name='',
self.obj_dict['type'] = 'subgraph'




@create_attribute_methods(CLUSTER_ATTRIBUTES)
class Cluster(Graph):

"""Class representing a cluster in Graphviz's dot language.
@@ -1663,7 +1715,6 @@ class Cluster(Graph):
cluster_instance.obj_dict['attributes']['fontname']
"""


def __init__(self, graph_name='subG',
obj_dict=None, suppress_disconnected=False,
simplify=False, **attrs):
@@ -1678,13 +1729,8 @@ def __init__(self, graph_name='subG',
self.obj_dict['type'] = 'subgraph'
self.obj_dict['name'] = quote_if_necessary('cluster_'+graph_name)

self.create_attribute_methods(CLUSTER_ATTRIBUTES)






@create_format_methods
class Dot(Graph):
"""A container for handling a dot language file.
@@ -1693,49 +1739,24 @@ class Dot(Graph):
the base class 'Graph'.
"""


formats = [
'canon', 'cmap', 'cmapx',
'cmapx_np', 'dia', 'dot',
'fig', 'gd', 'gd2', 'gif',
'hpgl', 'imap', 'imap_np', 'ismap',
'jpe', 'jpeg', 'jpg', 'mif',
'mp', 'pcl', 'pdf', 'pic', 'plain',
'plain-ext', 'png', 'ps', 'ps2',
'svg', 'svgz', 'vml', 'vmlz',
'vrml', 'vtx', 'wbmp', 'xdot', 'xlib']

def __init__(self, *argsl, **argsd):
Graph.__init__(self, *argsl, **argsd)

self.shape_files = list()
self.formats = [
'canon', 'cmap', 'cmapx',
'cmapx_np', 'dia', 'dot',
'fig', 'gd', 'gd2', 'gif',
'hpgl', 'imap', 'imap_np', 'ismap',
'jpe', 'jpeg', 'jpg', 'mif',
'mp', 'pcl', 'pdf', 'pic', 'plain',
'plain-ext', 'png', 'ps', 'ps2',
'svg', 'svgz', 'vml', 'vmlz',
'vrml', 'vtx', 'wbmp', 'xdot', 'xlib']

self.prog = 'dot'

# Automatically creates all
# the methods enabling the creation
# of output in any of the supported formats.
for frmt in self.formats:
def new_method(
f=frmt, prog='dot',
encoding=None):
"""Refer to docstring of method `create`."""
return self.create(
format=f, prog=prog, encoding=encoding)
name = 'create_{fmt}'.format(fmt=frmt)
setattr(self, name, new_method)

for frmt in self.formats+['raw']:
def new_method(
path, f=frmt, prog='dot',
encoding=None):
"""Refer to docstring of method `write.`"""
self.write(
path, format=f, prog=prog,
encoding=encoding)
name = 'write_{fmt}'.format(fmt=frmt)
setattr(self, name, new_method)

def __getstate__(self):

dict = copy.copy(self.obj_dict)