Cleanup transforms.py. #7394

Merged
merged 3 commits into from Feb 21, 2017

Conversation

Projects
None yet
7 participants
Contributor

anntzer commented Nov 3, 2016

Simplify the implementation of the various contains-like methods.

Switch to modern property declarations. Cleanup some docstrings.

def get_points(self):
- return NotImplementedError()
+ raise NotImplementedError
@tacaswell

tacaswell Nov 3, 2016

Owner

Should this be raise NotImplementedError() ?

@anntzer

anntzer Nov 3, 2016

Contributor

https://docs.python.org/3/reference/simple_stmts.html#the-raise-statement:
Otherwise, raise evaluates the first expression as the exception object. It must be either a subclass or an instance of BaseException. If it is a class, the exception instance will be obtained when needed by instantiating the class with no arguments.

@tacaswell

tacaswell Nov 3, 2016

Owner

huh, never used it like that.

@anntzer

anntzer Nov 3, 2016

Contributor

raise ValueError is pretty common? (if you're too lazy (hem hem) to write a proper error message)

tacaswell added this to the 2.1 (next point release) milestone Nov 3, 2016

+ """
+ (property) :attr:`xmin` is the left edge of the bounding box.
+ """
+ return np.min(self.get_points()[:, 0])
@tacaswell

tacaswell Nov 3, 2016

Owner

why np.min over min?

@anntzer

anntzer Nov 3, 2016

Contributor

So that nan passes through -- which is probably the right thing to do? (min(np.nan, 1) is 1, np.min(np.nan, 1) is np.nan.) (and in fact needed for the various new implementations of the contains/overlaps methods)

Member

story645 commented Nov 3, 2016 edited

I know you're not touching this, but is all the if DEBUG code in there necessary? (And does it ever get run considering DEBUG=false at the top of that code...)

Owner

tacaswell commented Nov 3, 2016

I am a tad concerned about performance regressions on this.

attn @mdboom as you last touched / wrote this code in 2008.

Contributor

anntzer commented Nov 3, 2016

@story645 There's a bunch of if DEBUG a bit everywhere in the codebase, if you want to clean them all up go ahead :-)

Member

story645 commented Nov 3, 2016

git grep says the problem isn't so bad...bunch of .cpp and backend files 😅, but otherwise it's offsetbox, texmanger and transforms.

- return self.get_points()[:, 1]
- intervaly = property(_get_intervaly, None, None, """
+ @property
+ def intervaly(self):
@NelleV

NelleV Nov 5, 2016

Contributor

Doesn't that make this in the public API?
Can we keep this private?

@NelleV

NelleV Nov 5, 2016

Contributor

Never mind… Sorry about the noise.

@anntzer

anntzer Nov 5, 2016

Contributor

You bet I would be extra careful not to add any public API :-)
intervaly was a public property; it is still a public property when set like this. (And in both cases the function that implements the getter is available as Bbox.invervaly.fget; it used to also be available as Bbox._get_intervaly but this PR removes that.)

@NelleV

Thanks @anntzer for the clean up.
This looks good to me.

lib/matplotlib/transforms.py
- return ((x0 < x1
- and (x >= x0 and x <= x1))
- or (x >= x1 and x <= x0))
+ return self.xmin <= x <= self.xmax
@NelleV

NelleV Nov 5, 2016

Contributor

That's so much better…

NelleV changed the title from Cleanup transforms.py. to [MRG+1] Cleanup transforms.py. Nov 5, 2016

lib/matplotlib/transforms.py
@@ -955,62 +897,62 @@ def update_from_data_xy(self, xy, ignore=None, updatex=True, updatey=True):
def _set_x0(self, val):
self._points[0, 0] = val
self.invalidate()
- x0 = property(BboxBase._get_x0, _set_x0)
+ x0 = property(BboxBase.x0.fget, _set_x0)
@QuLogic

QuLogic Nov 5, 2016

Member

You could write this in the decorator form as:

@BboxBase.x0.setter
def x0(self, val):
    self._points[0, 0] = val
    self.invalidate()

or at least, it seems to work for me...

@anntzer

anntzer Nov 5, 2016

Contributor

No, because this would update the setter of BboxBase too. I could write

@partial(property, BboxBase.x0.fget)
def x0(self, val): ...

but that hardly seems like an improvement in readability...

@QuLogic

QuLogic Nov 5, 2016

Member

Ah, never mind then...

@Kojoley

Kojoley Nov 8, 2016 edited

Member

It does not rewrite the setter of base class for me (on both 2.7 and 3.5).

class A(object):
    def __init__(self):
         self._z = None

    @property
    def z(self):
         return self._z

    @z.setter
    def z(self, val):
         print('A')
         self._z = val

class B(A):
    @A.z.setter
    def z(self, val):
         print('B')
         self._z = val

a = A()
a.z = 123
print(a.z)

b = B()
b.z = 456
print(b.z)


class C(object):
    def __init__(self):
         self._z = None

    @property
    def z(self):
         return self._z

class D(A):
    @C.z.setter
    def z(self, val):
         print('D')
         self._z = val

try:
    c = C()
    c.z = 123
    print('Error')
except AttributeError:
    print('Pass')

d = D()
d.z = 456
print(d.z)
A
123
B
456
Pass
D
456
@anntzer

anntzer Nov 8, 2016

Contributor

Oh, indeed prop.setter returns a new property, leaving the previous one untouched. The docs actually explicitly say that:

A property object has getter, setter, and deleter methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function.

(emphasis mine)

One new thing I learned today... and I'll update the PR accordngly.

Contributor

anntzer commented Nov 5, 2016

@tacaswell I think your concerns about performance are very valid. For example

%%timeit np.random.seed(0); N = 100; bbs = [Bbox(t) for t in np.random.rand(N, 2, 2)]; xs = np.random.rand(N)
[b.containsx(x) for b, x in zip(bbs, xs)]

is three times slower.
I'll see what I can do.

QuLogic changed the title from [MRG+1] Cleanup transforms.py. to Cleanup transforms.py. Nov 5, 2016

Contributor

anntzer commented Nov 5, 2016

I tried adding a _normalized_extents property that would compute all of xmin, xmax, ymin, ymax at the same time (this seemed to be the main performance loss) so that they could be used later, but it seems that that's not enough... basically it seems that something like (xmin, xmax), (ymin, ymax) = np.sort(self.get_points(), axis=1) will never be as fast as the old implementation.

Now the question is, does this actually matter? It would be nice to have some actual performance tests (on "real" plots) to check that.

Member

Kojoley commented Nov 8, 2016

@anntzer If I understand you correctly you are talking about fast onepass minmax algorithm. While it makes 25% less comparisons (1.5*n+1 vs 2*n) you will see no difference on CPython because of interpreter overhead. After all len(self.get_points()) is always 2, so there is no point of using it.

Contributor

anntzer commented Nov 8, 2016

The point is that

x1, x2 = ...
return x1 < x < x2 if x1 < x2 else x2 < x < x1

will be faster than anything that goes through a function (/ property) call.

Member

Kojoley commented Nov 8, 2016

Ah, so you worry about added function call/property access overhead?

class Base(object):
    intervalx = 123, 456


class A(Base):
    def fully_containsx(self, x):
        x0, x1 = self.intervalx
        return ((x0 < x1
                 and (x > x0 and x < x1))
                or (x > x1 and x < x0))


class B(Base):
    @property
    def xmin(self):
        return min(self.intervalx)

    @property
    def xmax(self):
        return max(self.intervalx)

    def fully_containsx(self, x):
        return self.xmin < x < self.xmax
>python -m timeit "from prop import A; a = A();" "for _ in range(10000): a.fully_containsx(321)"
100 loops, best of 3: 2.77 msec per loop

>python -m timeit "from prop import B; b = B();" "for _ in range(10000): b.fully_containsx(321)"
100 loops, best of 3: 7.96 msec per loop
Contributor

anntzer commented Nov 8, 2016

That's basically the difference I measured, yes.

Current coverage is 61.86% (diff: 90.62%)

Merging #7394 into master will decrease coverage by 4.56%

@@             master      #7394   diff @@
==========================================
  Files           109        173     +64   
  Lines         46780      56092   +9312   
  Methods           0          0           
  Messages          0          0           
  Branches          0          0           
==========================================
+ Hits          31075      34702   +3627   
- Misses        15705      21390   +5685   
  Partials          0          0           

Powered by Codecov. Last update dfd38f7...1859a22

@#{review_dismissed_event.actor.login} NelleV dismissed their review Dec 19, 2016

!!

Member

QuLogic commented Jan 19, 2017

Any updates? Needs a rebase.

Contributor

anntzer commented Jan 19, 2017

Given the reasonable performance concerns I don't think this can go forward unless we somehow add performance regression tests and show that there won't be a significant slowdown.

Contributor

NelleV commented Jan 19, 2017

I wouldn't add a performance regression tests, but I think it is reasonable to benchmark the two options implementations.

Contributor

anntzer commented Jan 19, 2017

But this is so much down the stack that it could literally affect anything, so I don't think a microbenchmark really makes sense. I have no idea, for example, whether it would make plotting a lot of artists much slower because it may be doing a bunch of bbox comparisons... or perhaps not.

@anntzer anntzer Cleanup transforms.py.
Simplify the implementation of various methods.  Note the switch to
`np.min` rather than `min`, which avoids silently dropping `nan`s.

Switch to modern property declarations.  Cleanup some docstrings.
c2aabe9
Contributor

anntzer commented Feb 21, 2017

@tacaswell I restored a fast(?) implementation of overlaps/fully_overlaps (and made it faster by not checking for nan anymore, relying on the fact that the comparison operators will return False in presence of nans...). Other than that I don't see any other possible performance issue right now.

@anntzer anntzer Restore fast(er) impl. of `overlaps`.
aa406bf
lib/matplotlib/transforms.py
- y1 = max(y1, np.max(ys))
-
- return Bbox.from_extents(x0, y0, x1, y1)
+ x0 = min(bbox.xmin for bbox in bboxes)
@tacaswell

tacaswell Feb 21, 2017

Owner

Can np.nan or np.inf come out of these?

@anntzer

anntzer Feb 21, 2017

Contributor

I guess I should use np.min everywhere (which passes nan through, whereas builtin min drops them). (At least this would be consistent with bbox.xmin.)

@anntzer

anntzer Feb 21, 2017

Contributor

Added a note to that effect.

@tacaswell

Looks good to me

@anntzer anntzer Don't drop nans in bbox computations.
297a753

@QuLogic QuLogic merged commit 2bce760 into matplotlib:master Feb 21, 2017

5 checks passed

codecov/patch 91.17% of diff hit (target 80%)
Details
codecov/project/library Absolute coverage decreased by -0.08% but relative coverage increased by +28.78% compared to cf0d260
Details
codecov/project/tests 98.86% (target 97.7%)
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

anntzer deleted the anntzer:transforms-cleanup branch Feb 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment