Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cythoned homsets #14214

Open
simon-king-jena opened this issue Mar 2, 2013 · 80 comments
Open

Cythoned homsets #14214

simon-king-jena opened this issue Mar 2, 2013 · 80 comments

Comments

@simon-king-jena
Copy link
Member

Homsets arise whenever a coercion map or a conversion map is involved. Hence, homsets are ubiquitous in Sage and should thus be as a fast as possible.

Therefore I suggest change sage.categories.homset into a cython module. A clean solution seems out of reach at the moment (see comments), and so I just provide a couple of speed-ups here.

Apply

  • trac_14214-cython_homset.patch
  • trac_14214-backup_cache.patch

Depends on #14279
Depends on #18758

CC: @nthiery

Component: performance

Keywords: Hom, cached method

Author: Simon King

Issue created by migration from https://trac.sagemath.org/ticket/14214

@simon-king-jena
Copy link
Member Author

comment:1

Ideally, sage.categories.homset.Homset should be a cdef class, with cdef attributes _domain, _codomain, __category.

Problem and potential long term solutions

sage.modular.abvar.homspace.EndomorphimSubring inherits from sage.categories.homset.Homset and from sage.rings.ring.Ring. Making Homset cdef with cdef attributes results in incompatible base classes.

  • One could think of creating a new cdef class sage.categories.homset.EndomorphismRing inheriting from sage.rings.ring.Ring and copying the methods implemented in sage.categories.homset.Homset, and use it in sage.modular.abvar.homspace.
    • However, that's code duplication, and I doubt that we want that.
  • One could think of dropping sage.rings.ring.Ring and let EndomorphismSubring entirely rely on the ring functionality that is provided by the category framework.
    • I tried, but this means opening a can of worms. One would then also need to implement the new coercion model in sage.modular. But replacing EndomorphismSubring.__call__ by an _element_constructor_ does not work, because Homset has a __call__ method, too. Hence, we would then also need to use the new coercion model for sage.categories.homset. Yes, that's desirable. But it is not available in short time.
  • One could try to make some important basic methods of Homset fast, by inheritance from WithEqualityById that I introduced in Cythoned UniqueRepresentation #14054. It would be desirable that two homsets are equal if and only if they are identical if and only if both domain, codomain and category are identical.
    • Unfortunately, the old parts of sage (about elliptic curves and stuff) rely on the assumption that we do not have unique parents and do not have unique homsets. This would be another can of worms.

Short term improvements suggested in my patch

  • Changing sage.categories.homset into a Cython module means: We can cdef the homset cache, and can thus use fast get/set methods for TripleDict. That's a speed-up.
  • One can make Homset cdef, but keep the _domain and _codomain attributes Python attributes. Then, there are no layout conflicts.
  • I suggest to change __category into a single underscore attribute, to avoid name mangling.
  • Accessing a Python attribute _domain defined for a Homset can be surprisingly slow. It is fast, if one has an instance of Homset itself. But it is slow if one considers an instance of a sub-class, such as EndomorphismSubring. It turns out that calling a cached method is faster than accessing the attribute! Hence, I suggest using cached method decorators on the domain() and codomain() methods (these methods are internally used in EndomorphismSubring anyway).
  • As a step towards uniqueness, I suggest that Homset.__cmp__ tests by identity rather than equality, and uses cmp only on non-identical domain/codomain/category.

Making Homset cdef requires to put Set_generic into sage.structure.parent.pxd.

Timings will be in the next post.

@simon-king-jena
Copy link
Member Author

comment:2

Timings with the patch:

sage: E = J0(11)
sage: F = J0(22)
sage: H = Hom(ZZ,QQ)
sage: HE = Hom(E,E)
sage: HF = Hom(E,F)
sage: D = {H:1,HE:2,HE:3}

sage: timeit("Hom(E,E)", number = 10^4)
10000 loops, best of 3: 9.25 µs per loop
sage: timeit("Hom(E,F)", number = 10^4)
10000 loops, best of 3: 122 µs per loop
sage: timeit("Hom(ZZ,QQ)", number = 10^4)
10000 loops, best of 3: 12 µs per loop

sage: timeit("hash(HE)", number = 10^6)
1000000 loops, best of 3: 2.41 µs per loop
sage: timeit("hash(H)", number = 10^6)
1000000 loops, best of 3: 1.46 µs per loop
sage: timeit("hash(HF)", number = 10^6)
1000000 loops, best of 3: 1.49 µs per loop

sage: timeit("H in D", number = 10^6)
1000000 loops, best of 3: 1.48 µs per loop
sage: timeit("HE in D", number = 10^6)
1000000 loops, best of 3: 2.38 µs per loop
sage: timeit("HF in D", number = 10^6)
1000000 loops, best of 3: 1.48 µs per loop

sage: timeit("HF.domain()", number = 10^6)
1000000 loops, best of 3: 392 ns per loop
sage: timeit("HE.domain()", number = 10^6)
1000000 loops, best of 3: 415 ns per loop
sage: timeit("H.domain()", number = 10^6)
1000000 loops, best of 3: 356 ns per loop
sage: timeit("HF._domain", number = 10^6)
1000000 loops, best of 3: 524 ns per loop
sage: timeit("HE._domain", number = 10^6)
1000000 loops, best of 3: 881 ns per loop
sage: timeit("H._domain", number = 10^6)
1000000 loops, best of 3: 502 ns per loop

sage: %timeit TestSuite(H).run()
100 loops, best of 3: 7.01 ms per loop
sage: %timeit TestSuite(HE).run(skip=['_test_elements'])
10 loops, best of 3: 38.8 ms per loop
sage: %timeit TestSuite(HF).run(skip=['_test_elements'])
10 loops, best of 3: 91.3 ms per loop

Timings without the patch:

sage: timeit("hash(HE)", number = 10^6)
1000000 loops, best of 3: 3.97 µs per loop
sage: timeit("hash(H)", number = 10^6)
1000000 loops, best of 3: 2.74 µs per loop
sage: timeit("hash(HF)", number = 10^6)
1000000 loops, best of 3: 2.7 µs per loop

sage: timeit("Hom(E,E)", number = 10^4)
10000 loops, best of 3: 12.9 µs per loop
sage: timeit("Hom(E,F)", number = 10^4)
10000 loops, best of 3: 132 µs per loop
sage: timeit("Hom(ZZ,QQ)", number = 10^4)
10000 loops, best of 3: 21.6 µs per loop

sage: timeit("H in D", number = 10^6)
1000000 loops, best of 3: 2.77 µs per loop
sage: timeit("HE in D", number = 10^6)
1000000 loops, best of 3: 4 µs per loop
sage: timeit("HF in D", number = 10^6)
1000000 loops, best of 3: 2.81 µs per loop

sage: timeit("HF.domain()", number = 10^6)
1000000 loops, best of 3: 1.13 µs per loop
sage: timeit("HE.domain()", number = 10^6)
1000000 loops, best of 3: 1.74 µs per loop
sage: timeit("H.domain()", number = 10^6)
1000000 loops, best of 3: 1.18 µs per loop
sage: timeit("HF._domain", number = 10^6)
1000000 loops, best of 3: 520 ns per loop
sage: timeit("HE._domain", number = 10^6)
1000000 loops, best of 3: 933 ns per loop
sage: timeit("H._domain", number = 10^6)
1000000 loops, best of 3: 522 ns per loop

sage: %timeit TestSuite(H).run()
100 loops, best of 3: 7.79 ms per loop
sage: %timeit TestSuite(HE).run(skip=['_test_elements'])
10 loops, best of 3: 42.3 ms per loop
sage: %timeit TestSuite(HF).run(skip=['_test_elements'])
1 loops, best of 3: 95.7 ms per loop

Conclusion

  • There is a clear improvement in all timings.

Some details I am not yet totally happy with (todo on a different ticket?):

  • Getting a homset out of the cache could still be faster. The time-limiting factor is that one needs to find an appropriate category, if it is not explicitly passed as an argument. But since the cache is weak anyway, couldn't we simply cache the same homset twice, namely as Hom(X,Y) and as Hom(X,Y,C)? This would give a considerable speed-up.
  • The hash is much too slow. Ideally, one would simply inherit from WithEqualityById, but that's non-trivial, as I have pointed out above.

Doctests pass for me. Needs review!

@simon-king-jena
Copy link
Member Author

comment:3

Hooray, the startup_time plugin reports:

+With 37% confidence, startup time decreased by at least 0.5%
+With 82% confidence, startup time decreased by at least 0.25%
+With 91% confidence, startup time decreased by at least 0.1%

I think it would only be significant if it was 95% confidence. Nevertheless, it is a decrease of the startup time!

@nthiery
Copy link
Contributor

nthiery commented Mar 3, 2013

comment:4

Replying to @simon-king-jena:

Some details I am not yet totally happy with (todo on a different ticket?):

  • Getting a homset out of the cache could still be faster. The time-limiting factor is that one needs to find an appropriate category, if it is not explicitly passed as an argument. But since the cache is weak anyway, couldn't we simply cache the same homset twice, namely as Hom(X,Y) and as Hom(X,Y,C)? This would give a considerable speed-up.

Yes, I don't why we should not cache Hom(X,Y). By the way, is there a
reason why Hom is not a (weak) cached function rather than handling
it's cache by hand?

Thanks for your work on this! Get in touch with me in case no one
volunteers shortly for the review.

Cheers,
Nicolas

@simon-king-jena
Copy link
Member Author

comment:5

Replying to @nthiery:

Yes, I don't why we should not cache Hom(X,Y).

OK, doing it soon.

In fact, what I do now: Try to implement the new coercion framework for all homsets (hence: Remove __call__, use element classes). I am already getting less than a couple of hundred errors...

By the way, is there a
reason why Hom is not a (weak) cached function rather than handling
it's cache by hand?

Yes, actually three reasons:

  • A weak cached function has a weak reference on the value, but not on the keys. But we need weak references on the keys, for otherwise a memory leak arises.
  • In the coercion framework, the domain of a map must be identical (and not just equal) to the parents of elements-to-be-coerced. Hence, if X' is equal to X, then we need that Hom(X,Y) is not identical with Hom(X',Y). But a weak cached function would compare by equality, and this would give rise to errors in those parts of Sage that still work with non-unique parents. TripleDict does compare the keys by identity.
  • We know that the keys are triples (domain, codomain, category). I think we have benchmarks showing that TripleDict is faster than a python dictionary.

@simon-king-jena
Copy link
Member Author

comment:6

The second patch is not tested yet. But it yields:

sage: sage: timeit("Hom(E,E)", number = 10^4)
10000 loops, best of 3: 1.9 µs per loop
sage: sage: timeit("Hom(E,F)", number = 10^4)
10000 loops, best of 3: 1.43 µs per loop
sage: sage: timeit("Hom(ZZ,QQ)", number = 10^4)
10000 loops, best of 3: 1.52 µs per loop

Apply trac_14214-cython_homset.patch trac_14214-backup_cache.patch

@simon-king-jena
Copy link
Member Author

comment:7

The patchbot finds one failing test, and this is actually a test that relies on non-unique parent behaviour:

        Coercing a vector space morphism into the parent of a second vector
        space morphism will unify their parents. ::

            sage: U = QQ^3
            sage: V = QQ^4
            sage: W = QQ^3
            sage: X = QQ^4
            sage: H = Hom(U, V)
            sage: K = Hom(W, X)

            sage: A = matrix(QQ, 3, 4, [0]*12)
            sage: f = H(A)
            sage: B = matrix(QQ, 3, 4, range(12))
            sage: g = K(B)
            sage: f.parent() is g.parent()
            False

            sage: h = H(g)
            sage: f.parent() is h.parent()
            True

But in this example, at least with my patch, we have U is W and H is K. So, I think this test can be removed. Will do so in a minute.

@simon-king-jena
Copy link
Member Author

comment:8

Done!

Apply trac_14214-cython_homset.patch trac_14214-backup_cache.patch

@nbruin
Copy link
Contributor

nbruin commented Mar 3, 2013

comment:9

Replying to @simon-king-jena:

... those parts of Sage that still work with non-unique parents.

What follows is a bit of a rant that probably doesn't immediately affect your patch here. However, it's something I've seen crop up around the coercion/category framework in the last couple of weeks a little and I think has some relevance for work in this direction in general:

One design consideration I haven't seen mentioned much is that some parent constructors have very good reasons to return a fresh copy every time one is requested (i.e., don't do the cached_function metaclass thing that UniqueRepresentation does). The reason is that some parents are complicated structures (they tend to model complicated mathematical objects) that have too many aspects about them for them to be implemented as immutable. An example is EllipticCurve, where a Mordell-Weil basis is definitely not part of the construction data, but the basis choice (if found at all!) can be important and dealing with that unexpectedly changing is just too painful.

One could design these things differently, e.g., don't make the Mordell-Weil basis part of the elliptic curve itself, but the reality is that design choice hasn't been made in sage and changing that design now will be very difficult.

The problem arises elsewhere too: Sometimes you DO want multiple isomorphic-but-not-identical copies of objects. It's convenient to have multiple bivariate polynomial rings over QQ, for instance. We have that in sage by making generator names part of the definition of polynomial rings.

For other objects, for instance elliptic curves, there are no convenient generator names to distinguish multiple copies (the fundamental constructor being EllipticCurve([a1,a2,a3,a4,a6])), but there is still a reasonable need for having multiple nonidentical copies (never mind the practical mutability properties).

It terms of the coercion framework I have hopes these are not too much of an issue. I think elliptic curves are sufficiently high up the chain of complexity that people are happy to apply explicit homomorphisms to move things around, and hopefully this is true for most objects (the elephants in the room here are number fields: they are expected to behave nicely wrt coercion and yet are sufficiently complex that they have data hanging off them that makes it hard to consider them immutable. A lot of the problems may be hidden because it's relatively hard to run into exactly the same representation of the field twice)

The consistent thing for such non-unique-by-necessity parents is to make "==" indeed just "is", so that uniqueness is simply forced. This means

EllipticCurve([1,2,3,4,5]) != EllipticCurve([1,2,3,4,5])

which is fine with me, since they are isomorphic but not uniquely so. However, I'm not sure how easy it will be to convince other people of such a strict interpretation of "==".

Of course such EqualityById but non-CachedRepresentative parents are useless as dictionary keys in most situations in the sense that D[p]=a can always be written as p.D=a (the only way of getting a is by having a hold of p itself anyway and there is no way to regain a hold of p once you've lost it), but that might not be too much of an issue.

Anyway, please keep in mind: It's very unlikely (and probably even undesirable) that all parents will ever be essentially CachedRepresentative (or equivalent) and convenience requirements will likely prevent us from making all of them EqualityById. I suspect this puts bounds on how far you can push the coercion framework and various homomorphism caching strategies.

@simon-king-jena
Copy link
Member Author

comment:10

Replying to @nbruin:

Replying to @simon-king-jena:

... those parts of Sage that still work with non-unique parents.

What follows is a bit of a rant that probably doesn't immediately affect your patch here.

Since the patch doesn't touch the way how things are cached, it doesn't affect the patch. But it does affect what I try to do next, so, let's discuss it.

One design consideration I haven't seen mentioned much is that some parent constructors have very good reasons to return a fresh copy every time one is requested (i.e., don't do the cached_function metaclass thing that UniqueRepresentation does).
...

For other objects, for instance elliptic curves, there are no convenient generator names to distinguish multiple copies (the fundamental constructor being EllipticCurve([a1,a2,a3,a4,a6])), but there is still a reasonable need for having multiple nonidentical copies (never mind the practical mutability properties).

It terms of the coercion framework I have hopes these are not too much of an issue.

They are an issue, if the equality-by-id paradigma is used in some parts and isn't used in other parts of the coercion framework. They are not an issue, if the coercion framework consistently uses equality-by-id even on non-unique parents, as I am now explaining.

I am fine with parents P1, P2 that evaluate equal but are non-identical. But then, I think it makes sense that Hom(P1,X) and Hom(P2,X) are non-identical, too. The question is whether they should evaluate equal, and here I actually have no very strong opinion, except that I like fast hash and fast equality test...

Let f be a coercion map from P1 to X. Now, we consider an element p of P2. In elder versions of Sage, it used to be the case that f was stored in a usual dictionary as an attribute of X. "Usual dictionary" means: If you have P2 and inspect the cache for a coercion to X, you'll find f. But f.domain() is not the same as P2.

Being "not the same as" P2 means: We can't simply coerce p via f---we must first coerce p into P1 and then apply f. Namely, if P1==P2, we certainly expect that there is a coercion from P2 to P1. But it could be that, for example, a non-trivial base change is involved in this coercion.

This is why now the coercion maps are stored in a MonoDict, whose keys are compared by identity: Looking for a coercion from P2 to X, you will correctly find that it has not been found yet, and so you have the chance to detect the coercion from P2 to X via P1 with a non-trivial base change.

I think this is a sane approach and it works fine with non-unique parents.

This is why I think that non-unique parents are not an issue in the coercion framework---but it is essential to be consistent, i.e., inside of the coercion framework one should consistently use comparison by identity even for non-unique parents (as in the P1-versus-P2 example above).

The consistent thing for such non-unique-by-necessity parents is to make "==" indeed just "is", so that uniqueness is simply forced. This means

EllipticCurve([1,2,3,4,5]) != EllipticCurve([1,2,3,4,5])

which is fine with me, since they are isomorphic but not uniquely so. However, I'm not sure how easy it will be to convince other people of such a strict interpretation of "==".

I think it is enough to convince the coercion model...

Of course such EqualityById but non-CachedRepresentative parents are useless as dictionary keys

In fact, this would be a problem. We can not simply use CachedRepresentation here, because this uses comparison-by-== (and, as I tried to explain, in the coercion model we want comparison-by-id even for non-unique parents).

But couldn't we make Homset have the ClasscallMetaclass, inherit from EqualityById and use the current cache logic from Hom in its __classcall__ method? The advantage would be that in some parts of Sage Homsets are created without calling the Hom function. Putting the Hom-logic into a __classcall__, we would have Homsets cached in this case, too!

Anyway, please keep in mind: It's very unlikely (and probably even undesirable) that all parents will ever be essentially CachedRepresentative (or equivalent) and convenience requirements will likely prevent us from making all of them EqualityById.

Inside of the coercion framework, I think nothing prevents us from comparing non-unique parents by identity---except old code. In that way, we have freedom to choose non-trivial coercions between non-identical copies of a parent.

@simon-king-jena
Copy link
Member Author

comment:11

PS: What I currently try on top of this patch is merely to implement the new coercion framework (element_constructor and friends) for Homsets. Only if this is achieved, I would think of changing the cache logic.

@simon-king-jena
Copy link
Member Author

comment:12

One remark on the reliability of the startup_time plugin:

With the previous version of the patches, we had:

-With 25% confidence, startup time increased by at least 0.25%
-With 44% confidence, startup time increased by at least 0.1%
+With 82% confidence, startup time decreased by at least 0.5%
+With 97% confidence, startup time decreased by at least 0.25%
+With 99.3% confidence, startup time decreased by at least 0.1%

Hence, we had a significant speed-up of 0.25% of startup time.

With the current version of the patches (which only differ in the removal of one doc test, but there is no change in the code!!), we have

-With 25% confidence, startup time increased by at least 0.25%
-With 44% confidence, startup time increased by at least 0.1%
+With 85% confidence, startup time increased by at least 0.5%
+With 98% confidence, startup time increased by at least 0.25%
+With 99.4% confidence, startup time increased by at least 0.1%

Hence, we have a significant slow-down of 0.25% of startup time.

But since the code didn't change, perhaps it isn't significant, after all?

@nbruin
Copy link
Contributor

nbruin commented Mar 4, 2013

comment:13

Replying to @simon-king-jena:

I think it is enough to convince the coercion model...

Yes, I agree that the coercion model can work with "is" even if "==" doesn't. It may surprise people every now and again, but the alternative is almost certainly worse (besides being less efficient).

In fact, this would be a problem. We can not simply use CachedRepresentation here, because this uses comparison-by-== (and, as I tried to explain, in the coercion model we want comparison-by-id even for non-unique parents).

Good catch. However, the comparison-by-== happens on construction parameters. Hence, if someone implements some equal-but-not-identical rings R and S then we'll have

sage: R=my_equal_but_not_identical_ring()
sage: S=my_equal_but_not_identical_ring()
sage: P1=PolynomialRing(R,'x')
sage: P2=PolynomialRing(S,'x') #this will look up P1
sage: P1 is P2
True
sage: P2.base_ring() is R
True
sage: S(1)+P2.1
Error: No coercion from S into P2.

That's a serious problem. If you're implementing something that can be used in further (functorial) (cached) parent constructions that are expected to have coercions they have to be EqualityById. Otherwise you get the mess as above.

I doubt things like elliptic curves disappear into such cached parent constructions, which was why I thought the problem might not be that bad.

But couldn't we make Homset have the ClasscallMetaclass, inherit from EqualityById and use the current cache logic from Hom in its __classcall__ method?

Homsets are important in coercion and have proper mathematical meaning. If you make them consider their arguments by identity, you're basically telling people "you can use == as an equivalence relation on your class to some extent, but as far as maps are concerned, id is what really matters". I'm fine with that, but you are restricting people's ability to define and use == as they see fit.

Inside of the coercion framework, I think nothing prevents us from comparing non-unique parents by identity---except old code. In that way, we have freedom to choose non-trivial coercions between non-identical copies of a parent.

I think you're right on that in principle. There is a bit of an issue that "coercions between equal-but-not identical" parents are fundamentally between ""equivalent" objects, whereas the coercion framework seems to work best when there's a hierarchy. It's an invitation to loops in the coercion graph, inviting non-commuting paths. However, I think that's more an issue of the mathematical problem that of how we're modelling it.

@simon-king-jena
Copy link
Member Author

comment:14

Needed to update both patches, because a patch in #14159 has changed.

Apply trac_14214-cython_homset.patch trac_14214-backup_cache.patch

@simon-king-jena
Copy link
Member Author

comment:15

I made some changes to the __cmp__ method of homsets. The cmp behaviour changed in a way that is a problem in my attempt to use the new coercion model for morphisms.

Hence, cmp should be changed, and this ticket needs work!

@simon-king-jena
Copy link
Member Author

Work Issues: Do not change cmp too much

@simon-king-jena
Copy link
Member Author

comment:16

Updated!

Apply trac_14214-cython_homset.patch trac_14214-backup_cache.patch

@simon-king-jena

This comment has been minimized.

@nbruin
Copy link
Contributor

nbruin commented Mar 14, 2013

comment:17

Do you have a reason to make domain and codomain cached? Their implementation already consists of just an attribute lookup. Are you getting a performance increase from it? There definitely are drawbacks:

  • the method gets wrapped, so there's extra junk sitting in memory for that
  • the cache takes up memory for an attribute that's already there anyway.

@simon-king-jena
Copy link
Member Author

comment:18

Replying to @nbruin:

Do you have a reason to make domain and codomain cached? Their implementation already consists of just an attribute lookup. Are you getting a performance increase from it?

Yes. See comment:2. A cached method is much much faster than a Python method that does nothing but return an attribute, and it is a little faster than directly accessing the attribute in a "timeit".

Things are slightly different if you access an attribute inside of a method. At some point, I had added a test method that accesses self._domain a million times or calls self.domain() a million times. In the case of HF in comment:2, the latter was faster, but in other cases (it seems: In cases with a short mro), direct attribute access was faster.

Anyway, that's why I like to keep the cached method and self._domain and self._codomain around.

Note, however, that one could get rid of self._domain and self._codomain. Namely, instead of

    self._domain = X

one could do

    self.domain.set_cache(X)

in Homset.__init__, and change the cached method into

@cached_method
def domain(self):
    pass

Note that keeping the _domain and _codomain attributes is actually a cheat:

  • The purpose of this ticket is to make Homset a cdef class.
  • I would like to make _codomain and _domain cdef attributes, because this would be faster than a cached method. But this would result in a layout conflict, as stated in comment:1
  • If you try to directly call Homset to create an instance, you would get an error, because you can't easily assign attributes to a cdef class. Hence, I work with the implicit assumption that all homsets are actually instances of non-cdef classes inheriting from Homset.

Do you see a way to solve this problem in a clean way (i.e., with cdef attributes)? I'd appreciate to know!

@nbruin
Copy link
Contributor

nbruin commented May 23, 2013

comment:54

Ping!. The present patches here pass doctests on the bot!

I think conceptually what you want in this case is really just a nondata descriptor that acts as a backstopper to provide (essentially) a read-only attribute stored in __cached_methods if no instance dictionary is available (or more accurately: if it isn't used) The code at [comment:40] provides that.

cdef class CachedAttributeType(object):
    cdef public str name
    def __init__(self, name):
        self.name = name
    def __get__(self,instance,cls):
        cdef dict CM
        try:
            CM = getattr(instance,"__cached_methods")
            return CM[self.name]
        except TypeError, KeyError:
            raise AttributeError("%s instance has no attribute '%s'"%(cls,self.name))

Since the __get__ here is very close the one of the faster code paths in a properly cythonized lazy_attribute (see #14615), defining the backed attribute either way:

  _domain=CachedAttributeType("_domain")
  
  @lazy_attribute
  def _codomain(self);
      raise AttributeError("You should have set me already!")

is going to give the same performance. You may be able to shave off a few nanoseconds by replacing some of the auto-generated cython code with explicit Python C-API calls. I suspect we'll have to live with many parents not having a __dict__ for the foreseeable future.

(note that any slowdowns due to deeper MRO will go away once the cython upgrade is in).

@simon-king-jena
Copy link
Member Author

comment:55

Replying to @nbruin:

Ping!. The present patches here pass doctests on the bot!

I think conceptually what you want in this case is really just a nondata descriptor that acts as a backstopper to provide (essentially) a read-only attribute stored in __cached_methods if no instance dictionary is available (or more accurately: if it isn't used) The code at [comment:40] provides that.

Agreed.

The first question is: Where shall we put the code?

cdef class CachedAttributeType(object):
    cdef public str name
    def __init__(self, name):
        self.name = name
    def __get__(self,instance,cls):
        cdef dict CM
        try:
            CM = getattr(instance,"__cached_methods")
            return CM[self.name]
        except TypeError, KeyError:
            raise AttributeError("%s instance has no attribute '%s'"%(cls,self.name))

I guess you either mean getattr(instance, "__cached_methods", None) or except (AttributeError,KeyError): (in any case, it should be with brackets, otherwise it will not catch the KeyError).

And I guess you could get it still a bit faster, if you do def __get__(self, Parent instance, cls), because our homsets are parents, and because the cdef public attribute __cached_methods could be accessed more quickly if the instance is known to be a Parent.

Since the __get__ here is very close the one of the faster code paths in a properly cythonized lazy_attribute (see #14615), defining the backed attribute either way:

  _domain=CachedAttributeType("_domain")
  
  @lazy_attribute
  def _codomain(self);
      raise AttributeError("You should have set me already!")

is going to give the same performance.

And that's the second question: Are you really sure that lazy_attribute will be as fast as CachedAttributeType in the case of no __dict__?

Note that I tested to cythonize lazy_attribute and to cimport Parent. It failed (circular import or so). So, we come back to the first question: Where shall we put the code? I think it makes sense to cdefine instance being Parent, but this would be impossible in sage/misc/lazy_attribute.pyx.

And the third question is: Shall we really make this ticket depend on a proper cythonized version of lazy_attribute? Note that currently the lazy attribute or cached attribute would never come into play, because all our homsets are Python subclasses of the (now cythonized) Homset.

@nbruin
Copy link
Contributor

nbruin commented May 23, 2013

comment:56

Replying to @simon-king-jena:

And I guess you could get it still a bit faster, if you do def __get__(self, Parent instance, cls), because our homsets are parents, and because the cdef public attribute __cached_methods could be accessed more quickly if the instance is known to be a Parent.

Excellent idea! if __cached_method isn't in a slot then we wouldn't want to use it anyway. It does mean that we lose flexibility: If for some reason, we have a class that can't inherit directly from Parent but does want to cooperate in this protocol, it won't be able to.

And that's the second question: Are you really sure that lazy_attribute will be as fast as CachedAttributeType in the case of no __dict__?

My timings suggested this. However, I suspect that providing cdef access to __cached_method will be a (small) measurable gain, so with that specialization you should be faster.

Note that I tested to cythonize lazy_attribute and to cimport Parent. It failed (circular import or so).

That can't be lazy_attribute, or at least not my version. It doesn't have any imports!

So, we come back to the first question: Where shall we put the code? I think it makes sense to cdefine instance being Parent, but this would be impossible in sage/misc/lazy_attribute.pyx.

If it's a parent-specific support class it can just go with the parent source, right?

And the third question is: Shall we really make this ticket depend on a proper cythonized version of lazy_attribute?

Not if you go the Parent-specific CachedAttribute route. That would be a bad name, by the way. Perhaps ParentAttributeBacker is better.

all our homsets are Python subclasses of the (now cythonized) Homset.

Talk about academic code then ... I'm sure we'll get cythonized homset instances at some point.

@simon-king-jena
Copy link
Member Author

comment:57

Replying to @nbruin:

Note that I tested to cythonize lazy_attribute and to cimport Parent. It failed (circular import or so).

That can't be lazy_attribute, or at least not my version. It doesn't have any imports!

sage.structure.parent imports lazy_attribute, and (to my surprise, I must say) cimporting Parent in sage.misc.lazy_attribute did crash.

So, we come back to the first question: Where shall we put the code? I think it makes sense to cdefine instance being Parent, but this would be impossible in sage/misc/lazy_attribute.pyx.

If it's a parent-specific support class it can just go with the parent source, right?

Hey, that's a good idea!

Not if you go the Parent-specific CachedAttribute route. That would be a bad name, by the way. Perhaps ParentAttributeBacker is better.

ParentNonDataDescriptor?

@simon-king-jena
Copy link
Member Author

comment:58

I think I will be able to provide a third patch with the non data descriptor tonight.

@simon-king-jena
Copy link
Member Author

comment:59

Replying to @simon-king-jena:

I think I will be able to provide a third patch with the non data descriptor tonight.

That said: Could we perhaps make this depend on #11935?

@nbruin
Copy link
Contributor

nbruin commented May 23, 2013

comment:60

Replying to @simon-king-jena:

That said: Could we perhaps make this depend on #11935?

You're doing the work, so you decide. Looking at the tickets, it seems to me this one is less invasive and hence easier to get in: It's already passing doctests; the only thing left is some backstop code that currently won't be exercised anyway.

@simon-king-jena
Copy link
Member Author

comment:61

Replying to @nbruin:

Replying to @simon-king-jena:

That said: Could we perhaps make this depend on #11935?

You're doing the work, so you decide. Looking at the tickets, it seems to me this one is less invasive and hence easier to get in: It's already passing doctests; the only thing left is some backstop code that currently won't be exercised anyway.

Yes. And in addition, when I tried to apply a dependency of #11935, I got several errors. So, let's first fix this, and then care about #12876 and finally #11935.

@nthiery
Copy link
Contributor

nthiery commented May 23, 2013

comment:62

For the record: #12876 and #11935 are applying smoothly on 5.10 beta4 and are passing all tests.
And have been in needs review for one year and rebased a couple times. And much depend on them.

So I don't really care about the order, but it would be nice to have them in shortly. And I am bit lazy to rebase them once again.

@simon-king-jena
Copy link
Member Author

comment:63

Replying to @nthiery:

For the record: #12876 and #11935 are applying smoothly on 5.10 beta4 and are passing all tests.

OK, if that's the case then these go in first, because

  1. they add features, while this is just about potential speed gain in future
  2. they are more likely to break if we wait any longer.

@nthiery
Copy link
Contributor

nthiery commented May 23, 2013

comment:64

Replying to @simon-king-jena:

Replying to @nthiery:

For the record: #12876 and #11935 are applying smoothly on 5.10 beta4 and are passing all tests.

OK, if that's the case then these go in first, because

  1. they add features, while this is just about potential speed gain in future
  2. they are more likely to break if we wait any longer.

Thanks Simon, I appreciate that; I know the work it costs you!

@nbruin
Copy link
Contributor

nbruin commented May 25, 2013

comment:65

Replying to @nbruin:

Replying to @simon-king-jena:

And I guess you could get it still a bit faster, if you do def __get__(self, Parent instance, cls), because our homsets are parents, and because the cdef public attribute __cached_methods could be accessed more quickly if the instance is known to be a Parent.

Excellent idea! if __cached_method isn't in a slot then we wouldn't want to use it anyway. It does mean that we lose flexibility: If for some reason, we have a class that can't inherit directly from Parent but does want to cooperate in this protocol, it won't be able to.

Note that _element_class on Parent is a @lazy_attribute. That code could benefit from a Parent instance too, so perhaps you DO want to use an (adapted form of) @lazy_attribute for both. In the case of _domain and _codomain, the wrapped function would be irrelevant.

@jdemeyer jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.2, sage-6.3 May 6, 2014
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.3, sage-6.4 Aug 10, 2014
@simon-king-jena
Copy link
Member Author

comment:70

TODO: Provide a branch...

@simon-king-jena
Copy link
Member Author

comment:71

Replying to @simon-king-jena:

sage.modular.abvar.homspace.EndomorphimSubring inherits from sage.categories.homset.Homset and from sage.rings.ring.Ring. Making Homset cdef with cdef attributes results in incompatible base classes.

  • One could think of creating a new cdef class sage.categories.homset.EndomorphismRing inheriting from sage.rings.ring.Ring and copying the methods implemented in sage.categories.homset.Homset, and use it in sage.modular.abvar.homspace.
    • However, that's code duplication, and I doubt that we want that.
  • One could think of dropping sage.rings.ring.Ring and let EndomorphismSubring entirely rely on the ring functionality that is provided by the category framework.

Note that this should now be a lot easier, by #18756.

  • I tried, but this means opening a can of worms. One would then also need to implement the new coercion model in sage.modular. But replacing EndomorphismSubring.__call__ by an _element_constructor_ does not work, because Homset has a __call__ method, too. Hence, we would then also need to use the new coercion model for sage.categories.homset. Yes, that's desirable. But it is not available in short time.

See #14279, but both this ticket and #14279 need work.

Short term improvements suggested in my patch

  • Accessing a Python attribute _domain defined for a Homset can be surprisingly slow. It is fast, if one has an instance of Homset itself. But it is slow if one considers an instance of a sub-class, such as EndomorphismSubring. It turns out that calling a cached method is faster than accessing the attribute! Hence, I suggest using cached method decorators on the domain() and codomain() methods (these methods are internally used in EndomorphismSubring anyway).

If I recall correctly, that's not the case any longer. However, if we are able to avoid double inheritance from Homset and Ring, there is not obstruction for making _domain and _codomain cdef.

@simon-king-jena
Copy link
Member Author

Changed dependencies from #14159, #12951, #13184 to #14279

@simon-king-jena
Copy link
Member Author

Changed work issues from How to store domain and codomain? to none

@simon-king-jena
Copy link
Member Author

comment:73

Since the patches have bit-rotted for so long, I guess it makes sense to start from scratch. Most part of the discussion here was about different ways to store the _domain and _codomain. Since it should now be possible to have fast ring-behaviour defined via category framework, I suppose one can simply make them cdef attributes, which should be the fastest solution.

@simon-king-jena
Copy link
Member Author

Changed dependencies from #14279 to #14279 #18756

@simon-king-jena
Copy link
Member Author

Changed dependencies from #14279 #18756 to #14279 #18758

@simon-king-jena
Copy link
Member Author

comment:75

I confused two tickets. In fact, it is #18758 which makes it possible to define a ring structure via category framework without too much slow-down.

@mkoeppe mkoeppe removed this from the sage-6.4 milestone Dec 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants