Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make submersion and tangent_submersion abstract in LevelSet #1681

Merged
merged 13 commits into from Nov 17, 2022

Conversation

luisfpereira
Copy link
Collaborator

Partially answers #1680 for LevelSet.

Missing:

  • docstrings

Need to discuss:

  • should value somehow also be forced to be defined or passing as argument to parent works fine? (I think it should be OK as argument)
  • is the API naming OK? e.g. point and vector, point, respectively?

@ninamiolane @ymontmarin

@ymontmarin
Copy link
Collaborator

ymontmarin commented Oct 18, 2022

Looks promising !!!

  • I think embedding_space should follow the same logic, as you know the submersion and its tangent before any instantiation you know the space in which it submerse. Again, it can look strange that we can instantiate the space outside

maybe with something like

@abs.abstract
def define_embedding_space(self):
    return

@property
def embedding_space(self):
    if not hasattr(self, '_embedding_space'):
         self._embedding_space = self.define_embedding_space()
    return self._embedding_space

or alternatively:

self._embedding_space = self.define_embedding_space()

in __init__ but then we have to be sure that child class will use super().__init__()

  • Maybe tangent does not necessary needs to be abstract and we can implement a default autodiff implementation instead, for class with no close form of the tangent immersion

Let us discuss !

Copy link
Collaborator

@ninamiolane ninamiolane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I also like the fact that having them as abstract method enforces a consistent API of these functions.

geomstats/geometry/base.py Outdated Show resolved Hide resolved
@ninamiolane
Copy link
Collaborator

ninamiolane commented Oct 18, 2022

Re: "should value somehow also be forced to be defined or passing as argument to parent works fine? (I think it should be OK as argument)"

I think that value could be hardcoded to 0. because the submersion can always be redefined (by subtracting value) to have value zero.

Re: API
Looks good!

@ninamiolane
Copy link
Collaborator

ninamiolane commented Oct 18, 2022

Another inconsistency in LevelSet: it has default_coords_type being intrinsic, but it uses the shape of the embedding space. The docstring also mention that the shapes of points should be [..., dim] which seems to refer to intrinsic coordinates.

@ninamiolane
Copy link
Collaborator

Another comment: tangent_submersion should me a method but not be abstract but it does not have to be overwritten. If it is not provided, then we use gs.autodiff.jacobian(submersion). <--- that piece of code would go in the LevelSet method.

@ninamiolane
Copy link
Collaborator

Another question: we want to put define_embedding_space to prevent the users to change the embedding space once the object (say sphere) of a class (say Hypersphere) subclassing LevelSet has been instantiated.

But what prevents users to still do "sphere.embedding_space = new_space" again?

@ymontmarin
Copy link
Collaborator

Another inconsistency in LevelSet: it has default_coords_type being intrinsic, but it uses the shape of the embedding space. The docstring also mention that the shapes of points should be [..., dim] which seems to refer to intrinsic coordinates.

Maybe my question is stupid but: we agree we have intrinsic => shape = [..., dim], but what about the other way around ? I guess there is some case in Poincaré disk where extrinsinc coordinate are also shape = [..., dim] ? Should we do a check.

@ymontmarin
Copy link
Collaborator

Another comment: tangent_submersion should me a method but not be abstract but it does not have to be overwritten. If it is not provided, then we use gs.autodiff.jacobian(submersion). <--- that piece of code would go in the LevelSet method.

I think it is what is written in PullbackDiffeo

@ymontmarin
Copy link
Collaborator

Another question: we want to put define_embedding_space to prevent the users to change the embedding space once the object (say sphere) of a class (say Hypersphere) subclassing LevelSet has been instantiated.

But what prevents users to still do "sphere.embedding_space = new_space" again?

Because embedding_space will be a property without a setter hook, so it can't be set by the user

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@luisfpereira
Copy link
Collaborator Author

@ninamiolane @ymontmarin I think this may be ready. A new review from your side may be worthwhile, as several things have changed. I think it implements "our vision".

@codecov
Copy link

codecov bot commented Oct 27, 2022

Codecov Report

Merging #1681 (557f9af) into master (4b031ac) will decrease coverage by 11.39%.
The diff coverage is 91.16%.

❗ Current head 557f9af differs from pull request most recent head 035d053. Consider uploading reports for the commit 035d053 to get more accurate results

@@             Coverage Diff             @@
##           master    #1681       +/-   ##
===========================================
- Coverage   91.95%   80.56%   -11.38%     
===========================================
  Files         129      117       -12     
  Lines       13327    12604      -723     
===========================================
- Hits        12253    10153     -2100     
- Misses       1074     2451     +1377     
Flag Coverage Δ
autograd ?
numpy ?
pytorch 80.56% <91.16%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
geomstats/geometry/special_euclidean.py 88.47% <60.00%> (-0.13%) ⬇️
geomstats/geometry/discrete_curves.py 91.70% <87.50%> (-1.28%) ⬇️
geomstats/geometry/hyperboloid.py 84.49% <90.91%> (-0.06%) ⬇️
geomstats/geometry/special_orthogonal.py 97.52% <90.91%> (-0.41%) ⬇️
geomstats/geometry/base.py 91.93% <96.00%> (-0.52%) ⬇️
...omstats/geometry/full_rank_correlation_matrices.py 100.00% <100.00%> (ø)
geomstats/geometry/grassmannian.py 93.94% <100.00%> (+0.33%) ⬆️
geomstats/geometry/hypersphere.py 81.33% <100.00%> (-6.69%) ⬇️
geomstats/geometry/pre_shape.py 65.93% <100.00%> (-26.55%) ⬇️
geomstats/geometry/stiefel.py 76.67% <100.00%> (-2.99%) ⬇️
... and 59 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@ninamiolane ninamiolane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you! I've just put minor comments. Maybe double check the docstrings of the submersions so that they match the fact that the value is now 0 (I've put one comment to that effect in SE(n).

I'm still not a huge fan of the _define_embedding_space, because:

  • this would be the only method across all manifolds that is "talking to the user" (with the keyword "define") and this feels wrong. All the other methods are purely impersonal and geometry-facing.
  • if it's main role is to prevent the user from changing the embedding_space attribute, then this goes against our vision of "not preventing anything, but letting the user use the manifolds at their own risk". This would also be prevented by the new gold rule which is "never modify the attributes of a manifold, unless it's its numerics objects (ExpSolver)".

But I forget the details of our debate about this : happy to go with it if I'm forgetting important aspect of this choice.

geomstats/geometry/base.py Outdated Show resolved Hide resolved
geomstats/geometry/base.py Outdated Show resolved Hide resolved
geomstats/geometry/grassmannian.py Show resolved Hide resolved
geomstats/geometry/special_euclidean.py Outdated Show resolved Hide resolved
geomstats/geometry/special_euclidean.py Show resolved Hide resolved
@luisfpereira
Copy link
Collaborator Author

luisfpereira commented Nov 2, 2022

@ninamiolane, regarding _define_embedding_space I would say the main strength of this implementation is that it forces the contributor to define an embedding space (since we are asking _define_embedding_space as abstract method). In other words, the "contract" for creating a new child of LevelSet is very clear for the contributor.

I can only see an alternative to this, but without the strength we have above (i.e. a less knowledgeable contributor may not know he/she has to define embedding_space):

  • embedding space is define in __init__ as self.embedding_space = ...

The alternative of having a property (or function) simply called embedding_space does not work because it would instantiate the embedding space everytime we do space.embedding_space.

Do you have any alternatives in mind? Would we prefer the alternative I mentioned to the current implementation?

Edit: What I would probably like do to make the current implementation more coherent with the second point you make is to not have embedding_space as a property, but have it defined in LevelSet has self.embedding_space = self._define_embedding_space(). What do you think?

@ninamiolane
Copy link
Collaborator

I would indeed prefer the alternative of the attribute. To tell the less knowledgeable contributor he/she has to define embedding_space, we could have a clear error message: the first time self.embedding_space is needed, if it is not defined then the error tells the user to define it.

I know we talked a lot about this already. Maybe we could poll everybody at the next meeting to get a vote?

@ymontmarin
Copy link
Collaborator

ymontmarin commented Nov 3, 2022

From what i remember from the discussion the starting point was that in python methods implementation are defined when child class are written, meaning it is decided before any instantiation. Whereas argument of __init__ may not (it allows definition to be outside the object). In addition, having a method defining an attribute only once ensure it canot be modified in the same way method should not be modify.

As we discuss, it is more natural to have immersion (and its derivative) as a method, since it is a function anyway, and it is known when child class are written. Mathematically, the immersion is the tuple $(\phi, E)$ so it makes sense the the embedding space should be at the same level as the immersion, hence a method.

But if we step back, what we really want is that embedding_space implementation is written when a child clas is written, at the same moment as the immersion (it is weird to require that immersion should be implemented when child class is written and not requiring the definition of the space it goes in).
To be more precise, in the current implementation it is possible to have code such as:

class Child(LevelSet):
    def immersion(self, x):
          XXX

e = Euclidean(n)
a = Child(dim=m, embedding_space=e)

which is pretty weird to me, because immersion implementation is written inside the object, and the space it goes to is written outside.
The most natural way of ensuring this cannot happen, is to use method (or some python dark magic) and mark it as abstract.

But I agree with @ninamiolane that a method _define_embedding_space feel weird and different. But actually, I do not see the word define as a name talking to the coder, but as a description of what happen in the method, something like _instantiate_embedding_space is more accurate. Still, it is different from the other methods name that are mathematic. So lexically i totally agree that it does not feel right !

So for a better use experience, it could be a property ! But then the instantiation would occur every time the attribute is asked.

A python magic that can work is this:

class LevelSet(ABC):
    def __init__(self, dim, default_coords_type="intrinsic", **kwargs):
        super().__init__(dim=dim, default_coords_type=default_coords_type, **kwargs)

    def __getattribute__(self, name):
        if name == 'embedding_space':
            if not hasattr(self, '_embedding_space'):
                self._embedding_space = super().__getattribute__('embedding_space')
            return self._embedding_space
        return super().__getattribute__(name)

    @property
    @abstractmethod
    def embedding_space(self):
        pass

    @abstractmethod
    def immersion(self):
        pass

In this way, embedding_space is defined using a property so it is closer to mathematic naming, it is marked as abstract and should be written in the child class with immersion but we still have the lazy computation which call it only once and it does not impact the user experience.

We will have

class Hypersphere(LevelSet):
    def __init__(self, dim):
        super().__init__(dim=dim, default_coords_type='extrasinc', **kwargs)

    @property
    def embedding_space(self):
        return Euclidean(self.dim + 1)

    def immersion(self, x):
        return gs.norm(x) - 1

@luisfpereira
Copy link
Collaborator Author

I think @ymontmarin's Child example shows clearly why it may be inadequate to have the definition at __init__ level.

I fully disagree with the idea of instantiating it every time: it will negatively impact performance and will not allow to change any numerical parameter of algorithms associated with a given embedding space.

I don't like so much the tricks of checking if it hasattr (for me it looks like an unnecessary condition - the impact in performance is negligible, but there's still a dictionary key lookup every time we access embedding space). Besides, we know it will have, for sure, the attribute after instantiation. This is the reason why I started this PR by having embedding_space has a property and changed it (today) to simply have it has attribute (instantiated in the init of LevelSet) - also, we need to instantiate it in LevelSet anyway to have access to shape. This is strong due to our "new" way of thinking stating we allow mathematical attributes to change, but at user's risk.

The semantics of _define_embedding_space do not look so strange to me. It is not so far from _create_basis we already have in several other classes (the reason of being of that method is performance).

I agree with the poll @ninamiolane, but after our discussions, I kind of think the solution current implemented in the PR is the strongest technically speaking.

In order to move on, do you agree I merge this solution for now (after addressing @ninamiolane's comments) and then we update it based on our poll? (The implementation on this PR is by far much cleaner than the currently available in master.)

@ymontmarin
Copy link
Collaborator

ymontmarin commented Nov 3, 2022

@LPereira95 I also add some minor comments in the code.

About the main discussion:

  • under the hood, any attribute access is a key lookup, which is a constant time operation. Having 2 instead of one does not change performance
  • I also fully disagree with the idea of instantiating it every time, hence the classical trick of using xxx and _xxx
  • "also, we need to instantiate it in LevelSet anyway to have access to shape." I don't understand the difference you are talking about. If you use self.embbeding_space.shape with the property implementation, it will call _define_embedding_space so the instantiation also happen in __init__. In any case, none of the above solve the problem of circular need: _define_embedding_space may need attributes (e.g. dim) that are defined in __init__ but __init__ need embedding_space.shape that is define in _define_embedding_space
  • So maybe the semantic should simply be _create_embedding_space ?

I think the circular issue (and other comments) need to be addressed before merging. I only give suggestion for the love <3 of geomstats; let me know for curiosity, what people want :)

An idea for circular loop issue while following @LPereira95 implementation choice:

class LevelSet(ABC):
    def __init__(self, dim, shape=None, default_coords_type="extrinsic", **kwargs):
        self.embedding_space = self._create_embedding_space(dim, shape, default_coords_type, **kwargs)
        if shape is None:
            shape = self.embedding_space.shape
        super().__init__(dim=dim, shape=shape, default_coords_type=default_coords_type, **kwargs)

    @abstractmethod
    def _create_embedding_space(self, dim, shape, default_coords_type, **kwargs):
        pass

    @abstractmethod
    def immersion(self):
        pass

Such that implementation of Hypersphere looks like

class Hypersphere(LevelSet):

    def _create_embedding_space(self, dim, **kwargs):
        return Euclidean(dim + 1)

    def immersion(self, x):
        return gs.norm(x) - 1

@luisfpereira
Copy link
Collaborator Author

Thanks a lot @ymontmarin, we deeply appreciate your comments and suggestions. All this discussion is to try to come up with a solution that pleases everyone and is technically robust.

Thanks for the clarification on the constant time operation for key lookup (in fact I should have thought better before my comment before).

For the third point, I mean that since we definitely have to instantiate embedding_space in the init of LevelSet to access shape, it looks unnecessary to me to instantiate it "implicitly" by having a property verifying if _embedding_space already exist and instantiating it if not. The current solution is more explicit in my opinion.

Your point on circularity is in fact interesting, but I would argue there's not really a circularity. What's happening is that we normally need the arguments passed in child init to instantiate the embedding space. I would say it is a good practice to store init args as attributes (e.g. sklearn forces you to do that to allow a better __repr__), so this solution actually forces us to follow a good practice. Passing arguments to _define/create_embedding_space does not feel right to me. The case of the Hypersphere is a little particular in the sense that it asks for dim, which is also an attribute of Manifold. (But I can fully understand why you are raising this issue, and I'm completely open to find out a better solution)

Both _create and _define work for me.

@ymontmarin
Copy link
Collaborator

ymontmarin commented Nov 3, 2022

Ok I get the implicit reasoning for the instanciation, I agree with you !
I agree with the sklearn practice, it is also the practice i follow !

But in our case the argument are setted as attribute of the object during the call super().__init__(...) but this line is AFTER the line self.embedding_space = self._create_embedding_space(), which mean than when _create_embedding_space is called the arguments have not been setted as attribute yet ! So self has no dim attribute and _create_embedding_space can't access dim.

The natural solution (which is what happen in sklearn etc...) would be to have super().__init__(...) BEFORE self.embedding_space = self._create_embedding_space(), but in this situation, we also need to have self.embedding_space to access its shape as default shape BEFORE super().__init__(...). Hence the circularity.

Maybe I am wrong but for me this code will fail:

class LevelSet(Manifold):
    def __init__(self, dim, shape=None, default_coords_type="extrinsic", **kwargs):
        self.embedding_space = self._create_embedding_space()
        if shape is None:
            shape = self.embedding_space.shape
        super().__init__(dim=dim, shape=shape, default_coords_type=default_coords_type, **kwargs)

    @abstractmethod
    def _create_embedding_space(self):
        pass

    @abstractmethod
    def immersion(self, x):
        pass

class Hypersphere(LevelSet):

    def _create_embedding_space(self):
        return Euclidean(self.dim + 1)

    def immersion(self, x):
        return gs.norm(x) - 1

And maybe I did not totally understand your response

@ymontmarin
Copy link
Collaborator

@LPereira95 Nevermind, I dit not see you were resetting dim as attribute manually (it is set twice but I don't think it is a problem). So I think it is ok then !

@ymontmarin
Copy link
Collaborator

class LevelSet(Manifold):
    def __init__(self, dim, shape=None, default_coords_type="extrinsic", **kwargs):
        self.embedding_space = self._create_embedding_space()
        if shape is None:
            shape = self.embedding_space.shape
        super().__init__(dim=dim, shape=shape, default_coords_type=default_coords_type, **kwargs)

    @abstractmethod
    def _create_embedding_space(self):
        pass

    @abstractmethod
    def immersion(self, x):
        pass

class Hypersphere(LevelSet):
    def __init__(self, dim):
        self.dim = dim
        super().__init__(dim)

    def _create_embedding_space(self):
        return Euclidean(self.dim + 1)

    def immersion(self, x):
        return gs.norm(x) - 1

will indeed work and looks pretty clean to me :) Sorry for the misunderstanding !

@luisfpereira
Copy link
Collaborator Author

@ninamiolane I think I've addressed everything here. I'll just ensure tests are passing and then I'll merge.

@ymontmarin
Copy link
Collaborator

ymontmarin commented Nov 16, 2022

Hello @LPereira95

What about the shapes computation ?
image

@luisfpereira
Copy link
Collaborator Author

Hi @ymontmarin!

It looks like that comments are pending or I really missed them out...

@ymontmarin
Copy link
Collaborator

@LPereira95 What does the label "pending" mean ? I am not so used to review on github

@luisfpereira
Copy link
Collaborator Author

Your review has not been submitted @ymontmarin. Only you have access to it. There should be a button on top with "Submit review".

Copy link
Collaborator

@ymontmarin ymontmarin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First time using the review tool !

tuple(-n + 1 + i for i in range(n - 1))
if gs.ndim(point) > len(self.shape)
else tuple(-n + i for i in range(n))
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't axis = tuple(range(n - len(self.shape), n)) working ? I think you want the all to apply to the axis involved in the shape of point which are the latest. It may be shorter and even work with general batching

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ymontmarin!

This solution fails for:

space = Hyperboloid(3)

space.belongs(space.random_point())

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may work in all cases:

axis = tuple(range(n - len(self.shape), n)) if (n - len(self.shape)) >= 0 else ()

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not so sure this version increases readability (it may even make it harder to understand). Besides, we can easily achieve general batching by replacing 1 by gs.ndim(point) - len(self.shape) in the first range...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a simplified and more readable version may be:

n_batch = gs.ndim(point) - len(self.shape)
axis = tuple(range(-n + n_batch, 0))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it, none of the two solutions are very universal.

point is of shape [*batch_shape, *self.shape]
immersed_point is of shape [*batch_shape, *sub_shape]

If the subermesion only give a real sub_value_shape = () otherwise we want the all on the sub_shape axis.

So maybe a sorter, more readable and universal way that work is:

        belongs = self.embedding_space.belongs(point, atol)
        if not gs.any(belongs):
            return belongs

        submersed_point = self.submersion(point)
        constraint = gs.isclose(submersed_point, 0.0, atol=atol)

        sub_ndim = gs.ndim(submersed_point) - gs.ndim(point) + len(self.shape)
        if sub_ndim:
            constraint = gs.all(constraint, axis=tuple(range(-sub_ndim, 0)))

        return gs.logical_and(belongs, constraint)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LPereira95 My message was written before i saw your two messages ! I think it is almost the same :) (apart from the tuple construction here being inside the if)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we came up with similar solutions independently. These ones I think are pretty universal. I'll merge now.

if gs.ndim(base_point) > len(self.shape)
or gs.ndim(vector) > len(self.shape)
else tuple(-n + i for i in range(n))
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

geomstats/geometry/special_euclidean.py Outdated Show resolved Hide resolved
@ymontmarin
Copy link
Collaborator

@LPereira95 Ok !!! Thank you :) sorry I miss that, I left out the only two points (qhape stuff) that was not adress with the discussion and so on

@luisfpereira
Copy link
Collaborator Author

Test failures are unrelated... merging.

@luisfpereira luisfpereira merged commit 603c228 into geomstats:master Nov 17, 2022
@luisfpereira luisfpereira deleted the level_set branch November 22, 2022 14:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants