
Refactor learning module #1873

Merged
merged 30 commits into master from learning-refactor Jul 11, 2023

Conversation


@luisfpereira luisfpereira commented Jun 29, 2023

This PR refactors the learning module so that an estimator consistently uses space (instead of space and/or metric).


Additionally, some estimators or optimizers were "promoted" to their own class. In particular:

  • in FrechetMean

    • optimizers: BaseGradientDescent, GradientDescent, BatchGradientDescent, and AdaptiveGradientDescent

    • algorithms: FrechetMean, CircleMean, ElasticMean, LinearMean

  • in GeodesicRegression

    • optimizers: RiemannianGradientDescent

  • in ExponentialBarycenter

    • optimizers: GradientDescent

Notice these classes are not created from scratch, but carved out of already-existing code.

Regarding the optimizers, the main goal is to make it easier to control hyperparameters (following what has been done in geomstats.numerics.optimizers), as well as to look for common abstractions we may be missing (for now there's nothing very obvious, but it is nice to see adaptations of gradient descent appearing "everywhere" -> maybe later we can merge them). See the sketch of the pattern right after this paragraph.
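
For concreteness, here is a minimal sketch of the promoted-optimizer pattern, assuming the usual geomstats metric API (`exp`/`log`/`norm`); the class contents, default values, and the loop body are illustrative assumptions, not the exact code of this PR:

```python
class BaseGradientDescent:
    """Hyperparameters shared by the gradient-descent variants."""

    def __init__(self, max_iter=32, epsilon=1e-4, init_step_size=1.0):
        self.max_iter = max_iter
        self.epsilon = epsilon
        self.init_step_size = init_step_size


class GradientDescent(BaseGradientDescent):
    """Default variant: a schematic Karcher-flow loop."""

    def minimize(self, space, points):
        mean = points[0]  # start from the first sample
        for _ in range(self.max_iter):
            # Mean of the log maps of the samples at the current estimate.
            tangent_mean = space.metric.log(points, mean).mean(axis=0)
            if space.metric.norm(tangent_mean, mean) < self.epsilon:
                break
            mean = space.metric.exp(self.init_step_size * tangent_mean, mean)
        return mean
```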

Regarding the estimators, it makes it easier to identify which algorithm we are actually using. Notice the API is not impacted: e.g. to use CircleMean, it suffices to pass a circle to FrechetMean, as in the usage sketch below.
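
A usage sketch (the dispatch to CircleMean happens internally, so the import and call sites stay unchanged):

```python
from geomstats.geometry.hypersphere import Hypersphere
from geomstats.learning.frechet_mean import FrechetMean

circle = Hypersphere(dim=1)  # the circle S^1
data = circle.random_uniform(n_samples=10)

mean = FrechetMean(circle)  # expected to resolve to the CircleMean algorithm
mean.fit(data)
print(mean.estimate_)
```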

I've added the method set to set parameters of e.g. an optimizer used within an estimator. This serves two purposes:

  1. it simplifies the __init__ method, since such parameters can no longer be defined there
  2. it solves the issue of having parameters that are silently ignored: e.g. GeodesicRegression with method="extrinsic" requires different parameters than with method="intrinsic". set verifies that the parameter we are trying to set actually exists, so it can prevent bugs.

I've made set return self such that the following snippet of code is valid (this is very common in OOP, but not seen very often in Python -> let me know how natural it feels to you):

estimator = GeodesicRegression(space, method="riemannian").set(max_iter=20)
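
A minimal sketch of what set could look like, assuming the tunable parameters live on an optimizer attribute (the attribute name and error handling are assumptions, not the exact implementation):

```python
class ChainableEstimator:
    """Illustrative base class; `optimizer` is an assumed attribute name."""

    def __init__(self, optimizer):
        self.optimizer = optimizer

    def set(self, **kwargs):
        """Set optimizer parameters, failing on names that do not exist."""
        for name, value in kwargs.items():
            if not hasattr(self.optimizer, name):
                raise ValueError(f"Unknown parameter: {name}.")
            setattr(self.optimizer, name, value)
        return self  # returning self enables the chaining shown above
```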


I've added a note with "required metric methods" to the docstrings of most of the estimators, so that we can understand the minimum we need to define in order to run them.
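
For illustration, such a note might read as follows (the wording and the method list are a sketch, not an exact docstring from the PR):

```python
class SomeEstimator:
    """Some estimator (illustration only).

    Notes
    -----
    * Required metric methods: `exp`, `log`.
    """
```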


In TangentPCA and ToTangentPCA I use the notion of _geometry in order to also be able to use the space to compute the exp: if a space is equipped, geometry = space.metric; otherwise, geometry = space, and the relevant methods (e.g. exp) are called on geometry.
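
A hedged sketch of that selection logic (the helper name and the equipped-check are assumptions based on the description above):

```python
def _choose_geometry(space):
    """Pick the object whose exp/log the estimator will call."""
    if getattr(space, "metric", None) is not None:
        return space.metric  # equipped space: delegate to the metric
    return space  # unequipped space: use the space's own exp
```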

A lot of other little things were updated, but mostly for consistency among estimators.

There's still work that needs to be done in this part, especially related to inheritance.

EDIT: tests are successful now!

closes #715


@luisfpereira luisfpereira marked this pull request as ready for review July 10, 2023 16:22

codecov bot commented Jul 11, 2023

Codecov Report

Merging #1873 (f6b96ff) into master (4f828b6) will decrease coverage by 3.01%.
The diff coverage is 91.03%.

@@            Coverage Diff             @@
##           master    #1873      +/-   ##
==========================================
- Coverage   91.44%   88.43%   -3.01%     
==========================================
  Files         141      141              
  Lines       13450    13570     +120     
==========================================
- Hits        12298    11999     -299     
- Misses       1152     1571     +419     
| Flag | Coverage Δ |
| --- | --- |
| autograd | 87.66% <91.03%> (?) |
| numpy | ? |
| pytorch | 82.35% <69.03%> (+0.02%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| geomstats/learning/aac.py | 45.60% <ø> (-52.80%) ⬇️ |
| geomstats/learning/expectation_maximization.py | 81.12% <80.15%> (+1.23%) ⬆️ |
| geomstats/learning/preprocessing.py | 86.54% <81.49%> (+3.78%) ⬆️ |
| geomstats/learning/geodesic_regression.py | 83.61% <86.96%> (+3.74%) ⬆️ |
| geomstats/learning/mdm.py | 97.73% <92.31%> (-2.27%) ⬇️ |
| geomstats/learning/riemannian_mean_shift.py | 94.03% <92.60%> (+0.28%) ⬆️ |
| geomstats/learning/kalman_filter.py | 95.35% <93.88%> (-0.51%) ⬇️ |
| geomstats/learning/wrapped_gaussian_process.py | 97.78% <94.74%> (+1.49%) ⬆️ |
| geomstats/learning/frechet_mean.py | 95.74% <95.08%> (-0.28%) ⬇️ |
| geomstats/learning/kmeans.py | 89.42% <96.43%> (+3.70%) ⬆️ |
| ... and 10 more | |

... and 26 files with indirect coverage changes



@ninamiolane ninamiolane left a comment


So much cleaner and more interpretable, thank you so much!!

Two remarks:

  • the optimizers (e.g. BaseGradientDescent) currently live in the files where they're used (e.g. in frechet_mean): since the optimization logic is now abstracted, would it make sense to have a separate folder, in the spirit of what you did for the numerics folder?
  • pymanopt provides optimizers: could we make our abstraction of optimizers compatible with theirs, so that we can use their optimizers on demand?

These two remarks are not necessarily for this PR; we can talk about them in another one, if they make sense.

Thanks again!!

@luisfpereira

@ninamiolane, the optimizers are still particular to the cases where they're defined (that's why I keep them there). The goal in the medium term would be to see if we can find a common abstraction and then move them to e.g. numerics.

Having a seamless integration with pymanopt would be great. I think we can attempt this soon.

@luisfpereira luisfpereira merged commit 78025da into geomstats:master Jul 11, 2023
12 of 15 checks passed
@luisfpereira luisfpereira deleted the learning-refactor branch July 11, 2023 15:28
Labels
learning, refactoring
Development

Successfully merging this pull request may close these issues.

(nkoep): Move this into the OnlineKMeans class.