-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconstruction measures #9
Reconstruction measures #9
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a very quick look, the code looks fine but does need a lot more doc. Why did you include a separate version of the scalers instead of basing your work on top of #8?
Another question to address is how to deal with datasets for examples. I would like to make sure skcosmo is NOT coupled to librascal or ASE, since it should be usable and useful for people with different workflows. One way of doing this would be to host pre-computed SOAP vectors for the example datasets and make them available in some way like from skcosmo_datasets import CSD
X = CSD.descriptor.SOAP_power_spectrum
Y = CSD.properties.CS_local |
This branch was not ready when I started this branch. I will rebase the branch as soon as #8 is merged.
For the degenerate datasets this interface is offered. from skcosmo.datasets import load_degenerate_manifold
degenerate_manifold = load_degenerate_manifold()
degenerate_manifold.data.SOAP_power_spectrum
degenerate_manifold.data.SOAP_bispectrum
degenerate_manifold.DESCR I did not use descriptor because this becomes a bit ambiguous with the description of the dataset as used in sklearn. I havent added the CSD dataset because this would be something for a dataset with a downloadable link which we havent decided yet. The dataset is not essential for this pull request. |
@agoscinski now that a documentation infrastructure has been merged into main, please add documentation to |
4310f3f
to
4c997a8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Alex,
This should be split up into multiple PRs. I would separate PRs for:
- The small changes to
setup.cfg
anddocs/requirements
that can be merged quickly - The small changes to the docs setup (napoleon) that can be merged quickly
- loading degenerate manifold data
- orthogonal procrustes
- ridgeregression2foldCV
- metrics/reconstruction_measures
- model_selection
Any tests/examples/gitignore can be put with the corresponding functions. Right now this PR is way to large to feasibly go through piece by piece, and it will be much easier to merge smaller bits at a time, even if some merges have to wait for others to be put in.
I would suggest to split these into individual commits when rebasing the PR instead of individual PRs, since everything is part of reconstruction measures and is used for the examples, therefore self contained. It makes more sense to me and is less work.
|
I'm just wary of making such a large PR, when some of it can be ready to merge fairly quickly, whereas other pieces will take a bit more iteration. Is there any reason not to make separate PRs? |
It makes more sense to me and is less work. |
It will likely result in better code, quicker merging, and will be easier for your fellow developers to review your code. I don't think it will be significantly more work for you to split it up and focus on getting smaller bits of code merged at a time. |
All of the bits depend on the reconstruction measures. If you change something in |
Codecov Report
@@ Coverage Diff @@
## main #9 +/- ##
============================================
- Coverage 100.00% 76.67% -23.33%
============================================
Files 1 12 +11
Lines 1 283 +282
Branches 0 35 +35
============================================
+ Hits 1 217 +216
- Misses 0 57 +57
- Partials 0 9 +9
Continue to review full report at Codecov.
|
examples/linear_model/plot_orthogonal_regression_nonanalytic_behavior.py
Outdated
Show resolved
Hide resolved
examples/linear_model/plot_orthogonal_regression_nonanalytic_behavior.py
Outdated
Show resolved
Hide resolved
examples/linear_model/plot_orthogonal_regression_nonanalytic_behavior.py
Outdated
Show resolved
Hide resolved
00699d8
to
de78bcd
Compare
Starting from commit b7b6c80 ("RidgeRegression2FoldCV added alpha type reg method") are new changes. Everything before is just a rebase. The commit message are relatively verbose about the additional changes. EDIT: it actually starts one commit before the one mentioned 788c769 "replace setUp to setUpClass to reduce test time" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good overall, I have a few high level questions/remarks!
This looks good to me overall, however given the size of the PR I would appreciate if someone else could give it a look as well! Also, did you complete the last point in your first comment?
Finally, the git history is a bit all over the place, could you squash related commits together? |
Okay let me squash some things, check the last point (havent done it yet) and then @rosecers can maybe have a second look on it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I resolved the comments because they were repetitive and I wanted to reduce the discussion to a single comment.
ortho_group.rvs(cls.features_small.shape[1]) | ||
) | ||
|
||
def test_global_reconstruction_error_identity(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a representative for all review comments about adding docs to the two-liner tests:
But you understood this from the code instantly, the code is self-explanatory. I don't think one should be forced to add doc to such simple code. Especially it is just repetitive. I will add the suggestions you send to it, because you already put the work of writing them. But I dont think one should be required to add doc to such simple code.
If you're planning to commit the suggestions, you can click "commit" and it will apply the suggestion and resolve the issue. I'd prefer to leave them unresolved until this happens, so the patches do not get lost in the fray. |
examples/linear_model/plot_orthogonal_regression_nonanalytic_behavior.ipynb
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @agoscinski, it looks overall good, although there are outstanding comments and questions that need to be addressed before rebasing/merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rebase the commits to get a clean commit history then feel free to merge.
the bash for loop did not include notebooks in subdirectories. the simplest way to solve this seemed to be using find. when copying the example notebooks to ./tox/examples no subdirectories are created.
allows the same test function being executed for multiple inputs
the initialization has only be done one time for the all class related tests
which can additionally return overlapping train and test splits
it is linear regression with the additional constraint that the weight matrix has to be an orthogonal matrix/projector
it is in certain cases a more efficient/accurate ridge regression, using a 2-fold cross-validation scheme, in comparison to sklearn.linear_model.RidgeCV
it includes GFRE, GFRD and LFRE like reconstruction measures
modules: model_selection, metrics and linear_model
2c63543
to
342850a
Compare
Implements reconstruction measures from the paper arXiv 2009.02741
This first suggestion how the reconstruction measures would be laid out. Feedback is welcomed.
Still missing:
preprocessing
branch Add KernelCentering and NormalizeScaler classes for preprocessing #8