WIP: ENH,TST: partial fit implementation with tests #47

eiviani-lanl wants to merge 33 commits into main from
Conversation
Thanks, I've added the
Force-pushed 4e61c9b to d98778d
I just resolved the merge conflicts and force-pushed. Let me see if I can at least take a quick look over.
```python
ff_model.fit(X, y)

classes = np.unique(y)
batch = 25
```
I noticed that if I change the batch size to 17 locally, one of the Moore-Penrose cases ends up with a numerical issue that causes a failure.
That may be related to the requirement for well-conditioned matrices you note above. We'll need to be clear about the limitations/weaknesses of partial_fit, and double-check that they are things we cannot improve. One useful exercise might be to check this test with, e.g., MLPClassifier to see if its partial_fit is a bit more robust in this regard.
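The cross-check suggested above could look roughly like this (a sketch only: the dataset parameters and network size are illustrative, not the project's actual test settings; batch size 17 is the value that triggered the failure locally):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Illustrative data; the real test's make_classification arguments may differ.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
classes = np.unique(y)

clf = MLPClassifier(hidden_layer_sizes=(10,), random_state=0)

batch = 17  # the batch size that triggered the numerical issue locally
for start in range(0, len(X), batch):
    end = start + batch
    # classes must be passed so partial_fit knows the full label set up front
    clf.partial_fit(X[start:end], y[start:end], classes=classes)

print(clf.score(X, y))  # check that incremental training did not blow up numerically
```

If MLPClassifier survives the same batching without numerical trouble, that would suggest our partial_fit solve, not the data, is the fragile part.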
(and of course, to assess if using rtol + partial_fit actually allows us to overcome many of these issues, or not)
@eiviani-lanl this looks like it still needs a response
Hmm, I can't reproduce this locally. I tried reverting the solver to the original np.linalg.pinv, as well as reverting the n_redundant argument in make_classification to 2. Can you still reproduce this error with the most recent version of model.py and test_model.py?
```python
# NOTE: for Moore-Penrose, a large singular value
# cutoff (rcond) is required to achieve reasonable accuracy
# with the Wisconsin breast cancer dataset.
# Without rtol, accuracy ~= 0.6316
```
Excellent, that's good to know. Does this apply only to partial_fit() on that dataset, or to the full fit() as well? Based on this, I wonder if it might be beneficial to:
- Perhaps add a `Notes` docstring section to the `partial_fit()` function(s) as appropriate, to concisely express the potential importance of setting `rtol` appropriately? I can't remember if we already do that for `fit()` proper, which of course has the same issue with Moore-Penrose.
- Perhaps also move some of the math details you describe in a few places into the `Notes` section of the `partial_fit()` docstring. I believe you currently have that info as a detailed code comment that an end user wouldn't see without reading our source. Not sure we want quite that much detail, but it may be useful to describe at least the core algorithm used.
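A sketch of what such a `Notes` section might look like (the wording and structure here are illustrative only, not the project's final text):

```python
# Hypothetical docstring sketch -- wording is illustrative, not final.
def partial_fit(self, X, y):
    """
    Train the gradient-free neural network on the batched training set (X, y).

    Notes
    -----
    The Moore-Penrose (exact) solve can be sensitive to the singular value
    cutoff: on some datasets (e.g., the Wisconsin breast cancer dataset),
    leaving ``rtol`` at its default gives poor accuracy, so ``rtol`` may
    need to be tuned. The same caveat applies to ``fit()`` when it uses
    the Moore-Penrose solve.
    """
```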
@eiviani-lanl I think this review comment still needs a response
I'll add a "WIP" (work in progress) to the title here since some merges aren't done correctly yet (e.g., old dependencies were accidentally added back).
Force-pushed 83fe2f4 to e651a0d
Did not address structural comments; I think I'll handle those in a separate PR.
* Changed the solver from `pinv`/`solve` to `lstsq`, as it's more robust for ill-conditioned matrices.
* I don't think it's possible to use `Ridge` in `partial_fit()`, at least from my experimentation. This does reduce solver flexibility, but I can attempt to implement other solvers by hand in a future release.
* Combined the ensemble/classifier tests.
* Added a test for `partial_fit` regression performance on the Boston dataset.
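The motivation for preferring `lstsq` can be seen on a nearly rank-deficient system: forming the gram matrix squares the condition number, while `lstsq` handles the near-degeneracy directly (a toy sketch, not the project's code):

```python
import numpy as np

# Nearly rank-deficient system: the second column is (almost) a copy of the first.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-14],
              [2.0, 2.0]])
y = np.array([1.0, 1.0, 2.0])

# The normal-equations route squares the ill-conditioning of A in A.T @ A ...
gram = A.T @ A
print(np.linalg.cond(gram))  # enormous condition number

# ... while lstsq copes with the (near-)rank deficiency gracefully.
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(coef)      # a well-behaved minimum-norm solution
print(A @ coef)  # reproduces y to high accuracy
```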
tylerjereddy left a comment
@eiviani-lanl Ok, I did another round of detailed review. Some notes:
- I added some reminders to respond to inline review comments -- a fair bit more detail would be helpful in some places. Some changes may make sense to make right away; others you may defer to issues, but in those cases please open the issues and refer to them in written self-review comments so it is clear to the reviewer that you're coordinating those matters.
- Moving forward, if you don't mind, leave reviewer comments unresolved -- it helps to let the reviewer decide what's resolved (in this case it's "ok", I just checked them all myself).
- Overall, it looks pretty good -- I'd say I'm a bit unclear on the reasoning for the adjusted solving approaches, and I did add some review comments about justifying/explaining those with crystal clarity to the user/reader in the docstrings.
```diff
-@pytest.mark.parametrize("activation", activations)
-@pytest.mark.parametrize("weight_scheme", weights)
+@pytest.mark.parametrize("activation", ACTIVATIONS.keys())
+@pytest.mark.parametrize("weight_scheme", list(WEIGHTS.keys()))
```
Here and elsewhere -- I can't quite work out why you need to perform the list() coercion for one dictionary and not the other. When I remove the coercions locally, the same number of tests run and pass, so unless there's a compelling reason for them, it's probably best to remove the spurious coercions.
This has been fixed.
```python
@pytest.mark.parametrize("hidden_layer_sizes", [(10,), (5, 5)])
@pytest.mark.parametrize("n_classes", [2, 5])
@pytest.mark.parametrize("activation", ACTIVATIONS.keys())
@pytest.mark.parametrize("weight_init", list(WEIGHTS.keys())[1:])
```
This particular coercion is "ok", since you can't slice the dict keys directly.
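For reference, the reason this one coercion is needed: `dict.keys()` returns a view that supports iteration (so parametrize can consume it directly) but not indexing or slicing. A minimal demonstration, with illustrative dictionary contents:

```python
# Illustrative contents only -- the real WEIGHTS dict differs.
WEIGHTS = {"default": None, "he": "he_init", "xavier": "xavier_init"}

# Views iterate fine, so parametrize can consume them directly ...
assert list(WEIGHTS.keys()) == ["default", "he", "xavier"]

# ... but they cannot be sliced, so [1:] requires a list() coercion first.
try:
    WEIGHTS.keys()[1:]
except TypeError as exc:
    print(exc)  # 'dict_keys' object is not subscriptable

print(list(WEIGHTS.keys())[1:])  # ['he', 'xavier']
```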
```python
# NOTE: for Moore-Penrose, a large singular value
# cutoff (rcond) is required to achieve reasonable accuracy
# with the Wisconsin breast cancer dataset.
# Without rtol, accuracy ~= 0.6316
```
@eiviani-lanl I think this review comment still needs a response
```python
model.partial_fit(X_train[start:end], y_train[start:end])

actual = model.score(X_test, y_test)
# RandomForestRegressor() with default params scores 0.958 here
```
Should this be RandomForestClassifier?
```python
"Classifier, attr",
[
    (GFDLClassifier, "coeff_"),
    (GFDLClassifier, "coeff_"),
```
Is this an accidental duplication, or am I missing something?
```
-----
The design matrix is incrementally updated by persisting the gram matrix
(D.T @ D) and the moment vector (D.T @ y) for each batch, then adding
the gram and moment contributions of each new batch.
```
As noted elsewhere, a bit of detail about what happens with partial fit vs. full fit for exact and non-exact solves, and why they differ, seems useful to explain clearly.
I've added a few sentences to Notes acknowledging the differences between the full fit and partial fit non-exact solves, and justifying our implementation. The Notes section also covers the differences in the exact solve and the possible need for a lower rtol.
```python
def partial_fit(self, X, y):
    """
    Train the gradient-free neural network on the batched training set (X, y).
```
```python
ff_model.fit(X, y)

classes = np.unique(y)
batch = 25
```
@eiviani-lanl this looks like it still needs a response
```python
    self.coeff_ = np.linalg.pinv(self.A, rcond=self.rtol) @ self.B
else:
    reg = np.identity(self.A.shape[0]) * self.reg_alpha
    self.coeff_ = np.linalg.solve(self.A + reg, self.B)
```
@eiviani-lanl this one probably still needs a written response
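For context on the two branches being discussed, here is a self-contained sketch of solving the normal equations either via a pseudoinverse with an `rcond` cutoff or via a ridge (Tikhonov) regularized `solve`. Variable names mirror the snippet above, but the data, `rtol`, and `reg_alpha` values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 5))  # illustrative design matrix
y = rng.standard_normal(50)

A = D.T @ D          # gram matrix
B = D.T @ y          # moment vector
rtol = 1e-10         # illustrative cutoff
reg_alpha = 1e-3     # illustrative ridge strength

# Exact branch: pseudoinverse with a singular value cutoff.
coeff_pinv = np.linalg.pinv(A, rcond=rtol) @ B

# Regularized branch: the ridge term shifts the spectrum by reg_alpha,
# so A + reg stays invertible even when A itself is singular.
reg = np.identity(A.shape[0]) * reg_alpha
coeff_ridge = np.linalg.solve(A + reg, B)

# With a well-conditioned A and small alpha, the two branches nearly agree.
print(np.max(np.abs(coeff_pinv - coeff_ridge)))
```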
```python
    coef_ = np.linalg.pinv(self.As[i], rcond=self.rtol) @ self.Bs[i]
else:
    reg = np.identity(self.As[i].shape[0]) * self.reg_alpha
    coef_ = np.linalg.solve(self.As[i] + reg, self.Bs[i])
```
@eiviani-lanl would be useful to clarify here with written response
```python
#
# Now suppose the full design matrix is formed by vertically stacking
# two batches, written as [B_1; B_2]:
# [B_1; B_2].T @ [B_1; B_2] = B_1.T @ B_1 + B_2.T @ B_2
```
I would suggest adding a description here to clarify how our partial fit is implemented:

Assuming

Also, add a statement about how you are collecting the batch sums into the gram matrix and moment vector as each batch is processed.
…nsively tests rtol and alpha reg
@eiviani-lanl let me know when this is ready for another round of review. @nray I see you've approved while review comments are not yet fully resolved -- can you reserve approvals for when you have full confidence that the entire PR is ready to go?
The tests are failing for reasons unrelated to the changes here. When you notice that, please proactively open an issue and help the team investigate. I've gotten that process started with gh-82.
…d rtol differences to user-facing partial_fit() notes
…ences from full `fit()`
…to eiviani_partial_fit
@tylerjereddy I think this is ready for review.

partial_fit() using the normal equations

Moore-Penrose pseudoinverse (assuming D has full column rank):

```
D+ = (D.T @ D)^-1 @ D.T
```

Least-squares solution:

```
coeff = D+ @ y = (D.T @ D)^-1 @ D.T @ y
```

We're persisting the gram matrix (D.T @ D) and the moment vector (D.T @ y) and updating them by adding the gram and moment contributions of each consecutive batch.

Update rule: consider the summation representation of matrix multiplication:

```
(B.T @ B)_ij = sum_k (B.T)_ik * B_kj = sum_k B_ki * B_kj
```

Now suppose the full design matrix is formed by vertically stacking two batches, written as [B_1; B_2]. Splitting the sum over k at the batch boundary gives:

```
[B_1; B_2].T @ [B_1; B_2] = B_1.T @ B_1 + B_2.T @ B_2
```
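The update rule above can be checked numerically: accumulating per-batch gram and moment contributions and solving once at the end reproduces the full-batch least-squares solution. A sketch under the assumption that the stacked design matrix has full column rank (data and batch size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
D = rng.standard_normal((100, 4))   # full design matrix (illustrative)
y = rng.standard_normal(100)

# Full-batch least-squares solution for reference.
coef_full, *_ = np.linalg.lstsq(D, y, rcond=None)

# Incremental version: persist the gram (D.T @ D) and moment (D.T @ y),
# adding each batch's contribution -- the update rule derived above.
n_features = D.shape[1]
gram = np.zeros((n_features, n_features))
moment = np.zeros(n_features)
batch = 25
for start in range(0, len(D), batch):
    B_i = D[start:start + batch]
    y_i = y[start:start + batch]
    gram += B_i.T @ B_i
    moment += B_i.T @ y_i

coef_incremental = np.linalg.solve(gram, moment)
print(np.allclose(coef_full, coef_incremental))  # True
```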