
WIP: ENH,TST: partial fit implementation with tests#47

Open
eiviani-lanl wants to merge 33 commits into main from eiviani_partial_fit

Conversation

@eiviani-lanl
Collaborator

partial_fit() using normal equations

Moore-Penrose pseudoinverse (assuming D has full column rank):
D+ = (D.T @ D)^-1 @ D.T

Least squares solution:
D+ @ y = (D.T @ D)^-1 @ D.T @ y

We're persisting the gram (D.T @ D) and moment (D.T @ y) matrices and updating them by adding the gram and moment matrices of each consecutive batch.

Update rule:
Consider the summation representation of matrix multiplication:
(B.T @ B)_ij = sum_k (B.T)_ik * B_kj

Now suppose the full design matrix is formed by vertically stacking
two batches, written as [B_1; B_2]:
[B_1; B_2].T @ [B_1; B_2] = B_1.T @ B_1 + B_2.T @ B_2
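The update rule can be checked numerically; a minimal sketch with made-up data (not code from this PR):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))   # full design matrix
y = rng.normal(size=100)

# Gram and moment matrices over the full dataset
gram_full = D.T @ D
moment_full = D.T @ y

# Accumulate the same quantities batch by batch
gram = np.zeros((5, 5))
moment = np.zeros(5)
for rows in (slice(0, 60), slice(60, 100)):
    gram += D[rows].T @ D[rows]
    moment += D[rows].T @ y[rows]

# The accumulated system yields the same least-squares coefficients
coeff = np.linalg.solve(gram, moment)
assert np.allclose(gram, gram_full)
assert np.allclose(coeff, np.linalg.lstsq(D, y, rcond=None)[0])
```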

@tylerjereddy tylerjereddy added the enhancement New feature or request label Feb 5, 2026
@tylerjereddy tylerjereddy added this to the 0.2.0 milestone Feb 5, 2026
@tylerjereddy
Collaborator

Thanks, I've added the 0.2.0 release milestone since trying to merge this right before branching for a release would probably be ill-advised.

@tylerjereddy
Collaborator

I just resolved the merge conflicts and force-pushed. Let me see if I can at least take a quick look over it.

Comment thread src/gfdl/tests/test_regression.py Outdated
Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/tests/test_model.py Outdated
ff_model.fit(X, y)

classes = np.unique(y)
batch = 25
Collaborator

I noticed that if I change the batch size to 17 locally, one of the Moore-Penrose cases ends up with a numerical issue that causes a failure.

That may be related to the requirement for well-conditioned matrices you note above. We'll need to be clear about the limitations/weaknesses of partial_fit, and double check that they are things we cannot improve. One useful exercise might be to run this test with e.g. MLPClassifier to see if its partial_fit is a bit more robust in this regard.
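The suggested cross-check might look roughly like this (a sketch; the dataset parameters are illustrative, not the ones from our test suite):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
classes = np.unique(y)

clf = MLPClassifier(hidden_layer_sizes=(10,), random_state=0)
batch = 17  # the batch size that triggered the failure here
for start in range(0, len(X), batch):
    # classes must be supplied on the first partial_fit call
    clf.partial_fit(X[start:start + batch], y[start:start + batch],
                    classes=classes)

print(clf.score(X, y))
```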

Collaborator

(and of course, to assess if using rtol + partial_fit actually allows us to overcome many of these issues, or not)

Collaborator

@eiviani-lanl this looks like it still needs a response

Collaborator Author

Hmm, I can't reproduce this locally. I tried reverting the solver to the original np.linalg.pinv, as well as reverting the n_redundant argument in make_classification to 2. Can you still reproduce this error with the most recent version of model.py and test_model.py?

Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/tests/test_model.py Outdated
# NOTE: for Moore-Penrose, a large singular value
# cutoff (rcond) is required to achieve reasonable accuracy
# with the Wisconsin breast cancer dataset
# Without rtol accuracy ~= 0.6316
Collaborator

Excellent, that's good to know. Does this apply only for partial_fit() on that dataset, or to full fit() as well? Based on this, I wonder if it might be beneficial to:

  1. Perhaps add a Notes docstring section to the partial_fit() function(s), as appropriate, to concisely express the potential importance of setting rtol appropriately? I can't remember if we already do that for fit() proper, which of course has the same issue with Moore-Penrose.
  2. Perhaps also include some of the math details you describe elsewhere in the Notes section of the partial_fit() docstring. I believe that info currently lives in a detailed code comment that an end user wouldn't see without reading our source. Not sure we want quite that much detail, but it would be useful to describe at least the core algorithm.
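One numerical point that might be worth spelling out in such a Notes section: forming the gram matrix D.T @ D squares the condition number of D, so any normal-equations solve (as in partial_fit) is inherently more sensitive to rtol than a direct least-squares solve on D. A standalone sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(size=(200, 4))
# Make one column nearly collinear with another
D[:, 3] = D[:, 0] + 1e-4 * rng.normal(size=200)

cond_D = np.linalg.cond(D)
cond_gram = np.linalg.cond(D.T @ D)
# The gram matrix's singular values are the squares of D's,
# so its condition number is the square of D's
print(cond_D, cond_gram)
```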

Collaborator

@eiviani-lanl I think this review comment still needs a response

Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/tests/test_model.py
Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/tests/test_model.py Outdated
Comment thread src/gfdl/model.py
@tylerjereddy tylerjereddy changed the title ENH,TST: partial fit implementation with tests WIP: ENH,TST: partial fit implementation with tests Mar 4, 2026
@tylerjereddy
Collaborator

I'll add a "WIP" (work in progress) marker to the title here since some merges aren't done correctly yet (e.g., old dependencies were accidentally added back).

I did not address the structural comments; I think I'll handle those in a separate PR.

* Changed the solver from `pinv`/`solve` to `lstsq`, as it's more robust for ill-conditioned matrices.
  * I don't think it's possible to use `Ridge` in `partial_fit()`, at least from my experimentation. This does reduce solver flexibility, but I can attempt to implement other solvers by hand in a future release.
* Combined ensemble/classifier tests.
* Added test for partial_fit regression performance on Boston dataset.
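The switch to lstsq can be illustrated on a gram system that solve rejects outright (a sketch; the A/B naming mirrors the snippets quoted later in this review, but the data is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.normal(size=(50, 8))
D[:, 7] = 0.0              # a dead feature makes the gram matrix singular
y = rng.normal(size=50)

A = D.T @ D                # gram matrix
B = D.T @ y                # moment vector

# np.linalg.solve raises LinAlgError on the exactly singular system...
raised = False
try:
    np.linalg.solve(A, B)
except np.linalg.LinAlgError:
    raised = True

# ...while lstsq returns the minimum-norm least-squares solution
coeff, *_ = np.linalg.lstsq(A, B, rcond=None)
```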
Collaborator

@tylerjereddy tylerjereddy left a comment

@eiviani-lanl Ok, I did another round of detailed review. Some notes:

  • I added some reminders to respond to inline review comments. A fair bit more detail would be helpful in some places; some changes may make sense to make right away, while others you may defer to issues. In the latter case, please open the issues and refer to them in written self-review comments so it's clear to the reviewer that you're coordinating those matters.
  • Moving forward, it would help if you left reviewer comments unresolved and let the reviewer decide when they're resolved (in this case it's fine; I just checked them all myself).
  • Overall, it looks pretty good. I'm a bit unclear on the reasoning for the adjusted solving approaches, and I added some review comments about explaining those with crystal clarity to the user/reader in the docstrings.

Comment thread src/gfdl/tests/test_model.py Outdated
@pytest.mark.parametrize("activation", activations)
@pytest.mark.parametrize("weight_scheme", weights)
@pytest.mark.parametrize("activation", ACTIVATIONS.keys())
@pytest.mark.parametrize("weight_scheme", list(WEIGHTS.keys()))
Collaborator

Here and elsewhere: I can't quite work out why you need the list() coercion for one dictionary and not the other. When I remove the coercions locally, the same number of tests run and pass, so unless there's a compelling reason, it's probably best to remove the spurious coercions.

Collaborator Author

This has been fixed.

@pytest.mark.parametrize("hidden_layer_sizes", [(10,), (5, 5)])
@pytest.mark.parametrize("n_classes", [2, 5])
@pytest.mark.parametrize("activation", ACTIVATIONS.keys())
@pytest.mark.parametrize("weight_init", list(WEIGHTS.keys())[1:])
Collaborator

This particular coercion is "ok", since dict keys can't be sliced directly.
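For context on the coercion question: dict views are iterable but not subscriptable, which is why only the sliced case needs list() (a standalone sketch, not our test code):

```python
d = {"a": 1, "b": 2, "c": 3}

# A dict view iterates fine, so parametrize can consume d.keys() directly
assert list(d.keys()) == ["a", "b", "c"]

# But views don't support indexing or slicing...
try:
    d.keys()[1:]
except TypeError:
    pass

# ...so a list() coercion is required before slicing
assert list(d.keys())[1:] == ["b", "c"]
```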

Comment thread src/gfdl/tests/test_model.py Outdated
# NOTE: for Moore-Penrose, a large singular value
# cutoff (rcond) is required to achieve reasonable accuracy
# with the Wisconsin breast cancer dataset
# Without rtol accuracy ~= 0.6316
Collaborator

@eiviani-lanl I think this review comment still needs a response

Comment thread src/gfdl/tests/test_model.py Outdated
model.partial_fit(X_train[start:end], y_train[start:end])

actual = model.score(X_test, y_test)
# RandomForestRegressor() with default params scores 0.958 here
Collaborator

Should this be RandomForestClassifier?

Collaborator Author

Fixed.

"Classifier, attr",
[
(GFDLClassifier, "coeff_"),
(GFDLClassifier, "coeff_"),
Collaborator

Is this an accidental duplication, or am I missing something?

Collaborator Author

Fixed.

Comment thread src/gfdl/model.py
-----
The design matrix is incrementally updated by persisting the gram matrix
(D.T @ D) and the moment vector (D.T @ y) for each batch, then adding
the gram and moment contributions of each new batch.
Collaborator

as noted elsewhere, a bit of detail about what happens with partial fit vs. full fit for exact and non-exact solves, and why they are different, seems useful to clearly explain

Collaborator Author

I've added a few sentences to Notes acknowledging the differences between the full-fit and partial-fit non-exact solves, and justifying our implementation. The Notes section also covers the differences in the exact solve and the possible need for a lower rtol.

Comment thread src/gfdl/model.py

def partial_fit(self, X, y):
"""
Train the gradient-free neural network on the batched training set (X, y).
Collaborator

For this and other partial_fit docstrings, can you please add a .. versionadded:: directive to signal that this feature is new version 0.2.0? You can find examples of how this is done in other Python OSS projects. Example from another project:

[screenshot: an example `.. versionadded::` directive from another project's docstring]

Collaborator Author

Done

Comment thread src/gfdl/tests/test_model.py Outdated
ff_model.fit(X, y)

classes = np.unique(y)
batch = 25
Collaborator

@eiviani-lanl this looks like it still needs a response

Comment thread src/gfdl/model.py Outdated
self.coeff_ = np.linalg.pinv(self.A, rcond=self.rtol) @ self.B
else:
reg = np.identity(self.A.shape[0]) * self.reg_alpha
self.coeff_ = np.linalg.solve(self.A + reg, self.B)
Collaborator

@eiviani-lanl this one probably still needs a written response

Comment thread src/gfdl/model.py Outdated
coef_ = np.linalg.pinv(self.As[i], rcond=self.rtol) @ self.Bs[i]
else:
reg = np.identity(self.As[i].shape[0]) * self.reg_alpha
coef_ = np.linalg.solve(self.As[i] + reg, self.Bs[i])
Collaborator

@eiviani-lanl would be useful to clarify here with written response

Comment thread src/gfdl/model.py
Comment thread src/gfdl/model.py
#
# Now suppose the full design matrix is formed by vertically stacking
# two batches, written as [B_1 | B_2]:
# [B_1 | B_2].T @ [B_1 | B_2] = B_1.T @ B_1 + B_2.T @ B_2
Collaborator

@nray nray Apr 9, 2026

I would suggest adding the following description here to clarify how our partial fit is implemented:

$D\beta=y$ , where D is the design matrix over the full dataset.
Assuming $n$ batches, $D = [D_1, ..., D_n]^T$ where $D_i$ corresponds to the design matrix for batch $i$. Similarly, $y = [y_1, y_2, ..., y_n]^T$. Then, $\sum_{i=1}^{n}D_i^TD_i\beta=\sum_{i=1}^{n}D_i^Ty_i$

Also, add a statement about how you accumulate the batch sums into the gram matrix and moment vector as each batch is processed.

Collaborator Author

Added.

@nray nray self-requested a review April 10, 2026 03:46
@tylerjereddy
Collaborator

@eiviani-lanl let me know when this is ready for another round of review

@nray I see you've approved while not all review comments are fully resolved; can you reserve approvals for when you have full confidence that the entire PR is ready to go?

@tylerjereddy
Collaborator

The tests are failing for reasons unrelated to the changes here. When you notice that, please proactively open an issue and help the team investigate.

I've gotten that process started with gh-82.

@eiviani-lanl
Collaborator Author

@tylerjereddy I think this is ready for review.
