Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Make partial_fit ignore n_iter in MiniBatchDictionaryLearning #17433

Merged
merged 7 commits into from
Jun 9, 2020

Conversation

cmarmo
Copy link
Contributor

@cmarmo cmarmo commented Jun 3, 2020

Reference Issues/PRs

Fixes #6779

What does this implement/fix? Explain your changes.

Makes partial_fit ignore n_iter in MiniBatchDictionaryLearning hardcoding n_iter=1 in dict_learning_online call.

@cmarmo
Copy link
Contributor Author

cmarmo commented Jun 3, 2020

@GaelVaroquaux , do you mind to have a look? I'm in the process of familiarizing with online algorithms. Thanks!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. At the end of fit:
self.iter_offset_ = iter_offset + self.n_iter

Should this be:

self.iter_offset_ = iter_offset + 1

We would need test to make sure iter_offset_ is updated correctly.

  1. It is strange that n_iter_ is not getting updated in partial_fit (or created on the first call to partial_fit.

@cmarmo
Copy link
Contributor Author

cmarmo commented Jun 3, 2020

self.iter_offset_ = iter_offset + 1

Definitely! Thanks!

@cmarmo
Copy link
Contributor Author

cmarmo commented Jun 4, 2020

It is strange that n_iter_ is not getting updated in partial_fit (or created on the first call to partial_fit).

My understanding was that partial_fit only performs one iteration by definition so no need to store this information. Am I wrong?

@thomasjpfan
Copy link
Member

My understanding was that partial_fit only performs one iteration by definition so no need to store this information. Am I wrong?

I'm concerned with multiple calls to parital_fit:

est = MiniBatchDictionaryLearning(n_iter=10)
est.fit(...)
est.partial_fit(...)  # one iter
est.partial_fit(...)  # one iter

# Should this be true?
assert est.n_iter_ == 12

I am not familiar enough with design to know the intention with n_iter_ vs iter_offset_.

@cmarmo
Copy link
Contributor Author

cmarmo commented Jun 5, 2020

I am not familiar enough with design to know the intention with n_iter_ vs iter_offset_.

Neither am I... :D
If I understand correctly the expected behaviour is (following your example)

assert est.n_iter_ == 10
assert est.n_offset_ == 12

But the reason of this is unclear to me. Maybe @vene could help? :) Thanks!

Copy link
Member

@GaelVaroquaux GaelVaroquaux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to me like the good fix, and a good test. Thanks a lot!!

Maybe this warrants a small note in the whats_new. Otherwise, it's good to go. This will be useful!

for sample in X:
dictV_pfit = dict2.partial_fit(sample[np.newaxis, :])

assert dictV_fit.iter_offset_ == dictV_pfit.iter_offset_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good test!!

@GaelVaroquaux GaelVaroquaux changed the title [MRG] Make partial_fit ignore n_iter in MiniBatchDictionaryLearning [MRG+1] Make partial_fit ignore n_iter in MiniBatchDictionaryLearning Jun 8, 2020
@GaelVaroquaux GaelVaroquaux requested a review from thomasjpfan June 8, 2020 12:07
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor commits, otherwise LGTM

dict2 = MiniBatchDictionaryLearning(n_components, n_iter=10,
dict_init=V, random_state=0,
shuffle=False)
dictV_fit = dict1.fit(X)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce the number of variables in the test

Suggested change
dictV_fit = dict1.fit(X)
dict1.fit(X)

shuffle=False)
dictV_fit = dict1.fit(X)
for sample in X:
dictV_pfit = dict2.partial_fit(sample[np.newaxis, :])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dictV_pfit = dict2.partial_fit(sample[np.newaxis, :])
dict2.partial_fit(sample[np.newaxis, :])

for sample in X:
dictV_pfit = dict2.partial_fit(sample[np.newaxis, :])

assert dictV_fit.iter_offset_ == dictV_pfit.iter_offset_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert dictV_fit.iter_offset_ == dictV_pfit.iter_offset_
assert dict1.iter_offset_ == dict2.iter_offset_

@GaelVaroquaux
Copy link
Member

Two +1s, green CI. Merging. Thanks @cmarmo !

@GaelVaroquaux GaelVaroquaux merged commit 98f0b83 into scikit-learn:master Jun 9, 2020
@cmarmo cmarmo deleted the mbdictlearning branch June 9, 2020 09:45
viclafargue pushed a commit to viclafargue/scikit-learn that referenced this pull request Jun 26, 2020
…scikit-learn#17433)

* Make partial_fit ignore n_iter.

* Revert whiteline.

* Fix self.iter_offset.

* Add test on iter_offset_

* Add what's new entry

* Apply thomasjpfan suggestions.
@ogrisel ogrisel added this to the 0.23.2 milestone Aug 3, 2020
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 3, 2020
…scikit-learn#17433)

* Make partial_fit ignore n_iter.

* Revert whiteline.

* Fix self.iter_offset.

* Add test on iter_offset_

* Add what's new entry

* Apply thomasjpfan suggestions.
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…scikit-learn#17433)

* Make partial_fit ignore n_iter.

* Revert whiteline.

* Fix self.iter_offset.

* Add test on iter_offset_

* Add what's new entry

* Apply thomasjpfan suggestions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MiniBatchDictionaryLearning: partial_fit with wrong behaviour?
4 participants