New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] idf_ setter for TfidfTransformer. #10899

Merged
merged 8 commits into from Apr 5, 2018

Conversation

Projects
None yet
3 participants
@serega
Contributor

serega commented Apr 1, 2018

Reference Issues/PRs

Fixes #7102.

What does this implement/fix? Explain your changes.

The existing property idf_, which returns array of learned idf values does not have the corresponding setter. Internally, idf vector is stored in diagonal matrix.
The change transforms the setter value into the diaganal matrix using spdiags. The setter allows to restore the state of fitted TfidfTransformer using idf vector stored elsewhere.

@rth

Thanks for this PR @serega !

A few comments are below,

Show outdated Hide outdated sklearn/feature_extraction/tests/test_text.py Outdated
@idf_.setter
def idf_(self, value):
value = np.asarray(value, dtype=np.float64)
n_features = value.shape[0]

This comment has been minimized.

@rth

rth Apr 1, 2018

Member

n_features is the vocabulary size, we should check here that the provided shape is consistent with it.

@rth

rth Apr 1, 2018

Member

n_features is the vocabulary size, we should check here that the provided shape is consistent with it.

This comment has been minimized.

@serega

serega Apr 2, 2018

Contributor

TfidfTransformer does not have the vocabulary, TfidfVectorizer does. I added validation to TfidfVectorizer

@serega

serega Apr 2, 2018

Contributor

TfidfTransformer does not have the vocabulary, TfidfVectorizer does. I added validation to TfidfVectorizer

This comment has been minimized.

@rth

rth Apr 2, 2018

Member

Yes, right, thanks.

@rth

rth Apr 2, 2018

Member

Yes, right, thanks.

Show outdated Hide outdated sklearn/feature_extraction/text.py Outdated
Show outdated Hide outdated sklearn/feature_extraction/text.py Outdated

serega added some commits Apr 2, 2018

Show outdated Hide outdated sklearn/feature_extraction/text.py Outdated
@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 3, 2018

Member

LGTM apart for #10899 (comment)

Member

rth commented Apr 3, 2018

LGTM apart for #10899 (comment)

@rth rth changed the title from idf_ setter for TfidfTransformer. Fixes #7102 to [MRG+1] idf_ setter for TfidfTransformer. Fixes #7102 Apr 3, 2018

@rth rth changed the title from [MRG+1] idf_ setter for TfidfTransformer. Fixes #7102 to [MRG+1] idf_ setter for TfidfTransformer. Apr 3, 2018

Show outdated Hide outdated sklearn/feature_extraction/text.py Outdated
@jnothman

Otherwise LGTM

Show outdated Hide outdated sklearn/feature_extraction/tests/test_text.py Outdated
@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Apr 3, 2018

Member

I think there are comments from @rth that may still need addressing

Member

jnothman commented Apr 3, 2018

I think there are comments from @rth that may still need addressing

Show outdated Hide outdated sklearn/feature_extraction/text.py Outdated
@rth

This comment has been minimized.

Show comment
Hide comment
@rth

rth Apr 5, 2018

Member

Merging, thanks @serega !

Member

rth commented Apr 5, 2018

Merging, thanks @serega !

@rth rth merged commit f890144 into scikit-learn:master Apr 5, 2018

8 checks passed

ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: python2 Your tests passed on CircleCI!
Details
ci/circleci: python3 Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95.08%)
Details
codecov/project 95.09% (+0.01%) compared to 12cdb83
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
lgtm analysis: Python No alert changes
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment