New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an invalid CSR matrix #7750

Merged
merged 6 commits into from Oct 26, 2016

Conversation

Projects
None yet
3 participants
@perimosocordiae
Contributor

perimosocordiae commented Oct 25, 2016

See scipy/scipy#6719 for context.

The gist is that the inverse array may have a different dtype than yt.indices, which causes trouble down the line because, in those cases, yt.indices and yt.indptr have different dtypes.

Alternately, we could insert yt.check_format(full_check=False) after modifying the sparse matrix members.

BUG: MultiLabelBinarizer makes invalid CSR matrix
See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.
@amueller

This comment has been minimized.

Member

amueller commented Oct 25, 2016

Thanks. can you add a test please?

@amueller amueller added this to the 0.18.1 milestone Oct 25, 2016

perimosocordiae added some commits Oct 25, 2016

Fixing for old numpy
Older versions don't support kwargs for `astype`
@@ -732,7 +732,7 @@ def fit_transform(self, y):
class_mapping = np.empty(len(tmp), dtype=dtype)
class_mapping[:] = tmp
self.classes_, inverse = np.unique(class_mapping, return_inverse=True)
yt.indices = inverse[yt.indices].astype(yt.indices.dtype, copy=False)

This comment has been minimized.

@jnothman

jnothman Oct 25, 2016

Member

you could use np.asarray(..., dtype=) to more closely reflect this operation, I think

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 25, 2016

Yes, it looks like that's the sort of test we should perform wherever we do CSR manipulation. Hacky hacky hack hack. Thanks @perimosocordiae.

@perimosocordiae

This comment has been minimized.

Contributor

perimosocordiae commented Oct 26, 2016

Tests pass now. Should I squash the commits?

@jnothman jnothman changed the title from BUG: MultiLabelBinarizer.fit_transform sometimes returns an invalid CSR matrix to [MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an invalid CSR matrix Oct 26, 2016

@jnothman

This comment has been minimized.

Member

jnothman commented Oct 26, 2016

no need.

@amueller

This comment has been minimized.

Member

amueller commented Oct 26, 2016

Maybe add a comment why the assert is needed and why the line is needed? It's a bit non-obvious to me.
Would checking if it's possible to convert to lil be a better test?
LGTM as is, but comments might be nice.

@amueller

This comment has been minimized.

Member

amueller commented Oct 26, 2016

Opened #7762 to track the overall problem.

@perimosocordiae

This comment has been minimized.

Contributor

perimosocordiae commented Oct 26, 2016

Okay, comments added. I skipped the full CI treatment for the comment-only changes.

Conversion to LIL format would test the symptom but not the cause of the error. It's possible that we may add checks in scipy to deal with this in the future, so I'd prefer to not rely on .tolil() failing for this case.

@amueller amueller merged commit 94c2094 into scikit-learn:master Oct 26, 2016

1 check passed

ci/circleci Your tests passed on CircleCI!
Details
@amueller

This comment has been minimized.

Member

amueller commented Oct 26, 2016

thanks :)

amueller added a commit to amueller/scikit-learn that referenced this pull request Nov 9, 2016

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an i…
…nvalid CSR matrix (scikit-learn#7750)

* BUG: MultiLabelBinarizer makes invalid CSR matrix

See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.

* Fixing for old numpy

Older versions don't support kwargs for `astype`

* Adding tests

* line-wrapping

* adding comment to tests

[ci skip]

* added rationale comment

[ci skip]

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an i…
…nvalid CSR matrix (scikit-learn#7750)

* BUG: MultiLabelBinarizer makes invalid CSR matrix

See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.

* Fixing for old numpy

Older versions don't support kwargs for `astype`

* Adding tests

* line-wrapping

* adding comment to tests

[ci skip]

* added rationale comment

[ci skip]

afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an i…
…nvalid CSR matrix (scikit-learn#7750)

* BUG: MultiLabelBinarizer makes invalid CSR matrix

See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.

* Fixing for old numpy

Older versions don't support kwargs for `astype`

* Adding tests

* line-wrapping

* adding comment to tests

[ci skip]

* added rationale comment

[ci skip]

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an i…
…nvalid CSR matrix (scikit-learn#7750)

* BUG: MultiLabelBinarizer makes invalid CSR matrix

See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.

* Fixing for old numpy

Older versions don't support kwargs for `astype`

* Adding tests

* line-wrapping

* adding comment to tests

[ci skip]

* added rationale comment

[ci skip]

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an i…
…nvalid CSR matrix (scikit-learn#7750)

* BUG: MultiLabelBinarizer makes invalid CSR matrix

See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.

* Fixing for old numpy

Older versions don't support kwargs for `astype`

* Adding tests

* line-wrapping

* adding comment to tests

[ci skip]

* added rationale comment

[ci skip]

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] BUG: MultiLabelBinarizer.fit_transform sometimes returns an i…
…nvalid CSR matrix (scikit-learn#7750)

* BUG: MultiLabelBinarizer makes invalid CSR matrix

See scipy/scipy#6719 for context.

The gist is that the `inverse` array may have a different dtype than `yt.indices`, which causes trouble down the line because, in those cases, `yt.indices` and `yt.indptr` have different dtypes.

Alternately, we could insert `yt.check_format(full_check=False)` after modifying the sparse matrix members.

* Fixing for old numpy

Older versions don't support kwargs for `astype`

* Adding tests

* line-wrapping

* adding comment to tests

[ci skip]

* added rationale comment

[ci skip]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment