lda transform and scipy sparse matrices #34

robert-howley-zocdoc · 2015-06-01T21:08:54Z

Hi,

I stumbled across an issue when pipelining scikit-learn's CountVectorizer with the lda model. In lda's transform method, the line X = np.atleast_2d(X) can be problematic because CountVectorizer returns a scipy sparse matrix. This issue indicates not only that the dimension conversion doesn't work with scipy sparse matrices, but also that there doesn't seem to be an appetite to handle it.

My workaround was to wrap the pipeline object, override transform, and convert the sparse word-count matrix to a dense 2-D array before it reaches lda's transform.

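def transform(self, X, **kwargs):
    # CountVectorizer returns a scipy.sparse matrix of word counts
    x_t = self.count_vectorizer.transform(X)
    # Densify to a 2-D ndarray so lda's np.atleast_2d call behaves as expected
    return self.lda.transform(x_t.toarray(), **kwargs)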
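An alternative to wrapping the whole pipeline is a small stateless step that densifies its input, placed between the vectorizer and the model. The sketch below assumes scikit-learn's BaseEstimator/TransformerMixin conventions; DenseTransformer is a hypothetical name, not part of lda or scikit-learn.

```python
import numpy as np
import scipy.sparse
from sklearn.base import BaseEstimator, TransformerMixin


class DenseTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical pipeline step: turn scipy.sparse input into a dense ndarray."""

    def fit(self, X, y=None):
        # Stateless: there is nothing to learn from the data
        return self

    def transform(self, X):
        if scipy.sparse.issparse(X):
            # Explicit conversion; avoids relying on np.atleast_2d for sparse input
            return X.toarray()
        return np.asarray(X)
```

It would sit in the pipeline as, for example, Pipeline([("counts", CountVectorizer()), ("dense", DenseTransformer()), ("lda", lda.LDA(n_topics=10))]). Densifying a document-term matrix costs memory proportional to n_documents × vocabulary size, so this is only a reasonable trade-off for moderately sized corpora.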

If there's a better way of navigating this issue, please let me know. Otherwise, it would be great to have a check in the transform method that avoids calling X = np.atleast_2d(X) when X is a scipy sparse matrix.

Thanks

Rob


ariddell · 2015-06-01T21:39:21Z

Thanks for this.

So I take it it's just the transform method? I see there's this check in fit_transform:
https://github.com/ariddell/lda/blob/8829470e9d78fb2ec74e5dfe8baf802e340674d5/lda/lda.py#L138

Perhaps adding this guard to transform might work.
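The check at the linked line is not reproduced here. As a minimal sketch, assuming the guard simply branches on sparsity before calling np.atleast_2d, it could look like the following. coerce_doc_term_matrix is a hypothetical helper name, not lda's API, and this is not necessarily what commit 16599de does.

```python
import numpy as np
import scipy.sparse


def coerce_doc_term_matrix(X):
    """Hypothetical guard for the start of an LDA transform method.

    Sparse input (e.g. from CountVectorizer) is densified explicitly instead of
    being passed to np.atleast_2d. Dense 1-D input (a single document's counts)
    is promoted to shape (1, n_features).
    """
    if scipy.sparse.issparse(X):
        # scipy.sparse matrices are already 2-D; just convert to an ndarray
        return X.toarray()
    # Only dense array-likes reach np.atleast_2d
    return np.atleast_2d(X)
```

Checking scipy.sparse.issparse first keeps np.atleast_2d for the case it was presumably intended for, a single document passed as a 1-D array.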

ariddell · 2015-06-02T17:56:37Z

Another user reported this. As @robert-howley-zocdoc says, the problem is the np.atleast_2d call.

robert-howley-zocdoc · 2015-06-03T16:04:07Z

Yep, just transform. Thanks for the quick response.

ariddell closed this as completed in 16599de Jun 2, 2015
