lda transform and scipy sparse matrices #34

Closed
robert-howley-zocdoc opened this issue Jun 1, 2015 · 3 comments

Comments

@robert-howley-zocdoc

Hi,

I stumbled across an issue when pipelining scikit-learn's CountVectorizer with the lda model. In lda's transform method, the line X = np.atleast_2d(X) can be problematic because CountVectorizer returns a scipy sparse matrix. This issue indicates not only that the dimension conversion does not work on a scipy sparse matrix, but also that there doesn't seem to be an appetite to handle it.
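For readers who want to see the behavior in isolation, here is a minimal sketch using only NumPy and SciPy. The small `counts` matrix is an invented stand-in for CountVectorizer output, not data from this report. It shows how `np.atleast_2d` treats a sparse matrix as one opaque Python object instead of converting it to a numeric array.

```python
import numpy as np
from scipy import sparse

# Invented stand-in for a document-term count matrix (2 documents, 3 terms).
counts = sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))

# NumPy cannot interpret the sparse matrix as numeric data, so it wraps the
# whole object in an object-dtype array and pads that to two dimensions.
wrapped = np.atleast_2d(counts)
print(wrapped.shape, wrapped.dtype)  # (1, 1) object

# Converting to a dense ndarray first keeps the expected (n_docs, n_terms) shape.
dense = np.atleast_2d(counts.toarray())
print(dense.shape)  # (2, 3)
```

Any code that indexes the wrapped array as if it held counts will then fail or misbehave, which is consistent with the problem described above.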

My workaround was to wrap the pipeline object, override transform, and convert the sparse word-count matrix to a dense 2D array before it reaches lda's transform.

def transform(self, X, **kwargs):
    x_t = self.count_vectorizer.transform(X)
    # toarray() returns a dense 2D ndarray, equivalent to np.asarray(x_t.todense())
    return self.lda.transform(x_t.toarray(), **kwargs)

If there's a better way of navigating this issue, please let me know. Otherwise, it would be great to have a check in the transform method that avoids calling X = np.atleast_2d(X) when X is a scipy sparse matrix.

Thanks

Rob

@ariddell
Contributor

ariddell commented Jun 1, 2015

Thanks for this.

So I take it the problem is just with the transform method? I see there's this check in fit_transform:
https://github.com/ariddell/lda/blob/8829470e9d78fb2ec74e5dfe8baf802e340674d5/lda/lda.py#L138

Perhaps adding this guard to transform might work.
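As a rough illustration of what such a guard could look like, the sketch below applies `np.atleast_2d` only to non-sparse input. The helper name `as_dense_2d` is hypothetical and is not part of the lda package, and the check at the linked line may be written differently. `scipy.sparse.issparse` and `toarray()` are existing SciPy calls.

```python
import numpy as np
from scipy import sparse


def as_dense_2d(X):
    """Hypothetical helper: return X as a dense 2-D ndarray."""
    if sparse.issparse(X):
        # Sparse matrices are already 2-D; only densify them.
        return X.toarray()
    # Plain arrays of shape (n_features,) become (1, n_features).
    return np.atleast_2d(X)


single_doc = np.array([1, 0, 2])
batch = sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))

print(as_dense_2d(single_doc).shape)  # (1, 3)
print(as_dense_2d(batch).shape)       # (2, 3)
```

Densifying can use a lot of memory when the vocabulary is large, so a guard that leaves sparse input untouched may be preferable if the downstream code can consume it directly.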

@ariddell
Contributor

ariddell commented Jun 2, 2015

Another user reported this. As @robert-howley-zocdoc says, it's a problem with np.atleast_2d.

@robert-howley-zocdoc
Author

Yep, just transform. Thanks for the quick response.
