
to_sparse bug with fill_value specified #1375

changhiskhan opened this issue Jun 1, 2012 · 2 comments


commented Jun 1, 2012

Originally raised on the pydata mailing list:

In [50]: DataFrame({'x': [1., 1.]}).to_sparse(fill_value=0).x.mean()
Out[50]: 0.5

In [51]: DataFrame({'x': [1., 1.]}).to_sparse().x.mean()
Out[51]: 1.0

ghost assigned changhiskhan Jun 1, 2012


commented Jun 1, 2012

I had a quick look at the Python code implementing this, and it appears that in both SparseArray.mean and SparseArray.sum the nsparse variable counts the non-sparse (stored) entries rather than the sparse (fill) entries, which is what produces the incorrect values. Setting nsparse = self.sp_index.length - self.sp_index.npoints in both methods should fix this, but I don't understand the code well enough to be sure that this is correct.
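The fix proposed in the comment can be modeled in plain Python. This is a hypothetical sketch, not pandas' actual SparseArray code: sp_values stands in for the stored entries, length for sp_index.length, and len(sp_values) for sp_index.npoints. The function names sparse_sum and sparse_mean and the count_gaps_correctly flag are invented for illustration, with the flag toggling between the reported bug and the proposed fix.

```python
import math


def _count_fill_entries(sp_values, length, count_gaps_correctly):
    # Hypothetical helper: number of entries that implicitly hold fill_value.
    npoints = len(sp_values)  # stored entries (sp_index.npoints in the comment)
    if count_gaps_correctly:
        return length - npoints  # proposed fix: length - npoints
    return npoints  # reported bug: counts the stored entries instead


def sparse_sum(sp_values, length, fill_value, count_gaps_correctly=True):
    """Sum of a hypothetical sparse array: stored values plus implicit fills."""
    valid = [v for v in sp_values if not math.isnan(v)]  # skip NaN stored values
    if math.isnan(fill_value):
        # A NaN fill marks the gaps as missing, so they contribute nothing.
        return sum(valid)
    nsparse = _count_fill_entries(sp_values, length, count_gaps_correctly)
    return sum(valid) + fill_value * nsparse


def sparse_mean(sp_values, length, fill_value, count_gaps_correctly=True):
    """Mean over the non-NaN stored values plus every fill entry."""
    valid = [v for v in sp_values if not math.isnan(v)]
    if math.isnan(fill_value):
        # Only stored values count when the fill is NaN.
        return sum(valid) / len(valid) if valid else float("nan")
    nsparse = _count_fill_entries(sp_values, length, count_gaps_correctly)
    total = sparse_sum(sp_values, length, fill_value, count_gaps_correctly)
    count = len(valid) + nsparse
    return total / count if count else float("nan")


# The frame from the report: [1., 1.] with fill_value=0 stores both entries.
print(sparse_mean([1.0, 1.0], 2, 0.0))  # 1.0
print(sparse_mean([1.0, 1.0], 2, 0.0, count_gaps_correctly=False))  # 0.5
```

With count_gaps_correctly=False the sketch reproduces the 0.5 from the report: both stored 1.0 values are also counted as fill entries, so the sum of 2.0 is divided by 4 instead of 2. With the default NaN fill (as in to_sparse() without arguments) the fill count never enters the calculation, which is consistent with that case returning 1.0.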

changhiskhan pushed a commit that referenced this issue Jun 1, 2012

Contributor Author

commented Jun 1, 2012

@grsr that's pretty much what I did. Thanks for the input.
If you're looking for ways to get involved without digging too deep into the codebase, we'll soon start applying a "Community" label to issues that we think are more self-contained and require less staring at pandas internals.
