
BUG: Sparse concat may fill fill_value with NaN #12966

Closed
wants to merge 1 commit into from


sinhrks
Member

@sinhrks sinhrks commented Apr 23, 2016

# on current master
import numpy as np
import pandas as pd

dense1 = pd.DataFrame({'A': [1, 2, 3, np.nan],
                       'B': [0, 0, 0, 0],
                       'C': [np.nan, np.nan, np.nan, np.nan],
                       'D': [1, 2, 3, 4]})
dense2 = pd.DataFrame({'E': [1, 2, 3, np.nan],
                       'F': [0, 0, 0, 0],
                       'G': [np.nan, np.nan, np.nan, np.nan],
                       'H': [1, 2, 3, 4]})
# DataFrame.to_sparse() and SparseDataFrame existed at the time; both were removed in pandas 1.0
sparse1 = dense1.to_sparse()
sparse2 = dense2.to_sparse()
pd.concat([sparse2, sparse1], axis=1)
# AttributeError: 'int' object has no attribute 'ravel'
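The title names a related symptom: after concatenation, values can end up as NaN because of how each input's fill_value is handled. Below is a minimal pure-Python sketch of that failure mode and one way to avoid it, assuming a toy sparse vector. `SparseVec`, `naive_concat`, and `concat` are hypothetical names for illustration only; they are not pandas internals, and the actual patch in this PR may work differently.

```python
from dataclasses import dataclass
from typing import List


def same(a, b) -> bool:
    """Equality that treats NaN as equal to NaN, as a fill value comparison should."""
    return a == b or (a != a and b != b)


@dataclass
class SparseVec:
    """Hypothetical toy sparse vector: positions not in `indices` hold `fill_value`."""
    length: int
    indices: List[int]
    values: List[float]
    fill_value: float

    @classmethod
    def from_dense(cls, data, fill_value) -> "SparseVec":
        # Store only the entries that differ from the fill value.
        idx = [i for i, v in enumerate(data) if not same(v, fill_value)]
        return cls(len(data), idx, [data[i] for i in idx], fill_value)

    def to_dense(self) -> list:
        out = [self.fill_value] * self.length
        for i, v in zip(self.indices, self.values):
            out[i] = v
        return out


def naive_concat(vecs: List[SparseVec]) -> SparseVec:
    """Buggy variant: shifts stored entries but keeps only the first fill_value,
    so a later vector's implicit entries silently take that fill_value."""
    indices, values, offset = [], [], 0
    for vec in vecs:
        indices += [offset + i for i in vec.indices]
        values += vec.values
        offset += vec.length
    return SparseVec(offset, indices, values, vecs[0].fill_value)


def concat(vecs: List[SparseVec]) -> SparseVec:
    """Fill-value-aware variant: keeps the first fill_value and explicitly stores
    every value that differs from it (densifying each input for simplicity)."""
    fill = vecs[0].fill_value
    indices, values, offset = [], [], 0
    for vec in vecs:
        for i, v in enumerate(vec.to_dense()):
            if not same(v, fill):
                indices.append(offset + i)
                values.append(v)
        offset += vec.length
    return SparseVec(offset, indices, values, fill)


if __name__ == "__main__":
    nan = float("nan")
    a = SparseVec.from_dense([1.0, nan, 3.0], fill_value=nan)
    b = SparseVec.from_dense([0, 0, 5], fill_value=0)
    print(naive_concat([a, b]).to_dense())  # [1.0, nan, 3.0, nan, nan, 5]
    print(concat([a, b]).to_dense())        # [1.0, nan, 3.0, 0, 0, 5]
```

Here `b` leaves its zeros implicit, so `naive_concat` turns them into NaN (the first vector's fill value), while `concat` stores them explicitly. Densifying each input keeps the sketch short; a real implementation would avoid materializing the dense values.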

One point to discuss is the logic for the return type. Currently, a SparseDataFrame is returned only when all blocks are sparse, because SparseDataFrame can't work properly if it contains a dense block.

Thus, concatenating dense and sparse frames with axis=0 results in a SparseDataFrame, while axis=1 results in a normal DataFrame.

@sinhrks sinhrks added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Sparse (Sparse Data Type) labels Apr 23, 2016
@sinhrks sinhrks added this to the 0.18.1 milestone Apr 23, 2016
@jreback
Contributor

jreback commented Apr 23, 2016

I think we could return SparseDataFrame if any blocks are sparse.

@sinhrks
Member Author

sinhrks commented Apr 23, 2016

OK, let me clarify the details. If we return a SparseDataFrame whenever any block is sparse:

  • Should dense blocks be converted to sparse?
  • Or should dense blocks be kept as they are (allowing a mix of sparse and dense blocks)?
    • If so, should slicing dense block(s) return a normal Series / DataFrame rather than a SparseSeries / SparseDataFrame?

@jreback
Contributor

jreback commented Apr 23, 2016

In an ideal world we wouldn't even have SparseDataFrame, just `SparseSeries`. But since we do, I wouldn't coerce anything, i.e. leave sparse as sparse and dense as dense. In general we DO preserve these kinds, so things will tend to propagate anyhow.
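The rule settled on here (return a SparseDataFrame whenever any block is sparse, and never coerce individual blocks) can be modelled as a small plain-Python toy. `Block`, `initial_rule`, `revised_rule`, and `concat_blocks` are hypothetical names for illustration; they are not pandas internals, and the sketch ignores the axis-dependent behavior described in the PR description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical stand-in for a column block: a label plus a sparse/dense flag."""
    label: str
    sparse: bool


def initial_rule(blocks: List[Block]) -> str:
    # Rule from the PR description: sparse result only if every block is sparse.
    return "SparseDataFrame" if all(b.sparse for b in blocks) else "DataFrame"


def revised_rule(blocks: List[Block]) -> str:
    # Rule suggested in review: sparse result if any block is sparse.
    return "SparseDataFrame" if any(b.sparse for b in blocks) else "DataFrame"


def concat_blocks(*frames: List[Block]) -> Tuple[str, List[Block]]:
    """Concatenate block lists without coercion: sparse stays sparse, dense stays dense."""
    blocks = [b for frame in frames for b in frame]
    return revised_rule(blocks), blocks


if __name__ == "__main__":
    kind, blocks = concat_blocks([Block("A", True)], [Block("D", False)])
    print(kind, [b.sparse for b in blocks])  # SparseDataFrame [True, False]
```

Because each block keeps its own kind, slicing a dense block out of the result can naturally yield a dense object, which is the propagation effect mentioned above.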

@sinhrks sinhrks force-pushed the sparse_dfconcat branch 4 times, most recently from 368ebbc to a1d627a April 24, 2016 01:38
@sinhrks
Member Author

sinhrks commented Apr 24, 2016

> we wouldn't even have SparseDataFrame

Indeed...

Fixed the return type logic; CI is now green.

@jreback jreback closed this in 5ae1bd8 Apr 25, 2016
@jreback
Contributor

jreback commented Apr 25, 2016

thanks!

@sinhrks sinhrks deleted the sparse_dfconcat branch April 25, 2016 16:35
@kawochen kawochen mentioned this pull request Apr 25, 2016
Development

Successfully merging this pull request may close these issues.

concat erroneously sets series to NaN
2 participants