BUG: Sparse concat may fill fill_value with NaN #12966

Closed

sinhrks commented Apr 23, 2016

  • closes #9765

  • tests added / passed

  • passes git diff upstream/master | flake8 --diff

  • whatsnew entry

This PR also adds tests related to #12174 and fixes a concat(axis=1) issue:

# on current master
import numpy as np
import pandas as pd

dense1 = pd.DataFrame({'A': [1, 2, 3, np.nan],
                       'B': [0, 0, 0, 0],
                       'C': [np.nan, np.nan, np.nan, np.nan],
                       'D': [1, 2, 3, 4]})
dense2 = pd.DataFrame({'E': [1, 2, 3, np.nan],
                       'F': [0, 0, 0, 0],
                       'G': [np.nan, np.nan, np.nan, np.nan],
                       'H': [1, 2, 3, 4]})
sparse1 = dense1.to_sparse()
sparse2 = dense2.to_sparse()
# column-wise concat of two SparseDataFrames raises:
pd.concat([sparse2, sparse1], axis=1)
# AttributeError: 'int' object has no attribute 'ravel'

One point to discuss is the logic for the return type. Currently, SparseDataFrame is returned only when all blocks are sparse, because SparseDataFrame can't work properly if it contains a dense block.

Thus, concatenating dense and sparse with axis=0 results in a SparseDataFrame, and with axis=1 results in a normal DataFrame.
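
The rule described above can be sketched in plain Python, without pandas. This is only an illustration: `result_type` and the list of boolean block flags are hypothetical names, not pandas internals, and the block layouts assumed for each axis are inferred from the description above rather than taken from the implementation.

```python
def result_type(blocks_are_sparse):
    """Pick the container name under the 'all blocks sparse' rule.

    blocks_are_sparse: list of bools, one per block of the result
    (a hypothetical stand-in, not the pandas BlockManager).
    """
    if len(blocks_are_sparse) > 0 and all(blocks_are_sparse):
        return "SparseDataFrame"
    return "DataFrame"


# Assumed layout for dense + sparse with axis=0: every result block is sparse.
print(result_type([True, True]))
# Assumed layout for dense + sparse with axis=1: sparse and dense blocks coexist.
print(result_type([True, False]))
```

Under these assumptions the first call gives `SparseDataFrame` and the second gives `DataFrame`, matching the axis=0 / axis=1 behaviour stated above.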

@sinhrks sinhrks added this to the 0.18.1 milestone Apr 23, 2016

jreback commented Apr 23, 2016

I think we could return SparseDataFrame if any blocks are sparse.
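
For comparison, here is a minimal sketch of this suggestion next to the rule the PR started with. The function names and the list-of-flags representation are hypothetical and chosen only for illustration; this is not the pandas implementation.

```python
def result_type_all(blocks_are_sparse):
    # Rule the PR started with: sparse container only if every block is sparse.
    if len(blocks_are_sparse) > 0 and all(blocks_are_sparse):
        return "SparseDataFrame"
    return "DataFrame"


def result_type_any(blocks_are_sparse):
    # Suggested rule: sparse container as soon as one block is sparse.
    if any(blocks_are_sparse):
        return "SparseDataFrame"
    return "DataFrame"


mixed = [True, False]  # one sparse block, one dense block
print(result_type_all(mixed))
print(result_type_any(mixed))
```

The two rules differ only on mixed input: the first returns `DataFrame`, the second `SparseDataFrame`.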

sinhrks commented Apr 23, 2016

OK. Let me clarify the details. When returning SparseDataFrame if any block is sparse:

  • Should dense blocks be converted to sparse?
  • Or should dense blocks be kept as they are (allowing a mixture of sparse and dense blocks)?
    • If so, should slicing dense block(s) return a normal Series / DataFrame rather than a SparseSeries / SparseDataFrame?

jreback commented Apr 23, 2016

In an ideal world we wouldn't even have SparseDataFrame, just `SparseSeries`. But since we do, I wouldn't coerce anything, e.g. leave sparse as sparse and dense as dense. In general we DO preserve these kinds, so things will tend to propagate anyhow.

sinhrks commented Apr 24, 2016

we wouldn't even have SparseDataFrame

Indeed...

Fixed the return type logic; the build is now green.

@jreback jreback closed this in 5ae1bd8 Apr 25, 2016

jreback commented Apr 25, 2016

thanks!

@sinhrks sinhrks deleted the sinhrks:sparse_dfconcat branch Apr 25, 2016

@kawochen kawochen referenced this pull request Apr 25, 2016

Open

BUG: Sparse master issue #10627

11 of 18 tasks complete

nps added a commit to nps/pandas that referenced this pull request May 17, 2016

BUG: Sparse concat may fill fill_value with NaN
closes pandas-dev#9765

Author: sinhrks <sinhrks@gmail.com>

Closes pandas-dev#12966 from sinhrks/sparse_dfconcat and squashes the following commits:

4873e06 [sinhrks] BUG: Sparse concat may fill fill_value with NaN