BUG SparseDataFrame with dense Series (#19374) #19377

datapythonista · 2018-01-24T18:04:21Z

closes SparseDataFrame with dense Series or unknown type #19374
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

gfyoung · 2018-01-24T20:03:58Z

pandas/tests/sparse/frame/test_frame.py

+    def test_constructor_from_unknown_type(self):
+        class Unknown:
+            pass
+        pytest.raises(TypeError, SparseDataFrame, Unknown())


Let's check error message for all of your pytest.raises calls.
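
A minimal sketch of what checking the message adds, using only the standard library. `build_frame` and `raises_with_message` are hypothetical stand-ins invented for illustration, not pandas or pytest APIs. With pytest itself, `pytest.raises(TypeError, match=...)` performs a comparable regex search on the message.

```python
import re


class Unknown:
    """Stand-in for an unsupported input type, as in the test above."""


def build_frame(data):
    # Hypothetical constructor stand-in: accepts only dict or list input.
    if not isinstance(data, (dict, list)):
        raise TypeError(
            "cannot construct frame from data of type {}".format(
                type(data).__name__))
    return data


def raises_with_message(exc_type, pattern, func, *args):
    """Return True only if func(*args) raises exc_type with a matching message."""
    try:
        func(*args)
    except exc_type as err:
        # re.search mirrors a substring-style regex match on the message.
        return re.search(pattern, str(err)) is not None
    return False


# Passes only when both the exception type and the message are right.
assert raises_with_message(TypeError, "of type Unknown", build_frame, Unknown())
```

Checking the message guards against a test that passes because some unrelated `TypeError` was raised earlier in the call.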

gfyoung · 2018-01-24T20:04:09Z

pandas/tests/sparse/frame/test_frame.py

@@ -199,6 +199,31 @@ def test_constructor_from_series(self):
        # without sparse value raises error
        # df2 = SparseDataFrame([x2_sparse, y])

+    def test_constructor_from_dense_series(self):


Reference issue number under all of your added tests.

gfyoung · 2018-01-24T20:04:54Z

@datapythonista : Looks pretty good so far. Don't forget to add a whatsnew entry.

jreback

pls add a whatsnew note as well.

jreback · 2018-01-25T01:03:19Z

pandas/tests/sparse/frame/test_frame.py

+        x = Series(np.random.randn(10000), name='a')
+        assert isinstance(x, Series)
+        df = SparseDataFrame(x)
+        assert isinstance(df, SparseDataFrame)


construct an expected SDF and compare

construct a DataFrame and use .to_sparse() to construct an expected frame. we do not want to do all of these little checks, we already have well established comparison functions, e.g. tm.assert_sparse_equal for this

jreback · 2018-01-25T01:03:31Z

pandas/tests/sparse/frame/test_frame.py

+        assert isinstance(df, SparseDataFrame)
+        assert df.columns == ['b']
+
+        # No column name available


same for the rest

jreback · 2018-01-25T01:04:39Z

pandas/core/sparse/frame.py

@@ -95,6 +95,13 @@ def __init__(self, data=None, index=None, columns=None, default_kind=None,
                                 dtype=dtype, copy=copy)
        elif isinstance(data, DataFrame):
            mgr = self._init_dict(data, data.index, data.columns, dtype=dtype)
+        elif isinstance(data, Series):
+            if columns is None and data.name is None:


need a test to hit this case

datapythonista · 2018-01-25T11:09:16Z

Thanks a lot for the comments. Sorry about the whatsnew, I added it when opening the PR, but forgot to add it to the commit.

@jreback, I didn't find a way to construct the expected SparseDataFrame without using the constructor that is being tested itself. But I added the comparison of the length and sum of the original series and the new SparseDataFrame, which I think should be enough (and it helped find a bug in my previous code).

Addressed all the other comments, let me know if you see anything else. Thanks!

jreback · 2018-01-25T12:04:01Z

pandas/core/sparse/frame.py

+            elif len(columns) != 1:
+                raise ValueError('columns must be of length one '
+                                 'if data is of type Series')
+            mgr = self._init_dict(data.to_frame(columns[0]),


don't construct like this, actually make dict

jreback · 2018-01-25T12:06:15Z

pandas/tests/sparse/frame/test_frame.py

+        x = Series(np.random.randn(10000), name='a')
+        assert isinstance(x, Series)
+        df = SparseDataFrame(x)
+        assert isinstance(df, SparseDataFrame)


construct a DataFrame and use .to_sparse() to construct an expected frame. we do not want to do all of these little checks, we already have well established comparison functions, e.g. tm.assert_sparse_equal for this
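
One reason a whole-object comparison is preferred over spot checks such as matching length and sum can be sketched with plain lists. Everything here is a toy stand-in with hypothetical names. It does not use pandas and is not the test that was eventually merged.

```python
# Two sequences with the same length and the same sum but different contents.
original = [1.0, 2.0, 3.0]
shuffled = [3.0, 1.0, 2.0]


def summary_checks_pass(result, source):
    """Spot checks: only length and sum are compared."""
    return len(result) == len(source) and sum(result) == sum(source)


def full_comparison_passes(result, expected):
    """Whole-object check: every element and its position must agree."""
    return result == expected


# The spot checks cannot tell the two sequences apart...
assert summary_checks_pass(shuffled, original)
# ...while the full comparison can.
assert not full_comparison_passes(shuffled, original)
```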

jreback · 2018-01-25T12:07:24Z

pandas/core/sparse/frame.py

+            if columns is None:
+                if data.name is None:
+                    raise ValueError('cannot pass a series '
+                                     'w/o a name or columns')


I don't think you need any of this, you can simply call to_manager with the column (in a list)

datapythonista · 2018-01-25T13:44:21Z

Thanks for the feedback @jreback, I misunderstood what the columns argument had to do, and I was overcomplicating the code.

Now it simply creates the DataFrame with the Series name if it has one, or it uses 0 as the column if it doesn't. This seems to be consistent with the Series constructor and with creating a SparseDataFrame from a DataFrame, which ignores unknown columns.

I'm not 100% sure I understood what you mean by "actually construct dict", please let me know if this new version doesn't address this too.
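
The naming rule described in this comment (use the Series name if present, otherwise 0) can be sketched in plain Python. `series_to_column_dict` is a hypothetical helper written for illustration. It is not the code in `pandas/core/sparse/frame.py`.

```python
def series_to_column_dict(values, name=None):
    """Wrap one sequence in a single-entry dict keyed by its name, or 0 if unnamed."""
    # Compare against None explicitly so that falsy names such as ""
    # are kept as column labels instead of being replaced by 0.
    label = 0 if name is None else name
    return {label: list(values)}


named = series_to_column_dict([1.5, 2.5], name="a")
unnamed = series_to_column_dict([1.5, 2.5])
assert list(named) == ["a"]
assert list(unnamed) == [0]
```

The explicit `is None` test is a deliberate choice in this sketch: writing `name or 0` would silently rename a column whose label is an empty string.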

jreback

small edits, other lgtm. ping on green.

jreback · 2018-01-26T12:29:51Z

pandas/tests/sparse/frame/test_frame.py

+        # GH 19393
+        # series with name
+        x = Series(np.random.randn(10000), name='a')
+        assert isinstance(x, Series)


you don't need this assert

jreback · 2018-01-26T12:30:02Z

pandas/tests/sparse/frame/test_frame.py

+        # series with name
+        x = Series(np.random.randn(10000), name='a')
+        assert isinstance(x, Series)
+        res = SparseDataFrame(x)


I prefer result and expected

jreback · 2018-01-26T12:30:15Z

pandas/tests/sparse/frame/test_frame.py

+        x = Series(np.random.randn(10000), name='a')
+        assert isinstance(x, Series)
+        res = SparseDataFrame(x)
+        assert isinstance(res, SparseDataFrame)


you don't need the assert

jreback · 2018-01-26T12:30:29Z

pandas/tests/sparse/frame/test_frame.py

+        # series with no name
+        x = Series(np.random.randn(10000))
+        assert isinstance(x, Series)
+        res = SparseDataFrame(x)


same w.r.t naming & asserts

datapythonista · 2018-01-26T16:57:15Z

Thanks once again for the comments @jreback, seems that I followed the conventions of an old test. Addressed your comments, should be all right now.

jreback · 2018-01-27T01:10:47Z

thanks @datapythonista

gfyoung added Bug Sparse Sparse Data Type labels Jan 24, 2018

gfyoung reviewed Jan 24, 2018

jreback requested changes Jan 25, 2018

jreback requested changes Jan 26, 2018

jreback added this to the 0.23.0 milestone Jan 26, 2018

BUG adding support for dense Series in the SparseDataFrame constructor, and providing useful error messages for other types (#19374)

b47d5fd

jreback approved these changes Jan 27, 2018

jreback merged commit eec7f57 into pandas-dev:master Jan 27, 2018
