BUG: SparseSeries init from dict fixes #16906

kernc · 2017-07-13T13:01:18Z

closes BUG: SparseSeries from dict inconsistency #16905
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff (On Windows, git diff upstream/master -u -- "*.py" | flake8 --diff might work as an alternative.)
whatsnew entry

kernc · 2017-07-13T13:01:29Z

Tests copied/adapted from tests.series.test_constructors.

codecov · 2017-07-13T13:30:44Z

Codecov Report

Merging #16906 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #16906      +/-   ##
==========================================
- Coverage   90.99%   90.97%   -0.02%     
==========================================
  Files         161      161              
  Lines       49293    49292       -1     
==========================================
- Hits        44854    44844      -10     
- Misses       4439     4448       +9

Flag	Coverage Δ
#multiple	`88.74% <100%> (-0.01%)`	⬇️
#single	`40.19% <100%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/sparse/series.py	`95.06% <100%> (-0.02%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.76% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 692b5ee...659559c. Read the comment docs.

jreback · 2017-07-13T14:01:49Z

pandas/tests/sparse/test_series.py

+
+    data = A(('col%s' % i, np.random.random()) for i in range(12))
+    s = SparseSeries(data)
+    tm.assert_numpy_array_equal(s.values.values, np.array(list(data.values())))


can you use assert_sp_series_equal (you can pass check_list=False) and then add an assert about the column ordering

jreback · 2017-07-13T14:03:04Z

pandas/tests/sparse/test_series.py

+        return dict(zip((constructor(x) for x in dates_as_str), values))
+
+    data_datetime64 = create_data(np.datetime64)
+    data_datetime = create_data(lambda x: datetime.strptime(x, '%Y-%m-%d'))


can you parameterize this test

jreback · 2017-07-13T14:03:18Z

pandas/tests/sparse/test_series.py

+    expected = SparseSeries([x[1] for x in _d],
+                            index=pd.Index([x[0] for x in _d],
+                                           tupleize_cols=False))
+    ser = ser.reindex(index=expected.index)


use result=

jreback · 2017-07-13T14:03:50Z

doc/source/whatsnew/v0.21.0.txt

@@ -179,6 +179,7 @@ Sparse
 ^^^^^^


+- Bug in instantiating :class:`SparseSeries` from ``dict`` with or without ``index`` (:issue:`16905`)


index= kwarg

jreback · 2017-07-13T17:43:19Z

pandas/tests/sparse/test_series.py

+
+
+def test_constructor_dict():
+    d = {'a': 0., 'b': 1., 'c': 2.}


You might be able to move some of these into SharedWithSparse (from pandas.tests.series.test_api import SharedWithSparse), which we already import, rather than copying them directly.

gfyoung · 2017-07-15T04:21:23Z

doc/source/whatsnew/v0.21.0.txt

@@ -179,6 +179,7 @@ Sparse
 ^^^^^^


+- Bug in instantiating :class:`SparseSeries` from ``dict`` with or without ``index=`` kwarg (:issue:`16905`)


If it doesn't matter whether index is passed in, why mention it in the description?

kernc · 2017-07-15

It's a reference to the two issues fixed. With index=, the result was invalid; without it, construction crashed.

gfyoung · 2017-07-15

Fair enough, though ultimately instantiating from dict just didn't work at all, which could be considered a single bug (also, you're only referencing one issue here). Note that without further context, people won't be aware of that difference (whether it was incorrect or crashed), so it is preferable to be concise.
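The behavior the tests in this thread expect from a dict-backed constructor can be sketched in plain Python, without pandas. The helper name `align_dict_to_index` is hypothetical and is not part of pandas; it only mirrors the expectations visible in the shared tests (sorted keys when no index is given, NaN for labels missing from the dict).

```python
import math


def align_dict_to_index(data, index=None):
    """Hypothetical helper: return (index, values) for dict-backed data.

    Without an index, keys are sorted, as the shared test in this thread
    expects. With an index, labels missing from the dict become NaN.
    """
    if index is None:
        index = sorted(data)
    values = [data.get(label, math.nan) for label in index]
    return list(index), values


d = {'a': 0., 'b': 1., 'c': 2.}

# No index: sorted keys, values in key order.
index, values = align_dict_to_index(d)
print(index, values)  # ['a', 'b', 'c'] [0.0, 1.0, 2.0]

# Explicit index: values follow the index; 'd' is absent, so it maps to NaN.
index, values = align_dict_to_index(d, index=['b', 'c', 'd', 'a'])
print(index, values)  # ['b', 'c', 'd', 'a'] [1.0, 2.0, nan, 0.0]
```

Both call paths should give a consistent result, which is the property the whatsnew entry is describing.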

jreback · 2017-07-16T01:03:59Z

@kernc lmk if you are able to share some test code with Series, can always do a followup.

pep8speaks · 2017-07-16T16:53:08Z

Hello @kernc! Thanks for updating the PR.

In the file pandas/tests/series/test_api.py, following are the PEP8 issues :

Line 128:80: E501 line too long (83 > 79 characters)
Line 166:32: E128 continuation line under-indented for visual indent

Comment last updated on July 17, 2017 at 21:25 Hours UTC

kernc · 2017-07-16T16:56:40Z

Was able to share some test code with Series with as little effort as possible. Not sure if OK, though.

jreback · 2017-07-16T17:03:07Z

pandas/tests/series/test_api.py

+
+        result = self.Series(d, index=['b', 'c', 'd', 'a'])
+        expected = self.Series([1, 2, np.nan, 0], index=['b', 'c', 'd', 'a'])
+        tm.assert_series_equal(result, expected)


This doesn't check sparseness. I might modify assert_series_equal to dispatch to assert_sp_series_equal if both are SparseSeries.

gfyoung · 2017-07-16

In light of my comment below, I'd rather that we keep sparse equality and Series equality checks separate. Perhaps we could write a function like:

def _check_series_equal(self, left, right): ...

that dispatches to tm.assert_series_equal OR tm.assert_sp_series_equal depending on test class. It would seem a little clearer implementation-wise.
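A minimal sketch of that per-test-class dispatch, using only the standard library. `DenseStub`, `SparseStub`, `assert_dense_equal` and `assert_sparse_equal` are hypothetical stand-ins for Series, SparseSeries, tm.assert_series_equal and tm.assert_sp_series_equal; the real pandas helpers compare far more than this.

```python
import unittest


class DenseStub:
    """Hypothetical stand-in for Series: holds plain values."""

    def __init__(self, values):
        self.values = list(values)


class SparseStub(DenseStub):
    """Hypothetical stand-in for SparseSeries: also tracks a fill value."""

    def __init__(self, values, fill_value=0.0):
        super().__init__(values)
        self.fill_value = fill_value


def assert_dense_equal(left, right):
    # Plays the role of tm.assert_series_equal in this sketch.
    assert left.values == right.values, "values differ"


def assert_sparse_equal(left, right):
    # Plays the role of tm.assert_sp_series_equal: also checks sparse metadata.
    assert left.values == right.values, "values differ"
    assert left.fill_value == right.fill_value, "fill values differ"


class SharedTests:
    """Test bodies written once; subclasses pick the class and the checker."""

    series_klass = None

    def _check_series_equal(self, left, right):
        raise NotImplementedError

    def test_same_input_compares_equal(self):
        left = self.series_klass([0.0, 1.0, 2.0])
        right = self.series_klass([0.0, 1.0, 2.0])
        self._check_series_equal(left, right)


class TestDense(SharedTests, unittest.TestCase):
    series_klass = DenseStub

    def _check_series_equal(self, left, right):
        assert_dense_equal(left, right)


class TestSparse(SharedTests, unittest.TestCase):
    series_klass = SparseStub

    def _check_series_equal(self, left, right):
        assert_sparse_equal(left, right)
```

With this layout the dense checker stays untouched, and the stricter sparse comparison runs only in the sparse test class.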

gfyoung · 2017-07-16T17:22:09Z

Was able to share some test code with Series with as little effort as possible. Not sure if OK, though.

I'm a little hesitant about this code-sharing because the readability decreased IMO. self.Series is, for me, just a little harder to understand.

kernc · 2017-07-17T10:40:07Z

How else would you share code without self.Series? And if you'd rather not share code, how else would you ensure the two is-a-Series types get roughly the same amount of use case and API coverage?

jreback · 2017-07-17T11:06:57Z

@kernc this looks fine. ping on green.

gfyoung · 2017-07-17T15:08:36Z

How else would you share code without self.Series?

I personally find the name a little confusing because I only think of the Series class and not SparseSeries, even though the latter is a subclass of the former. 😄

A name like self.series_klass might have been easier because it doesn't have that confusion of naming.

gfyoung · 2017-07-17T21:42:02Z

pandas/tests/series/test_api.py

+        d = {'a': 0., 'b': 1., 'c': 2.}
+        result = self.series_klass(d)
+        expected = self.series_klass(d, index=sorted(d.keys()))
+        tm.assert_series_equal(result, expected)


I think GitHub hid away my comment about this, but I think we should not interlace assert_series_equal and assert_sp_series_equal. It reduces modularity, and "assert_series_equal" is confusing for SparseSeries I find.

I propose that we do the following and define this method:

def assert_series_klass_equal(self, result, expected):
    klass_name = self.series_klass.__name__
    if klass_name == "Series":
        tm.assert_series_equal(result, expected)
    elif klass_name == "SparseSeries":
        tm.assert_sp_series_equal(result, expected)
    else:
        raise ValueError("Invalid 'series_klass' : {name}".format(name=klass_name))

That way you also don't need to modify assert_series_equal. You can then call this method without worrying what type of Series you are comparing.

jreback · 2017-07-18T23:43:16Z

@kernc would you be ok with merging #16960, close this, then you can refactor tests in a new PR?

kernc · 2017-07-19T09:28:45Z

Of course.

kernc · 2017-07-21T23:08:22Z

Continued in #17050.

BUG: SparseSeries init from dict fixes

8b5305b

jreback added Bug Sparse Sparse Data Type labels Jul 13, 2017

jreback requested changes Jul 13, 2017

fixup! BUG: SparseSeries init from dict fixes

14f7047

jreback added this to the 0.21.0 milestone Jul 13, 2017

jreback reviewed Jul 13, 2017

gfyoung reviewed Jul 15, 2017

Move several tests to SharedWithSparse

991b99a

kernc force-pushed the sparse-series-fromdict branch from 5dcec57 to 991b99a Compare July 16, 2017 16:54

update whatsnew

7af0dae

jreback reviewed Jul 16, 2017

assert_series_equal dispatch to sp_series_equal if both are sparse

195550c

jreback approved these changes Jul 17, 2017

Fix a failing test ...

e7405bf

that was failing due to introduced dispatch to assert_sp_series_equal being too strict.

kernc added 2 commits July 17, 2017 23:23

fixup! Fix a failing test ...

bff326a

self.Series -> self.series_klass

659559c

gfyoung reviewed Jul 17, 2017

jreback mentioned this pull request Jul 18, 2017

Fixes SparseSeries initiated with dictionary raising AttributeError #16960

Merged

3 tasks

kernc closed this Jul 19, 2017

gfyoung modified the milestones: No action, 0.21.0 Jul 19, 2017

gfyoung added the Duplicate Report Duplicate issue or pull request label Jul 19, 2017

kernc mentioned this pull request Jul 21, 2017

TST: Move some Series ctor tests to SharedWithSparse #17050

Merged

4 tasks

This pull request was closed.