BUG: Accept dict or Series in fillna for categorical Series #18293

reidy-p · 2017-11-14T21:57:39Z

closes BUG: Series.fillna() crashes on Categorical series if value is a series #17033
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2017-11-14T22:29:30Z

Hello @reidy-p! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 20, 2017 at 20:06 Hours UTC

jschendel · 2017-11-14T23:47:32Z

pandas/tests/test_categorical.py

+        # GH 17033
+        # Test fillna for a Categorical series
+        data = ['a', np.nan, 'b', np.nan, np.nan]
+        s = pd.Series(pd.Categorical(data, categories=['a', 'b']))


Series and Categorical have already been imported so you can remove the pd. throughout the tests you added. Instances of this in existing tests should already be covered by #18277.

Thanks, I'll change that.

codecov · 2017-11-15T00:28:11Z

Codecov Report

Merging #18293 into master will decrease coverage by 0.01%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master   #18293      +/-   ##
==========================================
- Coverage   91.36%   91.34%   -0.02%     
==========================================
  Files         164      164              
  Lines       49721    49729       +8     
==========================================
- Hits        45429    45427       -2     
- Misses       4292     4302      +10

Flag	Coverage Δ
#multiple	`89.14% <94.11%> (-0.01%)`	⬇️
#single	`39.6% <0%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`95.73% <100%> (ø)`	⬆️
pandas/core/categorical.py	`95.66% <93.75%> (-0.09%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d04daf...c484f49. Read the comment docs.

jreback

some comments. whatsnew for 0.22 (you can put in other enhancements)

jreback · 2017-11-15T11:15:16Z

pandas/core/categorical.py


-            if not isna(value) and value not in self.categories:
-                raise ValueError("fill value must be in categories")
+            if isinstance(value, ABCSeries):


this could be a dict as well (which you can simply convert to a series)

A dict has already been converted to a Series in the fillna function in pandas/core/generic.py.

So, as far as I can tell, by the time we reach the fillna function in pandas/core/categorical.py the value is either:

A Series (if the user passed a Series or dict)

A scalar (if the user passed a scalar)
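The normalization described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual code in pandas/core/generic.py; the helper name `normalize_fill_value` is hypothetical, and the assumption is that a dict or Series ends up as a Series aligned to the index of the object being filled.

```python
import pandas as pd


def normalize_fill_value(value, index):
    """Hypothetical helper: mimic the dict/Series -> aligned Series step.

    Assumption: a dict or Series becomes a Series aligned to `index`;
    a scalar passes through untouched.
    """
    if isinstance(value, (dict, pd.Series)):
        # Positions with no entry in `value` become NaN after reindexing.
        return pd.Series(value).reindex(index)
    return value


index = pd.RangeIndex(5)
from_dict = normalize_fill_value({1: 'a', 3: 'b'}, index)
from_scalar = normalize_fill_value('a', index)
```

Under this assumption the categorical code only ever has to distinguish two cases: a Series or a scalar.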

ok, can you indicate that in a comment

can you add a comment here? otherwise lgtm.

jreback · 2017-11-15T11:16:18Z

pandas/core/categorical.py

+                if not isna(value) and value not in self.categories:
+                    raise ValueError("fill value must be in categories")
+
+                mask = values == -1


you might be able to share code with the array case (basically convert the scalar to an array)

Yeah, I was thinking about that but couldn't get it to work properly. The problem I was having was that if the user passes a scalar all the NaNs will be filled with that single scalar. But if the user passes a Series (or a dict which is then converted to a Series in the function) the NaNs will be filled with different values according to the index values:

In [1]: data = ['a', np.nan, 'b', np.nan, np.nan]
In [2]: s = pd.Series(pd.Categorical(data, categories=['a', 'b']))
In [3]: s.fillna('a')
Out[3]:
0    a
1    a
2    b
3    a
4    a
dtype: category
Categories (2, object): [a, b]
In [4]: s.fillna(pd.Series(['a', 'b', 'b'], index=[1, 3, 4]))
Out[4]:
0    a
1    a
2    b
3    b
4    b
dtype: category
Categories (2, object): [a, b]

And I was finding it difficult to deal with both cases with the same code. But I can keep thinking about it.
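One possible way to share a single code path, sketched here under stated assumptions rather than taken from the PR: broadcast the scalar to a Series of the same length and then fill on the integer codes, where -1 marks a missing value. The function name `fill_codes` is hypothetical, and `value` is assumed to be either a scalar or a Series already aligned positionally with `codes`.

```python
import numpy as np
import pandas as pd


def fill_codes(codes, categories, value):
    """Hypothetical sketch: fill missing codes (-1) from a scalar or Series.

    `value` is assumed to be a scalar, or a Series aligned positionally
    with `codes` (NaN where it supplies no fill value).
    """
    categories = pd.Index(categories)
    if not isinstance(value, pd.Series):
        # Broadcast the scalar so both cases go through the same path.
        value = pd.Series([value] * len(codes))

    present = value[value.notna()]
    if not present.isin(categories).all():
        raise ValueError("fill value must be in categories")

    # NaN is not a category, so get_indexer maps it to -1 (still missing).
    fill = categories.get_indexer(value)
    mask = codes == -1
    out = codes.copy()
    out[mask] = fill[mask]
    return out


codes = np.array([0, -1, 1, -1, -1])
filled_scalar = fill_codes(codes, ['a', 'b'], 'a')
filled_series = fill_codes(
    codes, ['a', 'b'], pd.Series([np.nan, 'a', np.nan, 'b', 'b']))
```

The trade-off is that the scalar case allocates a full-length Series it does not strictly need.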

jreback · 2017-11-15T11:18:12Z

pandas/tests/test_categorical.py

+    def test_fillna_series_categorical_errormsg(self):
+        data = ['a', np.nan, 'b', np.nan, np.nan]
+        s = pd.Series(pd.Categorical(data, categories=['a', 'b']))
+


these tests can all go in pandas/tests/series/test_missing.py. also pls move any related fillna tests that are directly on Series/DataFrame as well (to tests/dataframe/test_missing.py), if they are only on Categorical itself then leave here.

Moved the new tests relating to Series to pandas/tests/series/test_missing.py and moved the existing test_fillna from pandas/tests/test_categorical.py to pandas/tests/frame/test_missing.py

jreback · 2017-11-16T00:08:43Z

doc/source/whatsnew/v0.22.0.txt

 - Better support for :func:`Dataframe.style.to_excel` output with the ``xlsxwriter`` engine. (:issue:`16149`)
 - :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`)
-
+- :func:`Series.fillna` now accepts a Series or a dict as a ``value`` (:issue:`17033`)


for a categorical dtype

jreback · 2017-11-16T00:10:41Z

pandas/core/categorical.py

+                mask = values == -1
+                if mask.any():
+                    values = values.copy()
+                    if isna(value):


you can simplify this to

values[mask] = self.categories.get_indexer([value])[0]

get_indexer seems to cause problems with PeriodIndex. When I replace the code with your suggestion:

In [1]: idx = pd.PeriodIndex(['2011-01', '2011-01', pd.NaT], freq='M')
In [2]: df = pd.DataFrame({'a': pd.Categorical(idx)})
In [3]: df.fillna(value=pd.NaT)
pandas._libs.period.IncompatibleFrequency: Input has different freq=None from PeriodIndex(freq=M)

hmm, this may be a known bug
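For context, the suggested simplification relies on `Index.get_indexer` returning -1 for labels it cannot find, which coincides with the code a Categorical uses for missing values, so a separate `isna` branch would no longer be needed. A small sketch on a plain object Index (deliberately not a PeriodIndex, which is where the problem above shows up):

```python
import numpy as np
import pandas as pd

categories = pd.Index(['a', 'b'])

# get_loc returns the position of a label and raises KeyError if absent.
loc_b = categories.get_loc('b')

# get_indexer takes a list-like and returns -1 for anything it cannot find,
# which matches the code used for missing values in a Categorical.
code_b = categories.get_indexer(['b'])[0]
code_nan = categories.get_indexer([np.nan])[0]

try:
    categories.get_loc(np.nan)
    nan_found = True
except KeyError:
    nan_found = False
```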

jreback · 2017-11-16T00:11:03Z

pandas/tests/frame/test_missing.py

        assert_frame_equal(df.fillna(method='bfill'), exp)

+    def test_na_actions(self):
+


can you add the issue number as a comment

jreback · 2017-11-16T00:12:11Z

pandas/tests/frame/test_missing.py

+
+        def f():
+            df.fillna(value={"cats": 4, "vals": "c"})
+


this test is way too long. can you parametrize it? would make it simpler. could also break it into a couple of tests, just use descriptive names.

haha, realized you are just moving this test! ok see what you can do here.

jreback · 2017-11-19T16:25:41Z

pandas/core/categorical.py

+
+            # If value is not a dict or Series it should be a scalar
+            else:
+                if not isna(value) and value not in self.categories:


you need to check is_scalar before you use isna, because if it's not a scalar then this will raise a different error (add a test for that as well). you can make this an elif on is_scalar and raise in an else.

Thanks, that's a good idea.
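A minimal sketch of the elif/else structure being suggested, assuming the Series/dict case has already been handled earlier. The function name `validate_scalar_fill` is hypothetical, and the exact wording of the tail of the error message is an assumption.

```python
import pandas as pd
from pandas.api.types import is_scalar


def validate_scalar_fill(value, categories):
    """Hypothetical check mirroring the suggested elif/else structure."""
    if is_scalar(value):
        # Safe: isna on a scalar returns a single bool.
        if not pd.isna(value) and value not in categories:
            raise ValueError("fill value must be in categories")
    else:
        # Without the is_scalar guard, `not pd.isna([...])` on a list with
        # more than one element fails with an unrelated "truth value of an
        # array is ambiguous" ValueError.
        # Assumption: the message tail is illustrative wording.
        raise TypeError('"value" parameter must be a scalar, dict or Series, '
                        'but you passed a "{}"'.format(type(value).__name__))
    return value


categories = pd.Index(['a', 'b'])
ok = validate_scalar_fill('a', categories)
```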

jreback · 2017-11-19T16:26:36Z

pls rebase as well (generally anytime you are pushing you should)

jreback · 2017-11-19T18:57:47Z

pandas/core/categorical.py

-                else:
-                    values[mask] = self.categories.get_loc(value)
+            else:
+                raise TypeError('"value" parameter must be a scalar, dict '


do we have testing to hit this?

The problem is that I can't think of an example where we would actually hit this specific line. If the user passed a list, tuple, or DataFrame as a value in fillna this will be caught by fillna in generic.py rather than in categorical.py with an identical error message to the one here. So in some ways this exception is kind of redundant because all of the work should be done in generic.py. But it might be useful to keep it anyway.

I have included tests in test_fillna_categorical_raise() in tests/series/test_missing.py to check for the TypeError when the user passes a list, tuple, or DataFrame fill value. But the tests aren't parametrized.

jreback · 2017-11-19T18:58:01Z

pandas/core/generic.py

-                    raise ValueError("invalid fill value with a %s" %
-                                     type(value))
+                    raise TypeError('"value" parameter must be a scalar, dict '
+                                    'or Series, but you passed a '


ideally we would have a parametrized test that hits this (with multiple invalid things that should raise)

Yeah, as I said above, I have included tests in test_fillna_categorical_raise() in tests/series/test_missing.py to check for the TypeError when the user passes a list, tuple, or DataFrame fill value. But the tests aren't parametrized.
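A parametrized version of that check might look roughly like the sketch below. The test name is hypothetical and this is not the test that was merged; it assumes pytest is available and only asserts the exception type, not the message.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize("fill_value", [
    ['a', 'b'],
    ('a', 'b'),
    pd.DataFrame({1: ['a'], 3: ['b']}),
])
def test_fillna_categorical_invalid_value_raises(fill_value):
    # Hypothetical test name. A list, tuple or DataFrame is not a valid
    # ``value`` for Series.fillna and should raise TypeError.
    data = ['a', np.nan, 'b', np.nan, np.nan]
    s = pd.Series(pd.Categorical(data, categories=['a', 'b']))
    with pytest.raises(TypeError):
        s.fillna(fill_value)
```

Each invalid input becomes its own test case, so a failure report names the offending type directly.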

jreback · 2017-11-19T18:59:17Z

doc/source/whatsnew/v0.22.0.txt

 - :func:`pandas.tseries.frequencies.to_offset` now accepts leading '+' signs e.g. '+1h'. (:issue:`18171`)
 - :class:`pandas.io.formats.style.Styler` now has method ``hide_index()`` to determine whether the index will be rendered in ouptut (:issue:`14194`)
 - :class:`pandas.io.formats.style.Styler` now has method ``hide_columns()`` to determine whether columns will be hidden in output (:issue:`14194`)
+- :func:`Series.fillna` now accepts a Series or a dict as a ``value`` (:issue:`17033`)


can you add another note in api breaking changes. note that the previous exception was a ValueError, now it's a TypeError (good change). use this PR number for that note.

jreback

minor change, rebase

jreback · 2017-11-20T11:36:18Z

doc/source/whatsnew/v0.22.0.txt

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
+- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (`PR18293 <https://github.com/pandas-dev/pandas/pull/18293>`__)


just use the PR number with the issue role (it works)

jreback · 2017-11-22T02:34:00Z

thanks @reidy-p

jschendel reviewed Nov 14, 2017

jreback requested changes Nov 15, 2017

jreback added the Categorical (Categorical Data Type) and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Nov 15, 2017

reidy-p force-pushed the fillna_cat_series branch from 083efef to b1ac2d4 Compare November 15, 2017 23:21

jreback requested changes Nov 16, 2017

jreback requested changes Nov 19, 2017

reidy-p force-pushed the fillna_cat_series branch from 359e36a to 0e2e8bc Compare November 19, 2017 18:07

jreback requested changes Nov 19, 2017

jreback reviewed Nov 19, 2017

reidy-p force-pushed the fillna_cat_series branch from eb98c94 to 4d2997b Compare November 19, 2017 23:17

jreback requested changes Nov 20, 2017

reidy-p added 11 commits November 20, 2017 20:01

BUG: Accept dict or Series in fillna for categorical Series

46de7e9

Fix problems with new tests

3024118

pep8 issue

2ef5444

move tests and add whatsnew

5780c43

fix test_categorical.py

2f4be6d

cleanup existing tests in frame/test_missing.py

a69e696

adding comments and fix docstring

572d246

lint issue

2dd2d4b

add is_scalar check and improve error msg

8f8f316

whatsnew update

6ffec6c

whatsnew typo

c484f49

reidy-p force-pushed the fillna_cat_series branch from 9638c88 to c484f49 Compare November 20, 2017 20:06

jreback added this to the 0.22.0 milestone Nov 22, 2017

jreback approved these changes Nov 22, 2017

jreback merged commit 103ea6f into pandas-dev:master Nov 22, 2017

reidy-p deleted the fillna_cat_series branch November 27, 2017 14:50
