Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fillna not throwing error when limit is not positive #27042

Closed
williamma12 opened this issue Jun 25, 2019 · 5 comments

Comments

@williamma12
Copy link

commented Jun 25, 2019

Code Sample

import pandas as pd
import numpy as np
# Your code here
NCOLS = 3
NROWS = 50
data = {"col{}".format(int((i - NCOLS / 2) % NCOLS + 1)): np.random.randint(
            -100, 100, size=(NROWS)
        )
        for i in range(NCOLS)
}
df = pd.dataframe(data)
df.fillna(0, limit=-5)

Problem description

I am currently on pandas 0.24.2 and running the above code does not throw an error even though the limit is less than or equal to zero. This error does not arise when the data is made up of floats instead of ints (replace np.random.randint with np.random.uniform).

Expected Output

ValueError: Limit must be greater than 0

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.8.0
Cython: 0.29.7
numpy: 1.15.0
scipy: 1.2.1
pyarrow: 0.14.0.RAY
xarray: 0.12.0
IPython: 7.3.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 25, 2019

Thanks for the report. The validation should probably occur somewhere around

axis = self._get_axis_number(axis)
.

@Inevitable-Marzipan

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2019

I've had a bit of a look and it appears that when randint is used, the fillna method in pandas/core/internals/blocks.py has self._can_hold_na set to false and returns before running the checks on limit, hence not raising the errors. Not entirely sure how _can_hold_na is set but it may have something to do with whether the DataFrame contains any NAs or not, as when I set one element in the above example to np.nan the error is thrown correctly.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2019

@Inevitable-Marzipan

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2019

I've drilled into it a little bit more and it appears that _can_hold_na is set at the class level and is set to True for FloatBlock and False for IntBlock which is causing the difference in behaviour. I propose a change to the function:

def validate_fillna_kwargs(value, method, validate_scalar_dict_value=True):

such that it takes an additional arugment limit, and performs the necessary checks in there. These checks would be those that are already implemented at
if not is_integer(limit):

Happy to try and implement myself, just want to make sure this make sense?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

commented Jun 26, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.