BUG: fillna with method segfaults on zero-length input (fixes #2775) #2778

Closed


@stephenwlin
Contributor

commented Jan 30, 2013

fixes #2775

just added a check for zero-length data to the backfill and pad templates

not sure if I should add test coverage? the problem is that the tests will not fail without the fix, but rather segfault, so it might not be a good idea
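One way around the concern that a failing test would segfault the whole test run is to execute the reproduction in a child interpreter and inspect its exit code. The following is only a minimal sketch under stated assumptions: `run_isolated`, `crashed`, and `REPRO` are hypothetical names, not part of pandas or its test suite, and the negative-exit-code convention for signals applies to POSIX systems only.

```python
import subprocess
import sys

# Reproduction taken from the original issue, kept as a string so that it
# only ever runs in a child process. It uses the 2013-era fillna(method=...)
# API discussed in this thread. (Hypothetical constant name.)
REPRO = 'import pandas; pandas.DataFrame(columns=["x"]).x.fillna(method="pad")'


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a child interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return result.returncode


def crashed(returncode):
    """Return True if the child was killed by a signal.

    On POSIX, subprocess reports death by signal N as the exit code -N,
    so a segfault shows up as a negative value. This check is an
    assumption that does not hold on Windows.
    """
    return returncode < 0
```

A test could then assert `not crashed(run_isolated(REPRO))`, so that a regression is reported as an ordinary test failure while the parent process keeps running.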

@jreback

Contributor

commented Jan 30, 2013

fyi....this is going to conflict with dtypes branch....maybe do this afterwards? #2708

@stephenwlin

Contributor Author

commented Jan 30, 2013

do you want to just fix it on your branch instead and we can close this PR? it's just some checks for N == 0
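The kind of `N == 0` check mentioned here can be illustrated with a pure-Python sketch of an in-place forward fill. This is not the actual Cython template from pandas: the name `pad_inplace`, its signature, and the use of plain lists are assumptions made for illustration, as is the premise that the unguarded code reads the first element before entering the loop.

```python
def pad_inplace(values, mask, limit=None):
    """Forward-fill masked entries of `values` in place.

    `mask[i]` is True where `values[i]` is missing. `limit`, if given,
    caps the number of consecutive entries that get filled.
    """
    n = len(values)

    # The guard: with zero-length input there is no first element to read,
    # so return before touching values[0].
    if n == 0:
        return

    last = values[0]
    fill_count = 0
    for i in range(n):
        if mask[i]:
            if limit is not None and fill_count >= limit:
                continue  # consecutive-fill limit reached, leave the gap
            fill_count += 1
            values[i] = last
        else:
            # A valid value resets the run and becomes the new fill value.
            fill_count = 0
            last = values[i]
```

A backfill variant would apply the same guard and walk the data from the last element to the first.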

@jreback

Contributor

commented Jan 30, 2013

sure...just pad & backfill (all methods)....do we have a test to replicate?

@stephenwlin

Contributor Author

commented Jan 30, 2013

just run fillna on any DataFrame or Series with zero rows, using a method rather than a fill value...
example from original issue was pandas.DataFrame(columns=["x"]).x.fillna(method="pad", inplace=1)

@ghost

commented Jan 30, 2013

I don't think inplace is needed to trigger, and I was unable to replicate with Series().fillna(method="pad", inplace=1)

Not sure why pandas.DataFrame(columns=["x"]).x and Series() behave differently; there may be a deeper issue at play.

@jreback

Contributor

commented Jan 30, 2013

it triggered for me with and w/o inplace....adding as a test

@jreback

Contributor

commented Jan 30, 2013

ok...fixed in #2708

@stephenwlin deleted the stephenwlin:fillna-segfault-fix branch Jan 31, 2013

@stephenwlin

Contributor Author

commented Jan 31, 2013

fyi, apparently it only segfaults when dtype=='object', probably because that's the only case in which the value is dereferenced as a pointer. for non-object dtypes, the bug is still there, but it doesn't cause a segfault because the memory is just being read and interpreted as an integer/float

In [2]: p.Series().dtype
Out[2]: dtype('float64')

In [3]: p.Series().fillna(method='pad')
Out[3]: []

In [4]: p.DataFrame(columns=['x'])['x'].dtype
Out[4]: dtype('object')

In [5]: p.DataFrame(columns=['x'])['x'].fillna(method='pad')
Segmentation fault (core dumped)

In [2]: p.Series().astype('int64').fillna(method='pad')
Out[2]: []

In [3]: p.Series().astype('object').fillna(method='pad')
Segmentation fault (core dumped)

@wesm

Member

commented Jan 31, 2013

Should this be patched in a v0.10.2 release? I think that might happen before #2708 is merged into v0.11 dev branch

@stephenwlin restored the stephenwlin:fillna-segfault-fix branch Jan 31, 2013

@jreback

Contributor

commented Jan 31, 2013

up to you... it's actually hard to trigger this (though the fix is pretty trivial too)...

@stephenwlin

Contributor Author

commented Jan 31, 2013

@wesm, i restored the branch and committed the same test as in #2708, in case you want to merge this first. there will probably be conflicts later if you do, but they'll be easy to resolve

@stephenwlin reopened this Jan 31, 2013

jreback added a commit to jreback/pandas that referenced this pull request Feb 8, 2013
ENH: allow propagation and coexistence of numeric dtypes (closes GH pandas-dev#622)

     construction of multi numeric dtypes with other types in a dict
     validated get_numeric_data returns correct dtypes
     added blocks attribute (and as_blocks()) method that returns a dict of dtype -> homogeneous Frame to DataFrame
     added keyword 'raise_on_error' to astype, which can be set to false to exclude non-numeric columns
     fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
     changed implementation of get_dtype_counts() to use .blocks
     revised DataFrame.convert_objects to use blocks to be more efficient
     added Dtype printing to show on default with a Series
     added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
     where can upcast integer to float as needed (on inplace ops pandas-dev#2793)
     added fully cythonized support for int8/int16
     no support for float16 (it can exist, but no cython methods for it)

TST: fixed test in test_from_records_sequencelike (dict orders can be different on different arch!)
       NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
     test updates for merging (multi-dtypes)
     added tests for replace (but skipped for now, algos not set for float32/16)
     tests for astype and convert in internals
     fixes for test_excel on 32-bit
     fixed test_resample_median_bug_1688 I believe
     separated out test_from_records_dictlike
     testing of panel constructors (GH pandas-dev#797)
     where ops now have a full test suite
     allow slightly less sensitive decimal tests for less precise dtypes

BUG: fixed GH pandas-dev#2778, fillna on empty frame causes seg fault
     fixed bug in groupby where types were not being casted to original dtype
     respect the dtype of non-natural numeric (Decimal)
     don't upcast ints/bools to floats (if, say, you were agging on len, you can get an int)
DOC: added astype conversion examples to whatsnew and docs (dsintro)
     updated RELEASE notes
     whatsnew for 0.10.2
     added upcasting gotchas docs

CLN: updated convert_objects to be more consistent across frame/series
     moved most groupby functions out of algos.pyx to generated.pyx
     fully support cython functions for pad/bfill/take/diff/groupby for float32
     moved more block-like conversion loops from frame.py to internals.py (created apply method)
       (e.g. diff,fillna,where,shift,replace,interpolate,combining), to top-level methods in BlockManager

@stephenwlin

Contributor Author

commented Feb 10, 2013

closed because of merge of #2708

3 participants