BUG: SEGFAULT when assigning nan via iloc #6056

Closed
aldanor opened this issue Jan 24, 2014 · 9 comments

@aldanor
Contributor

aldanor commented Jan 24, 2014

I have a DataFrame that looks like this:

>>> df.dtypes
ask                 float64
bbg                  object
bid                 float64
last                float64
quotetime            object
size                float64
snap                 object
snaptime            float64
tradedate    datetime64[ns]
tradetime            object
sec_type             object
dtype: object

>>> df.shape
(164618, 11)

>>> df.sec_type.head()
0    Equity
1    Equity
2    Equity
3    Equity
4    Equity
Name: sec_type, dtype: object

Now, this segfaults:

>>> df.sec_type.iloc[0] = np.nan
fish: Job 1, 'ipython --pdb' terminated by signal SIGSEGV (Address boundary error)

This also segfaults, but not right away:

>>> df.sec_type.iloc[0] = None
>>> df.iloc[:10]
fish: Job 1, 'ipython --pdb' terminated by signal SIGSEGV (Address boundary error)

And this:

>>> df.sec_type.ix[0] = np.nan
>>> df.sec_type.head()
fish: Job 1, 'ipython --pdb' terminated by signal SIGSEGV (Address boundary error)

However this doesn't:

>>> s = df.sec_type.copy()
>>> s.iloc[0] = np.nan

I tried recreating the error with a minimal case but couldn't, since small trivial examples don't seem to segfault; debugging is also hard because it goes deep into pandas internals.

This happened right after I updated to 0.13, never happened before.

Any ideas?

@jreback
Contributor

jreback commented Jan 24, 2014

http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy

Assign in a single multi-axis call using loc or ix:

df.ix[5,'foo'] = np.nan

Chained assignment can cause an aliasing issue under certain versions of numpy; pandas will handle this in 0.13.1.

However, you should always assign via loc or ix to avoid any issues (see the link above).
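To illustrate the advice above, here is a minimal sketch of a single-call assignment. The small frame is hypothetical (it only mimics the sec_type column from the report), and it uses .loc because .ix is no longer available in current pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's frame: an object column plus a float column.
df = pd.DataFrame({
    "sec_type": pd.Series(["Equity"] * 5, dtype=object),
    "bid": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One indexing call on the frame itself (row label 0, column "sec_type"),
# instead of the chained form df.sec_type.iloc[0] = np.nan.
df.loc[0, "sec_type"] = np.nan
```

Because the row and column are addressed in one call, pandas writes into df directly and there is no intermediate Series that might be a view or a copy.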

@jreback jreback closed this as completed Jan 24, 2014
@aldanor
Contributor Author

aldanor commented Jan 24, 2014

@jreback Thanks for explanation. However, it's not just the iloc, this also fails:

>>> df.sec_type.ix[0] = np.nan
>>> df.sec_type.head()

There are no errors, no warnings, the first line passes, yet then it segfaults on head().

@jreback
Contributor

jreback commented Jan 24, 2014

It's the same problem.

@jreback
Contributor

jreback commented Jan 24, 2014

It's a view getting corrupted.

Don't assign that way. The docs specifically warn against it, not actually for this reason, but because the results can be unpredictable depending on whether you have a view or not (e.g. the original may not change because you are actually working on a copy).

@aldanor
Contributor Author

aldanor commented Jan 24, 2014

Alright then, what's a direct (correct) replacement for doing something like df['col'].iloc[0] = value since both .loc and .ix are not positional? I have quite a lot of code with .iloc assignments which never failed until now so I wonder what would be the best way to fix it.

Btw also: in the above segfault case, why is there no warning/error thrown (SettingWithCopyException?) if it's known to get corrupted?

@jreback
Contributor

jreback commented Jan 24, 2014

ix will work; it handles both positional and label-based indexing.

The warning is very tricky to trigger and is not guaranteed. The heuristic has a gazillion cases, most of which you don't want to warn on and a very small subset which you do.

Want to try to figure it out?

The issue is a Python syntax issue: a repeated getattr followed by a getitem is impossible to detect.

If you do df['column'].iloc[0], it will warn, IIRC.

@aldanor
Contributor Author

aldanor commented Jan 24, 2014

> ix will work it handles both positional and labels

So if my labels are integers, using .ix introduces another ambiguity, and one would now have to do something like df.loc[df.index[0], 'column'] = value?
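For reference, a minimal sketch of positional assignment when the index labels are integers that do not match positions. The frame and its values are hypothetical; the two forms shown are a single .iloc call with the column position from df.columns.get_loc, and the .loc form mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose integer labels (10, 20, 30) differ from positions (0, 1, 2).
df = pd.DataFrame({"column": ["a", "b", "c"]}, index=[10, 20, 30])

# Purely positional: row position 0, column position looked up by name.
df.iloc[0, df.columns.get_loc("column")] = np.nan

# Label-based: translate row position 1 into its label first, then use .loc.
df.loc[df.index[1], "column"] = "z"
```

Both forms index the frame in one call, so neither depends on whether an intermediate column selection is a view or a copy.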

Btw, just tested: both df['column'].iloc[0] = np.nan and df.column.iloc[0] = np.nan give no warning (and in my case both segfault silently). Yeah, I understand it's hard to track all the getattrs/getitems unless you maintain some sort of call/index stack, but that's surely overkill just to fire warnings.

I think it was discussed in another issue, so that stuff like df[df.a > 1]['a'].iloc[0] = 'b' gives a warning (and it does), and stuff like df.a.iloc[0] = 'b' doesn't on purpose since it's "safe"? (at least that's the way it works now, just checked)

@jreback
Contributor

jreback commented Jan 24, 2014

ok...this is the same bug as #6026

Here are some short-term workarounds:

  • this is fixed in 0.13.1 (releasing shortly)
  • numpy 1.8 fixes it too
  • .ix, as I said (though if you hit the problem with ix that caused us to introduce loc, which DOESN'T interpret indices as positional, then this will not work and you will have to do as you show above)

You won't be getting a warning on df.a.iloc[0] = 'b' because this is NOT setting a copy (and it does work, but it's a bug under numpy < 1.8). The reason df[df.a>1]['a'].iloc[0] = 'b' gives a warning is that the first indexing causes a 'silent' copy (as it's a take).
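A small sketch of the 'silent' copy described above, using a hypothetical three-row frame. It only checks memory sharing with np.shares_memory and then shows the single-call masked assignment; it does not attempt to reproduce the warning or the segfault.

```python
import numpy as np
import pandas as pd

# Hypothetical frame for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Boolean indexing materializes new data (a "take"), so the result
# does not share memory with the original column.
subset = df[df.a > 1]
shares = np.shares_memory(df["a"].to_numpy(), subset["a"].to_numpy())

# A single .loc call with the mask writes into df itself.
df.loc[df.a > 1, "b"] = "changed"
```

Since subset owns its own data, assigning through it cannot reach df, which is why a chained write after a boolean selection is the case worth warning about.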

thanks for reporting though!

@aldanor
Contributor Author

aldanor commented Jan 24, 2014

You're welcome! Fair enough, that kinda explains it; will upgrade to numpy 1.8.0 and look forward to pandas 0.13.1.

UPD: Confirming, no warning/no segfault in all cases on numpy 1.8.0
