Surprising behavior of DataFrame.replace #3582

Closed
cpcloud opened this Issue May 11, 2013 · 21 comments

Comments

Projects
None yet
3 participants
Member

cpcloud commented May 11, 2013

This seems a bit surprising:

>>> from string import ascii_lowercase as letters
>>> from pandas import DataFrame
>>> df = DataFrame(list(letters[:4]), columns=['a'])
>>> df
   a
0  a
1  b
2  c
3  d
>>> df.replace({'a': 'b'})
   a
0  a
1  a
2  c
3  d
>>> df.replace({'a': 'c'})
   a
0  a
1  b
2  b
3  d

Does this have to do with padding?

Contributor

jreback commented May 11, 2013

could be a bug

Member

cpcloud commented May 11, 2013

also no doc for the first parameter of this method, will submit a pr for that...

Contributor

jreback commented May 11, 2013

also if u have time:

I think we need clarification in the main docs on usage of when to use

filter, replace, select, update, lookup

I also don't think there is an example of filter anywhere

Member

cpcloud commented May 11, 2013

np, not sure if will be done 2day, trying to work out the regex replace on frames and just wanted to note this strangeness when i was looking at one of the simplest possible cases of replacement in an object block

Member

cpcloud commented May 11, 2013

@jreback A quick glance at indexing.rst shows docs for select and lookup (they could be fleshed out a bit), but nothing for the other 3. Will add the 3 and flesh the other 2 out.

Contributor

jreback commented May 12, 2013

@cpcloud I think replace is somewhere
a complete example

Member

cpcloud commented May 12, 2013

ah yes, grin '\.replace' doc shows it

Contributor

janschulz commented May 12, 2013

Wow, this was worrying, but fortunately df.a.replace({'a': 'b'}) and df.replace({"a":{"a":"b"}}) works as intended...

Member

cpcloud commented May 12, 2013

@jreback seems the magic is already in _interpolate, I will leave it out in replace though.

Member

cpcloud commented May 12, 2013

@jreback The "bug" here is that when you call

df.replace({'a': 'b'})

the column 'a' replaces the value at index 'b' with the previous non-null value because the fill method default is pad. This seems a bit weird. Can fix if u want. I would change to "search the whole frame for occurrences of 'a' and replace with 'b'.

Contributor

jreback commented May 12, 2013

you want this equivalent to

df.replace('a','b')

?

Member

cpcloud commented May 12, 2013

yes, is that ok?

Contributor

jreback commented May 12, 2013

let me take a look a little later

was always fuzzy on this anyhow

are there tests for this usage anyhow ?
do they break?

Member

cpcloud commented May 12, 2013

There are tests. However, the replace method doesn't actually do anything so they don't break. Here's a pared down example from the test_replace_interpolate method of TestDataFrame from pandas/tests/test_frame.py:

# get the test frame
from pandas.tests.test_frame import _tsframe as df
from numpy.testing import assert_array_equal
res = df.replace({'A': nan}, method='pad', axis=1)
assert_array_equal(df, res)  # replace == identity function here
Contributor

jreback commented May 13, 2013

I think it makes sense to not have automatic interpolation by default

so, if say df.replace({ 'a' : 'b' }) would be translated to df.replace('a','b'), IOW, only allow None for the value if to_replace is a dict/Series (right now this will interpolate)

then make _interpolate an explicit method interpolate. Even though this is an API change, I doubt this is expected behavor

@wesm, @y-p any thoughts?

@cpcloud why don't you put up the PR when you are ready (as this would be an easy change anyhow)

Member

cpcloud commented May 13, 2013

okay. kicking tires a bit right now (added 12 new tests so far); adding the regex functionality is proving to be quite a bit more involved than i thought, but i have a much better understanding (although not complete) of DataFrame internals.

Member

cpcloud commented May 14, 2013

@jreback @y-p Can I address this issue in the regex pr? I think it will create fewer merge conflicts if I just do this all at once.

Contributor

jreback commented May 14, 2013

yes, I would

Member

cpcloud commented May 14, 2013

just realized the axis parameter in replace never gets used if the call to _interpolate is nixed...remove the parameter?

Contributor

jreback commented May 17, 2013

@cploud close this too?

Member

cpcloud commented May 17, 2013

Not just yet. Need to submit 0.12 pr.

jreback closed this in #3675 May 22, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment