DEPR: set_value / get_value #15269

jreback · 2017-01-30T23:56:41Z

we already have too many public indexers.....

bluss · 2017-01-31T10:44:22Z

Is there a fast replacement for set_value? Using loc in its place seems to be slow.

In [328]: columns = list("abcdef")
In [329]: dx = 0.01; xs = np.arange(0, 1, step=dx);
In [330]: df = pd.DataFrame(index=xs)
In [331]: %%timeit
     ...: for x in xs:
     ...:   for c in columns:
     ...:     df.set_value(x, c, 1)
     ...: 
The slowest run took 8.40 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.24 ms per loop

In [333]: df = pd.DataFrame(index=xs)
In [334]: %%timeit
     ...: for x in xs:
     ...:   for c in columns:
     ...:     df.loc[x, c] = 1
     ...: 
10 loops, best of 3: 96.4 ms per loop

jorisvandenbossche · 2017-01-31T11:07:41Z

In your example, df.at[x, c] = 1 is also faster than loc

jorisvandenbossche · 2017-01-31T11:12:29Z

Sparked by #15268

Personally, I never use those functions, so would also not regret them being gone (and would welcome the cleaning up of namespace).

But, are there genuine cases where these methods can be useful? (compared to the other indexing methods) Why were they added in the first place?

On StackOverflow, it seems mainly mentioned for its speed (eg http://stackoverflow.com/questions/13842088/set-value-for-particular-cell-in-pandas-dataframe/24517695#24517695)

cc @pandas-dev/pandas-core

jreback · 2017-01-31T14:47:57Z

IIRC these were always there :> (e.g. way before .iloc/.loc even).

replacements are well-supported .iat/.at. (which internally do almost what these functions do, they should be performant, yet support all of the pandas types). These routines are 'raw' and don't do any validation / checking.

These are not idiomatic (e.g. calling functions to set/get values). And confusing to beginners.

wesm · 2017-01-31T19:47:02Z

Once upon a time, I spent a lot of time making get_value and set_value extremely fast. Over time the performance has degraded significantly.

In the course of rewriting the scalar value access code paths for pandas 2, things will get fast again, so I'm not sure how to proceed given that will occur at some point in the future

bluss · 2017-01-31T19:49:25Z

.at is indeed fast, but still slows down that loop example by 2-3x. .at is barely documented, too.

jreback · 2017-01-31T21:39:42Z

@bluss

you know you can just do this right:

In [7]: %timeit pd.DataFrame(1.0, index=xs, columns=list('abcde'))
10000 loops, best of 3: 165 µs per loop

bluss · 2017-01-31T21:49:56Z

Yes, the actual use case does not write just the same value, but a value specific to the current index and column. I've noticed that if I do it wrong, building the DataFrame overshadows all other computation, but with .set_value it is fine (and .at too, slower but not dominating).

dov · 2017-08-15T19:22:12Z

I was refered to this issue from #17256, where chris-b1 suggested that I use loc instead of set_value. I thought that I would share a small test that I made in building a dataframe through various methods. Indeed I switched from loc to set_value because of a performance problem with loc. Consider the following test program that shows the huge difference in efficiency:

import pandas as pd
import numpy as numpy
pd.options.display.float_format = '{:,.0f}'.format
import time

df = pd.DataFrame(numpy.random.rand(1000,100)*100)
df.loc[:,'A'] = None
df.loc[:,'B'] = None
df.loc[:,'C'] = None

t0 = time.time()
for idx,row in df.iterrows():
  row.loc[('A','B','C')] = (100+idx,200+idx,300+idx)
  df.loc[idx] = row
print 'First:  ', time.time()-t0

t0 = time.time()
for idx,row in df.iterrows():
  row.loc[('A','B','C')] = (100+idx,200+idx,300+idx)
print 'Second: ', time.time()-t0

t0 = time.time()
for idx,row in df.iterrows():
  df.set_value(idx,'A', 100+idx)
  df.set_value(idx,'B', 200+idx)
  df.set_value(idx,'C', 300+idx)
print 'Third:  ', time.time()-t0

t0 = time.time()
for idx,row in df.iterrows():
  df.loc[idx,'A'] = 100+idx
  df.loc[idx,'B'] = 200+idx
  df.loc[idx,'C'] = 300+idx
print 'Fourth: ', time.time()-t0

t0 = time.time()
for idx,row in df.iterrows():
  df.loc[idx,('A','B','C')] = (100+idx,200+idx,300+idx)
print 'Fifth:  ', time.time()-t0

On my home box this gives the output:

First:   13.1010141373
Second:  0.216350078583
Third:   0.0418322086334
Fourth:  0.731967926025
Fifth:   0.692142963409

Using set_value in the third example is more than 15 times faster than using loc in the forth and the fifth exmple! In a real world problem I reduced my run time from 54 to 6 seconds, just by doing these kind of changes. Needless to say, until these problems are solved, will I continue to use set_value.

jreback · 2017-08-15T19:55:50Z

@dov you should simply use .at and .iat. .set_value / .get_value are going to be deprecated. Iteratively setting values is an anti-pattern.

dov · 2017-08-16T06:46:10Z

Thanks @jreback, I missed the .at accessor. There are just too many different options! Adding the following to my above script:

t0 = time.time()
for idx,row in df.iterrows():
  df.at[idx,'A'] = 100+idx
  df.at[idx,'B'] = 200+idx
  df.at[idx,'C'] = 300+idx
print 'Sixth: ', time.time()-t0

AIdx = len(df.columns)-3
BIdx = len(df.columns)-2
CIdx = len(df.columns)-1

t0 = time.time()
for idx,row in df.iterrows():
  df.iat[idx,AIdx] = 100+idx
  df.iat[idx,BIdx] = 200+idx
  df.iat[idx,CIdx] = 300+idx
print 'Seventh: ', time.time()-t0

gives the additional output

Sixth:   0.0538790225983
Seventh: 0.0564727783203

I.e. it is "only" about 20% slower than the set_value() calls. I can live with that, but I still think it is strange to deprecate the fastest option.

Regarding the anti-pattern of the loop, I agree. But it can just be considered as a stress test run a thousand times. On the other hand I often do something non-pandas related (e.g. image processing) and you just want to store the result in an existing dataframe.

jreback · 2017-08-16T10:00:54Z

.ie. it is "only" about 20% slower than the set_value() calls. I can live with that, but I still think it is strange to deprecate the fastest option.

@dov well you can have correct, or slightly faster. I would always take correct.

(e.g. image processing)

this most certainly is an anti-pattern, there are vectorized methods in other libraries.

…l, SparseSeries, SparseDataFrame closes pandas-dev#15269

…l, SparseSeries, SparseDataFrame (#17739) closes #15269

…l, SparseSeries, SparseDataFrame (pandas-dev#17739) closes pandas-dev#15269

PointyShinyBurning · 2018-03-21T11:42:26Z

What's the right way to set a single value in a method chain given this deprecation?

There's got to be something faster/more readable than:

%timeit df.set_value(1, "a", 1).mean()
__main__:1: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
68.6 µs ± 1.11 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df.assign(a=lambda f: f.a.mask(f.a.index==1,1)).mean()
632 µs ± 2.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

jreback · 2018-03-21T12:10:15Z

the deprecate warning is pretty explicit

PointyShinyBurning · 2018-03-21T12:38:13Z

Yes, but .at[1, "a"] = 1 returns None, so I have to break the method chain and give the intermediate variable a name. I have a lot of code that's patterned like https://tomaugspurger.github.io/method-chaining.html which I thought was a style pandas aimed to support?

I can roll my own but still losing speed/readability quite a bit:

def set_value(df, index, col, val):
    new_df = df.copy()
    df.at[index, col] = val
    return new_df


%timeit df.pipe(set_value, 1, "a", 1).mean()
160 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

jreback · 2018-03-21T13:09:58Z

you are missing the point
setting a single item at a time is non performant and an anti-pattern

sure you can do it but in a
method chain doesn’t make any sense

PointyShinyBurning · 2018-03-21T13:26:17Z

I understand that it's an anti-pattern to construct a dataframe by doing it 2000 times. I admit I don't really understand why it's wrong to correct, say, the first value in my data because the instrument it comes from has a warm up period?

(there also isn't a good pattern to set on a slice in a method chain either, of course, but that's a separate issue from removing functionality that already exists)

LinuxIsCool · 2023-04-11T01:23:00Z

@PointyShinyBurning has a really good point and I think this issue should be re-opened.

jreback added Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves labels Jan 30, 2017

jreback added this to the 0.20.0 milestone Jan 30, 2017

This was referenced Feb 20, 2017

DOC: DataFrame.get_value #15455

Closed

BUG: assignment with .at or .loc modifies dataframe when it fails #15490

Open

jreback modified the milestones: 0.20.0, 0.21.0 Mar 29, 2017

chris-b1 mentioned this issue Aug 15, 2017

DataFrame.set_value() fails with numpy types #17256

Closed

jreback added a commit to jreback/pandas that referenced this issue Oct 2, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

21d3bb0

…l, SparseSeries, SparseDataFrame closes pandas-dev#15269

jreback mentioned this issue Oct 2, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Panel & Sparse #17739

Merged

jreback added a commit to jreback/pandas that referenced this issue Oct 2, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

4bf8a21

…l, SparseSeries, SparseDataFrame closes pandas-dev#15269

jreback added a commit to jreback/pandas that referenced this issue Oct 3, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

095256d

…l, SparseSeries, SparseDataFrame closes pandas-dev#15269

jreback added a commit to jreback/pandas that referenced this issue Oct 3, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

7dd12ee

…l, SparseSeries, SparseDataFrame closes pandas-dev#15269

jreback added a commit to jreback/pandas that referenced this issue Oct 5, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

ec7b086

…l, SparseSeries, SparseDataFrame closes pandas-dev#15269

jreback closed this as completed in #17739 Oct 5, 2017

jreback added a commit that referenced this issue Oct 5, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

3fae8dd

…l, SparseSeries, SparseDataFrame (#17739) closes #15269

ghost pushed a commit to reef-technologies/pandas that referenced this issue Oct 16, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

897867f

…l, SparseSeries, SparseDataFrame (pandas-dev#17739) closes pandas-dev#15269

alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

ecc067c

…l, SparseSeries, SparseDataFrame (pandas-dev#17739) closes pandas-dev#15269

No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

DEPR: deprecate .get_value and .set_value for Series, DataFrame, Pane…

db0d19b

…l, SparseSeries, SparseDataFrame (pandas-dev#17739) closes pandas-dev#15269

c-thiel mentioned this issue Feb 23, 2018

BUG?: .at not working on object indexes containing some integers #19860

Closed

smmaurer mentioned this issue Jun 18, 2020

Supporting Pandas v1.0 UDST/udst-planning#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEPR: set_value / get_value #15269

DEPR: set_value / get_value #15269

jreback commented Jan 30, 2017

bluss commented Jan 31, 2017

jorisvandenbossche commented Jan 31, 2017

jorisvandenbossche commented Jan 31, 2017

jreback commented Jan 31, 2017

wesm commented Jan 31, 2017

bluss commented Jan 31, 2017

jreback commented Jan 31, 2017

bluss commented Jan 31, 2017

dov commented Aug 15, 2017 •

edited

Loading

jreback commented Aug 15, 2017

dov commented Aug 16, 2017

jreback commented Aug 16, 2017

PointyShinyBurning commented Mar 21, 2018

jreback commented Mar 21, 2018

PointyShinyBurning commented Mar 21, 2018

jreback commented Mar 21, 2018

PointyShinyBurning commented Mar 21, 2018

LinuxIsCool commented Apr 11, 2023

DEPR: set_value / get_value #15269

DEPR: set_value / get_value #15269

Comments

jreback commented Jan 30, 2017

bluss commented Jan 31, 2017

jorisvandenbossche commented Jan 31, 2017

jorisvandenbossche commented Jan 31, 2017

jreback commented Jan 31, 2017

wesm commented Jan 31, 2017

bluss commented Jan 31, 2017

jreback commented Jan 31, 2017

bluss commented Jan 31, 2017

dov commented Aug 15, 2017 • edited Loading

jreback commented Aug 15, 2017

dov commented Aug 16, 2017

jreback commented Aug 16, 2017

PointyShinyBurning commented Mar 21, 2018

jreback commented Mar 21, 2018

PointyShinyBurning commented Mar 21, 2018

jreback commented Mar 21, 2018

PointyShinyBurning commented Mar 21, 2018

LinuxIsCool commented Apr 11, 2023

dov commented Aug 15, 2017 •

edited

Loading