Quick dataframe shift #5609

halleygithub · 2013-11-28T10:59:48Z

Quick implementation of dataframe shift

def quick_shift(df, N=1):
    '''Quick implementation of dataframe shift'''

    shift_value = np.roll(df.values,N,axis=0)
    shift_value[0:N] = np.NaN
    return DataFrame(shift_value, index=df.index, columns=df.columns)
DataFrame.quick_shift = quick_shift

Perf benchmark:

df = DataFrame(np.random.rand(10000,500))

%timeit df1 = df.quick_shift(1)
10 loops, best of 3: 55.3 ms per loop

%timeit df2 = df.shift(1)
1 loops, best of 3: 238 ms per loop

Related to #4095

The text was updated successfully, but these errors were encountered:

jreback · 2013-11-28T19:12:22Z

this would need to be done in core/internals.py

needs to handle non trivial shifts (eg something that is not so straightforward like this) and multi dtypes

jreback · 2014-01-25T00:15:34Z

@halleygithub putting on the plate for 0.14. Care to do a PR for this?

halleygithub · 2014-01-28T11:13:41Z

@jreback , as I am not familiar with github, so I am not sure what can I do . But you are free to handle the code as you want. :)

jreback · 2014-01-28T11:21:02Z

it's a nice excuse to get familiar

see here: http://pandas.pydata.org/developers.html

gouthambs · 2014-02-15T06:25:50Z

@jreback I was wondering if I can take a stab at this?

jreback · 2014-02-15T06:33:25Z

of course

lmk if u need help

gouthambs · 2014-02-15T07:30:35Z

@jreback I see shift function in core/internals.py in the Block class and in core/generic.py in the NDFrame class. Can you point me to a place where I can understand some basic code flow? Or can you give me some quick pointers?

jreback · 2014-02-15T11:53:31Z

core/generic /shift sets up what to do eg how many periods to shift and such and translates it to move the data up 3 or whatever; this then calls the internals shift

core/internals/ Block/shift is where to make this change

simply swap out the implantation for the new
make sure tests pass
write a vbench for this (pandas/vb_suite) - find a file with similar methods and stick it their
u can even use the above benchmarks

gouthambs · 2014-02-15T21:18:14Z

@jreback Thanks. I will do that.

jreback · 2014-02-16T18:02:26Z

FYI, I just fixed a bug in shift, see here: #6373

The soln proposed above is still valid in any case....make a change in core/internals/Block/shift

gouthambs · 2014-02-17T13:49:05Z

Sure. I will take your changes and merge with mine.

gouthambs · 2014-02-18T06:05:29Z

@jreback I swapped the current shift with a roll, in order to see if I get a speed up. I tried swapping in core/internals.py. But this did not get any performance improvements. On the other hand when I use a roll in core/generic.py (just as a test), I see good speedup. The two implementations are shown below:

diff --git a/pandas/core/generic.py b/pandas/core/generic.py
index d607be6..753b146 100644
--- a/pandas/core/generic.py
+++ b/pandas/core/generic.py
@@ -3169,9 +3169,10 @@ class NDFrame(PandasObject):
             return self

         if freq is None and not len(kwds):
-            block_axis = self._get_block_manager_axis(axis)
-            indexer = com._shift_indexer(len(self._get_axis(axis)), periods)
-            new_data = self._data.shift(indexer=indexer, periods=periods, axis=block_axis)
+            #block_axis = self._get_block_manager_axis(axis)
+            #indexer = com._shift_indexer(len(self._get_axis(axis)), periods)
+            #new_data = self._data.shift(indexer=indexer, periods=periods, axis=block_axis)
+            new_data    = np.roll(self.values,periods,axis=axis)
         else:
             return self.tshift(periods, freq, **kwds)

diff --git a/pandas/core/internals.py b/pandas/core/internals.py
index c89aac0..e56697b 100644
--- a/pandas/core/internals.py
+++ b/pandas/core/internals.py
@@ -931,7 +931,8 @@ class Block(PandasObject):
     def shift(self, indexer, periods, axis=0):
         """ shift the block by periods, possibly upcast """

-        new_values = self.values.take(indexer, axis=axis)
+        #new_values = self.values.take(indexer, axis=axis)
+        new_values = np.roll(self.values,periods, axis=axis)
         # convert integer to float if necessary. need to do a lot more than
         # that, handle boolean etc also
         new_values, fill_value = com._maybe_upcast(new_values)

Do you have any ideas why np.roll speeds up the NDFrame but not the Block class?

The timings for the above mentioned example are as follows:

Original Implementation: 242ms
Quick Shift : 27 ms
Roll in core/internals : 239 ms
Roll in core/generic:27 ms

jreback · 2014-02-18T12:32:40Z

you can't do it in generic.py because that only handles a single dtype. The orientation needs to be changed as the blocks store values 'flipped' from what you think they are. You need to adjust the roll (which IIRC is just an axis swap) to account for this.

In internals you are effectively using axis 1 for the shift (on a row shift), that's what the block_axis is.
transpose the values before roll, then transpose again after the setting

jreback · 2014-02-18T12:33:23Z

[goat-jreback-~/pandas] git diff

diff --git a/pandas/core/internals.py b/pandas/core/internals.py
index e68fc8d..d33386e 100644
--- a/pandas/core/internals.py
+++ b/pandas/core/internals.py
@@ -946,11 +946,10 @@ class Block(PandasObject):
     def shift(self, indexer, periods, axis=0):
         """ shift the block by periods, possibly upcast """

-        new_values = self.values.take(indexer, axis=axis)
+        new_values, fill_value = com._maybe_upcast(self.values)
         # convert integer to float if necessary. need to do a lot more than
         # that, handle boolean etc also
-        new_values, fill_value = com._maybe_upcast(new_values)
-
+        new_values = np.roll(self.values.T,periods, axis=axis)
         axis_indexer = [ slice(None) ] * self.ndim
         if periods > 0:
             axis_indexer[axis] = slice(None,periods)
@@ -959,7 +958,7 @@ class Block(PandasObject):
             axis_indexer[axis] = slice(periods,None)
         new_values[tuple(axis_indexer)] = fill_value

-        return [make_block(new_values, self.items, self.ref_items,
+        return [make_block(new_values.T, self.items, self.ref_items,
                            ndim=self.ndim, fastpath=True)]

     def eval(self, func, other, raise_on_error=True, try_cast=False):

jreback · 2014-02-18T12:34:52Z

you can also remove _shift_indexer from core/common.py as I think that's the only thing that uses it

jreback · 2014-02-18T12:35:12Z

pls add a vbench for this as well (vb_suite/frame_methods.py)

gouthambs · 2014-02-18T13:29:49Z

Thanks. I will try this and add one to vbench.

This was referenced Feb 19, 2014

Quick Frame Shift Implementation #6404

Closed

SparseDataFrame.shift does not respond to axis #6427

Closed

gouthambs mentioned this issue Mar 20, 2014

PERF: Quick Shift Implementation (GH5609) #6672

Merged

jreback closed this as completed in #6672 Mar 20, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Quick dataframe shift #5609

Quick dataframe shift #5609

halleygithub commented Nov 28, 2013

jreback commented Nov 28, 2013

jreback commented Jan 25, 2014

halleygithub commented Jan 28, 2014

jreback commented Jan 28, 2014

gouthambs commented Feb 15, 2014

jreback commented Feb 15, 2014

gouthambs commented Feb 15, 2014

jreback commented Feb 15, 2014

gouthambs commented Feb 15, 2014

jreback commented Feb 16, 2014

gouthambs commented Feb 17, 2014

gouthambs commented Feb 18, 2014

jreback commented Feb 18, 2014

jreback commented Feb 18, 2014

jreback commented Feb 18, 2014

jreback commented Feb 18, 2014

gouthambs commented Feb 18, 2014

Quick dataframe shift #5609

Quick dataframe shift #5609

Comments

halleygithub commented Nov 28, 2013

jreback commented Nov 28, 2013

jreback commented Jan 25, 2014

halleygithub commented Jan 28, 2014

jreback commented Jan 28, 2014

gouthambs commented Feb 15, 2014

jreback commented Feb 15, 2014

gouthambs commented Feb 15, 2014

jreback commented Feb 15, 2014

gouthambs commented Feb 15, 2014

jreback commented Feb 16, 2014

gouthambs commented Feb 17, 2014

gouthambs commented Feb 18, 2014

jreback commented Feb 18, 2014

jreback commented Feb 18, 2014

jreback commented Feb 18, 2014

jreback commented Feb 18, 2014

gouthambs commented Feb 18, 2014