Series.interpolate with index of type float gives wrong result #1206

GlenHertz opened this issue May 7, 2012 · 4 comments


GlenHertz commented May 7, 2012


The logic for Series.interpolate assumes the indexes are equally spaced. With a floating point index this is not the desired interpolation. For example:

from functools import reduce

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series

x1 = np.array([0, 0.25, 0.77, 1.2, 1.4, 2.6, 3.1])
y1 = np.array([0, 1.1, 0.5, 1.5, 1.2, 2.1, 2.4])
x2 = np.array([0, 0.25, 0.66, 1.0, 1.2, 1.4, 3.1])
y2 = np.array([0, 0.2, 0.8, 1.1, 2.2, 0.1, 2.4])

df1 = DataFrame(data=y1, index=x1, columns=['A'])
df2 = DataFrame(data=y2, index=x2, columns=['A'])

# Subtraction aligns on the union of both indexes, leaving NaN where only one side has a value
df3 = df1 - df2
print(df3)

def resample(signals):
    # Union of all x values, then fill the gaps in each signal by interpolation
    aligned_x_vals = reduce(lambda idx, s: idx.union(s.index), signals[1:], signals[0].index)
    return [s.reindex(aligned_x_vals).apply(Series.interpolate) for s in signals]

sig1, sig2 = resample([df1, df2])
sig3 = sig1 - sig2
plt.plot(df1.index, df1.values, marker='D')
plt.plot(sig1.index, sig1.values, marker='o')
plt.plot(df2.index, df2.values, marker='o')
plt.plot(sig2.index, sig2.values, marker='o')
plt.show()

I expect sig1 and sig2 to have more points than df1 and df2, with the added values interpolated. A few of the interpolated points do not overlap the original curves because the points are assumed to be equally spaced. In my opinion, if the index is floating point, the user wants to interpolate by the index's values rather than assume equal spacing. It should do something like this:

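import numpy as np
from pandas import Series, isnull

def interpolate(serie):
    # Use the index values as x coordinates when they are numeric;
    # otherwise fall back to equally spaced positions.
    try:
        inds = np.array([float(d) for d in serie.index])
    except ValueError:
        inds = np.arange(len(serie))

    values = serie.values

    invalid = isnull(values)
    valid = ~invalid

    # Leading missing values have nothing to interpolate from, so skip them
    firstIndex = valid.argmax()
    valid = valid[firstIndex:]
    invalid = invalid[firstIndex:]
    inds = inds[firstIndex:]

    result = values.copy()
    result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid],
                                             values[firstIndex:][valid])

    return Series(result, index=serie.index, name=serie.name)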
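
To make the difference concrete, here is a minimal sketch using only NumPy. The arrays and variable names are made up for illustration and are not taken from pandas. It fills one missing value twice: once treating the points as equally spaced, and once weighting by the actual x values.

```python
import numpy as np

# Unevenly spaced x positions; the value at x = 1.0 is missing
x = np.array([0.0, 1.0, 4.0])
y = np.array([0.0, np.nan, 8.0])

missing = np.isnan(y)
known = ~missing

# Equal-spacing assumption: interpolate over positions 0, 1, 2
pos = np.arange(len(y), dtype=float)
by_position = y.copy()
by_position[missing] = np.interp(pos[missing], pos[known], y[known])

# Index-aware: interpolate over the actual x values
by_index = y.copy()
by_index[missing] = np.interp(x[missing], x[known], y[known])

print(by_position)  # [0. 4. 8.]
print(by_index)     # [0. 2. 8.]
```

The missing point sits a quarter of the way from x = 0 to x = 4, so the index-aware result is 2.0, while the equal-spacing version puts it halfway and returns 4.0.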



wesm commented May 29, 2012

This is implemented in git master now and will be part of the 0.8.0 release

nlsn commented Jul 21, 2012

This seems to still be a problem as of

import pandas
import numpy as np
import pylab as pl
from scipy.interpolate import interp1d

time_fast = np.arange(50000.,50010.,.4) +.1
time_slow = np.arange(50000.,50010.,1.)

x_fast = np.sin(time_fast)
x_slow = np.sin(time_slow)

df_fast = pandas.DataFrame(x_fast, index=time_fast, columns=['fast'])
df_slow = pandas.DataFrame(x_slow, index=time_slow, columns=['slow'])

df_joined = df_fast.join(df_slow, how='outer')

df_joined['pandas interpolate'] = df_joined['slow'].interpolate()

f = interp1d(df_slow.index, df_slow['slow'], bounds_error=False)
df_joined['scipy interp1d'] = f(df_joined.index)

df_joined['pandas interpolate'].plot(style='o')
df_joined['scipy interp1d'].plot(style='o')

pl.title('Linearly interpolated points are expected to lie on the dotted red lines.')


wesm reopened this Jul 21, 2012


nlsn commented Jul 21, 2012

I also observed this with indices that are Datetime objects. The title of this issue may be too narrow.


wesm commented Jul 21, 2012

@nlsn you have to do:


By default, interpolate assumes that the values are evenly spaced, while method='values' interpolates using the index values.
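
A small sketch of the two behaviours described above, assuming a pandas version where Series.interpolate accepts method='values'. The series here is a toy example, not the data from the report.

```python
import numpy as np
import pandas as pd

# Toy series with an unevenly spaced float index and one missing value
s = pd.Series([0.0, np.nan, 8.0], index=[0.0, 1.0, 4.0])

# Default (linear): ignores the index and treats points as evenly spaced
evenly_spaced = s.interpolate()

# method='values': uses the numeric index values as the x coordinates
by_index_values = s.interpolate(method='values')

print(evenly_spaced[1.0])    # 4.0
print(by_index_values[1.0])  # 2.0
```

With the default, the gap is filled with the midpoint of its neighbours; with method='values', the fill is weighted by how far the index value 1.0 lies between 0.0 and 4.0.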

wesm closed this Jul 21, 2012

