Series.interpolate with index of type float gives wrong result #1206

Closed
GlenHertz opened this issue May 7, 2012 · 4 comments

@GlenHertz
commented May 7, 2012

Hi,

The logic for Series.interpolate assumes the index values are equally spaced. With a floating-point index, this does not give the desired interpolation. For example:

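import numpy as np
import matplotlib.pyplot as plt
from functools import reduce
from pandas import DataFrame, Series

x1 = np.array([0, 0.25, 0.77, 1.2, 1.4, 2.6, 3.1])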
y1 = np.array([0, 1.1, 0.5, 1.5, 1.2, 2.1, 2.4])
x2 = np.array([0, 0.25, 0.66, 1.0, 1.2, 1.4, 3.1])
y2 = np.array([0, 0.2, 0.8, 1.1, 2.2, 0.1, 2.4])

df1 = DataFrame(data=y1, index=x1, columns=['A'])
df1.plot(marker='o')

df2 = DataFrame(data=y2, index=x2, columns=['A'])
df2.plot(marker='o')

df3 = df1 - df2
df3.plot(marker='o')
print(df3)

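def resample(signals):
    # Union of all indexes, so every signal is evaluated at the same x values.
    aligned_x_vals = reduce(lambda idx, s: idx.union(s.index), signals[1:], signals[0].index)
    # Reindexing inserts NaN at the new x values; interpolate() then fills them.
    return [s.reindex(aligned_x_vals).apply(Series.interpolate) for s in signals]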

sig1, sig2 = resample([df1, df2])
sig3 = sig1 - sig2
plt.plot(df1.index, df1.values, marker='D')
plt.plot(sig1.index, sig1.values, marker='o')
plt.grid()
plt.figure()
plt.plot(df2.index, df2.values, marker='o')
plt.plot(sig2.index, sig2.values, marker='o')
plt.grid()

I expect sig1 and sig2 to have more points than df1 and df2, with the added values interpolated. A few of the interpolated points do not fall on the original curves because the index values are assumed to be equally spaced. In my opinion, when the index is floating point the user wants to interpolate by the index values rather than assume they are equally spaced. It should do something like this:

import numpy as np
from pandas import *

def interpolate(serie):
    try:
        inds = np.array([float(d) for d in serie.index])
    except ValueError:
        inds = np.arange(len(serie))

    values = serie.values

    invalid = isnull(values)
    valid = ~invalid  # logical NOT; unary minus on a boolean array fails in current NumPy

    firstIndex = valid.argmax()
    valid = valid[firstIndex:]
    invalid = invalid[firstIndex:]
    inds = inds[firstIndex:]

    result = values.copy()
    result[firstIndex:][invalid] = np.interp(inds[invalid], inds[valid],
                                             values[firstIndex:][valid])

    return Series(result, index=serie.index, name=serie.name)
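
As a quick sanity check of the idea, here is a minimal sketch that calls np.interp directly on a small made-up array. The data and variable names are illustrative only and are not taken from the plots above. With x values of 0, 1 and 10, the missing point at x = 1 should land close to the left neighbour, not at the midpoint:

```python
import numpy as np

# Illustrative data: unequally spaced x values with one missing y value.
x = np.array([0.0, 1.0, 10.0])
y = np.array([0.0, np.nan, 10.0])

valid = ~np.isnan(y)

# Fill the missing entries by interpolating against the actual x positions.
filled = y.copy()
filled[~valid] = np.interp(x[~valid], x[valid], y[valid])

print(filled)  # the middle value is 1.0, not the equal-spacing midpoint 5.0
```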

Thanks

@wesm

Member

commented May 29, 2012

This is implemented in git master now and will be part of the 0.8.0 release.

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jun 21, 2012
Merge tag 'v0.8.0b1' into debian-0.8
Version 0.8.0 beta 1

* tag 'v0.8.0b1': (703 commits)
  RLS: 0.8.0 beta 1
  RLS: 0.8.0beta
  RLS: release notes
  ENH: add option to use Series.values to interpolate, close pandas-dev#1206
  TST: testing to close pandas-dev#1331
  DOC: groupby drop duplicate index pandas-dev#1312
  ENH: tz_convert for DataFrame pandas-dev#1330
  ENH: add NA handling to scatter_matrix, close pandas-dev#1297
  BUG: display localtime in DatetimeIndex.__repr__, close pandas-dev#1336
  DOC: draft of timeseries section of docs. Added Period related documentation and examples
  DOC: timezone handling and started on Period
  DOC: rough draft of DatetimeIndex, date_range, shifting/resampling etc
  DOC: more ts docs. Need to do resampling then PeriodIndex
  DOC: starting deeper revamp of ts docs for 0.8
  BUG: raise exception for unintellible frequency strings, close pandas-dev#1328
  ENH: construct PeriodIndex from arrays of fields, allow negative ordinals. close pandas-dev#1333 and pandas-dev#1264
  BUG: tsplot fix with business freq pandas-dev#1332
  BUG: DatetimeIndex partial slicing bug, tsplot kludge around pandas-dev#1332
  BUG: alias W to W-SUN, add test for buglet close pandas-dev#1327
  ENH: mix arrays and scalars in DataFrame constructor, close pandas-dev#1329
  ...
@nlsn


commented Jul 21, 2012

This seems to still be a problem as of 0.8.1.dev-e2633d4.

import pandas
import numpy as np
import pylab as pl
from scipy.interpolate import interp1d

time_fast = np.arange(50000.,50010.,.4) +.1
time_slow = np.arange(50000.,50010.,1.)

x_fast = np.sin(time_fast)
x_slow = np.sin(time_slow)

df_fast = pandas.DataFrame(x_fast, index=time_fast, columns=['fast'])
df_slow = pandas.DataFrame(x_slow, index=time_slow, columns=['slow'])

df_joined = df_fast.join(df_slow, how='outer')

df_joined['pandas interpolate'] = df_joined['slow'].interpolate()

f = interp1d(df_slow.index, df_slow['slow'], bounds_error=False)
df_joined['scipy interp1d'] = f(df_joined.index)

df_joined['pandas interpolate'].plot(style='o')
df_joined['scipy interp1d'].plot(style='o')
df_slow['slow'].plot(style='r.:')

pl.title('Linearly interpolated points are expected to lie on the dotted red lines.')

pl.legend()
pl.show()

@wesm reopened this Jul 21, 2012

@nlsn


commented Jul 21, 2012

I also observed this with indices that are Datetime objects. The title of this issue may be too narrow.

@wesm

Member

commented Jul 21, 2012

@nlsn you have to do:

df_joined['slow'].interpolate(method='values')

By default, interpolate assumes that the values are evenly spaced, whereas method='values' interpolates using the index values.
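
To make the difference concrete, here is a minimal sketch on a three-point Series with a deliberately uneven float index. The data is made up for illustration, and the behaviour shown is what I would expect from current pandas, where method='values' is still accepted:

```python
import numpy as np
import pandas as pd

# Illustrative data: the missing point sits at x = 1, between x = 0 and x = 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0.0, 1.0, 10.0])

# Default: ignores the index and treats the points as equally spaced.
evenly = s.interpolate()

# method='values': uses the numeric index values as the x coordinates.
by_index = s.interpolate(method='values')

print(evenly.iloc[1])    # midpoint of the neighbours: 5.0
print(by_index.iloc[1])  # weighted by index distance: 1.0
```

The default result is the midpoint of the two neighbours regardless of where x = 1 falls, which is the behaviour reported in this issue.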

@wesm closed this Jul 21, 2012
