Upsample spline interpolation with smoothing #26309
Comments
@tritemio can you edit the original post to be fully reproducible on its own? Specifically, remove the need for read_csv of an external file (you can use …).
I believe that pandas always keeps "valid" (non-NaN) points present, so we really are just interpolating rather than smoothing. We could consider an alternative API for smoothers, but I'm not sure what that would look like.
@TomAugspurger, I modified the example to be self-contained. As you can now see more clearly, the times in the original data are all multiples of 5 minutes. When you evaluate the interpolation at exactly the same timestamps as in the original data, the result should differ from the original data, by definition of a smoothing spline. Since scipy does the right thing, it is strange that pandas would override some of the result values. It looks like pandas is adding the data points back without checking whether they already exist in the interpolated data.
Thanks,
I'm not sure about the original intent on pandas' interpolate, but it's
fundamentally based around filling missing values. All
the current interpolate methods will only update missing values, and
non-missing values will be passed through.
This use case seems worth supporting, but someone will need to design an
API (either an additional keyword, or
an alternative method).
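The pass-through behaviour described above can be seen in a minimal sketch. The data here is made up for illustration, and method="linear" is used only because it needs no extra dependencies:

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the issue): two gaps between known values.
original = pd.Series([0.0, np.nan, 4.0, np.nan, 2.0])

# interpolate() only fills the NaN slots; existing values are left untouched.
filled = original.interpolate(method="linear")

print(filled.tolist())  # → [0.0, 2.0, 4.0, 3.0, 2.0]
```

The non-missing entries (0.0, 4.0, 2.0) come back unchanged, which is why a smoothing method cannot alter them through this code path.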
Except for spline, I don't think scipy.interpolate has other methods using "smoothing". Instead of adding a new API, would it make sense to special-case if method='spline' and s > 0 to fill all values in the resampled axis?
To do that, we would need to deprecate the existing behavior first, which
would require a new keyword (which itself would then need to be deprecated).
A keyword to control whether existing points are smoothed seems the most
sensible (it would probably raise for non-spline methods).
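Until such a keyword exists, one possible workaround is to bypass interpolate and call scipy directly, so that every point of the upsampled index is taken from the smoothing spline. The following is only a sketch: smooth_upsample is a hypothetical helper name, not a pandas API, and the sample data is invented.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline


def smooth_upsample(series, freq, s, k=3):
    """Hypothetical helper: fit a smoothing spline to a time series and
    evaluate it on every timestamp of the upsampled index."""
    valid = series.dropna()
    origin = valid.index[0]
    # Use seconds since the first timestamp as x, to keep the values small.
    x = np.asarray((valid.index - origin).total_seconds())
    spline = UnivariateSpline(x, valid.to_numpy(), k=k, s=s)
    new_index = pd.date_range(valid.index[0], valid.index[-1], freq=freq)
    new_x = np.asarray((new_index - origin).total_seconds())
    return pd.Series(spline(new_x), index=new_index, name=series.name)


# Invented data: noisy samples every 5 minutes.
rng = np.random.default_rng(0)
index = pd.date_range("2019-05-14", periods=12, freq="5min")
data = pd.Series(np.sin(np.arange(12) / 3.0) + rng.normal(scale=0.2, size=12),
                 index=index)

smoothed = smooth_upsample(data, freq="1min", s=0.5)
```

Because the original samples are not written back, the values of smoothed at the 5-minute timestamps generally differ from data, as expected for a smoothing spline.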
Code Sample, a copy-pastable example if possible
See the same script as a notebook:
Problem description
Spline interpolation of order 3 with smoothing (s>0) gives an interpolation that does not pass through the data points. Scipy's version shows this behaviour. Pandas's version shows a smooth spline and then "jumps" at the data points in order to "pass through the data". See figure below:
Expected Output
Scipy and pandas interpolation should match.
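The original code sample is not reproduced here, so the following is a small stand-in sketch of the mismatch, not the reporter's script. It assumes invented noisy data and uses a numeric index in place of timestamps to keep the example short; s_factor is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline

# Invented data: noisy samples on a coarse grid (stand-in for 5-minute steps).
rng = np.random.default_rng(0)
x_coarse = np.arange(0.0, 50.0, 5.0)
y_coarse = np.sin(x_coarse / 8.0) + rng.normal(scale=0.2, size=x_coarse.size)
x_fine = np.arange(0.0, 46.0, 1.0)  # upsampled grid, includes every coarse point
s_factor = 0.5  # smoothing factor, s > 0

# scipy: the smoothing spline is evaluated at every point of the fine grid.
spline = UnivariateSpline(x_coarse, y_coarse, k=3, s=s_factor)
y_scipy = pd.Series(spline(x_fine), index=x_fine)

# pandas: reindex to the fine grid, then fill the new NaN slots with the spline.
series = pd.Series(y_coarse, index=x_coarse)
y_pandas = series.reindex(x_fine).interpolate(method="spline", order=3, s=s_factor)

is_original = y_pandas.index.isin(x_coarse)
# At the inserted points the two results agree ...
print(np.allclose(y_pandas[~is_original], y_scipy[~is_original]))
# ... but at the original points pandas keeps the raw data, scipy does not.
print(np.allclose(y_pandas[is_original], y_coarse))
print(np.allclose(y_scipy[is_original], y_coarse))
```

The difference at the original points is what shows up as "jumps" when the pandas result is plotted.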
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.3.0
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.2
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.3.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2018.9
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.2
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None