Some argument combinations with reindex fail on an empty DataFrame #27315

Closed
ajspera opened this issue Jul 9, 2019 · 0 comments · Fixed by #37874
Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
Milestone: 1.2

ajspera commented Jul 9, 2019

import pandas as pd
from datetime import datetime, timedelta

end = datetime.utcnow()
begin = end - timedelta(minutes=1)
data_interval = 10
date_index = pd.date_range(start=begin,
                           end=end,
                           freq='{} s'.format(data_interval))
df = pd.DataFrame([], columns=['time','a','b'])
df = df.set_index('time', drop=True)
tol = timedelta(seconds=9)
df = df.reindex(date_index, method='pad', tolerance=tol)

# IndexError: index -1 is out of bounds for axis 0 with size 0

df = pd.DataFrame([], columns=['time','a','b'])
df = df.reindex(date_index, method='nearest')

# IndexError: index -1 is out of bounds for axis 0 with size 0

Reindexing an empty DataFrame raises this IndexError when tolerance= is passed (here together with method='pad') or when method='nearest' is used.

This does not happen with other uses of reindex, so it can come as a surprise when reindexing an empty window of data. I would expect it to behave the same as a plain reindex without method or tolerance.

Expected Output

In this case the result should match reindex with no extra arguments, which returns:

                              a    b
2019-07-09 22:35:05.165640  NaN  NaN
2019-07-09 22:35:15.165640  NaN  NaN
2019-07-09 22:35:25.165640  NaN  NaN
2019-07-09 22:35:35.165640  NaN  NaN
2019-07-09 22:35:45.165640  NaN  NaN
2019-07-09 22:35:55.165640  NaN  NaN
2019-07-09 22:36:05.165640  NaN  NaN
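For reference, the expected frame above can be reproduced with a plain reindex: every target timestamp is missing from the empty index, so each row is filled with NaN. This sketch assumes a fixed start time (taken from the table) in place of datetime.utcnow(), so the output is reproducible.

```python
import pandas as pd

# Assumption: fixed start time instead of datetime.utcnow(), for reproducibility.
date_index = pd.date_range("2019-07-09 22:35:05.165640", periods=7, freq="10s")

# Same empty frame as in the report: columns 'a' and 'b', an empty 'time' index.
empty = pd.DataFrame([], columns=["time", "a", "b"]).set_index("time")

# No method or tolerance: no target label is found, so every cell is NaN.
result = empty.reindex(date_index)
print(result)
```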

Temporary Workaround

A simple workaround is to check the length first, but this problem can surprise someone at a bad time, as it did for us.

if len(df) == 0:
    # Nothing to pad from in an empty frame, so skip method/tolerance.
    df = df.reindex(date_index)
else:
    df = df.reindex(date_index, method='pad', tolerance=tol)
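The length check can also be wrapped in a small helper so every call site gets the same guard, for both the tolerance= and method='nearest' cases. This is a sketch under assumptions: reindex_empty_safe is a hypothetical name, not a pandas API, and it simply drops method and tolerance whenever the frame has no rows, which gives the all-NaN result the report expects.

```python
from datetime import timedelta

import pandas as pd


def reindex_empty_safe(df, index, method=None, tolerance=None):
    """Hypothetical helper (not part of pandas): reindex df onto index.

    With no rows there is nothing to pad or match against, so a plain
    reindex is used, yielding an all-NaN frame on the target index.
    Otherwise method and tolerance are passed through unchanged.
    """
    if len(df) == 0:
        return df.reindex(index)
    return df.reindex(index, method=method, tolerance=tolerance)


date_index = pd.date_range("2019-07-09 22:35:05", periods=7, freq="10s")
tol = timedelta(seconds=9)

# Empty frame from the report: falls back to a plain reindex.
empty = pd.DataFrame([], columns=["time", "a", "b"]).set_index("time")
print(reindex_empty_safe(empty, date_index, method="pad", tolerance=tol))

# Non-empty frame: values are padded forward, but only within 9 seconds.
src = pd.DataFrame(
    {"a": [1.0, 2.0]},
    index=pd.to_datetime(["2019-07-09 22:35:00", "2019-07-09 22:35:30"]),
)
print(reindex_empty_safe(src, date_index, method="pad", tolerance=tol))
```

With the non-empty frame, only 22:35:05 (5 s after 22:35:00) and 22:35:35 (5 s after 22:35:30) fall within the tolerance; the other rows stay NaN.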

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-54-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.0.1
pip: 18.0
setuptools: 40.4.1
Cython: None
numpy: 1.16.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.1.8
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: 1.3.5
pymysql: 0.9.3
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: 0.6.1
pandas_datareader: None
gcsfs: None

jreback added the Bug and Indexing labels on Jul 10, 2019
jreback added this to the 1.2 milestone on Nov 17, 2020