
BUG: IndexError occurring on duplicate axis does not fail gracefully. #39337

Open
2 of 3 tasks
thisiswhereitype opened this issue Jan 22, 2021 · 0 comments
Labels: Bug, Error Reporting, Indexing


@thisiswhereitype

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
# pd.show_versions()


# Adjust n for your machine
n = 10_000_000
df = pd.DataFrame(np.arange(n), columns=['a']).assign(ts=pd.date_range(start='2020-01-01', freq='1S', periods=n))

df['td'] = df.ts.dt.normalize()


df = df.set_index(['td', 'ts'])


# reset_index(1) turns 'ts' back into a column, so the 'ts' level is missing from the result's index
d = df.reset_index(1).groupby(pd.Grouper(level=0)).ts.apply(lambda x: (x.shift() - x) < pd.Timedelta(minutes=1))
d
# Assigning the Series aligns on its duplicated index and triggers the error;
# assigning d.cumsum().values instead (positional, no alignment) avoids it
df['x'] = d.cumsum()
df
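A much smaller sketch of the same situation, assuming only that the assigned Series carries just the `td` level (so its labels repeat) while the frame has the unique (`td`, `ts`) MultiIndex. The aligned assignment is expected to raise a `ValueError` mentioning duplicate labels (the exact wording differs between pandas releases); the `.values` workaround assigns positionally instead. The sizes and the 12-hour frequency are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Frame with a unique (td, ts) MultiIndex, like the report's df (6 rows, 2 per day).
ts = pd.date_range("2020-01-01", periods=6, freq=pd.offsets.Hour(12))
df = pd.DataFrame({"a": np.arange(6), "ts": ts})
df["td"] = df["ts"].dt.normalize()
df = df.set_index(["td", "ts"])

# Series indexed by the 'td' level only: every day label appears twice.
s = pd.Series(np.arange(6), index=df.index.get_level_values("td"))

try:
    # Assignment aligns on labels, so pandas must reindex s against df.index.
    df["x"] = s
    error_message = ""
except ValueError as err:
    error_message = str(err)

print(error_message)

# Positional workaround from the report: .values skips index alignment.
df["x"] = s.values
```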

Problem description

In my case the assignment makes memory usage climb slowly and, if the DataFrame is large enough, the process is OOM-killed.
My DataFrame is similar in width to the example here, and its equivalent MultiIndex is unique.

Raising the error promptly would save users from hunting for memory leaks or other bugs.
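Since the equivalent (`td`, `ts`) MultiIndex is unique, an alternative to `.values` is to compute the flag without `reset_index`, so the result keeps the full MultiIndex and aligns label for label. This is a sketch under assumptions: a small hourly frame stands in for the report's data, and the comparison is the report's `(x.shift() - x) < pd.Timedelta(minutes=1)` applied per day to the `ts` level.

```python
import numpy as np
import pandas as pd

# Small hourly frame (illustrative size and frequency, not the report's n).
n = 100
df = pd.DataFrame({"a": np.arange(n)})
df["ts"] = pd.date_range("2020-01-01", freq=pd.offsets.Hour(), periods=n)
df["td"] = df["ts"].dt.normalize()
df = df.set_index(["td", "ts"])

# Expose the 'ts' level as a Series that keeps the full (td, ts) index.
ts = df.index.get_level_values("ts").to_series(index=df.index)

# Same per-day comparison as the report, computed without reset_index.
# The first row of each day compares NaT, which yields False.
flag = (ts.groupby(level="td").shift() - ts) < pd.Timedelta(minutes=1)

# flag is indexed exactly like df, so the assignment needs no duplicate-label reindex.
df["x"] = flag.cumsum()
```

With 100 hourly rows spread over five calendar days, five rows start a day and get `False`, so the running count ends at 95.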

Expected Output

The error is raised quickly without requiring disproportionate memory.
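The expected behaviour can be approximated in user code with a hypothetical helper, `assign_aligned` (not a pandas API), that checks `index.is_unique` before any alignment is attempted. It only demands uniqueness when the Series index differs from the frame's index, since a Series whose index equals the frame's index is assigned without reindexing.

```python
import pandas as pd


def assign_aligned(df: pd.DataFrame, column: str, values: pd.Series) -> None:
    """Hypothetical helper (not part of pandas): assign values to df[column],
    raising immediately if alignment would require reindexing from duplicate labels."""
    needs_reindex = not values.index.equals(df.index)
    if needs_reindex and not values.index.is_unique:
        # Fail before pandas attempts any reindexing work.
        raise ValueError(
            f"cannot assign column {column!r}: the Series index has duplicate labels "
            "and does not match the frame index; use .values to assign positionally"
        )
    df[column] = values


# Usage: a unique MultiIndex frame and a Series whose labels repeat.
idx = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 2)], names=["g", "t"])
frame = pd.DataFrame({"a": [1, 2, 3]}, index=idx)
dup = pd.Series([10, 20, 30], index=[0, 0, 1])  # label 0 appears twice

try:
    assign_aligned(frame, "x", dup)
except ValueError as err:
    print(err)
```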

Output of pd.show_versions()

pandas           : 1.2.1
numpy            : 1.19.5

thisiswhereitype added the Bug and Needs Triage labels on Jan 22, 2021.
mroeschke added the Error Reporting and Indexing labels and removed the Needs Triage label on Aug 15, 2021.