Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REG: DataFrame.shift with axis=1 and CategoricalIndex columns #38504

Merged
merged 6 commits into from
Dec 17, 2020

Conversation

jbrockmendel
Copy link
Member

@topper-123 topper-123 added Bug Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Categorical Categorical Data Type labels Dec 15, 2020
@topper-123 topper-123 added this to the 1.2 milestone Dec 15, 2020
@topper-123
Copy link
Contributor

Thanks for taking this.

This PR should IMO be part of 1.2, since it’s a regression since 1.1.

if periods > 0:
result = self.iloc[:, :-periods]
for col in range(min(ncols, abs(periods))):
# TODO(EA2D): doing this in a loop unnecessary with 2D EAs
# Define filler inside loop so we get a copy
filler = self.iloc[:, 0].shift(len(self))
result.insert(0, col, filler, allow_duplicates=True)
result.insert(0, label, filler, allow_duplicates=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work correctly for periods > 1? Looks like it inserts the same label in all locations, where Id think it should insert self.columns[col]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be benign because we set result.columns directly below. ill add a test to be on the safe side

else:
result = self.iloc[:, -periods:]
for col in range(min(ncols, abs(periods))):
# Define filler inside loop so we get a copy
filler = self.iloc[:, -1].shift(len(self))
result.insert(
len(result.columns), col, filler, allow_duplicates=True
len(result.columns), label, filler, allow_duplicates=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

# GH#38434
ci = CategoricalIndex(["a", "b"])
df = DataFrame([[1, 2], [3, 4]], index=ci, columns=ci)
result = df.shift(axis=1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test a 3-column dataframe with periods=2 also?

@simonjayhawkins
Copy link
Member

This PR should IMO be part of 1.2, since it’s a regression since 1.1.

and whatsnew probably not necessary

@jbrockmendel
Copy link
Member Author

added test case with periods=2, removed whatsnew

@jreback jreback added the Regression Functionality that used to work in a prior pandas version label Dec 16, 2020
@jreback jreback merged commit d08f12c into pandas-dev:master Dec 17, 2020
@jreback
Copy link
Contributor

jreback commented Dec 17, 2020

@meeseeksdev backport 1.2.x

@lumberbot-app

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

REGR: DataFrame.shift(axis=1) raises TypeError when columns is CategoricalIndex
4 participants