Preserve DataFrame(columns=).columns dtype during empty-like construction #14381

Merged

Conversation

mroeschke
Contributor

Description

.columns used to always return pd.Index([], dtype=object), even when an empty, typed columns object such as DatetimeIndex([]) was passed into the DataFrame constructor. We need to preserve some information about the column dtype that was passed in so that we can return a correctly typed Index.
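
For comparison, here is a minimal sketch of the pandas behavior that this change appears to target. It uses pandas rather than cuDF, on the assumption (not stated outright in this PR) that the cuDF constructor is meant to mirror pandas here.

```python
import pandas as pd

# An empty but typed Index passed as the column labels.
columns = pd.DatetimeIndex([])
df = pd.DataFrame(columns=columns)

# pandas keeps the label type instead of falling back to an object Index.
print(type(df.columns).__name__)
print(df.columns.dtype == columns.dtype)
```

With cuDF installed, the analogous check would swap in `cudf.DataFrame` and compare its `.columns` against the pandas result.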

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added the bug (Something isn't working), Python (Affects Python cuDF API.), and non-breaking (Non-breaking change) labels Nov 9, 2023
@mroeschke mroeschke requested a review from a team as a code owner November 9, 2023 00:53
Comment on lines 737 to 740
if len(columns) == 0:
    empty_dtype = getattr(columns, "dtype", None)
else:
    empty_dtype = None
Contributor

If the columns are non-empty and have a dtype, should we respect that? It feels like there are still times in which we might lose the dtype, or do I have it wrong?

Contributor Author

Good point. Currently, columns passed in with dtype int8 would be returned as int64. I'll push an update to avoid this upcasting.
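
The difference between the hunk under review and the behavior discussed in this thread can be sketched in plain Python. `TypedLabels`, `empty_only_dtype`, and `any_length_dtype` are hypothetical names used only for illustration. They are not cuDF APIs, and the second function is a guess at the shape of the fix, not the code that was actually pushed.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class TypedLabels:
    """Hypothetical stand-in for an Index: a sequence of labels plus a dtype."""

    values: Sequence[Any]
    dtype: str

    def __len__(self) -> int:
        return len(self.values)


def empty_only_dtype(columns: Any) -> Optional[str]:
    """Mirror the reviewed hunk: remember the dtype only for empty labels."""
    if len(columns) == 0:
        return getattr(columns, "dtype", None)
    return None


def any_length_dtype(columns: Any) -> Optional[str]:
    """Assumed alternative: remember the dtype whenever the labels carry one."""
    return getattr(columns, "dtype", None)


empty_datetime = TypedLabels([], "datetime64[ns]")
int8_labels = TypedLabels([1, 2], "int8")

print(empty_only_dtype(empty_datetime))  # datetime64[ns]
print(empty_only_dtype(int8_labels))     # None: the int8 information is dropped
print(any_length_dtype(int8_labels))     # int8
print(any_length_dtype(["a", "b"]))      # None: a plain list has no dtype
```

In the first variant, non-empty typed labels lose their dtype, which is the gap the reviewer pointed out.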

@mroeschke
Contributor Author

In a follow-up, I have a feeling ColumnAccessor should receive the column argument the user passed, so we can centralize the logic for detecting whether a MultiIndex, a RangeIndex, or an Index with a non-default dtype was passed.
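
One way to picture that centralization is a single helper that inspects the user's column argument once and records the three facts mentioned above. This is only a sketch, written against pandas types for illustration. `describe_column_labels` and its dictionary keys are hypothetical and do not describe the actual ColumnAccessor interface.

```python
from typing import Any, Dict

import pandas as pd


def describe_column_labels(columns: Any) -> Dict[str, Any]:
    """Hypothetical helper: summarize what kind of column labels were passed."""
    return {
        # True when the labels are hierarchical.
        "multiindex": isinstance(columns, pd.MultiIndex),
        # True when the labels are a lazily evaluated integer range.
        "rangeindex": isinstance(columns, pd.RangeIndex),
        # The dtype carried by the labels, or None for plain sequences.
        "label_dtype": getattr(columns, "dtype", None),
    }


print(describe_column_labels(pd.RangeIndex(3)))
print(describe_column_labels(pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])))
print(describe_column_labels(["x", "y"]))
```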

@mroeschke mroeschke changed the base branch from branch-23.12 to branch-24.02 November 16, 2023 03:08
@wence-
Contributor

wence- commented Nov 21, 2023

/merge

@rapids-bot rapids-bot bot merged commit fcc8950 into rapidsai:branch-24.02 Nov 21, 2023
67 checks passed
@mroeschke mroeschke deleted the bug/dataframe/empty_column_type branch November 21, 2023 15:34
Labels
bug (Something isn't working), non-breaking (Non-breaking change), Python (Affects Python cuDF API.)
2 participants