BUG: read_csv not respecting object dtype when option is set #56047
Can we get this one in?
df = DataFrame(col_dict, columns=columns, index=index)
if hasattr(self, "orig_options"):
    dtype_arg = self.orig_options.get("dtype", None)
Is the dtype option normally applied in _engine.read? Just curious why it needs to be done here.
Yes, but when the option is set, the DataFrame constructor infers object as string again, which would discard the original dtype.
OK makes sense.
Could we defer looping over col_dict if dtype isn't specified to be object-like?
Updated. We now only do this if dtype is a dict or an object dtype.
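A rough sketch of the gating described in this thread, using only the standard library. The helper names (`is_object_like`, `normalize_dtype_arg`, `columns_to_pin`) and the `OBJECT_LIKE` set are hypothetical stand-ins, not the code in the PR; the real check would presumably compare against NumPy/pandas dtypes.

```python
from collections import defaultdict

# Hypothetical stand-in for "is this dtype object-like?".
OBJECT_LIKE = {object, "object", "O"}


def is_object_like(dtype):
    return dtype in OBJECT_LIKE


def normalize_dtype_arg(dtype_arg):
    """Return a per-column lookup, or None when no per-column pass is needed."""
    if isinstance(dtype_arg, dict):
        # Columns missing from the user's dict fall back to None.
        lookup = defaultdict(lambda: None)
        lookup.update(dtype_arg)
        return lookup
    if dtype_arg is not None and is_object_like(dtype_arg):
        # A single object-like dtype applies to every column.
        return defaultdict(lambda: dtype_arg)
    return None


def columns_to_pin(col_dict, dtype_arg):
    """Map column name -> dtype for columns whose dtype must survive inference."""
    lookup = normalize_dtype_arg(dtype_arg)
    if lookup is None:
        # Fast path: never loop over col_dict.
        return {}
    return {
        name: lookup[name]
        for name in col_dict
        if is_object_like(lookup[name])
    }


cols = {"a": ["x", "y"], "b": [1, 2]}
print(columns_to_pin(cols, "int64"))
print(columns_to_pin(cols, {"a": "object"}))
```

With a non-object dtype (or no dtype at all) the loop over `col_dict` is skipped entirely, which is the deferral requested above.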
    dtype_mapping[pa.null()] = pd.Int64Dtype()
    frame = table.to_pandas(types_mapper=dtype_mapping.get)
elif using_pyarrow_string_dtype():
These mappers don't work here: Arrow supports type -> type mapping, not column -> type.
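To illustrate the limitation, here is a toy model in plain Python (not pyarrow itself). It assumes only that the mapper callback receives the storage type of a column and never its name, as `types_mapper` does in `Table.to_pandas`. The schema, the `convert` function, and the dtype strings are made up for the example.

```python
# Toy schema: (column name, storage type) pairs.
schema = [("a", "string"), ("b", "string"), ("c", "int64")]


def convert(schema, types_mapper):
    # The mapper is called with the storage type only; a None result
    # means "keep the default type".
    return {name: types_mapper(typ) or typ for name, typ in schema}


# A type-keyed mapping works, but hits every column of that type.
type_mapping = {"string": "string[pyarrow]"}
by_type = convert(schema, type_mapping.get)

# A column-keyed mapping is silently ignored: its keys are column
# names, but the lookups are done with types.
column_mapping = {"a": "object"}
by_column = convert(schema, column_mapping.get)

print(by_type)
print(by_column)
```

In this model there is no way to keep only column `a` as object through the mapper, which is why a per-column dtype has to be handled separately.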
cc @mroeschke gentle ping
Thanks @phofl
new_rows = len(index)

df = DataFrame(col_dict, columns=columns, index=index)
if hasattr(self, "orig_options"):
Can we do something more explicit than a hasattr check?
No, you can subclass the reader, so we don't have any control over it.
Does anybody actually do this? I judge those people, their ethics, and their hygiene.
That's something I can't answer. We might want to deprecate subclassing, but we are stuck with hasattr here until then.
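A minimal sketch of why the defensive lookup is needed. The class names `Reader` and `CustomReader` and the `dtype_arg` method are hypothetical; they model a reader that can be subclassed, not the actual pandas classes.

```python
class Reader:
    def __init__(self, **options):
        # The base class records the options it was constructed with.
        self.orig_options = options

    def dtype_arg(self):
        # Defensive lookup: a subclass may never have set orig_options.
        if hasattr(self, "orig_options"):
            return self.orig_options.get("dtype", None)
        return None


class CustomReader(Reader):
    def __init__(self):
        # Deliberately skips Reader.__init__, so orig_options is never set.
        pass


print(Reader(dtype=object).dtype_arg())
print(CustomReader().dtype_arg())
```

Without the `hasattr` guard, `CustomReader().dtype_arg()` would raise `AttributeError`. `getattr(self, "orig_options", {})` would be an equivalent spelling.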
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
We are not honouring object dtype here. Thoughts on performance, @jbrockmendel?