Ambiguity in DataFrame.write_database(if_exists='replace') could lead to data loss.
#12779
Comments
Thanks for the issue. I don't really see any ambiguity in the docstring here though; seems fairly clear what's going to happen, and it's also consistent with other major DataFrame libraries 🤔 Polars: This is the same behaviour (and parameter name) used in Pandas1 ...
...and ADBC2 (which calls the same parameter "mode" instead):
Given the precedents (and the docstring), I don't think most users will be that surprised.
Thinking about it some more, we could consider renaming the parameter |
I think this is an improvement, but it comes at the expense of worsening the inconsistency from |
Not really. That method already has a different parameter name, and different options; improving |
Do you know if there is an issue tracking this already? Would be similar to |
Description
It is possible to incorrectly assume that the `if_exists="replace"` kwarg to `DataFrame.write_database` refers to a record-level operation, similar to PostgREST's `Prefer: resolution=merge-duplicates` header.

This assumption leads to a destructive operation where the table is dropped and recreated with the schema of the supplied `DataFrame`. To make matters worse, the only other non-NOOP value that `if_exists` accepts, `append`, does in fact refer to a record-level operation (as one would expect), since supplying a DataFrame with a different schema to the target table leads to an exception (again, as one would expect) as opposed to overwriting the table.

A potential fix is to rename `if_exists` to `mode`, in tandem with `DataFrame.write_delta(mode='overwrite')`.

Tangential idea: This also opens up the door to adding a new feature, `mode="merge"`, where duplicated records in the `DataFrame` are merged on their primary key the way PostgREST does.

Link
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_database.html
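
To make the distinction concrete, here is a minimal sketch of the three semantics discussed above, written against the standard-library `sqlite3` module rather than Polars. The `write_rows` helper, the `users` table and its columns are all hypothetical, and this is not how `write_database` is implemented; it only illustrates a table-level replace versus a record-level append or merge. The `merge` branch assumes an SQLite version with upsert support (3.24 or later).

```python
import sqlite3


def write_rows(conn, table, rows, mode):
    """Hypothetical helper illustrating three write semantics (not the Polars API)."""
    insert = f"INSERT INTO {table} (id, name) VALUES (?, ?)"
    if mode == "replace":
        # Table-level: drop and recreate the table, discarding every existing row.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(insert, rows)
    elif mode == "append":
        # Record-level: add rows to the existing table, leaving other rows alone.
        conn.executemany(insert, rows)
    elif mode == "merge":
        # Record-level upsert on the primary key (the proposed, hypothetical mode).
        conn.executemany(
            insert + " ON CONFLICT(id) DO UPDATE SET name = excluded.name", rows
        )
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    conn.commit()


def fetch_all(conn, table):
    return conn.execute(f"SELECT id, name FROM {table} ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
write_rows(conn, "users", [(1, "ann"), (2, "bob")], "replace")

# What a user expecting record-level behaviour has in mind: row 1 survives.
write_rows(conn, "users", [(2, "bobby"), (3, "cy")], "merge")
after_merge = fetch_all(conn, "users")

# What a table-level replace does: row 1 is lost along with the old table.
write_rows(conn, "users", [(2, "bobby"), (3, "cy")], "replace")
after_replace = fetch_all(conn, "users")
```

With the same input rows, the merge keeps the untouched record with `id = 1`, while the replace leaves only the rows that were supplied.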