Unable to load table from dataframe with overlapping index/column name #1543

Open
bnaul opened this issue Apr 4, 2023 · 2 comments
Labels
api: bigquery (Issues related to the googleapis/python-bigquery API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

bnaul commented Apr 4, 2023

After this change in #1535, loading a dataframe where the index is also a column now fails:

[ins] In [42]: df
Out[42]:
   a
a
A  A
B  B

[ins] In [43]: bigquery.Client().load_table_from_dataframe(df, "tmp.blah")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [43], in <cell line: 1>()
----> 1 bigquery.Client().load_table_from_dataframe(df, "tmp.blah")
...
File ~/model/.venv/lib/python3.10/site-packages/google/cloud/bigquery/_pandas_helpers.py:484, in dataframe_to_bq_schema(dataframe, bq_schema)
    482 bq_type = _PANDAS_DTYPE_TO_BQ.get(dtype.name)
    483 if bq_type is None:
--> 484     sample_data = _first_valid(dataframe.reset_index()[column])
    485     if (
    486         isinstance(sample_data, _BaseGeometry)
    487         and sample_data is not None  # Paranoia
    488     ):
    489         bq_type = "GEOGRAPHY"
...
File ~/model/.venv/lib/python3.10/site-packages/pandas/core/frame.py:4440, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   4434     raise ValueError(
   4435         "Cannot specify 'allow_duplicates=True' when "
   4436         "'self.flags.allows_duplicate_labels' is False."
   4437     )
   4438 if not allow_duplicates and column in self.columns:
   4439     # Should this be a different kind of error??
-> 4440     raise ValueError(f"cannot insert {column}, already exists")
   4441 if not isinstance(loc, int):
   4442     raise TypeError("loc must be int")

ValueError: cannot insert a, already exists
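The failure can be reproduced with pandas alone, without a BigQuery client. The following is a minimal sketch, assuming the dataframe from the report was built with an index named `a` alongside a column named `a`; the exact construction is a guess, since the issue only shows the printed frame.

```python
import pandas as pd

# Assumed reconstruction of the reported frame: the index and the only
# column share the name "a".
df = pd.DataFrame({"a": ["A", "B"]}, index=pd.Index(["A", "B"], name="a"))

try:
    # reset_index() tries to insert the index as a new column named "a",
    # which collides with the existing column of the same name.
    df.reset_index()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If the assumption about the frame holds, this should raise the same `ValueError` as the `reset_index()` call at `_pandas_helpers.py:484` in the traceback above.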

Kind of a weird edge case, but I think the goal of that PR could have been accomplished without a breaking change. Perhaps the easiest fix would be to call reset_index() in a separate statement and catch the ValueError (if you hit it, the column already exists, so the reset_index() call wasn't needed)?
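A rough sketch of that suggestion follows. It is not the library's code: `sample_column` is a hypothetical stand-in for the lookup done at line 484 of `_pandas_helpers.py`, and the "first valid value" logic is an assumption about what `_first_valid` does.

```python
import pandas as pd


def sample_column(dataframe, column):
    """Return the first non-null value of `column`, or None if there is none.

    Hypothetical helper: `column` may be a regular column or an index level.
    """
    try:
        # Promote index levels to columns so they can be looked up by name.
        flat = dataframe.reset_index()
    except ValueError:
        # An index name collides with an existing column, so a column with
        # that name is already reachable without resetting the index.
        flat = dataframe
    valid = flat[column].dropna()
    return valid.iloc[0] if len(valid) else None


# Index and column share a name: falls back to the original frame.
overlapping = pd.DataFrame({"a": ["A", "B"]}, index=pd.Index(["A", "B"], name="a"))
print(sample_column(overlapping, "a"))

# No overlap: the index level "k" is found after reset_index().
distinct = pd.DataFrame({"x": [None, 3.0]}, index=pd.Index(["p", "q"], name="k"))
print(sample_column(distinct, "k"), sample_column(distinct, "x"))
```

One caveat with this approach: for a MultiIndex where only one level collides with a column, the fallback skips the reset entirely, so the other index levels would not be reachable by name. A real fix would likely need to handle that case separately.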

cc @tswast @chelsea-lin

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Apr 4, 2023
meredithslota commented

@tswast Looks like this change was approved by you — was this an accidental breaking change?

@meredithslota meredithslota added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p2 Moderately-important priority. Fix may not be included in next release. labels Aug 15, 2023

tswast commented Aug 16, 2023

That's correct. It is indeed an accidental breaking change.
