Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: insert_rows() fails when a repeated field is missing #9602

Closed
danych2 opened this issue Nov 5, 2019 · 5 comments · Fixed by #10196
Closed

BigQuery: insert_rows() fails when a repeated field is missing #9602

danych2 opened this issue Nov 5, 2019 · 5 comments · Fixed by #10196
Assignees
Labels
api: bigquery Issues related to the BigQuery API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@danych2
Copy link

danych2 commented Nov 5, 2019

Using the method insert_rows to insert a row missing a REPEATED field returns an error. Repeated fields should be nullable by default, so it should not matter if a repeated field is not provided.
The method insert_rows_json, which insert_rows uses to make the API call seems to work fine.

Environment details

Python 2.7.15+
google-cloud-bigquery 1.21.0

Steps to reproduce

  • Create a table with a repeated field
  • Insert a row into the table using insert_rows, passing a dict where the repeated field key-value pair is missing.
@busunkim96 busunkim96 changed the title insert_rows fails when a repeated field is missing Bigquery: insert_rows fails when a repeated field is missing Nov 5, 2019
@busunkim96 busunkim96 added the api: bigquery Issues related to the BigQuery API. label Nov 5, 2019
@tseaver tseaver added the type: question Request for information or clarification. Not an issue. label Nov 5, 2019
@tseaver
Copy link
Contributor

tseaver commented Nov 5, 2019

@tswast AFAIK, this would be a back-end issue.

@plamut
Copy link
Contributor

plamut commented Nov 6, 2019

Seems related to #9207.

@plamut
Copy link
Contributor

plamut commented Nov 6, 2019

I was able to reproduce it in Python 3, too. Posting a minimal reproducible example (the table must already exist):

from __future__ import print_function

from google.cloud import bigquery


PROJECT = "project_name"
DATASET = "dataset_name"
TABLE = "table_name"

client = bigquery.Client(PROJECT)
dataset_ref = client.dataset(DATASET)
table_ref = dataset_ref.table(TABLE)

schema = [
    bigquery.SchemaField("int_col", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("repeated_col", "INTEGER", mode="REPEATED"),
])

rows = [
    # {"int_col": 5, "repeated_col": [10, 20, 30]},  # works
    # {"int_col": 8, "repeated_col": []},            # works 
    # {"int_col": 9, "repeated_col": None},          # "Field value cannot be empty."
    {"int_col": 12},                                 # "Field value cannot be empty."
]

print("Inserting rows...")
errors = client.insert_rows(table_ref, rows, selected_fields=schema)
print("Insert errors:,", errors)

The error is reproducible if the repeated field is not present in the data, or if it's set to None (even if insert_rows_json() is used directly). On the other hand, things work fine if the field is explicitly set to an empty list. All three cases should probably indeed be treated in the same way, though.

Classifying as P2, since a straightforward workaround exists.

@plamut plamut added backend priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: question Request for information or clarification. Not an issue. labels Nov 6, 2019
@tswast
Copy link
Contributor

tswast commented Jan 8, 2020

Actually, we do have a client-side issue here. Inserting {"int_col": 12} works with insert_rows_json, but not insert_rows.

We need to update the logic for insert_rows to omit any missing values in the generated resource rather than the current logic, where it explicitly sets missing values to None.

record[subname] = _field_to_json(subfield, subvalue)

$ python write_python.py
----------------------------------------
insert_rows_json
----------------------------------------
Inserting populated rows...
Insert errors:

Inserting null rows...
Insert errors:
{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'repeated_col', 'debugInfo': '', 'message': 'Field value cannot be empty.'}]}

Inserting missing rows...
Insert errors:

----------------------------------------
insert_rows
----------------------------------------
Inserting populated rows...
Insert errors:

Inserting null rows...
Insert errors:
{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'repeated_col', 'debugInfo': '', 'message': 'Field value cannot be empty.'}]}

Inserting missing rows...
Insert errors:
{'index': 0, 'errors': [{'reason': 'invalid', 'location': 'repeated_col', 'debugInfo': '', 'message': 'Field value cannot be empty.'}]}

Source:

from google.cloud import bigquery


client = bigquery.Client()

print("-" * 40)
print("insert_rows_json")
print("-" * 40)

rows = [
    {"repeated_col": []},  # works
    {"int_col": 5, "repeated_col": [10, 20, 30]},  # works
    {"int_col": 8, "repeated_col": []},            # works
]

print("Inserting populated rows...")
errors = client.insert_rows_json("my_dataset.google_cloud_python_9602", rows)
print("Insert errors:")
for error in errors:
    print(error)
print()

rows = [
    {"int_col": 9, "repeated_col": None},          # "Field value cannot be empty."
]
print("Inserting null rows...")
errors = client.insert_rows_json("my_dataset.google_cloud_python_9602", rows)
print("Insert errors:")
for error in errors:
    print(error)
print()

rows = [
    {"int_col": 12},                                 # works
]
print("Inserting missing rows...")
errors = client.insert_rows_json("my_dataset.google_cloud_python_9602", rows)
print("Insert errors:")
for error in errors:
    print(error)
print()


print("-" * 40)
print("insert_rows")
print("-" * 40)

schema = [
    bigquery.SchemaField("int_col", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("repeated_col", "INTEGER", mode="REPEATED"),
]

rows = [
    {"repeated_col": []},  # works
    {"int_col": 5, "repeated_col": [10, 20, 30]},  # works
    {"int_col": 8, "repeated_col": []},            # works
]

print("Inserting populated rows...")
errors = client.insert_rows("my_dataset.google_cloud_python_9602", rows, selected_fields=schema)
print("Insert errors:")
for error in errors:
    print(error)
print()

rows = [
    {"int_col": 9, "repeated_col": None},          # "Field value cannot be empty."
]
print("Inserting null rows...")
errors = client.insert_rows("my_dataset.google_cloud_python_9602", rows, selected_fields=schema)
print("Insert errors:")
for error in errors:
    print(error)
print()

rows = [
    {"int_col": 12},                                 # "Field value cannot be empty."
]
print("Inserting missing rows...")
errors = client.insert_rows("my_dataset.google_cloud_python_9602", rows, selected_fields=schema)
print("Insert errors:")
for error in errors:
    print(error)
print()

@tswast tswast assigned plamut and unassigned tswast Jan 8, 2020
@tswast tswast removed the backend label Jan 8, 2020
@tswast
Copy link
Contributor

tswast commented Jan 8, 2020

As I don't believe the backend has changed it's behavior for this case, any backend issue I were to file would be considered a feature request.

@plamut plamut changed the title Bigquery: insert_rows fails when a repeated field is missing BigQuery: insert_rows fails when a repeated field is missing Jan 22, 2020
@plamut plamut changed the title BigQuery: insert_rows fails when a repeated field is missing BigQuery: insert_rows() fails when a repeated field is missing Jan 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants