BigQuery: insert_rows() fails when a repeated field is missing #9602
@tswast AFAIK, this would be a back-end issue.

Seems related to #9207.
I was able to reproduce it in Python 3, too. Posting a minimal reproducible example (the table must already exist):

from __future__ import print_function
from google.cloud import bigquery
PROJECT = "project_name"
DATASET = "dataset_name"
TABLE = "table_name"
client = bigquery.Client(PROJECT)
dataset_ref = client.dataset(DATASET)
table_ref = dataset_ref.table(TABLE)
schema = [
bigquery.SchemaField("int_col", "INTEGER", mode="REQUIRED"),
bigquery.SchemaField("repeated_col", "INTEGER", mode="REPEATED"),
]
rows = [
# {"int_col": 5, "repeated_col": [10, 20, 30]}, # works
# {"int_col": 8, "repeated_col": []}, # works
# {"int_col": 9, "repeated_col": None}, # "Field value cannot be empty."
{"int_col": 12}, # "Field value cannot be empty."
]
print("Inserting rows...")
errors = client.insert_rows(table_ref, rows, selected_fields=schema)
print("Insert errors:", errors)

The error is reproducible if the repeated field is not present in the data, or if it's set to None.

Classifying as P2, since a straightforward workaround exists.
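The workaround can be sketched as a small pre-processing step: default every missing or None-valued repeated field to an empty list before calling insert_rows. The helper below is hypothetical (not part of google-cloud-bigquery):

```python
# Hypothetical helper (not part of google-cloud-bigquery): normalize rows
# so that every REPEATED field that is absent or None becomes an empty
# list, which insert_rows accepts.
def fill_repeated_fields(rows, repeated_names):
    """Return shallow copies of rows with absent/None repeated fields set to []."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the caller's dicts are not mutated
        for name in repeated_names:
            if row.get(name) is None:
                row[name] = []
        filled.append(row)
    return filled

safe_rows = fill_repeated_fields(
    [{"int_col": 9, "repeated_col": None}, {"int_col": 12}],
    {"repeated_col"},
)
# safe_rows: [{"int_col": 9, "repeated_col": []}, {"int_col": 12, "repeated_col": []}]
```

The normalized rows can then be passed to insert_rows unchanged; the original dicts are left untouched.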
Actually, we do have a client-side issue here. Inserting a row that omits the repeated field succeeds with insert_rows_json but fails with insert_rows, so we need to update the logic for converting rows to JSON in insert_rows.
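Such a fix would live in the row-to-JSON conversion that insert_rows performs before sending the request. A minimal sketch of the intended behavior (function name and schema shape are simplified here, not the library's actual internals):

```python
# Sketch of the desired conversion: an absent or None REPEATED field
# serializes to [], present values pass through, and absent non-repeated
# fields are simply omitted from the payload.
def row_to_json(row, schema):
    """schema: iterable of (name, mode) pairs, e.g. ("int_col", "REQUIRED")."""
    converted = {}
    for name, mode in schema:
        value = row.get(name)
        if mode == "REPEATED":
            converted[name] = [] if value is None else list(value)
        elif value is not None:
            converted[name] = value
    return converted

schema = [("int_col", "REQUIRED"), ("repeated_col", "REPEATED")]
# row_to_json({"int_col": 12}, schema) -> {"int_col": 12, "repeated_col": []}
```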
Source:

from google.cloud import bigquery
client = bigquery.Client()
print("-" * 40)
print("insert_rows_json")
print("-" * 40)
rows = [
{"repeated_col": []}, # works
{"int_col": 5, "repeated_col": [10, 20, 30]}, # works
{"int_col": 8, "repeated_col": []}, # works
]
print("Inserting populated rows...")
errors = client.insert_rows_json("my_dataset.google_cloud_python_9602", rows)
print("Insert errors:")
for error in errors:
print(error)
print()
rows = [
{"int_col": 9, "repeated_col": None}, # "Field value cannot be empty."
]
print("Inserting null rows...")
errors = client.insert_rows_json("my_dataset.google_cloud_python_9602", rows)
print("Insert errors:")
for error in errors:
print(error)
print()
rows = [
{"int_col": 12}, # works
]
print("Inserting missing rows...")
errors = client.insert_rows_json("my_dataset.google_cloud_python_9602", rows)
print("Insert errors:")
for error in errors:
print(error)
print()
print("-" * 40)
print("insert_rows")
print("-" * 40)
schema = [
bigquery.SchemaField("int_col", "INTEGER", mode="REQUIRED"),
bigquery.SchemaField("repeated_col", "INTEGER", mode="REPEATED"),
]
rows = [
{"repeated_col": []}, # works
{"int_col": 5, "repeated_col": [10, 20, 30]}, # works
{"int_col": 8, "repeated_col": []}, # works
]
print("Inserting populated rows...")
errors = client.insert_rows("my_dataset.google_cloud_python_9602", rows, selected_fields=schema)
print("Insert errors:")
for error in errors:
print(error)
print()
rows = [
{"int_col": 9, "repeated_col": None}, # "Field value cannot be empty."
]
print("Inserting null rows...")
errors = client.insert_rows("my_dataset.google_cloud_python_9602", rows, selected_fields=schema)
print("Insert errors:")
for error in errors:
print(error)
print()
rows = [
{"int_col": 12}, # "Field value cannot be empty."
]
print("Inserting missing rows...")
errors = client.insert_rows("my_dataset.google_cloud_python_9602", rows, selected_fields=schema)
print("Insert errors:")
for error in errors:
print(error)
print()
As I don't believe the backend has changed its behavior for this case, any backend issue I filed would be considered a feature request.
Using the method insert_rows to insert a row missing a REPEATED field returns an error. Repeated fields should be nullable by default, so it should not matter if a repeated field is not provided.
The method insert_rows_json, which insert_rows uses to make the API call, seems to work fine.
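The difference between the two code paths appears to come down to what reaches the JSON request body: insert_rows_json sends the row dicts as given (an omitted key simply never appears in the payload), while insert_rows's schema-driven conversion seems to emit an explicit null for the missing repeated field, which the API rejects. A quick stdlib illustration of the two payload shapes (a sketch, not the library's serialization code):

```python
import json

# An omitted key produces a payload the API accepts; an explicit None
# becomes a JSON null, which triggers "Field value cannot be empty."
row_missing = {"int_col": 12}
row_null = {"int_col": 12, "repeated_col": None}

print(json.dumps(row_missing))  # {"int_col": 12}
print(json.dumps(row_null))     # {"int_col": 12, "repeated_col": null}
```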
Environment details
Python 2.7.15+
google-cloud-bigquery 1.21.0
Steps to reproduce