
Inability to cast int to string when appending data to table using load_table_from_json #906

Closed
Nathan-Nesbitt opened this issue Aug 24, 2021 · 8 comments
Labels: api: bigquery (Issues related to the googleapis/python-bigquery API) · priority: p2 (Moderately-important priority; fix may not be included in the next release) · type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns)

@Nathan-Nesbitt

Issue

I can append this data using the stream functionality, but with a job it fails to convert properly. Is this an error on my part or a bug?

Environment details

  • OS type and version: Ubuntu 20.04.2 LTS
  • Python version: 3.9
  • pip version: 21.1.3
  • google-cloud-bigquery version: 1.24.0

Steps to reproduce

  1. Run the following code, which succeeds via the streaming insert but fails when submitted as a load job

Code example

row = {'id': '781263812376123', ...}
table = client.get_table(table)
job_config = LoadJobConfig(table.schema, write_disposition=WriteDisposition.WRITE_APPEND)
job = client.load_table_from_json([row], destination=table, job_config=job_config)
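For comparison, the streaming path that works for me looks roughly like this (a minimal sketch; the table ID and client setup are placeholders, and the real row is larger):

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.my_dataset.my_table")  # placeholder table ID

row = {'id': '781263812376123'}  # same kind of row as above
errors = client.insert_rows_json(table, [row])  # streaming insert; this path succeeds
if errors:
    print(errors)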

Stack trace

webapp_1  |   File "<REMOVED>", line 130, in insert_rows_batch
webapp_1  |     job.result()
webapp_1  |     │   └ <function _AsyncJob.result at 0x7fd294afd820>
webapp_1  |     └ <google.cloud.bigquery.job.LoadJob object at 0x7fd28ae04160>
webapp_1  | 
webapp_1  |   File "/home/python/lib/python3.8/site-packages/google/cloud/bigquery/job.py", line 818, in result
webapp_1  |     return super(_AsyncJob, self).result(timeout=timeout)
webapp_1  |                  │          │                    └ None
webapp_1  |                  │          └ <google.cloud.bigquery.job.LoadJob object at 0x7fd28ae04160>
webapp_1  |                  └ <class 'google.cloud.bigquery.job._AsyncJob'>
webapp_1  |   File "/home/python/lib/python3.8/site-packages/google/api_core/future/polling.py", line 130, in result
webapp_1  |     raise self._exception
webapp_1  |           │    └ BadRequest('Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. P...
webapp_1  |           └ <google.cloud.bigquery.job.LoadJob object at 0x7fd28ae04160>
webapp_1  | 
webapp_1  | google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.

Which leads to

'Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to string. Field: id; Value: 781263812376123'
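For anyone reproducing this, the per-row details referenced by the errors[] collection can be read off the job object, e.g. (sketch only):

from google.api_core.exceptions import BadRequest

try:
    job.result()
except BadRequest:
    # job.errors holds the detailed entries from the errors[] collection
    for error in job.errors or []:
        print(error)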
product-auto-label bot added the api: bigquery label on Aug 24, 2021
@plamut added the priority: p2 and type: bug labels on Aug 25, 2021
@plamut
Contributor

plamut commented Aug 25, 2021

@Nathan-Nesbitt What's the table schema? Specifically, what's the type of the "id" column?

It seems that the id value was converted to an integer (no quotes), but parsing it as a string then failed (due to the missing quotes?).

Is the error also reproducible if sending id as an integer?

row = {'id': 781263812376123, ...}  # note: no quotes

@Nathan-Nesbitt
Author

@plamut the ID column has type integer, which makes this even more strange.

(screenshot of the table schema showing the id column with type INTEGER)

I will set up a test to see if the ID passed in as an int causes issues!
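Roughly the test I have in mind (sketch; only the id field shown, same table and job_config as above):

# pass the id as an int instead of a string and retry the load job
row = {'id': 781263812376123}
job = client.load_table_from_json([row], destination=table, job_config=job_config)
job.result()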

@Nathan-Nesbitt
Author

I just tested it on this column by converting all of the input into INTs; unfortunately, there is no difference in the error.

@plamut
Contributor

plamut commented Aug 25, 2021

@Nathan-Nesbitt OK, thanks for checking that. I will have a closer look after we successfully rename the default branch.

@plamut plamut self-assigned this Aug 25, 2021
@plamut
Contributor

plamut commented Aug 27, 2021

@Nathan-Nesbitt Unfortunately, I was not able to reproduce the issue with Python 3.9 and google-cloud-bigquery==1.24.0 (nor with the latest version).

BTW, is the code example accurate? Asking because the code is not runnable as written: table.schema needs to be passed as a kwarg to LoadJobConfig (just a sanity check in case something is missing).

Does the error also occur with the latest BigQuery version? (2.25.1 at the time of writing)
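Upgrading should just be a matter of running the usual pip command:

pip install --upgrade google-cloud-bigquery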

@Nathan-Nesbitt
Author

Sorry about going silent; I was redirected onto another portion of the project for the last week. I had to trim the code down to a minimal chunk and accidentally deleted the kwarg name. This is what I have in my function right now:

row = {'id': '781263812376123', ...} # This is a more complex object that I can't share unfortunately, but fits the table schema
table = client.get_table(table)
job_config = LoadJobConfig(schema=table.schema, write_disposition=WriteDisposition.WRITE_APPEND)
job = client.load_table_from_json([row], destination=table, job_config=job_config)

I will try out the new version and try to narrow down the issue to be sure there isn't anything else that could be going on.

Thanks again!

@plamut
Contributor

plamut commented Oct 16, 2021

@Nathan-Nesbitt Any luck with finding the cause of the problem, or at least with narrowing down the preconditions to reproduce it consistently?

Thanks!

@plamut
Contributor

plamut commented Nov 15, 2021

I'm closing this due to inactivity, but if there's any new information and the issue has not been resolved by upgrading, feel free to re-open it.
