500 server error when creating table using clustering #242

Closed
charlielito opened this issue Aug 25, 2020 · 20 comments
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@charlielito

charlielito commented Aug 25, 2020

Environment details

  • OS type and version: Pop!_OS 20.04 (Ubuntu-based)
  • Python version: 3.7.8
  • pip version: 20.1.1
  • google-cloud-bigquery version: 1.27.2

Steps to reproduce

I'm creating a table with several columns, one of which is of type GEOGRAPHY. When I try to create the table with some sample data and clustering enabled, I get a 500 error. I can create the table only if no clustering is used. I can also create the table with clustering if I leave out the GEOGRAPHY column.
Code with a toy example to reproduce it:

Code example

import time
import pandas as pd
from google.cloud import bigquery
from shapely.geometry import Point

client = bigquery.Client()
PROJECT_ID = ""  # set to your GCP project id
table_id = f"{PROJECT_ID}.data_capture.toy"

df = pd.DataFrame(
    dict(
        lat=[6.208969] * 100,
        lon=[-75.571696] * 100,
        logged_at=[int(time.time() * 1000) for _ in range(100)],
    )
)
df["point"] = df.apply(lambda row: Point(row["lon"], row["lat"]).wkb_hex, axis=1)

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("lon", "FLOAT64", "REQUIRED"),
        bigquery.SchemaField("lat", "FLOAT64", "REQUIRED"),
        bigquery.SchemaField("point", "GEOGRAPHY", "REQUIRED"),
        bigquery.SchemaField("logged_at", "TIMESTAMP", "REQUIRED"),
    ],
    write_disposition="WRITE_TRUNCATE",
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY, field="logged_at",
    ),
    clustering_fields=["logged_at"],
)

job = client.load_table_from_dataframe(
    df, table_id, job_config=job_config
)  # Make an API request.
job.result()  # Wait for the job to complete.

Stack trace

Traceback (most recent call last):
  File "test.py", line 108, in <module>
    job.result()  # Wait for the job to complete.
  File "/home/charlie/data/kiwi/data-upload/.venv/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 812, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/charlie/data/kiwi/data-upload/.venv/lib/python3.7/site-packages/google/api_core/future/polling.py", line 130, in result
    raise self._exception
google.api_core.exceptions.InternalServerError: 500 An internal error occurred and the request could not be completed. Error: 3144498

Thank you in advance!

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Aug 25, 2020
@meredithslota meredithslota added the type: question Request for information or clarification. Not an issue. label Aug 25, 2020
@plamut
Contributor

plamut commented Aug 26, 2020

I can confirm the reported behavior: having a column explicitly specified as type GEOGRAPHY in the schema triggers the error on the backend.

@shollyman Any idea what the error 3144498 represents?
(Also, feel free to increase the priority to P1 if no good workaround exists and user experience is considerably affected.)

@plamut plamut added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. and removed type: question Request for information or clarification. Not an issue. labels Aug 26, 2020
@shollyman
Contributor

This is a case where we need to improve our documentation.

The docstrings in the REST reference carry this constraint: 'Only top-level, non-repeated, simple-type fields are supported.' But we're not doing a great job of describing what that means.

You can see from the Standard SQL data types page that GEOGRAPHY types have problems with ordering and comparisons, which prevent the kinds of operations we need to do to cluster data.

I've filed internal issue 166457597 to improve this. Once we get better clarity in the reference, we can propagate that into the various client libraries.

@plamut
Contributor

plamut commented Aug 26, 2020

OK thanks, that makes sense. I'll still keep this open for visibility until the docs are improved.

@plamut plamut added type: docs Improvement to the documentation for an API. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Aug 26, 2020
@charlielito
Author

charlielito commented Aug 26, 2020

@shollyman that is odd. Actually I wanted to try this because the documentation said that GEOGRAPHY columns supported clustering, which I thought was really cool. This section states the following:

  • Clustering columns must be top-level, non-repeated columns of one of the following types:
    • DATE
    • BOOL
    • GEOGRAPHY
    • INT64
    • NUMERIC
    • STRING
    • TIMESTAMP

If you try the code sample I shared, it also crashes if you select the GEOGRAPHY column as the clustering column.
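
For what it's worth, here is a minimal sketch (separate from the load-job repro above, not something I've run as-is; the table id is a placeholder) that exercises the documented claim directly by creating an empty table clustered on the GEOGRAPHY column via the tables API instead of a load job:

# Sketch only: create an empty table clustered on a GEOGRAPHY column.
# "PROJECT_ID.data_capture.toy_clustered" is a placeholder table id.
from google.cloud import bigquery

client = bigquery.Client()
table = bigquery.Table(
    "PROJECT_ID.data_capture.toy_clustered",
    schema=[
        bigquery.SchemaField("point", "GEOGRAPHY", "REQUIRED"),
        bigquery.SchemaField("logged_at", "TIMESTAMP", "REQUIRED"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="logged_at"
)
table.clustering_fields = ["point"]
client.create_table(table)  # should succeed if GEOGRAPHY clustering is supported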

@shollyman
Contributor

It appears I've failed to read the details. Going to take a deeper look here, stay tuned.

@shollyman
Contributor

So, please disregard my comments about the GEOGRAPHY type not being valid for clustering; that was in error and based on a stale recollection of the state of data clustering. We need to do a better job of authoritatively linking some of our docs, but I'll handle that in the internal ticket.

I find I'm unable to reproduce exactly using your code example. Running it against my own project, it completes successfully. The job metadata is here:

{
  "configuration": {
    "jobType": "LOAD",
    "load": {
      "clustering": {
        "fields": [
          "logged_at"
        ]
      },
      "destinationTable": {
        "datasetId": "data_capture",
        "projectId": "shollyman-demo-test",
        "tableId": "toy"
      },
      "schema": {
        "fields": [
          {
            "mode": "REQUIRED",
            "name": "lat",
            "type": "FLOAT"
          },
          {
            "mode": "REQUIRED",
            "name": "lon",
            "type": "FLOAT"
          },
          {
            "mode": "REQUIRED",
            "name": "logged_at",
            "type": "TIMESTAMP"
          },
          {
            "mode": "REQUIRED",
            "name": "point",
            "type": "GEOGRAPHY"
          }
        ]
      },
      "sourceFormat": "PARQUET",
      "timePartitioning": {
        "field": "logged_at",
        "type": "DAY"
      },
      "writeDisposition": "WRITE_TRUNCATE"
    }
  },
  "etag": "LmJB7x/0E+MJxSNeKl2/3w==",
  "id": "shollyman-demo-test:US.130f8487-abd2-4deb-ad30-e4e750cf09bc",
  "jobReference": {
    "jobId": "130f8487-abd2-4deb-ad30-e4e750cf09bc",
    "location": "US",
    "projectId": "shollyman-demo-test"
  },
  "kind": "bigquery#job",
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/shollyman-demo-test/jobs/130f8487-abd2-4deb-ad30-e4e750cf09bc?location=US",
  "statistics": {
    "completionRatio": 1,
    "creationTime": "1598474653319",
    "endTime": "1598474662958",
    "load": {
      "badRecords": "0",
      "inputFileBytes": "1899",
      "inputFiles": "1",
      "outputBytes": "6400",
      "outputRows": "100"
    },
    "reservationUsage": [
      {
        "name": "default-pipeline",
        "slotMs": "67607"
      }
    ],
    "reservation_id": "default-pipeline",
    "startTime": "1598474653689",
    "totalSlotMs": "67607"
  },
  "status": {
    "state": "DONE"
  },
  "user_email": "test-robot@shollyman-demo-test.iam.gserviceaccount.com"
}

However, when I modify the clustering config and attempt to re-run, it does fail because the existing table's specification conflicts with the load specification. In this case, though, it surfaces as an HTTP 400 rather than the 500 you're observing.

Incompatible table partitioning specification. Expects partitioning specification interval(type:day,field:logged_at) clustering(logged_at), but input partitioning specification is interval(type:day,field:logged_at) clustering(point)

I'm wondering if there's something we're missing related to the current state of the resources in your project?
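
To rule that out, a sketch of what I mean by resetting the resource state (placeholder table id; this just drops the destination table, if present, so the load job recreates it with the spec from the job config):

from google.cloud import bigquery

client = bigquery.Client()
# Drop the destination table if it exists, then re-run the
# load_table_from_dataframe call from the original example.
client.delete_table("PROJECT_ID.data_capture.toy", not_found_ok=True)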

@shollyman
Contributor

shollyman commented Aug 26, 2020

state of my env:

$ pip freeze
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
google-api-core==1.22.1
google-auth==1.20.1
google-cloud-bigquery==1.27.2
google-cloud-bigquery-storage==1.0.0
google-cloud-core==1.4.1
google-crc32c==1.0.0
google-resumable-media==1.0.0
googleapis-common-protos==1.52.0
grpcio==1.31.0
idna==2.10
numpy==1.19.1
pandas==1.1.1
protobuf==3.13.0
pyarrow==1.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
python-dateutil==2.8.1
pytz==2020.1
requests==2.24.0
rsa==4.6
Shapely==1.7.1
six==1.15.0
tqdm==4.48.2
urllib3==1.25.10

@charlielito
Author

What do you mean by the current state of the resources? In my example the dataset data_capture already exists and the table toy does not. I just tested the code snippet in a fresh env with your versions and I still get the same 500.

@shollyman
Contributor

I'm unable to reproduce what you're observing.

Perhaps try enabling debug logging for the underlying HTTP interactions; it may make it clearer what I'm missing. Add something like the following to the beginning of your repro and re-run:

import http.client
http.client.HTTPConnection.debuglevel = 5

Note that the dumps will include live authentication headers (notably Bearer tokens), which could be used to impersonate you. Please redact those values before sharing.

If you'd prefer not to post details publicly, as an alternative you can also send them directly to me (same username at google.com).
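
If the http.client dump alone doesn't show the failing call, a slightly fuller sketch (assuming the default requests/urllib3 transport) that also surfaces library-level debug logs:

import http.client
import logging

# Print raw request/response lines from http.client to stdout.
http.client.HTTPConnection.debuglevel = 5

# Also emit urllib3 debug logs for the same requests.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("urllib3").setLevel(logging.DEBUG)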

@charlielito
Author

These are the logs:

send: b'POST /token HTTP/1.1\r\nHost: oauth2.googleapis.com\r\nUser-Agent: python-requests/2.24.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\ncontent-type: application/x-www-form-urlencoded\r\nContent-Length: 859\r\n\r\n'
send: b'assertion=XXXXXXXXXXXXXXXX&grant_type=urn%3Aietf%3Aparams%3Aoauth%3Agrant-type%3Ajwt-bearer'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Encoding: gzip
header: Date: Wed, 26 Aug 2020 23:27:40 GMT
header: Server: scaffolding on HTTPServer2
header: Cache-Control: private
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Transfer-Encoding: chunked
send: b'POST /upload/bigquery/v2/projects/PROJECT_ID/jobs?uploadType=resumable HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nAccept-Encoding: gzip, deflate\r\nAccept: application/json\r\nConnection: keep-alive\r\ncontent-type: application/json\r\nx-upload-content-type: */*\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\nContent-Length: 738\r\n\r\n'
send: b'{"jobReference": {"jobId": "171f09e7-2b61-4819-837c-db8619565f18", "projectId": "PROJECT_ID"}, "configuration": {"load": {"schema": {"fields": [{"mode": "REQUIRED", "name": "lat", "type": "FLOAT64", "description": null}, {"mode": "REQUIRED", "name": "lon", "type": "FLOAT64", "description": null}, {"mode": "REQUIRED", "name": "logged_at", "type": "TIMESTAMP", "description": null}, {"mode": "REQUIRED", "name": "point", "type": "GEOGRAPHY", "description": null}]}, "writeDisposition": "WRITE_TRUNCATE", "timePartitioning": {"type": "DAY", "field": "logged_at"}, "clustering": {"fields": ["logged_at"]}, "sourceFormat": "PARQUET", "destinationTable": {"projectId": "PROJECT_ID", "datasetId": "data_capture", "tableId": "toy"}}}}'
reply: 'HTTP/1.1 200 OK\r\n'
header: Content-Type: text/plain; charset=utf-8
header: X-GUploader-UploadID: UPLOAD_ID
header: Location: https://bigquery.googleapis.com/upload/bigquery/v2/projects/PROJECT_ID/jobs?uploadType=resumable&upload_id=UPLOAD_ID
header: ETag: QRf43t5EtSKPbjwAJWMVOQ==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Length: 0
header: Date: Wed, 26 Aug 2020 23:27:41 GMT
header: Server: UploadServer
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
send: b'PUT /upload/bigquery/v2/projects/PROJECT_ID/jobs?uploadType=resumable&upload_id=UPLOAD_ID HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: python-requests/2.24.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\ncontent-type: */*\r\ncontent-range: bytes 0-1898/1899\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\nContent-Length: 1899\r\n\r\n'
send: b'PAR1\x15\x04\...XXXXXX...'
reply: 'HTTP/1.1 200 OK\r\n'
header: X-GUploader-UploadID: UPLOAD_ID
header: ETag: KnJI7lOFKFjxuLn+P8rytA==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Length: 1594
header: Date: Wed, 26 Aug 2020 23:27:42 GMT
header: Server: UploadServer
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
send: b'GET /bigquery/v2/projects/PROJECT_ID/jobs/171f09e7-2b61-4819-837c-db8619565f18?location=US HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nAccept-Encoding: gzip\r\nAccept: */*\r\nConnection: keep-alive\r\nX-Goog-API-Client: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: ETag: KnJI7lOFKFjxuLn+P8rytA==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Encoding: gzip
header: Date: Wed, 26 Aug 2020 23:27:43 GMT
header: Server: ESF
header: Cache-Control: private
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Transfer-Encoding: chunked
send: b'GET /bigquery/v2/projects/PROJECT_ID/jobs/171f09e7-2b61-4819-837c-db8619565f18?location=US HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nAccept-Encoding: gzip\r\nAccept: */*\r\nConnection: keep-alive\r\nX-Goog-API-Client: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: ETag: KnJI7lOFKFjxuLn+P8rytA==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Encoding: gzip
header: Date: Wed, 26 Aug 2020 23:27:43 GMT
header: Server: ESF
header: Cache-Control: private
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Transfer-Encoding: chunked
send: b'GET /bigquery/v2/projects/PROJECT_ID/jobs/171f09e7-2b61-4819-837c-db8619565f18?location=US HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nAccept-Encoding: gzip\r\nAccept: */*\r\nConnection: keep-alive\r\nX-Goog-API-Client: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: ETag: zb3hm09cwfnAiOYjmzMmaw==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Encoding: gzip
header: Date: Wed, 26 Aug 2020 23:27:45 GMT
header: Server: ESF
header: Cache-Control: private
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Transfer-Encoding: chunked
send: b'GET /bigquery/v2/projects/PROJECT_ID/jobs/171f09e7-2b61-4819-837c-db8619565f18?location=US HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nAccept-Encoding: gzip\r\nAccept: */*\r\nConnection: keep-alive\r\nX-Goog-API-Client: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: ETag: c3pJouaM/0o2KaV0DuaVnQ==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Encoding: gzip
header: Date: Wed, 26 Aug 2020 23:27:46 GMT
header: Server: ESF
header: Cache-Control: private
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Transfer-Encoding: chunked
send: b'GET /bigquery/v2/projects/PROJECT_ID/jobs/171f09e7-2b61-4819-837c-db8619565f18?location=US HTTP/1.1\r\nHost: bigquery.googleapis.com\r\nUser-Agent: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nAccept-Encoding: gzip\r\nAccept: */*\r\nConnection: keep-alive\r\nX-Goog-API-Client: gl-python/3.7.7 grpc/1.31.0 gax/1.22.1 gapic/1.27.2 gccl/1.27.2\r\nauthorization: Bearer XXXXXXXXXXXXXXXX\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: ETag: 7ztKRvG3ECRYHY6fwL0rpg==
header: Content-Type: application/json; charset=UTF-8
header: Vary: Origin
header: Vary: X-Origin
header: Vary: Referer
header: Content-Encoding: gzip
header: Date: Wed, 26 Aug 2020 23:28:02 GMT
header: Server: ESF
header: Cache-Control: private
header: X-XSS-Protection: 0
header: X-Frame-Options: SAMEORIGIN
header: X-Content-Type-Options: nosniff
header: Alt-Svc: h3-29=":443"; ma=2592000,h3-27=":443"; ma=2592000,h3-T051=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
header: Transfer-Encoding: chunked

Then the same traceback as in the example above.

@shollyman
Contributor

I'm not seeing the request that yields a 500 reply. We should just be polling job status, so perhaps grabbing the job response through another means may help?

If you've got the gcloud SDK installed, try bq show -j --format=prettyjson 171f09e7-2b61-4819-837c-db8619565f18
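
Roughly the same thing via the Python client, if that's easier (a sketch using the job id and location from your logs):

from google.cloud import bigquery

client = bigquery.Client()
job = client.get_job("171f09e7-2b61-4819-837c-db8619565f18", location="US")
print(job.state)         # e.g. DONE
print(job.error_result)  # the server-side error, if any
print(job.errors)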

@charlielito
Author

{
  "configuration": {
    "jobType": "LOAD", 
    "load": {
      "clustering": {
        "fields": [
          "logged_at"
        ]
      }, 
      "destinationTable": {
        "datasetId": "data_capture", 
        "projectId": "PROJECT_ID", 
        "tableId": "toy"
      }, 
      "schema": {
        "fields": [
          {
            "mode": "REQUIRED", 
            "name": "lat", 
            "type": "FLOAT"
          }, 
          {
            "mode": "REQUIRED", 
            "name": "lon", 
            "type": "FLOAT"
          }, 
          {
            "mode": "REQUIRED", 
            "name": "logged_at", 
            "type": "TIMESTAMP"
          }, 
          {
            "mode": "REQUIRED", 
            "name": "point", 
            "type": "GEOGRAPHY"
          }
        ]
      }, 
      "sourceFormat": "PARQUET", 
      "timePartitioning": {
        "field": "logged_at", 
        "type": "DAY"
      }, 
      "writeDisposition": "WRITE_TRUNCATE"
    }
  }, 
  "etag": "7ztKRvG3ECRYHY6fwL0rpg==", 
  "id": "PROJECT_ID:US.171f09e7-2b61-4819-837c-db8619565f18", 
  "jobReference": {
    "jobId": "171f09e7-2b61-4819-837c-db8619565f18", 
    "location": "US", 
    "projectId": "PROJECT_ID"
  }, 
  "kind": "bigquery#job", 
  "selfLink": "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/jobs/171f09e7-2b61-4819-837c-db8619565f18?location=US", 
  "statistics": {
    "completionRatio": 0, 
    "creationTime": "1598484462115", 
    "endTime": "1598484470223", 
    "reservation_id": "default-pipeline", 
    "startTime": "1598484462475", 
    "totalSlotMs": "12726"
  }, 
  "status": {
    "errorResult": {
      "message": "An internal error occurred and the request could not be completed. Error: 3144498", 
      "reason": "internalError"
    }, 
    "errors": [
      {
        "message": "An internal error occurred and the request could not be completed. Error: 3144498", 
        "reason": "internalError"
      }
    ], 
    "state": "DONE"
  }, 
  "user_email": "charlie-lenovo@PROJECT_ID.iam.gserviceaccount.com"
}

@shollyman
Contributor

Interesting. Mind sending me the project id out of band (email)? It'll help me investigate the underlying error to see what's going wrong.

@plamut plamut added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed type: docs Improvement to the documentation for an API. labels Aug 27, 2020
@shollyman
Contributor

Current state: the backend team is investigating the failure.

@charlielito
Author

@shollyman any news on your end?

@shollyman
Contributor

Getting back to this (was out on vacation).

The issue appears to be related to the Parquet interpreter and the format of the geography column (the DataFrame integration serializes to Parquet when sending row data to BigQuery). Since then, there have been changes to the responsible Parquet readers, but it's not immediately clear whether they address your issue.

Could you retry? If you're still seeing issues, the next step is to understand how your environment differs from mine, where I cannot reproduce this. Possibly there's a difference in libraries such as the shapely dependency that converts to the well-known binary (WKB) representation.

If you're still getting errors, could you also provide another job ID where the ingestion is failing?
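
If it helps with comparing environments, a quick sketch (illustrative only) that prints the shapely WKB hex the repro puts into the point column:

from shapely import wkb
from shapely.geometry import Point

# The repro stores points as WKB hex; printing one makes it easy to diff
# the serialization across shapely versions.
hex_str = Point(-75.571696, 6.208969).wkb_hex
print(hex_str)                             # little-endian WKB hex string
print(wkb.loads(bytes.fromhex(hex_str)))   # round-trips to POINT (-75.571696 6.208969)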

@charlielito
Author

state of my env:

$ pip freeze
cachetools==4.1.1
certifi==2020.6.20
cffi==1.14.2
chardet==3.0.4
google-api-core==1.22.1
google-auth==1.20.1
google-cloud-bigquery==1.27.2
google-cloud-bigquery-storage==1.0.0
google-cloud-core==1.4.1
google-crc32c==1.0.0
google-resumable-media==1.0.0
googleapis-common-protos==1.52.0
grpcio==1.31.0
idna==2.10
numpy==1.19.1
pandas==1.1.1
protobuf==3.13.0
pyarrow==1.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.20
python-dateutil==2.8.1
pytz==2020.1
requests==2.24.0
rsa==4.6
Shapely==1.7.1
six==1.15.0
tqdm==4.48.2
urllib3==1.25.10

I have this exact same environment. I am running Python 3.7.7 with conda on Ubuntu 20.04.
The problem still persists. One failing job (tested today): 5a8b7f92-edf0-418f-af75-d378de4ebe68

@tswast
Contributor

tswast commented Sep 15, 2020

@charlielito Could you patch load_table_from_dataframe to write the parquet file it generates to your desktop / somewhere that won't get deleted, and post the resulting file here?

Peter did as much in #56 (comment) and it was helpful in narrowing down the backend issue (in that case a Feature Request to support DATETIME in Parquet)
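
If patching the library is awkward, something like this sketch (it approximates, but is not, the client's internal serialization path) should produce a comparable parquet file from the same DataFrame:

import pyarrow as pa
import pyarrow.parquet as pq

# df is the DataFrame from the original repro; the output filename is arbitrary.
arrow_table = pa.Table.from_pandas(df, preserve_index=False)
pq.write_table(arrow_table, "test_parquet.parquet")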

@charlielito
Author

charlielito commented Sep 15, 2020

@tswast Here is the zipped .parquet file generated
test_parquet.zip

@plamut plamut removed their assignment Oct 1, 2020
@charlielito
Author

UPDATE: Somehow this works now; maybe the backend was updated or fixed. It would be nice if a notification could be sent ;)
Closing this issue now.
