
df.to_gbq(): InvalidSchema when using 'type': 'INT64' and if_exists = 'append' #322

Closed
Azerane opened this issue Jul 29, 2020 · 5 comments · Fixed by #340
Labels
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments


Azerane commented Jul 29, 2020

Hello there,

I'm facing an issue with the function df.to_gbq().

From my understanding, when if_exists='append' is used and the destination_table already exists, the original schema is retrieved from destination_table.

The retrieved schema contains every field, but the type comes back as INTEGER rather than INT64. Since the code simply checks that every field in schema_local also appears in schema_remote, this leads to a mismatch, and schema_is_subset returns False instead of True.

I'm not sure where we should fix this error, but it's quite annoying when working with integer values.

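For illustration, here is a minimal stand-in for the subset check (simplified from the pandas_gbq source, so treat it as a sketch) showing why the comparison fails:

    # Fields are compared as literal dicts, so the standard SQL alias
    # INT64 never equals the legacy name INTEGER that the BigQuery API
    # returns for an existing table.
    def schema_is_subset(schema_remote, schema_local):
        fields_remote = schema_remote["fields"]
        fields_local = schema_local["fields"]
        return all(field in fields_remote for field in fields_local)

    remote = {"fields": [{"name": "int_col", "type": "INTEGER"}]}
    local = {"fields": [{"name": "int_col", "type": "INT64"}]}
    print(schema_is_subset(remote, local))  # False, though the types are equivalent
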
@tswast added the "type: bug" label (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.) on Jul 29, 2020
@norbertsteele

Is there a workaround that anyone has found?

@gilangardya

Hi @Azerane,

Maybe you get that error because you are using object instead of int64 to represent some integer values.
If you need the object representation because you store big integers (e.g. 2**65), you should change the data type, since BigQuery's INTEGER range is -2^63 .. 2^63-1.
But if you don't need that, you can cast the Series to int64 before loading into the BigQuery table, or you can specify the table_schema in the to_gbq parameters.
If that's not your case, please share more details; that would be great.

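For example, a minimal sketch of both suggestions (the DataFrame contents and destination table below are placeholders):

    import pandas as pd
    import pandas_gbq

    df = pd.DataFrame({"int_col": ["1", "2", "3"]})  # object dtype

    # Suggestion 1: cast the column to int64 before uploading.
    df["int_col"] = df["int_col"].astype("int64")

    # Suggestion 2: pass an explicit table_schema to to_gbq.
    pandas_gbq.to_gbq(
        df,
        "my_dataset.my_table",  # placeholder destination
        table_schema=[{"name": "int_col", "type": "INTEGER"}],
        if_exists="append",
    )
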
Maybe it can help you too, @norbertsteele.

@tswast self-assigned this on Nov 9, 2020

tswast commented Nov 9, 2020

I'm able to reproduce with the following code example:

import pandas
import pandas_gbq

df1 = pandas.DataFrame({"int_col": [1, 2, 3]})
pandas_gbq.to_gbq(
    df1,
    "my_dataset.int64table",
    table_schema=[{"name": "int_col", "type": "INT64"}],
)
df2 = pandas.DataFrame({"int_col": [4, 5, 6]})

# This fails, despite using the same schema as was used for the initial
# upload.
pandas_gbq.to_gbq(
    df2,
    "my_dataset.int64table",
    table_schema=[{"name": "int_col", "type": "INT64"}],
    if_exists="append",
)

@tswast added the "priority: p2" label (Moderately-important priority. Fix may not be included in next release.) on Nov 9, 2020

tswast commented Nov 9, 2020

Is there a workaround that anyone has found?

The workaround is to use legacy SQL type names (INTEGER, not INT64), but this should be fixed once #340 is merged and released (target: 0.14.1).

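Applied to the reproduction above, a sketch of the workaround looks like this:

    import pandas
    import pandas_gbq

    df2 = pandas.DataFrame({"int_col": [4, 5, 6]})

    # The legacy name INTEGER matches what the BigQuery API reports for
    # the existing table, so the schema subset check passes.
    pandas_gbq.to_gbq(
        df2,
        "my_dataset.int64table",
        table_schema=[{"name": "int_col", "type": "INTEGER"}],
        if_exists="append",
    )
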

SDonatas commented Dec 18, 2022

I had this error for a few days. The problem is misleading error handling: in my case, the pandas DataFrame did not have its datetime columns converted as specified in the supplied schema. The integer error, at least for me, was a false flag.

The fragment of code below solved it for me:

    import json
    import pandas as pd

    if schema is None:
        # Default every column to STRING when no schema was supplied.
        schema = [{'name': col, 'type': 'STRING'} for col in df.columns]
        writeSchema(schema)  # writeSchema is our own helper, not pandas_gbq

    def clean_types(x):
        # Coerce each row's values to the types named in the schema.
        for schema_item in schema:
            name_par = schema_item['name']
            type_par = schema_item['type']

            if type_par == 'STRING':
                if type(x[name_par]) in (list, dict):
                    x[name_par] = json.dumps(x[name_par])
                else:
                    x[name_par] = str(x[name_par])
            elif type_par == 'INTEGER':
                x[name_par] = int(x[name_par])
            elif type_par in ('TIMESTAMP', 'DATETIME'):
                x[name_par] = pd.to_datetime(x[name_par])

        return x

    df = df.apply(clean_types, axis=1)

And the schema JSON:

[ { "name": "object", "type": "STRING" }, { "name": "id", "type": "STRING" }, { "name": "subject", "type": "STRING" }, { "name": "from_email", "type": "STRING" }, { "name": "from_name", "type": "STRING" }, { "name": "sent_at", "type": "DATETIME" }, { "name": "send_time", "type": "DATETIME" }, { "name": "name", "type": "STRING" }, { "name": "lists", "type": "STRING" }, { "name": "excluded_lists", "type": "STRING" }, { "name": "status", "type": "STRING" }, { "name": "status_id", "type": "INTEGER" }, { "name": "status_label", "type": "STRING" }, { "name": "created", "type": "DATETIME" }, { "name": "updated", "type": "DATETIME" }, { "name": "num_recipients", "type": "INTEGER" }, { "name": "campaign_type", "type": "STRING" }, { "name": "is_segmented", "type": "BOOLEAN" }, { "name": "message_type", "type": "STRING" }, { "name": "template_id", "type": "STRING" } ]

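Presumably the schema is then loaded from a file and handed to to_gbq along these lines (a sketch; the file path and table name are placeholders):

    import json
    import pandas_gbq

    with open("schema.json") as f:  # placeholder path
        schema = json.load(f)

    df = df.apply(clean_types, axis=1)
    pandas_gbq.to_gbq(df, "my_dataset.my_table", table_schema=schema, if_exists="append")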