Closed
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-pandas API.)
priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
In the new pandas_gbq version 0.24.0 there seems to be a bug when writing DataFrames to BigQuery tables if a nested column contains only NULL values. The column types should still be unambiguous, since we pass an explicit schema with the type information.
With pandas_gbq 0.24.0 we get the error "AttributeError: 'NoneType' object has no attribute 'to_api_repr'". With the previous version, 0.23.0, the error does not occur and the script works.
I assume the reason is that the library tries to infer the schema for the dataframe in "gbq.py" even though a schema is provided. The function "_generate_bq_schema" fails, even though calling it would not be necessary in this case:
default_schema = _generate_bq_schema(dataframe)  # --> Throws the error since it cannot infer the schema
# If table_schema isn't provided, we'll create one for you
if not table_schema:
    table_schema = default_schema
# It table_schema is provided, we'll update the default_schema to the provided table_schema
else:
    table_schema = pandas_gbq.schema.update_schema(
        default_schema, dict(fields=table_schema)
    )
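To make the suggestion concrete, here is a minimal sketch of the behavior I would expect: inference runs only for columns the caller did not describe. This is not the pandas-gbq implementation. `resolve_schema`, `infer_field`, and `failing_inference` are hypothetical names, and the failing inference is simulated with plain Python so the example runs without BigQuery.

```python
from typing import Callable, Optional


def resolve_schema(
    columns: list[str],
    infer_field: Callable[[str], dict],
    table_schema: Optional[list[dict]] = None,
) -> dict:
    """Hypothetical helper: infer only the columns the caller did not describe."""
    provided = {field["name"]: field for field in (table_schema or [])}
    fields = []
    for column in columns:
        if column in provided:
            # The caller already supplied the type, so inference is skipped.
            fields.append(provided[column])
        else:
            fields.append(infer_field(column))
    return {"fields": fields}


def failing_inference(column: str) -> dict:
    # Stand-in for inference that cannot handle an all-NULL nested column.
    if column == "Positions":
        raise AttributeError("'NoneType' object has no attribute 'to_api_repr'")
    return {"name": column, "type": "INTEGER", "mode": "NULLABLE"}


user_schema = [
    {"name": "Id", "type": "INTEGER", "mode": "NULLABLE"},
    {
        "name": "Positions",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {"name": "PositionState", "type": "STRING", "mode": "NULLABLE"}
        ],
    },
]

# Both columns are covered by user_schema, so failing_inference is never called.
resolved = resolve_schema(["Id", "Positions"], failing_inference, user_schema)
print([field["name"] for field in resolved["fields"]])
```

Without `user_schema`, the same call would reach `failing_inference("Positions")` and raise, which is the situation described above.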
Environment details
- Python version: 3.12.7
- pip version: 24.2
- pandas-gbq version: 0.24.0
- pandas version: 2.2.3
- numpy version: 1.26.4
Steps to reproduce
- Execute the script below with the versions listed above -> Fails.
- Execute the script below with the same versions, but with pandas-gbq==0.23.0 -> Works.
Code example
import pandas_gbq
import pandas as pd
import numpy as np
DESTINATION_TABLE_ID = 'INSERT_YOUR_TABLE_HERE'
schema = [
    {'name': 'Id', 'type': 'INTEGER', 'mode': 'NULLABLE'},
    {'name': 'Positions',
     'type': 'RECORD',
     'mode': 'REPEATED',
     'fields': [
         {'name': 'PositionState',
          'type': 'STRING',
          'mode': 'NULLABLE'}
     ]
     }
]
works_df = pd.DataFrame([{
    'Id': 123,
    'Positions': None
}])
error_df = pd.DataFrame([{
    'Id': 123,
    'Positions': np.array([{
        'PositionState': None
    }])
}])
# Works with warning
# pandas_gbq.to_gbq(works_df, destination_table=DESTINATION_TABLE_ID, table_schema=schema, if_exists='replace')
# Throws error
pandas_gbq.to_gbq(error_df, destination_table=DESTINATION_TABLE_ID, table_schema=schema, if_exists='replace')
Stack trace
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[20], line 2
1 import pandas_gbq
----> 2 pandas_gbq.to_gbq(dummy_df, destination_table='DESTINATION_TABLE_ID', table_schema=schema)
File ~/anaconda3/envs/py-312/lib/python3.12/site-packages/pandas_gbq/gbq.py:1163, in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key, auth_redirect_uri, client_id, client_secret, user_agent, rfc9110_delimiter)
1160 dataset_id = destination_table_ref.dataset_id
1161 table_id = destination_table_ref.table_id
-> 1163 default_schema = _generate_bq_schema(dataframe)
1164 # If table_schema isn't provided, we'll create one for you
1165 if not table_schema:
File ~/anaconda3/envs/py-312/lib/python3.12/site-packages/pandas_gbq/gbq.py:1249, in _generate_bq_schema(df, default_type)
1246 fields_json = []
1248 for field in fields:
-> 1249 fields_json.append(field.to_api_repr())
1251 return {"fields": fields_json}
File ~/anaconda3/envs/py-312/lib/python3.12/site-packages/google/cloud/bigquery/schema.py:353, in SchemaField.to_api_repr(self)
350 # If this is a RECORD type, then sub-fields are also included,
351 # add this to the serialized representation.
352 if self.field_type.upper() in _STRUCT_TYPES:
--> 353 answer["fields"] = [f.to_api_repr() for f in self.fields]
355 # Done; return the serialized dictionary.
356 return answer
AttributeError: 'NoneType' object has no attribute 'to_api_repr'
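The last frame of the trace suggests that the list of sub-fields of the RECORD column contains a `None` entry, presumably because no type could be inferred for the all-NULL `PositionState` values. The sketch below reproduces that failure mode in isolation. `FakeField` is a simplified, hypothetical stand-in and not the real `SchemaField` class.

```python
class FakeField:
    """Simplified stand-in for a schema field (not the real SchemaField)."""

    def __init__(self, name, field_type, fields=()):
        self.name = name
        self.field_type = field_type
        self.fields = tuple(fields)

    def to_api_repr(self):
        answer = {"name": self.name, "type": self.field_type}
        if self.field_type.upper() in ("RECORD", "STRUCT"):
            # Same pattern as the failing line in the trace: each sub-field
            # must be an object with a to_api_repr method.
            answer["fields"] = [f.to_api_repr() for f in self.fields]
        return answer


# Assumption: the sub-field whose type could not be inferred is stored as None.
broken = FakeField("Positions", "RECORD", fields=[None])

try:
    broken.to_api_repr()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

This prints the same message as the trace above, `'NoneType' object has no attribute 'to_api_repr'`.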