AttributeError: 'NoneType' object has no attribute 'to_api_repr'  #836

@berniwal

Description

In the new pandas-gbq version 0.24.0 there seems to be a bug when writing a dataframe to a BigQuery table if the dataframe has nested columns that contain only NULL values. The column types should nevertheless be unambiguous, since we pass an explicit schema via table_schema.

With pandas-gbq 0.24.0 we get "AttributeError: 'NoneType' object has no attribute 'to_api_repr'". With the previous version, 0.23.0, the error does not occur and the script works.

I assume the reason is that "gbq.py" always tries to infer a schema from the dataframe, even though a schema is provided. The function "_generate_bq_schema" fails, although calling it should not be necessary in this case:

  # Raises the error because the schema cannot be inferred from the dataframe
  default_schema = _generate_bq_schema(dataframe)
  # If table_schema isn't provided, we'll create one for you
  if not table_schema:
      table_schema = default_schema
  # If table_schema is provided, we'll update the default_schema to the provided table_schema
  else:
      table_schema = pandas_gbq.schema.update_schema(
          default_schema, dict(fields=table_schema)
      )
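One way the control flow above could avoid the failure is to run inference only when no schema was supplied. The following is a minimal sketch under stated assumptions, not the library's actual code or fix: `infer_schema` and `resolve_schema` are hypothetical stand-ins for `_generate_bq_schema` and the branch in `to_gbq`, plain dicts replace the dataframe, and the sketch assumes the supplied schema covers every column (the real code merges it with the inferred default through `pandas_gbq.schema.update_schema`, which is not reproduced here).

```python
def infer_schema(rows):
    """Hypothetical stand-in for schema inference.

    Mimics the reported behaviour: a nested column whose sub-fields are
    all None cannot be typed, so inference fails.
    """
    fields = []
    for name, value in rows[0].items():
        if isinstance(value, list):
            sub_values = [v for item in value for v in item.values()]
            if all(v is None for v in sub_values):
                # Hypothetical error; the real library raises AttributeError later.
                raise ValueError(f"cannot infer sub-field types for column {name!r}")
            fields.append({"name": name, "type": "RECORD"})
        elif isinstance(value, int):
            fields.append({"name": name, "type": "INTEGER"})
        else:
            fields.append({"name": name, "type": "STRING"})
    return {"fields": fields}


def resolve_schema(rows, table_schema=None):
    """Use the supplied schema as-is; infer only when none was given."""
    if table_schema:
        return {"fields": list(table_schema)}
    return infer_schema(rows)
```

With this ordering, a frame like `error_df` from the code example would never reach the inference step when `table_schema` is passed. The trade-off is that columns missing from the supplied schema would no longer be filled in from the inferred default.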

Environment details

  • Python version: 3.12.7
  • pip version: 24.2
  • pandas-gbq version: 0.24.0
  • pandas version: 2.2.3
  • numpy version: 1.26.4

Steps to reproduce

  1. Run the script below with the versions listed above: it fails.
  2. Run the same script with pandas-gbq==0.23.0 and all other versions unchanged: it works.
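Because the behaviour depends on the installed pandas-gbq version, it can help to confirm which versions are active in the environment before running the script. A small sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(["pandas-gbq", "pandas", "numpy"]).items():
        print(f"{name}: {ver or 'not installed'}")
```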

Code example

import pandas_gbq
import pandas as pd
import numpy as np

DESTINATION_TABLE_ID = 'INSERT_YOUR_TABLE_HERE'

schema = [
 {'name': 'Id', 'type': 'INTEGER', 'mode': 'NULLABLE'},
 {'name': 'Positions',
  'type': 'RECORD',
  'mode': 'REPEATED',
  'fields': [
   {'name': 'PositionState',
    'type': 'STRING',
    'mode': 'NULLABLE'}
  ]
}
]

works_df = pd.DataFrame([{
        'Id': 123,
        'Positions': None
}])

error_df = pd.DataFrame([{
        'Id': 123,
        'Positions': np.array([{
            'PositionState': None
        }])
}])

# Works with warning
# pandas_gbq.to_gbq(works_df, destination_table=DESTINATION_TABLE_ID, table_schema=schema, if_exists='replace')

# Throws error
pandas_gbq.to_gbq(error_df, destination_table=DESTINATION_TABLE_ID, table_schema=schema, if_exists='replace')

Stack trace

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[20], line 2
      1 import pandas_gbq
----> 2 pandas_gbq.to_gbq(dummy_df, destination_table='DESTINATION_TABLE_ID', table_schema=schema)

File ~/anaconda3/envs/py-312/lib/python3.12/site-packages/pandas_gbq/gbq.py:1163, in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key, auth_redirect_uri, client_id, client_secret, user_agent, rfc9110_delimiter)
   1160 dataset_id = destination_table_ref.dataset_id
   1161 table_id = destination_table_ref.table_id
-> 1163 default_schema = _generate_bq_schema(dataframe)
   1164 # If table_schema isn't provided, we'll create one for you
   1165 if not table_schema:

File ~/anaconda3/envs/py-312/lib/python3.12/site-packages/pandas_gbq/gbq.py:1249, in _generate_bq_schema(df, default_type)
   1246 fields_json = []
   1248 for field in fields:
-> 1249     fields_json.append(field.to_api_repr())
   1251 return {"fields": fields_json}

File ~/anaconda3/envs/py-312/lib/python3.12/site-packages/google/cloud/bigquery/schema.py:353, in SchemaField.to_api_repr(self)
    350 # If this is a RECORD type, then sub-fields are also included,
    351 # add this to the serialized representation.
    352 if self.field_type.upper() in _STRUCT_TYPES:
--> 353     answer["fields"] = [f.to_api_repr() for f in self.fields]
    355 # Done; return the serialized dictionary.
    356 return answer

AttributeError: 'NoneType' object has no attribute 'to_api_repr'
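The last frame of the trace fails inside `[f.to_api_repr() for f in self.fields]`, which implies that one element of `fields` is `None` rather than a field object. The sketch below reproduces just that mechanism without BigQuery. `FakeField` is a hypothetical stand-in, not the real `SchemaField`, and why inference yields a `None` sub-field for an all-NULL nested column is not shown here.

```python
class FakeField:
    """Hypothetical stand-in for a schema field; not google.cloud.bigquery.SchemaField."""

    def __init__(self, name, field_type, fields=()):
        self.name = name
        self.field_type = field_type
        self.fields = tuple(fields)

    def to_api_repr(self):
        answer = {"name": self.name, "type": self.field_type}
        # Mirrors the traced line: sub-fields are serialized recursively.
        if self.field_type == "RECORD":
            answer["fields"] = [f.to_api_repr() for f in self.fields]
        return answer


def describe_failure():
    """Serialize a RECORD whose only sub-field is None and return the error text."""
    broken = FakeField("Positions", "RECORD", fields=[None])
    try:
        broken.to_api_repr()
    except AttributeError as exc:
        return str(exc)
    return None
```

A RECORD built with a proper sub-field, for example `FakeField("PositionState", "STRING")`, serializes without error, so the failure comes from the `None` entry rather than from the nesting itself.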

Labels

  • api: bigquery (Issues related to the googleapis/python-bigquery-pandas API.)
  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
