Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repeated Type support for Structs #376

Open
stanzheng opened this issue Nov 12, 2021 · 2 comments
Open

Repeated Type support for Structs #376

stanzheng opened this issue Nov 12, 2021 · 2 comments
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@stanzheng
Copy link

stanzheng commented Nov 12, 2021

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Describe the solution you'd like

Ability to insert repeated nested STRUCT records. attribute to define repeated nested records

https://cloud.google.com/bigquery/docs/nested-repeated

TASK_SCHEMA = Table(
    "TASK_SCHEMA",
    meta,
 Column('updated', String(), table=None),
 Column('updater', String(), table=None),
 Column('url', String(), table=None),
 Column('verification_status', String(), table=None),
 Column('how_tos', STRUCT(title=String(), id=String(), 
url=String(), slug=String(), text=String()), table=None, **REPEATED=True**)),
)

Structs should support the bigquery REPEATED attribute on records.

A clear and concise description of what you want to happen.
Describe alternatives you've considered

Serializing Records as Strings to deserialize in the client.

A clear and concise description of any alternative solutions or features you've considered.
Additional context

tried using Arrays in conjunction with Structs

(
 Column('updater', String(), table=None),
 Column('url', String(), table=None),
 Column('verification_status', String(), table=None),
 Column('how_tos', 

ARRAY('how_tos', as_tuple=
Column('how_tos', STRUCT(title=String(), id=String(), url=String(), slug=String(), text=String()), table=None)), table=None)

error stacktrace

  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 2938, in visit_create_column
    text = self.get_column_specification(column, first_pk=first_pk)
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy_bigquery/base.py", line 606, in get_column_specification
    colspec = super(BigQueryDDLCompiler, self).get_column_specification(
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 3137, in get_column_specification
    + self.dialect.type_compiler.process(
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 403, in process
    return type_._compiler_dispatch(self, **kw)
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/visitors.py", line 96, in _compiler_dispatch
    return meth(self, **kw)
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy_bigquery/base.py", line 561, in visit_ARRAY
    return "ARRAY<{}>".format(self.process(type_.item_type, **kw))
  File "/Users/stanley.zheng/code/nyc3-ss-source-cagliostro/.venv/lib/python3.8/site-packages/sqlalchemy/sql/compiler.py", line 403, in process
    return type_._compiler_dispatch(self, **kw)
AttributeError: 'str' object has no attribute '_compiler_dispatch'

Add any other context or screenshots about the feature request here.

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. label Nov 12, 2021
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Nov 13, 2021
@meredithslota meredithslota added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed triage me I really want to be triaged. labels Nov 15, 2021
@jdimatteo
Copy link

jdimatteo commented Sep 12, 2022

This appears to already be supported. If I'm missing something, please let me know so I don't run into a problem soon!

For example, with sqlalchemy-bigquery 1.4.4:

import sqlalchemy
import sqlalchemy.ext.declarative
import sqlalchemy_bigquery
import os

Base = sqlalchemy.ext.declarative.declarative_base()

class Foo(Base):
    __tablename__ = 'foo'

    zap = sqlalchemy.Column(sqlalchemy.String, primary_key=True)
    
    bar = sqlalchemy.Column(
        sqlalchemy.ARRAY(
            sqlalchemy_bigquery.STRUCT(
                a=sqlalchemy.String,
                b=sqlalchemy.String,
                c=sqlalchemy.ARRAY(
                    sqlalchemy_bigquery.STRUCT(
                        x=sqlalchemy.String,
                        y=sqlalchemy.String)))))

engine = sqlalchemy.create_engine(f'bigquery://{os.environ["BQ_PROJECT_ID"]}/{os.environ["BQ_DATASET_ID"]}')
Base.metadata.create_all(engine)

This successfully creates a table with repeated structs:
image

And here is a more verbose example that includes inserts that I tested works: https://gist.github.com/jdimatteo/c223668095392e629ca01aadada4171c

@jlynchMicron
Copy link

jlynchMicron commented Mar 15, 2023

Can support be added for representing all the sub-columns of an unnested array of structs? I am having to manually specify the columns that exist in an unnested array of structs instead of SQLalchemy "knowing" the columns that live in the unnested array of structs.

Also, this BigQuery documentation should probably be updated, since it mentions nothing about an unnested array of structs forming their own new columns in the result.
Documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/arrays#query_structs_in_an_array

Example (Incorrect result, participants results in 2 columns):

WITH Races AS (
  SELECT "800M" AS race,
    [STRUCT("Rudisha" AS name, [23.4, 26.3, 26.4, 26.1] AS laps),
     STRUCT("Makhloufi" AS name, [24.5, 25.4, 26.6, 26.1] AS laps),
     STRUCT("Murphy" AS name, [23.9, 26.0, 27.0, 26.0] AS laps),
     STRUCT("Bosse" AS name, [23.6, 26.2, 26.5, 27.1] AS laps),
     STRUCT("Rotich" AS name, [24.7, 25.6, 26.9, 26.4] AS laps),
     STRUCT("Lewandowski" AS name, [25.0, 25.7, 26.3, 27.2] AS laps),
     STRUCT("Kipketer" AS name, [23.2, 26.1, 27.3, 29.4] AS laps),
     STRUCT("Berian" AS name, [23.7, 26.1, 27.0, 29.3] AS laps)]
       AS participants)
SELECT
  race,
  participant
FROM Races AS r
CROSS JOIN UNNEST(r.participants) AS participant;

+------+---------------------------------------+
| race | participant                           |
+------+---------------------------------------+
| 800M | {Rudisha, [23.4, 26.3, 26.4, 26.1]}   |
| 800M | {Makhloufi, [24.5, 25.4, 26.6, 26.1]} |
| 800M | {Murphy, [23.9, 26, 27, 26]}          |
| 800M | {Bosse, [23.6, 26.2, 26.5, 27.1]}     |
| 800M | {Rotich, [24.7, 25.6, 26.9, 26.4]}    |
| 800M | {Lewandowski, [25, 25.7, 26.3, 27.2]} |
| 800M | {Kipketer, [23.2, 26.1, 27.3, 29.4]}  |
| 800M | {Berian, [23.7, 26.1, 27, 29.3]}      |
+------+---------------------------------------+

Actual Result:
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-sqlalchemy API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

5 participants