deploy: add support for managing SQL views #33

andreypopp · 2020-07-10T10:52:09Z

This PR takes a stab at implementing support for managing SQL views with rex.deploy.

The example usage:

- view: persons
  definition: |
    select name, email
    from user

- view: persons
  definition: |
    select name, email
    from user
    union
    select name, email
    from patient

- view: persons
  present: false

I've been trying to follow the code for tables but of course making this specific to the views.

Some notes/design questions appeared:

How things are introspected has changed, previously both tables and views were loaded as TableImage objects. Now we load views as ViewImage objects and we DON'T load view columns. I suspect this will make a different when querying views with HTSQL?
When building a model out of facts we need to check for name clashes between tables and views as they share the same namespace in the database.
Should we raise an error in case the same named view appears with different definitions? Or should we instead override the former with the latter (dangerous)?
Right now we load view definition with pg_get_viewdef function but I'm not sure if it is suitable for comparison with view definition in facts as db can probably mangle the original SQL. We should probably store view definition in a COMMENT in a database.
Do we need to support view defined with HTSQL? That would be cool.
Would be nice to be able to specify a pk for a view (required for HTSQL to query such view).
We need to track dependencies:

As we use SQL for view definitions we might need to use SQL parser for that. I'm thinking of using https://pglast.readthedocs.io/en/latest/ which wraps PostgreSQL's parser.

If we want to add HTSQL based view then we can use HTSQL to infer the dependencies.
- view -> table, why?
  - So we can require to drop view before we drop the table. It is implemented for linked tables? Would be nice as it will allow to validate migrations w/o accessing a database.
- view -> view, why?
  - so we can require to drop dependent views before we drop the view
  - so we can change definition of a view with dependents (in this case we need to drop and re-create a view and its dependents)

This commit takes a stab at implementing support for managing SQL views with rex.deploy. The example usage: ``` - view: persons definition: | select name, email from user - view: persons definition: | select name, email from user union select name, email from patient - view: persons present: false ``` I've been trying to follow the code for tables but of course making this specific to the views. Some notes/design questions appeared: - How things are introspected has changed, previously both tables and views were loaded as TableImage objects. Now we load views as ViewImage objects and we DON'T load view columns. I suspect this will make a different when querying views with HTSQL? - When building a model out of facts we need to check for name clashes between tables and views as they share the same namespace in the database. - Should we raise an error in case the same named view appears with different definitions? Or should we instead override the former with the latter (dangerous)? - Right now we load view definition with `pg_get_viewdef` function but I'm not sure if it is suitable for comparison with view definition in facts as db can probably mangle the original SQL. We should probably store view definition in a COMMENT in a database. - Do we need to support view defined with HTSQL? That would be cool. - Would be nice to be able to specify a pk for a view (required for HTSQL to query such view). - We need to track dependencies: As we use SQL for view definitions we might need to use SQL parser for that. I'm thinking of using https://pglast.readthedocs.io/en/latest/ which wraps PostgreSQL's parser. If we want to add HTSQL based view then we can use HTSQL to infer the dependencies. - view -> table, why? - So we can require to drop view before we drop the table. It is implemented for linked tables? Would be nice as it will allow to validate migrations w/o accessing a database. - view -> view, why? - so we can require to drop dependent views before we drop the view - so we can change definition of a view with dependents (in this case we need to drop and re-create a view and its dependents)

jayclassless · 2020-07-13T20:39:00Z

* Should we raise an error in case the same named view appears with different definitions? Or should we instead override the former with the latter (dangerous)?

Absolutely error. Is there a situation where this would be intentional?

* Right now we load view definition with `pg_get_viewdef` function but I'm not sure if it is suitable for comparison with view definition in facts as db can probably mangle the original SQL. We should probably store view definition in a COMMENT in a database.

If we can't rely on the original SQL from the server, would we need the whole SQL in the comment, or would a checksum suffice?

* Do we need to support view defined with HTSQL? That would be cool.

That'd be interesting, but possibly difficult if the HTSQL query is specified using tables/columns that are in the deploy configuration but not yet realized in the database?

* Would be nice to be able to specify a pk for a view (required for HTSQL to query such view).

I wouldn't require it, but it'd definitely be nice to have so maybe the rex_deploy extension (or where ever is most appropriate) can automatically set the necessary tweak.override: unique-keys to make HTSQL happy.

xitology · 2020-07-15T16:36:09Z

This PR will break HTSQL introspector in htsql_rex_deploy/introspect.py, which expects the schema to contain TableImage objects, and will complain about missing attributes when it receives ViewImage instead:

  File "/app/src/rex.deploy/src/htsql_rex_deploy/introspect.py", line 51, in IntrospectDeploy.__call__
    if table_image.primary_key is not None:
AttributeError: 'ViewImage' object has no attribute 'primary_key'

It is not difficult to fix by adding missing attributes such as primary_key to ViewImage.

But since views and tables are really indistinguishable to HTSQL, perhaps we should base a view fact on the table fact? That is, require all view columns to be declared as facts? Although column information can be extracted from the view definition or from pg_catalog, this will also allow us to configure view identity and links, which will make it better integrated with HTSQL.

andreypopp requested a review from xitology July 10, 2020 10:52

andreypopp marked this pull request as draft July 10, 2020 10:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deploy: add support for managing SQL views #33

deploy: add support for managing SQL views #33

andreypopp commented Jul 10, 2020

jayclassless commented Jul 13, 2020

xitology commented Jul 15, 2020

deploy: add support for managing SQL views #33

Are you sure you want to change the base?

deploy: add support for managing SQL views #33

Conversation

andreypopp commented Jul 10, 2020

jayclassless commented Jul 13, 2020

xitology commented Jul 15, 2020