Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deploy: add support for managing SQL views #33

Draft
wants to merge 1 commit into
base: baseline/master
Choose a base branch
from

Conversation

andreypopp
Copy link
Member

This PR takes a stab at implementing support for managing SQL views with rex.deploy.

The example usage:

- view: persons
  definition: |
    select name, email
    from user

- view: persons
  definition: |
    select name, email
    from user
    union
    select name, email
    from patient

- view: persons
  present: false

I've been trying to follow the code for tables but of course making this specific to the views.

Some notes/design questions appeared:

  • How things are introspected has changed, previously both tables and views were loaded as TableImage objects. Now we load views as ViewImage objects and we DON'T load view columns. I suspect this will make a different when querying views with HTSQL?

  • When building a model out of facts we need to check for name clashes between tables and views as they share the same namespace in the database.

  • Should we raise an error in case the same named view appears with different definitions? Or should we instead override the former with the latter (dangerous)?

  • Right now we load view definition with pg_get_viewdef function but I'm not sure if it is suitable for comparison with view definition in facts as db can probably mangle the original SQL. We should probably store view definition in a COMMENT in a database.

  • Do we need to support view defined with HTSQL? That would be cool.

  • Would be nice to be able to specify a pk for a view (required for HTSQL to query such view).

  • We need to track dependencies:

    As we use SQL for view definitions we might need to use SQL parser for that. I'm thinking of using https://pglast.readthedocs.io/en/latest/ which wraps PostgreSQL's parser.

    If we want to add HTSQL based view then we can use HTSQL to infer the dependencies.

    • view -> table, why?

      • So we can require to drop view before we drop the table. It is implemented for linked tables? Would be nice as it will allow to validate migrations w/o accessing a database.
    • view -> view, why?

      • so we can require to drop dependent views before we drop the view

      • so we can change definition of a view with dependents (in this case we need to drop and re-create a view and its dependents)

This commit takes a stab at implementing support for managing SQL views
with rex.deploy.

The example usage:

```
- view: persons
  definition: |
    select name, email
    from user

- view: persons
  definition: |
    select name, email
    from user
    union
    select name, email
    from patient

- view: persons
  present: false
```

I've been trying to follow the code for tables but of course making this
specific to the views.

Some notes/design questions appeared:

- How things are introspected has changed, previously both tables and
  views were loaded as TableImage objects. Now we load views as
  ViewImage objects and we DON'T load view columns. I suspect this will
  make a different when querying views with HTSQL?

- When building a model out of facts we need to check for name clashes
  between tables and views as they share the same namespace in the
  database.

- Should we raise an error in case the same named view appears with
  different definitions? Or should we instead override the former with
  the latter (dangerous)?

- Right now we load view definition with `pg_get_viewdef` function but
  I'm not sure if it is suitable for comparison with view definition in
  facts as db can probably mangle the original SQL. We should probably
  store view definition in a COMMENT in a database.

- Do we need to support view defined with HTSQL? That would be cool.

- Would be nice to be able to specify a pk for a view (required for
  HTSQL to query such view).

- We need to track dependencies:

  As we use SQL for view definitions we might need to use SQL parser for
  that. I'm thinking of using https://pglast.readthedocs.io/en/latest/
  which wraps PostgreSQL's parser.

  If we want to add HTSQL based view then we can use HTSQL to infer the
  dependencies.

  - view -> table, why?

    - So we can require to drop view before we drop the table. It is
      implemented for linked tables? Would be nice as it will allow to
      validate migrations w/o accessing a database.

  - view -> view, why?

    - so we can require to drop dependent views before we drop the
      view

    - so we can change definition of a view with dependents (in this
      case we need to drop and re-create a view and its dependents)
@andreypopp andreypopp requested a review from xitology July 10, 2020 10:52
@andreypopp andreypopp marked this pull request as draft July 10, 2020 10:52
@jayclassless
Copy link
Contributor

* Should we raise an error in case the same named view appears with different definitions? Or should we instead override the former with the latter (dangerous)?

Absolutely error. Is there a situation where this would be intentional?

* Right now we load view definition with `pg_get_viewdef` function but I'm not sure if it is suitable for comparison with view definition in facts as db can probably mangle the original SQL. We should probably store view definition in a COMMENT in a database.

If we can't rely on the original SQL from the server, would we need the whole SQL in the comment, or would a checksum suffice?

* Do we need to support view defined with HTSQL? That would be cool.

That'd be interesting, but possibly difficult if the HTSQL query is specified using tables/columns that are in the deploy configuration but not yet realized in the database?

* Would be nice to be able to specify a pk for a view (required for HTSQL to query such view).

I wouldn't require it, but it'd definitely be nice to have so maybe the rex_deploy extension (or where ever is most appropriate) can automatically set the necessary tweak.override: unique-keys to make HTSQL happy.

@xitology
Copy link
Contributor

This PR will break HTSQL introspector in htsql_rex_deploy/introspect.py, which expects the schema to contain TableImage objects, and will complain about missing attributes when it receives ViewImage instead:

  File "/app/src/rex.deploy/src/htsql_rex_deploy/introspect.py", line 51, in IntrospectDeploy.__call__
    if table_image.primary_key is not None:
AttributeError: 'ViewImage' object has no attribute 'primary_key'

It is not difficult to fix by adding missing attributes such as primary_key to ViewImage.

But since views and tables are really indistinguishable to HTSQL, perhaps we should base a view fact on the table fact? That is, require all view columns to be declared as facts? Although column information can be extracted from the view definition or from pg_catalog, this will also allow us to configure view identity and links, which will make it better integrated with HTSQL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants