Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbListTables: list Materialized Views as well #261

Closed
wants to merge 17 commits into from

Conversation

dpprdan
Copy link
Contributor

@dpprdan dpprdan commented Sep 28, 2020

dbListTables() now uses the pg_class/pg_namespace tables instead of INFORMATION_SCHEMA.tables to retrieve the names of remote tables and (materialized) views.

In particular dbListTables() will now list:

  • r = ordinary table
  • v = view
  • m = materialized view
  • f = foreign table
  • p = partitioned table

It does not list the partitions of a partitioned table.

dpprdan added a commit to dpprdan/RPostgres that referenced this pull request Sep 28, 2020
@dpprdan
Copy link
Contributor Author

dpprdan commented Sep 28, 2020

There are no dbListTables() tests, correct?

@krlmlr
Copy link
Member

krlmlr commented Sep 28, 2020

I'm getting legitimate Travis errors: https://travis-ci.com/github/r-dbi/RPostgres/jobs/391997882#L3128

@dpprdan
Copy link
Contributor Author

dpprdan commented Sep 28, 2020

☹️ I can reproduce on a Postgres 9.6. Apparently, relispartition (as well as p = partitioned table) was added in v10 (pg_class docs v10 vs v9.6).

How do you want me to handle this?

  1. Drop relispartition i.e. list partitions as well.
  2. Make the query dependent on the postgres version, i.e. drop relispartition only for Postgres < v10.

Relatedly: Do you already have a policy regarding the use of information schema vs. system catalogs?

The information schema consists of a set of views that contain information about the objects defined in the current database. The information schema is defined in the SQL standard and can therefore be expected to be portable and remain stable — unlike the system catalogs, which are specific to PostgreSQL and are modeled after implementation concerns. The information schema views do not, however, contain information about PostgreSQL-specific features; to inquire about those you need to query the system catalogs or other PostgreSQL-specific views.
https://www.postgresql.org/docs/current/information-schema.html (emphasis mine).

Since mviews are PostgreSQL-specific features we need to query the system catalogs. These have the drawback of being less stable, which we see here to some degree.

Therefore, alternative no.

  1. Do not support PostgreSQL-specific features and therefore mviews in dbListTables() (and dbListFields() and dbExistsTable()).

BTW Which Postgres versions does RPostgres support?

@krlmlr
Copy link
Member

krlmlr commented Sep 28, 2020

A robust solution would be better. We require client libraries >= 9.6, the server version is not specified.

Are materialized views contained in INFORMATION_SCHEMA.views ?

@dpprdan
Copy link
Contributor Author

dpprdan commented Sep 29, 2020

Are materialized views contained in INFORMATION_SCHEMA.views ?

No, they are not since they are a PostgreSQL-specific feature. For more info, see this thread on the psql-hackers list for a discussion on why they are not in the INFORMATION_SCHEMA.tables (which includes Views). Tl;dr: "They are not defined by the SQL standard."

My 2ct: Since RPostgres is a PostgreSQL-specific package, it makes sense to support PostgreSQL-specific features.

OTOH: Since it is a DBI package, maybe it might also make sense to only support SQL-standard features (like information schema)?

In this particular case I don't expect major changes in the future, but you never know.

We require client libraries >= 9.6, the server version is not specified.

Apparently my mental model is off. What do you mean by client libraries? I thought it goes like this: RPostgres - libpq - PostgreSQL server.

@krlmlr
Copy link
Member

krlmlr commented Sep 29, 2020

We require libpq >= 9.6 .

I'm also worried about Redshift (#215). Can we implement a solution that works on all platforms?

@dpprdan
Copy link
Contributor Author

dpprdan commented Sep 29, 2020

I am not familiar with Redshift, at all, and don't know how to test ATM.

DESCRIPTION says libpq >= 9.0, BTW.

@dpprdan
Copy link
Contributor Author

dpprdan commented Sep 29, 2020

I found this:

The standard PostgreSQL catalog tables are accessible to Amazon Redshift users. For more information about PostgreSQL system catalogs, see PostgreSQL system tables.

This links to the Postgres v8.0 docs, however. Like I said, I am not familiar with Redshift, so I don't know whether this is an oversight or not.

I think we have a decision though: In order to support system catalogs, which are less stable than information schema, we would have to test against all currently supported server versions. If you want to support Redshift as well, we would need to test against all current Redshift versions as well. This is probably not as easy as spinning up a few docker containers (though I don't know for sure). This sounds very much like "let's not support queries against system catalogs in RPostgres" to me.

What do you think?

@krlmlr
Copy link
Member

krlmlr commented Sep 29, 2020

Right, 9.0 is stated in DESCRIPTION and in ./configure .

I don't mind implementing Postgres-specific code now that we have Redshift() as a driver for which we can special-case (#258). I could set up a Redshift test database to check this, also on CI.

@krlmlr
Copy link
Member

krlmlr commented Sep 29, 2020

We could find a middle ground by implementing Postgres-specific functions to gain some experience: Leave db*Table*() unchanged and implement Pq-specific functions with a distinct prefix.

@dpprdan
Copy link
Contributor Author

dpprdan commented Oct 1, 2020

Sounds good, I think! Just to make sure, you mean:

INFORMATION_SCHEMA systems catalog
dbListTable() pqListTables()
dbListFields() pqListFields()
dbExistsTable() pqExistsTable()
dbListObjects() pqListObjects()

This involves a bit more than just changing a SQL query, though. E.g. I will have to read up on S4. Also we may have to think about how we setup internal functions like find_table(). Or rather I will have to figure out how this is designed elsewhere in the package, at least.

What I am saying is: I won't be able to do this immediately and cannot commit to a time frame at this moment. Also, I may require a bit of support from you. Therefore it's also fine with me, if you want to do this yourself. 😃

@dpprdan
Copy link
Contributor Author

dpprdan commented Oct 1, 2020

Probably obvious, but we may need Redshift-specific functions as well, when a query relies on a Postgres function that Redshift does not support, e.g. #211. (Plus unsupported PG data types and features).

Not the best example, though, because this particular case seems to have been fixed in the meantime, since dbListTables() does not use the generate_subscripts function anymore.

RPostgres/R/tables.R

Lines 237 to 244 in 12c3d3d

setMethod("dbListTables", "PqConnection", function(conn, ...) {
query <- paste0(
"SELECT table_name FROM INFORMATION_SCHEMA.tables ",
"WHERE ",
"(table_schema = ANY(current_schemas(true))) AND (table_schema <> 'pg_catalog')"
)
dbGetQuery(conn, query)[[1]]
})

Edit: Not sure what I meant at the time, but looks like I was confused, because #211 is about using generate_subscripts (which Redshift does not support) in dbExitsTable(), while I was refering to dbListTables() here - which indeed doesn't use generate_subscripts, but probably never did. 🤷🏻‍♂️

Base automatically changed from master to main March 12, 2021 04:02
@dpprdan dpprdan marked this pull request as draft March 31, 2021 09:03
dpprdan added a commit to dpprdan/RPostgres that referenced this pull request Mar 31, 2021
@dpprdan dpprdan force-pushed the feat/list_mviews branch 2 times, most recently from e62c88b to 47914c6 Compare March 31, 2021 12:05
@dpprdan dpprdan force-pushed the feat/list_mviews branch 2 times, most recently from 57df678 to 8e8994f Compare May 22, 2021 16:36
@dpprdan
Copy link
Contributor Author

dpprdan commented Jul 17, 2021

I added new pq* versions of dbListTables(), dbExistsTable(), dbListObjects() and dbListFields(). They all query the system catalog (pg_class in particular) and are based on the same underlying code. Remember that we have to use the system catalog instead of information schema because materialized views are not included in information schema (because they are not in the SQL spec).

Is this, what you had in mind, @krlmlr?

At the minimum this needs some more tests to make sure that it does what we are trying to do, i.e. list materialized views as well.

We could find a middle ground by implementing Postgres-specific functions to gain some experience: Leave dbTable() unchanged and implement Pq-specific functions with a distinct prefix.

In the end I think it would be better to use this new code for the db* functions instead of having these "special" functions. One reason is ease of use, another is downstream functions (e.g. in dplyr), which build on the db* functions.
We could still use information schema based code for connections that do not support (this) pg_class code, i.e. for Redshift connections.

Slightly off-topic: Do you have plans to test non-current PostgreSQL versions with R CMD check on GHA? I am asking because it was set up on Travis and caught an error I made.

@krlmlr krlmlr added this to the 1.4.1 milestone Dec 5, 2021
@krlmlr
Copy link
Member

krlmlr commented Dec 5, 2021

Thanks a lot. I'm not sure we need the new pq*() functions to be S4 methods, also this package uses the postgres prefix in similar situations.

Other than that, I think we can expose this via the db*() interface if the user sets a flag in dbConnect() -- similarly to the check_interrupts flag. Tests would be great too.

Added checks for Postgres >= 10 to the build matrix.

@dpprdan
Copy link
Contributor Author

dpprdan commented Dec 15, 2022

Closing this in favour of #414

@dpprdan dpprdan closed this Dec 15, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 16, 2023
@dpprdan dpprdan deleted the feat/list_mviews branch January 23, 2024 11:58
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants