Data source refactor #438

Merged
mildbyte merged 16 commits into master from feature/data-sources-refactor
Apr 14, 2021
@mildbyte

The only visible CLI-level changes are to a couple of sgr mount commands:

  • MySQL: parameter remote_schema has been renamed to dbname
  • Mongo: parameter coll has been renamed to collection; db to database
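For callers that still hold parameters under the old names, the renames above could be applied with a small shim along these lines. This is only a sketch: `RENAMES` and `migrate_params` are hypothetical names that are not part of Splitgraph, and it assumes the renamed keys sit at the top level of a flat parameter dictionary.

```python
# Hypothetical migration helper, not part of Splitgraph's API.
# Maps pre-refactor parameter names to the names introduced in this PR.
RENAMES = {
    "mysql": {"remote_schema": "dbname"},
    "mongo": {"coll": "collection", "db": "database"},
}


def migrate_params(handler, params):
    """Return a copy of params with old keys renamed for the given handler."""
    renames = RENAMES.get(handler, {})
    # Keys without a rename entry are passed through unchanged.
    return {renames.get(key, key): value for key, value in params.items()}


old_mysql = {"remote_schema": "shop", "host": "localhost"}
new_mysql = migrate_params("mysql", old_mysql)
```

Handlers without an entry in `RENAMES` pass through untouched, so the shim can be applied to every mount invocation.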

Lots of API changes to the DataSource class and its children:

  • Table options are now a separate parameter that is passed to the
  • Introspection now returns a dictionary of tables and proposed table options OR error classes for tables that we weren't able to introspect (allowing for partial failures)
  • Mounting can now return a list of mount errors (caller can choose to ignore).
  • CSV data source: allow passing a partially initialized list of table options without a schema, making it introspect just those S3 keys and fill out the missing table options.
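To illustrate the partial-failure idea, here is a rough sketch of how a caller might consume an introspection result that mixes successes and errors. The type aliases, the `IntrospectionError` class, and `split_result` are all assumptions made for illustration; the actual types in the `DataSource` API may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical shapes, chosen for illustration only.
TableSchema = List[Tuple[str, str]]  # (column name, column type)
TableOptions = Dict[str, str]


@dataclass
class IntrospectionError:
    """Stand-in for whatever error class the data source returns."""
    table_name: str
    error: str


IntrospectionResult = Dict[
    str, Union[Tuple[TableSchema, TableOptions], IntrospectionError]
]


def split_result(result: IntrospectionResult):
    """Separate successfully introspected tables from failed ones."""
    ok = {}
    failed = []
    for name, value in result.items():
        if isinstance(value, IntrospectionError):
            failed.append(value)
        else:
            ok[name] = value
    return ok, failed


result = {
    "fruits": ([("id", "integer"), ("name", "text")], {"s3_object": "fruits.csv"}),
    "broken": IntrospectionError("broken", "could not decode file"),
}
ok, failed = split_result(result)
```

The caller can then mount the tables in `ok` and decide separately whether the entries in `failed` are fatal.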

Miscellaneous:

  • Postgres-level notices are now available in the PsycopgEngine.notices list after a run_sql invocation.
  • Multicorn: fix bug where server-level FDW options would override table-level FDW options.
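The intended precedence after the Multicorn fix can be pictured as a dictionary merge in which table-level options win. The function below is a hypothetical sketch of that rule, not the actual Multicorn code.

```python
def effective_options(server_options, table_options):
    """Merge FDW options so that table-level values take precedence.

    Hypothetical helper illustrating the precedence rule only.
    """
    merged = dict(server_options)   # start from server-level defaults
    merged.update(table_options)    # table-level options override them
    return merged


server = {"delimiter": ",", "bucket": "data"}
table = {"delimiter": ";"}
options = effective_options(server, table)
```

The bug described above corresponds to applying the two updates in the opposite order, which would have kept the server-level delimiter.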

mildbyte added 16 commits April 12, 2021 10:56
…l fdw_params (breaking change):

  * mysql: remote_schema -> dbname (MySQL uses `dbname`)
  * mongo: coll -> collection, db -> database
…nt invocations are unchanged) to the data source class:

  * Factor the table params out into a separate toplevel parameter, with its own JSONSchema (parameters for each table, e.g. index for ElasticSearch)
    * As a compat layer with sgr mount / existing invocations, look for `tables` in the current data source params and hoist it up into table params at init time
  * `TableInfo` type, besides a list of strings (table names), can now be a dict of table name -> (table schema, table params) instead of just table schema. This is accepted throughout various data source methods (mount, introspect, load, sync etc)
  * Introspection now returns the table schema and the table params (this is a TODO -- need to actually scrape the table params from the FDW options of the mounted tables)
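The compatibility layer described above, which hoists `tables` out of the data source params, might look roughly like this. `hoist_tables` is a hypothetical function name, and the sketch assumes that explicitly passed table params take priority over a legacy `tables` key.

```python
def hoist_tables(params, tables=None):
    """Move a legacy `tables` key out of params into the table params.

    Hypothetical sketch: returns (params without `tables`, table params).
    """
    params = dict(params)               # avoid mutating the caller's dict
    legacy = params.pop("tables", None)
    if tables is None:
        # Assumption: explicit table params win over the legacy key.
        tables = legacy
    return params, tables


legacy_params = {"host": "localhost", "tables": {"orders": {"index": "orders-v1"}}}
params, tables = hoist_tables(legacy_params)
```

Existing sgr mount invocations that pass `tables` inside the params would keep working, while new callers pass table params separately.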
…instead of lexicographical in which 0.0.9 > 0.0.10)
…truct, unpack it and set the correct CSV parsing options (delimiter, quotechar)
…options without a table schema (just the options like the S3 object key, not the table schema). This will make it introspect just those keys and fill out the missing table options (e.g. can pass the encoding and it'll infer the rest). Add initial support for passing these table options to `IMPORT FOREIGN SCHEMA` (as a JSON option).
  * CSV data source: pass errors introspecting / scanning files back to the Python process over the PG notice mechanism
  * Mount/Preview data source methods: return a better defined error struct (instead of an ad hoc string)
  * Clean up some types
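One of the commits above mentions replacing a lexicographical version comparison, under which 0.0.9 > 0.0.10. A minimal sketch of the difference, assuming purely numeric dot-separated versions (`version_key` is a hypothetical helper, not the code from the commit):

```python
def version_key(version):
    """Turn '0.0.10' into (0, 0, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Plain string comparison goes character by character, so "9" beats "1".
lexicographic = "0.0.9" > "0.0.10"

# Comparing integer tuples gives the expected ordering.
numeric = version_key("0.0.9") < version_key("0.0.10")

latest = max(["0.0.9", "0.0.10", "0.0.2"], key=version_key)
```

Versions with pre-release suffixes would need a proper version parser rather than this helper.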
@mildbyte mildbyte merged commit e95f353 into master Apr 14, 2021
@mildbyte mildbyte deleted the feature/data-sources-refactor branch April 14, 2021 14:11
mildbyte added a commit that referenced this pull request Apr 14, 2021
  * Add customizable fetch size to the Snowflake data source (#434)
  * Fix issue with changing the engine password (#437)
  * Data source refactor (#438):
    * MySQL: parameter `remote_schema` has been renamed to `dbname`
    * Mongo: parameter `coll` has been renamed to `collection`; `db` to `database`
    * Table options are now a separate parameter that is passed to the
    * Introspection now returns a dictionary of tables and proposed table options OR error classes for tables that we weren't able to introspect (allowing for partial failures)
    * Mounting can now return a list of mount errors (caller can choose to ignore).
    * CSV data source: allow passing a partially initialized list of table options without a schema, making it introspect just those S3 keys and fill out the missing table options.
  * Postgres-level notices are now available in the `PsycopgEngine.notices` list after a `run_sql` invocation.
  * Multicorn: fix bug where server-level FDW options would override table-level FDW options.

Full set of changes: [`v0.2.12...v0.2.13`](v0.2.12...v0.2.13)