smartnoise-sql support for BigQuery, Redshift and Snowflake engines #430
What do you think about adding support for common cloud data warehouse engines to smartnoise-sql?

Cloud providers have made it very easy to set up data warehouses for data analysis. Organizations of any size can now separate low-latency, production-focused databases from low-cost, analytics-focused data warehouses. When choosing a provider for a data warehouse, Postgres is often not the first choice. Extending the …

This brings up a couple of questions that I'd love to hear your thoughts on: Would DP in a data warehouse be a frequent use case? Is adding such support worth it? Are there any technical limitations preventing …
Replies: 1 comment 1 reply
We actually use the SparkSQL connector against data warehouses with petabytes of data. I don't think the SQLAlchemy rewriter would be necessary to support BigQuery or Snowflake. The SQLAlchemy rewriter will be useful for engines that don't support SQL-92, and …

For engines that support SQL-92 and higher, the process is pretty quick:
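To give a feel for why an engine that already speaks standard SQL needs little glue, here is a minimal sketch of a thin reader over a DB-API 2.0 connection. It is an illustration under stated assumptions, not the snsql reader interface: the class name `DbApiReader`, its `execute` signature, and the convention of returning the column names as the first row are all hypothetical here, and `sqlite3` only stands in for a warehouse driver so the example runs locally.

```python
import sqlite3


class DbApiReader:
    """Hypothetical thin reader that wraps any DB-API 2.0 connection."""

    def __init__(self, conn):
        self.conn = conn

    def execute(self, query):
        """Run a query and return rows as tuples, with column names first."""
        cur = self.conn.cursor()
        try:
            cur.execute(query)
            if cur.description is None:
                # Statement produced no result set (e.g. DDL).
                return []
            header = tuple(col[0] for col in cur.description)
            return [header] + [tuple(row) for row in cur.fetchall()]
        finally:
            cur.close()


# Stand-in engine: an in-memory SQLite database with a tiny made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, sex TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(30, "F"), (41, "M"), (25, "F")],
)

reader = DbApiReader(conn)
rows = reader.execute(
    "SELECT sex, COUNT(*) AS n FROM people GROUP BY sex ORDER BY sex"
)
print(rows)
```

Swapping in a real warehouse would mean passing that engine's own DB-API connection instead of the SQLite one; any dialect-specific handling would sit on top of a wrapper like this.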
To include support for these engines in the PyPI package, we would want to make sure each new engine is tested in all of our automated CI tests. The CI tests run in Linux Docker containers in GitHub Actions, and there are a ton of SQL tests exercising all of the supported functionality against all of the supported engines. More details at [3]. We also support local execution of CI tests via …

For the cloud data warehouse providers, this would just involve creating a test environment with all of the test tables in [3] installed. If this could be done in a Linux Docker container that can run both in GitHub Actions and locally, that would be ideal. If it requires some sort of subscription and cloud access, it gets more complicated, since we'd need GitHub Actions to manage secret keys, handle billing, and so on. All worthwhile tasks, but this would likely be the most time-consuming part of supporting these engines.

[1] https://github.com/opendp/smartnoise-sdk/blob/main/sql/snsql/sql/reader/base.py
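One common way to keep cloud-backed tests from breaking local runs is to skip them whenever credentials are absent. The sketch below shows that pattern with the standard library only. The environment variable name `SNSQL_TEST_BIGQUERY_CREDENTIALS`, the helper `engine_configured`, and the test class are hypothetical and not part of the existing test suite; in GitHub Actions, a repository secret would be mapped to such a variable in the workflow file.

```python
import os
import unittest

# Hypothetical variable name; a CI workflow would map a repository secret to it.
CREDENTIALS_VAR = "SNSQL_TEST_BIGQUERY_CREDENTIALS"


def engine_configured(env_var, environ=None):
    """Return True when the named credential variable is set and non-empty."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(env_var, "").strip())


class CloudEngineTests(unittest.TestCase):
    @unittest.skipUnless(
        engine_configured(CREDENTIALS_VAR),
        "cloud engine credentials not configured",
    )
    def test_can_reach_engine(self):
        # Placeholder: a real test would connect to the warehouse and
        # query the installed test tables.
        self.assertTrue(engine_configured(CREDENTIALS_VAR))
```

With this shape, contributors without a cloud subscription still get a green local run, while CI jobs that have the secret exercise the cloud engine.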
Yes, data warehouses are a primary scenario that smartnoise-sql was designed for. Some nice properties of data warehouses with differential privacy: