Cloud Embedded Storage layer #369
Comments
For Parquet creation/reading, this one looks like the one that's best maintained.
For DuckDB, this one is the official TypeScript wrapper around the Node.js client. I see DuckDB supports writing to Parquet directly.
I'm reading about
-- connect to the Postgres instance with the given parameters in read-only mode
ATTACH 'dbname=postgres user=postgres host=127.0.0.1' AS db (TYPE POSTGRES, READ_ONLY);
COPY ⟨table_name⟩ TO 's3://bucket/file.parquet';
The problem with this approach is that DuckDB doesn't have all the connectors we support.
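To make the ATTACH/COPY idea a bit more concrete, here is a minimal sketch of how those two statements could be generated from application code. Everything in it is an assumption for illustration: `ExportParams`, `buildExportStatements` and the two quoting helpers are hypothetical names, not part of DuckDB or Latitude, and it only builds the SQL strings without running them.

```typescript
// Hypothetical input shape; none of these names come from DuckDB or Latitude.
interface ExportParams {
  postgresDsn: string; // e.g. 'dbname=postgres user=postgres host=127.0.0.1'
  table: string; // table name inside the attached Postgres database
  s3Url: string; // e.g. 's3://bucket/file.parquet'
}

// Escape a value for use inside a single-quoted SQL string literal.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Escape a name for use as a double-quoted SQL identifier.
function sqlIdentifier(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build the two statements: attach Postgres read-only, then copy one table to Parquet.
function buildExportStatements(params: ExportParams): string[] {
  return [
    `ATTACH ${sqlString(params.postgresDsn)} AS db (TYPE POSTGRES, READ_ONLY);`,
    `COPY (SELECT * FROM db.${sqlIdentifier(params.table)}) TO ${sqlString(params.s3Url)} (FORMAT PARQUET);`,
  ];
}
```

Running these against a real bucket would also require DuckDB's httpfs extension and S3 credentials to be configured, which this sketch leaves out.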
Questions:
|
This project could be interesting for connecting to sources that support ODBC connections, like Snowflake: https://github.com/rupurt/odbc-scanner-duckdb-extension
The first version of this was released last week. In the end we called it Materialized Queries.
What is this?
We want to let users replicate the data from their databases (sources) into Parquet files saved to an S3 bucket. All subqueries are then run with a DuckDB client.
The benefit is that users' databases receive less load, and Parquet/DuckDB are pretty fast at loading queries.
How it works.
Users define a query as an embedded layer by setting some parameters (TO BE DEFINED) in the query's config. This configuration tells our embedding layer
After this config is in place and the query is stored in the embedded storage layer, users can reference this query as they would if it were run against their database. Under the hood, the Latitude app will fetch this data from S3.
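As a rough illustration of the read path, the sketch below maps a materialized query to its Parquet file and builds the DuckDB SQL that a referencing query could run. The config shape, the file-naming convention and the function names are all assumptions (the real parameters are still to be defined); only `read_parquet` is an actual DuckDB function.

```typescript
// Hypothetical config shape; the real parameters are still to be defined.
interface MaterializedQueryConfig {
  queryPath: string; // path of the query inside the project, e.g. 'sales/monthly'
  bucket: string; // S3 bucket that holds the materialized Parquet files
}

// Escape a value for use inside a single-quoted SQL string literal.
function quoteLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Assumed convention: one Parquet file per materialized query, keyed by its path.
function parquetUrl(config: MaterializedQueryConfig): string {
  return `s3://${config.bucket}/${config.queryPath}.parquet`;
}

// SQL that DuckDB would run instead of hitting the user's database.
function buildReadSql(config: MaterializedQueryConfig): string {
  return `SELECT * FROM read_parquet(${quoteLiteral(parquetUrl(config))})`;
}
```

For example, a query at `sales/monthly` in bucket `my-bucket` would be read from `s3://my-bucket/sales/monthly.parquet` under this naming assumption.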
Considerations
Using the embedded layer comes with some considerations.
a) All queries referencing this query have to use DuckDB SQL syntax, not the SQL syntax of the user's DB.
b) We need to provide metadata in the queries saying when this query was last updated from the original source.
c) How do we check for periodic updates? I think we need some kind of cron job that runs and checks what needs to be stored in Parquet and what needs to be refreshed. I think this system has to be a piece apart from the current Latitude server that runs the queries. If we want to go down this path, we need to access that
queries<-from-app<--from-workspace
somehow from that service. We should start storing the apps reference. TODO
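Points b) and c) could be sketched together as follows: each materialized query carries a last-updated timestamp, and the periodic job uses it to decide what to refresh. This is only a sketch under assumptions; `MaterializationMetadata`, `ttlSeconds`, `needsRefresh` and `selectStale` are hypothetical names, and the refresh policy (a fixed interval per query) is one possible choice, not something decided in this issue.

```typescript
// Hypothetical metadata stored alongside each materialized query.
interface MaterializationMetadata {
  queryPath: string;
  lastMaterializedAt: Date | null; // null when the query was never materialized
  ttlSeconds: number; // assumed refresh interval for this query
}

// A query needs a refresh if it was never materialized or its data is older than the TTL.
function needsRefresh(meta: MaterializationMetadata, now: Date): boolean {
  if (meta.lastMaterializedAt === null) return true;
  const ageMs = now.getTime() - meta.lastMaterializedAt.getTime();
  return ageMs >= meta.ttlSeconds * 1000;
}

// What a cron-style job could call on each run to pick the queries to re-export.
function selectStale(all: MaterializationMetadata[], now: Date): string[] {
  return all.filter((meta) => needsRefresh(meta, now)).map((meta) => meta.queryPath);
}
```

The same `lastMaterializedAt` value could be returned with query results so users can see how fresh the data is.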