Feature request: Add support for using Parquet as external storage #20
Comments
Hey AJ-- I think @matsonj just used I was just going to take a pass over this repo to do some updates for DuckDB 0.7.x-- is there some reason |
Hi, @jwills. Re:
No reason I know of. I'm totally happy to recommend that model - As we think towards where to invest future efforts, and where to direct community members who want to interop with Spark and/or data lakes, I think target-duckdb might ultimately be a better layer for "table-like" operations in data lake paradigms. I'm not sure how the There's no rush on this, by the way. I just wanted to start this thread to see if what I'm thinking of would make sense. |
Yeah, your reasoning there re: upsert operations makes sense and is valid IMO. I'm going to turn my attention back to this project next week once I get some |
Thanks for this validation!
Sounds great. Again, no rush from our side. Nothing per se is broken as of now, and this is more of a long-term strategic investment, I think. I'll close this issue since the question is answered. Thanks again, and let us know if we can help in any way. |
Reopening (with an updated title) because I've been hearing a lot of interest in this. I've updated the description to be more direct in terms of what I think the next steps may be. cc @kgpayne |
okay, cool-- as you can tell I've done ~ nothing to move this forward; do you want to chat about it somewhere? Meltano Slack? |
@jwills - great idea! I created a new channel for this: (Join link for anyone not already in our slack: https://meltano.com/slack) |
Looks like @kgpayne has an implementation POC using external storage here: |
Any updates on this? |
@ReneTC I think the move is to use a virtualenv-type solution to align your duckdb, dbt-duckdb, and target-duckdb versions together; I'd recommend:
duckdb==0.8.1
dbt-duckdb==1.5.2
target-duckdb==0.6.0
...but I'm on vacation for a couple of weeks and haven't tried them in combination yet. |
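For anyone who wants to check whether a given virtualenv actually matches the suggested pins (duckdb 0.8.1, dbt-duckdb 1.5.2, target-duckdb 0.6.0), here is a minimal sketch using only the standard library. The `PINS` dict and the `find_mismatches` helper are hypothetical names made up for illustration; they are not part of any of these packages. Run it inside whichever environment you want to inspect.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions suggested in this thread (untested in combination at the time);
# adjust if you settle on different ones.
PINS = {
    "duckdb": "0.8.1",
    "dbt-duckdb": "1.5.2",
    "target-duckdb": "0.6.0",
}


def find_mismatches(pins, get_version=version):
    """Return {package: (expected, installed_or_None)} for each unmet pin."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Prints one line per package that is missing or on a different version.
    for package, (expected, installed) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {installed}")
```

An empty result means the environment matches all three pins. Note that Meltano typically installs each plugin into its own virtualenv, so the check may need to be run once per plugin environment.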
Thanks I'll test this tomorrow and report back.
Edit: confirming that a Meltano install with whatever tap and
```
loaders:
- name: target-duckdb
  variant: jwills
  pip_url: target-duckdb==0.6.0
transformers:
- name: dbt-duckdb
  variant: jwills
  pip_url: dbt-core~=1.5.0 dbt-duckdb==1.5.2
```
works. Thank you!
On Thu, 20 Jul 2023, 20.19 Josh Wills wrote:
> @ReneTC I think the move is to use a virtualenv-type solution to align your duckdb, dbt-duckdb, and target-duckdb versions together; I'd recommend:
> duckdb==0.8.1
> dbt-duckdb==1.5.2
> target-duckdb==0.6.0
> ...but I'm on vacation for a couple of weeks and haven't tried them in combination yet.
|
Updated issue description (2023-03-30):
There are some great use cases where we'd love to use `target-duckdb` as an interop layer to write Parquet files.
Today, users sometimes create data flows where they first use `target-parquet` and then transform with `dbt-duckdb`, whereas a more streamlined approach would be to let `target-duckdb` and `dbt-duckdb` both operate on the same Parquet-based datastore.
From the comment thread below in #20 (comment):
Original question
We have some users interested in storing data within Parquet. Can this target be used in combination with DuckDB's support for Parquet datasets?
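As background for the question, here is a rough sketch of DuckDB's own Parquet support, which is what such a feature would presumably build on: `COPY ... TO ... (FORMAT PARQUET)` writes a table out to a Parquet file, and `read_parquet(...)` reads it back. This is not how `target-duckdb` works today and does not reflect any proposed implementation; the table, view, and file names (`orders`, `orders_ext`, `orders.parquet`) and the helper functions are all hypothetical.

```python
def export_sql(table: str, path: str) -> str:
    """Build a DuckDB COPY statement that writes `table` to a Parquet file."""
    return f"COPY {table} TO '{path}' (FORMAT PARQUET)"


def view_sql(view: str, path: str) -> str:
    """Build a statement exposing a Parquet file as a queryable DuckDB view."""
    return (
        f"CREATE OR REPLACE VIEW {view} AS "
        f"SELECT * FROM read_parquet('{path}')"
    )


def demo(path: str = "orders.parquet"):
    """Round-trip a tiny table through Parquet using an in-memory database."""
    import duckdb  # third-party package; imported lazily so the helpers work without it

    con = duckdb.connect()  # no file argument means an in-memory database
    con.execute("CREATE TABLE orders AS SELECT 1 AS id, 'widget' AS item")
    con.execute(export_sql("orders", path))      # write the table to Parquet
    con.execute(view_sql("orders_ext", path))    # read it back through a view
    return con.execute("SELECT id, item FROM orders_ext").fetchall()


if __name__ == "__main__":
    print(demo())
```

The helpers interpolate names directly into SQL for brevity, so they are only suitable for trusted, hard-coded identifiers and paths.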