[Core feature] Add support for duckdb with structured dataset #2865

Closed
2 tasks done
pingsutw opened this issue Sep 12, 2022 · 6 comments
Labels
flytekit FlyteKit Python related issue good first issue Good for newcomers scipy-2023

Comments

@pingsutw
Member

pingsutw commented Sep 12, 2022

Motivation: Why do you think this is important?

DuckDB is an in-process ("serverless") analytical database. We could integrate it with Flyte and transparently write task outputs, such as pandas.DataFrame or pyarrow.Table, to DuckDB.

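import pandas as pd
from typing import Annotated
from flytekit import kwtypes, task
from flytekit.types.structured import StructuredDataset

@task
def t1() -> Annotated[StructuredDataset, kwtypes(name=str, age=int)]:
    df = pd.DataFrame({"name": ["dylan", "steve"], "age": [33, 32]})
    return StructuredDataset(df, uri=duckdb_uri)  # duckdb_uri is a placeholder; flytekit would write the pandas DataFrame to DuckDB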

Goal: What should the final outcome look like, ideally?

Add a DuckDB plugin to flytekit that provides a structured dataset encoder and decoder.
Here is an example of adding a custom encoder/decoder in a flytekit plugin
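
To make the encoder/decoder idea concrete, here is a minimal, standard-library-only sketch of the general pattern: handlers are registered per URI protocol, the encoder writes the data to the store behind the URI, and the decoder reads it back. This is not the flytekit API. `DatasetRef`, `ENCODERS`, `DECODERS`, `write`, and `read` are hypothetical names, and `sqlite3` is swapped in for DuckDB only so the sketch runs without extra dependencies. The `sqlite://<path>#<table>` URI layout is also an assumption made for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple
from urllib.parse import urlparse

# (name, age) records standing in for a DataFrame.
Rows = List[Tuple[str, int]]


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in for the reference a task passes downstream."""
    uri: str


# Hypothetical registries keyed by URI protocol (not flytekit API).
ENCODERS: Dict[str, Callable[[Rows, str], DatasetRef]] = {}
DECODERS: Dict[str, Callable[[DatasetRef], Rows]] = {}


def _target(uri: str) -> Tuple[str, str]:
    """Split 'sqlite:///path/to.db#table' into (db_path, table_name)."""
    parsed = urlparse(uri)
    return parsed.path, parsed.fragment


def encode_sqlite(rows: Rows, uri: str) -> DatasetRef:
    """Write rows to the table named in the URI and return a reference to it."""
    path, table = _target(uri)
    conn = sqlite3.connect(path)
    try:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (name TEXT, age INTEGER)')
        conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
        conn.commit()
    finally:
        conn.close()
    return DatasetRef(uri)


def decode_sqlite(ref: DatasetRef) -> Rows:
    """Read back every row from the table the reference points to."""
    path, table = _target(ref.uri)
    conn = sqlite3.connect(path)
    try:
        return conn.execute(f'SELECT name, age FROM "{table}" ORDER BY rowid').fetchall()
    finally:
        conn.close()


ENCODERS["sqlite"] = encode_sqlite
DECODERS["sqlite"] = decode_sqlite


def write(rows: Rows, uri: str) -> DatasetRef:
    """Dispatch to the encoder registered for the URI's protocol."""
    return ENCODERS[urlparse(uri).scheme](rows, uri)


def read(ref: DatasetRef) -> Rows:
    """Dispatch to the decoder registered for the reference's protocol."""
    return DECODERS[urlparse(ref.uri).scheme](ref)
```

A real plugin would presumably register a DuckDB handler under its own protocol in the same way, so that user code only needs to supply a URI, as in the task example above.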

Describe alternatives you've considered

No response

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@pingsutw pingsutw added good first issue Good for newcomers flytekit FlyteKit Python related issue labels Sep 12, 2022
@pingsutw pingsutw added this to the 1.3.0 milestone Sep 12, 2022
@rajagurunath

Hi team, I'm new to Flyte. May I take this task and try integrating flytekit with DuckDB (if no one else is working on this issue)?

Thanks in advance!

@samhita-alla
Contributor

@rajagurunath, please go ahead! Do you want me to assign the issue to you?

@rajagurunath

Thanks a lot, @samhita-alla, please assign this issue to me!

@samhita-alla
Contributor

samhita-alla commented Oct 12, 2022

@rajagurunath, hey, I'm going to unassign you as my team will work on this issue. I hope you haven't started working on it. Sorry! Please feel free to leave a comment on the other hacktoberfest-labeled issues we have: #2917.

@shivaylamba

Hi @samhita-alla, I would like to take up the issue.

@rajagurunath

Sure, no problem, @samhita-alla, thanks. I had started an initial exploration of the code base and of saving a DataFrame to DuckDB, but I will have a look at the list and pick another issue.

@shivaylamba, please let me know if you need any help from my side!

6 participants