Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for writing additional file types beyond parquet #101

Open
boetro opened this issue Mar 16, 2023 · 8 comments
Open

Add support for writing additional file types beyond parquet #101

boetro opened this issue Mar 16, 2023 · 8 comments
Labels
enhancement New feature or request io Any issue related to IO

Comments

@boetro
Copy link
Collaborator

boetro commented Mar 16, 2023

Some more file types: JSON, Avro, CSV

@boetro boetro added io Any issue related to IO enhancement New feature or request labels Mar 16, 2023
@aeroaks
Copy link
Contributor

aeroaks commented May 4, 2023

Can you point to where I should start looking to explore this issue?

@boetro
Copy link
Collaborator Author

boetro commented May 4, 2023

Definitely!

All the file io is in this file: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py

The enum here has the list of file types we support: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py#L18

And in the _write method defines each file format is written: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io.py#L50

You should be able to mimic the tests here with any additional file types also: https://github.com/launchflow/buildflow/blob/main/buildflow/runtime/ray_io/file_io_test.py

If you have any other questions let me know!

@aeroaks
Copy link
Contributor

aeroaks commented May 8, 2023

Thanks, I an trying to test my changes and found that the existing tests are using unittest module. Is their any specific reason to use this in place of pytest?
And also, are there any specific commands to run all the existing tests ?

@boetro
Copy link
Collaborator Author

boetro commented May 8, 2023

We mostly just did that from preference. Previous it was nice to use to setup ray specifics in the setUpClass methods, but we moved that to a pytest fixture, but it should play well with pytest I believe.

My typical workflow for a fresh install is (running from the root directory):

# install all dev deps
pip install .[dev]
pytest

@aeroaks
Copy link
Contributor

aeroaks commented May 10, 2023

Thanks, I got the tests working. I have send a pull request #137 . But the checks are not running. And I was not able to add reviewer in there.

@boetro
Copy link
Collaborator Author

boetro commented May 10, 2023

Hmm interesting probably an issue with our permission set up. Let me try and update those

@boetro
Copy link
Collaborator Author

boetro commented May 10, 2023

Alright turns out it was an issue with when the workflow runs. I updated it in #138 to run on PRs. So I think if you pull down the latest changes it should work.

(also added a CODEOWNERS to auto assign reviews)

@aeroaks
Copy link
Contributor

aeroaks commented May 12, 2023

With #142 , filesink should now supports CSV and JSON

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request io Any issue related to IO
Projects
None yet
Development

No branches or pull requests

2 participants