FIX-#4636: allows read_parquet to detect column partitioning in non-local filesystems #5192

Merged: 4 commits, Nov 21, 2022

@billiam-wang commented Nov 4, 2022

Signed-off-by: Bill Wang <billiam@ponder.io>

What do these changes do?

Adds support for read_parquet on column-partitioned Parquet files stored on non-local filesystems. Includes tests against Parquet files on S3.
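
For readers unfamiliar with the idea, here is a minimal sketch of what detecting column partitioning by walking a directory tree can look like. It is not the code from this PR: it assumes Hive-style `name=value` directory names, the helper `detect_partition_columns` is a hypothetical name, and it uses a local temporary directory with os.walk so it runs anywhere. On a non-local filesystem, a filesystem object's own walk method would presumably play the same role.

```python
import os
import tempfile


def detect_partition_columns(root):
    """Collect column names from Hive-style `name=value` directory names.

    Hypothetical helper for illustration only; not part of Modin.
    """
    columns = []
    for _dirpath, dirnames, _filenames in os.walk(root):
        for dirname in sorted(dirnames):
            name, sep, _value = dirname.partition("=")
            # Only directories containing "=" are treated as partitions.
            if sep and name not in columns:
                columns.append(name)
    return columns


# Build a tiny fake dataset layout: <root>/year=2022/month=11/part-0.parquet
with tempfile.TemporaryDirectory() as root:
    leaf = os.path.join(root, "year=2022", "month=11")
    os.makedirs(leaf)
    open(os.path.join(leaf, "part-0.parquet"), "wb").close()
    partition_columns = detect_partition_columns(root)
```

The walk is top-down, so outer partition levels are recorded before inner ones.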

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: read_parquet can't detect column partitioning in non-local filesystems #4636
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

    path_generator = os.walk(path)
else:
    storage_options = kwargs.get("storage_options")
    fs, fs_path = url_to_fs(path, **storage_options)

storage_options can be None, and unpacking None with ** raises a TypeError, so this construction breaks several tests in CI.

Done.
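
The thread does not show how the issue was resolved, so the following is only a sketch of one common way to guard against a None value before keyword unpacking. `url_to_fs_stub` is a hypothetical stand-in that echoes its arguments (it is not the real `url_to_fs`), and `resolve_fs` and `resolve_fs_unguarded` are illustrative names, not functions from the PR.

```python
def url_to_fs_stub(path, **storage_options):
    """Hypothetical stand-in for url_to_fs; simply echoes its inputs."""
    return storage_options, path


def resolve_fs_unguarded(path, **kwargs):
    # Mirrors the reviewed lines: if storage_options is missing or None,
    # `**storage_options` raises TypeError (None is not a mapping).
    storage_options = kwargs.get("storage_options")
    return url_to_fs_stub(path, **storage_options)


def resolve_fs(path, **kwargs):
    # Fall back to an empty dict so the unpacking is always valid.
    storage_options = kwargs.get("storage_options") or {}
    return url_to_fs_stub(path, **storage_options)
```

Using `or {}` rather than a default in `kwargs.get` also covers the case where the caller passes `storage_options=None` explicitly.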

…ning in non-local filesystems

Signed-off-by: Bill Wang <billiam@ponder.io>
Signed-off-by: Bill Wang <billiam@ponder.io>
…ndas version

Signed-off-by: Bill Wang <billiam@ponder.io>
Signed-off-by: Bill Wang <billiam@ponder.io>
@mvashishtha left a comment

Thanks @Billy2551 for the fix!

@mvashishtha merged commit 073dffc into modin-project:master on Nov 21, 2022
dchigarev pushed a commit that referenced this pull request Nov 25, 2022
…-local filesystems (#5192)

Signed-off-by: Bill Wang <billiam@ponder.io>