Decompression of sas7bdat.bz2 file is not distributed across worker nodes #55

Open
yivanova88 opened this issue Mar 17, 2020 · 1 comment

Comments

@yivanova88

Hello,

I have been experimenting with the bz2 decompression functionality in the repo's master branch, which isn't part of your last release. When a bz2-compressed file is read, the decompression seems to happen on a single worker node only. Is it possible to parallelise the decompression of externally compressed files?

Thanks in advance for your response.

@yivanova88 yivanova88 changed the title Decompression on sas7bdat.bz2 file is not distributed across worker nodes Decompression of sas7bdat.bz2 file is not distributed across worker nodes Mar 17, 2020
@saurfang
Owner

Based on #50, this seems to be expected. bz2 is indeed splittable, but we need to seek to page boundaries within SAS files. The easiest workaround is probably to decompress and parse as separate steps; both should be parallelizable.
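
A minimal sketch of the "decompress first" half of that workaround, in Python. The helper name `decompress_bz2` and the file names are hypothetical and only for illustration. It stream-decompresses one `.bz2` file to a plain `.sas7bdat` file using the standard library, so the uncompressed file can then be read in a separate step. This handles a single file; how to spread the work across several files or nodes is left open here.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Stream-decompress the bz2 file `src` into `dst`.

    Hypothetical helper: copies in chunks so the whole file never has
    to fit in memory. Returns the number of decompressed bytes written.
    """
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return Path(dst).stat().st_size


# Assumed follow-up step (not executed here): read the uncompressed file
# with the package's data source, for example from PySpark:
#   df = spark.read.format("com.github.saurfang.sas.spark").load("data.sas7bdat")
```

Once the file is stored uncompressed, the parse step no longer depends on the bz2 codec, which is the separation suggested above.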

2 participants