Concurrently read and write files with parquet4s-fs2 #285
Comments
Hi @flipp5b!
Hi @mjakubowski84, Thank you for the great library! Sure, I'll prepare the PR.
@mjakubowski84, I'd like to add a test to reproduce the bug, but it looks like the issue is specific to a distributed file system. So I wonder if it's acceptable to add a new test suite with testcontainers running a custom single-node Hadoop cluster image?
Sure! Please do!
@flipp5b Thanks for the contribution! Would you like to add a fix also to core & akka so that the bug fix release solves the problem completely?
Sure, I'll do that tomorrow.
Thanks, I appreciate it!
@flipp5b Your fixes are released as v2.8.0. Thank you for your contribution!
@mjakubowski84, it was a pleasure! Thank you for the instant feedback and the bugfix release!
I face an issue when I try to concurrently read and write files with parquet4s-fs2.
A simplified reproducer looks as follows:
It fails with a `java.nio.channels.ClosedChannelException`. I've added some tracing messages and formatted the output for the sake of readability:
parquet4s-fs2 calls `path.getFileSystem(conf)` here and there and wraps the resulting filesystem with a `cats.effect.Resource` so the filesystem is closed when the resource is released. But the issue is that by default, `path.getFileSystem(conf)` uses caching, so the resource may potentially close the filesystem that is used by someone else.
In the trace above, we can see that the writer prepares to write the file, gets a filesystem, opens `DFSOutputStream`, etc. But then the reader calls `findPartitionedPaths`, which gets the same filesystem and closes it along with the linked `DFSOutputStream`.
It's possible to disable the filesystem cache using `fs.hdfs.impl.disable.cache`, and this solves the problem. But this may also lead to a `FileSystem` leak: parquet-mr calls `path.getFileSystem(conf)` in some places, and it looks like it doesn't close the received `FileSystem` objects.
Could you please advise the proper way to solve the issue? Am I doing something wrong, or is it probably better not to close file systems inside parquet4s-fs2?
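For readers who want to see the failure mode in isolation, here is a minimal, dependency-free sketch of it. It does not use Hadoop or parquet4s: `ToyFileSystem`, `ToyCache`, `withFileSystem`, and `writeWhileReaderRuns` are hypothetical names made up for illustration, and the toy cache is only a rough model of a cached lookup (for example, it never evicts closed handles). The point is just that when two users get the same cached handle and one of them closes it on release, the other one fails.

```scala
import java.nio.channels.ClosedChannelException
import scala.collection.mutable
import scala.util.Try

// Toy stand-in for a filesystem handle (hypothetical, NOT Hadoop's FileSystem).
final class ToyFileSystem(val scheme: String) {
  private var closed = false

  // Writing through a closed handle fails, like a closed output channel would.
  def write(data: String): Int = {
    if (closed) throw new ClosedChannelException
    data.length
  }

  def close(): Unit = closed = true
}

// Toy cache keyed by scheme (hypothetical). With useCache = true every caller
// gets the same shared instance; with useCache = false each caller gets its own.
object ToyCache {
  private val cache = mutable.Map.empty[String, ToyFileSystem]

  def get(scheme: String, useCache: Boolean): ToyFileSystem =
    if (useCache) cache.getOrElseUpdate(scheme, new ToyFileSystem(scheme))
    else new ToyFileSystem(scheme)
}

object SharedHandleDemo {
  // Acquire a handle, use it, and always close it afterwards,
  // similar in spirit to releasing a resource.
  def withFileSystem[A](useCache: Boolean)(use: ToyFileSystem => A): A = {
    val fs = ToyCache.get("hdfs", useCache)
    try use(fs) finally fs.close()
  }

  // The "writer" holds a handle while a "reader" acquires and releases its own.
  def writeWhileReaderRuns(useCache: Boolean): Try[Int] = {
    val writerFs = ToyCache.get("hdfs", useCache)
    withFileSystem(useCache)(_ => ()) // the reader finishes first and closes its handle
    Try(writerFs.write("row"))
  }

  def main(args: Array[String]): Unit = {
    println(writeWhileReaderRuns(useCache = true))  // Failure(java.nio.channels.ClosedChannelException)
    println(writeWhileReaderRuns(useCache = false)) // Success(3)
  }
}
```

In the cached case the reader's release closes the very handle the writer is still using. In the uncached case the writer is unaffected, but every handle that is never closed stays open, which is the leak concern raised above.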