Skip to content

Comments

Fix InputStreamByteChunkProvider#isAvailable#850

Open
vorobii-vitalii wants to merge 5 commits intohierynomus:masterfrom
vorobii-vitalii:fix-is-available-input-stream
Open

Fix InputStreamByteChunkProvider#isAvailable#850
vorobii-vitalii wants to merge 5 commits intohierynomus:masterfrom
vorobii-vitalii:fix-is-available-input-stream

Conversation

@vorobii-vitalii
Copy link

@vorobii-vitalii vorobii-vitalii commented Dec 11, 2024

Hello!

There is issue in this function -

return super.isAvailable() || (is != null && is.available() > 0);
.
According to docs - https://docs.oracle.com/javase/8/docs/api/java/io/BufferedInputStream.html. BufferedInputStream#available returns only estimate of the number of bytes that can be read.
And one of implementations of InputStream we rely on actually always returns zero before reading.
Please review my approach. I've tested it and it works as expected

@vorobii-vitalii
Copy link
Author

@hierynomus Any news??

@laeubi
Copy link
Contributor

laeubi commented Mar 19, 2025

@vorobii-vitalii As you noticed using the isAvailable for anything really is unreliable I wonder where/who it is used, maybe one better fix that place to not use isAvailable at all...

@vorobii-vitalii
Copy link
Author

Hi, @laeubi
I agree that relying on BufferedInputStream#available is a bad idea, since it returns estimate.
However, if we want to read from some input stream in chunks and transfer chunks to Samba we still need to know whether underlying ByteChunkProvider can still provide some bytes. We shouldn't exit following loop until we are sure ByteChunkProvider won't provide any bytes.

public long write(ByteChunkProvider provider, ProgressListener progressListener) {
        int bytesWritten = 0;
        **while (provider.isAvailable()) {**
            logger.debug("Writing to {} from offset {}", this.entryName, provider.getOffset());
            SMB2WriteResponse wresp = share.write(fileId, provider);
            bytesWritten += wresp.getBytesWritten();
            if (progressListener != null)
                progressListener.onProgressChanged(wresp.getBytesWritten(), provider.getOffset());
        }
        return bytesWritten;
    }

Perhaps function name can be improved

@vorobii-vitalii
Copy link
Author

And I am not sure we can avoid this polling loop, we have to read from file/socket in chunks, because otherwise we might run into out of memory error

@laeubi
Copy link
Contributor

laeubi commented Mar 24, 2025

I agree that relying on BufferedInputStream#available is a bad idea, since it returns estimate.

It does not, per javadoc it might return an estimate it is also legal to just return 0 and this is the case you are encountering here.

So this PR actually makes isAviable to something like "wasEOF read" what already is part of the API by returning -1 ...

And I am not sure we can avoid this polling loop, we have to read from file/socket in chunks, because otherwise we might run into out of memory error

The provider need to know if there is more to write but this should not result in using available on an input stream! Instead it should work similar to how Iterator works, that is a call to hasNext() will actually perform a read of the next data and cache it somewhere or return false (in this case e.g. it has read EOF). then next() can use this cached data (what might be a small chunck) to write it / return it / whatever...

@vorobii-vitalii
Copy link
Author

I agree that relying on BufferedInputStream#available is a bad idea, since it returns estimate.

It does not, per javadoc it might return an estimate it is also legal to just return 0 and this is the case you are encountering here.

So this PR actually makes isAviable to something like "wasEOF read" what already is part of the API by returning -1 ...

And I am not sure we can avoid this polling loop, we have to read from file/socket in chunks, because otherwise we might run into out of memory error

The provider need to know if there is more to write but this should not result in using available on an input stream! Instead it should work similar to how Iterator works, that is a call to hasNext() will actually perform a read of the next data and cache it somewhere or return false (in this case e.g. it has read EOF). then next() can use this cached data (what might be a small chunck) to write it / return it / whatever...

I agree that my solution changes semantics of this function, implemented different approach

Copy link
Contributor

@laeubi laeubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a much better approach, just some style comments.

@hierynomus
Copy link
Owner

Before I merge this change in. I'd like to see a testcase that proves the original implementation wrong.

@laeubi
Copy link
Contributor

laeubi commented Mar 24, 2025

@hierynomus there is already a testcase in this PR, @vorobii-vitalii does this testcase fails without your patch?

@hierynomus
Copy link
Owner

I've seen the testcase indeed, I'm not convinced it fails without.

@vorobii-vitalii
Copy link
Author

It was test case for new component. Added new one
With previous implementation it fails:
image
With new it passes:
image

@vorobii-vitalii
Copy link
Author

@hierynomus @laeubi Hey guys! Is PR still missing some changes required for merge?

Copy link
Contributor

@laeubi laeubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me but I'm not a committer on the project :-)

private final BufferedInputStream inputStream;
private volatile int cachedByte = EOF;

public BufferedInputStreamReader(BufferedInputStream inputStream) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really requires a BufferedInputStream here? If not then a plain InputStream would make this clas more reusable

@vorobii-vitalii
Copy link
Author

Hello @hierynomus Please let me know what changes are missing in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants