Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ByteArrayIndexInput::readBytes can read beyond valid boundary of the slice #10481

Closed
parasjain1 opened this issue Oct 7, 2023 · 3 comments · Fixed by #10551
Closed
Labels
bug Something isn't working Storage Issues and PRs relating to data and metadata storage

Comments

@parasjain1
Copy link
Contributor

parasjain1 commented Oct 7, 2023

Describe the bug
When using ByteArrayIndexInput::readBytes on a sliced ByteArrayIndexInput, the position validation check seems to be incorrect and allows reading beyond the valid boundary of the slice.

This is due to incorrect validation check in ByteArrayIndexInput::readBytes() method -

if (pos + len > this.offset + length) {
    throw new EOFException("seek past EOF");
}

The above code pointer exists since a very long time. The scenario of readBytes() called on a slice was not covered in a UT and hence this may have been missed. Please let me know if this does not seem to be a bug and rather is some misunderstanding wrt how slices are supposed to work. If this looks legit, I'll raise a PR with the fix and UT.

To Reproduce
Below is a UT that can be added to ByteArrayIndexInputTests to reproduce the behaviour. The UT will fail for the current implementation of readBytes().

public void testReadBytesWithSlice() throws IOException {
        int inputLength = randomIntBetween(100, 1000);

        byte[] input = randomUnicodeOfLength(inputLength).getBytes(StandardCharsets.UTF_8);
        IndexInput indexInput = getIndexInput(input);

        int sliceOffset = randomIntBetween(1, inputLength - 10);
        int sliceLength = randomIntBetween(1, inputLength - 10 - sliceOffset);
        IndexInput slice = indexInput.slice("slice", sliceOffset, sliceLength);

        // read a byte from sliced index input and verify if the read value is correct
        assertEquals(input[sliceOffset], slice.readByte());

        // read few more bytes into a byte array
        int bytesToRead = randomIntBetween(1, sliceLength - 1);
        slice.readBytes(new byte[bytesToRead], 0, bytesToRead);

        // now try to read beyond the boundary of the slice, but within the
        // boundary of the original IndexInput. We've already read few bytes
        // so this is expected to fail
        assertThrows(EOFException.class, () -> slice.readBytes(new byte[sliceLength], 0, sliceLength));
    }

Expected behavior
ByteArrayIndexInput::readBytes should not allow reads beyond the boundary of the slice and throw an EOFException.
Proposal is to fix the validation check as below -

if (pos + len > length) {
    throw new EOFException("seek past EOF");
}

Additional context
Attaching a manual dry-run example below -
We consider a basic scenario where we have a slice of length 4 and per the current implementation can call readBytes with len = 5 without raising an EOFException.

        /**
         * --- Original IndexInput of length 10 ---
         * 0 1 2 3 4 5 6 7 8 9 10
         *       ^     ^
         *       3 4 5 6
         *---- ^^^  slice ^^^ ------
         * offset = 3
         * length = 4
         * pos = 0
         *--------------------------
         * Sample call - readBytes(b, 0, 5)
         * 
         * len = 5
         * pos + len = 0 + 5
         * offset + length = 7
         *
         * Current Validation check : pos + len > offset + length  : 5 > 7 : false : does not raise EOFException
         *
         * Proposed validation check : pos + len > length : 5 > 4 : true :  raises EOFException
         *
         */
@parasjain1 parasjain1 added bug Something isn't working untriaged labels Oct 7, 2023
@andrross
Copy link
Member

andrross commented Oct 7, 2023

@parasjain1 Looks like a bug. Can you put up a PR?

@msfroh msfroh added the Storage Issues and PRs relating to data and metadata storage label Oct 9, 2023
@parasjain1
Copy link
Contributor Author

The behaviour is similar for other methods too as there's a common check ByteArrayIndexInput::validatePos. Raising a PR to fix validatePos.

@parasjain1
Copy link
Contributor Author

Have raised a PR against main. We'll also need to backport this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Storage Issues and PRs relating to data and metadata storage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants