Panic in page.go on bigendian arch #56

pavolloffay · 2023-08-30T16:39:58Z

Hi, we are seeing the following error on IBM Z 64bit with Grafana Tempo and Parquet format. Weird is that the exact same setup works well on other architectures.

I have tried rebuilding Tempo with -tags purego and got the same panic.

Any help would be appreciated.

panic: runtime error: index out of range [7] with length 7

goroutine 1221 [running]:
github.com/segmentio/parquet-go.(*byteArrayPage).index(...)
	/remote-source/tempo/app/vendor/github.com/segmentio/parquet-go/page.go:982
github.com/segmentio/parquet-go.(*byteArrayDictionary).lookupString(0xc0002201c0, {0xc001108000, 0x1b, 0x400}, {{0xc001068000, 0x1b, 0x18}})
	/remote-source/tempo/app/vendor/github.com/segmentio/parquet-go/dictionary_purego.go:42 +0x162
github.com/segmentio/parquet-go.(*byteArrayDictionary).Lookup(0xc0002201c0, {0xc001108000, 0x1b, 0x400}, {0xc001068000, 0x1b, 0x3e8})
	/remote-source/tempo/app/vendor/github.com/segmentio/parquet-go/dictionary.go:748 +0x142
github.com/segmentio/parquet-go.(*indexedPageValues).ReadValues(0xc000fe5490, {0xc001068000, 0x1b, 0x3e8})
	/remote-source/tempo/app/vendor/github.com/segmentio/parquet-go/dictionary.go:1332 +0xd6
github.com/segmentio/parquet-go.(*repeatedPageValues).ReadValues(0xc000aa3e40, {0xc001068000, 0x3e8, 0x3e8})
	/remote-source/tempo/app/vendor/github.com/segmentio/parquet-go/page_values.go:98 +0x192
github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).iterate.func3.2(0xc00115ff48, 0xc00115fec8, 0xc00018f740, {0xc001068000, 0x3e8, 0x3e8}, 0x3e8, {0x2815978, 0xc000ef6fc0}, {0x2829640, ...})
	/remote-source/tempo/app/pkg/parquetquery/iters.go:406 +0x26c
github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).iterate.func3(0xc00018f740, 0xc00115ff48, 0xc00115fec8, {0xc001068000, 0x3e8, 0x3e8}, 0x3e8, {0x2815978, 0xc000ef6fc0}, {0x2821868, ...})
	/remote-source/tempo/app/pkg/parquetquery/iters.go:459 +0x1e4
github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).iterate(0xc00018f740, {0x2815978, 0xc000ef6fc0}, 0x3e8)
	/remote-source/tempo/app/pkg/parquetquery/iters.go:464 +0x676
github.com/grafana/tempo/pkg/parquetquery.NewColumnIterator.func1()
	/remote-source/tempo/app/pkg/parquetquery/iters.go:299 +0x42
created by github.com/grafana/tempo/pkg/parquetquery.(*ColumnIterator).next
	/remote-source/tempo/app/pkg/parquetquery/iters.go:485 +0x60

The text was updated successfully, but these errors were encountered:

achille-roussel · 2023-09-01T20:32:18Z

Hello @pavolloffay, thanks for reporting!

We might need a bit more details to be able to track down this issue. Would you be able to share a file that causes the panic? Or steps to generate one?

pavolloffay · 2023-09-04T15:46:15Z

The IBM Z (s390x) is big-endian. Is this causing the issue?

https://github.com/search?q=repo%3Aparquet-go%2Fparquet-go%20LittleEndian&type=code

pavolloffay · 2023-09-05T09:39:14Z

@achille-roussel see #58 to reproduce failing tests on s390x

pavolloffay · 2023-09-05T10:01:43Z

The panic in #58 happens as well in page.go, so we could take it as a reproducer.

JavaPerformance · 2024-02-12T08:59:00Z

Hi all, may I ask what the status of this is please, it is blocking us from upgrading Tempo.

JavaPerformance · 2024-02-12T12:46:29Z

@pavolloffay I'm going to work on a patch to Parquet, not quite sure I can do it but I'll try. Without it we cant upgrade. Are you in Zurich?

achille-roussel · 2024-02-12T19:03:32Z

Hello @JavaPerformance, progress on the resolution was tracked in #58

There hasn't been much progress lately, but it would be extremely helpful if you have the cycles and can pick it up. I don't have access to a system running on a big-endian architecture, so it's a bit tricky to investigate the remaining issues.

srinivas-pokala · 2024-02-22T09:04:32Z

@pavolloffay , @achille-roussel I have tried reproducing the issue on s390x platform(IBM-Z)

I am able reproduce the issue, below is the stack trace which I have got from the panic.

--- FAIL: TestOpenFile (0.01s)
    --- PASS: TestOpenFile/testdata/alltypes_dictionary.parquet (0.00s)
    --- PASS: TestOpenFile/testdata/alltypes_plain.parquet (0.00s)
    --- PASS: TestOpenFile/testdata/alltypes_plain.snappy.parquet (0.00s)
    --- FAIL: TestOpenFile/testdata/alltypes_tiny_pages.parquet (0.01s)
panic: runtime error: index out of range [620756992] with length 731 [recovered]
        panic: runtime error: index out of range [620756992] with length 731

goroutine 1642 [running]:
testing.tRunner.func1.2({0x961d80, 0xc0000287c8})
        testing/testing.go:1526 +0x2e8
testing.tRunner.func1()
        testing/testing.go:1529 +0x406
panic({0x961d80, 0xc0000287c8})
        runtime/panic.go:884 +0x23a
github.com/parquet-go/parquet-go.(*byteArrayPage).index(0xc00104d9d0, 0x25000000)
        github.com/parquet-go/parquet-go/page.go:981 +0x140
github.com/parquet-go/parquet-go.(*byteArrayDictionary).lookupString(0xc00104d9d0, {0xc000276000, 0xd, 0x400}, {{0xc00100ec00, 0xd, 0x18}})
        github.com/parquet-go/parquet-go/dictionary_purego.go:42 +0x102
github.com/parquet-go/parquet-go.(*byteArrayDictionary).Lookup(0xc00104d9d0, {0xc000276000, 0xd, 0x400}, {0xc00100ec00, 0xd, 0x2a})
        github.com/parquet-go/parquet-go/dictionary.go:748 +0x134
github.com/parquet-go/parquet-go.(*indexedPageValues).ReadValues(0xc00017de70, {0xc00100ec00, 0xd, 0x2a})
        github.com/parquet-go/parquet-go/dictionary.go:1332 +0x206
github.com/parquet-go/parquet-go.(*optionalPageValues).ReadValues(0xc0004c29e0, {0xc00100ec00, 0x2a, 0x2a})
        github.com/parquet-go/parquet-go/page_values.go:40 +0x580
github.com/parquet-go/parquet-go_test.printColumns(0xc0031c56c0, 0xc000b34ea0, {0xc000548508, 0x4})
        github.com/parquet-go/parquet-go/file_test.go:85 +0x8f6
github.com/parquet-go/parquet-go_test.printColumns(0xc0031c56c0, 0xc000b34000, {0x9abe3e, 0x2})
        github.com/parquet-go/parquet-go/file_test.go:119 +0x12f8
github.com/parquet-go/parquet-go_test.TestOpenFile.func1(0xc0031c56c0)
        github.com/parquet-go/parquet-go/file_test.go:50 +0x8e0
testing.tRunner(0xc0031c56c0, 0xc00094a780)
        testing/testing.go:1576 +0x128
created by testing.(*T).Run
        testing/testing.go:1629 +0x50a

In theTestOpenFile table driven test, the test "TestOpenFile/testdata/alltypes_tiny_pages.parquet:" which causing the issue.
When I have debugged further this issue with failing test, the index [620756992] which storing in little-endian fashion, It's actually trying to store the index value(of type int32[]) as 0x25, but it 's storing as 0x25000000 . Due to this panic has encountered.
Below is the code snippets which causing the index value storing in endian reverse through copy function:
https://github.com/parquet-go/parquet-go/blob/main/encoding/rle/rle.go#L379

		} else {
			j := i + bitpack.ByteCount(bitWidth)

			if j > len(src) {
				return dst, fmt.Errorf("decoding run-length block of %d values: %w", count, io.ErrUnexpectedEOF)
			}

			bits := [4]byte{}
===>>>>		copy(bits[:], src[i:j])
			dst = appendRepeat(dst, bits[:], count)
			i = j
		}
	}

	return dst, nil
}

After fixing this with endian check and if Big endian, swapped the Bytes. There are other failures which I have observed after this fix which already causing the same test to failure.
whatever this path above mentioned is executed when "bitpacked " boolean variable is false, that's working fine now with endian check and swapping int32 bytes.
But, for bitpacked boolean become true this if case is not handling propely the src[] int32 data. Below is the code snippet for if case.
https://github.com/parquet-go/parquet-go/blob/main/encoding/rle/rle.go#L358

if bitpacked {
			offset := len(dst)
			length := int(count * bitWidth)
			dst = resize(dst, offset+4*8*int(count))

			// The bitpack.UnpackInt32 function requires the input to be padded
			// or the function panics. If there is enough room in the input
			// buffer we can use it, otherwise we have to copy it to a larger
			// location (which should rarely happen).
			in := src[i : i+length]
			if (cap(in) - len(in)) >= bitpack.PaddingInt32 {
				in = in[:cap(in)]
			} else {
				buf = resize(buf, len(in)+bitpack.PaddingInt32)
				copy(buf, in)
				in = buf
			}

			out := unsafecast.BytesToInt32(dst[offset:])
			bitpack.UnpackInt32(out, in, bitWidth)
			i += length
		} else {

It would be really helpful any pointer's on bitpacked true if case execution code path. I am suspecting the issue lies in this code flow.

JavaPerformance · 2024-03-22T13:16:02Z

Hi all, is there any update on this, I know that the s390x Tempo is small but we are doing interesting things and being stuck on Tempo 1.5 is holding us back. @joe-elliott

joe-elliott · 2024-03-22T13:46:29Z

unfortunately, i think those that using big endian architectures are going to have to dig this one out on their own. we do not have the bandwidth or incentive right now.

JavaPerformance · 2024-03-22T13:51:33Z

@joe-elliott I understand, I'll make an effort. Forgive me ask the odd naive question in the process.

joe-elliott · 2024-03-22T13:53:44Z

no worries! i just want to be clear about where we are on this.

huge thx for making an attempt 🙏

JavaPerformance · 2024-03-22T18:52:27Z

I've been playing around, although there will probably be many other places in the code that have to be worked on I started with rle.go that @srinivas-pokala found. The private encodeInt32 decodeInt32 work just fine and pass the simple test I wrote. However the exported function DecodeInt32 is a problem and fails on BigEndian because of the unsafecasts (I was lazy and just replicated the unsafecasts from DecodeInt32 in my test that directly calls the provate decodeIn32. At least it gives me somewhere to start.

@joe-elliott is RLE used often? could it be the reason for the panic in page.go or is this a case of fix this anyway but keep looking?

=== RUN TestEncodeDecodeInt32
rle_test.go:10: **********************************************************
rle.go:42: Encode
rle.go:51: Decode
rle.go:64: Convert decoded bytes back to int32
rle.go:73: Compare src and int32Decoded
rle.go:79: decoded: 1, src: 1
rle.go:79: decoded: 2, src: 2
rle.go:79: decoded: 3, src: 3
rle.go:79: decoded: 4, src: 4
rle.go:79: decoded: 5, src: 5
rle.go:79: decoded: 6, src: 6
rle.go:79: decoded: 7, src: 7
rle.go:79: decoded: 8, src: 8
rle_test.go:14: **********************************************************
--- PASS: TestEncodeDecodeInt32 (0.00s)

=== RUN TestEncodeDecodeInt32
rle_test.go:10: **********************************************************
rle.go:109: Encode
rle.go:118: Decode
rle.go:131: Convert decoded bytes back to int32
rle.go:140: Compare src and int32Decoded
rle.go:146: decoded: 1, src: 16777216
rle.go:148: Mismatch at index 0: expected 1, got 16777216
rle.go:146: decoded: 2, src: 33554432
rle.go:148: Mismatch at index 1: expected 2, got 33554432
rle.go:146: decoded: 3, src: 50331648
rle.go:148: Mismatch at index 2: expected 3, got 50331648
rle.go:146: decoded: 4, src: 67108864
rle.go:148: Mismatch at index 3: expected 4, got 67108864
rle.go:146: decoded: 5, src: 83886080
rle.go:148: Mismatch at index 4: expected 5, got 83886080
rle.go:146: decoded: 6, src: 100663296
rle.go:148: Mismatch at index 5: expected 6, got 100663296
rle.go:146: decoded: 7, src: 117440512
rle.go:148: Mismatch at index 6: expected 7, got 117440512
rle.go:146: decoded: 8, src: 134217728
rle.go:148: Mismatch at index 7: expected 8, got 134217728
rle_test.go:14: **********************************************************
--- FAIL: TestEncodeDecodeInt32 (0.00s)
FAIL
FAIL github.com/parquet-go/parquet-go 0.003s
FAIL

JavaPerformance · 2024-03-28T10:50:59Z

@joe-elliott I've started work on this, A few of those question I warned you about :)

Binary compatibility. Do you think it is necessary to maintain the portability of stores between systems of different endianism? Additionally Would you imagine or require the mixing of endianisms in a network?
Testing. many of the tests use hard-coded data, either I write alternative tests or the big endian code translates to little endian externally (the same as question 1 I guess)

joe-elliott · 2024-03-28T12:55:42Z

Binary compatibility. Do you think it is necessary to maintain the portability of stores between systems of different endianism? Additionally Would you imagine or require the mixing of endianisms in a network?

Are you asking from a parquet-go perspective? Like should a file written by a big endian system be readable by a little endian system? I think ideally the storage format would not be dependent on the cpu architecture it was created with.

also, it seems like the parquet specification itself has endian requirements and we'd like the repo to be consistent:

https://parquet.apache.org/docs/file-format/data-pages/encodings/

Testing. many of the tests use hard-coded data, either I write alternative tests or the big endian code translates to little endian externally (the same as question 1 I guess)

Yeah, I think same answer as above. Presumably the tests are reading parquet files that are written using the standard encodings detailed above and a big endian system should be able to read those.

cc @achille-roussel in case he has a different viewpoint.

JavaPerformance · 2024-03-28T15:49:21Z

Agreed on both points.

*_bigendian.go and *_bigendian_test.go OK for you?

forsaken628 · 2024-04-04T13:09:22Z

binary.LittleEndian is safe
but unsafecast package not, it not working on BigEndian platform.

parquet-go/internal/unsafecast/unsafecast.go

Lines 62 to 80 in c916150

    
           // Slice converts the data slice of type []From to a slice of type []To sharing 
        
           // the same backing array. The length and capacity of the returned slice are 
        
           // scaled according to the size difference between the source and destination 
        
           // types. 
        
           // 
        
           // Note that the function does not perform any checks to ensure that the memory 
        
           // layouts of the types are compatible, it is possible to cause memory 
        
           // corruption if the layouts mismatch (e.g. the pointers in the From are different 
        
           // than the pointers in To). 
        
           func Slice[To, From any](data []From) []To { 
        
           	// This function could use unsafe.Slice but it would drop the capacity 
        
           	// information, so instead we implement the type conversion. 
        
           	var zf From 
        
           	var zt To 
        
           	var s = (*slice)(unsafe.Pointer(&data)) 
        
           	s.len = int((uintptr(s.len) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt)) 
        
           	s.cap = int((uintptr(s.cap) * unsafe.Sizeof(zf)) / unsafe.Sizeof(zt)) 
        
           	return *(*[]To)(unsafe.Pointer(s)) 
        
           }

"the layouts mismatch", It refers to cast little endian file to big endian int

There is no way to fix unsafecast package without incurring significant performance overhead

JavaPerformance · 2024-04-04T14:33:57Z

@forsaken628 I realise that I'll have to take a performance hit on Mainframe but my changes won't affect other platforms which will continue using unsafe casts and copies and maintain their existing performance.

Vishwanatha-HD · 2024-04-25T09:22:35Z

Hi @pavolloffay,
Srinivas and myself are working on providing the Parquet support on s390x, in order to help the Openshift team by enabling the Tempo Operator support..

We see that many of the test cases are failing right now on s390x, and we tried our best to handle the testcase failure by changing endianness of the data types.. Even though, it helps in succeeding at some place, but it eventually fails at later stages.

These are testcase groups that were failing initially when we started looking into the parquet issue.
TestSplitBlockFilter/INT96
TestBuffer/boolean/
TestBuffer/int32
TestBuffer/float
TestBuffer/uint32
TestBufferRoundtripNestedRepeated
TestBufferRoundtripNestedRepeatedPointer
TestBufferSeekToRow
TestColumnPageIndex/file
TestConvertValue/string_to_int96 – Fixed this issue.
TestDictionary/BOOLEAN
TestDictionary/FLOAT
TestDictionary/INT(32,false)
TestOpenFile/testdata/alltypes_tiny_pages.parquet

Out of the above testcases, we have successfully fixed “TestConvertValue/string_to_int96” testcase…

We need your help and support in understanding the complete parquet package and its functionality, so that we can work on fixing the other testcase failures..
We want to know how the data is getting stored and retrieved as part of Parquet file format..

Warm Regards,
Vishwa..

achille-roussel · 2024-04-25T15:49:57Z

@Vishwanatha-HD feel free to send your questions, I'll do my best to respond promptly.

We can chat on the #parquet-go Gopher Slack channel for more direct communication as well!

JavaPerformance · 2024-04-25T16:12:18Z

@Vishwanatha-HD @achille-roussel

Hi Vishwa, I have been attempting a port for a little while, but struggling. If you haven't already you might want to look at sparse/array.go and add 8- sizeof type to the Index functions. For example:

func (a Uint8Array) Index(i int) uint8 {
	return *(*uint8)(unsafe.Add(a.index(i), 7))
}

func (a Uint32Array) Index(i int) uint32 {
	return *(*uint32)(unsafe.Add(a.index(i), 4))
}

pavolloffay mentioned this issue Aug 30, 2023

Panic in page.go on IBM Z segmentio/parquet-go#515

Closed

achille-roussel added the bug Something isn't working label Sep 1, 2023

pavolloffay mentioned this issue Sep 5, 2023

Run tests on s390x #58

Draft

pavolloffay changed the title ~~Panic in page.go~~ Panic in page.go on bigendian arch Oct 12, 2023

joe-elliott mentioned this issue Feb 6, 2024

Parquet Index out of bounds, seems to be related to attribuites grafana/tempo#3370

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic in page.go on bigendian arch #56

Panic in page.go on bigendian arch #56

pavolloffay commented Aug 30, 2023

achille-roussel commented Sep 1, 2023

pavolloffay commented Sep 4, 2023

pavolloffay commented Sep 5, 2023

pavolloffay commented Sep 5, 2023

JavaPerformance commented Feb 12, 2024

JavaPerformance commented Feb 12, 2024

achille-roussel commented Feb 12, 2024

srinivas-pokala commented Feb 22, 2024

JavaPerformance commented Mar 22, 2024

joe-elliott commented Mar 22, 2024

JavaPerformance commented Mar 22, 2024

joe-elliott commented Mar 22, 2024

JavaPerformance commented Mar 22, 2024

JavaPerformance commented Mar 28, 2024

joe-elliott commented Mar 28, 2024

JavaPerformance commented Mar 28, 2024

forsaken628 commented Apr 4, 2024 •

edited

JavaPerformance commented Apr 4, 2024

Vishwanatha-HD commented Apr 25, 2024

achille-roussel commented Apr 25, 2024 •

edited

JavaPerformance commented Apr 25, 2024 •

edited

Panic in page.go on bigendian arch #56

Panic in page.go on bigendian arch #56

Comments

pavolloffay commented Aug 30, 2023

achille-roussel commented Sep 1, 2023

pavolloffay commented Sep 4, 2023

pavolloffay commented Sep 5, 2023

pavolloffay commented Sep 5, 2023

JavaPerformance commented Feb 12, 2024

JavaPerformance commented Feb 12, 2024

achille-roussel commented Feb 12, 2024

srinivas-pokala commented Feb 22, 2024

JavaPerformance commented Mar 22, 2024

joe-elliott commented Mar 22, 2024

JavaPerformance commented Mar 22, 2024

joe-elliott commented Mar 22, 2024

JavaPerformance commented Mar 22, 2024

JavaPerformance commented Mar 28, 2024

joe-elliott commented Mar 28, 2024

JavaPerformance commented Mar 28, 2024

forsaken628 commented Apr 4, 2024 • edited

JavaPerformance commented Apr 4, 2024

Vishwanatha-HD commented Apr 25, 2024

achille-roussel commented Apr 25, 2024 • edited

JavaPerformance commented Apr 25, 2024 • edited

forsaken628 commented Apr 4, 2024 •

edited

achille-roussel commented Apr 25, 2024 •

edited

JavaPerformance commented Apr 25, 2024 •

edited