Support big-endian platforms (hard) #21

Open
Tracked by #3
gaborcsardi opened this issue May 18, 2024 · 13 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@gaborcsardi
Member

gaborcsardi commented May 18, 2024

At least we should fail nicely on big endian platforms.

@gaborcsardi gaborcsardi added the bug an unexpected problem or unintended behavior label May 23, 2024
gaborcsardi added a commit that referenced this issue May 24, 2024
Will support them at some point, but not now:
#21
@gaborcsardi
Member Author

OK, now we error at compile time on big endian platforms.

@barracuda156

@gaborcsardi Could we instead fix it?

@barracuda156

Admittedly, it is pretty bad at the moment, though the duckdb-related errors can be ignored: those are a bug in duckdb.


R version 4.4.1 (2024-06-14) -- "Race for Your Life"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: powerpc-apple-darwin10.0.0d2 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> testthat::test_check("nanoparquet")
Loading required package: nanoparquet
R(32280,0x96c408) malloc: *** mmap(size=3121152000) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3808428032) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=2533556224) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=2264924160) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3657760768) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3305242624) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3540779008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3121152000) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=2533556224) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3121152000) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=2533556224) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3540779008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3540779008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3540779008) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3657564160) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=3540123648) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=2264924160) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=4127260672) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
R(32280,0x96c408) malloc: *** mmap(size=4127260672) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
[ FAIL 66 | WARN 0 | SKIP 33 | PASS 48 ]

══ Skipped tests (33) ══════════════════════════════════════════════════════════
• On CRAN (33): 'test-arrow-schema.R:24:3', 'test-parquet-metadata.R:17:3',
  'test-porcelain.R:43:3', 'test-porcelain.R:50:3', 'test-porcelain.R:110:3',
  'test-read-parquet.R:106:3', 'test-read-parquet.R:114:3',
  'test-read-parquet.R:128:3', 'test-read-parquet.R:205:3',
  'test-read-parquet.R:298:3', 'test-read-parquet.R:370:3',
  'test-read-parquet.R:383:3', 'test-read-parquet.R:396:3', 'test-rle.R:3:3',
  'test-spelling.R:2:3', 'test-write-parquet-2.R:18:3',
  'test-write-parquet-2.R:49:3', 'test-write-parquet.R:7:3',
  'test-write-parquet.R:31:3', 'test-write-parquet.R:64:3',
  'test-write-parquet.R:115:3', 'test-write-parquet.R:133:3',
  'test-write-parquet.R:146:3', 'test-write-parquet.R:178:3',
  'test-write-parquet.R:211:3', 'test-write-parquet.R:243:3',
  'test-write-parquet.R:268:3', 'test-write-parquet.R:296:3',
  'test-write-parquet.R:323:3', 'test-write-parquet.R:374:3',
  'test-write-parquet.R:389:3', 'test-write-parquet.R:404:3',
  'test-write-parquet.R:419:3'

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test-parquet-metadata.R:30:3'): ENUM type ───────────────────────────
Error in `parquet_schema(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:30:3
── Error ('test-parquet-metadata.R:36:3'): UUID type ───────────────────────────
Error in `parquet_schema(pf)`: Could not read footer, invalid Parquet file at 'data/uuid-arrow.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:36:3
── Error ('test-parquet-metadata.R:42:3'): DATE type ───────────────────────────
Error in `parquet_schema(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:42:3
── Error ('test-parquet-metadata.R:48:3'): DECIMAL type ────────────────────────
Error in `parquet_schema(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:48:3
── Error ('test-parquet-metadata.R:55:3'): TIME type ───────────────────────────
Error in `parquet_schema(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:55:3
── Error ('test-parquet-metadata.R:62:3'): TIMESTAMP type ──────────────────────
Error in `parquet_schema(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:62:3
── Error ('test-parquet-metadata.R:69:3'): LIST type ───────────────────────────
Error in `parquet_schema(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:69:3
── Error ('test-parquet-metadata.R:75:3'): MAP type ────────────────────────────
Error in `parquet_schema(pf)`: Could not read footer, invalid Parquet file at 'data/map.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─nanoparquet::parquet_schema(pf) at test-parquet-metadata.R:75:3
── Error ('test-parquet-metadata.R:81:3'): key-value metadata ──────────────────
Error in `parquet_metadata(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::parquet_metadata(pf) at test-parquet-metadata.R:81:3
── Error ('test-parquet-metadata.R:89:3'): parquet_column_types ────────────────
Error in `parquet_metadata(file)`: std::bad_alloc
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-parquet-metadata.R:89:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-parquet-metadata.R:112:3'): parquet_info ───────────────────────
Error in `parquet_metadata(file)`: std::bad_alloc
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-parquet-metadata.R:112:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-porcelain.R:2:3'): parquet_pages ───────────────────────────────
Error in `parquet_pages(test_path("data/mtcars-arrow.parquet"))`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet:::parquet_pages(test_path("data/mtcars-arrow.parquet")) at test-porcelain.R:2:3
── Error ('test-porcelain.R:8:3'): read_parquet_page ───────────────────────────
Error in `read_parquet_page(pf, 4)`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet:::read_parquet_page(pf, 4) at test-porcelain.R:8:3
── Error ('test-porcelain.R:16:3'): read_parquet_page for trick v2 data page ───
Error in `read_parquet_page(pf, 4L)`: Could not read footer, invalid Parquet file at 'data/rle_boolean_encoding.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-porcelain.R:16:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-porcelain.R:32:5'): snappy ─────────────────────────────────────
Error in `snappy_uncompress(comp)`: Snappy Uncompression failure
Backtrace:
    ▆
 1. └─nanoparquet:::snappy_uncompress(comp) at test-porcelain.R:32:5
── Error ('test-porcelain.R:142:3'): DELTA_BIANRY_PACKED INT64 ─────────────────
Error in `read_parquet_page(pf, 4L)`: Could not read footer, invalid Parquet file at 'data/dbp-int64.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─nanoparquet:::read_parquet_page(pf, 4L) at test-porcelain.R:142:3
── Error ('test-read-parquet.R:88:3'): basic reading works ─────────────────────
Error in `read_parquet(test_path("data/alltypes_plain.parquet"))`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(test_path("data/alltypes_plain.parquet")) at test-read-parquet.R:88:3
── Error ('test-read-parquet.R:93:3'): basic reading works with snappy ─────────
Error in `read_parquet(test_path("data/alltypes_plain.snappy.parquet"))`: std::bad_alloc
Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(test_path("data/alltypes_plain.snappy.parquet")) at test-read-parquet.R:93:3
── Error ('test-read-parquet.R:98:3'): read factors, marked by Arrow ───────────
Error in `read_parquet(test_path("data/factor.parquet"))`: Could not read footer, invalid Parquet file at 'data/factor.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(test_path("data/factor.parquet")) at test-read-parquet.R:98:3
── Error ('test-read-parquet.R:164:3'): read Date ──────────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e185ddab755.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(tmp) at test-read-parquet.R:164:3
── Error ('test-read-parquet.R:178:3'): read hms ───────────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e185f6481d7.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(tmp) at test-read-parquet.R:178:3
── Error ('test-read-parquet.R:185:3'): read hms in MICROS ─────────────────────
Error in `read_parquet(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-read-parquet.R:185:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-read-parquet.R:199:3'): read POSIXct ───────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e183b28165c.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(tmp) at test-read-parquet.R:199:3
── Error ('test-read-parquet.R:227:3'): read difftime ──────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e18211c110c.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(tmp) at test-read-parquet.R:227:3
── Error ('test-read-parquet.R:261:3'): RLE BOOLEAN ────────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e186560015c.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. ├─testthat::expect_equal(as.data.frame(read_parquet(tmp)), d) at test-read-parquet.R:261:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. ├─base::as.data.frame(read_parquet(tmp))
 5. └─nanoparquet::read_parquet(tmp)
── Error ('test-read-parquet.R:285:3'): read GZIP compressed files ─────────────
Error in `read_parquet(pf)`: Could not read footer, invalid Parquet file at 'data/gzip.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-read-parquet.R:285:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-read-parquet.R:292:3'): V2 data pages ──────────────────────────
Error in `read_parquet(pf)`: Could not read footer, invalid Parquet file at 'data/parquet_go.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-read-parquet.R:292:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-read-parquet.R:310:3'): Tricky V2 data page ────────────────────
Error in `read_parquet(pf)`: Could not read footer, invalid Parquet file at 'data/rle_boolean_encoding.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-read-parquet.R:310:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-read-parquet.R:317:3'): zstd ───────────────────────────────────
Error in `parquet_metadata(pf)`: Could not read footer, invalid Parquet file at 'data/zstd.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. ├─testthat::expect_true(...) at test-read-parquet.R:317:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─nanoparquet::parquet_metadata(pf)
── Error ('test-read-parquet.R:324:3'): zstd with data page v2 ─────────────────
Error in `parquet_metadata(pf)`: Could not read footer, invalid Parquet file at 'data/zstd-v2.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. ├─testthat::expect_true(...) at test-read-parquet.R:324:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─nanoparquet::parquet_metadata(pf)
── Error ('test-read-parquet.R:335:3'): DELTA_BIANRY_PACKED encoding ───────────
Error in `read_parquet(pf)`: std::bad_alloc
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-read-parquet.R:335:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-read-parquet.R:355:3'): UUID columns ───────────────────────────
Error in `read_parquet(pf)`: Could not read footer, invalid Parquet file at 'data/uuid-arrow.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─testthat::expect_snapshot(...) at test-read-parquet.R:355:3
 2.   └─rlang::cnd_signal(state$error)
── Error ('test-read-parquet.R:362:3'): DELTA_LENGTH_BYTE_ARRAY encoding ───────
Error in `read_parquet(pf)`: Could not read footer, invalid Parquet file at 'data/delta_length_byte_array.parquet' @ lib/ParquetFile.cpp:103
Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(pf) at test-read-parquet.R:362:3
── Failure ('test-rle.R:25:5'): rle_encode ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual[1:27]`: 5 14 10 17 18 28 8 17 1 2 and 17 more...
`expected[1:27]`: 0  1  2  3  4  5 6  7 8 9            ...

  `actual[35:64]`: 1 1 1 5 14 10 17 18 28 8 and 20 more...
`expected[35:64]`: 1 1 1 0  1  2  3  4  5 6            ...
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(c(0:16, rep(1L, 20), 0:16, rep(2L, 20))) at test-rle.R:27:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:25:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0
`expected`: 1
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1L) at test-rle.R:38:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0
`expected`: 7
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(7L) at test-rle.R:39:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0
`expected`: 8
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(8L) at test-rle.R:40:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`:   0
`expected`: 100
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(100L) at test-rle.R:41:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0
`expected`: 1 2
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:2) at test-rle.R:42:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0
`expected`: 1 2 3
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:3) at test-rle.R:43:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0
`expected`: 1 2 3 4
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:4) at test-rle.R:44:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0
`expected`: 1 2 3 4 5
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:5) at test-rle.R:45:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0
`expected`: 1 2 3 4 5 6
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:6) at test-rle.R:46:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 0
`expected`: 1 2 3 4 5 6 7
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:7) at test-rle.R:47:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 0 0
`expected`: 1 2 3 4 5 6 7 8
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:8) at test-rle.R:48:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 9 0 7
`expected`: 1 2 3 4 5 6 7 8 9
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(1:9) at test-rle.R:49:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0
`expected`: 0 1
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:1) at test-rle.R:50:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0
`expected`: 0 1 2
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:2) at test-rle.R:51:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0
`expected`: 0 1 2 3
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:3) at test-rle.R:52:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0
`expected`: 0 1 2 3 4
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:4) at test-rle.R:53:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0
`expected`: 0 1 2 3 4 5
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:5) at test-rle.R:54:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 0
`expected`: 0 1 2 3 4 5 6
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:6) at test-rle.R:55:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 0 0
`expected`: 0 1 2 3 4 5 6 7
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:7) at test-rle.R:56:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 8 0 6
`expected`: 0 1 2 3 4 5 6 7 8
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:8) at test-rle.R:57:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 8 9 6 7
`expected`: 0 1 2 3 4 5 6 7 8 9
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(0:9) at test-rle.R:58:3
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0
`expected`: 1 1
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(rep(1L, l)) at test-rle.R:61:5
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 0
`expected`: 1 1 1 1 1 1 1
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(rep(1L, l)) at test-rle.R:61:5
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Failure ('test-rle.R:34:5'): edge cases ─────────────────────────────────────
`x2` (`actual`) not equal to `x` (`expected`).

  `actual`: 0 0 0 0 0 0 0 0
`expected`: 1 1 1 1 1 1 1 1
Backtrace:
    ▆
 1. └─nanoparquet (local) chk(rep(1L, l)) at test-rle.R:61:5
 2.   └─testthat::expect_equal(x2, x) at test-rle.R:34:5
── Error ('test-write-parquet-2.R:78:3'): REQ RLE_DICT ─────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e1862088dc4.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. ├─testthat::expect_equal(as.data.frame(read_parquet(tmp)), d) at test-write-parquet-2.R:78:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. ├─base::as.data.frame(read_parquet(tmp))
 5. └─nanoparquet::read_parquet(tmp)
── Error ('test-write-parquet-2.R:101:3'): OPT RLE_DICT ────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e18574afdc1.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. ├─testthat::expect_equal(as.data.frame(read_parquet(tmp)), d) at test-write-parquet-2.R:101:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. ├─base::as.data.frame(read_parquet(tmp))
 5. └─nanoparquet::read_parquet(tmp)
── Error ('test-write-parquet-2.R:128:3'): gzip compression ────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e184bb8b48a.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. ├─testthat::expect_equal(read_parquet(tmp), d) at test-write-parquet-2.R:128:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─nanoparquet::read_parquet(tmp)
── Error ('test-write-parquet-2.R:138:3'): zstd compression ────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e181e3ed29e.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. ├─testthat::expect_equal(read_parquet(tmp), d) at test-write-parquet-2.R:138:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─nanoparquet::read_parquet(tmp)
── Error ('test-write-parquet.R:21:3'): round trip ─────────────────────────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e187db24ed9.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. ├─testthat::expect_true(all(read_parquet(tmp) == mt)) at test-write-parquet.R:21:3
 2. │ └─testthat::quasi_label(enquo(object), label, arg = "object")
 3. │   └─rlang::eval_bare(expr, quo_get_env(quo))
 4. └─nanoparquet::read_parquet(tmp)
── Failure ('test-write-parquet.R:55:3'): round trip with duckdb ───────────────
`df` (`actual`) not equal to `mt` (`expected`).

actual vs expected
                 large
- actual[1, ]    FALSE
+ expected[1, ]   TRUE
- actual[2, ]    FALSE
+ expected[2, ]   TRUE
  actual[3, ]    FALSE
  actual[4, ]     TRUE
- actual[5, ]    FALSE
+ expected[5, ]   TRUE
- actual[6, ]    FALSE
+ expected[6, ]   TRUE
- actual[7, ]    FALSE
+ expected[7, ]   TRUE
- actual[8, ]     TRUE
+ expected[8, ]  FALSE
  actual[9, ]    FALSE
- actual[10, ]   FALSE
+ expected[10, ]  TRUE
and 22 more ...

     actual$large | expected$large                
 [1] FALSE        - TRUE           [1]            
 [2] FALSE        - TRUE           [2]            
 [3] FALSE        | FALSE          [3]            
 [4] TRUE         | TRUE           [4]            
 [5] FALSE        - TRUE           [5]            
 [6] FALSE        - TRUE           [6]            
 [7] FALSE        - TRUE           [7]            
 [8] TRUE         - FALSE          [8]            
 [9] FALSE        | FALSE          [9]            
[10] FALSE        - TRUE           [10]           
 ... ...            ...            and 22 more ...
── Error ('test-write-parquet.R:59:3'): round trip with duckdb ─────────────────
Error: rapi_execute: Failed to run query
Error: Invalid Error: Snappy decompression failure
Backtrace:
     ▆
  1. └─duckdb:::sql(sprintf("FROM '%s'", tmp)) at test-write-parquet.R:59:3
  2.   ├─DBI::dbGetQuery(conn, sql)
  3.   └─DBI::dbGetQuery(conn, sql)
  4.     └─DBI (local) .local(conn, statement, ...)
  5.       ├─DBI::dbSendQuery(conn, statement, ...)
  6.       └─duckdb::dbSendQuery(conn, statement, ...)
  7.         └─duckdb (local) .local(conn, statement, ...)
  8.           └─duckdb:::duckdb_result(...)
  9.             └─duckdb:::duckdb_execute(res)
 10.               └─duckdb:::rapi_execute(...)
── Error ('test-write-parquet.R:359:3'): Factor levels not in the data ─────────
Error in `read_parquet(tmp)`: Invalid Parquet file '/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_R_R-nanoparquet/R-nanoparquet/work/.tmp/Rtmp4ZwA1k/file7e18644e9d67.parquet'. Couldn't deserialize thrift: No more data to read.

Backtrace:
    ▆
 1. └─nanoparquet::read_parquet(tmp) at test-write-parquet.R:359:3

[ FAIL 66 | WARN 0 | SKIP 33 | PASS 48 ]
Error: Test failures
Execution halted

I could try asking Apache-Arrow upstream for some assistance, if the breakage is there.
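
One observation on the log above: the sizes in the failed `mmap` calls are consistent with 32-bit lengths being read in the wrong byte order. The Python sketch below reverses the four bytes of a few of the reported sizes. `swap32` is a hypothetical helper written for this illustration; it is not part of nanoparquet. This does not prove where the breakage is. It only shows that the byte-swapped values are small and plausible as lengths.

```python
def swap32(value: int) -> int:
    """Reverse the byte order of an unsigned 32-bit integer."""
    return int.from_bytes(value.to_bytes(4, "big"), "little")


# Sizes copied from three of the failed mmap() calls in the log above.
reported_sizes = [3121152000, 3808428032, 2533556224]

swapped = [swap32(size) for size in reported_sizes]
for size, fixed in zip(reported_sizes, swapped):
    # Show the hex form too: the two low bytes are zero in every case.
    print(f"{size:>10} (0x{size:08X}) -> {fixed}")
```

Each reported size is a multiple of 65536, and the swapped values come out as 2490, 227 and 919.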

@gaborcsardi
Member Author

It is unlikely that we'll fix this soon, because there is still a lot of refactoring going on.
After the next nanoparquet release I might take a look, but I would also need access to a big-endian system.

@barracuda156

@gaborcsardi Thank you! (There is no urgency here, of course.)

@gaborcsardi
Member Author

If you or others have a real use case for big endian platforms, that would also motivate me to fix this.

@barracuda156

@gaborcsardi AIX is BE-only. *BSD and Linux on most IBM Power CPUs are BE (I am not an expert here, but I think only Power9 defaults to LE, while still being bi-endian), and s390 is BE. NetBSD, at least, maintains an ARM BE port.
(Not to mention legacy "hobbyist" platforms like Darwin, Irix, SPARC and Amiga.)

I cannot say how many people use R or are interested in Parquet support, but here is a recent AIX-related ticket: apache/spark#29419

@gaborcsardi
Member Author

Sure, I know that big endian platforms exist, I am just not sure if they are relevant for R. I.e. how many people are using R and (say) arrow on big endian platforms?

Unfortunately Parquet is little endian on disk, so to support big endian platforms, we'd need to review and potentially reimplement all of the Parquet encodings, both for reading and writing. So this is not a small fix but a rather big undertaking.
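
To illustrate why each decoder needs attention: the last eight bytes of a Parquet file are a 4-byte little-endian footer length followed by the magic `PAR1`. The sketch below is in Python with hypothetical helper names, and it is not nanoparquet's code. It decodes that length with an explicit byte order, so the result does not depend on the host, and it shows what reading the same bytes as big-endian would give. The tail bytes are made up for the example.

```python
import struct

MAGIC = b"PAR1"


def footer_length(tail: bytes) -> int:
    """Decode the footer length from the last 8 bytes of a Parquet file.

    The "<I" format forces little-endian, whatever the host byte order is.
    """
    if len(tail) != 8 or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file tail")
    (length,) = struct.unpack("<I", tail[:4])
    return length


def footer_length_wrong_order(tail: bytes) -> int:
    """Read the same four bytes as big-endian.

    This mimics a host-order read on a big-endian machine.
    """
    (length,) = struct.unpack(">I", tail[:4])
    return length


# Made-up tail for illustration: footer length 1000, then the magic.
tail = struct.pack("<I", 1000) + MAGIC
print(footer_length(tail))              # 1000
print(footer_length_wrong_order(tail))  # 3892510720
```

The same concern applies on the write side: every multi-byte value has to be emitted with an explicit little-endian layout rather than copied from memory in host order.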

I am still planning to do it at some point, but it'll take some time. First I would need to set up some tests on GHA.

@gaborcsardi
Member Author

I have a GHA workflow now to test on s390x. Right now it is failing, as expected:

lib/nanoparquet.h:5:2: error: #error Nanoparquet does not support big-endian platforms: https:
    5 | #error Nanoparquet does not support big-endian platforms: https://github.com/r-lib/nanoparquet/issues/21
      |  ^~~~~

@barracuda156

Thank you, great. I didn’t know GHA has s390x. Is it generally available?

@gaborcsardi
Member Author

They don't have s390x, you need to run it on x86_64 via qemu.

@barracuda156

@gaborcsardi BTW is there a similar Linux-based setup for R to test on 32-bit ppc?

@gaborcsardi
Member Author

I don't know. qemu can probably emulate it, but IDK if there is a recent-ish Linux distro that supports it.
