
[TEST] Run all Travis tests with Arrow enabled #1727

Closed
javierluraschi wants to merge 96 commits

Conversation


javierluraschi (Collaborator) commented on Oct 22, 2018

Temp PR to investigate running all Travis tests with Arrow enabled.
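
For context, a minimal sketch of the setup these runs exercise; this assumes the arrow R package is installed and that attaching it is what switches sparklyr's copy/collect path over to Arrow (the behavior under test here), so treat it as an illustration rather than the exact Travis configuration:

library(arrow)      # attaching arrow enables sparklyr's Arrow serialization path
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")             # local connection, as on Travis
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)   # copied as Arrow record batches
collect(iris_tbl)                                 # collected through arrow_read_stream()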

══ testthat results  ══════════════════════════════════════════════════════════════════════════════════════════════════
OK: 1848 SKIPPED: 28 FAILED: 17
1. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#135) 
2. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#139) 
3. Failure: gaussian_mixture.tidy() works (@test-broom-gaussian_mixture.R#14) 
4. Failure: kmeans.tidy() works (@test-broom-kmeans.R#14) 
5. Failure: random_forest.tidy() works (@test-broom-random_forest.R#15) 
6. Failure: random_forest.tidy() works (@test-broom-random_forest.R#24) 
7. Error: we can separate struct columns (#690) (@test-column-extraction.R#92) 
8. Failure: top_n works as expected (@test-dplyr-top-n.R#23) 
9. Error: ml_recommend() works (@test-ml-als.R#44) 
10. ...

Some of the failures...

── 1. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#135)  ──────────────────────────────
bind_rows(df5a, df6a) not equal to sdf_bind_rows(df5a_tbl, df6a_tbl) %>% collect().
Rows in x but not y: 2. Rows in y but not x: 2. 

── 2. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#139)  ──────────────────────────────
bind_rows(df6a, df5a) not equal to sdf_bind_rows(df6a_tbl, df5a_tbl) %>% collect().
Rows in x but not y: 4. Rows in y but not x: 4. 

── 3. Failure: gaussian_mixture.tidy() works (@test-broom-gaussian_mixture.R#14)  ─────────────────────────────────────
td1$size not equal to c(3, 14, 4, 11).
4/4 mismatches (average diff: 3)
[1]  0 -  3 == -3
[2] 11 - 14 == -3
[3]  7 -  4 ==  3
[4] 14 - 11 ==  3

── 4. Failure: kmeans.tidy() works (@test-broom-kmeans.R#14)  ─────────────────────────────────────────────────────────
td1$size not equal to c(14, 2, 6, 10).
4/4 mismatches (average diff: 4.5)
[1] 10 - 14 == -4
[2]  3 -  2 ==  1
[3] 14 -  6 ==  8
[4]  5 - 10 == -5

── 5. Failure: random_forest.tidy() works (@test-broom-random_forest.R#15)  ───────────────────────────────────────────
td1$importance not equal to c(0.941, 0.0586).
2/2 mismatches (average diff: 0.00487)
[1] 0.9363 - 0.9410 == -0.00467
[2] 0.0637 - 0.0586 ==  0.00507

── 6. Failure: random_forest.tidy() works (@test-broom-random_forest.R#24)  ───────────────────────────────────────────
td2$importance not equal to c(0.658, 0.342).
2/2 mismatches (average diff: 0.0163)
[1] 0.642 - 0.658 == -0.0163
[2] 0.358 - 0.342 ==  0.0163

── 7. Error: we can separate struct columns (#690) (@test-column-extraction.R#92)  ────────────────────────────────────
cannot handle Array of type struct
1: sliding_window_sdf %>% sdf_separate_column("sw") at testthat/test-column-extraction.R:92
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
5: `_fseq`(`_lhs`)
6: freduce(value, `_function_list`)
7: withVisible(function_list[[k]](value))
8: function_list[[k]](value)
9: sdf_separate_column(., "sw")
10: x %>% head(1) %>% dplyr::pull(!!rlang::sym(column)) %>% rlang::flatten() %>% length() %>% seq_len(.) at /Users/javierluraschi/RStudio/sparklyr/R/sdf_wrapper.R:293
...
32: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
33: eval(quote(`_fseq`(`_lhs`)), env, env)
34: eval(quote(`_fseq`(`_lhs`)), env, env)
35: `_fseq`(`_lhs`)
36: freduce(value, `_function_list`)
37: function_list[[i]](value)
38: arrow_read_stream(.)
39: tibble::as_tibble(record_entry) at /Users/javierluraschi/RStudio/sparklyr/R/arrow_data.R:37
40: `as_tibble.arrow::RecordBatch`(record_entry)
41: RecordBatch__to_dataframe(x) at /Users/javierluraschi/RStudio/arrow/r/R/RecordBatch.R:59

── 8. Failure: top_n works as expected (@test-dplyr-top-n.R#23)  ──────────────────────────────────────────────────────
all(tn1 == tn2) isn't true.

── 9. Error: ml_recommend() works (@test-ml-als.R#44)  ────────────────────────────────────────────────────────────────
cannot handle Array of type struct
1: expect_identical(als_model %>% ml_recommend("users", 2) %>% colnames(), c("item", "recommendations", "user", "rating")) at testthat/test-ml-als.R:44
2: quasi_label(enquo(object), label)
3: eval_bare(get_expr(quo), get_env(quo))
4: als_model %>% ml_recommend("users", 2) %>% colnames()
5: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
6: eval(quote(`_fseq`(`_lhs`)), env, env)
7: eval(quote(`_fseq`(`_lhs`)), env, env)
8: `_fseq`(`_lhs`)
9: freduce(value, `_function_list`)
10: function_list[[i]](value)
...
43: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
44: eval(quote(`_fseq`(`_lhs`)), env, env)
45: eval(quote(`_fseq`(`_lhs`)), env, env)
46: `_fseq`(`_lhs`)
47: freduce(value, `_function_list`)
48: function_list[[i]](value)
49: arrow_read_stream(.)
50: tibble::as_tibble(record_entry) at /Users/javierluraschi/RStudio/sparklyr/R/arrow_data.R:37
51: `as_tibble.arrow::RecordBatch`(record_entry)
52: RecordBatch__to_dataframe(x) at /Users/javierluraschi/RStudio/arrow/r/R/RecordBatch.R:59
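
The ml_recommend() failure is the same struct limitation surfacing through an ML helper: the recommendations column is an array of structs, so pulling it back through arrow_read_stream() fails. A hedged sketch (the ratings data frame is a hypothetical stand-in for the test fixture):

ratings <- data.frame(user = c(1, 1, 2), item = c(1, 2, 1), rating = c(3, 1, 4))
ratings_tbl <- copy_to(sc, ratings, overwrite = TRUE)

als_model <- ml_als(ratings_tbl, rating_col = "rating", user_col = "user", item_col = "item")
als_model %>% ml_recommend("users", 2) %>% colnames()   # fails with the same Arrow struct error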

── 10. Failure: ml_bisecting_kmeans() works properly (@test-ml-clustering-bisecting-kmeans.R#255)  ────────────────────
`print(ml_bisecting_kmeans(iris_tbl, ~. - Species, k = 5, seed = 11))` has changed from known value recorded in 'output/print/bisecting-kmeans.txt'.
3/11 mismatches
x[5]: "1     4.803226    3.225806     1.419355   0.2096774"
y[5]: "1     4.750000    3.012500     1.666667   0.3166667"

x[6]: "2     5.290909    3.572727     1.759091   0.4045455"
y[6]: "2     5.217241    3.665517     1.472414   0.2689655"

x[11]: "Within Set Sum of Squared Errors =  63.01642"
y[11]: "Within Set Sum of Squared Errors =  60.60717"

── 11. Failure: we can construct a simple pivot table (@test-pivot.R#22)  ─────────────────────────────────────────────
unname(s) not equal to unname(r).
Component 2: Attributes: < Modes: list, NULL >
Component 2: Attributes: < Lengths: 1, 0 >
Component 2: Attributes: < names for target but not for current >
Component 2: Attributes: < current is not list-like >
Component 2: target is integer64, current is numeric
Component 3: Attributes: < Modes: list, NULL >
Component 3: Attributes: < Lengths: 1, 0 >
Component 3: Attributes: < names for target but not for current >
Component 3: Attributes: < current is not list-like >
...

── 12. Error: spark_read_json() can load data using column types (@test-read-write.R#49)  ─────────────────────────────
cannot handle Array of type struct
1: spark_read_json(sc, name = "iris_json_typed", path = "test.json", columns = list(Sepal_Length = "character", Species = "character", 
       Other = "struct<a:integer,b:character>")) %>% collect() at testthat/test-read-write.R:49
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
5: `_fseq`(`_lhs`)
6: freduce(value, `_function_list`)
7: withVisible(function_list[[k]](value))
8: function_list[[k]](value)
9: collect(.)
10: collect.tbl_sql(.)
...
22: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
23: eval(quote(`_fseq`(`_lhs`)), env, env)
24: eval(quote(`_fseq`(`_lhs`)), env, env)
25: `_fseq`(`_lhs`)
26: freduce(value, `_function_list`)
27: function_list[[i]](value)
28: arrow_read_stream(.)
29: tibble::as_tibble(record_entry) at /Users/javierluraschi/RStudio/sparklyr/R/arrow_data.R:37
30: `as_tibble.arrow::RecordBatch`(record_entry)
31: RecordBatch__to_dataframe(x) at /Users/javierluraschi/RStudio/arrow/r/R/RecordBatch.R:59

── 13. Error: (unknown) (@test-serialization.R#7)  ────────────────────────────────────────────────────────────────────
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 25364.0 failed 1 times, most recent failure: Lost task 7.0 in stage 25364.0 (TID 132972, localhost, executor driver): java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND)

── 14. Error: 'spark_apply' can add columns (@test-spark-apply.R#28)  ─────────────────────────────────────────────────
cannot handle Array of type decimal
1: expect_equal(iris_tbl %>% spark_apply(function(e) cbind(e, 1), names = c(colnames(iris_tbl), "new")) %>% collect(), iris_tbl

── 15. Error: 'spark_apply' can roundtrip Date-Time (@test-spark-apply.R#188)  ────────────────────────────────────────
sparklyr worker rscript failure, check worker logs for details
    Log: /var/folders/ks/wm_bx4cn70s6h0r5vgqpsldm0000gn/T//RtmpOXR9Td/file1da775bc76cd_spark.log
18/10/25 15:36:32 ERROR sparklyr: RScript (7555) terminated unexpectedly: java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND)
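
Failures 13 and 15 both reduce to java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND), i.e. Date columns are not yet handled on the Arrow path. A hedged sketch of the roundtrip being tested (the data frame is a hypothetical stand-in for the serialization fixtures):

dates <- data.frame(d = as.Date(c("2018-10-01", "2018-10-02")))
dates_tbl <- copy_to(sc, dates, overwrite = TRUE)            # the copy itself goes through Arrow
dates_tbl %>% spark_apply(function(e) e) %>% collect()       # identity roundtrip through the R worker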

── 16. Error: 'spark_apply' supports grouped empty results (@test-spark-apply.R#215)  ─────────────────────────────────
sparklyr worker rscript failure, check worker logs for details
    Log: /var/folders/ks/wm_bx4cn70s6h0r5vgqpsldm0000gn/T//RtmpOXR9Td/file1da775bc76cd_spark.log
18/10/25 15:36:44 ERROR sparklyr: RScript (3929) terminated unexpectedly: java.util.NoSuchElementException
	at org.apache.spark.sql.vectorized.ColumnarBatch$1.next(ColumnarBatch.java:65)

── 17. Failure: debug_string works (@test-spark-utils.R#38)  ──────────────────────────────────────────────────────────
grepl("^\\(1\\)", debug[1]) isn't true.

javierluraschi (Collaborator, Author) commented:

Merged into #1611
