
[TEST] Run all Travis tests with Arrow enabled #1727

Closed
javierluraschi wants to merge 96 commits

Conversation


javierluraschi (Collaborator) commented on Oct 22, 2018

Temp PR to investigate running all Travis tests with Arrow enabled.
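
For context, a minimal sketch of the setup these runs exercise; this assumes the arrow R package is installed and that attaching it is what switches sparklyr's copy/collect path over to Arrow (the behavior under test here), so treat it as an illustration rather than the exact Travis configuration:

library(arrow)      # attaching arrow enables sparklyr's Arrow serialization path
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")             # local connection, as on Travis
iris_tbl <- copy_to(sc, iris, overwrite = TRUE)   # copied as Arrow record batches
collect(iris_tbl)                                 # collected through arrow_read_stream()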

══ testthat results  ══════════════════════════════════════════════════════════════════════════════════════════════════
OK: 1848 SKIPPED: 28 FAILED: 17
1. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#135) 
2. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#139) 
3. Failure: gaussian_mixture.tidy() works (@test-broom-gaussian_mixture.R#14) 
4. Failure: kmeans.tidy() works (@test-broom-kmeans.R#14) 
5. Failure: random_forest.tidy() works (@test-broom-random_forest.R#15) 
6. Failure: random_forest.tidy() works (@test-broom-random_forest.R#24) 
7. Error: we can separate struct columns (#690) (@test-column-extraction.R#92) 
8. Failure: top_n works as expected (@test-dplyr-top-n.R#23) 
9. Error: ml_recommend() works (@test-ml-als.R#44) 
10. ...

Some of the failures...

── 1. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#135)  ──────────────────────────────
bind_rows(df5a, df6a) not equal to sdf_bind_rows(df5a_tbl, df6a_tbl) %>% collect().
Rows in x but not y: 2. Rows in y but not x: 2. 

── 2. Failure: 'sdf_bind_rows' handles column type upcasting (#804) (@test-binds.R#139)  ──────────────────────────────
bind_rows(df6a, df5a) not equal to sdf_bind_rows(df6a_tbl, df5a_tbl) %>% collect().
Rows in x but not y: 4. Rows in y but not x: 4. 

── 3. Failure: gaussian_mixture.tidy() works (@test-broom-gaussian_mixture.R#14)  ─────────────────────────────────────
td1$size not equal to c(3, 14, 4, 11).
4/4 mismatches (average diff: 3)
[1]  0 -  3 == -3
[2] 11 - 14 == -3
[3]  7 -  4 ==  3
[4] 14 - 11 ==  3

── 4. Failure: kmeans.tidy() works (@test-broom-kmeans.R#14)  ─────────────────────────────────────────────────────────
td1$size not equal to c(14, 2, 6, 10).
4/4 mismatches (average diff: 4.5)
[1] 10 - 14 == -4
[2]  3 -  2 ==  1
[3] 14 -  6 ==  8
[4]  5 - 10 == -5

── 5. Failure: random_forest.tidy() works (@test-broom-random_forest.R#15)  ───────────────────────────────────────────
td1$importance not equal to c(0.941, 0.0586).
2/2 mismatches (average diff: 0.00487)
[1] 0.9363 - 0.9410 == -0.00467
[2] 0.0637 - 0.0586 ==  0.00507

── 6. Failure: random_forest.tidy() works (@test-broom-random_forest.R#24)  ───────────────────────────────────────────
td2$importance not equal to c(0.658, 0.342).
2/2 mismatches (average diff: 0.0163)
[1] 0.642 - 0.658 == -0.0163
[2] 0.358 - 0.342 ==  0.0163

── 7. Error: we can separate struct columns (#690) (@test-column-extraction.R#92)  ────────────────────────────────────
cannot handle Array of type struct
1: sliding_window_sdf %>% sdf_separate_column("sw") at testthat/test-column-extraction.R:92
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
5: `_fseq`(`_lhs`)
6: freduce(value, `_function_list`)
7: withVisible(function_list[[k]](value))
8: function_list[[k]](value)
9: sdf_separate_column(., "sw")
10: x %>% head(1) %>% dplyr::pull(!!rlang::sym(column)) %>% rlang::flatten() %>% length() %>% seq_len(.) at /Users/javierluraschi/RStudio/sparklyr/R/sdf_wrapper.R:293
...
32: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
33: eval(quote(`_fseq`(`_lhs`)), env, env)
34: eval(quote(`_fseq`(`_lhs`)), env, env)
35: `_fseq`(`_lhs`)
36: freduce(value, `_function_list`)
37: function_list[[i]](value)
38: arrow_read_stream(.)
39: tibble::as_tibble(record_entry) at /Users/javierluraschi/RStudio/sparklyr/R/arrow_data.R:37
40: `as_tibble.arrow::RecordBatch`(record_entry)
41: RecordBatch__to_dataframe(x) at /Users/javierluraschi/RStudio/arrow/r/R/RecordBatch.R:59

── 8. Failure: top_n works as expected (@test-dplyr-top-n.R#23)  ──────────────────────────────────────────────────────
all(tn1 == tn2) isn't true.

── 9. Error: ml_recommend() works (@test-ml-als.R#44)  ────────────────────────────────────────────────────────────────
cannot handle Array of type struct
1: expect_identical(als_model %>% ml_recommend("users", 2) %>% colnames(), c("item", "recommendations", "user", "rating")) at testthat/test-ml-als.R:44
2: quasi_label(enquo(object), label)
3: eval_bare(get_expr(quo), get_env(quo))
4: als_model %>% ml_recommend("users", 2) %>% colnames()
5: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
6: eval(quote(`_fseq`(`_lhs`)), env, env)
7: eval(quote(`_fseq`(`_lhs`)), env, env)
8: `_fseq`(`_lhs`)
9: freduce(value, `_function_list`)
10: function_list[[i]](value)
...
43: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
44: eval(quote(`_fseq`(`_lhs`)), env, env)
45: eval(quote(`_fseq`(`_lhs`)), env, env)
46: `_fseq`(`_lhs`)
47: freduce(value, `_function_list`)
48: function_list[[i]](value)
49: arrow_read_stream(.)
50: tibble::as_tibble(record_entry) at /Users/javierluraschi/RStudio/sparklyr/R/arrow_data.R:37
51: `as_tibble.arrow::RecordBatch`(record_entry)
52: RecordBatch__to_dataframe(x) at /Users/javierluraschi/RStudio/arrow/r/R/RecordBatch.R:59
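
The ml_recommend() failure is the same struct limitation surfacing through an ML helper: the recommendations column is an array of structs, so pulling it back through arrow_read_stream() fails. A hedged sketch (the ratings data frame is a hypothetical stand-in for the test fixture):

ratings <- data.frame(user = c(1, 1, 2), item = c(1, 2, 1), rating = c(3, 1, 4))
ratings_tbl <- copy_to(sc, ratings, overwrite = TRUE)

als_model <- ml_als(ratings_tbl, rating_col = "rating", user_col = "user", item_col = "item")
als_model %>% ml_recommend("users", 2) %>% colnames()   # fails with the same Arrow struct error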

── 10. Failure: ml_bisecting_kmeans() works properly (@test-ml-clustering-bisecting-kmeans.R#255)  ────────────────────
`print(ml_bisecting_kmeans(iris_tbl, ~. - Species, k = 5, seed = 11))` has changed from known value recorded in 'output/print/bisecting-kmeans.txt'.
3/11 mismatches
x[5]: "1     4.803226    3.225806     1.419355   0.2096774"
y[5]: "1     4.750000    3.012500     1.666667   0.3166667"

x[6]: "2     5.290909    3.572727     1.759091   0.4045455"
y[6]: "2     5.217241    3.665517     1.472414   0.2689655"

x[11]: "Within Set Sum of Squared Errors =  63.01642"
y[11]: "Within Set Sum of Squared Errors =  60.60717"

── 11. Failure: we can construct a simple pivot table (@test-pivot.R#22)  ─────────────────────────────────────────────
unname(s) not equal to unname(r).
Component 2: Attributes: < Modes: list, NULL >
Component 2: Attributes: < Lengths: 1, 0 >
Component 2: Attributes: < names for target but not for current >
Component 2: Attributes: < current is not list-like >
Component 2: target is integer64, current is numeric
Component 3: Attributes: < Modes: list, NULL >
Component 3: Attributes: < Lengths: 1, 0 >
Component 3: Attributes: < names for target but not for current >
Component 3: Attributes: < current is not list-like >
...

── 12. Error: spark_read_json() can load data using column types (@test-read-write.R#49)  ─────────────────────────────
cannot handle Array of type struct
1: spark_read_json(sc, name = "iris_json_typed", path = "test.json", columns = list(Sepal_Length = "character", Species = "character", 
       Other = "struct<a:integer,b:character>")) %>% collect() at testthat/test-read-write.R:49
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3: eval(quote(`_fseq`(`_lhs`)), env, env)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
5: `_fseq`(`_lhs`)
6: freduce(value, `_function_list`)
7: withVisible(function_list[[k]](value))
8: function_list[[k]](value)
9: collect(.)
10: collect.tbl_sql(.)
...
22: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
23: eval(quote(`_fseq`(`_lhs`)), env, env)
24: eval(quote(`_fseq`(`_lhs`)), env, env)
25: `_fseq`(`_lhs`)
26: freduce(value, `_function_list`)
27: function_list[[i]](value)
28: arrow_read_stream(.)
29: tibble::as_tibble(record_entry) at /Users/javierluraschi/RStudio/sparklyr/R/arrow_data.R:37
30: `as_tibble.arrow::RecordBatch`(record_entry)
31: RecordBatch__to_dataframe(x) at /Users/javierluraschi/RStudio/arrow/r/R/RecordBatch.R:59

── 13. Error: (unknown) (@test-serialization.R#7)  ────────────────────────────────────────────────────────────────────
org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 25364.0 failed 1 times, most recent failure: Lost task 7.0 in stage 25364.0 (TID 132972, localhost, executor driver): java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND)

── 14. Error: 'spark_apply' can add columns (@test-spark-apply.R#28)  ─────────────────────────────────────────────────
cannot handle Array of type decimal
1: expect_equal(iris_tbl %>% spark_apply(function(e) cbind(e, 1), names = c(colnames(iris_tbl), "new")) %>% collect(), iris_tbl

── 15. Error: 'spark_apply' can roundtrip Date-Time (@test-spark-apply.R#188)  ────────────────────────────────────────
sparklyr worker rscript failure, check worker logs for details
    Log: /var/folders/ks/wm_bx4cn70s6h0r5vgqpsldm0000gn/T//RtmpOXR9Td/file1da775bc76cd_spark.log
18/10/25 15:36:32 ERROR sparklyr: RScript (7555) terminated unexpectedly: java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND)
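
Failures 13 and 15 both reduce to java.lang.UnsupportedOperationException: Unsupported data type: Date(MILLISECOND), i.e. Date columns are not yet handled on the Arrow path. A hedged sketch of the roundtrip being tested (the data frame is a hypothetical stand-in for the serialization fixtures):

dates <- data.frame(d = as.Date(c("2018-10-01", "2018-10-02")))
dates_tbl <- copy_to(sc, dates, overwrite = TRUE)            # the copy itself goes through Arrow
dates_tbl %>% spark_apply(function(e) e) %>% collect()       # identity roundtrip through the R worker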

── 16. Error: 'spark_apply' supports grouped empty results (@test-spark-apply.R#215)  ─────────────────────────────────
sparklyr worker rscript failure, check worker logs for details
    Log: /var/folders/ks/wm_bx4cn70s6h0r5vgqpsldm0000gn/T//RtmpOXR9Td/file1da775bc76cd_spark.log
18/10/25 15:36:44 ERROR sparklyr: RScript (3929) terminated unexpectedly: java.util.NoSuchElementException
	at org.apache.spark.sql.vectorized.ColumnarBatch$1.next(ColumnarBatch.java:65)

── 17. Failure: debug_string works (@test-spark-utils.R#38)  ──────────────────────────────────────────────────────────
grepl("^\\(1\\)", debug[1]) isn't true.

javierluraschi (Collaborator, Author) commented:

Merged into #1611
