error with nested fields in spark_apply #3242

Open
bhogan-mitre opened this issue Mar 25, 2022 · 0 comments

I'm running into issues with spark_apply and nested columns. For example, the snippet below produces the following error:

Error: org.apache.spark.sql.AnalysisException: cannot resolve 'from_json(vals)' due to data type mismatch: Input schema bigint must be a struct, an array or a map.;
'Project [a#152, b#153, from_json(LongType, vals#154, Some(America/New_York)) AS vals#201, d#155]

I'm curious where the Some(America/New_York) piece comes from given that this is an array of integers.

The error appears to be an issue with serialization of nested columns, vals in this case, even though spark_apply is just passing that column through and not trying to operate on it. The NA value in the field b that is used in the calculation seems to trigger the issue.

library(sparklyr)
library(dplyr)

spark_version <- "3.2.1"
sc <- spark_connect(master = "local", version = spark_version)

# b contains an NA in the first row; this appears to be what triggers the failure
tribble(
  ~a, ~b,           ~c,
   1,  NA_integer_,  1,
   1,  1,            2,
   1,  1,            3,
   2,  2,            1,
   2,  2,            2,
) %>% 
  copy_to(sc, df = ., name = "test_sdf1", overwrite = TRUE) %>% 
  group_by(a, b) %>%
  # vals becomes an array<bigint> column on the Spark side
  summarise(vals = collect_list(c), .groups = "drop") %>% 
  arrange(a, b) %>% 
  # the closure only adds d = b * 2; vals is passed through untouched
  spark_apply(
    function(df) {
      library(dplyr)
      
      df %>% 
        mutate(
          d = b * 2
        )
    }
  )
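For reference, the schema of the grouped table going into spark_apply can be checked with sdf_schema(). A sketch, assuming the summarised table above is first stored in a variable (grouped_sdf is a hypothetical name):

```r
# grouped_sdf = everything in the pipeline above up to (but not including)
# the spark_apply() call
sdf_schema(grouped_sdf)
# On the Spark side vals is an array column, yet the AnalysisException shows
# spark_apply reading it back with from_json(LongType, vals, ...), i.e. a plain
# bigint schema, which from_json rejects.
```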

[Screenshot (2022-03-25, 10:46 AM): spark_apply output showing the AnalysisException]

On the other hand, the same calculation without the NA value runs through okay.

[Screenshot (2022-03-25, 4:44 PM): the same pipeline succeeding without the NA value]
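One possible workaround (untested here, and the column types are assumptions): spark_apply() accepts a columns argument that declares the output schema explicitly, which skips schema inference on the result and may sidestep the bad from_json schema for the passthrough array column:

```r
# grouped_sdf is a hypothetical name for the summarised table above
spark_apply(
  grouped_sdf,
  function(df) {
    library(dplyr)
    df %>% mutate(d = b * 2)
  },
  # Declare the output schema up front instead of letting spark_apply
  # infer it; the types listed here are assumptions.
  columns = list(
    a = "double",
    b = "integer",
    vals = "array<bigint>",
    d = "integer"
  )
)
```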
