
Hive SQL functions not registered when called through sparklyr #39

Closed
JordanCuevas opened this issue Oct 13, 2018 · 2 comments


JordanCuevas commented Oct 13, 2018

I'm trying to use Hive date functions in sparklyr::sdf_sql to manipulate some data; however, some of them return errors saying that the function is not registered in the database. This only happens after spark-sas7bdat is installed on the cluster. Note that I've also filed a duplicate of this issue with sparklyr, as I'm not sure which team owns this.
Reproducible example below:


> library(sparklyr)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> sc <- spark_connect(method="databricks")
> dat <- data.frame(person=rep(c(1:3),3), measure=rnorm(9))
> src_tbls(sc)
[1] "netprice_092018"        "netprice_42018"         "test_netprice_external"
[4] "test_table"  

> dat_sparkly <- copy_to(sc, dat, "dat_sparkly") # Gives an error, but "dat_sparkly" is still sent to Spark (see next command). Same root cause as the other errors below?
Error : org.apache.spark.sql.AnalysisException: Undefined function: 'count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 (NOTE: If you wish to use SparkR, import it by calling 'library(SparkR)'.)

# copy_to gives an error; however, the table is correctly sent to Spark:
> src_tbls(sc)
[1] "dat_sparkly"            "netprice_092018"        "netprice_42018"        
[4] "test_netprice_external" "test_table"

> sdf_sql(sc, "select * from dat_sparkly") # Works
# Source: spark [?? x 2]
  person measure
*  <int>   <dbl>
1      1  -0.354
2      2  -0.197
3      3  -0.747
4      1   0.118
5      2  -0.742
6      3  -0.430
7      1  -2.55 
8      2   0.886
9      3  -0.713

> sdf_sql(sc, "select current_date from dat_sparkly") # Works
# Source: spark [?? x 1]
  `current_date()`
* <date>
1 2018-10-12      
2 2018-10-12      
3 2018-10-12      
4 2018-10-12      
5 2018-10-12      
6 2018-10-12      
7 2018-10-12      
8 2018-10-12      
9 2018-10-12 

> sdf_sql(sc, "select date_format(current_date,'E') as week from dat_sparkly") # Fails

Error : org.apache.spark.sql.AnalysisException: Undefined function: 'date_format'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7

Session info below:

> devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       Etc/UTC
 date     2018-10-12

Packages ----------------------------------------------------------------------
 package       * version date       source
 assertthat      0.2.0   2017-04-11 CRAN (R 3.4.4)
 backports       1.1.2   2017-12-13 CRAN (R 3.4.4)
 base          * 3.4.4   2018-03-16 local         
 base64enc       0.1-3   2015-07-28 CRAN (R 3.4.4)
 bindr           0.1.1   2018-03-13 CRAN (R 3.4.4)
 bindrcpp        0.2.2   2018-03-29 CRAN (R 3.4.4)
 broom           0.4.4   2018-03-29 CRAN (R 3.4.4)
 cli             1.0.0   2017-11-05 CRAN (R 3.4.4)
 compiler        3.4.4   2018-03-16 local         
 config          0.3     2018-03-27 CRAN (R 3.4.4)
 crayon          1.3.4   2017-09-16 CRAN (R 3.4.4)
 datasets      * 3.4.4   2018-03-16 local         
 DBI             0.8     2018-03-02 CRAN (R 3.4.4)
 dbplyr          1.2.2   2018-07-25 CRAN (R 3.4.4)
 devtools        1.13.5  2018-02-18 CRAN (R 3.4.4)
 digest          0.6.15  2018-01-28 CRAN (R 3.4.4)
 dplyr         * 0.7.4   2017-09-28 CRAN (R 3.4.4)
 foreign         0.8-70  2018-04-23 CRAN (R 3.4.4)
 forge           0.1.0   2018-08-31 CRAN (R 3.4.4)
 glue            1.2.0   2017-10-29 CRAN (R 3.4.4)
 graphics      * 3.4.4   2018-03-16 local         
 grDevices     * 3.4.4   2018-03-16 local         
 grid            3.4.4   2018-03-16 local         
 htmltools       0.3.6   2017-04-28 CRAN (R 3.4.4)
 htmlwidgets     1.3     2018-09-30 CRAN (R 3.4.4)
 httpuv          1.4.5   2018-07-19 CRAN (R 3.4.4)
 httr            1.3.1   2017-08-20 CRAN (R 3.4.4)
 hwriter         1.3.2   2014-09-10 CRAN (R 3.4.4)
 hwriterPlus     1.0-3   2015-01-05 CRAN (R 3.4.4)
 jsonlite        1.5     2017-06-01 CRAN (R 3.4.4)
 later           0.7.5   2018-09-18 CRAN (R 3.4.4)
 lattice         0.20-35 2017-03-25 CRAN (R 3.3.3)
 lazyeval        0.2.1   2017-10-29 CRAN (R 3.4.4)
 magrittr        1.5     2014-11-22 CRAN (R 3.4.4)
 memoise         1.1.0   2017-04-21 CRAN (R 3.4.4)
 methods       * 3.4.4   2018-03-16 local         
 mime            0.5     2016-07-07 CRAN (R 3.4.4)
 mnormt          1.5-5   2016-10-15 CRAN (R 3.4.4)
 nlme            3.1-137 2018-04-07 CRAN (R 3.4.4)
 parallel        3.4.4   2018-03-16 local         
 pillar          1.2.1   2018-02-27 CRAN (R 3.4.4)
 pkgconfig       2.0.1   2017-03-21 CRAN (R 3.4.4)
 plyr            1.8.4   2016-06-08 CRAN (R 3.4.4)
 promises        1.0.1   2018-04-13 CRAN (R 3.4.4)
 psych           1.8.3.3 2018-03-30 CRAN (R 3.4.4)
 purrr           0.2.4   2017-10-18 CRAN (R 3.4.4)
 r2d3            0.2.2   2018-05-30 CRAN (R 3.4.4)
 R6              2.2.2   2017-06-17 CRAN (R 3.4.4)
 Rcpp            0.12.16 2018-03-13 CRAN (R 3.4.4)
 reshape2        1.4.3   2017-12-11 CRAN (R 3.4.4)
 rlang           0.2.0   2018-02-20 CRAN (R 3.4.4)
 rprojroot       1.3-2   2018-01-03 CRAN (R 3.4.4)
 Rserve          1.7-3   2013-08-21 CRAN (R 3.4.4)
 rstudioapi      0.7     2017-09-07 CRAN (R 3.4.4)
 shiny           1.1.0   2018-05-17 CRAN (R 3.4.4)
 sparklyr      * 0.9.1   2018-09-27 CRAN (R 3.4.4)
 SparkR          2.3.1   2018-10-12 local         
 stats         * 3.4.4   2018-03-16 local         
 stringi         1.1.7   2018-03-12 CRAN (R 3.4.4)
 stringr         1.3.0   2018-02-19 CRAN (R 3.4.4)
 TeachingDemos   2.10    2016-02-12 CRAN (R 3.4.4)
 tibble          1.4.2   2018-01-22 CRAN (R 3.4.4)
 tidyr           0.8.0   2018-01-29 CRAN (R 3.4.4)
 tools           3.4.4   2018-03-16 local         
 utf8            1.1.3   2018-01-03 CRAN (R 3.4.4)
 utils         * 3.4.4   2018-03-16 local         
 withr           2.1.2   2018-03-15 CRAN (R 3.4.4)
 xtable          1.8-3   2018-08-29 CRAN (R 3.4.4)
 yaml            2.2.0   2018-07-25 CRAN (R 3.4.4)
@thesuperzapper
Collaborator

@JordanCuevas, can you confirm whether you still have this issue with version 2.1?

@JordanCuevas
Author


Interestingly, after we uninstalled a BigQuery package that was installed on the same cluster, sparklyr has worked as expected, even with spark-sas7bdat still installed.
