Hive date functions throw errors in sdf_sql #1711

Open
JordanCuevas opened this issue Oct 12, 2018 · 3 comments
@JordanCuevas

I'm trying to use Hive date functions in sdf_sql to manipulate some data, but some of them fail with an error saying the function is not registered in the database. Reproducible example below:


>library(sparklyr)
>library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

>sc <- spark_connect(method="databricks")
>dat <- data.frame(person=rep(c(1:3),3), measure=rnorm(9))
>src_tbls(sc)
[1] "netprice_092018"        "netprice_42018"         "test_netprice_external"
[4] "test_table"  

>dat <- data.frame(person=rep(c(1:3),3), measure=rnorm(9))
>dat_sparkly <- copy_to(sc, dat, "dat_sparkly") #Gives error, but "dat_sparkly" is sent to Spark (see next command). Same root cause as other errors below?
Error : org.apache.spark.sql.AnalysisException: Undefined function: 'count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 (NOTE: If you wish to use SparkR, import it by calling 'library(SparkR)'.)

#copy_to gives error, however table is correctly sent to Spark:
>src_tbls(sc)
[1] "dat_sparkly"            "netprice_092018"        "netprice_42018"        
[4] "test_netprice_external" "test_table"

>sdf_sql(sc, "select * from dat_sparkly") #Works
# Source: spark [?? x 2]
  person measure
*  <int>   <dbl>
1      1  -0.354
2      2  -0.197
3      3  -0.747
4      1   0.118
5      2  -0.742
6      3  -0.430
7      1  -2.55 
8      2   0.886
9      3  -0.713

>sdf_sql(sc, "select current_date from dat_sparkly") #Works
# Source: spark [?? x 1]
  `current_date()`
* <date>
1 2018-10-12      
2 2018-10-12      
3 2018-10-12      
4 2018-10-12      
5 2018-10-12      
6 2018-10-12      
7 2018-10-12      
8 2018-10-12      
9 2018-10-12 

>sdf_sql(sc, "select date_format(current_date,'E') as week from dat_sparkly") #FAILS

Error : org.apache.spark.sql.AnalysisException: Undefined function: 'date_format'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
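
A possible diagnostic for the errors above, offered as a sketch rather than a confirmed fix: ask the Spark session which functions its catalog can resolve. It assumes `sc` is the connection created earlier and that sparklyr's DBI interface is usable on this cluster. `SHOW FUNCTIONS` is standard Spark SQL. If built-ins such as `count` or `date_format` are missing from the result, that would point at the session's function registry rather than at the query text.

```r
library(sparklyr)
library(DBI)

# Assumption: `sc` is the connection from spark_connect(method = "databricks") above.

# Check whether the function from the failing query is visible to this session.
dbGetQuery(sc, "SHOW FUNCTIONS LIKE 'date_format'")

# Same check for the built-in that copy_to complained about.
dbGetQuery(sc, "SHOW FUNCTIONS LIKE 'count'")

# For comparison, a function that did resolve in the working query.
dbGetQuery(sc, "SHOW FUNCTIONS LIKE 'current_date'")
```

Comparing the three results on an affected cluster and on a freshly started one may help narrow down what changed.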

Session info below:

>devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.4 (2018-03-15)
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       Etc/UTC                     
 date     2018-10-12                  

Packages ----------------------------------------------------------------------
 package       * version date       source        
 assertthat      0.2.0   2017-04-11 CRAN (R 3.4.4)
 backports       1.1.2   2017-12-13 CRAN (R 3.4.4)
 base          * 3.4.4   2018-03-16 local         
 base64enc       0.1-3   2015-07-28 CRAN (R 3.4.4)
 bindr           0.1.1   2018-03-13 CRAN (R 3.4.4)
 bindrcpp        0.2.2   2018-03-29 CRAN (R 3.4.4)
 broom           0.4.4   2018-03-29 CRAN (R 3.4.4)
 cli             1.0.0   2017-11-05 CRAN (R 3.4.4)
 compiler        3.4.4   2018-03-16 local         
 config          0.3     2018-03-27 CRAN (R 3.4.4)
 crayon          1.3.4   2017-09-16 CRAN (R 3.4.4)
 datasets      * 3.4.4   2018-03-16 local         
 DBI             0.8     2018-03-02 CRAN (R 3.4.4)
 dbplyr          1.2.2   2018-07-25 CRAN (R 3.4.4)
 devtools        1.13.5  2018-02-18 CRAN (R 3.4.4)
 digest          0.6.15  2018-01-28 CRAN (R 3.4.4)
 dplyr         * 0.7.4   2017-09-28 CRAN (R 3.4.4)
 foreign         0.8-70  2018-04-23 CRAN (R 3.4.4)
 forge           0.1.0   2018-08-31 CRAN (R 3.4.4)
 glue            1.2.0   2017-10-29 CRAN (R 3.4.4)
 graphics      * 3.4.4   2018-03-16 local         
 grDevices     * 3.4.4   2018-03-16 local         
 grid            3.4.4   2018-03-16 local         
 htmltools       0.3.6   2017-04-28 CRAN (R 3.4.4)
 htmlwidgets     1.3     2018-09-30 CRAN (R 3.4.4)
 httpuv          1.4.5   2018-07-19 CRAN (R 3.4.4)
 httr            1.3.1   2017-08-20 CRAN (R 3.4.4)
 hwriter         1.3.2   2014-09-10 CRAN (R 3.4.4)
 hwriterPlus     1.0-3   2015-01-05 CRAN (R 3.4.4)
 jsonlite        1.5     2017-06-01 CRAN (R 3.4.4)
 later           0.7.5   2018-09-18 CRAN (R 3.4.4)
 lattice         0.20-35 2017-03-25 CRAN (R 3.3.3)
 lazyeval        0.2.1   2017-10-29 CRAN (R 3.4.4)
 magrittr        1.5     2014-11-22 CRAN (R 3.4.4)
 memoise         1.1.0   2017-04-21 CRAN (R 3.4.4)
 methods       * 3.4.4   2018-03-16 local         
 mime            0.5     2016-07-07 CRAN (R 3.4.4)
 mnormt          1.5-5   2016-10-15 CRAN (R 3.4.4)
 nlme            3.1-137 2018-04-07 CRAN (R 3.4.4)
 parallel        3.4.4   2018-03-16 local         
 pillar          1.2.1   2018-02-27 CRAN (R 3.4.4)
 pkgconfig       2.0.1   2017-03-21 CRAN (R 3.4.4)
 plyr            1.8.4   2016-06-08 CRAN (R 3.4.4)
 promises        1.0.1   2018-04-13 CRAN (R 3.4.4)
 psych           1.8.3.3 2018-03-30 CRAN (R 3.4.4)
 purrr           0.2.4   2017-10-18 CRAN (R 3.4.4)
 r2d3            0.2.2   2018-05-30 CRAN (R 3.4.4)
 R6              2.2.2   2017-06-17 CRAN (R 3.4.4)
 Rcpp            0.12.16 2018-03-13 CRAN (R 3.4.4)
 reshape2        1.4.3   2017-12-11 CRAN (R 3.4.4)
 rlang           0.2.0   2018-02-20 CRAN (R 3.4.4)
 rprojroot       1.3-2   2018-01-03 CRAN (R 3.4.4)
 Rserve          1.7-3   2013-08-21 CRAN (R 3.4.4)
 rstudioapi      0.7     2017-09-07 CRAN (R 3.4.4)
 shiny           1.1.0   2018-05-17 CRAN (R 3.4.4)
 sparklyr      * 0.9.1   2018-09-27 CRAN (R 3.4.4)
 SparkR          2.3.1   2018-10-12 local         
 stats         * 3.4.4   2018-03-16 local         
 stringi         1.1.7   2018-03-12 CRAN (R 3.4.4)
 stringr         1.3.0   2018-02-19 CRAN (R 3.4.4)
 TeachingDemos   2.10    2016-02-12 CRAN (R 3.4.4)
 tibble          1.4.2   2018-01-22 CRAN (R 3.4.4)
 tidyr           0.8.0   2018-01-29 CRAN (R 3.4.4)
 tools           3.4.4   2018-03-16 local         
 utf8            1.1.3   2018-01-03 CRAN (R 3.4.4)
 utils         * 3.4.4   2018-03-16 local         
 withr           2.1.2   2018-03-15 CRAN (R 3.4.4)
 xtable          1.8-3   2018-08-29 CRAN (R 3.4.4)
 yaml            2.2.0   2018-07-25 CRAN (R 3.4.4)
@JordanCuevas
Author

We just started a new cluster and reinstalled sparklyr, and the issue no longer occurs (not sure why). I will close this issue.

@JordanCuevas
Author

Reopening because the issue reappeared after we installed the spark-sas7bdat package. I will open an issue with them as well.

@kevinykuo
Collaborator

I can't repro this with a local connection, so I suspect it is specific to Databricks.

cc @falaki
