I'm trying to use Hive date functions in sparklyr::sdf_sql to manipulate some data; however, some of them fail with an error saying that the function is not registered in the database. This only occurs after installing spark-sas7bdat on the cluster. Note that I've also filed this issue with sparklyr, as I'm not sure which team would own it.
Reproducible example below:
> library(sparklyr)
> library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
    filter, lag
The following objects are masked from ‘package:base’:
    intersect, setdiff, setequal, union
> sc <- spark_connect(method = "databricks")
> dat <- data.frame(person = rep(c(1:3), 3), measure = rnorm(9))
> src_tbls(sc)
[1] "netprice_092018"        "netprice_42018"         "test_netprice_external"
[4] "test_table"
> dat <- data.frame(person = rep(c(1:3), 3), measure = rnorm(9))
> dat_sparkly <- copy_to(sc, dat, "dat_sparkly") # Gives error, but "dat_sparkly" is sent to Spark (see next command). Same root cause as other errors below?
Error: org.apache.spark.sql.AnalysisException: Undefined function: 'count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7 (NOTE: If you wish to use SparkR, import it by calling 'library(SparkR)'.)
# copy_to gives error, however table is correctly sent to Spark:
> src_tbls(sc)
[1] "dat_sparkly"            "netprice_092018"        "netprice_42018"
[4] "test_netprice_external" "test_table"
> sdf_sql(sc, "select * from dat_sparkly") # Works
# Source: spark [?? x 2]
  person measure
*
1      1  -0.354
2      2  -0.197
3      3  -0.747
4      1   0.118
5      2  -0.742
6      3  -0.430
7      1  -2.55
8      2   0.886
9      3  -0.713
> sdf_sql(sc, "select current_date from dat_sparkly") # Works
# Source: spark [?? x 1]
  `current_date()`
*
1 2018-10-12
2 2018-10-12
3 2018-10-12
4 2018-10-12
5 2018-10-12
6 2018-10-12
7 2018-10-12
8 2018-10-12
9 2018-10-12
> sdf_sql(sc, "select date_format(current_date,'E') as week from dat_sparkly") # FAILS
Error: org.apache.spark.sql.AnalysisException: Undefined function: 'date_format'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
@JordanCuevas can you confirm whether you still have this issue with 2.1?
Interestingly, we uninstalled a big query package that was installed on the same cluster, after which sparklyr has been working as expected even when sas7bdat was also installed.