Databricks connection should be using SparkSession not HiveContext #1921

MrBago · 2019-02-20T21:31:09Z

It seems that sparklyr expects sc$state$hive_context to be a SparkSession for Spark 2.x. This PR updates "hive_context" to use the same SparkSession as the notebook or RStudio environment.

This PR brings the Databricks connection in line with what is done for spark_shell_connection: https://github.com/rstudio/sparklyr/blob/e05190e953d313483660af6a2c6a16f4ae50fb86/R/shell_connection.R#L595-L608

falaki · 2019-02-20T21:48:52Z

@javierluraschi we tested this on Databricks, and it seems to resolve the issue our customers observed when using ml_load() in multiple notebooks.

Use spark-session for sc.

79b571a

javierluraschi merged commit 90466eb into sparklyr:master Feb 20, 2019
