Release sparklyr 1.7.8 · sparklyr/sparklyr

Adds new metric extraction functions: ml_metrics_binary(),
ml_metrics_regression() and ml_metrics_multiclass(). They work much like the
yardstick metric extraction functions: they expect a table with the
predictions and actual values, and return a concise tibble with the
metrics. (#3281)
Adds new spark_insert_table() function. This allows one to insert data into
an existing table definition without redefining the table, even when overwriting
the existing data. (#3272 @jimhester)
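
As a rough illustration of the new metric functions, the sketch below fits a small regression model and passes the resulting predictions table to ml_metrics_regression(). It assumes a working local Spark installation; the table name "mtcars_metrics", the model formula, and the split proportions are arbitrary choices made for this example, not part of the release.

```r
library(sparklyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Copy a small built-in dataset to Spark; the table name is arbitrary.
cars_tbl <- copy_to(sc, mtcars, "mtcars_metrics", overwrite = TRUE)
splits <- sdf_random_split(cars_tbl, training = 0.7, test = 0.3, seed = 42)

# Fit a model, then score the held-out rows. ml_predict() adds a
# `prediction` column next to the actual `mpg` values.
model <- ml_linear_regression(splits$training, mpg ~ wt + hp)
predictions <- ml_predict(model, splits$test)

# `truth` names the column with actual values; `estimate` is left at its
# default, which points at the `prediction` column.
metrics <- ml_metrics_regression(predictions, truth = mpg)
print(metrics)

spark_disconnect(sc)
```

The result is a small local tibble with one row per metric, so it can be bound together with results from other models for comparison.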

Restores "validator" functions to regression models. Removing them in a previous
version broke ml_cross_validator() for regression models. (#3273)

Adds support for local installation of Spark 3.3. This includes the ability to
enable and set up log4j version 2. (#3269)
Updates the JSON file that sparklyr uses to find and download Spark for
local use. Starting with Spark 3.3, the download link no longer includes a
minor version in the Hadoop version number. So, instead of requesting 3.2,
the version to request is 3.
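
The naming change can be sketched in a few lines of base R. Both helpers below are hypothetical and written only for this illustration (they are not sparklyr functions), and they cover only the 3.2.x versus 3.3+ contrast described above.

```r
# Hypothetical helper, not part of sparklyr: pick the Hadoop label used in
# the download name. Only distinguishes Spark 3.3+ from Spark 3.2.x.
hadoop_label <- function(spark_version) {
  if (numeric_version(spark_version) >= "3.3") "3" else "3.2"
}

# Hypothetical helper: assemble the archive file name for a Spark version.
spark_tarball <- function(spark_version) {
  sprintf(
    "spark-%s-bin-hadoop%s.tgz",
    spark_version,
    hadoop_label(spark_version)
  )
}

old_name <- spark_tarball("3.2.1")  # "spark-3.2.1-bin-hadoop3.2.tgz"
new_name <- spark_tarball("3.3.0")  # "spark-3.3.0-bin-hadoop3.tgz"
```

In practice this means a call along the lines of spark_install(version = "3.3", hadoop_version = "3") rather than passing "3.2" as the Hadoop version.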

Removes workaround for older versions of arrow. Bumps the arrow version
dependency from 0.14.0 to 0.17.0. (#3283 @nealrichardson)
Removes code related to backwards compatibility with dbplyr. sparklyr
requires dbplyr version 2.2.1 or above, so the code is no longer needed.
(#3277)
Begins centralizing ML parameter validation into a single function that will
run the proper cast function for each Spark parameter. It also starts using
S3 methods, instead of searching for a concatenated function name, to find the
proper parameter validator. Regression models are the first ones to use this
new method. (#3279)
sparklyr compilation routines have been improved and simplified.
spark_compile() now provides more informative output when used. Tests for
compilation have been added, along with a step that installs Scala in the
corresponding GHAs so that the new JAR build tests are able to run.
(#3275)
Stops using package environment variables directly. Any package-level variable
will be handled by a genv-prefixed function to set and retrieve values. This
avoids the risk of having the exact same variable initialized in more than one
R script. (#3274)
Adds more tests to improve coverage.
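
The parameter validation change listed above, from looking up a concatenated function name to S3 dispatch, can be pictured with a small base-R sketch. Every name here (validate_by_name(), validate_params(), the "linear_regression" class) is hypothetical and chosen for illustration; none of it is sparklyr's internal code.

```r
# Old style (illustrative): build a function name from the model type,
# then look that function up at run time.
validate_args_linear_regression <- function(args) {
  args$max_iter <- as.integer(args$max_iter)
  args
}

validate_by_name <- function(model_type, args) {
  fn_name <- paste0("validate_args_", model_type)
  fn <- get0(fn_name, mode = "function")
  if (is.null(fn)) stop("No validator named ", fn_name)
  fn(args)
}

# New style (illustrative): a generic dispatches on the class of the
# argument list, so no name needs to be assembled by hand.
validate_params <- function(args, ...) UseMethod("validate_params")

validate_params.default <- function(args, ...) {
  stop("No validator for class ", class(args)[1])
}

validate_params.linear_regression <- function(args, ...) {
  # Cast each parameter to the type the Spark side is assumed to expect.
  args$max_iter <- as.integer(args$max_iter)
  args$reg_param <- as.numeric(args$reg_param)
  args
}

args <- structure(
  list(max_iter = 100, reg_param = 0L),
  class = "linear_regression"
)

by_name <- validate_by_name("linear_regression", unclass(args))
by_s3 <- validate_params(args)
```

With dispatch, an unsupported model falls through to the default method and fails with a clear message, and adding a new model only requires defining one more method.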

Addresses new CRAN HTML check NOTEs. It also adds a new GHA to run the
same checks, so that similar issues are caught before they reach CRAN.
