sparklyr 1.7.8
New features
- Adds new metric extraction functions: ml_metrics_binary(),
ml_metrics_regression() and ml_metrics_multiclass(). They work similarly to
the yardstick metric extraction functions. They expect a table with the
predictions and actual values, and return a concise tibble with the metrics.
(#3281)
- Adds a new spark_insert_table() function. This allows one to insert data into
an existing table definition without redefining the table, even when overwriting
the existing data. (#3272 @jimhester)
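
As a rough illustration of the two additions above, the sketch below scores a regression model with ml_metrics_regression() and then appends rows to an existing table with spark_insert_table(). It assumes a local Spark installation is available. The table name iris_saved is hypothetical, and the name and mode arguments passed to spark_insert_table() are written from the function's documented signature as best understood here, so check them against the package reference before relying on them.

```r
library(sparklyr)
library(dplyr)

# Assumes Spark is already installed locally (see spark_install())
sc <- spark_connect(master = "local")

# copy_to() replaces dots in column names, so Sepal.Length becomes Sepal_Length
tbl_iris <- copy_to(sc, iris, overwrite = TRUE)
iris_split <- sdf_random_split(tbl_iris, training = 0.5, test = 0.5, seed = 100)

# Fit a regression model and score the held-out rows
model <- ml_linear_regression(
  iris_split$training,
  Sepal_Length ~ Sepal_Width + Petal_Length + Petal_Width
)
tbl_predictions <- ml_predict(model, iris_split$test)

# Pass the column holding the actual values; the result is a tibble of metrics
tbl_predictions %>%
  ml_metrics_regression(truth = Sepal_Length)

# Create a table once, then insert more rows without redefining it.
# "iris_saved" is a hypothetical table name used only for this example.
spark_write_table(tbl_iris, "iris_saved", mode = "overwrite")
spark_insert_table(tbl_iris, "iris_saved", mode = "append")

spark_disconnect(sc)
```

ml_metrics_binary() and ml_metrics_multiclass() are expected to follow the same pattern: pipe in the table of predictions and name the column with the actual values.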
Bug Fixes
- Restores "validator" functions to regression models. Removing them in a previous
version broke ml_cross_validator() for regression models. (#3273)
Spark
- Adds support for Spark 3.3 local installation. This includes the ability to
enable and set up log4j version 2. (#3269)
- Updates the JSON file that sparklyr uses to find and download Spark for
local use. Starting with Spark 3.3, the Hadoop version number in the download
link no longer includes a minor version. So, instead of requesting 3.2, the
version to request is 3.
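
A minimal sketch of what the Hadoop version change means in practice, using spark_install() and spark_connect(). The exact patch versions shown (3.3.0 and 3.2.1) are assumptions chosen for illustration; spark_available_versions() lists what can actually be requested.

```r
library(sparklyr)

# From Spark 3.3 onward, request the Hadoop major version only
spark_install(version = "3.3.0", hadoop_version = "3")

# For earlier Spark releases the Hadoop minor version is still part of the request
# spark_install(version = "3.2.1", hadoop_version = "3.2")

# Connect to the freshly installed local Spark
sc <- spark_connect(master = "local", version = "3.3.0")
spark_disconnect(sc)
```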
Internal functionality
- Removes workaround for older versions of arrow. Bumps the arrow version
dependency from 0.14.0 to 0.17.0. (#3283 @nealrichardson)
- Removes code related to backwards compatibility with dbplyr. sparklyr
requires dbplyr version 2.2.1 or above, so the code is no longer needed.
(#3277)
- Begins centralizing ML parameter validation into a single function that will
run the proper cast function for each Spark parameter. It also starts using
S3 methods, instead of searching for a concatenated function name, to find the
proper parameter validator. Regression models are the first ones to use this
new method. (#3279)
- sparklyr compilation routines have been improved and simplified.
spark_compile() now provides more informative output when used. This change
also adds tests for compilation, and a step to install Scala in the
corresponding GHAs so that the new JAR build tests are able to run. (#3275)
- Stops using package environment variables directly. Any package level variable
will be handled by a genv prefixed function to set and retrieve values. This
avoids the risk of having the exact same variable initialized in more than one
R script. (#3274)
- Adds more tests to improve coverage.
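
Two of the internal changes above are easier to picture with a small base R sketch: finding a validator through S3 dispatch rather than a concatenated function name, and wrapping package level variables in genv prefixed accessors. Every name below (validator_linear_regression, validate_params, genv_set_example, and so on) is hypothetical and written only for illustration; none of them are taken from the sparklyr source.

```r
# All names here are hypothetical; this is not sparklyr's internal code.

# Lookup by concatenated name: build a function name as a string and fetch it
validator_linear_regression <- function(args) {
  args$max_iter <- as.integer(args$max_iter)
  args
}
validate_by_name <- function(model_class, args) {
  fn <- get(paste0("validator_", model_class), mode = "function")
  fn(args)
}

# S3 approach: one generic, with the method chosen from the class of `args`
validate_params <- function(args, ...) UseMethod("validate_params")
validate_params.default <- function(args, ...) args
validate_params.linear_regression <- function(args, ...) {
  args$max_iter <- as.integer(args$max_iter)    # cast to the type Spark expects
  args$reg_param <- as.numeric(args$reg_param)
  args
}

args <- structure(
  list(max_iter = 100, reg_param = 0L),
  class = "linear_regression"
)
validated <- validate_params(args)

# Package level state kept in one private environment, reached only
# through genv prefixed setter and getter functions
.genv <- new.env(parent = emptyenv())
genv_set_example <- function(value) assign("example", value, envir = .genv)
genv_get_example <- function() get0("example", envir = .genv, inherits = FALSE)

genv_set_example("local")
current <- genv_get_example()
```

With S3 dispatch, a model class without its own method falls through to the default method instead of failing on a missing function name, and the accessor pair gives each variable a single place where it is created and read.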
Misc
- Addresses new CRAN HTML check NOTEs. It also adds a new GHA that runs the
same checks, to avoid new issues of this kind in the future.