Add ft_robust_scaler #2254
It seems like the build on master is older than apache/spark@bb47870.
`if (is_ml_transformer(stage))`
It doesn't seem like this can ever happen, but since this pattern is repeated all over the code base, I'll keep it for consistency.
8b73595 to ba49785
Databricks Connect tests failed.
Signed-off-by: zero323 <mszymkiewicz@gmail.com>
@zero323 Nice! Looks like you fixed the build failure and the change looks good to me. Would you mind signing off your 3 commits (by running …)? Once you have signed off all your commits, the "DCO check" should pass, and then we can merge your commits into master.
@zero323 ^^ Nah, actually never mind. No need to rebase manually. I'll just append the correct sign-off-by field to your commit message, which would be easier. Thanks for contributing to
Add fr_robust_scaler Signed-off-by: zero323 <mszymkiewicz@gmail.com>
* fix typo in test-dplyr-hof.R (#2688) Signed-off-by: Yitao Li <yitao@rstudio.com>
* implement `pcre_to_java` and all relevant test cases Signed-off-by: Yitao Li <yitao@rstudio.com>
* support POSIX char classes in `sep` parameter of separate.tbl_spark Signed-off-by: Yitao Li <yitao@rstudio.com>
* support deterministic sampling outcomes for dplyr::sample_* on Spark dataframes (#2689) Signed-off-by: Yitao Li <yitao@rstudio.com>
* NEWS.md update for sparklyr 1.4 release (#2693) Signed-off-by: Yitao Li <yitao@rstudio.com>
* also mention `grepl` support in `dplyr` (#2695) Signed-off-by: Yitao Li <yitao@rstudio.com>
* implement tidyr::fill functionality for Spark data frame (#2691) Signed-off-by: Yitao Li <yitao@rstudio.com>
* update default Spark version for spark_install() (#2696) Signed-off-by: Yitao Li <yitao@rstudio.com>
* make `dplyr_sample_*` work with `ft_dplyr_transformer` (#2698) Signed-off-by: Yitao Li <yitao@rstudio.com>
* support `ptype` specification in `unnest.tbl_spark` (#2700) Signed-off-by: Yitao Li <yitao@rstudio.com>
* fix tidyr-unnest requirement (#2702) Signed-off-by: Yitao Li <yitao@rstudio.com>
* update sparklyr_livy_branch (#2704) Signed-off-by: Yitao Li <yitao@rstudio.com>
* fix reexports.R (#2706) Signed-off-by: Yitao Li <yitao@rstudio.com>
* prepare for sparklyr 1.4.0 release (#2709) Signed-off-by: Yitao Li <yitao@rstudio.com>
* Add ft_robust_scaler (#2254) Add fr_robust_scaler Signed-off-by: zero323 <mszymkiewicz@gmail.com>
* sdf_quantile() handles multiple columns (#2716) sdf_quantile() handles multiple columns Signed-off-by: wkdavis <william.davis@worthingtonindustries.com>
* fix warnings from --as-cran checks (#2715) minor changes to fix warnings from CRAN-related checks Signed-off-by: Yitao Li <yitao@rstudio.com>
* fix a bug with grouping vars in nest.tbl_spark (#2720) Signed-off-by: Yitao Li <yitao@rstudio.com>
* Avoiding bundle file name collision when session_id is not provided (#2721)
  * Avoiding bundle file name collision with session_id is not provided
  * Mon Sep 21 23:26:36 PDT 2020
  * Update R/spark_apply_bundle.R Co-authored-by: Yitao Li <yl790@10xeng.ca>
  * Update R/spark_apply_bundle.R Co-authored-by: Yitao Li <yl790@10xeng.ca>
* update package metadata (#2723) Signed-off-by: Yitao Li <yitao@rstudio.com>
* skip append-data test on db connect (#2727) Signed-off-by: Yitao Li <yitao@rstudio.com>
* fix incorrect column name in `stream_watermark()` (#2728) Signed-off-by: Yitao Li <yitao@rstudio.com>
* implement `unnest_wider` functionality for Spark dataframes (#2730) Signed-off-by: Yitao Li <yitao@rstudio.com>
* implement `unnest_longer` functionality for Spark dataframes (#2732) Signed-off-by: Yitao Li <yitao@rstudio.com>
* update _pkgdown.yml with recent topics and make docs/reference contain only static html content (#2726) update _pkgdown.yml with recent topics and make docs/reference contain only static html content Signed-off-by: Yitao Li <yitao@rstudio.com>
* ignore platform-specific date serialization issue on Windows (#2734) Signed-off-by: Yitao Li <yitao@rstudio.com>
* remove rjson usage (#2735) Signed-off-by: Yitao Li <yitao@rstudio.com>
* revise spark_web impl (#2738) Signed-off-by: Yitao Li <yitao@rstudio.com>
* call rstudioapi::translateLocalUrl() when applicable (#2625) Signed-off-by: yl790 <yitao@rstudio.com>
* implement the equivalent of dplyr lag() functionality for streaming dataframes (#2739) Signed-off-by: Yitao Li <yitao@rstudio.com>
* implement timestamp threshold option for stream_lag() (#2743) Signed-off-by: Yitao Li <yitao@rstudio.com>
* switch CI workflow default branch from 'master' to 'main' Signed-off-by: Yitao Li <yitao@rstudio.com>
* update CONTRIBUTORS.md (#2748) Signed-off-by: Yitao Li <yitao@rstudio.com>
* remove GitHub CI workflow for R 3.2.5 (#2749) Signed-off-by: Yitao Li <yitao@rstudio.com>
* replace 'master' with 'main' in jenkins config file (#2746) Signed-off-by: Yitao Li <yitao@rstudio.com>
* skip unsupported test on windows (#2751) Signed-off-by: Yitao Li <yitao@rstudio.com>
* improve sparklyr serialization routines Signed-off-by: Yitao Li <yitao@rstudio.com>

Co-authored-by: Maciej <zero323@users.noreply.github.com>
Co-authored-by: Wil Davis <william.davis@worthingtonindustries.com>
Co-authored-by: Hossein Falaki <falaki@gmail.com>
This PR adds `ft_robust_scaler` as a wrapper for `RobustScaler` (SPARK-28399). It is applicable for Spark >= 3.0.0.
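For readers unfamiliar with robust scaling, here is a rough sketch of the idea behind `RobustScaler`: subtract the median and divide by a quantile range so that outliers have little influence on the scale. This is plain Python, not sparklyr or Spark code. The function name `robust_scale`, its `with_centering` / `with_scaling` parameters, the exact (non-approximate) quartiles, and the handling of a zero range are all assumptions made for illustration and may differ from what Spark does.

```python
from statistics import median, quantiles


def robust_scale(values, with_centering=True, with_scaling=True):
    """Illustrative robust scaling: (x - median) / (Q3 - Q1).

    Hypothetical helper for this sketch only; not part of sparklyr or Spark.
    """
    # Exact quartiles of the input; Spark may use approximate quantiles instead.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    center = median(values) if with_centering else 0.0
    spread = (q3 - q1) if with_scaling else 1.0
    if spread == 0:
        # Choice made for this sketch: leave constant input unscaled.
        spread = 1.0
    return [(v - center) / spread for v in values]


# The outlier (100.0) does not change the median (3.0) or the range (4.0 - 2.0).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(data)
```

With this input the four ordinary points map into [-1, 0.5] and only the outlier lands far away, which is the behaviour that makes this kind of scaler less sensitive to extreme values than one based on mean and standard deviation.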
Signed-off-by: zero323 <mszymkiewicz@gmail.com>