Background execution and R process pool #311

Merged on Aug 9, 2023
79 commits
d577bc4
Prototype execution in background thread
Sicheng-Pan Jun 25, 2023
9918a64
Refactor rthreadhandle prototype
Sicheng-Pan Jun 26, 2023
cb41db2
Prototype background R process
Sicheng-Pan Jul 2, 2023
eae0ebc
Made a functional R subprocess!
Sicheng-Pan Jul 2, 2023
9b1ad25
Refactor code for background handler
Sicheng-Pan Jul 3, 2023
18f98c1
Implement R process pool
Sicheng-Pan Jul 4, 2023
f907a22
Merge branch 'main' into lazy_in_background
Sicheng-Pan Jul 4, 2023
fdee6b5
Attempt to (de)serialize series from polars
Sicheng-Pan Jul 4, 2023
4f5886b
Rename R background module file
Sicheng-Pan Jul 4, 2023
fd3a9fb
Implement expr and lf
Sicheng-Pan Jul 5, 2023
4c8e06c
Merge branch 'main' into lazy_in_background
Sicheng-Pan Jul 5, 2023
41f180a
Implement R wrappers for thread handle and global R pool config
Sicheng-Pan Jul 5, 2023
020f902
Use private function to handle request in subprocess
Sicheng-Pan Jul 5, 2023
513fee6
Fix handle display
Sicheng-Pan Jul 6, 2023
717ecd7
Fix error handling in background process
Sicheng-Pan Jul 6, 2023
44799fc
Fix more error handling issues
Sicheng-Pan Jul 6, 2023
5b9ffed
Use shared memory for IPC communication
Sicheng-Pan Jul 11, 2023
6452598
make build export NOT_CRAN=true
sorhawell Jul 11, 2023
c60e596
functions to load and check polars in an R session
sorhawell Jul 11, 2023
49181f7
now it work
sorhawell Jul 11, 2023
ecd7199
update tip to readme
sorhawell Jul 11, 2023
0100441
add test_serde_df add benchmark script
sorhawell Jul 11, 2023
c68a443
Merge branch 'main' into lazy_in_background
Sicheng-Pan Jul 11, 2023
3c1e045
Minor refactor
Sicheng-Pan Jul 11, 2023
07d150d
rename
sorhawell Jul 12, 2023
397c4dd
add build and submit_polars + fix minors
sorhawell Jul 12, 2023
196e548
Merge remote-tracking branch 'origin/not_cran_dev_builds' into lazy_i…
Sicheng-Pan Jul 15, 2023
4557b1b
Implement apply-in-background
Sicheng-Pan Jul 15, 2023
daa90c8
Write unit tests
Sicheng-Pan Jul 15, 2023
78a03c8
Benchmark rbackground
Sicheng-Pan Jul 16, 2023
d990822
add expectations print + is_finished handle
sorhawell Jul 17, 2023
7a02517
add more scenarios
sorhawell Jul 17, 2023
f4b1f0f
Improve display of joined handle
Sicheng-Pan Jul 18, 2023
57def1f
RThreadHandle_is_finished.Rd
sorhawell Jul 18, 2023
3fee2c5
try remove NOT_CRAN
sorhawell Jul 18, 2023
d0758f0
try quote R -e arg
sorhawell Jul 18, 2023
3cbaceb
remove R native pipe operator
sorhawell Jul 18, 2023
5f05fce
try not use arg method
sorhawell Jul 18, 2023
0694664
write failed cmd string in error
sorhawell Jul 18, 2023
4ef435e
drop cmd_string
sorhawell Jul 18, 2023
c7653c2
try add line ending
sorhawell Jul 19, 2023
88d8b52
try not redirect std
sorhawell Jul 19, 2023
88048f1
dunno
sorhawell Jul 19, 2023
3d81fcf
Revert "RThreadHandle_is_finished.Rd"
sorhawell Jul 19, 2023
0781242
revert experiments
sorhawell Jul 19, 2023
65bb48f
Try std::env::current_exe
Sicheng-Pan Jul 20, 2023
ca17fa3
Merge remote-tracking branch 'origin/main' into lazy_in_background
Sicheng-Pan Jul 23, 2023
b03de39
Update extendr
Sicheng-Pan Jul 23, 2023
00c2b4d
Try environment variable
Sicheng-Pan Jul 23, 2023
97b2fb9
change low io high cpu example
sorhawell Jul 23, 2023
980abe8
update benchmark
sorhawell Jul 24, 2023
bd15917
improve Expr_map doc
sorhawell Jul 24, 2023
d21a881
collect_in_background docs
sorhawell Jul 24, 2023
75d0252
update oxygen
sorhawell Jul 24, 2023
c2b6237
super minor, Rctx::Handled
sorhawell Jul 24, 2023
0dee41a
Track R environment in Rust
Sicheng-Pan Jul 25, 2023
d4dbb5f
Remove old PolarsBackgroundHandler
Sicheng-Pan Jul 27, 2023
d9e2dab
chore rextendr 0.3.1.9000 nanoarrow .Rd
sorhawell Jul 27, 2023
388297a
avoid Rctx.into() -> RpolarsErr
sorhawell Jul 27, 2023
87adf0d
Merge apply_in_background to apply
Sicheng-Pan Jul 30, 2023
a2c30ae
Improve background error handling
Sicheng-Pan Jul 30, 2023
13bd3c7
make: redefine install, add install to docs
sorhawell Jul 30, 2023
d0f1765
merge main
sorhawell Aug 1, 2023
ecd6a7c
Merge branch 'main' into lazy_in_background
sorhawell Aug 1, 2023
bc4bb4c
more docs + fix RBackgroundPool
sorhawell Aug 1, 2023
08d8eab
impl thread queue in RBackGroundPool
sorhawell Aug 2, 2023
4fea62d
spawn processes in paralell
sorhawell Aug 2, 2023
2777c94
add test 3d to benchmark
sorhawell Aug 2, 2023
95122a5
increase default pool size to 4
sorhawell Aug 2, 2023
bc12766
update unit test
sorhawell Aug 2, 2023
1cdefe4
add parallel examples to map and apply
sorhawell Aug 7, 2023
1616095
merge main + update docs
sorhawell Aug 7, 2023
90d8be7
merge main + update news + roxygen
sorhawell Aug 8, 2023
afee8a9
polish news [skip ci]
sorhawell Aug 8, 2023
61cc469
add links
sorhawell Aug 8, 2023
8af927f
try rename all-features to full features
sorhawell Aug 8, 2023
46058db
fmt
sorhawell Aug 8, 2023
f352e0b
merge main + solve conflicts + docs + fix a utests
sorhawell Aug 9, 2023
b6916e4
fmt + roxygen
sorhawell Aug 9, 2023
2 changes: 1 addition & 1 deletion .github/actions/setup/action.yaml
@@ -28,7 +28,7 @@ runs:
libssl-dev

- name: Set up Rust nightly toolchain
if: inputs.rust-nightly == 'true' || env.RPOLARS_ALL_FEATURES == 'true'
if: inputs.rust-nightly == 'true' || env.RPOLARS_FULL_FEATURES == 'true'
shell: bash
run: |
make requirements-rs
14 changes: 7 additions & 7 deletions .github/workflows/check.yaml
@@ -18,7 +18,7 @@ env:
jobs:
R-CMD-check:
runs-on: ${{ matrix.config.os }}
name: ${{ matrix.config.os }} (${{ matrix.config.r }}) ${{ matrix.config.all-features && 'all-features' || '' }}
name: ${{ matrix.config.os }} (${{ matrix.config.r }}) ${{ matrix.config.full-features && 'full-features' || '' }}
strategy:
fail-fast: false
matrix:
@@ -30,9 +30,9 @@ jobs:
- { os: ubuntu-latest, r: "release" }
- { os: ubuntu-latest, r: "oldrel-1" }
include:
- config: { os: macos-latest, r: "release", all-features: true }
- config: { os: windows-latest, r: "release", all-features: true }
- config: { os: ubuntu-latest, r: "release", all-features: true }
- config: { os: macos-latest, r: "release", full-features: true }
- config: { os: windows-latest, r: "release", full-features: true }
- config: { os: ubuntu-latest, r: "release", full-features: true }

env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
@@ -46,7 +46,7 @@ jobs:

- uses: ./.github/actions/setup
with:
rust-nightly: "${{ matrix.config.all-features }}"
rust-nightly: "${{ matrix.config.full-features }}"

- name: print files
run: print(list.files("..",recursive = TRUE,full.names=TRUE))
@@ -77,10 +77,10 @@ jobs:
echo "RPOLARS_RUST_SOURCE=${PWD}/src/rust" >> $GITHUB_ENV

- name: Set env vars for build option
if: matrix.config.all-features
if: matrix.config.full-features
shell: bash
run: |
echo "RPOLARS_ALL_FEATURES=true" >>$GITHUB_ENV
echo "RPOLARS_FULL_FEATURES=true" >>$GITHUB_ENV
echo "RPOLARS_PROFILE=release-optimized" >>$GITHUB_ENV

- uses: r-lib/actions/check-r-package@v2
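The `Set env vars for build option` step above relies on the GitHub Actions convention that `NAME=value` lines appended to the file named by `$GITHUB_ENV` become environment variables for later steps. A minimal local sketch of that mechanism, assuming a temporary file stands in for `$GITHUB_ENV` and a hypothetical `full_features` shell variable stands in for `matrix.config.full-features` (neither is part of the PR):

```shell
#!/bin/sh
# Assumption: a local temp file plays the role of the file GitHub Actions
# exposes through $GITHUB_ENV.
GITHUB_ENV="$(mktemp)"

# Hypothetical stand-in for `matrix.config.full-features`.
full_features=true

# Mirror of the workflow step: write the build options only when the flag is set.
if [ "$full_features" = "true" ]; then
  echo "RPOLARS_FULL_FEATURES=true" >> "$GITHUB_ENV"
  echo "RPOLARS_PROFILE=release-optimized" >> "$GITHUB_ENV"
fi

# On a real runner, later steps would see these as environment variables;
# here we only print the file contents.
cat "$GITHUB_ENV"
```

With `full_features` set to anything other than `true`, the file stays empty, which corresponds to the matrix entries that build without the extra features.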
2 changes: 1 addition & 1 deletion .github/workflows/docs.yaml
@@ -21,7 +21,7 @@ concurrency:
cancel-in-progress: true

env:
RPOLARS_ALL_FEATURES: "true"
RPOLARS_FULL_FEATURES: "true"
RPOLARS_CARGO_CLEAN_DEPS: "true"
RPOLARS_PROFILE: release-optimized

2 changes: 1 addition & 1 deletion .github/workflows/release.yaml
@@ -14,7 +14,7 @@ concurrency:
cancel-in-progress: true

env:
RPOLARS_ALL_FEATURES: "true"
RPOLARS_FULL_FEATURES: "true"
RPOLARS_CARGO_CLEAN_DEPS: "true"
RPOLARS_PROFILE: release-optimized

2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -83,14 +83,14 @@ Collate:
'groupby.R'
'info.R'
'ipc.R'
'lazyframe__background.R'
'lazyframe__groupby.R'
'lazyframe__lazy.R'
'namespace.R'
'options.R'
'parquet.R'
'pkg-knitr.R'
'pkg-nanoarrow.R'
'rbackground.R'
'rlang.R'
'rust_result.R'
's3_methods.R'
14 changes: 7 additions & 7 deletions Makefile
@@ -46,15 +46,19 @@ requirements-rs:

.PHONY: build
build: ## Compile polars R package with all features and generate Rd files
export RPOLARS_ALL_FEATURES=true \
&& export RPOLARS_PROFILE=release-optimized \
export RPOLARS_FULL_FEATURES=true \
&& Rscript -e 'if (!(require(arrow)&&require(nanoarrow))) warning("could not load arrow/nanoarrow, igonore changes to nanoarrow.Rd"); rextendr::document()'

.PHONY: install
install:
export RPOLARS_FULL_FEATURES=true \
&& R CMD INSTALL --no-multiarch --with-keep.source .

.PHONY: all
all: fmt build test README.md LICENSE.note ## build -> test -> Update README.md, LICENSE.note

.PHONY: docs
docs: build README.md docs/docs/reference_home.md ## Generate docs
docs: build install README.md docs/docs/reference_home.md ## Generate docs
cp docs/mkdocs.orig.yml docs/mkdocs.yml
Rscript -e 'altdoc::update_docs(custom_reference = "docs/make-docs.R")'
cd docs && ../$(VENV_BIN)/python3 -m mkdocs build
@@ -76,10 +80,6 @@ LICENSE.note: src/rust/Cargo.lock ## Update LICENSE.note
test: build ## Run fast unittests
Rscript -e 'devtools::load_all(); devtools::test()'

.PHONY: install
install: ## Install this R package locally
Rscript -e 'devtools::install(pkg = ".", dependencies = TRUE)'

.PHONY: fmt
fmt: fmt-rs fmt-r ## Format files

9 changes: 5 additions & 4 deletions NAMESPACE
@@ -18,12 +18,12 @@ S3method("$",FeatureInfo)
S3method("$",GroupBy)
S3method("$",LazyFrame)
S3method("$",LazyGroupBy)
S3method("$",PolarsBackgroundHandle)
S3method("$",ProtoExprArray)
S3method("$",RField)
S3method("$",RNullValues)
S3method("$",RPolarsDataType)
S3method("$",RPolarsErr)
S3method("$",RThreadHandle)
S3method("$",Series)
S3method("$",VecDataFrame)
S3method("$",When)
@@ -63,12 +63,12 @@ S3method("[[",FeatureInfo)
S3method("[[",GroupBy)
S3method("[[",LazyFrame)
S3method("[[",LazyGroupBy)
S3method("[[",PolarsBackgroundHandle)
S3method("[[",ProtoExprArray)
S3method("[[",RField)
S3method("[[",RNullValues)
S3method("[[",RPolarsDataType)
S3method("[[",RPolarsErr)
S3method("[[",RThreadHandle)
S3method("[[",Series)
S3method("[[",VecDataFrame)
S3method("[[",When)
@@ -80,9 +80,9 @@ S3method(.DollarNames,DataFrame)
S3method(.DollarNames,Expr)
S3method(.DollarNames,GroupBy)
S3method(.DollarNames,LazyFrame)
S3method(.DollarNames,PolarsBackgroundHandle)
S3method(.DollarNames,RField)
S3method(.DollarNames,RPolarsErr)
S3method(.DollarNames,RThreadHandle)
S3method(.DollarNames,Series)
S3method(.DollarNames,VecDataFrame)
S3method(.DollarNames,When)
@@ -91,6 +91,7 @@ S3method(.DollarNames,WhenThenThen)
S3method(.DollarNames,method_environment)
S3method(.DollarNames,polars_option_list)
S3method(as.character,RPolarsErr)
S3method(as.character,RThreadHandle)
S3method(as.character,Series)
S3method(as.data.frame,DataFrame)
S3method(as.data.frame,LazyFrame)
@@ -128,10 +129,10 @@ S3method(print,GroupBy)
S3method(print,LazyFrame)
S3method(print,LazyGroupBy)
S3method(print,PTime)
S3method(print,PolarsBackgroundHandle)
S3method(print,RField)
S3method(print,RPolarsDataType)
S3method(print,RPolarsErr)
S3method(print,RThreadHandle)
S3method(print,Series)
S3method(print,When)
S3method(print,WhenThen)
14 changes: 12 additions & 2 deletions NEWS.md
@@ -2,20 +2,30 @@

## BREAKING CHANGES
- `$rpow()` is removed. It should never have been translated. Use `^` and `$pow()` instead (#346).
- `<LazyFrame>$collect_background()` is renamed to `<LazyFrame>$collect_in_background()` and reworked.
Likewise, `PolarsBackgroundHandle` is reworked and renamed to `RThreadHandle` (#311).

## What's changed

- New method `$explode()` for `DataFrame` and `LazyFrame` (#314).
- New method `$clone()` for `LazyFrame` (#347).
- `$with_column()` is now deprecated (following upstream `polars`). It will be
removed in 0.9.0. It should be replaced with `$with_columns()` (#313).
- New lazy function translated: `concat_str()` to concatenate several columns
into one (#349).
- New stat functions `pl$cov()`, `pl$rolling_cov()`, `pl$corr()`, `pl$rolling_corr()` (#351).
- Add functions `pl$set_global_rpool_cap()`, `pl$get_global_rpool_cap()`, class `RThreadHandle` and
`in_background = FALSE` param to `<Expr>$map()` and `$apply()`. It is now possible to run R code
with `<LazyFrame>$collect_in_background()` and/or let polars parallelize R code in an R process
pool. See `RThreadHandle-class` in reference docs for more info. (#311)
- Internal IPC/shared-mem channel to serialize and send R objects / polars DataFrame across
R processes. (#311)
- Compile environment flag `RPOLARS_ALL_FEATURES` is renamed to `RPOLARS_FULL_FEATURES`. If set to 'true',
it triggers something like `cargo build --features "full_features"`, which is not exactly the same
as `cargo build --all-features`. Some dev features are not included in "full_features" (#311).
- Fix bug to allow using polars without library(polars) (#355).
- New methods `<LazyFrame>$optimization_toggle()` + `$profile()` and enable rust-polars feature
CSE: "Activate common subplan elimination optimization" (#323)

# polars 0.7.0

## BREAKING CHANGES
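The NEWS.md entry above renames the compile flag from `RPOLARS_ALL_FEATURES` to `RPOLARS_FULL_FEATURES`, so local build scripts that still export the old name would no longer enable the full feature set. A minimal sketch of a shim for such scripts follows; the function `migrate_rpolars_flag` is hypothetical and not part of this PR, and it assumes the old value should simply be forwarded when the new flag is unset:

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): forward the old flag to its new name.
migrate_rpolars_flag() {
  # Only act when the old name is set and the new one is not,
  # so an explicit RPOLARS_FULL_FEATURES always wins.
  if [ -n "${RPOLARS_ALL_FEATURES:-}" ] && [ -z "${RPOLARS_FULL_FEATURES:-}" ]; then
    echo "RPOLARS_ALL_FEATURES was renamed to RPOLARS_FULL_FEATURES; forwarding its value" >&2
    RPOLARS_FULL_FEATURES="$RPOLARS_ALL_FEATURES"
    export RPOLARS_FULL_FEATURES
  fi
}

# Example: an environment that still uses the old name.
RPOLARS_ALL_FEATURES=true
unset RPOLARS_FULL_FEATURES
migrate_rpolars_flag
echo "RPOLARS_FULL_FEATURES=${RPOLARS_FULL_FEATURES:-}"
```

After the shim runs, the build can be started as in the Makefile's `install` target (`R CMD INSTALL --no-multiarch --with-keep.source .`). Updating the scripts to export `RPOLARS_FULL_FEATURES` directly is the cleaner long-term fix.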
9 changes: 4 additions & 5 deletions R/after-wrappers.R
@@ -82,7 +82,6 @@ extendr_method_to_pure_functions = function(env, class_name = NULL) {
.pr$GroupBy = NULL # derived from DataFrame in R, has no rust calls
.pr$LazyFrame = extendr_method_to_pure_functions(LazyFrame)
.pr$LazyGroupBy = extendr_method_to_pure_functions(LazyGroupBy)
.pr$PolarsBackgroundHandle = extendr_method_to_pure_functions(PolarsBackgroundHandle)
.pr$DataType = extendr_method_to_pure_functions(RPolarsDataType)
.pr$DataTypeVector = extendr_method_to_pure_functions(DataTypeVector)
.pr$RField = extendr_method_to_pure_functions(RField)
@@ -94,6 +93,7 @@ extendr_method_to_pure_functions = function(env, class_name = NULL) {
.pr$VecDataFrame = extendr_method_to_pure_functions(VecDataFrame)
.pr$RNullValues = extendr_method_to_pure_functions(RNullValues)
.pr$RPolarsErr = extendr_method_to_pure_functions(RPolarsErr)
.pr$RThreadHandle = extendr_method_to_pure_functions(RThreadHandle)



@@ -223,9 +223,8 @@ pl$show_all_public_functions = function() {
#' @examples
#' pl$show_all_public_methods()
pl$show_all_public_methods = function(class_names = NULL) {

#subset classes to show
show_this_env = if(!is.null(class_names)) {
# subset classes to show
show_this_env = if (!is.null(class_names)) {
as.environment(mget(class_names, envir = pl_pub_class_env))
} else {
pl_pub_class_env
@@ -265,7 +264,7 @@ DataType = clone_env_one_level_deep(RPolarsDataType)
# used for printing public environment
pl_class_names = sort(
c(
"LazyFrame", "Series", "LazyGroupBy", "DataType", "Expr", "DataFrame", "PolarsBackgroundHandle",
"LazyFrame", "Series", "LazyGroupBy", "DataType", "Expr", "DataFrame",
"When", "WhenThen", "WhenThenThen"
)
) # TODO discover all public class automatically