Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: check the pointer is valid #874

Merged
merged 5 commits into from
Mar 1, 2024
Merged

fix: check the pointer is valid #874

merged 5 commits into from
Mar 1, 2024

Conversation

eitsupi
Copy link
Collaborator

@eitsupi eitsupi commented Mar 1, 2024

Fix #851

I feel this ideally be implemented in extendr (or savvy etc.), but I will implement it here anyway.
cc @yutannihilation @CGMossa @JosiahParry

@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 1, 2024

It seems to have little effect on performance.

Installed from 12d4bc2

library(polars)

countries = c(
  "France", "Germany", "United Kingdom", "Japan", "Columbia",
  "South Korea", "Vietnam", "South Africa", "Senegal", "Iran"
)

set.seed(123)
df_test = pl$DataFrame(
  country = sample(countries, 1e7, TRUE),
  x = sample(1:100, 1e7, TRUE),
  y = sample(1:1000, 1e7, TRUE)
)

lf_test = df_test$lazy()

lazy_query = lf_test$
  sort(pl$col("country"))$
  filter(
  pl$col("country")$is_in(pl$lit(c("United Kingdom", "Japan", "Vietnam")))
)

bench::mark(
  eager = df_test$
    sort(pl$col("country"))$
    filter(
      pl$col("country")$is_in(pl$lit(c("United Kingdom", "Japan", "Vietnam")))
    ),
  lazy = lazy_query$collect(),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eager         911ms    1.07s     0.934   14.56KB        0
#> 2 lazy          488ms 593.78ms     1.71     1.94KB        0

Created on 2024-03-01 with reprex v2.0.2

installed from main (bec40c6)

library(polars)

countries = c(
  "France", "Germany", "United Kingdom", "Japan", "Columbia",
  "South Korea", "Vietnam", "South Africa", "Senegal", "Iran"
)

set.seed(123)
df_test = pl$DataFrame(
  country = sample(countries, 1e7, TRUE),
  x = sample(1:100, 1e7, TRUE),
  y = sample(1:1000, 1e7, TRUE)
)

lf_test = df_test$lazy()

lazy_query = lf_test$
  sort(pl$col("country"))$
  filter(
  pl$col("country")$is_in(pl$lit(c("United Kingdom", "Japan", "Vietnam")))
)

bench::mark(
  eager = df_test$
    sort(pl$col("country"))$
    filter(
      pl$col("country")$is_in(pl$lit(c("United Kingdom", "Japan", "Vietnam")))
    ),
  lazy = lazy_query$collect(),
  iterations = 10
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 eager         978ms    1.05s     0.906   12.46KB        0
#> 2 lazy          532ms 574.97ms     1.72     1.94KB        0

Created on 2024-03-01 with reprex v2.0.2

@CGMossa
Copy link

CGMossa commented Mar 1, 2024

Hello! Allow me to give my two cents on this.

  • A null in R, NULL, is an SEXP with type NILSXP. This means that when you set a variable to NULL, you're changing the type to a "disposable" one.
  • A null-check for an external-ptr consists of checking whether the externalptr attributed to an SEXP is in fact a NULL in terms of C, meaning that the value of the pointer is 0. A NULL in R's C-API is a R_NilValue, which is not a C NULL.
  • Do you return &self/&mut self in any of the #[extendr]-impl blocks here? I'm about to check that myself.

Copy link
Collaborator

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I confirm the test doesn't segfault locally. I'm gonna let @CGMossa and others discuss about the best method here, I don't have enough knowledge for that.

Can you bump NEWS?

R/polars-package.R Show resolved Hide resolved
@yutannihilation
Copy link
Contributor

Thanks! I actually saw that null check in cpp11, but didn't realize when this is actually needed until today :)

https://github.com/r-lib/cpp11/blob/main/inst/include/cpp11/external_pointer.hpp

I'll fix savvy first.

@eitsupi
Copy link
Collaborator Author

eitsupi commented Mar 1, 2024

@CGMossa Thanks for taking a look at this.
Referring to the comment #851 (comment) for details.

@CGMossa
Copy link

CGMossa commented Mar 1, 2024

I've been experimenting and fixing how we do externalptr in extendr on https://github.com/cgmossa/extendr/. It should become more robust soon. I tend to focus on the wrong thing, so allow me to ask you this: Where do you believe the null-checking should occur, and what should it do?

@yutannihilation
Copy link
Contributor

yutannihilation commented Mar 1, 2024

Here's my take. Hope this helps.

https://github.com/yutannihilation/savvy/pull/95/files#diff-e2092a335ec465830416f879103b89ea75be8e414065e9e934cd82697ea63ea6

library(savvyExamples)

rds_file <- tempfile()

x <- Person()
saveRDS(x, rds_file)

x <- readRDS(rds_file)
x$name()
#> Error: Invalid external pointer

Created on 2024-03-01 with reprex v2.1.0

@eitsupi eitsupi merged commit ec868aa into main Mar 1, 2024
18 checks passed
@eitsupi eitsupi deleted the null-pointer branch March 1, 2024 13:31
@eitsupi eitsupi mentioned this pull request May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segfault when printing a list of DataFrames created with future
4 participants