
as_polars_df() for nanoarrow_array_stream seems slow #893

Closed
eitsupi opened this issue Mar 4, 2024 · 1 comment · Fixed by #896
Labels: enhancement (New feature or request)
eitsupi (Collaborator) commented Mar 4, 2024

I was surprised that this is slower than as_tibble().

library(adbcdrivermanager)
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#>     timestamp
library(tibble)
library(polars)

polars_info()
#> Polars R package version : 0.15.0
#> Rust Polars crate version: 0.38.1
#>
#> Thread pool size: 16
#>
#> Features:
#> default                    TRUE
#> full_features              TRUE
#> disable_limit_max_threads  TRUE
#> nightly                    TRUE
#> sql                        TRUE
#> rpolars_debug_print       FALSE
#>
#> Code completion: deactivated

db <- adbc_database_init(adbcsqlite::adbcsqlite())
con <- adbc_connection_init(db)

flights <- nycflights13::flights
flights$time_hour <- NULL
flights |>
  write_adbc(con, "flights")

query <- "SELECT * from flights LIMIT 10000"

bench::mark(
  polars_df_1 = {
    con |>
      read_adbc(query) |>
      as_polars_df()
  },
  arrow_table = {
    con |>
      read_adbc(query) |>
      as_arrow_table()
  },
  tibble = {
    con |>
      read_adbc(query) |>
      as_tibble()
  },
  polars_df_2 = {
    con |>
      read_adbc(query) |>
      as_polars_df()
  },
  check = FALSE,
  min_iterations = 5
)
#> # A tibble: 4 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars_df_1  158.5ms  158.8ms      5.58    1.13MB     3.72
#> 2 arrow_table   36.5ms     39ms     25.6     1.62MB     2.13
#> 3 tibble        40.8ms   43.2ms     22.9     1.62MB     0
#> 4 polars_df_2  153.8ms  185.8ms      5.38    89.1KB     8.07

Created on 2024-03-04 with reprex v2.0.2

@eitsupi eitsupi added the enhancement New feature or request label Mar 4, 2024
eitsupi (Collaborator, Author) commented Mar 4, 2024

I think the implementation of as_polars_df.nanoarrow_array_stream() is the problem.
The evidence is that converting to a Series first and then to a DataFrame is not slow.

I will rewrite it. (Related to #755)
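A minimal sketch of such a rewrite (hypothetical here, not the actual patch; the real fix landed via #896) would simply delegate to the Series conversion path, which the benchmark below shows to be fast:

```r
# Hypothetical sketch: convert the nanoarrow_array_stream via
# as_polars_series(), then promote the resulting Series to a one-column
# DataFrame with $to_frame(), avoiding the slow per-chunk path.
as_polars_df.nanoarrow_array_stream <- function(x, ...) {
  polars::as_polars_series(x)$to_frame()
}
```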

library(adbcdrivermanager)
library(polars)

polars_info()
#> Polars R package version : 0.15.0
#> Rust Polars crate version: 0.38.1
#>
#> Thread pool size: 16
#>
#> Features:
#> default                    TRUE
#> full_features              TRUE
#> disable_limit_max_threads  TRUE
#> nightly                    TRUE
#> sql                        TRUE
#> rpolars_debug_print       FALSE
#>
#> Code completion: deactivated

db <- adbc_database_init(adbcsqlite::adbcsqlite())
con <- adbc_connection_init(db)

flights <- nycflights13::flights
flights$time_hour <- NULL
flights |>
  write_adbc(con, "flights")

query <- "SELECT * from flights LIMIT 10000"

bench::mark(
  polars_df = {
    con |>
      read_adbc(query) |>
      as_polars_df()
  },
  polars_df2 = {
    con |>
      read_adbc(query) |>
      (\(x) as_polars_series(x)$to_frame())()
  },
  check = FALSE,
  min_iterations = 5
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 polars_df   154.1ms  154.1ms      6.49    1.17MB    26.0
#> 2 polars_df2   44.2ms   47.9ms     20.2     1.64MB     2.02

Created on 2024-03-04 with reprex v2.0.2
