-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
debug/profile polars #193
debug/profile polars #193
Conversation
There's sth wrong with the first timing, it can't be 9999ms |
It is a very naive timer which computes |
I tried to write a scoped profiler, which starts timing before executing any method and stops after. It also kept track of the polars method call-stack. However, this made the class system code more complicated/ugly and probably more difficult to maintain. Probably if this simple |
Thanks, there's no need to complexify all of this. Maybe it's just possible to remove the first timing which is always Otherwise it's not a big deal, it's already useful as is |
@etiennebacher it looks like this now. I think this option could be quite helpful. > pl$DataFrame(iris)$lazy()$select(pl$col("Sepal.Length")$mean())$collect()
[TIME? ms]
pl$DataFrame() -> [0.483ms]
.pr$DataFrame$new_with_capacity() -> [0.344ms]
.pr$DataFrame$set_column_from_robj() -> [0.2971ms]
.pr$DataFrame$set_column_from_robj() -> [0.262ms]
.pr$DataFrame$set_column_from_robj() -> [0.253ms]
.pr$DataFrame$set_column_from_robj() -> [0.2599ms]
.pr$DataFrame$set_column_from_robj() -> [0.298ms]
DataFrame$lazy() -> [0.237ms]
LazyFrame$select() -> [0.21ms]
pl$col() -> [0.164ms]
.pr$Expr$col() -> [0.237ms]
Expr$mean() -> [0.242ms]
ProtoExprArray$push_back_rexpr() -> [0.18ms]
.pr$LazyFrame$select() -> [0.206ms]
LazyFrame$collect() -> [0.169ms]
.pr$LazyFrame$collect() -> [0.7479ms]
DataFrame$print() -> [0.174ms]
.pr$DataFrame$print() -> shape: (1, 1)
┌──────────────┐
│ Sepal.Length │
│ --- │
│ f64 │
╞══════════════╡
│ 5.843333 │
└──────────────┘
> |
Looks great, just one minor thing: when I set the option to library(polars)
pl$set_polars_options(debug_polars=TRUE) # enable
#> $debug_polars
#> [1] FALSE
pl$set_polars_options(debug_polars=FALSE) # disable
#> [TIME? ms]
#> pl$set_polars_options() ->
#> $debug_polars
#> [1] TRUE |
That is a bit confusing, it arises from this is the defalt param.
I have made the previous state invisible. If #201 is accepted then pl$options$debug_polars(TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know in which order you want to merge your PRs but it looks good to me,
Do you want to mention somewhere in the docs that users can use this option if they feel that polars
is abnormally slow?
Merge branch 'main' into debug_polars # Conflicts: # NEWS.md # R/options.R # README.md # man/pl_pl.Rd # man/polars_options.Rd
I have added a mention in the bottom of README and in NEWS |
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
Merge branch 'main' into debug_polars # Conflicts: # man/pl_pl.Rd
In issue #135 we had some non obvious run-away S3 methods hampering performance. To assist future debugging/profiling such issues, I suggest to add a simple polars class call-stack viewer / profiler.
The little amount of extra indirection/overhead to activate this, is
if(polars_optenv$debug_polars) { #debug stuff }
in every public/private method-call and pl$-function call.It may also improve bug-reports. It may help with further development of S3-methods.
use case