Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

debug/profile polars #193

Merged
merged 10 commits into from
May 12, 2023
Merged

debug/profile polars #193

merged 10 commits into from
May 12, 2023

Conversation

sorhawell
Copy link
Collaborator

@sorhawell sorhawell commented May 3, 2023

In issue #135 we had some non obvious run-away S3 methods hampering performance. To assist future debugging/profiling such issues, I suggest to add a simple polars class call-stack viewer / profiler.

The little amount of extra indirection/overhead to activate this, is if(polars_optenv$debug_polars) { #debug stuff } in every public/private method-call and pl$-function call.

It may also improve bug-reports. It may help with further development of S3-methods.

use case

pl$set_polars_options(debug_polars=TRUE) #enable

# run some polars code
> pl$DataFrame(iris)$select("Species","Sepal.Length")$limit(10)
[9999ms]
pl$DataFrame() -> [1.973ms]
   .pr$DataFrame$new_with_capacity() -> [1.188ms]
   .pr$DataFrame$set_column_from_robj() -> [2.131ms]
   .pr$DataFrame$set_column_from_robj() -> [1.701ms]
   .pr$DataFrame$set_column_from_robj() -> [1.585ms]
   .pr$DataFrame$set_column_from_robj() -> [1.453ms]
   .pr$DataFrame$set_column_from_robj() -> [2.558ms]
DataFrame$select() -> [0.953ms]
ProtoExprArray$push_back_rexpr() -> [1.961ms]
pl$col() -> [1.382ms]
   .pr$Expr$col() -> [0.9389ms]
ProtoExprArray$push_back_rexpr() -> [0.885ms]
pl$col() -> [1.932ms]
   .pr$Expr$col() -> [1.98ms]
   .pr$DataFrame$select() -> [2.111ms]
DataFrame$limit() -> [1.115ms]
DataFrame$lazy() -> [0.458ms]
LazyFrame$limit() -> [0.2651ms]
   .pr$LazyFrame$limit() -> [0.355ms]
LazyFrame$collect() -> [0.3009ms]
   .pr$LazyFrame$collect() -> [0.6909ms]
DataFrame$print() -> [1.349ms]
   .pr$DataFrame$print() -> shape: (10, 2)
┌─────────┬──────────────┐
│ SpeciesSepal.Length │
│ ------          │
│ catf64          │
╞═════════╪══════════════╡
│ setosa5.1          │
│ setosa4.9          │
│ setosa4.7          │
│ setosa4.6          │
│ …       ┆ …            │
│ setosa4.6          │
│ setosa5.0          │
│ setosa4.4          │
│ setosa4.9          │
└─────────┴──────────────┘

@sorhawell sorhawell marked this pull request as ready for review May 3, 2023 11:46
@etiennebacher
Copy link
Collaborator

There's sth wrong with the first timing, it can't be 9999ms

@sorhawell
Copy link
Collaborator Author

There's sth wrong with the first timing, it can't be 9999ms

It is a very naive timer which computes this_time - last_time, where only last_time is stored in runtime_state$last_subtime. The first time will always be 9999ms or how long time since last timer call. Maybe I could wrap all methods with a timerfunction, but it would become more complicated.

@sorhawell
Copy link
Collaborator Author

I tried to write a scoped profiler, which starts timing before executing any method and stops after. It also kept track of the polars method call-stack. However, this made the class system code more complicated/ugly and probably more difficult to maintain. Probably if this simple debug_polars is not telling enough, then profvis and/or browser() are better alternatives.

@etiennebacher
Copy link
Collaborator

Thanks, there's no need to complexify all of this. Maybe it's just possible to remove the first timing which is always [9999ms]? Like replace it with [TIME] or sth else?

Otherwise it's not a big deal, it's already useful as is

@sorhawell
Copy link
Collaborator Author

@etiennebacher it looks like this now. I think this option could be quite helpful.

> pl$DataFrame(iris)$lazy()$select(pl$col("Sepal.Length")$mean())$collect()
[TIME? ms]
pl$DataFrame() -> [0.483ms]
   .pr$DataFrame$new_with_capacity() -> [0.344ms]
   .pr$DataFrame$set_column_from_robj() -> [0.2971ms]
   .pr$DataFrame$set_column_from_robj() -> [0.262ms]
   .pr$DataFrame$set_column_from_robj() -> [0.253ms]
   .pr$DataFrame$set_column_from_robj() -> [0.2599ms]
   .pr$DataFrame$set_column_from_robj() -> [0.298ms]
DataFrame$lazy() -> [0.237ms]
LazyFrame$select() -> [0.21ms]
pl$col() -> [0.164ms]
   .pr$Expr$col() -> [0.237ms]
Expr$mean() -> [0.242ms]
ProtoExprArray$push_back_rexpr() -> [0.18ms]
   .pr$LazyFrame$select() -> [0.206ms]
LazyFrame$collect() -> [0.169ms]
   .pr$LazyFrame$collect() -> [0.7479ms]
DataFrame$print() -> [0.174ms]
   .pr$DataFrame$print() -> shape: (1, 1)
┌──────────────┐
│ Sepal.Length │
│ ---          │
│ f64          │
╞══════════════╡
│ 5.843333     │
└──────────────┘
> 

@etiennebacher
Copy link
Collaborator

Looks great, just one minor thing: when I set the option to TRUE, it prints that it is FALSE (and vice-versa)

library(polars)

pl$set_polars_options(debug_polars=TRUE) # enable
#> $debug_polars
#> [1] FALSE

pl$set_polars_options(debug_polars=FALSE) # disable
#> [TIME? ms]
#> pl$set_polars_options() ->
#> $debug_polars
#> [1] TRUE

@sorhawell
Copy link
Collaborator Author

That is a bit confusing, it arises from this is the defalt param.

#' @param return_replaced_options return previous state of modified options
#' Convenient for temporarily swapping of options during testing.
#'

I have made the previous state invisible.

If #201 is accepted then debug_polars can also be activated like this

pl$options$debug_polars(TRUE)

Copy link
Collaborator

@etiennebacher etiennebacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know in which order you want to merge your PRs but it looks good to me,

Do you want to mention somewhere in the docs that users can use this option if they feel that polars is abnormally slow?

Merge branch 'main' into debug_polars

# Conflicts:
#	NEWS.md
#	R/options.R
#	README.md
#	man/pl_pl.Rd
#	man/polars_options.Rd
@sorhawell
Copy link
Collaborator Author

Do you want to mention somewhere in the docs that users can use this option if they feel that polars is abnormally slow?

I have added a mention in the bottom of README and in NEWS

README.Rmd Outdated Show resolved Hide resolved
sorhawell and others added 3 commits May 10, 2023 15:55
Co-authored-by: Etienne Bacher <52219252+etiennebacher@users.noreply.github.com>
Merge branch 'main' into debug_polars

# Conflicts:
#	man/pl_pl.Rd
@sorhawell sorhawell merged commit 6a3522d into main May 12, 2023
8 checks passed
@sorhawell sorhawell deleted the debug_polars branch May 12, 2023 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants