
improve performance of XPath linters using //* predicates #1197

Merged 11 commits from fix/1196-performance into main on May 23, 2022

Conversation

AshesITR
Collaborator

Rewrote all //*[(self::xxx or self::yyy) and zzz] constructs to the faster //xxx[zzz] | //yyy[zzz] XPath representation.
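The transformation can be illustrated with a small Python sketch (a hypothetical helper for illustration, not lintr code) that builds both the wildcard-predicate form and the equivalent union form as plain strings:

```python
def build_xpaths(nodes, condition):
    """Build both XPath spellings of the same query: match any node
    whose type is in `nodes` and that satisfies `condition`."""
    # Slow form: one wildcard step with a self-axis disjunction,
    # so the predicate is evaluated against every node in the tree.
    selves = " or ".join(f"self::{n}" for n in nodes)
    predicate_form = f"//*[({selves}) and {condition}]"
    # Fast form: one step per node type joined with the union operator,
    # letting the engine restrict each step to a single node type up front.
    union_form = " | ".join(f"//{n}[{condition}]" for n in nodes)
    return predicate_form, union_form

slow, fast = build_xpaths(["xxx", "yyy"], "zzz")
```

Both strings select the same node set in XPath 1.0; the union form just gives the engine more to work with when filtering by node type.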

@AshesITR AshesITR linked an issue May 23, 2022 that may be closed by this pull request
@AshesITR
Collaborator Author

Ran a more thorough benchmark on tests/testthat/default_linter_testcode.R with interesting results:

modified_linters <- c(
  "assignment_linter",
  "backport_linter",
  "infix_spaces_linter",
  "object_usage_linter",
  "package_hooks_linter",
  "undesirable_function_linter",
  "vector_logic_linter"
)

gert::git_branch_checkout("main")
devtools::load_all()
old_linters <- lapply(modified_linters, function(linter_name) {
  eval(call(linter_name))
})
gert::git_branch_checkout("fix/1196-performance")
devtools::load_all()
new_linters <- lapply(modified_linters, function(linter_name) {
  eval(call(linter_name))
})

exprs <- get_source_expressions("tests/testthat/default_linter_testcode.R")
benchmarks <- vector(mode = "list", length = length(modified_linters))

for (i in seq_along(modified_linters)) {
  cat("## ", modified_linters[i], ":\n", sep = "")
  benchmark <- bench::mark(
    lapply(exprs$expressions, old_linters[[i]]),
    lapply(exprs$expressions, new_linters[[i]]),
    min_time = 1.0
  )
  benchmarks[[i]] <- benchmark
  print(benchmark)
  cat("\n\n")
}

vapply(setNames(benchmarks, modified_linters), function(bm) c("old", "new")[which.max(bm[["itr/sec"]])], character(1L))

Produces the following verdict:

          assignment_linter             backport_linter         infix_spaces_linter 
                      "new"                       "new"                       "old" 
        object_usage_linter        package_hooks_linter undesirable_function_linter 
                      "new"                       "old"                       "old" 
        vector_logic_linter 
                      "old" 

I'm reverting the implementations where "old" is faster, though for some of them I'm not 100% sure whether that result is just an artifact of the relatively simple structure of the test code.

@AshesITR
Collaborator Author

@MichaelChirico I've managed to halve the runtimes of undesirable_function_linter and infix_spaces_linter by moving the XPath construction out of the linting closure.
Now all linters are either unchanged or faster than main, as tested by the benchmark above.
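The closure refactoring can be sketched in Python (an analogy to the R linter factories, not actual lintr code): the expensive XPath string construction runs once when the factory is called, instead of on every lint invocation.

```python
def make_linter_slow(node_types):
    def lint(expr):
        # XPath string is rebuilt on every call to the linter.
        xpath = " | ".join(f"//{n}[cond]" for n in node_types)
        return (xpath, expr)
    return lint

def make_linter_fast(node_types):
    # XPath string is built once, when the factory runs;
    # the closure only captures the finished string.
    xpath = " | ".join(f"//{n}[cond]" for n in node_types)
    def lint(expr):
        return (xpath, expr)
    return lint
```

Both linters return identical results; only the point at which the query string is assembled differs, which is where the per-call savings come from.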

@AshesITR
Collaborator Author

On tools/QC.R:

## infix_spaces_linter:
Warning: Some expressions had a GC in every iteration; so filtering is disabled.
# A tibble: 2 x 13
  expression                                      min median `itr/sec` mem_alloc `gc/sec`
  <bch:expr>                                  <bch:t> <bch:>     <dbl> <bch:byt>    <dbl>
1 lapply(exprs$expressions, old_linters[[i]])   798ms  798ms      1.25   644.6KB    1.25 
2 lapply(exprs$expressions, new_linters[[i]])   215ms  219ms      4.56    11.9KB    0.913
# ... with 7 more variables: n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
#   result <list>, memory <list>, time <list>, gc <list>

@MichaelChirico
Collaborator

Nice. Can you break that down by linter?

@AshesITR
Collaborator Author

AshesITR commented May 23, 2022

Here's a fresh run on 4446242, using tools/QC.R:

## assignment_linter:
# A tibble: 2 × 13
  expression                                       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result       memory time            gc               
  <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>       <list> <list>          <list>           
1 lapply(exprs$expressions, old_linters[[i]])   27.2ms   28.4ms      34.5        NA     6.63    26     5      754ms <list [401]> <NULL> <bench_tm [31]> <tibble [31 × 3]>
2 lapply(exprs$expressions, new_linters[[i]])   27.3ms   28.4ms      35.2        NA     5.87    30     5      852ms <list [401]> <NULL> <bench_tm [35]> <tibble [35 × 3]>


## backport_linter:
# A tibble: 2 × 13
  expression                                       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result       memory time            gc               
  <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>       <list> <list>          <list>           
1 lapply(exprs$expressions, old_linters[[i]])     97ms   98.2ms      10.2        NA     23.8     3     7      295ms <list [401]> <NULL> <bench_tm [10]> <tibble [10 × 3]>
2 lapply(exprs$expressions, new_linters[[i]])   95.4ms   95.8ms      10.4        NA     41.7     2     8      192ms <list [401]> <NULL> <bench_tm [10]> <tibble [10 × 3]>


## infix_spaces_linter:
# A tibble: 2 × 13
  expression                                       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result       memory time           gc              
  <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>       <list> <list>         <list>          
1 lapply(exprs$expressions, old_linters[[i]])    323ms    323ms      3.09        NA     9.28     1     3      323ms <list [401]> <NULL> <bench_tm [4]> <tibble [4 × 3]>
2 lapply(exprs$expressions, new_linters[[i]])    154ms    157ms      6.36        NA     1.06     6     1      943ms <list [401]> <NULL> <bench_tm [7]> <tibble [7 × 3]>


## object_usage_linter:
Warning: Some expressions had a GC in every iteration; so filtering is disabled.
# A tibble: 2 × 13
  expression                                       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result       memory time           gc              
  <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>       <list> <list>         <list>          
1 lapply(exprs$expressions, old_linters[[i]])    5.47s    5.47s     0.183        NA     7.49     1    41      5.47s <list [401]> <NULL> <bench_tm [1]> <tibble [1 × 3]>
2 lapply(exprs$expressions, new_linters[[i]])    5.35s    5.35s     0.187        NA     7.67     1    41      5.35s <list [401]> <NULL> <bench_tm [1]> <tibble [1 × 3]>


## undesirable_function_linter:
# A tibble: 2 × 13
  expression                                       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result       memory time            gc               
  <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>       <list> <list>          <list>           
1 lapply(exprs$expressions, old_linters[[i]])    134ms    135ms      7.34        NA     4.40     5     3      681ms <list [401]> <NULL> <bench_tm [8]>  <tibble [8 × 3]> 
2 lapply(exprs$expressions, new_linters[[i]])    100ms    101ms      9.89        NA     1.10     9     1      910ms <list [401]> <NULL> <bench_tm [10]> <tibble [10 × 3]>


## vector_logic_linter:
# A tibble: 2 × 13
  expression                                       min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result       memory time            gc               
  <bch:expr>                                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>       <list> <list>          <list>           
1 lapply(exprs$expressions, old_linters[[i]])     54ms   54.9ms      18.1        NA     3.62    15     3      829ms <list [401]> <NULL> <bench_tm [18]> <tibble [18 × 3]>
2 lapply(exprs$expressions, new_linters[[i]])   53.9ms   55.4ms      17.8        NA     2.23    16     2      898ms <list [401]> <NULL> <bench_tm [18]> <tibble [18 × 3]>

  • assignment_linter: code is the same (just removed the TODO w.r.t. performance, since we'd tested that)
  • backport_linter: new is ever so slightly faster
  • infix_spaces_linter: double the speed by moving the XPath outside
  • object_usage_linter: new XPath is a little bit faster
  • undesirable_function_linter: slightly faster by moving the XPath outside
  • vector_logic_linter: within noise of main. It seems that for STR_CONST XPaths, the benefit of moving construction outside is marginal, probably because the global string table only contains the string once anyway.

@MichaelChirico
Collaborator

it seems like most of the benefit here is from moving the xpath generation into the factory then?

i am wary of readability...

//NODE1 | //NODE2
# is more readable than
//*[self::NODE1 or self::NODE2]

# but i don't think so for this case
//NODE1[some complicated conditions] | //NODE2[some complicated conditions]
# vs.
//*[(self::NODE1 or self::NODE2) and (some complicated conditions)]

also it looks like the benefits from optimizing get_source_expressions() far outweigh the improvements here, is that right?

@AshesITR
Collaborator Author

The get_source_expressions() improvement will definitely help.
Performance improvements were also visible for the pure XPath changes (that was the earlier comparison), but I agree that the refactoring that moves XPath computation outside the linter closure provides most of the speedup.

I suggest we try to move all XPaths to the factories if possible and try to make them as readable as possible.

@AshesITR
Collaborator Author

WDYT about the new get_function_assignments() XPath? I actually find it more readable because the old XPath has very contrived logical conjunctions.

Ready to merge IMO.

@MichaelChirico
Collaborator

I actually find it more readable because the old XPath has very contrived logical conjunctions.

agreed... that was trying too hard to avoid re-using [].

@AshesITR AshesITR merged commit ee21612 into main May 23, 2022
@AshesITR AshesITR deleted the fix/1196-performance branch May 23, 2022 20:16
Successfully merging this pull request may close these issues.

Performance: Avoid //*[self::...] constructs.