Stop get_by_id traversal on first match #572

Merged 1 commit on May 24, 2024

ypconstante (Contributor)

Today, get_by_id just calls Finder.find and takes the first item of the returned list, so the whole document is always traversed.
This PR refactors the function to compare the id attribute value directly instead of going through the generic match? checks, and to stop the traversal at the first matching element.
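
For illustration only, the early-exit idea can be sketched as below. This is not the code merged in this PR: the module name IdLookupSketch and its helper functions are hypothetical, and the sketch assumes Floki's default tree shape, where an element is a {tag, attributes, children} tuple and attributes is a list of {name, value} tuples.

```elixir
defmodule IdLookupSketch do
  # Hypothetical module for illustration; not Floki's implementation.

  # Returns the first element (in document order) whose "id" attribute
  # equals `id`, or nil when no element matches.
  def get_by_id(tree, id) when is_list(tree), do: traverse(tree, id)
  def get_by_id(node, id), do: traverse([node], id)

  defp traverse([], _id), do: nil

  # Element node: check its own id first, then its children, then its siblings.
  defp traverse([{_tag, attrs, children} = node | rest], id) when is_list(attrs) do
    if id_matches?(attrs, id) do
      # Early exit: nothing after this node is visited.
      node
    else
      traverse(children, id) || traverse(rest, id)
    end
  end

  # Text nodes, comments, doctypes and similar nodes cannot carry an id.
  defp traverse([_other | rest], id), do: traverse(rest, id)

  # Compares the id attribute value directly, with no selector matching.
  defp id_matches?(attrs, id) do
    case List.keyfind(attrs, "id", 0) do
      {"id", ^id} -> true
      _ -> false
    end
  end
end

tree = [
  {"html", [],
   [
     {"body", [],
      [
        {"div", [{"id", "header"}], ["first"]},
        {"div", [{"id", "footer"}], ["last"]}
      ]}
   ]}
]

IdLookupSketch.get_by_id(tree, "header")
```

Because the recursion returns as soon as an element matches, a lookup for an id near the start of the document touches only a few nodes, while a lookup for an id near the end still walks most of the tree.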

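# inputs and tag are defined earlier in the benchmark script (not shown here).
Benchee.run(
  %{
    # Target id near the beginning of the document: best case for an early exit.
    "get_by_id - beginning document" => fn doc -> Floki.get_by_id(doc, "mw-page-base") end,
    # Target id near the end of the document: most of the tree is still traversed.
    "get_by_id - end document" => fn doc -> Floki.get_by_id(doc, "footer-poweredbyico") end
  },
  time: 10,
  inputs: inputs,
  save: [path: "benchs/results/finder-#{tag}.benchee", tag: tag],
  memory_time: 2
)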
##### With input big #####
Name                                                          ips        average  deviation         median         99th %
get_by_id - beginning document (PR)                      691.49 K     0.00145 ms  ±2360.83%     0.00124 ms     0.00233 ms
get_by_id - end document (PR)                              1.36 K        0.74 ms    ±11.08%        0.70 ms        1.09 ms
get_by_id - beginning document (v0.36.2-6-gb04be56)        0.75 K        1.33 ms     ±9.20%        1.29 ms        1.88 ms
get_by_id - end document (v0.36.2-6-gb04be56)              0.75 K        1.34 ms     ±9.00%        1.30 ms        1.89 ms

Comparison: 
get_by_id - beginning document (PR)                      691.49 K
get_by_id - end document (PR)                              1.36 K - 509.97x slower +0.74 ms
get_by_id - beginning document (v0.36.2-6-gb04be56)        0.75 K - 922.09x slower +1.33 ms
get_by_id - end document (v0.36.2-6-gb04be56)              0.75 K - 925.15x slower +1.34 ms

Memory usage statistics:

Name                                                   Memory usage
get_by_id - beginning document (PR)                             0 B
get_by_id - end document (PR)                                   0 B - 1.00x memory usage +0 B
get_by_id - beginning document (v0.36.2-6-gb04be56)           120 B - ∞ x memory usage +120 B
get_by_id - end document (v0.36.2-6-gb04be56)                 120 B - ∞ x memory usage +120 B

**All measurements for memory usage were the same**

##### With input medium #####
Name                                                          ips        average  deviation         median         99th %
get_by_id - beginning document (PR)                      740.60 K        1.35 μs  ±2586.25%        1.17 μs        1.79 μs
get_by_id - end document (PR)                              3.78 K      264.49 μs    ±12.71%      253.67 μs      432.10 μs
get_by_id - beginning document (v0.36.2-6-gb04be56)        2.16 K      462.08 μs    ±12.16%      442.41 μs      711.64 μs
get_by_id - end document (v0.36.2-6-gb04be56)              2.16 K      462.52 μs    ±12.39%      446.88 μs      727.15 μs

Comparison: 
get_by_id - beginning document (PR)                      740.60 K
get_by_id - end document (PR)                              3.78 K - 195.88x slower +263.14 μs
get_by_id - beginning document (v0.36.2-6-gb04be56)        2.16 K - 342.22x slower +460.73 μs
get_by_id - end document (v0.36.2-6-gb04be56)              2.16 K - 342.54x slower +461.17 μs

Memory usage statistics:

Name                                                   Memory usage
get_by_id - beginning document (PR)                             0 B
get_by_id - end document (PR)                                   0 B - 1.00x memory usage +0 B
get_by_id - beginning document (v0.36.2-6-gb04be56)           120 B - ∞ x memory usage +120 B
get_by_id - end document (v0.36.2-6-gb04be56)                 120 B - ∞ x memory usage +120 B

**All measurements for memory usage were the same**

##### With input small #####
Name                                                          ips        average  deviation         median         99th %
get_by_id - beginning document (PR)                      690.85 K        1.45 μs  ±1595.44%        1.29 μs        1.73 μs
get_by_id - end document (PR)                             17.30 K       57.80 μs    ±20.12%       54.98 μs      110.33 μs
get_by_id - end document (v0.36.2-6-gb04be56)              9.88 K      101.18 μs    ±20.39%       93.89 μs      190.50 μs
get_by_id - beginning document (v0.36.2-6-gb04be56)        9.88 K      101.20 μs    ±19.41%       93.79 μs      180.39 μs

Comparison: 
get_by_id - beginning document (PR)                      690.85 K
get_by_id - end document (PR)                             17.30 K - 39.93x slower +56.35 μs
get_by_id - end document (v0.36.2-6-gb04be56)              9.88 K - 69.90x slower +99.73 μs
get_by_id - beginning document (v0.36.2-6-gb04be56)        9.88 K - 69.92x slower +99.76 μs

Memory usage statistics:

Name                                                   Memory usage
get_by_id - beginning document (PR)                             0 B
get_by_id - end document (PR)                                   0 B - 1.00x memory usage +0 B
get_by_id - end document (v0.36.2-6-gb04be56)                 120 B - ∞ x memory usage +120 B
get_by_id - beginning document (v0.36.2-6-gb04be56)           120 B - ∞ x memory usage +120 B

philss merged commit ac18305 into philss:main on May 24, 2024
9 checks passed
philss (Owner) commented May 24, 2024

Cool! Thanks! :D

ypconstante deleted the optimize-get_by_id branch May 24, 2024 14:00