Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize getting node tokens #8450

Merged
merged 1 commit into from Aug 5, 2020
Merged

Conversation

fatkodima
Copy link
Contributor

@fatkodima fatkodima commented Aug 4, 2020

Ran on 30k files.

Before

(1) 74659  (    6.8%)  RuboCop::Cop::Layout::SpaceInsideReferenceBrackets#on_send
(2) 39346  (    3.6%)  RuboCop::Cop::Layout::SpaceInsideArrayLiteralBrackets#on_array
    38575  (    3.5%)  RuboCop::Cop::Style::RedundantSelf#on_block
    33935  (    3.1%)  RuboCop::Cop::Layout::SpaceBeforeFirstArg#on_send
    26757  (    2.4%)  RuboCop::Cop::Layout::FirstArgumentIndentation#on_send
    26271  (    2.4%)  RuboCop::Cop::StringHelp#on_str
    24590  (    2.2%)  RuboCop::Cop::Layout::DefEndAlignment#on_send
    23826  (    2.2%)  RuboCop::Cop::Layout::IndentationConsistency#on_begin
(3) 17870  (    1.6%)  RuboCop::Cop::Layout::SpaceInsideHashLiteralBraces#on_hash
    16582  (    1.5%)  RuboCop::Cop::MultilineExpressionIndentation#on_send
    15623  (    1.4%)  RuboCop::Cop::Interpolation#on_dstr
......

After

    45373  (    3.7%)  RuboCop::Cop::Style::RedundantSelf#on_block
    40617  (    3.3%)  RuboCop::Cop::Layout::SpaceBeforeFirstArg#on_send
    36688  (    3.0%)  RuboCop::Cop::Layout::FirstArgumentIndentation#on_send
    31966  (    2.6%)  RuboCop::Cop::StringHelp#on_str
    31289  (    2.5%)  RuboCop::Cop::Layout::DefEndAlignment#on_send
    29682  (    2.4%)  RuboCop::Cop::Layout::IndentationConsistency#on_begin
(1) 25490  (    2.1%)  RuboCop::Cop::Layout::SpaceInsideReferenceBrackets#on_send
    23245  (    1.9%)  RuboCop::Cop::MultilineExpressionIndentation#on_send
    18734  (    1.5%)  RuboCop::Cop::Style::AccessModifierDeclarations#on_send
    17297  (    1.4%)  RuboCop::Cop::Layout::ArgumentAlignment#on_send
    16772  (    1.4%)  RuboCop::Cop::Style::FormatStringToken#on_str
    15592  (    1.3%)  RuboCop::Cop::Style::NumericPredicate#on_send
    14655  (    1.2%)  RuboCop::Cop::Layout::LineLength#on_potential_breakable_node
    14547  (    1.2%)  RuboCop::Cop::Layout::DotPosition#on_send
    13943  (    1.1%)  RuboCop::Cop::Layout::EmptyLinesAroundAccessModifier#on_send
    13836  (    1.1%)  RuboCop::Cop::Layout::SpaceAroundOperators#on_send
    13180  (    1.1%)  RuboCop::Cop::Style::TrailingCommaInArguments#on_send
    12772  (    1.0%)  RuboCop::Cop::Style::ConditionalAssignment#on_send
    12746  (    1.0%)  RuboCop::Cop::Metrics::BlockLength#on_block
    11655  (    0.9%)  RuboCop::Cop::Style::ZeroLengthPredicate#on_send
    11447  (    0.9%)  RuboCop::Cop::Style::MethodCallWithoutArgsParentheses#on_send
    11365  (    0.9%)  RuboCop::Cop::CheckAssignment#on_send
    11055  (    0.9%)  RuboCop::Cop::Layout::IndentationWidth#on_send
    10931  (    0.9%)  RuboCop::Cop::MethodComplexity#on_def
    10492  (    0.9%)  RuboCop::Cop::MethodComplexity#on_def
    10311  (    0.8%)  RuboCop::Cop::Style::InverseMethods#on_send
    10222  (    0.8%)  RuboCop::Cop::StringHelp#on_str
    10211  (    0.8%)  RuboCop::Cop::Layout::ClosingParenthesisIndentation#on_send
    10131  (    0.8%)  RuboCop::Cop::Interpolation#on_dstr
    9837  (    0.8%)  RuboCop::Cop::Style::SignalException#on_send
    9732  (    0.8%)  RuboCop::Cop::Layout::BlockAlignment#on_block
    9620  (    0.8%)  RuboCop::Cop::Style::RedundantSort#on_send
(2) 9319  (    0.8%)  RuboCop::Cop::Layout::SpaceInsideArrayLiteralBrackets#on_array
    8877  (    0.7%)  RuboCop::Cop::Style::EmptyLiteral#on_send
    8427  (    0.7%)  RuboCop::Cop::Style::RedundantException#on_send
    8300  (    0.7%)  RuboCop::Cop::Style::ExpandPathArguments#on_send
    8288  (    0.7%)  RuboCop::Cop::Lint::SafeNavigationChain#on_send
    7771  (    0.6%)  RuboCop::Cop::CheckAssignment#on_send
    7580  (    0.6%)  RuboCop::Cop::Style::EvalWithLocation#on_send
    7485  (    0.6%)  RuboCop::Cop::Style::NonNilCheck#on_send

Improvement: 9-10%

@marcandre
Copy link
Contributor

Awesome!
It's unfortunate that tokens aren't always sorted, I didn't know this 🙇‍♂️

I'm ok to merge this, but one easy thing we really should be doing is putting the caching / searching as methods of ProcessedSource. This would have two advantages: methods not being accessible to cops (the fact that your mixin's methods are private makes no difference) and the cache doesn't have to be rebuilt for different cops (although I imagine they might not access the same nodes?). WDYT?

Also: did you try sorting the tokens by positions?


# rubocop:disable Metrics/AbcSize
def tokens(node)
@tokens ||= {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you just copy-pasted, but {}.compare_by_identity would be nicer to use.

@fatkodima
Copy link
Contributor Author

and the cache doesn't have to be rebuilt for different cops (although I imagine they might not access the same nodes?)

Yes, looks like the cache not needed in this case. I have also removed such a cache https://github.com/rubocop-hq/rubocop/pull/8450/files#diff-938b46da0d108c71380ac9427b83dde0L47-L56 and didn't notice any difference. In my previous PRs, where I tried to reduce memory, I've seen that method allocated enough memory (can't remember how much), so a win here also.

WDYT?

Yes, will move those methods to ProcessedSource.

Also: did you try sorting the tokens by positions?

This will always work in O(nlogn), while original works in O(n) and the proposed approach almost always will work in O(logn + #node_tokens). Or I'm missing something?

@marcandre
Copy link
Contributor

This will always work in O(nlogn)

Correct

while original works in O(n)

Yes, each search is O(n), but the number of searches is also roughly linear with the number of tokens, so overall it was O(n^2), and now with bsearch it's O(n log(n)) overall, so maybe a sort (assuming it's once per ProcessedSource and not per cop) is similar speed (and simpler), or it could be slower or faster, I don't know.

Probably better would simply be to index the tokens by start and end position and do a straight lookups, since we will should always find tokens that start and end exactly where we are looking to, right? Building the indices would be O(n) and each lookup would be O(1).

@fatkodima
Copy link
Contributor Author

but the number of searches is also roughly linear with the number of tokens

Why? I would say, it is a constant. Most of the time, we are not performing searches for every token, but just for some of them, like in Lint/SpaceInsideArrayLiteralBrackets for example, for the whole file we may be searching only for just [ and ].

@marcandre
Copy link
Contributor

but the number of searches is also roughly linear with the number of tokens

Why? I would say, it is a constant. Most of the time, we are not performing searches for every token, but just for some of them, like in Lint/SpaceInsideArrayLiteralBrackets for example, for the whole file we may be searching only for just [ and ].

Right, but how many [ ... ] literals will you find in a Ruby file of n bytes, on average? I imagine that the best approximation is linear with n. The size in bytes, the number of tokens, the number of def nodes, of comments, of lines of codes, of string literals, of array literals, etc., all of these should be roughly linear to one another, even if they are not directly related...

What do you think of my idea of indexing the begin and end position of tokens?

@fatkodima
Copy link
Contributor Author

What do you think of my idea of indexing the begin and end position of tokens?

I'm still have doubt that this will make things faster. Will implement all 3 approaches (original, sorting and indexing) and report results here.

@marcandre
Copy link
Contributor

I'm still have doubt that this will make things faster. Will implement all 3 approaches (original, sorting and indexing) and report results here.

That would be great, but I wouldn't want you to feel forced; as stated previously, I'm 👍 to merge this as is.

@fatkodima
Copy link
Contributor Author

but I wouldn't want you to feel forced

No problems 😄

Ok, I have tested 3 implementations and was a bit surprised as I get almost identical results for them. I rechecked, I was testing 3 different versions, not the same one, so everything ok here.

With sorting version, I get the most concise code and it does not allocate extra space, like in indexing version, so I would implement that and submit a PR into rubocop-ast tomorrow.

@marcandre
Copy link
Contributor

Ok, I have tested 3 implementations and was a bit surprised as I get almost identical results for them. I rechecked, I was testing 3 different versions, not the same one, so everything ok here.

Cool! Glad you checked 👍

@bbatsov bbatsov merged commit c29441f into rubocop:master Aug 5, 2020
@bbatsov
Copy link
Collaborator

bbatsov commented Aug 5, 2020

Good work!

@marcandre
Copy link
Contributor

Good work!

🤣 we'll be picking a different implementation, but it won't be difficult to revert this

@bbatsov
Copy link
Collaborator

bbatsov commented Aug 5, 2020

Yeah, I saw this, but I'm thinking of cutting a release later today, so that's going to benefit our users in the mean time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants