Skip to content

Parsing seems slow #1643

@lnicola

Description

@lnicola

It looks like the parser is one of the slower parts of rust-analyzer:

6.01%  <rowan::green::GreenNode as core::hash::Hash>::hash
4.49%  _int_malloc
4.34%  ra_parser::parser::Parser::nth
3.66%  ra_syntax::syntax_node::SyntaxTreeBuilder::finish_node
3.64%  <ra_syntax::parsing::text_token_source::TextTokenSource as ra_parser::TokenSource>::lookahead_nth
3.51%  _int_free
3.04%  malloc
2.82%  ra_parser::event::process
2.71%  alloc::sync::Arc<T>::drop_slow
2.67%  ra_mbe::subtree_source::SubtreeTokenSource::get
2.43%  <smol_str::SmolStr as core::hash::Hash>::hash
1.98%  __memmove_avx_unaligned_erms
1.72%  <rowan::cursor::SyntaxNode as core::ops::drop::Drop>::drop
1.68%  unlink_chunk.isra.0
1.54%  <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter
1.45%  malloc_consolidate
1.43%  ra_syntax::parsing::text_tree_sink::TextTreeSink::do_token
1.42%  ra_rustc_lexer::<impl ra_rustc_lexer::cursor::Cursor>::advance_token
1.25%  rowan::cursor::FreeList::try_push
1.21%  <rowan::cursor::SyntaxNodeChildren as core::iter::traits::iterator::Iterator>::next
1.19%  realloc
0.99%  <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::next
0.93%  <ra_mbe::subtree_source::SubtreeTokenSource as ra_parser::TokenSource>::lookahead_nth

Besides the green node hashing for which there is a TODO in rowan and the memory allocation, Parser::nth takes the most time while running ra_cli analysis-stats, especially once you factor its children from TextTokenSource and SubtreeTokenSource.

Note that nth is called very often, via current{,2,3} and at. The n ranges from 0 to 2, with an exception of 3 during parameter parsing. Looking at nth, it seems that:

  • it calls token_source.lookahead_nth(i) to get the current token kind
  • it passes the kind to is_composite
  • is_composite calls token_source.lookahead_nth for i, i + 1, i + 2
  • the things above happen in a loop from 0 to n
  • nth itself is often called by the parser in a row with 0..3
  • there's also current3, current2, current, which call nth three, two and one time
  • these functions are themselves called in a row:
    if let Some(t) = p.current3() {
        // ...
    }
    if let Some(t) = p.current2() {
        // ...
    }
    match p.current() {
        // ...
    }

These cause token_source.lookahead_nth to be called a large number of times, and that function in itself is not necessarily inexpensive.

While there is some low-hanging fruit like avoiding the first lookahead in is_composite (the caller already knows it) and only computing second and third if they're needed, it might be better to keep track of the current and next three tokens in bump or somewhere similar.

Metadata

Metadata

Assignees

No one assigned

    Labels

    E-unknownIt's unclear if the issue is E-hard or E-easy without digging in

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions