Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(semantic): indextree #115

Closed
Boshen opened this issue Mar 4, 2023 · 4 comments
Closed

perf(semantic): indextree #115

Boshen opened this issue Mar 4, 2023 · 4 comments
Assignees
Labels
C-enhancement Category - New feature or request

Comments

@Boshen
Copy link
Member

Boshen commented Mar 4, 2023

We are currently constructing a parent pointing tree using the indextree crate, but this is probably an overkill both cpu and memory wise.

The only functionality we need are:

  • find a node's parent node
  • iterate through a node's ancestor nodes

For performance, I wonder if using bumpalo will be better than a Vec by using the same allocator for the AST.

@Boshen Boshen added the C-enhancement Category - New feature or request label Mar 4, 2023
@Boshen Boshen added this to the MVP milestone Mar 4, 2023
@Boshen
Copy link
Member Author

Boshen commented Mar 4, 2023

cc @YoniFeng, since you are really good at this kind of low level stuff, and I'm learning a lot from you.

@Boshen
Copy link
Member Author

Boshen commented Mar 4, 2023

Relevant CPU profile:

image

@YoniFeng
Copy link
Contributor

YoniFeng commented Mar 4, 2023

I need to get a setup for profiling native applications(on Windows..), been lazy so far.
Most of my profiling analysis experience is from a previous job where the profiling was already instrumented on production machines, and the data was served on a silver platter (and it was mostly non-native).

From reading indextree's doc and skimming the code, I'm skeptical that hand-rolling this tree on bumpallo will see significant gains.
However, I did notice that the append we use in create_ast_node isn't specialized for a new node.

pub fn checked_append<T>(
        self,
        new_child: NodeId,
        arena: &mut Arena<T>,
    ) -> Result<(), NodeError> {
        if new_child == self {
            return Err(NodeError::AppendSelf);
        }
        if arena[self].is_removed() || arena[new_child].is_removed() {
            return Err(NodeError::Removed);
        }
        if self.ancestors(arena).any(|ancestor| new_child == ancestor) {
            return Err(NodeError::AppendAncestor);
        }
        new_child.detach(arena);
        insert_with_neighbors(arena, new_child, Some(self), arena[self].last_child, None)
            .expect("Should never fail: `new_child` is not `self` and they are not removed");

        Ok(())
    }

In our case, everything above insert_with_neighbors() is redundant work for us.
Even within insert_with_neighbors(), there's a detach_from_siblings() call which is redundant when we know the inserted node is new (and even beyond the "new node" use case - it's redundant because it was just performed in detach).

In your profiling screenshot detach_from_siblings is 50ms which would get shaved off.
Probably worth checking out some sort of an append_new_node fast path, where it's the user's responsibility to uphold assumptions.

TL;DR I think making trying to make changes in indextree makes more sense, at least for a first approach.
You can assign this issue to me.

@Boshen Boshen changed the title perf(semantic): replace indextree perf(semantic): indextree Mar 5, 2023
@Boshen
Copy link
Member Author

Boshen commented Mar 11, 2023

Done in #122.

@Boshen Boshen closed this as completed Mar 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category - New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants