This issue was moved to a discussion.

Add option to cache node names across transformers #184

Closed
stanhu opened this issue Jul 22, 2018 · 1 comment

Comments

Contributor

stanhu commented Jul 22, 2018

In 4.6.2, caa558a significantly cut down on the memory usage by downcasing the node name once.

However, #177 reported that this broke certain transformers that modify the node name, because the cached name was no longer updated after the change.

I'd suggest that it would still be nice to have a way to support that optimization for applications that don't change the node name. In my sample test, I see that 11 MB of RAM is being allocated, and with thousands of these documents being processed every second on GitLab, we are seeing slowdowns due to garbage collection.

Thoughts?

Owner

rgrove commented Jul 24, 2018

Seems like a simple solution might be to reimplement the change from caa558a but compare node_name with node.name after each iteration and update it if the node name changes.
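
A minimal sketch of that idea, not the actual Sanitize implementation: `Node` is a hypothetical stand-in for a Nokogiri node, and `run_transformers` and the transformer lambdas are illustrative names only. The lowercase name is computed once, then recomputed only when a transformer has changed `node.name`. This assumes a rename assigns a new string rather than mutating the existing one in place.

```ruby
# Hypothetical stand-in for a Nokogiri node; only #name matters here.
Node = Struct.new(:name)

# Calls each transformer with a cached lowercase node name and returns the
# list of names that were passed, so the caching behavior can be inspected.
def run_transformers(node, transformers)
  raw_name  = node.name          # name as last seen on the node
  node_name = raw_name.downcase  # cached lowercase copy
  passed    = []

  transformers.each do |transformer|
    passed << node_name
    transformer.call({ node: node, node_name: node_name })

    # Refresh the cache only if this transformer renamed the node.
    current = node.name
    next if current == raw_name

    raw_name  = current
    node_name = current.downcase
  end

  passed
end

# Illustrative transformers: one does nothing, one renames <B> to <STRONG>.
noop   = ->(_env) {}
rename = ->(env) { env[:node].name = "STRONG" if env[:node_name] == "b" }

node  = Node.new("B")
names = run_transformers(node, [noop, rename, noop])
# names == ["b", "b", "strong"]
```

With this approach, a document whose transformers never rename nodes pays for one `downcase` per node instead of one per transformer, while the transformer that runs after a rename still sees the updated name.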

stanhu added a commit to stanhu/sanitize that referenced this issue Jul 24, 2018
In 4.6.2, rgrove#175 significantly cut down on memory usage by caching the lowercase
version of the node name. However, as reported in rgrove#177, this broke certain
transformers that modified the node name since the name was cached. We can
bring back this optimization by updating the node name only if it has been
changed.

Closes rgrove#184
Repository owner locked and limited conversation to collaborators Aug 21, 2021
rgrove closed this as completed Aug 21, 2021
