New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose collect_ids for HTMLParser as well #216

Merged
merged 1 commit into from Dec 10, 2016

Conversation

Projects
None yet
2 participants
@plq
Contributor

plq commented Dec 4, 2016

It's said to lower the memory footprint of the parser at the expense of increased CPU load. Some use cases might care more about the amount memory consumed than CPU time, so it might make sense to expose this parameter.

src/lxml/parser.pxi
@@ -1615,6 +1615,8 @@ cdef class HTMLParser(_FeedParser):
- strip_cdata - replace CDATA sections by normal text content (default: True)
- compact - save memory for short text content (default: True)
- default_doctype - add a default doctype even if it is not found in the HTML (default: True)
+ - collect_ids - Set this to False to avoid ever-growing id cache in
+ long-running processes (at a bit of a performance cost)

This comment has been minimized.

@scoder

scoder Dec 6, 2016

Member

That reads a bit biased. The downside is that you'd loose fast ID lookups, that should be mentioned instead.

@scoder

scoder Dec 6, 2016

Member

That reads a bit biased. The downside is that you'd loose fast ID lookups, that should be mentioned instead.

@plq

This comment has been minimized.

Show comment
Hide comment
@plq

plq Dec 8, 2016

Contributor

did some rewording. you can chage it to your heart's content after your merge it as well, this is a fairly simple patch.

Contributor

plq commented Dec 8, 2016

did some rewording. you can chage it to your heart's content after your merge it as well, this is a fairly simple patch.

@scoder scoder merged commit 7b175fc into lxml:master Dec 10, 2016

@scoder

This comment has been minimized.

Show comment
Hide comment
@scoder

scoder Dec 10, 2016

Member

Hmm, did you actually test if this has an impact? The code says that it shouldn't.

Member

scoder commented Dec 10, 2016

Hmm, did you actually test if this has an impact? The code says that it shouldn't.

@plq

This comment has been minimized.

Show comment
Hide comment
@plq

plq Dec 15, 2016

Contributor

Well I'm trying to troubleshoot a memory leak myself and having close_ids=false in my daemon with this patch doesn't seem to have an impact.

Contributor

plq commented Dec 15, 2016

Well I'm trying to troubleshoot a memory leak myself and having close_ids=false in my daemon with this patch doesn't seem to have an impact.

@plq

This comment has been minimized.

Show comment
Hide comment
@plq

plq Dec 15, 2016

Contributor

More specifically, the leak is still there and having this flag doesn't seem to have slowed down the rate of the leak

Contributor

plq commented Dec 15, 2016

More specifically, the leak is still there and having this flag doesn't seem to have slowed down the rate of the leak

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment