Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace MiniDom by lxml #118

Closed
wants to merge 1 commit into from

Conversation

@TiloW
Copy link

TiloW commented Oct 20, 2015

This targets #1 and reduces peak memory usage by 83% on my instance.

Plain Text HTML MiniDom (current) lxml (proposed)
Peak Memory [KB] 94280 97076 2505896 450224
Runtime 2.70 2.01 3.74 2.78

Tests were failing before making any changes so I did not bother adjusting them. If this is an error on my side please let me know.
The most common failure seems to be XML testfile does not match the baseline.

@TiloW TiloW force-pushed the TiloW:lxml branch from 85e7188 to b3ad0b1 Oct 20, 2015
@whart222

This comment has been minimized.

Copy link
Member

whart222 commented Aug 6, 2016

This is a very cool idea. However, I want to make the dependency on lxml optional. I think I'm going to defer this until after the next release. At that point, I'm probably going to rework the master branch using ideas developed on the 'dev' branch.

@enunes

This comment has been minimized.

Copy link

enunes commented Oct 19, 2016

Hey, just an FYI, I have tested this patch and it has worked perfectly for me. Using gcovr with the linux kernel, without this patch it was eating up about 5GB of memory in my machine, and with this patch it has gone down to about 1GB, which is much more reasonable. Thanks!

@myint

This comment has been minimized.

Copy link
Contributor

myint commented Feb 10, 2018

I've seen MemoryError exceptions from gcovr so this is of interest to me. @whart222 Are you talking about replacing from lxml import etree with something like the fallback block in http://lxml.de/tutorial.html?

@latk

This comment has been minimized.

Copy link
Member

latk commented Feb 10, 2018

I understand that memory use right now is problematic, but I won't merge this PR anytime soon. After gcovr has been modularized (like experimented with in the dev branch), I'm open to reworking how XML is generated. But at that point the code will have changed substantially so that the proposed change cannot be merged directly.

I'm slightly hesitant to add a hard dependency on lxml. Not everyone will be using this functionality.

  • If lxml can be used with an ElementTree fallback, that is good.
  • But do we even need DOM operations? As we are only generating XML and never reading it, we could also use a template, or write directly to the output file. That should be more efficient than any DOM implementation.

I will revisit this issue in the future, and keep this PR open until then as a reminder.

@latk

This comment has been minimized.

Copy link
Member

latk commented Feb 19, 2018

The gcovr program is now split up into separate modules, so I am looking at this issue again. In order to apply this change, we need to

  • port it to the new project layout. The XML format is now in the gcovr.cobertura_xml_generator module.
  • make sure that there are no regressions, as the XML code has changed a bit since this PR was opened.
  • I no longer consider a hard dependency on lxml to be a blocker because pre-built binaries are available on all common platforms.

@TiloW if you want to look into this that would be great, otherwise I'd also be glad if anyone else re-submits the PR with the above issues addressed. Please let me know if there are any problems, especially regarding the tests on Travis/Appveyor.

@latk latk added the help wanted label Mar 5, 2018
latk added a commit to latk/gcovr that referenced this pull request May 25, 2019
In gcovr#118, TiloW took a stab at this issue.
@latk latk closed this in a5dda51 May 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.