New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ElementTree to parse JUnit XML files because it is much faster than minidom #4693

Merged
merged 1 commit into from Jun 23, 2017

Conversation

Projects
None yet
5 participants
@cheister
Copy link
Contributor

cheister commented Jun 23, 2017

Problem

The new stat collecting code in 1.3+ #4561 uses pants.util.xml_parser which uses minidom to parse XML files. minidom loads the whole xml file into memory so when the xml files are large it can take minutes to load individual xml files.

Solution

Change testrunner_task_mixin.py to use ElementTree for parsing XML instead of pants.util.xml_parser. This is much faster for parsing large xml files.

Result

Without this change we had tests generating xml files of up to 40MB which were taking an additional 3 minutes to run for each test and effectively timing out in the build system. The change to ElementTree got the XML parsing down to sub-second.

This change is the minimal change that will get our tests working again. It would be nice to consolidate the JUnit XML parsing that is done in src/python/pants/backend/jvm/tasks/reports/junit_html_report.py with similar code that is in src/python/pants/task/testrunner_task_mixin.py.

It would also be good to have an option to turn off stats gathering in case it affects performance in the future.

cc: @dotordogh

@cheister cheister requested review from stuhood and benjyw Jun 23, 2017

test_info[attribute] = testcase.attrib.get(attribute)

result = SUCCESS
for _ in testcase.iter('skipped'):

This comment has been minimized.

@cheister

cheister Jun 23, 2017

Contributor

I tried using if next(testcase.iter('skipped'): here instead but it didn't work. Is there a nicer way to check if the iter has any elements?

This comment has been minimized.

@jsirois

jsirois Jun 23, 2017

Member

I assume it didn't work because it raised, send a default to avoid that:

if next(testcase.iter('skipped'), None) is not None:
@dotordogh

This comment has been minimized.

Copy link
Contributor

dotordogh commented Jun 23, 2017

This is great! I really like how it improves performance. I definitely agree that we should 1) add an option to turn off different segments of stats reporting (test data, target data, and any other data that we may be reporting in the future); 2) use this approach in for all xml parsing.

@cheister cheister force-pushed the cheister:faster-junit-xml-parsing branch from 35e224b to 20f4c5c Jun 23, 2017

@jsirois
Copy link
Member

jsirois left a comment

This change is the minimal change that will get our tests working again. It would be nice to consolidate the JUnit XML parsing that is done in src/python/pants/backend/jvm/tasks/reports/junit_html_report.py with similar code that is in src/python/pants/task/testrunner_task_mixin.py. It would also be good to have an option to turn off stats gathering in case it affects performance in the future.

Filing issues against these 2 ideas would be good.

@cheister cheister force-pushed the cheister:faster-junit-xml-parsing branch from 20f4c5c to 7ea2f6b Jun 23, 2017

@stuhood stuhood added this to the 1.3.1 milestone Jun 23, 2017

@stuhood
Copy link
Member

stuhood left a comment

Thanks!

@cheister cheister merged commit d8303cb into pantsbuild:master Jun 23, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@cheister cheister deleted the cheister:faster-junit-xml-parsing branch Jun 23, 2017

@benjyw
Copy link
Contributor

benjyw left a comment

LGTM

stuhood added a commit that referenced this pull request Jun 27, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment