Faster xml load #839
…rt rather than waiting for it to be requested to be discovered as users will get confused and likely create a new one if none is given
…_getName more efficient
… major CPU bottleneck
…nto fasterXMLLoad
@mesmith75 I would like to have this considered for 6.3 (if not, 6.4 feels OK). This addresses a specific problem which still afflicts LHCb users. The PR tackles the XML problem in two ways: it reduces the need to fully construct objects which don't need to be fully qualified, and it reduces the number of I/O operations when writing the XML to disk by using a string buffer, so the data is written in a single burst after being constructed. Testing with a dummy Job containing 10k DiracFile objects in the inputfile field: Without this PR: 70sec + 37sec = 107sec. ~30% performance improvement in the basic test, and I claim some general speed improvements in casual use with this PR. Code used for testing: Construction:
restart ganga with
@rob-c I agree this looks nice to have in for 6.3. Just let me give it a quick try and then we can merge it.
Ok this is good. If there are no objections I'll merge this and start the release.
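The single-burst write mentioned in the first comment can be illustrated with a minimal sketch. This is not the code from the PR or the test code referred to above: the function names, the element names and the file layout are all hypothetical, and it only shows the general idea of building the whole document in a string buffer and handing it to the file in one write call.

```python
import io
import os
import tempfile
from xml.sax.saxutils import quoteattr


def render_xml(names):
    """Build the whole document in memory (hypothetical layout, not Ganga's)."""
    buf = io.StringIO()
    buf.write(u'<?xml version="1.0"?>\n<inputfiles>\n')
    for name in names:
        # quoteattr escapes the value and adds the surrounding quotes.
        buf.write(u'  <file name=%s/>\n' % quoteattr(name))
    buf.write(u'</inputfiles>\n')
    return buf.getvalue()


def write_xml(path, names):
    """Write the rendered document with a single write call."""
    with open(path, "w") as handle:
        handle.write(render_xml(names))


if __name__ == "__main__":
    # Usage example with 10k dummy entries, written to a temporary file.
    target = os.path.join(tempfile.mkdtemp(), "data.xml")
    write_xml(target, ["lfn_%d.dst" % i for i in range(10000)])
    print(os.path.getsize(target))
```

The design choice is simply to pay for string construction in memory and touch the file once, rather than issuing one small write per element.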
This PR contains:
Some minor test fixes (obviously bad code if the test was run interactively!)
Minor speed improvements (Pieces of code which turn up in benchmarks way too much)
Fix for slow reading of XML files
The XML fix consists of adding a new class factory method to GangaObject which makes correct use of the __new__ method and fully initializes the created class instance without populating the internal dictionary with the default values from the schema. This is useful in situations where that expensive operation is wasted because the code intends to replace these objects in the next step.
This factory is not a good solution for initializing a normal class for normal use, but for specific corner cases such as a deep copy or constructing a class to be populated from XML it offers a real, demonstrable performance boost.
This has the advantage that it doesn't require fixing __init__ to be correct all across the codebase. It works as expected. It doesn't require special types to get its manipulations done, and it doesn't add significant overhead to the GangaObject or MetaClass objects.
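For readers unfamiliar with the pattern, here is a minimal sketch of a factory that allocates through __new__ and skips the default population. It is an illustration only, not the GangaObject implementation: the class Item, its schema, and the method names make_empty and from_values are hypothetical.

```python
import copy


class Item(object):
    # Hypothetical schema: attribute name -> default value.
    _schema_defaults = {"name": "", "size": 0, "tags": []}

    def __init__(self):
        # Normal construction: deep-copy every schema default into the
        # instance. This is the expensive step for large schemas.
        self._data = dict(
            (key, copy.deepcopy(value))
            for key, value in self._schema_defaults.items()
        )

    @classmethod
    def make_empty(cls):
        # Allocate the instance without running __init__, then set up the
        # internal dictionary empty so the object is still usable.
        obj = cls.__new__(cls)
        obj._data = {}
        return obj

    @classmethod
    def from_values(cls, values):
        # Typical corner case: every attribute is about to be overwritten
        # (for example from parsed XML), so copying defaults is wasted work.
        obj = cls.make_empty()
        obj._data.update(values)
        return obj


if __name__ == "__main__":
    loaded = Item.from_values({"name": "a.dst", "size": 3, "tags": ["x"]})
    print(loaded._data["name"])
```

An object built this way only has the attributes the caller supplies, which is why the sketch, like the factory described above, is suited to load and copy paths rather than everyday construction.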