Faster xml load #839

rob-c · 2016-10-25T10:09:30Z

This PR contains:

Some minor test fixes (obviously bad code if the test was run interactively!)
Minor speed improvements (Pieces of code which turn up in benchmarks way too much)
Fix for slow reading of XML files
This consists of a new class factory method on GangaObject which makes correct use of the __new__ method and fully initializes the created instance without populating its internal dictionary with the default values from the schema. This is useful wherever that expensive step is wasted because the code replaces those values in the very next step.
This factory is not a good way to initialize an object for normal use, but for specific corner cases such as a deep copy or constructing an object to be populated from XML it offers a real, demonstrable performance boost.

This has the advantage that it doesn't require fixing __init__ to be correct across the whole codebase. It works as expected, it doesn't require special types to do its manipulations, and it doesn't add significant overhead to the GangaObject or MetaClass objects.
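A minimal sketch of the idea, assuming a much-simplified object model: `GangaObjectSketch`, `SchemaItem`, `get_new`, `_data`, `load_from_xml` and the XML layout are all hypothetical illustrations, not Ganga's actual classes or factory. It puts a normal `__init__` that builds every schema default next to a classmethod factory that uses `__new__` to skip that step for the two corner cases named above, deep copy and loading from XML.

```python
import copy
import xml.etree.ElementTree as ET


class SchemaItem(object):
    """Hypothetical stand-in for a schema entry that carries a default value."""

    def __init__(self, default):
        self.default = default


class GangaObjectSketch(object):
    """Hypothetical, heavily simplified stand-in for GangaObject."""

    _schema = {}  # attribute name -> SchemaItem; filled in by subclasses

    def __init__(self):
        self._data = {}
        self._parent = None
        # The costly part for big objects: a fresh default for every attribute.
        for name, item in self._schema.items():
            self._data[name] = copy.deepcopy(item.default)

    @classmethod
    def get_new(cls):
        """Hypothetical factory: a usable instance whose _data starts empty.

        cls.__new__ allocates the object without running __init__, so no
        schema defaults are built; the caller must fill _data itself.
        """
        obj = cls.__new__(cls)
        obj._data = {}
        obj._parent = None
        return obj

    def __deepcopy__(self, memo):
        # Every value is replaced straight away, so skip the default population.
        new = self.get_new()
        memo[id(self)] = new
        for name, value in self._data.items():
            new._data[name] = copy.deepcopy(value, memo)
        return new


class FileSketch(GangaObjectSketch):
    """Hypothetical file-like object with two string attributes."""

    _schema = {"name": SchemaItem(""), "lfn": SchemaItem("")}


def load_from_xml(text):
    """Hypothetical loader for <attr name="...">value</attr> children."""
    root = ET.fromstring(text)
    obj = FileSketch.get_new()  # no defaults built; the XML supplies every value
    for node in root.findall("attr"):
        obj._data[node.get("name")] = node.text or ""
    return obj
```

Calling `load_from_xml('<FileSketch><attr name="name">0.root</attr><attr name="lfn">/lhcb/0.root</attr></FileSketch>')` fills `_data` directly, and `copy.deepcopy` of the result goes through the same factory, so neither path pays for defaults that are thrown away immediately.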


rob-c · 2016-11-24T20:38:00Z

@mesmith75 I would like to have this considered for 6.3 (if not, 6.4 is fine).

This addresses a specific problem which LHCb users are still afflicted with.
When a user has a large inputdata set (10k+ files), writing it to disk and reading it back in is a very expensive operation.

This PR addresses the XML problem by reducing the need to fully construct objects which don't need to be fully qualified, and it reduces the number of I/O operations when writing the XML to disk by building it in a string buffer, so that the data is written in a single burst once it has been constructed.
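A minimal sketch of the buffering change, assuming a simple hypothetical `<files>` layout rather than Ganga's real XML format: the unbuffered writer issues one `write()` per line, while the buffered version assembles the whole document in an `io.StringIO` and hands it to the file in a single call. The function names are illustrative only.

```python
import io
from xml.sax.saxutils import escape


def write_file_list_unbuffered(path, names):
    """Many small writes: one write() call per line of XML."""
    with open(path, "w") as fh:
        fh.write("<files>\n")
        for name in names:
            fh.write("  <file>%s</file>\n" % escape(name))
        fh.write("</files>\n")


def render_file_list(names):
    """Build the complete XML document in memory first."""
    buf = io.StringIO()
    buf.write(u"<files>\n")
    for name in names:
        buf.write(u"  <file>%s</file>\n" % escape(name))  # escape &, < and >
    buf.write(u"</files>\n")
    return buf.getvalue()


def write_file_list_buffered(path, names):
    """A single write() call with the finished document."""
    text = render_file_list(names)
    with open(path, "w") as fh:
        fh.write(text)
```

Python file objects already buffer small writes internally, so how much this saves depends on the underlying stream and filesystem; the sketch only shows the shape of the change. Rendering to a string first also means a failure while building the document leaves the target file untouched.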

Testing with a dummy Job containing 10k DiracFile objects in the inputfiles field:

Without this PR: 70sec + 37sec = 107sec
With this PR: 54sec + 17sec = 71sec

That is a ~30% performance improvement in this basic test (107 sec down to 71 sec), and I claim some general speed improvements in casual use with this PR as well.
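The headline figure can be checked from the quoted timings. The thread does not name the two terms in each sum; given the test procedure they are presumably the construction step and the reload after a restart, but that labelling is an assumption. The total works out to about a 34% reduction in wall time, consistent with the ~30% quoted, and the second term shrinks proportionally much more than the first.

```python
# Timings (seconds) from the comment above; each run is reported as two terms.
before = (70, 37)
after = (54, 17)


def reduction(old, new):
    """Fraction of wall time saved, e.g. 0.25 means 25% less time."""
    return (old - new) / float(old)


per_phase = [reduction(old, new) for old, new in zip(before, after)]
total = reduction(sum(before), sum(after))
print("per phase: %s, total: %.0f%%" % (", ".join("%.0f%%" % (100 * r) for r in per_phase), 100 * total))
# per phase: 23%, 54%, total: 34%
```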

Code used for testing:

Construction:

%time j=Job(inputfiles=[DiracFile(str(_)+'.root') for _ in range(10000)])

Restart Ganga with --no-mon, then:

%time print jobs[-1].backend

mesmith75 · 2016-11-24T22:25:35Z

@rob-c I agree this looks nice to have in for 6.3. Just let me give it a quick try and then we can merge it.

mesmith75 · 2016-12-01T16:32:17Z

Ok this is good. If there are no objections I'll merge this and start the release.

rob-c added 30 commits October 6, 2016 11:12

Moving Interfaces and cleaning up files

6300856

Updating test paths

3037097

Fixing minor bug in Config setup exposed in these tests

3bd21e1

Adding fixes for LHCb, adding DiracProxy tests and fixing bugs found

bdb1bd6

Disabling adding lhcb_user to default LHCb proxies

cf8b538

Merging latest 6.3 into branch

9ec7746

Expanding the test and string rep of the Credential

7177776

Adding auto-detection of the LHCbDirac and Dirac proxies on ganga start rather than waiting for it to be requested to be discovered as users will get confused and likely create a new one if none is given

3a02746

Adding gridProxy for LHCb users to ease migration to 6.3

eb46d7d

Adding config before the test starts

5720b56

Adding missing import

959e8f2

Fixing minor bug in Shell

27c7206

Refactoring code to make it clearer what is going on under what situation

1104eb5

Fixing minor bug in SessionLock

fdf54ea

Adding retry_command and extra docs

09e1202

Changing default for credential_requirements

45fd70d

Adding cred requirements to BookKeeping

cd87cc1

Fixing minor bugs in DiracFile put

da24664

Minor fixes and cleaning up Dirac splitter

a72ec05

Small changes and performance improvements

0026b3f

Small performance fix

634316d

Adding extra docs

d98fb4e

Small performance improvements

ec1ffb5

Fixing minor bug

80d1028

Simplifying AFS vs non-AFS global lock due to errors

c994bc9

Minor fixes for tests

d115eda

Removing copyTo from GPI with great relief

ac4b181

Forcing check of AFS file

a6ee58c

Updating AfsToken. Adding extra docs and refactoring globals into classes

fb2bd53

Fixing test to reflect copyTo is private

7fa3e1b

rob-c added 8 commits October 21, 2016 18:03

Updating test

73c26e2

Adding credential requirement for SplitByFiles

7992714

Minor fixes

e219ebb

Removing some more bottlenecks

e0fab55

Reducing calls to potentially expensive _setParent as well as making _getName more efficient

19375fa

Adding more checks and fixing _getName

41ddf94

Minor optimisations

46bd12f

Adding logic to speed up complex XML loading dramatically by removing major CPU bottleneck

bd03c73

rob-c added the bug and Core labels Oct 25, 2016

rob-c self-assigned this Oct 25, 2016

rob-c changed the base branch from 6.3-features to develop November 18, 2016 19:25

rob-c added 9 commits November 23, 2016 11:13

Merge develop

5e60d03

Moving to new factory which works

d3e6e06

Cleaning up and changing to use higher, more correct __init__

dbae7ec

typo

fcf3ca2

Merge branch 'develop' into fasterXMLLoad

1bd440e

Ensuring pass through inheritance, fixing test

9364e1e

Merge branch 'fasterXMLLoad' of https://github.com/ganga-devs/ganga into fasterXMLLoad

a3b339b

Simplifying some expensive calls in profiling

d23a789

Adding docs and moving to correctly use a class factory for a new instance

6ee15ec

Merge branch 'develop' into fasterXMLLoad

f14030c

mesmith75 merged commit f0acce4 into develop Dec 1, 2016

mesmith75 deleted the fasterXMLLoad branch December 1, 2016 19:02
