Speed up XML loading #121

Merged
milliams merged 11 commits into develop from feature/speed_up_xml_loading
Jan 18, 2016

milliams
Contributor

I spent some time making the XML loading faster. My benchmark is:

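import cProfile
from Ganga.Core.GangaRepository.VStreamer import from_file, to_file

# Job, LocalFile and LCGSEFile are assumed to be available from the Ganga session.
# Build a job with 1000 input files and 1000 output files.
j = Job()
j.inputfiles = [LocalFile(compressed=True)]*1000
j.outputfiles = [LCGSEFile()]*1000

# Profile writing the job to XML, then reading it back.
cProfile.run('to_file(j, open("test.xml", "w"))', 'write_profile')
cProfile.run('j = from_file(open("test.xml"))', 'read_profile')

# Sanity check that the loaded job kept its attribute values.
assert j[0].inputfiles[100].compressed == True

# Print the 20 most expensive functions from the read profile.
import pstats
p = pstats.Stats('read_profile')
p.strip_dirs().sort_stats('time').print_stats(20)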

I was seeing load times of 7 seconds on the latest develop. After these changes I now see between 4 and 4.5 seconds.
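For anyone wanting to try the same profiling pattern without a Ganga session, here is a minimal sketch for Python 3. `load_stand_in` and `profile_call` are hypothetical names introduced only for illustration; the workload is a placeholder and has nothing to do with what `from_file` really does.

```python
import cProfile
import io
import pstats


def load_stand_in(n=20000):
    # Placeholder workload standing in for from_file(); not Ganga code.
    return [str(i) for i in range(n)]


def profile_call(func):
    """Run func under cProfile and return (total_seconds, report_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    # Capture the report in a string instead of printing to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.strip_dirs().sort_stats("time").print_stats(5)
    return stats.total_tt, buf.getvalue()


if __name__ == "__main__":
    total, report = profile_call(load_stand_in)
    print("total time: %.4f s" % total)
    print(report)
```

Running this before and after a change gives two totals that can be compared directly, much like the 7 second and 4 to 4.5 second figures above.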

The bulk of the changes in here remove redundant checks in Objects.py for attributes which always exist (or at least do now that the objects are created correctly). It also replaces some isType calls with isinstance (that alone gives almost a second of speed-up).
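To illustrate why swapping a helper for the builtin can matter on a hot path, here is a minimal sketch. `Proxy`, `strip_proxy` and `is_type` are hypothetical stand-ins, not Ganga's actual `stripProxy` or `isType`; the only assumption is that the helper does some unwrapping before it calls `isinstance`.

```python
import timeit


class Proxy(object):
    """Hypothetical proxy wrapper holding the real object in `_impl`."""

    def __init__(self, impl):
        self._impl = impl


def strip_proxy(obj):
    # Assumed behaviour: unwrap a proxy, otherwise return the object unchanged.
    return obj._impl if isinstance(obj, Proxy) else obj


def is_type(obj, cls):
    # Hypothetical helper: unwrap first, then defer to the builtin check.
    return isinstance(strip_proxy(obj), cls)


def compare(n=200000):
    """Return (helper_seconds, builtin_seconds) for n type checks each."""
    value = 42
    helper = timeit.timeit(lambda: is_type(value, int), number=n)
    builtin = timeit.timeit(lambda: isinstance(value, int), number=n)
    return helper, builtin


if __name__ == "__main__":
    helper, builtin = compare()
    print("is_type:    %.4f s" % helper)
    print("isinstance: %.4f s" % builtin)
```

The helper is only needed where an object may still be wrapped. Where the object is known to be unwrapped already, the builtin gives the same answer with fewer function calls.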

Most importantly, this PR adds 52 lines of code but removes 154.

I've run my local tests over this and have seen no regressions, but I'd welcome a second and third pair of eyes.

It's possible that this clashes with #117 and #119, but since this patch is much smaller it shouldn't be a problem.

@rob-c
Member

rob-c commented Jan 18, 2016

I think, given that this is a smaller patch, I'm happy to merge this ahead of #117 and look at the fallout there.

I am slightly concerned that stripProxy may have a performance problem, as I have highlighted.
We also had older code that used to set _index_cache and _data to None (or even delete them) at times, but if that code has gone I see no reason not to remove the checks around these objects.
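As a rough illustration of that reasoning, here is a sketch of a class where the attributes are always created in the constructor and never reset, so the guard becomes dead weight. `Registry` and its methods are hypothetical; only the attribute names `_data` and `_index_cache` come from the discussion, and the real Ganga classes differ.

```python
class Registry(object):
    """Hypothetical container used only to show the invariant."""

    def __init__(self):
        # Always created here and never deleted or reset to None afterwards.
        self._data = {}
        self._index_cache = {}

    def lookup_guarded(self, key):
        # Defensive style: re-checks the attribute on every call.
        if not hasattr(self, "_data") or self._data is None:
            return None
        return self._data.get(key)

    def lookup(self, key):
        # Relies on the invariant set up in __init__, so no guard is needed.
        return self._data.get(key)
```

Both methods return the same result as long as nothing deletes `_data` or sets it to None, which is the condition described above for dropping the checks.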

@rob-c rob-c added this to the 6.1.15 milestone Jan 18, 2016
@rob-c
Member

rob-c commented Jan 18, 2016

Just out of curiosity, @milliams, have you run any integration tests on this branch?

@milliams
Contributor Author

If we merge #122 then I will merge those changes into this branch and see how they do. I haven't given it a completely thorough test yet, no. I will certainly not be merging until the tests have completed.

@milliams
Contributor Author

All the currently converted integration tests pass, as do all the unit tests.

@rob-c
Member

rob-c commented Jan 18, 2016

I'm happy to merge this in that case :)

@milliams
Contributor Author

Ok, great. I will go ahead and merge shortly. I suggest merging develop into your branches and pushing the changes to GitHub so that the integration tests can run.

@rob-c
Member

rob-c commented Jan 18, 2016

I'm playing with that now. At least one thing has cropped up, so I will likely need some time to get #117 into a sensible state.

milliams added a commit that referenced this pull request Jan 18, 2016
@milliams milliams merged commit 89b67f0 into develop Jan 18, 2016
@milliams milliams deleted the feature/speed_up_xml_loading branch January 18, 2016 16:12
2 participants