New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate GuessZoneIdByTransitionsUncached performance #686

Closed
jskeet opened this Issue Mar 9, 2017 · 6 comments

Comments

Projects
None yet
1 participant
@jskeet
Member

jskeet commented Mar 9, 2017

This one parameterized test takes ~30 seconds at the moment, which seems way too long for running only ~132 test cases.

To investigate:

  • See whether this is a regression from 1.3
  • See whether most of that time is in DateTimeZone

@jskeet jskeet self-assigned this Mar 9, 2017

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Mar 10, 2017

Member

Observations:

  • This is about twice as fast in release as in debug (at least in net451)
  • It's not just guessing the transitions - it's then checking for correctness for every day over a 2 year period for the resulting time zone. But commenting out that part of the test doesn't change anything
  • We're doing a lot of work each time the method is called, that isn't actually specific to the time zone being checked.
  • There may be a regression due to needing to call Instant.ToDateTimeUtc once per candidate, per instant being checked - which is a 593x316 matrix
  • If we need a hit score of at least 70%, we could reject zones as soon as they hit a particular failure rate

Basically, there's a lot we could optimize here. Will look at how reasonable it is to do so.

Member

jskeet commented Mar 10, 2017

Observations:

  • This is about twice as fast in release as in debug (at least in net451)
  • It's not just guessing the transitions - it's then checking for correctness for every day over a 2 year period for the resulting time zone. But commenting out that part of the test doesn't change anything
  • We're doing a lot of work each time the method is called, that isn't actually specific to the time zone being checked.
  • There may be a regression due to needing to call Instant.ToDateTimeUtc once per candidate, per instant being checked - which is a 593x316 matrix
  • If we need a hit score of at least 70%, we could reject zones as soon as they hit a particular failure rate

Basically, there's a lot we could optimize here. Will look at how reasonable it is to do so.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Mar 10, 2017

Member

Cost of adding in five extra calls within the inner Count loop, in release mode (which normally takes about 11 seconds):

  • candidate.GetUtcOffset(instant) - about 0.8s
  • instant.ToDateTimeUtc() - about 0.9s
  • zone.GetUtcOffset(...) - about 15s (!!!)

Further observation - we're making the expensive TimeZoneInfo repeatedly for the same instants, over and over again. If we remove that redundancy, that should be the bulk of the call. Additionally, that does change with each call, so there's not a lot of point in caching things, probably.

Member

jskeet commented Mar 10, 2017

Cost of adding in five extra calls within the inner Count loop, in release mode (which normally takes about 11 seconds):

  • candidate.GetUtcOffset(instant) - about 0.8s
  • instant.ToDateTimeUtc() - about 0.9s
  • zone.GetUtcOffset(...) - about 15s (!!!)

Further observation - we're making the expensive TimeZoneInfo repeatedly for the same instants, over and over again. If we remove that redundancy, that should be the bulk of the call. Additionally, that does change with each call, so there's not a lot of point in caching things, probably.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Mar 10, 2017

Member

Okay, not quite as big a benefit as expected: reduced from 11s to 7s. Great, but we can do better.
Removing the candidate.GetUtcOffset call reduces the time by another ~2s, which makes little sense to me given the above comment. Hmm.

Member

jskeet commented Mar 10, 2017

Okay, not quite as big a benefit as expected: reduced from 11s to 7s. Great, but we can do better.
Removing the candidate.GetUtcOffset call reduces the time by another ~2s, which makes little sense to me given the above comment. Hmm.

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Mar 10, 2017

Member

(Calling DateTimeZone.GetUtcOffset multiple times will hit the cache of course... but we should be hitting the cache anyway due to the test being across all BCL zones. Maybe it's a processor cache difference?)

Now adding code to give up as soon as we know we should...

Member

jskeet commented Mar 10, 2017

(Calling DateTimeZone.GetUtcOffset multiple times will hit the cache of course... but we should be hitting the cache anyway due to the test being across all BCL zones. Maybe it's a processor cache difference?)

Now adding code to give up as soon as we know we should...

@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Mar 10, 2017

Member

Ha - fresh observation... this line:

var candidates = CanonicalIdMap.Values.Select(ForId).ToList()

... takes about half the time of the test! That should be somewhat expected, given that a time zone source doesn't implement caching.

Next observation: this method is only called in non-test code via GetSystemDefault, which goes via a cache. So unless the user changes local time zone frequently, we're only going to get to this code once... so the only benefit from any further optimization would be to speed up the tests. I don't think we need to go there.

Final result: let's avoid being completely gratuitous with DateTimeZone, and early-out when we've seen more failures than we want, but avoid adding even more caching.

Member

jskeet commented Mar 10, 2017

Ha - fresh observation... this line:

var candidates = CanonicalIdMap.Values.Select(ForId).ToList()

... takes about half the time of the test! That should be somewhat expected, given that a time zone source doesn't implement caching.

Next observation: this method is only called in non-test code via GetSystemDefault, which goes via a cache. So unless the user changes local time zone frequently, we're only going to get to this code once... so the only benefit from any further optimization would be to speed up the tests. I don't think we need to go there.

Final result: let's avoid being completely gratuitous with DateTimeZone, and early-out when we've seen more failures than we want, but avoid adding even more caching.

jskeet added a commit to jskeet/nodatime that referenced this issue Mar 10, 2017

Improve performance of TzdbDateTimeZoneSource.GuessZoneIdByTransition…
…sUncached

This is primarily to speed up tests, but it's a safe and useful
optimization anyway.

Fixes #686.

jskeet added a commit to jskeet/nodatime that referenced this issue Mar 10, 2017

Improve performance of TzdbDateTimeZoneSource.GuessZoneIdByTransition…
…sUncached

This is primarily to speed up tests, but it's a safe and useful
optimization anyway.

Fixes #686.

jskeet added a commit to jskeet/nodatime that referenced this issue Mar 10, 2017

Improve performance of TzdbDateTimeZoneSource.GuessZoneIdByTransition…
…sUncached

This is primarily to speed up tests, but it's a safe and useful
optimization anyway.

Fixes #686.

@jskeet jskeet closed this in #687 Mar 12, 2017

jskeet added a commit that referenced this issue Mar 12, 2017

Improve performance of TzdbDateTimeZoneSource.GuessZoneIdByTransition…
…sUncached

This is primarily to speed up tests, but it's a safe and useful
optimization anyway.

Fixes #686.
@jskeet

This comment has been minimized.

Show comment
Hide comment
@jskeet

jskeet Aug 20, 2017

Member

Just worked out how to improve this massively. Much of the problem is that we reload the time zones from raw data on each iteration. If we move this line of code:

var candidates = CanonicalIdMap.Values.Select(ForId).ToList();

to GuessZoneIdByTransitions (which provides caching), and pass it into GuessZoneIdByTransitionsUncached, then the tests can pass in the same list for each BCL zone.

Not only does that save us the time of loading, but it means that after the first test, the caches will all be populated.

Member

jskeet commented Aug 20, 2017

Just worked out how to improve this massively. Much of the problem is that we reload the time zones from raw data on each iteration. If we move this line of code:

var candidates = CanonicalIdMap.Values.Select(ForId).ToList();

to GuessZoneIdByTransitions (which provides caching), and pass it into GuessZoneIdByTransitionsUncached, then the tests can pass in the same list for each BCL zone.

Not only does that save us the time of loading, but it means that after the first test, the caches will all be populated.

@jskeet jskeet reopened this Aug 20, 2017

jskeet added a commit to jskeet/nodatime that referenced this issue Aug 20, 2017

jskeet added a commit to jskeet/nodatime that referenced this issue Aug 20, 2017

@jskeet jskeet closed this in #953 Aug 20, 2017

jskeet added a commit that referenced this issue Aug 20, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment