Resize arrays following a geometric progression. #94

Merged (2 commits) on Apr 29, 2017

@airbreather airbreather commented Apr 27, 2017

Previously, we would always add a fixed quantity of elements during each resize. This is nice when there are few elements, but it results in slower and slower resizes as more elements are added: every time we add O(1) elements, we have to copy O(N) elements, making a sequence of O(N) inserts require copying O(N * N) elements around.
Doubling every time a resize is needed, on the other hand, ensures that a sequence of O(N) inserts only requires us to copy O(N) elements around (amortized), at the expense of "wasting" space by an amount that's no greater than N / 2 elements compared to the prior approach. This "waste" is only temporary while building the graphs; it goes away after a serialize / deserialize round-trip.

^ those were copied from the commit message... other notes:

  • (EDIT): As I noted in the first comment below, the above is actually somewhat incorrect for arrays that are very large relative to an individual array block. I should amend the commit message. I'll leave it as-is until you've had a chance to look at it.
  • I haven't done performance comparisons on just this change yet. I'm also not able to run through Itinero.Test.Functional on this, because even with no changes on my end, it fails to resolve the point (49.5018, 6.06617).
  • I opted to make these all consistent: an "ensure that there's at least this much room in this array" method, which is always called with the maximum index plus 1. An alternative would be "ensure that the array is big enough to contain this index", which would spare the callers that "+1", but it just didn't feel right to me.
  • I was very careful to make sure that the "Ensure" methods are really small. With the VS2017 compiler, the CIL for the larger one is 27 bytes long, and it doesn't do anything spooky, so the JIT should usually inline the method, avoiding an actual function call in all cases except when it's time to actually resize. This seems meaningful enough to bring up, given how many times these methods get called and how I'm competing with the old version, which always inlined these checks.
  • I also think UnsignedNodeIndex.SortAndConvertIndex does one more resize than it needs to, but I'm going to leave that one alone for now since its running time will be bounded by the sort anyway.
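
The "ensure that there's at least this much room" idea from the notes above can be sketched roughly as follows. This is a Python illustration, not Itinero's C# code: `GrowableArray`, `resize`, `ensure_minimum_size`, and `set_at` are hypothetical names, and starting the doubling from a size of 1 is an assumption. The fast path is a single comparison, which is the property that makes the real method a plausible inlining candidate.

```python
class GrowableArray:
    """Hypothetical stand-in for a resizable array (not Itinero's ArrayBase<T>)."""

    def __init__(self, size=0):
        self._items = [0] * size
        self.resize_count = 0  # how many times resize() has been called

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def resize(self, new_size):
        """Grow (zero-filling the new slots) or shrink to new_size."""
        current = len(self._items)
        if new_size >= current:
            self._items.extend([0] * (new_size - current))
        else:
            del self._items[new_size:]
        self.resize_count += 1


def ensure_minimum_size(array, minimum_size):
    """Make sure `array` has room for at least `minimum_size` elements."""
    if len(array) >= minimum_size:
        return  # fast path: nothing to do
    new_size = max(len(array), 1)
    while new_size < minimum_size:
        new_size *= 2  # geometric growth instead of a fixed increment
    array.resize(new_size)


def set_at(array, index, value):
    """Callers pass the maximum index plus 1, as described in the notes."""
    ensure_minimum_size(array, index + 1)
    array[index] = value


arr = GrowableArray()
for i in range(1000):
    set_at(arr, i, i + 1)
print(len(arr), arr.resize_count)  # 1024 11
```

With doubling, 1000 sequential writes trigger only 11 resizes in this sketch; a fixed increment would trigger a number of resizes proportional to the final size.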

This took me a bit longer than I'd expected because apparently ArrayBase<T>.Resize must zero out the new slots when growing the array, so I had a lot of false failures on my unmanaged implementation.

I'll get a usable performance comparison later and post it in the comments.

@airbreather
Contributor Author

Hmm... thinking about this over dinner, this probably wouldn't have a huge impact on performance by itself with MemoryArray<T> (or, glancing at the Array<T> implementation, for that either) just yet, because that implementation just resizes the array of arrays, up to one individual array that might have been used as a remainder, and then simply allocates new arrays up to the requested size.

So, this isn't really all that useful for performance today, at least not without the other change I'm looking at which uses one huge block. I'll start some performance tests with it anyway to see if it surprises me, but if it doesn't impress just yet, I'll probably just withdraw it for the time being.
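
To make the reasoning in the two comments above concrete, here is a rough model that counts how many existing items get copied while appending one element at a time. It is a sketch under stated assumptions, not a measurement of Itinero: `count_copies`, `fixed_step`, `doubling`, and the block and step sizes are made up for illustration. The contiguous case copies every existing element on each resize. The blocked case assumes, following the description of MemoryArray<T> above, that a resize copies only the block references plus at most one partially filled remainder block.

```python
def count_copies(n, grow, block_size=None):
    """Count items copied while appending n elements one at a time.

    grow(capacity) returns the next capacity once the array is full.
    block_size=None models one contiguous array, where every resize copies
    all existing elements. Otherwise the array is modeled as fixed-size
    blocks, and a resize copies the block references plus the elements of
    one partially filled block.
    """
    capacity = 0
    copies = 0
    for size in range(n):      # `size` elements are stored before this append
        if size == capacity:   # full, so resize before appending
            if block_size is None:
                copies += size
            else:
                block_refs = -(-size // block_size)  # ceiling division
                copies += block_refs + size % block_size
            capacity = grow(capacity)
    return copies


def fixed_step(step):
    """Growth policy: add a fixed number of slots on each resize."""
    return lambda capacity: capacity + step


def doubling(capacity):
    """Growth policy: double the capacity on each resize."""
    return max(1, capacity * 2)


n = 100_000
print(count_copies(n, fixed_step(1000)))        # 4950000
print(count_copies(n, doubling))                # 131071
print(count_copies(n, fixed_step(1000), 1000))  # 4950
print(count_copies(n, doubling, 1000))          # 3216
```

In this model the growth policy matters enormously for a contiguous array and hardly at all for a blocked one. It only counts copies, so it says nothing about allocation or other costs.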

@airbreather
Contributor Author

So, it's anything but clear. I still think it might be worth doing, if only because I plan to push forward with this UnmanagedMemoryArray<T> idea later. That idea, as I'll try to show, requires this in order to avoid making things actually much worse.

At the end of this, there's some raw output from running Itinero.Test.Functional, with Program.cs replaced by this gist (planet-bits2.osm.pbf came from a BBBike extract, here is a MEGA link).

On runs with UnmanagedMemoryArray<T>, I used essentially this, with the only change being namespaces.

To summarize it, I'll go ahead and split up the two parts for all 4 runs, drop the best and worst time in each part+run, and average the other three. I've elected not to look at the "memory diff" reported, because PerformanceInfoConsumer is missing a GC.WaitForPendingFinalizers() call whenever it does a GC.Collect(), which I think means the UnmanagedMemoryArray<T> runs look worse than they could.

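| Run       | Load OSM Data | % of Baseline | Add Contracted DB | % of Baseline |
|-----------|---------------|---------------|-------------------|---------------|
| Baseline  | 57.71s        | 100.00%       | 295.58s           | 100.00%       |
| Only This | 32.34s        | 56.04%        | 310.81s           | 105.16%       |
| Only UMA  | 112.19s       | 194.41%       | 429.16s           | 145.20%       |
| Both      | 28.09s        | 48.67%        | 252.64s           | 85.47%        |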

So what I'm seeing is that this PR penalizes the running time of building the contracted DB very slightly (but consistently), while making it much quicker to load OSM data... in this specific case, it's a +15s / -15s that cancel out, but I'm not immediately planning to invest more time into exploring what this will do to bigger requests. Further, without this PR, offering a replacement for MemoryArray<T> the way I plan to is a non-starter (but with this PR, that becomes a clear improvement for this test, and a much clearer improvement for the other tests I plan to do to show it off).

I honestly didn't expect this to have such a profound impact on the "Load OSM Data" case on its own, nor did I expect the other part of this to get notably worse, but both effects are definitely there, and it's consistent.
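
The summary method described above (drop the best and worst time, average the other three) can be reproduced from the raw output with a short script. This is an illustrative sketch, not the script that was actually used: `parse_seconds` and `trimmed_mean` are hypothetical helpers, and the log lines are copied from the "Loading OSM data" entries in the raw output below.

```python
import re

SPENT = re.compile(r"spent ([0-9.]+)s")


def parse_seconds(log_lines):
    """Extract the 'spent <seconds>s' durations from the log lines."""
    durations = []
    for line in log_lines:
        match = SPENT.search(line)
        if match:
            durations.append(float(match.group(1)))
    return durations


def trimmed_mean(values):
    """Drop the single best and single worst value, then average the rest."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)


baseline_load_log = [
    "[Test] information - Loading OSM data:Ended at at 7:52 PM, spent 58.9594736s and 391.3164MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 7:58 PM, spent 58.4134434s and 288.332MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 8:03 PM, spent 58.0926326s and 358.457MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 8:09 PM, spent 56.6277299s and 355.1523MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 8:15 PM, spent 56.0126441s and 244.8203MB of memory diff.",
]
only_this_load_log = [
    "[Test] information - Loading OSM data:Ended at at 8:43 PM, spent 32.6107298s and 381.707MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 8:49 PM, spent 32.4881125s and 233.8047MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 8:54 PM, spent 31.9726186s and 317.7188MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 9:00 PM, spent 31.9310817s and 391.7656MB of memory diff.",
    "[Test] information - Loading OSM data:Ended at at 9:06 PM, spent 32.5706915s and 304.3516MB of memory diff.",
]

baseline = trimmed_mean(parse_seconds(baseline_load_log))
only_this = trimmed_mean(parse_seconds(only_this_load_log))
print(f"{baseline:.2f}s")                   # 57.71s
print(f"{only_this:.2f}s")                  # 32.34s
print(f"{100 * only_this / baseline:.2f}%")  # 56.04%
```

The "Only This" load time comes out at about 56% of baseline, a drop of roughly 44% in elapsed time.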

Baseline

[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 7:52 PM, spent 58.9594736s and 391.3164MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 7:53 PM, spent 299.2866269s and -372.1523MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 7:58 PM, spent 58.4134434s and 288.332MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 7:59 PM, spent 299.2475857s and -287.7695MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 8:03 PM, spent 58.0926326s and 358.457MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 8:04 PM, spent 296.5369843s and -358.5MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 8:09 PM, spent 56.6277299s and 355.1523MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 8:10 PM, spent 290.9411224s and -354.7578MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 8:15 PM, spent 56.0126441s and 244.8203MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 8:16 PM, spent 290.1613754s and -244.7578MB of memory diff.

Only This Change

[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 8:43 PM, spent 32.6107298s and 381.707MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 8:43 PM, spent 312.6048686s and -357.4141MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 8:49 PM, spent 32.4881125s and 233.8047MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 8:49 PM, spent 310.857198s and -233.3711MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 8:54 PM, spent 31.9726186s and 317.7188MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 8:55 PM, spent 311.1184482s and -317.7148MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 9:00 PM, spent 31.9310817s and 391.7656MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 9:01 PM, spent 310.4678277s and -391.4492MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 9:06 PM, spent 32.5706915s and 304.3516MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 9:06 PM, spent 310.4037635s and -303.9883MB of memory diff.

Only UnmanagedMemoryArray<T>

[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 11:40 PM, spent 108.7406394s and 164.5625MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 11:42 PM, spent 427.0179441s and 174.668MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 11:49 PM, spent 111.6549307s and -155.4766MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 11:51 PM, spent 428.4803441s and 197.8398MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 11:58 PM, spent 112.9096288s and -159.5312MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 12:00 AM, spent 429.7235346s and 197.7656MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 12:07 AM, spent 112.0162793s and -142.9844MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 12:09 AM, spent 429.2831088s and 181.2969MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 12:16 AM, spent 113.0732887s and -145.2227MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 12:18 AM, spent 430.0042971s and 183.5352MB of memory diff.

Both This Change and UnmanagedMemoryArray<T>

[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 9:52 PM, spent 28.3951926s and 219.8984MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 9:53 PM, spent 252.600411s and 226.3242MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 9:57 PM, spent 27.669498s and -123.9844MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 9:58 PM, spent 252.5638793s and 186.9141MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 10:02 PM, spent 27.9737891s and -93.9336MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 10:02 PM, spent 252.8066054s and 159.7109MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 10:06 PM, spent 28.46526s and -137.4453MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 10:07 PM, spent 252.5708789s and 200.3828MB of memory diff.
[Test] information - Loading OSM data:Started!
[Test] information - Loading OSM data:Ended at at 10:11 PM, spent 27.8917107s and -132.7617MB of memory diff.
[Test] information - Adding contracted db:Started!
[Test] information - Adding contracted db:Ended at at 10:12 PM, spent 252.7455461s and 198.4219MB of memory diff.

@airbreather
Contributor Author

Of course, performance benefits aside, it might be considered an improvement solely because all of the "ensure that this array is large enough to hold this many elements" logic now goes through the same code...

@xivk
Contributor

xivk commented Apr 28, 2017

This is impressive and definitely one of the best pull requests this project has ever had! 👍 🥇

I agree with your analysis and will accept the pull request. As it's against the master branch, we need to take care of breaking changes. Looking at this, I don't really see any; if you agree that there are no breaking changes, it's just a matter of bumping the version # and merging this.

The only issue may be that memory usage during loadosm may increase? And as for the tests: if I'm correct, this is covered by the existing tests, but the new code isn't tested specifically...

@airbreather
Contributor Author

airbreather commented Apr 28, 2017

I wanted to make sure it would work against those 6 commits in master that haven't yet been merged to develop, since one of the unmerged commits touches one of the files I also touched here.

After those commits are merged from master to develop, I'll go ahead and re-submit it against develop, with a fixed commit message that doesn't mislead about the importance of this in isolation.

Memory is expected to increase somewhat, but it's only temporary: the extra consumption should go away after a Trim(), or worst-case after a serialize / deserialize.

edit: this is covered, but I'll add some tests for this specifically in a minute.
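
A rough way to picture the temporary extra memory: under doubling, the allocated capacity can run ahead of the number of elements actually stored until something trims it back. The sketch below is illustrative only; `Buffer` and its `trim` method are hypothetical and are not the Trim() on Itinero's classes.

```python
class Buffer:
    """Hypothetical doubling buffer that tracks capacity separately from count."""

    def __init__(self):
        self.slots = [0]  # allocated storage (capacity = len(self.slots))
        self.count = 0    # number of slots actually in use

    def append(self, value):
        if self.count == len(self.slots):
            self.slots.extend([0] * len(self.slots))  # double the capacity
        self.slots[self.count] = value
        self.count += 1

    def slack(self):
        """Allocated but unused slots."""
        return len(self.slots) - self.count

    def trim(self):
        """Release the unused tail so capacity equals count."""
        del self.slots[self.count:]


buf = Buffer()
for i in range(1025):
    buf.append(i)
print(len(buf.slots), buf.slack())  # 2048 1023
buf.trim()
print(len(buf.slots), buf.slack())  # 1025 0
```

The slack is largest right after a doubling, and it disappears once the buffer is trimmed.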

@xivk
Contributor

xivk commented Apr 28, 2017

FYI, I just merged in the latest master changes to develop.

@airbreather airbreather changed the base branch from master to develop April 28, 2017 17:55
@airbreather
Contributor Author

  1. Changed the commit message of the main one to better reflect where things stood without it.
  2. Added a unit test for the new extension method.
  3. Changed the target branch to develop.
  4. Added another commit with a change to the test project that Visual Studio 2017 seems to keep trying to add.

I had to do some history rewriting to make that first one work anyway, so I just rebased it at the same time to account for the changes you've done in the past few hours without needing an extra merge commit. This should make the future better, but you may have to fight with Git a bit to pull it down if you have the older version. Sorry about that, if it happens.

@airbreather
Contributor Author

Dangitall... "60% faster" should have been "40% faster", I'm going to just do one more rewrite...

Previously, we would always add a fixed quantity of elements during each resize.  While today's ArrayBase<T> subclasses don't seem to have a huge problem with this, it limits the kinds of ArrayBase<T> subclasses that work with this.  For example, a sequence of inserts into an implementation that uses a single contiguous array in virtual memory (similar to List<T>) will perform O(N) copies of most elements, causing those inserts to take O(N * N) time to complete.  This is impractical if the arrays are managed by the CLR, but there are other approaches that are showing promising results in a lab setting.
Instead, we now double every time a resize is needed.  This is done with a new helper extension method that all previous "add a new thing" callers now use.  This ensures that for the aforementioned kinds of implementations, a sequence of O(N) inserts only requires us to copy O(N) elements around (amortized), at the expense of "wasting" space by an amount that's no greater than N / 2 elements compared to the prior approach.  This "waste" is only temporary while building the graphs; it goes away after a serialize / deserialize round-trip, or after a Trim() that some affected classes have.
Despite not being strictly necessary for making large operations fast in Itinero today (in fact, building a contracted graph is actually looking slightly but consistently *slower* for some data sets), there are some operations that are considerably faster: loading a significant, but not enormous, OSM data set seems to be 40% faster in some cases.
It appears to have something to do with identifying this as a test project.
@airbreather
Contributor Author

OK, what's there now is "final" from my perspective. Sorry about the spam.

@xivk xivk merged commit 7baa815 into itinero:develop Apr 29, 2017
@airbreather airbreather deleted the perf-easy branch April 29, 2017 13:08