Resize arrays following a geometric progression. #94
Conversation
Hmm... thinking about this over dinner, this probably wouldn't have a huge impact on performance by itself. So this isn't really all that useful for performance today, at least not without the other change I'm looking at, which uses one huge block. I'll start some performance tests with it anyway to see if it surprises me, but if it doesn't impress just yet, I'll probably just withdraw it for the time being.
So, it's anything but clear. I still think it might be worth doing, if only because I plan to push forward with this.

At the end of this, there's some raw output from running Itinero.Test.Functional, with Program.cs replaced by this gist (planet-bits2.osm.pbf came from a BBBike extract; here is a MEGA link).

To summarize it, I'll go ahead and split up the two parts for all 4 runs, drop the best and worst time in each part+run, and average the other three. I've elected not to look at the "memory diff" reported.
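The drop-best-and-worst averaging described above can be sketched like this (an illustrative helper written in Java, not part of Itinero; it assumes five timings per part+run):

```java
import java.util.Arrays;

// Summarize a set of timings by dropping the best and worst
// and averaging the rest, as described above. Illustrative only.
final class TimingSummary {
    static double trimmedAverage(double[] seconds) {
        double[] sorted = seconds.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) { // skip min and max
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }
}
```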
So what I'm seeing is that this PR penalizes the running time of building the contracted DB very slightly (but consistently), while making it much quicker to load OSM data. In this specific case, it's a +15s / -15s that cancel out, but I'm not immediately planning to invest more time into exploring what this will do to bigger requests.

I honestly didn't expect this to have such a profound impact on the "Load OSM Data" case on its own, nor did I expect the other part of this to get notably worse, but both effects are definitely there, and it's consistent.

Baseline
Only This Change
Only
Of course, performance benefits aside, it might be considered an improvement solely because all of the "ensure that this array is large enough to hold this many elements" logic now goes through the same code...
This is impressive and definitely one of the best pull requests this project ever had! 👍 🥇

I agree with your analysis and will accept the pull request. As it's against the master branch, we need to take care of breaking changes. Looking at this, I don't really see any breaking changes; if you agree that there are none, it's just a matter of bumping the version # and merging this. The only issue may be that memory usage during loadosm may increase? And the tests: if I'm correct, this is covered, but not the new code specifically...
I wanted to make sure it would work against those 6 commits. After those commits are merged, memory is expected to increase somewhat, but it's only temporary: the extra consumption should go away after a serialize / deserialize round-trip or a Trim().

edit: this is covered, but I'll add some tests for this specifically in a minute.
FYI, I just merged in the latest master changes to develop.
I had to do some history rewriting to make that first one work anyway, so I just rebased it at the same time to account for the changes you've made in the past few hours, without needing an extra merge commit. This should make things easier going forward, but you may have to fight with Git a bit to pull it down if you have the older version. Sorry about that, if it happens.
Dangitall... "60% faster" should have been "40% faster"; I'm going to just do one more rewrite...
Previously, we would always add a fixed quantity of elements during each resize. While today's ArrayBase<T> subclasses don't seem to have a huge problem with this, it limits the kinds of ArrayBase<T> subclasses that can work well here. For example, a sequence of inserts into an implementation that uses a single contiguous array in virtual memory (similar to List<T>) will perform O(N) copies of most elements, causing those inserts to take O(N * N) time to complete. This is impractical if the arrays are managed by the CLR, but there are other approaches that are showing promising results in a lab setting.

Instead, we now double the size every time a resize is needed, paired with a new helper extension method that all previous "add a new thing" callers now use. This ensures that for the aforementioned kinds of implementations, a sequence of O(N) inserts only requires us to copy O(N) elements around (amortized), at the expense of "wasting" space by an amount that's no greater than N / 2 elements compared to the prior approach. This "waste" is only temporary while building the graphs; it goes away after a serialize / deserialize round-trip, or after a Trim() that some affected classes have.

Despite not being strictly necessary for making large operations fast in Itinero today (in fact, building a contracted graph is actually looking slightly but consistently *slower* for some data sets), there are some operations that are considerably faster: loading a significant, but not enormous, OSM data set seems to be 40% faster in some cases.
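As a rough illustration of the growth policy and the shared "ensure minimum size" entry point described above, here is a minimal sketch (written in Java for illustration; the class and member names are hypothetical and not Itinero's actual ArrayBase<T> API):

```java
// Minimal sketch of geometric resizing: grow by doubling instead of
// adding a fixed quantity. Names are hypothetical, not Itinero's API.
final class GrowableLongArray {
    private long[] data = new long[4];

    // The single "ensure this array can hold at least minSize elements"
    // entry point that all "add a new thing" call sites would share.
    void ensureMinimumSize(int minSize) {
        if (data.length >= minSize) {
            return;
        }
        int newSize = data.length;
        while (newSize < minSize) {
            newSize *= 2; // geometric progression: double each time
        }
        // copyOf zero-fills the new slots, matching the expectation
        // that grown regions read as default values.
        data = java.util.Arrays.copyOf(data, newSize);
    }

    int capacity() {
        return data.length;
    }

    void set(int index, long value) {
        ensureMinimumSize(index + 1);
        data[index] = value;
    }

    long get(int index) {
        return data[index];
    }
}
```

Because every caller funnels through `ensureMinimumSize`, changing the growth policy later means touching exactly one place.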
It appears to have something to do with identifying this as a test project.
OK, what's there now is "final" from my perspective. Sorry about the spam.
Previously, we would always add a fixed quantity of elements during each resize. This is nice when there are few elements, but it results in slower and slower resizes as more elements are added: every time we add O(1) elements, we have to copy O(N) elements, making a sequence of O(N) inserts require copying O(N * N) elements around.
Doubling every time a resize is needed, on the other hand, ensures that a sequence of O(N) inserts only requires us to copy O(N) elements around (amortized), at the expense of "wasting" space by an amount that's no greater than N / 2 elements compared to the prior approach. This "waste" is only temporary while building the graphs; it goes away after a serialize / deserialize round-trip.
^ those were copied from the commit message... other notes:
- `UnsignedNodeIndex.SortAndConvertIndex` does one more resize than it needs to, but I'm going to leave that one alone for now, since its running time will be bounded by the sort anyway.
- This took me a bit longer than I'd expected because apparently `ArrayBase<T>.Resize` must zero out the new slots when growing the array, so I had a lot of false failures on my unmanaged implementation.
- I'll get a usable performance comparison later and post it in the comments.
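The amortized claim from the commit message can be checked by counting how many elements each growth policy copies; this is an illustrative Java simulation (the insert count and the fixed increment below are arbitrary stand-ins, not Itinero's actual numbers):

```java
// Count how many elements get copied during a run of sequential appends
// under a fixed-increment policy vs. a doubling policy. Illustrative only.
final class GrowthSimulator {
    // Arithmetic progression: capacity grows by a fixed increment,
    // so copies grow quadratically with the number of inserts.
    static long copiesWithFixedIncrement(long inserts, long increment) {
        long capacity = increment;
        long copies = 0;
        for (long size = 1; size <= inserts; size++) {
            if (size > capacity) {
                copies += capacity; // copy everything into the bigger array
                capacity += increment;
            }
        }
        return copies;
    }

    // Geometric progression: capacity doubles, so total copies stay
    // linear in the number of inserts (amortized O(1) per insert).
    static long copiesWithDoubling(long inserts) {
        long capacity = 1;
        long copies = 0;
        for (long size = 1; size <= inserts; size++) {
            if (size > capacity) {
                copies += capacity;
                capacity *= 2;
            }
        }
        return copies;
    }
}
```

For a million inserts with a fixed increment of 1024, the simulation copies hundreds of millions of elements, while doubling copies fewer than two million, which is the O(N * N) vs. amortized O(N) difference described above.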