Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Investigate performance slow down #97
As discussed here u-turn avoidance should cause at most a 4x slowdown (at most double number of candidates means at most 4 times number of transitions). But since some candidates are tower nodes and query results are deduped (#91) the slowdown should be less than 4x. Actually, @kodonnell only measured a 2.5x slowdown after u-turn avoidance was implemented (see first link), which was even before #91.
Oh, ugly. Thanks a lot @michaz - this got lost in issues :( !
Should we make initial collection size configurable in GH core or should we use a better heuristic like small initial size for small beeline distance? Also Dijstra and A* should have a different initialization, plus the different initialization size for bidirectional vs. unidirectional looks unsynched (Dijkstra 2K vs. 150K for bidir Dijkstra).
Probably the best would be to introduce a new variable in AlgorithmOptions and make it configurable, like "expected minimum nodes" or something.
I think in library code one should not add a special assumption, an overridden method, or even a configuration parameter, anywhere, for a 10% speed-up. We would probably overfit it to the particular cases we are looking at, plus we make the code bad by increasing its surface for errors, and by diverting attention from what it actually does.
Except maybe the authors of the JVM, because when they are successful, half the world gets 10% faster, so the investment in maintenance and testing and code-ugliness pays off.
Look at the formula for the initial collection size (I had to set a breakpoint to understand where the value actually comes from, because it goes back and forth so often between super and subclasses):
It is almost evil, because the typical graph size is "the world", or "a country", except in toy scenarios and test cases. So the result of the formula is almost always 150000, except in toy scenarios and test cases, so you never detect the problem created by 150000 with your toy scenarios and test cases, so it stays undetected. And the formula looks clever so nobody suspects anything, whereas if you had just written
We'll probably meet in the middle, but you were asking: I would remove all of it.
Or, if necessary, introduce a "hint" (as you say), but the default should always be the default behavior of the collection classes.
Also, lets remove the subclassing of the collection classes (