Linear implementation for Fulkerson-Chen-Anstee #2537
Conversation
(force-pushed from d4fcc86 to 7b46439)
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files:

@@           Coverage Diff           @@
##           master    #2537   +/-  ##
=======================================
  Coverage   84.37%   84.38%
=======================================
  Files         376      376
  Lines       61633    61691      +58
  Branches    12060    12074      +14
=======================================
+ Hits        52004    52059      +55
- Misses       9629     9632       +3
A few comments, nothing conclusive yet. I've talked to @szhorvat and he will also want to chime in and review later.
I edited the theorem into the top post so it's easier to follow your description, and what the LHS and RHS refer to.
I added some rudimentary benchmarks. EDIT: The first version of the benchmark was buggy; the results below are now fixed.

Before this PR: [benchmark output not captured]

After this PR: [benchmark output not captured]
It seems that at about 1000 vertices the old implementation is still doing better in some cases (PA) despite the bad complexity. This might be because there's a lot of memory allocation when using a list of vectors, and memory allocation is expensive. Although the system time (3rd column) doesn't increase significantly. Let's discuss next week what might be going on.
@Tagl I refactored the counting sort to use a single array instead of an array of arrays. Can you please have a look to make sure I didn't break anything? Efficiency is greatly improved now: [benchmark output not captured]
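For illustration only, a single-array counting sort along these lines might look like the following Python sketch (hypothetical code, not the actual refactor in this PR):

```python
def counting_sort_desc(values, max_value):
    # One flat count array replaces the array-of-arrays bucket structure:
    # cnt[v] records how many times the value v occurs.
    cnt = [0] * (max_value + 1)
    for v in values:
        cnt[v] += 1
    out = []
    for v in range(max_value, -1, -1):  # emit in non-increasing order
        out.extend([v] * cnt[v])
    return out
```

A single allocation of size `max_value + 1` replaces one list allocation per bucket, which is the kind of saving the benchmark numbers above reflect.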
@ntamas I think we're almost good to go with this. Any other comments? I ended up not using the single exit point you suggested because it makes it difficult to de-allocate memory early when we don't need it.
In principle a further optimization is possible, since we already compute the cumulative in-degree sums, which are the LHS, during the counting sort; the catch is that this code computes them in the wrong order (from smallest to largest instead of largest to smallest, which is the order the check consumes them in). This should not block merging though, as it's not even clear whether it improves performance. We should sort it out for the final version and the explanation of the algorithm, though.
Looks good to me. The benchmark numbers are also much more in line with the measurements of my C++ implementation. |
Closes #1452
The current implementation of the Fulkerson-Chen-Anstee simple directed graphicality check is quadratic.
This is my own approach to the linear time version of Fulkerson-Chen-Anstee.
I (with @szhorvat) plan to do a write-up for this approach.
After implementing the method I read section 5.1 in:
H. Kim, C. I. Del Genio, K. E. Bassler, and Z. Toroczkai, New J. Phys. 14, 023012 (2012). https://dx.doi.org/10.1088/1367-2630/14/2/023012
The methods follow the same general approach.
The key difference is they use recurrence relations to prove that the method works.
I leave here an explanation of how my method works. As in the other methods, the first step is sorting.
The theorem (as stated on Wikipedia): a sequence of nonnegative integer pairs $(a_1,b_1),\dots,(a_n,b_n)$ with $a_1 \geq a_2 \geq \dots \geq a_n$ is digraphic if and only if $\sum_{i=1}^{n} a_i = \sum_{i=1}^{n} b_i$ and, for each $k \in \{1,\dots,n\}$,

$$\sum_{i=1}^{k} a_i \leq \sum_{i=1}^{k} \min(b_i, k-1) + \sum_{i=k+1}^{n} \min(b_i, k).$$

In this description the $a_i$ are the indegrees and the $b_i$ the outdegrees.
The left hand side is easy to compute in linear time for each value of $k$.
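To make the condition concrete, here is a direct quadratic-time check in Python. This is my own illustrative sketch (the function name and the (indegree, outdegree) pair convention are assumptions), not the PR's code:

```python
def is_digraphic_quadratic(pairs):
    # pairs: list of (indegree, outdegree) tuples; checks the
    # Fulkerson-Chen-Anstee condition directly in O(n^2) time.
    n = len(pairs)
    pairs = sorted(pairs, key=lambda p: p[0], reverse=True)  # indegrees non-increasing
    a = [p[0] for p in pairs]  # indegrees (LHS)
    b = [p[1] for p in pairs]  # outdegrees (RHS)
    if sum(a) != sum(b):
        return False
    for k in range(1, n + 1):
        lhs = sum(a[:k])
        rhs = sum(min(b[i], k - 1) for i in range(k)) \
            + sum(min(b[i], k) for i in range(k, n))
        if lhs > rhs:
            return False
    return True
```

For example, the degree pairs of the complete digraph on three vertices, `[(2, 2)] * 3`, pass the check, while a single vertex that would need self-loops, `[(2, 2)]`, fails.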
On the right hand side we have two sums of outdegrees.
For the first sum we separate outdegrees based on whether they are $< k-1$ or $\geq k-1$.
Similarly, for the second sum we separate outdegrees based on whether they are $< k$ or $\geq k$.
In both cases we have a value which we can think of as the splitting point of a sum.
We only need to track the sum of outdegrees below the splitting point, because the value of $k$ increments from $1$ to $n$, and therefore they will not individually be impacted by future changes.
The outdegrees that are equal to or above the splitting point must be tracked, and we do so with an ordered multiset.
The ordered set for the first sum is initially empty and the one for the second sum initially contains all of the outdegrees.
As $k$ increases, we either add/remove the outdegree at index $k-1$ to/from the sum or the ordered set, depending on which side of the splitting point it lies on.
We may also need to remove the smallest outdegrees from the ordered sets and move them over to the sums of outdegrees below the splitting point.
The contribution of the values in the ordered multiset to the right hand side is the value of $k-1$ or $k$, depending on which sum we are considering, multiplied by the size of the ordered multiset.
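To make the bookkeeping concrete, the following Python sketch applies the splitting-point idea to the first right-hand-side sum alone. A min-heap stands in for the ordered multiset, which suffices here because only the smallest elements ever drop below the growing splitting point. The function name and structure are my own illustration, not the PR's code:

```python
import heapq

def first_rhs_sums(b):
    # For each k = 1..n, compute sum_{i=1..k} min(b[i-1], k-1) incrementally.
    # low_sum accumulates outdegrees already below the splitting point k-1;
    # the heap holds outdegrees still at or above it.
    n = len(b)
    low_sum = 0
    heap = []
    result = []
    for k in range(1, n + 1):
        t = k - 1          # splitting point for the first sum
        x = b[k - 1]       # outdegree entering the sum at this step
        if x < t:
            low_sum += x   # contributes its own value from now on
        else:
            heapq.heappush(heap, x)
        # move outdegrees that fell below the new splitting point
        while heap and heap[0] < t:
            low_sum += heapq.heappop(heap)
        result.append(low_sum + t * len(heap))
    return result
```

The second sum is handled symmetrically with splitting point $k$, except that its multiset starts out containing all of the outdegrees and the element at index $k-1$ is also removed at each step.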
Using a balanced binary search tree for the ordered set means adding and removing from the set takes $\mathcal{O}(\log n)$ time.
Then the time complexity will match that of the sort operation.
We can push this a little further.
Since the values in the set are bounded by $n$, we can switch the quicksort out for a bucket sort, which is $\mathcal{O}(n)$.
Additionally, this allows us to switch out the balanced binary search tree for an array of integers, where the index into the array represents the degree and the value at that index represents how many times that degree appears in the ordered multiset.
All operations within the main loop are $\mathcal{O}(1)$ and therefore the time complexity of the algorithm is $\mathcal{O}(n)$ in total.
I have not done any benchmarking yet, but I have tested an equivalent C++ implementation on sequences of size $1$ to $2{,}000{,}000$, and there is a noticeable difference in running time between the linear and the linearithmic approaches.
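Putting the pieces together, here is a hedged end-to-end Python sketch of the linear-time check as I understand the description above: bucket sort by indegree, then count arrays standing in for the two ordered multisets. The names, the (indegree, outdegree) pair convention, and the structure are my own assumptions, not igraph's actual C code:

```python
def is_digraphic_linear(pairs):
    # pairs: list of (indegree, outdegree) tuples.
    n = len(pairs)
    if n == 0:
        return True
    # A simple digraph on n vertices cannot have any degree >= n.
    if any(a < 0 or b < 0 or a >= n or b >= n for a, b in pairs):
        return False
    if sum(a for a, _ in pairs) != sum(b for _, b in pairs):
        return False
    # Bucket sort by indegree, non-increasing: O(n).
    buckets = [[] for _ in range(n)]
    for a, b in pairs:
        buckets[a].append(b)
    a_sorted, b_sorted = [], []
    for d in range(n - 1, -1, -1):
        for b in buckets[d]:
            a_sorted.append(d)
            b_sorted.append(b)
    # Count arrays play the role of the two ordered multisets:
    # cnt1 holds outdegrees >= k-1 among the first k entries,
    # cnt2 holds outdegrees >= k among the remaining entries.
    cnt1 = [0] * n
    size1, low1 = 0, 0
    cnt2 = [0] * n
    size2, low2 = n, 0
    for b in b_sorted:
        cnt2[b] += 1
    lhs = 0
    for k in range(1, n + 1):
        x = b_sorted[k - 1]
        # First sum: the splitting point moves up to k-1, so values
        # equal to k-2 drop below it and are frozen into low1.
        v = k - 2
        if v >= 0 and cnt1[v]:
            low1 += v * cnt1[v]
            size1 -= cnt1[v]
            cnt1[v] = 0
        if x < k - 1:
            low1 += x
        else:
            cnt1[x] += 1
            size1 += 1
        # Second sum: entry k leaves its range, then the splitting
        # point moves up to k.
        if x < k - 1:
            low2 -= x
        else:
            cnt2[x] -= 1
            size2 -= 1
        if cnt2[k - 1]:
            low2 += (k - 1) * cnt2[k - 1]
            size2 -= cnt2[k - 1]
            cnt2[k - 1] = 0
        lhs += a_sorted[k - 1]
        rhs = low1 + (k - 1) * size1 + low2 + k * size2
        if lhs > rhs:
            return False
    return True
```

Every outdegree value crosses a splitting point at most once per multiset, so the main loop does $\mathcal{O}(n)$ work in total.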