fix(algorithm): correct bfs to not abort on previously visited node #822
Purpose
Improve performance and fix a flaw in the BFS algorithm.
Rationale
While testing a complex Git Flow repository, I found that if the breadth-first search reached a previously visited node before finding the version it was looking for, it would abort with None. This was incorrect, as the version it was looking for was deeper in the tree. The problem appears when you have a branch with more than 1 commit on it that has been merged back into a base branch. The other branch must also have more commits on it than the branch prior to the version it is looking for. The graph looks like this:
During a breadth-first search looking for v1.0.0, a textbook BFS algorithm will visit the nodes in the following order:

Prior to this PR, PSR's BFS algorithm would attempt to visit nodes in the order:

Since 3 has already been visited, the algorithm returns None, even though the queue is technically not empty: when inspecting 1, it added v1.0.0 to the queue.

Furthermore, the online references I reviewed describe BFS in terms of a queue data structure, which is what we use. Recursion, however, is not recommended because of the performance cost of each consecutive function call and the finite size of a process call stack. As projects mature and enter long-term sustainment, the number of git commits will grow and the recursion depth will approach the maximum call stack size.
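To illustrate the point about recursion depth, here is a minimal sketch of a queue-based BFS that walks a linear history longer than Python's recursion limit. It is not the code from this PR: `bfs_find`, `parents_of`, and the integer "commits" are hypothetical names chosen for the example, where commit `n` has the single parent `n - 1`.

```python
import sys
from collections import deque


def bfs_find(start, parents_of, is_target):
    """Iterative BFS: returns the first node matching is_target, else None."""
    queue = deque([start])
    visited = set()
    while queue:
        node = queue.popleft()
        if node in visited:
            # Skip nodes seen before instead of aborting the whole search.
            continue
        visited.add(node)
        if is_target(node):
            return node
        queue.extend(parents_of(node))
    return None


def parents_of(n):
    # Hypothetical linear history: commit n has the single parent n - 1.
    return [n - 1] if n > 0 else []


# A history several times deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() * 5
found = bfs_find(depth, parents_of, lambda n: n == 0)
```

Because the pending work lives in the `deque` rather than on the call stack, the search depth is bounded by available memory and not by the interpreter's recursion limit.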
How I tested
I created a unit test that mocks the Git commits and tags of a tree that matches the above diagram. When you run this unit test on the base branch, you will see that the search returns None even though the tag commit exists. With this PR's changes the test succeeds, because the search skips a commit it has already visited instead of aborting.
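A rough sketch of this kind of test is shown below. It is not the test added in this PR: the graph shape, the `make_commit` helper, the `find_tagged_commit` function, and the `abort_on_revisit` flag are all assumptions made for illustration, and the mock commits only mimic the `hexsha` and `parents` attributes of a commit object. The flag switches between the old abort-on-revisit behaviour and the skip-on-revisit behaviour.

```python
from collections import deque
from types import SimpleNamespace


def make_commit(sha, *parents):
    # Hypothetical stand-in for a commit: only a sha and a parent list.
    return SimpleNamespace(hexsha=sha, parents=list(parents))


def find_tagged_commit(head, tagged_shas, abort_on_revisit=False):
    """BFS from head towards the root; returns the first tagged commit."""
    queue = deque([head])
    visited = set()
    while queue:
        commit = queue.popleft()
        if commit.hexsha in visited:
            if abort_on_revisit:
                return None  # old behaviour: give up on a revisited commit
            continue  # new behaviour: skip it and keep draining the queue
        visited.add(commit.hexsha)
        if commit.hexsha in tagged_shas:
            return commit
        queue.extend(commit.parents)
    return None


# Assumed history: a two-commit feature branch (f1, f2) forks from b1 and is
# merged back into the base branch at head; the tag sits two commits below b1.
tag = make_commit("tag")
b0 = make_commit("b0", tag)
b1 = make_commit("b1", b0)
b2 = make_commit("b2", b1)
f1 = make_commit("f1", b1)
f2 = make_commit("f2", f1)
head = make_commit("head", b2, f2)

old_result = find_tagged_commit(head, {"tag"}, abort_on_revisit=True)
new_result = find_tagged_commit(head, {"tag"})
```

In this assumed graph, `b1` is dequeued a second time (via `f1`) while the tagged commit is still waiting in the queue, so the aborting variant gives up early while the skipping variant goes on to find it.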
How to verify
And of course, all tests pass, as shown in the CI results below.