fix(algorithm): correct bfs to not abort on previously visited node #822

codejedi365 · 2024-01-30T05:32:02Z

Purpose

Improve performance and fix flaw in BFS algorithm

Rationale

I was testing a complex Git Flow repository and found that if the breadth first search reached a previously visited node even if it didn't find the version it was looking for it would abort with None. This was incorrect as the version it was looking for was deeper on the tree. This appears when you a have a branch that has more than 1 commit on it and has merged back into a base branch. The other branch also must have more commits on it than the branch prior to the version it is looking for. The graph looks like this:

* merge commit 6 (start)
|\
| * commit 5
| * commit 4
|/
* commit 3
* commit 2
* commit 1
* v1.0.0

During a breadth first search looking for v1.0.0, a textbook BFS algorithm will visit the nodes in the following order:

queue = [ 6, 3, 5, 2, 4, 1, "v1.0.0" ]

Prior to this PR, the PSR's BFS algorithm would attempt to visit nodes in the order:

queue = [ 6, 3, 5, 2, 4, 1, 3, "v1.0.0" ]
                            ^

Since 3 has already been visited, the algorithm returns None and the queue is technically not empty as when inspecting 1, it added v1.0.0 to the queue.

Furthermore, when reviewing online references, the BFS algorithm best matches to the use of a queue data structure which is what we use, however, recursion is not recommended due to the performance hit of each consecutive function call and the finite size of a process call stack. As projects reach long term duration and sustainment, the number of git commits will increase and approach the maximum stack size call.

How I tested

I created a unit test that mocks the Git Commits & tags of a tree that matches the above diagram. When you run this unit-test on the base branch, you will see that it throws a None even though the tag commit exists. After this PR's changes the test succeeds since it does not try to revisit a commit it already visited.

How to verify

git fetch origin pull/822/head:pr-822

# checkout the PR as a detached HEAD state with only the unit tests
git checkout pr-822~2 --detach

# execute unit-test (see 1 failure)
pytest tests -k test_bfs_for_latest_version_in_history

# update to the HEAD of the PR (with fixes)
git merge --ff-only pr-822

# run pytest again to see success
pytest tests -k test_bfs_for_latest_version_in_history

And of course, all tests work as with the CI results below.

codejedi365 · 2024-02-03T20:02:09Z

@relekang, @bernardcooke53, pr ready for review for when you have time, thanks!

bernardcooke53

To this day I think this is one of the gnarliest bits of code I've ever written, I agree that recursion doesn't scale that well here - thanks again @codejedi365 🚀

codejedi365 mentioned this pull request Feb 2, 2024

All pre-release changelog on release. #817

Open

codejedi365 marked this pull request as ready for review February 3, 2024 18:24

codejedi365 force-pushed the fix/algorithm-bfs branch 3 times, most recently from b8426d2 to 5e97ac3 Compare February 3, 2024 19:52

bernardcooke53 approved these changes Feb 6, 2024

View reviewed changes

codejedi365 added 3 commits February 6, 2024 20:09

test(algorithm): add bfs unit test on fake git history

84580e7

fix(algorithm): correct bfs to not abort on previously visited node

ce3a8dc

perf(algorithm): refactor bfs search to use queue rather than recursion

d6daab9

codejedi365 force-pushed the fix/algorithm-bfs branch from 5e97ac3 to d6daab9 Compare February 7, 2024 01:09

relekang merged commit 8b742d3 into python-semantic-release:master Feb 8, 2024
7 checks passed

codejedi365 deleted the fix/algorithm-bfs branch February 9, 2024 04:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(algorithm): correct bfs to not abort on previously visited node #822

fix(algorithm): correct bfs to not abort on previously visited node #822

codejedi365 commented Jan 30, 2024 •

edited

codejedi365 commented Feb 3, 2024

bernardcooke53 left a comment

fix(algorithm): correct bfs to not abort on previously visited node #822

fix(algorithm): correct bfs to not abort on previously visited node #822

Conversation

codejedi365 commented Jan 30, 2024 • edited

Purpose

Rationale

How I tested

How to verify

codejedi365 commented Feb 3, 2024

bernardcooke53 left a comment

Choose a reason for hiding this comment

codejedi365 commented Jan 30, 2024 •

edited