Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(algorithm): correct bfs to not abort on previously visited node #822

Merged

Conversation

codejedi365
Copy link
Contributor

@codejedi365 codejedi365 commented Jan 30, 2024

Purpose

Improve performance and fix flaw in BFS algorithm

Rationale

I was testing a complex Git Flow repository and found that if the breadth first search reached a previously visited node even if it didn't find the version it was looking for it would abort with None. This was incorrect as the version it was looking for was deeper on the tree. This appears when you a have a branch that has more than 1 commit on it and has merged back into a base branch. The other branch also must have more commits on it than the branch prior to the version it is looking for. The graph looks like this:

* merge commit 6 (start)
|\
| * commit 5
| * commit 4
|/
* commit 3
* commit 2
* commit 1
* v1.0.0

During a breadth first search looking for v1.0.0, a textbook BFS algorithm will visit the nodes in the following order:

queue = [ 6, 3, 5, 2, 4, 1, "v1.0.0" ]

Prior to this PR, the PSR's BFS algorithm would attempt to visit nodes in the order:

queue = [ 6, 3, 5, 2, 4, 1, 3, "v1.0.0" ]
                            ^

Since 3 has already been visited, the algorithm returns None and the queue is technically not empty as when inspecting 1, it added v1.0.0 to the queue.

Furthermore, when reviewing online references, the BFS algorithm best matches to the use of a queue data structure which is what we use, however, recursion is not recommended due to the performance hit of each consecutive function call and the finite size of a process call stack. As projects reach long term duration and sustainment, the number of git commits will increase and approach the maximum stack size call.

How I tested

I created a unit test that mocks the Git Commits & tags of a tree that matches the above diagram. When you run this unit-test on the base branch, you will see that it throws a None even though the tag commit exists. After this PR's changes the test succeeds since it does not try to revisit a commit it already visited.

How to verify

git fetch origin pull/822/head:pr-822

# checkout the PR as a detached HEAD state with only the unit tests
git checkout pr-822~2 --detach

# execute unit-test (see 1 failure)
pytest tests -k test_bfs_for_latest_version_in_history

# update to the HEAD of the PR (with fixes)
git merge --ff-only pr-822

# run pytest again to see success
pytest tests -k test_bfs_for_latest_version_in_history

And of course, all tests work as with the CI results below.

@codejedi365 codejedi365 marked this pull request as ready for review February 3, 2024 18:24
@codejedi365 codejedi365 force-pushed the fix/algorithm-bfs branch 3 times, most recently from b8426d2 to 5e97ac3 Compare February 3, 2024 19:52
@codejedi365
Copy link
Contributor Author

@relekang, @bernardcooke53, pr ready for review for when you have time, thanks!

Copy link
Contributor

@bernardcooke53 bernardcooke53 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To this day I think this is one of the gnarliest bits of code I've ever written, I agree that recursion doesn't scale that well here - thanks again @codejedi365 🚀

@relekang relekang merged commit 8b742d3 into python-semantic-release:master Feb 8, 2024
7 checks passed
@codejedi365 codejedi365 deleted the fix/algorithm-bfs branch February 9, 2024 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants