Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a more aggressive sorting mechanism for git tags #9

Merged
merged 4 commits into from
Dec 11, 2020

Conversation

mariusvniekerk
Copy link
Contributor

@mariusvniekerk mariusvniekerk commented Dec 8, 2020

This should ensure that tags that are more recent topologically will always take precedence. Additionally we prefer using the timestamp from the tag (if annotated), which ensures that annotated tags created more recently for the same commit are selected.

This should prevent edge cases arising from multiple timezones being used to commit to the same git repository.

This should ensure that tags that are more recent topologically will always
take precendence.  Additionally we prefer using the timestamp from the tag
annotation if present.
@@ -113,6 +113,9 @@ def test__version__from_git__with_annotated_tags(tmp_path) -> None:
with pytest.raises(ValueError):
from_vcs(latest_tag=True)

# check that we find the expected tag that has the most recent tag creation time
avoid_identical_ref_timestamps()
run("git tag v0.2.0b1 -m Annotated")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously in this case v0.2.0b1 would be matched instead of v0.2.0

@mtkennerly
Copy link
Owner

Thanks for this! I like the fix for the v0.2.0b1 vs v0.2.0 matching, and the time zone comment sounds promising, but I'd like to make sure I understand exactly how this improvement works and maybe if we can simplify some of it.

Dates

In my testing, the creatordate and taggerdate were equivalent for annotated tags, and taggerdate is undefined for lightweight tags, so I'm not sure why you added a check for taggerdate. Is there some case where it helps?

$ git init
$ echo foo > foo.txt
$ git add .
$ git commit -m "init"
$ git tag lightweight
$ git tag annotated -m hi
$ git for-each-ref "refs/tags/*" --merged HEAD --format "%(refname)  |  %(creatordate:iso-strict)  |  %(*committerdate:iso-strict)  |  %(taggerdate:iso-strict)"
refs/tags/annotated  |  2020-12-08T21:27:20-05:00  |  2020-12-08T21:26:55-05:00 |  2020-12-08T21:27:20-05:00
refs/tags/lightweight  |  2020-12-08T21:26:55-05:00  |    |

Time zones

I thought the original implementation would have also handled different time zones because of the %z in _parse_git_timestamp_iso_strict. Where/how were time zones being handled incorrectly before?

Test with v0.2.0b1

I verified that adding v0.2.0b1 made test__version__from_git__with_annotated_tags fail with the original implementation, but I can't tell the exact difference in your changes that fixes it. Could you help explain the key differences that solve it? I thought it might be related to --topo-order, but I don't currently understand why the existing implementation's timestamp check was insufficient.

@mariusvniekerk
Copy link
Contributor Author

mariusvniekerk commented Dec 10, 2020

So there are a two parts going on here

dates

The new priority of dates results in

  1. @{%(taggerdate:iso-strict) explicitly returns the creation time for an annotated tag, empty otherwise.
  2. %(*committerdate:iso-strict) dereferences to the underlying commit so you get the timestamp that the commit was created not the tag's date.
    • i've kept this behavior, but it is probably not what we actually want._
    • odds are we can just remove that.
  3. @{%(creatordate:iso-strict) returns the creation time for an annotated tag, and the creation time for a commit for a lightweight tag.

topo oder

The --topo-order thing is separate from the timestamps and useful for preventing you from detecting tags that have been created in the past

Say we have the following commits

c1----c2-----c3----c4
       |            |
       v1.0.0       HEAD

We'd detect HEAD correctly as 1.0.0 with a distance of 2

if c1 now gets tagged as v0.9.9 and we relied purely on timestamps we would now incorrectly detect the version as 0.9.9 with a distance of 3. The --topo-order changes gets us to behaving in the same way that git --describe does.

The topology changes means we'd always prefer tags closer to the current HEAD, and the timestamp logic is used mostly to resolve ties where multiple tags exist on a single commit.

@mtkennerly
Copy link
Owner

mtkennerly commented Dec 10, 2020

Ah, I think you're using --topo-order how I meant to use *committerdate, but I just realized where I went wrong. I sorted by either *committerdate or creatordate depending on which was set, whereas your approach works because you're essentially combining two sort keys.

This minimal change also fixes the v0.2.0b1 test:

diff --git a/dunamai/__init__.py b/dunamai/__init__.py
index 0cd245f..714fff3 100644
--- a/dunamai/__init__.py
+++ b/dunamai/__init__.py
@@ -393,16 +393,17 @@ class Version:
         detailed_tags = []
         for line in msg.strip().splitlines():
             parts = line.split("@{")
+            creator_date = _parse_git_timestamp_iso_strict(parts[1])
             detailed_tags.append(
                 (
                     parts[0].replace("refs/tags/", "", 1),
-                    _parse_git_timestamp_iso_strict(parts[1]),
-                    None if parts[2] == "" else _parse_git_timestamp_iso_strict(parts[2]),
+                    creator_date,
+                    creator_date if parts[2] == "" else _parse_git_timestamp_iso_strict(parts[2]),
                 )
             )
         tags = [
             t[0]
-            for t in reversed(sorted(detailed_tags, key=lambda x: x[1] if x[2] is None else x[2]))
+            for t in reversed(sorted(detailed_tags, key=lambda x: (x[2], x[1])))
         ]
         tag, base, stage, unmatched, tagged_metadata = _match_version_pattern(
             pattern, tags, latest_tag
  • For annotated tags, sort first by the date of the pointed commit (so we get the latest commit with a tag), then break any ties based on which tag is newer.
  • For lightweight tags, sort only by the commit date.

Can you think of any cases where the above would behave differently than your proposed changes?

@mariusvniekerk
Copy link
Contributor Author

Since you can specify and amend both committer and author dates for git commits it could lead to some strange cases if you just ignore the graph structure.

@mariusvniekerk
Copy link
Contributor Author

mariusvniekerk commented Dec 10, 2020

Additionally any repo that makes use of git-am or other similar patch based workflow would not have commits in chronological order.

Rebases can also easily do this.

@mtkennerly
Copy link
Owner

Got it; that's a great point. Thanks for the explanation 👍

Copy link
Owner

@mtkennerly mtkennerly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this improvement! I'll plan to add some non-chronological commit tests and maybe tweak some stylistic stuff before releasing, but it shouldn't take long.

@mtkennerly mtkennerly merged commit 0b07029 into mtkennerly:master Dec 11, 2020
@mariusvniekerk mariusvniekerk deleted the better-git-tag-ordering branch December 11, 2020 15:43
@mtkennerly mtkennerly added this to the v1.5.1 milestone Dec 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants