Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround git log --since missing commits #94

Merged
merged 1 commit into from
Jul 3, 2023
Merged

Conversation

peterwaller-arm
Copy link
Contributor

Git's walking mechanism stops as soon as it sees a commit which is older than the date specified by --since.

This is problematic in LLVM's commit history in August 2022, because there is a commit in the middle of it with a CommitDate in 2021.

Work around this by considering a large number of commits and taking the oldest one matching the target date. I'll do the August pack build and confirm that everything is OK before merging this.

Fix #89.

Signed-off-by: Peter Waller peter.waller@arm.com

Git's walking mechanism stops as soon as it sees a commit which is older
than the date specified by --since.

This is problematic in LLVM's commit history in August 2022, because
there is a commit in the middle of it with a CommitDate in 2021.

Work around this by considering a large number of commits and taking the
oldest one matching the target date.

Fix #89.

Signed-off-by: Peter Waller <peter.waller@arm.com>
@peterwaller-arm
Copy link
Contributor Author

I've shipped the 202208 pack using this, now requesting review but also happy to merge it as-is.

)

TZ=UTC git rev-list "${ARGS[@]}" | \
egrep "^$DATE_LOWER" | \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we assume that there is a commit with that date? And if there is not, what output do we see? Seemingly git log ..${BRANCH} from list() which is same as git log HEAD..${BRANCH}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Guess this could theoretically break on January 1st as an example. Though in practice I think there is always something getting committed to LLVM. Any smart ideas to address that? I might be better off making it month specific rather than date specific 🤔. Not great in any case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've assumed that CommitDates are correct, but actually a commit with a future or past date can pop into the history at any point, although I concur that a clock being ahead is less likely.

git log --since 2021/09/01 --until 2021/10/01 --topo-order --format="%cd %H"
Now includes the commit made in August 2022.

I think the best approach is to enter commit SHAs (ranges) by hand in the future, using our human understanding of git history. Unless we can come up with a clever algorithm that does that for us and does not skip commits on the boundary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for continuing the thinking, indeed skew-to-future could also be a problem and I also can't see an easy way around it. Even the human chosen ranges could be wrong since the method they use to find appropriate commits may rely on this same machinery. The main safety check is either a count of commits by day within a month or more crudely checking the count of commits within the month. Another thing is that assuming we're typically building the most recent pack promptly, we have the more concrete data point that the history just fetched can't contain much 'junk into the future' since it doesn't exist yet.

I'll have to come back to this and do a little bit more thinking, since I want to automate it and leave it running for a long while.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since then I've been doing some level of sanity checking that the commit count is about right. I think we could do better, but I haven't been able to make time for this for a while, so I'm going to defer this for now since it is not a problem which hurts us much in practice and can be worked around manually if necessary.

In an ideal world I'd find a good fix for this but perfect is the enemy is the good, so I'm going to merge this and move on until we re-encounter the issue.

@peterwaller-arm peterwaller-arm merged commit 585c236 into main Jul 3, 2023
@peterwaller-arm peterwaller-arm deleted the issue-89 branch July 3, 2023 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

manyclangs: git-list-between incomplete output for 2022-08
2 participants