Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MagicLink support for abbreviated commit IDs #1895

Closed
br3ndonland opened this issue Jan 3, 2023 · 3 comments · Fixed by #1964
Closed

Add MagicLink support for abbreviated commit IDs #1895

br3ndonland opened this issue Jan 3, 2023 · 3 comments · Fixed by #1964
Labels
P: maybe Pending approval of low priority request. T: feature Feature.

Comments

@br3ndonland
Copy link

br3ndonland commented Jan 3, 2023

Description

Thanks for maintaining these extensions! I have a question about autolinking commit IDs with the MagicLink extension.

I'm aware that MagicLink already supports autolinking the full 40 character commit IDs (SHAs). It recognizes these commit IDs with regexes:

RE_GIT_INT_MICRO_REFS = r'''(?x)
(?P<all>
(?:(?<![a-zA-Z])(?P<issue>(?:\#|!|\?)[1-9][0-9]*)|(?P<commit>(?<![@/])\b[a-f\d]{40})(?:\.{3}(?P<diff>[a-f\d]{40}))?)
)\b
'''

MagicLink converts these full commit IDs into an abbreviated format. The length of these abbreviated commit IDs is the first 7 characters on GitHub and BitBucket, and the first 8 characters on GitLab, as shown in each hash_size item in the PROVIDER_INFO dictionary constant.

link.set(
'title',
'{} {}: {}@{}'.format(label, repo_label, user_repo.rstrip('/'), value[0:hash_size])
)

So, MagicLink creates links from 40 character commit IDs. Could MagicLink also create links from abbreviated commit IDs?

Benefits

Users would be able to link to individual commits by specifying abbreviated commit IDs, which would be helpful when creating changelogs.

The autolinking behavior of MagicLink would more closely match the behavior of platforms like GitHub. For example, GitHub will autolink the commit ID below, even though it's only the first 7 characters 25508d4:

25508d4

And note that the full commit ID is not needed to construct the link. Navigating to 25508d4 points to the same commit.

Solution Idea

Autolink abbreviated commit IDs, based on the provider:, user:, and repo: settings passed in.

Using the example above, and the following options:

  • provider: 'github' (the default)
  • user: facelessuser
  • repo: pymdown-extensions
  • repo_url_shorthand: True

MagicLink would transform 25508d4 into https://github.com/facelessuser/pymdown-extensions/commit/25508d4.

It seems to me that the main challenge would be parsing the abbreviated commit ID.

It's more challenging than the full 40 character ID, because there are few words that are 40 characters, but many more words that are 7 characters, so there's a risk of false positives. Because of the risk of false positives, I would recommend disabling autolinking of abbreviated commit IDs by default, and offering a Boolean configuration option to enable it. Maybe something like link_abbreviated_commit_ids.

@br3ndonland br3ndonland added the T: feature Feature. label Jan 3, 2023
@gir-bot gir-bot added the S: triage Issue needs triage. label Jan 3, 2023
@facelessuser
Copy link
Owner

So, MagicLink creates links from 40 character commit IDs. Could MagicLink also create links from abbreviated commit IDs?

When you click on a commit link in GitHub, it always takes you to a link with the full hash, though the displayed link always abbreviates the hash to 7 chars (GitLab always shows 8 last I checked). I may or may not have been aware that GitHub would infer the hash from only 7 chars, but if I was aware, I'm sure false positives were a driving force to not mimic it.

And note that the full commit ID is not needed to construct the link. Navigating to 25508d4 points to the same commit.

This may not always be the case. There is definitely a potential to have a hash collision in the first 7. The likeliness of it may be low in projects with a much lower commit count, but it is not impossible.

MagicLink would transform 25508d4 into 25508d4.

GitHub can get away with this because they can double check it in their database, but we cannot - or better put, will not - so yes, we could actually transform real words on accident, at least theoretically. Thinking about it practically though, I'm trying to come up with a real-world 7 character word that you could formulate with hex characters that wouldn't be nonsense...

If we were tapping into the public API and doing requests through the API during translation to ensure we had an actual commit hash, then I think it could be reasonable. As stated before, this is not planned.

With all that said, the likeliness of us actually getting a false positive with a commit hash of 7 chars may be pretty unlikely, but such a change may break people's expectations as they may have references to smaller hashes or user handles in their documents that previously didn't auto-link to GitHub that would now link. To prevent such breakage, I could see it being beneficial to lock it behind a feature switch as we will not be verifying the hash with the GitHub API.

I will tag this with a maybe for now and will think about it.

@gir-bot add P: maybe
@gir-bot remove S: triage

@gir-bot gir-bot added P: maybe Pending approval of low priority request. and removed S: triage Issue needs triage. labels Jan 3, 2023
@facelessuser
Copy link
Owner

One thing I think we could also do is relax link shortening to accept full links with hashes from 7 to 40: https://github.com/facelessuser/pymdown-extensions/commit/25508d4 to 25508d4

@facelessuser
Copy link
Owner

There are indeed 7 letter words that can be created with hex letters. As an example: effaced, acceded, and defaced. So yes, we could get false positives if we aren't checking them against the database. If we were integrating with the GitHub API (which requires us to allow the user to specify tokens and for us to put the logic to query the REST API), only then could we safely approach this.

I think, for now, we will relax URL shortening though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P: maybe Pending approval of low priority request. T: feature Feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants