Add MagicLink support for abbreviated commit IDs #1895
Comments
When you click on a commit link in GitHub, it always takes you to a URL with the full hash, though the displayed link always abbreviates the hash to 7 chars (GitLab always shows 8, last I checked). I may or may not have been aware that GitHub would infer the hash from only 7 chars, but if I was aware, I'm sure false positives were a driving force behind not mimicking it.
This may not always be the case. There is definitely potential for a hash collision in the first 7 characters. The likelihood may be low in projects with a low commit count, but it is not impossible. GitHub can get away with this because they can double-check it against their database, but we cannot (or, better put, will not), so yes, we could actually transform real words by accident, at least theoretically. Thinking about it practically though, I'm trying to come up with a real-world 7-character word that you could spell with hex characters that wouldn't be nonsense...

If we were tapping into the public API and making requests during translation to ensure we had an actual commit hash, then I think it could be reasonable. As stated before, this is not planned.

With all that said, a false positive with a 7-character commit hash may be pretty unlikely, but such a change may break people's expectations: they may have references to shorter hashes or user handles in their documents that previously didn't auto-link to GitHub and now would. To prevent such breakage, I could see it being beneficial to lock it behind a feature switch, as we will not be verifying the hash with the GitHub API. I will tag this with a maybe for now and will think about it.
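To put a rough number on the collision concern, here is a minimal sketch using the standard birthday-bound approximation. It assumes commit hashes behave like uniformly random hex strings; the function name is made up for illustration and is not part of MagicLink.

```python
import math


def prefix_collision_probability(num_commits: int, prefix_len: int = 7) -> float:
    """Approximate chance that at least two of `num_commits` random hashes
    share the same first `prefix_len` hex characters (birthday bound)."""
    space = 16 ** prefix_len                      # number of distinct prefixes
    pairs = num_commits * (num_commits - 1) / 2   # number of commit pairs
    return 1.0 - math.exp(-pairs / space)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} commits: {prefix_collision_probability(n):.4f}")
```

Under that assumption, the risk stays small for repositories with a few thousand commits and grows quickly for very large ones, which is consistent with the point that a collision is unlikely but not impossible.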
One thing I think we could also do is relax link shortening to accept full links with hashes from 7 to 40 characters.
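A rough sketch of what relaxed link shortening could look like, assuming GitHub-style commit URLs only. The pattern, the function name, and the `user/repo@hash` label format are assumptions for illustration, not MagicLink's actual implementation.

```python
import re

# Hypothetical pattern: a GitHub commit URL whose hash is 7 to 40 hex characters.
COMMIT_URL = re.compile(
    r"https://github\.com/(?P<user>[\w.-]+)/(?P<repo>[\w.-]+)"
    r"/commit/(?P<hash>[0-9a-f]{7,40})"
)


def shorten_commit_url(url: str, hash_size: int = 7):
    """Return a short label for a commit URL, or None if the URL does not match."""
    match = COMMIT_URL.fullmatch(url)
    if match is None:
        return None
    # Truncate to the display size whether the link carried 7 or 40 characters.
    return f"{match['user']}/{match['repo']}@{match['hash'][:hash_size]}"


print(shorten_commit_url(
    "https://github.com/facelessuser/pymdown-extensions/commit/25508d4"
))
```

Because the hash sits inside a full commit URL, the false-positive risk discussed for bare 7-character strings does not really apply here.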
There are indeed 7-letter words that can be spelled with hex letters. I think, for now, we will relax URL shortening though.
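To illustrate the false-positive risk, a naive pattern for bare 7-character hex strings can be checked against a few ordinary English words. The word list below is my own choice for illustration and the pattern is an assumption, not something MagicLink ships.

```python
import re

# Naive matcher for an abbreviated commit ID: exactly 7 lowercase hex characters.
ABBREVIATED_HASH = re.compile(r"[0-9a-f]{7}")

candidates = ["defaced", "effaced", "acceded", "changed", "25508d4"]

for word in candidates:
    looks_like_hash = ABBREVIATED_HASH.fullmatch(word) is not None
    print(f"{word}: {'matches' if looks_like_hash else 'no match'}")
```

Words such as "defaced" use only the letters a to f, so without checking against the repository there is no way to tell them apart from a real abbreviated hash.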
Description
Thanks for maintaining these extensions! I have a question about autolinking commit IDs with the MagicLink extension.
I'm aware that MagicLink already supports autolinking the full 40 character commit IDs (SHAs). It recognizes these commit IDs with regexes:
pymdown-extensions/pymdownx/magiclink.py
Lines 118 to 122 in 25508d4
MagicLink converts these full commit IDs into an abbreviated format. The length of these abbreviated commit IDs is the first 7 characters on GitHub and BitBucket, and the first 8 characters on GitLab, as shown in each `hash_size` item in the `PROVIDER_INFO` dictionary constant.
pymdown-extensions/pymdownx/magiclink.py
Line 215 in 25508d4
pymdown-extensions/pymdownx/magiclink.py
Lines 436 to 439 in 25508d4
So, MagicLink creates links from 40 character commit IDs. Could MagicLink also create links from abbreviated commit IDs?
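As a rough illustration of the existing behavior described above, here is a minimal sketch that finds 40-character IDs and abbreviates them per provider. `HASH_SIZE` is a hypothetical stand-in for the `hash_size` entries in `PROVIDER_INFO`, and the regex is simplified; neither is copied from magiclink.py.

```python
import re

# Hypothetical stand-in for the per-provider `hash_size` values.
HASH_SIZE = {"github": 7, "bitbucket": 7, "gitlab": 8}

# Simplified matcher for a full 40-character commit ID.
FULL_HASH = re.compile(r"\b[0-9a-f]{40}\b")


def abbreviate(text: str, provider: str = "github") -> str:
    """Replace each full commit ID in `text` with its abbreviated form."""
    size = HASH_SIZE[provider]
    return FULL_HASH.sub(lambda m: m.group(0)[:size], text)


made_up_id = "0123456789abcdef" * 2 + "01234567"  # 40 hex chars, not a real commit
print(abbreviate(f"see {made_up_id}"))
print(abbreviate(f"see {made_up_id}", provider="gitlab"))
```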
Benefits
Users would be able to link to individual commits by specifying abbreviated commit IDs, which would be helpful when creating changelogs.
The autolinking behavior of MagicLink would more closely match the behavior of platforms like GitHub. For example, GitHub will autolink the commit ID below, even though it's only the first 7 characters:
`25508d4`: 25508d4
And note that the full commit ID is not needed to construct the link. Navigating to 25508d4 points to the same commit.
Solution Idea
Autolink abbreviated commit IDs, based on the `provider:`, `user:`, and `repo:` settings passed in.

Using the example above, and the following options:
- `provider`: `'github'` (the default)
- `user`: `facelessuser`
- `repo`: `pymdown-extensions`
- `repo_url_shorthand`: `True`
MagicLink would transform `25508d4` into `https://github.com/facelessuser/pymdown-extensions/commit/25508d4`.

It seems to me that the main challenge would be parsing the abbreviated commit ID.
It's more challenging than the full 40-character ID, because there are few words that are 40 characters, but many more words that are 7 characters, so there's a risk of false positives. Because of the risk of false positives, I would recommend disabling autolinking of abbreviated commit IDs by default, and offering a Boolean configuration option to enable it. Maybe something like `link_abbreviated_commit_ids`.
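To make the proposal concrete, here is a minimal sketch of opt-in linking for GitHub only. The `Options` class, the `autolink` function, and the regex are all hypothetical, and `link_abbreviated_commit_ids` is the option name suggested in this issue rather than an existing MagicLink setting.

```python
import re
from dataclasses import dataclass


@dataclass
class Options:
    """Hypothetical settings mirroring the options discussed in this issue."""
    user: str
    repo: str
    provider: str = "github"
    link_abbreviated_commit_ids: bool = False  # off by default to avoid false positives


# Exactly 7 hex characters, not embedded in a longer word, URL path, or mention.
ABBREVIATED = re.compile(r"(?<![\w/@#-])[0-9a-f]{7}(?![\w-])")


def autolink(text: str, opts: Options) -> str:
    """Wrap bare 7-character hex strings in commit links when the switch is on."""
    if not opts.link_abbreviated_commit_ids or opts.provider != "github":
        return text

    def to_link(match: re.Match) -> str:
        commit = match.group(0)
        url = f"https://github.com/{opts.user}/{opts.repo}/commit/{commit}"
        return f'<a href="{url}">{commit}</a>'

    return ABBREVIATED.sub(to_link, text)


opts = Options("facelessuser", "pymdown-extensions", link_abbreviated_commit_ids=True)
print(autolink("Fixed in 25508d4.", opts))
```

With the switch left at its default of `False`, the text passes through unchanged, so existing documents would keep rendering as before. Note that this sketch still links any 7-letter hex word, since nothing is verified against the repository.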