Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't replace underscores in auto-generated IDs in goldmark #12805

Conversation

zeripath
Copy link
Contributor

Fix #12196

Signed-off-by: Andrew Thornton art27@cantab.net

Fix go-gitea#12196

Signed-off-by: Andrew Thornton <art27@cantab.net>
@zeripath
Copy link
Contributor Author

Technically this is breaking.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Sep 10, 2020
@zeripath zeripath added type/bug topic/ui Change the appearance of the Gitea UI pr/breaking Merging this PR means builds will break. Needs a description what exactly breaks, and how to fix it! labels Sep 10, 2020
@codecov-commenter
Copy link

codecov-commenter commented Sep 10, 2020

Codecov Report

Merging #12805 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #12805      +/-   ##
==========================================
- Coverage   43.13%   43.13%   -0.01%     
==========================================
  Files         654      654              
  Lines       72205    72205              
==========================================
- Hits        31146    31143       -3     
+ Misses      36011    36009       -2     
- Partials     5048     5053       +5     
Impacted Files Coverage Δ
modules/markup/common/footnote.go 21.77% <100.00%> (ø)
modules/indexer/stats/db.go 43.47% <0.00%> (-17.40%) ⬇️
modules/indexer/stats/queue.go 64.70% <0.00%> (-11.77%) ⬇️
modules/git/utils.go 73.77% <0.00%> (-3.28%) ⬇️
services/pull/check.go 47.69% <0.00%> (-0.77%) ⬇️
modules/git/repo.go 49.23% <0.00%> (-0.51%) ⬇️
services/pull/pull.go 41.57% <0.00%> (-0.47%) ⬇️
modules/log/event.go 59.43% <0.00%> (+1.88%) ⬆️
modules/queue/workerpool.go 60.81% <0.00%> (+2.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 55e05ad...64037e4. Read the comment docs.

@GiteaBot GiteaBot added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Sep 10, 2020
@silverwind
Copy link
Member

silverwind commented Sep 11, 2020

Are these underscores coming out of Goldmark like that or are we actually replacing them elsewhere ourselves? I have a suspicion that the latter is the case and then this fix may not be right.

BTW it may be possible to add compat with old links in JS thought I'm fine with not adding compat.

@zeripath
Copy link
Contributor Author

@silverwind this is the place we replace them in goldmark. Outside of goldmark we simply assert that ids need the prefix user-content-

@silverwind
Copy link
Member

silverwind commented Sep 11, 2020

So they are coming out like that of Goldmark? If so, may be a bug there because I think it strives to be compatible with GFM.

The user-content- prefix should be mostly be a thing of the past since #11903, only JS compat for it should remain.

@zeripath
Copy link
Contributor Author

We have to protect ids - without prefixing these how do you propose this?

@zeripath
Copy link
Contributor Author

FYI IDs are generated within our variant of goldmark using:

// Generate generates a new element id.
func (p *prefixedIDs) GenerateWithDefault(value []byte, dft []byte) []byte {
result := common.CleanValue(value)
if len(result) == 0 {
result = dft
}
if !bytes.HasPrefix(result, []byte("user-content-")) {
result = append([]byte("user-content-"), result...)
}
if _, ok := p.values[util.BytesToReadOnlyString(result)]; !ok {
p.values[util.BytesToReadOnlyString(result)] = true
return result
}
for i := 1; ; i++ {
newResult := fmt.Sprintf("%s-%d", result, i)
if _, ok := p.values[newResult]; !ok {
p.values[newResult] = true
return []byte(newResult)
}
}
}

Goldmark's default generator is:

func (s *ids) Generate(value []byte, kind ast.NodeKind) []byte {
value = util.TrimLeftSpace(value)
value = util.TrimRightSpace(value)
result := []byte{}
for i := 0; i < len(value); {
v := value[i]
l := util.UTF8Len(v)
i += int(l)
if l != 1 {
continue
}
if util.IsAlphaNumeric(v) {
if 'A' <= v && v <= 'Z' {
v += 'a' - 'A'
}
result = append(result, v)
} else if util.IsSpace(v) || v == '-' || v == '_' {
result = append(result, '-')
}
}
if len(result) == 0 {
if kind == ast.KindHeading {
result = []byte("heading")
} else {
result = []byte("id")
}
}
if _, ok := s.values[util.BytesToReadOnlyString(result)]; !ok {
s.values[util.BytesToReadOnlyString(result)] = true
return result
}
for i := 1; ; i++ {
newResult := fmt.Sprintf("%s-%d", result, i)
if _, ok := s.values[newResult]; !ok {
s.values[newResult] = true
return []byte(newResult)
}
}
}

when I moved us to Goldmark that generator was significantly different from our previous blackfriday generator therefore to make the changes as non-breaking as possible I created the above one.


Going back to why we have to prefix IDs:

I cannot stress this enough, we absolutely have to prefix ids.

Github also does this.

Our JS code and our CSS relies on IDs in several places - if you do not prefix user content ids you will get conflicts with multiple ids and will break rendering and potentially create security holes.

@silverwind
Copy link
Member

silverwind commented Sep 11, 2020

we absolutely have to prefix ids.

I never understood the reason behind this. Why are user-generated ids so bad?

conflicts with multiple ids

I think browsers handle that gracefully and scroll to the first id

security holes

What benefit would an attacker gain from this exactly? As I see it, they can't really do anything dangerous when controlling ids, but of course it depends on our JS and I'd rather prefer to use classnames only in JS anyways.

@GiteaBot GiteaBot added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Sep 11, 2020
@techknowlogick
Copy link
Member

🚀

@techknowlogick techknowlogick merged commit f91bb74 into go-gitea:master Sep 12, 2020
@techknowlogick techknowlogick added this to the 1.13.0 milestone Sep 12, 2020
@zeripath zeripath deleted the fix-12196-do-not-munge-underscores-in-anchors branch September 12, 2020 19:52
@go-gitea go-gitea locked and limited conversation to collaborators Nov 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. pr/breaking Merging this PR means builds will break. Needs a description what exactly breaks, and how to fix it! topic/ui Change the appearance of the Gitea UI type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replacement of underscores with dashes in auto-generated IDs for titles in markdown doesn't match Github
7 participants