Dashes in Wiki page titles are replaced with spaces #7570

Closed
2 of 7 tasks
qwertfisch opened this issue Jul 22, 2019 · 17 comments · Fixed by #24143
Labels
type/proposal The new feature has not been accepted yet but needs to be discussed first.

Comments

@qwertfisch

Description

The Gitea internal wiki does not allow dash characters in page titles. I can enter one on creation, but it is replaced with a space on display.

I know that the direct cause is that spaces are replaced with dashes on filename level (also in the URL), so there is no distinction between spaces and dashes when reopening a page. Hence it will all be escaped and displayed as a space character.

Attempted workaround with URL-escaped character

All non-ASCII Unicode characters appear as URL-escaped variants of their UTF-8 bytes, and even / is replaced with %2F. So I would expect that a filename containing %2D (the escaped dash) could be loaded and displayed with its correct title. I manually created such a file and pushed it, but the %2D-escaped dash is still converted to a space. Even worse: the URL of the page won’t load anymore.

This looks like a deeper error in escape handling with the dash character.
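The round trip described above can be sketched as follows. This is a minimal illustration, not Gitea's code: `titleToFilename` and `filenameToTitle` are hypothetical helpers that only mimic the space/dash swap, so the information about which dashes were real is lost on the way back.

```go
package main

import (
	"fmt"
	"strings"
)

// titleToFilename mimics the described storage rule: spaces become dashes.
// Hypothetical helper for illustration only.
func titleToFilename(title string) string {
	return strings.ReplaceAll(title, " ", "-") + ".md"
}

// filenameToTitle mimics the described display rule: every dash becomes a space,
// because the filename no longer records which dashes were typed by the user.
func filenameToTitle(filename string) string {
	return strings.ReplaceAll(strings.TrimSuffix(filename, ".md"), "-", " ")
}

func main() {
	for _, title := range []string{"Meeting notes", "2000-01-02 Meeting record"} {
		file := titleToFilename(title)
		fmt.Printf("%q -> %q -> %q\n", title, file, filenameToTitle(file))
	}
}
```

A title without dashes survives the round trip; a title with dashes comes back with spaces in their place.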

@zeripath
Contributor

If what I suspect is happening is right, then I bet it will work if you use %252D.

However that's not really a solution...

@lunny
Member

lunny commented Jul 23, 2019

It seems we already have a PR to fix this.

@qwertfisch
Author

@lunny In the list of open PRs I could not find anything related. Can you reference the PR you have in mind?

@qwertfisch
Author

qwertfisch commented Jul 24, 2019

@zeripath Sorry, that does not work either. The replacement of a dash character with a space seems to be executed right after URI decoding, but there is no double decoding.

I made an overview table to see the different behaviour. The last column means that a page can be loaded on the given URL, which does not work in the second and third case. It is only visible in the list of all pages.

| Description | Filename | Resulting URL | Display | Page is found |
|---|---|---|---|---|
| Normal URI encoded chars | colon%3A-try-it.md | colon%3A-try-it | colon: try it | yes |
| Space URI encoded | space%20test.md | space-test | space test | no |
| Dash URI encoded | page-with-one%2Ddash.md | page-with-one-dash | page with one dash | no |
| Dash double URI encoded | page-with-one%252Ddash.md | page-with-one%2Ddash | page with one%2Ddash | yes |

The second and third rows are kind of confusing. It looks as if %20 and %2D are URI decoded from the filename, and then the default dash/space conversion is applied to create a page title. But the URL seems to be created not from the filename but from the (previously decoded) page title. That of course results in - characters for all spaces (and no real dash at all), so neither the file with %2D nor the one with %20 can be found.

Suggestion

Can’t we just skip the erroneous space/dash conversion and store the page with URI encoding? This would solve every problem case, with the slight disadvantage of spaces being encoded as %20 in the URL …
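A minimal sketch of what that suggestion could look like, assuming plain percent-encoding via Go's `net/url` and no space/dash swap at all. `encodeTitle` and `decodeFilename` are hypothetical names, not existing Gitea functions.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// encodeTitle percent-encodes the title as a single path segment.
// Dashes are left alone and spaces become %20, so nothing is ambiguous.
// Hypothetical helper, not part of Gitea.
func encodeTitle(title string) string {
	return url.PathEscape(title) + ".md"
}

// decodeFilename reverses encodeTitle.
func decodeFilename(filename string) (string, error) {
	return url.PathUnescape(strings.TrimSuffix(filename, ".md"))
}

func main() {
	file := encodeTitle("2000-01-02 Meeting record")
	title, err := decodeFilename(file)
	fmt.Println(file, "->", title, err)
}
```

With this scheme the dash and the space are stored differently, so decoding returns exactly the title that was entered.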

@mrsdizzie
Member

Haven't looked too close but this is probably happening here:

gitea/models/wiki.go

Lines 43 to 53 in d4667a4

func WikiFilenameToName(filename string) (string, error) {
	if !strings.HasSuffix(filename, ".md") {
		return "", ErrWikiInvalidFileName{filename}
	}
	basename := filename[:len(filename)-3]
	unescaped, err := url.QueryUnescape(basename)
	if err != nil {
		return "", err
	}
	return NormalizeWikiName(unescaped), nil
}

So it's first unescaping the file name and then running it through something that just replaces - with " ", which matches the described behavior.
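To check this against the table, here is a self-contained sketch of the same flow. It assumes, based only on the description above, that `NormalizeWikiName` simply replaces every dash with a space; the lowercase `normalizeWikiName` below is a stand-in for it, and a plain error replaces `ErrWikiInvalidFileName`.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// normalizeWikiName is an assumed stand-in for NormalizeWikiName:
// it only replaces dashes with spaces.
func normalizeWikiName(name string) string {
	return strings.ReplaceAll(name, "-", " ")
}

// wikiFilenameToName follows the quoted function: unescape first, then normalize.
func wikiFilenameToName(filename string) (string, error) {
	if !strings.HasSuffix(filename, ".md") {
		return "", errors.New("invalid wiki filename: " + filename)
	}
	unescaped, err := url.QueryUnescape(strings.TrimSuffix(filename, ".md"))
	if err != nil {
		return "", err
	}
	return normalizeWikiName(unescaped), nil
}

func main() {
	for _, f := range []string{
		"colon%3A-try-it.md",
		"space%20test.md",
		"page-with-one%2Ddash.md",
		"page-with-one%252Ddash.md",
	} {
		name, err := wikiFilenameToName(f)
		fmt.Printf("%-28s -> %q (err: %v)\n", f, name, err)
	}
}
```

Because the dash replacement runs after unescaping, a dash written as %2D is decoded and then replaced like any other dash, which is consistent with the Display column in the table.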

I agree the filenames should always have been stored encoded, but unfortunately they weren't, and changing that would put dashes in the titles of everyone's pages where there were none before. It would be a breaking change to consider.

@zeripath
Contributor

Make it a per repo configurable?

@mrsdizzie
Member

mrsdizzie commented Jul 24, 2019

I hesitate to add yet more config options, particularly to keep a behavior that is not good (current replacing of space with dash).

I'd rather try and have some sort of code to handle legacy cases, or even just consider all literal dashes as spaces (breaking the less common case of a dash in the title) and then escape everything going forward and handle that properly.

I guess that would maybe look like first replacing all literal dashes and then escaping the filename. It would only break existing titles with a dash in them which already seem to not work properly anyways per this issue
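One possible reading of that idea, as a rough sketch only: treat every literal dash in a legacy filename as a space, decode the rest, and re-encode the result so that spaces and dashes are stored differently from then on. `migrateLegacyFilename` is a hypothetical helper, and the use of `url.PathEscape` for the new scheme is an assumption, not something decided in this thread.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// migrateLegacyFilename converts a legacy wiki filename to a fully escaped one.
// Hypothetical helper: literal dashes are read as spaces first, then the name is
// decoded and percent-encoded again.
func migrateLegacyFilename(old string) (string, error) {
	base := strings.TrimSuffix(old, ".md")
	// Step 1: every literal dash in a legacy name stood for a space.
	base = strings.ReplaceAll(base, "-", " ")
	// Step 2: decode the legacy escapes (an escaped %2D becomes a real dash here).
	title, err := url.QueryUnescape(base)
	if err != nil {
		return "", err
	}
	// Step 3: escape everything going forward; spaces become %20, dashes stay dashes.
	return url.PathEscape(title) + ".md", nil
}

func main() {
	for _, f := range []string{"space-test.md", "page-with-one%2Ddash.md"} {
		migrated, err := migrateLegacyFilename(f)
		fmt.Println(f, "->", migrated, err)
	}
}
```

As noted above, this only changes the meaning of legacy names whose titles were meant to contain a dash, and those do not work today anyway.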

@zeripath
Contributor

Hmm what happens to + ?

@mrsdizzie
Member

In what situation? I think in the situation of saving a new filename we should make sure to encode + to %2B and not leave it as +

Are you asking if somebody already has a + in their filename? If so that should just still work since any unencoding wouldn't mess with it.

@mrsdizzie
Member

Probably the real problem to solve would be preserving current links or making sure they still work somehow

@qwertfisch
Author

qwertfisch commented Jul 25, 2019

@zeripath + and ? are escaped and can be used properly. In fact every ASCII special character is usable. Only -._~ are not encoded but put directly in the filename / URL. (And of course the space is converted to dash before storing as file. This should be %20 normally.)

  • page title: chartest !"#$%&'()*+, ./:;<=>?@[\]^_`{|}~§
  • filename: chartest-%21%22%23%24%25%26%27%28%29%2A%2B%2C-.%2F%3A%3B%3C%3D%3E%3F%40%5B%5C%5D%5E_%60%7B%7C%7D~%C2%A7.md
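The filename above looks like what Go's `url.QueryEscape` would produce if the space were stored as a dash instead of `+`. The sketch below rests on that assumption; `legacyFilename` is a hypothetical helper, not the actual Gitea function.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// legacyFilename query-escapes the title and stores spaces as dashes.
// url.QueryEscape leaves '-', '.', '_' and '~' untouched, encodes a space as '+'
// and a literal '+' as %2B, so every '+' in its output stands for a space.
// Hypothetical helper for illustration only.
func legacyFilename(title string) string {
	return strings.ReplaceAll(url.QueryEscape(title), "+", "-") + ".md"
}

func main() {
	fmt.Println(legacyFilename("a-b._~ c+d"))
	// Compare this output with the filename listed above.
	fmt.Println(legacyFilename("chartest !\"#$%&'()*+, ./:;<=>?@[\\]^_`{|}~§"))
}
```

Note that the real dash and the space in `a-b._~ c+d` both end up as `-` in the filename, which is the ambiguity this issue is about.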

@zeripath
Contributor

@mrsdizzie @qwertfisch I meant that a plain '+' in a URL should map to ' ' by convention, and in my testing it appears that this doesn't get mapped to '-' but rather stays a ' ' when passed through, so you should be able to reach a file with spaces in that way.
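For reference, this is how Go's standard library treats a plain `+` when decoding. It is only a small demonstration of `net/url`, with `decodeBoth` as a hypothetical wrapper, and says nothing about how Gitea routes the request.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeBoth decodes s once with query rules and once with path rules.
// Hypothetical wrapper around the two standard library functions.
func decodeBoth(s string) (query, path string, err error) {
	if query, err = url.QueryUnescape(s); err != nil {
		return "", "", err
	}
	if path, err = url.PathUnescape(s); err != nil {
		return "", "", err
	}
	return query, path, nil
}

func main() {
	for _, s := range []string{"space+test", "space%2Btest", "space%20test"} {
		q, p, err := decodeBoth(s)
		fmt.Printf("%-14s query: %q  path: %q  err: %v\n", s, q, p, err)
	}
}
```

Query-style decoding turns `+` into a space, while path-style decoding keeps it as a literal `+`; `%2B` decodes to `+` in both.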

Yeesh, this is so broken. I've been ignoring the wiki, like the diff page, as I have been suspicious that it needs a thorough overhaul. This confirms my fears.

@mrsdizzie I agree that adding configuration should be avoided as much as possible. If we don't want to add configuration, another option is to do things correctly from 1.10 on, but fall back to the previous behaviour if the old way would yield a different file?
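A rough sketch of such a fallback, under several assumptions: the set of existing files is modelled as a plain map, the new scheme is assumed to be `url.PathEscape`, and the old scheme is assumed to be "spaces to dashes, then `url.QueryEscape`". `resolveWikiFile` is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveWikiFile looks a title up under the new naming scheme first and falls
// back to the legacy name only if the new one does not exist.
// Hypothetical helper; existing is a stand-in for the wiki repository contents.
func resolveWikiFile(title string, existing map[string]bool) (string, bool) {
	newName := url.PathEscape(title) + ".md"
	if existing[newName] {
		return newName, true
	}
	// Assumed legacy scheme: spaces stored as dashes, the rest query-escaped.
	oldName := url.QueryEscape(strings.ReplaceAll(title, " ", "-")) + ".md"
	if oldName != newName && existing[oldName] {
		return oldName, true
	}
	return "", false
}

func main() {
	files := map[string]bool{"space-test.md": true, "2000-01-02%20Meeting.md": true}
	for _, title := range []string{"space test", "2000-01-02 Meeting", "missing page"} {
		name, ok := resolveWikiFile(title, files)
		fmt.Printf("%q -> %q (found: %v)\n", title, name, ok)
	}
}
```

Old pages stay reachable under their existing names, while new pages with dashes in the title are stored unambiguously.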

@mrsdizzie
Member

Perhaps -- probably something to think about if redesigning this. I'll note that Gitea appears to have copied this exact behavior from GitHub, which does the same thing with dashes in titles/filenames; you can't create a page title with dashes there either.

@lunny lunny added the type/proposal The new feature has not been accepted yet but needs to be discussed first. label Jul 29, 2019
@icetiger1974

Hello,
Has this issue been solved, or is it still in the same situation?

@unchaynd

unchaynd commented Dec 17, 2022

You can't put a dash in a Wiki page title?... Seriously??

@zeripath
Contributor

> You can't put a dash in a Wiki page title?... Seriously??

See #7570 (comment)

Can you put a dash in a wiki page title on GitHub or GitLab? If you can, we should make that work, but otherwise we won't be compatible with GitHub.

@rrrutledge

😢

silverwind pushed a commit that referenced this issue Apr 19, 2023
#24143)

Close #7570


1. Clearly define the wiki path behaviors, see
`services/wiki/wiki_path.go` and tests
2. Keep compatibility with old contents
3. Allow dashes in titles, e.g. "2000-01-02 Meeting record"
4. Add a "Pages" link in the dropdown, otherwise users can't go to the
Pages page easily.
5. Add a "View original git file" link in the Pages list, even if some
file names are broken, users still have a chance to edit or remove it,
without cloning the wiki repo to local.
6. Fix 500 error when the name contains prefix spaces.


This PR also introduces the ability to support sub-directories, but that
can't be enabled at the moment because there is a lot of legacy wiki data
that uses "%2F" in file names.



![image](https://user-images.githubusercontent.com/2114189/232239004-3359d7b9-7bf3-4ff3-8446-bfb0e79645dd.png)


![image](https://user-images.githubusercontent.com/2114189/232239020-74b92c72-bf73-4377-a319-1c85609f82b1.png)

Co-authored-by: Giteabot <teabot@gitea.io>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 4, 2023