Gollum doesn't play nice with un-canonicalized filenames, and breaks offsite links when replacing mediawiki #166
Firstly, I know what Readme.md says about file names with whitespace or forward slashes not being displayed, but the code does not reflect the Readme.
In fact, any space, forward slash, lt or gt character in either the request or the filename is replaced with a dash when the matching happens, meaning it's perfectly possible to load all files, even those containing the restricted characters. Unfortunately, other page operations still don't work so well, which is what this patch fixes.
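The matching described above can be sketched roughly as follows. This is a minimal illustration, not Gollum's actual code: the helper names `canonicalize` and `same_page?` are hypothetical, and the comparison is assumed to be a plain string equality on the dash-substituted names.

```ruby
# Hypothetical helper: replace spaces, forward slashes, < and > with dashes.
def canonicalize(name)
  name.gsub(%r{[ /<>]}, '-')
end

# Hypothetical helper: compare a requested page name against an on-disk
# filename after stripping the path and extension and canonicalizing both.
def same_page?(requested, filename)
  stripped = File.basename(filename, File.extname(filename))
  canonicalize(requested) == canonicalize(stripped)
end

same_page?('Foo Bar', 'Foo Bar.mediawiki')  # request decoded from /Foo%20Bar
same_page?('Foo-Bar', 'Foo Bar.mediawiki')
same_page?('Foo_Bar', 'Foo Bar.mediawiki')  # underscores are not substituted
```

Because both sides are canonicalized before comparison, a file with a space on disk is still found, while an underscore request never lines up with it.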
When importing MediaWiki data into Gollum, flat files are created from the page title, so a page with the title "Foo Bar" becomes Foo Bar.mediawiki. "Foo Bar" can be accessed with /Foo%20Bar or /Foo-Bar. It cannot, however, be accessed using /Foo_Bar, and since MediaWiki URLs use underscores in place of spaces, this breaks all external links to all the wiki pages. When the wiki gets a few thousand hits a week from external links, this is a big deal.
There are a couple of issues here:
In normal operation, no page created by Gollum will contain whitespace in its on-disk filename. If we assume that every file containing whitespace was created by a different wiki, then the rules for page matching can be relaxed for those files without breaking any existing Gollum site.
With default Gollum install:
my-file.md will match 'my file' and 'my-file' # Created internally
my file.md will match 'my file' and 'my-file' # Created externally
my_file.md will match 'my_file' and nothing else # Created internally
With patched Gollum install:
my-file.md will match 'my file' and 'my-file' # Created internally
my file.md will match 'my file', 'my-file' and 'my_file' # Created externally
my_file.md will match 'my_file' and nothing else # Created internally
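One way to get the patched behaviour in the table above is sketched below. It is an assumption-laden illustration rather than the code in this branch: `canonicalize` and `relaxed_match?` are hypothetical names, and the relaxed rule is assumed to apply only when the on-disk name contains whitespace.

```ruby
# Hypothetical helper: replace spaces, forward slashes, < and > with dashes.
def canonicalize(name)
  name.gsub(%r{[ /<>]}, '-')
end

# Hypothetical matcher: the default rule applies to every file; underscores
# in the request are treated as spaces only for files that have whitespace
# in their on-disk name (assumed to come from another wiki).
def relaxed_match?(requested, filename)
  stripped = File.basename(filename, File.extname(filename))
  return true if canonicalize(requested) == canonicalize(stripped)
  return false unless stripped.match?(/\s/)

  canonicalize(requested.tr('_', ' ')) == canonicalize(stripped)
end

relaxed_match?('my_file', 'my file.md')  # matches only under the relaxed rule
relaxed_match?('my file', 'my_file.md')  # internally created file, unchanged
```

Files that Gollum created itself never contain whitespace, so they never reach the relaxed branch and keep their current matching.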
The solution is to use the stripped filename (the filename without path and extension), not the re-canonicalized page name, to refer to files in the git repo when working with existing pages.
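As a rough sketch of that idea, the lookup can return the stripped on-disk name so that later operations address the file that really exists. The names `stripped_filename` and `resolve_page` are hypothetical, and the list of paths stands in for whatever the git repo lookup returns.

```ruby
# Hypothetical helper: filename without path and extension.
def stripped_filename(path)
  File.basename(path, File.extname(path))
end

# Hypothetical lookup: match on canonicalized names, but hand back the
# stripped on-disk name rather than the canonicalized request.
def resolve_page(requested, paths)
  wanted = requested.gsub(%r{[ /<>]}, '-')
  path = paths.find { |p| stripped_filename(p).gsub(%r{[ /<>]}, '-') == wanted }
  path && stripped_filename(path)
end

resolve_page('Foo-Bar', ['docs/Foo Bar.mediawiki', 'Home.md'])
```

Here the request Foo-Bar resolves to the name Foo Bar, so an edit or rename acts on Foo Bar.mediawiki instead of looking for a Foo-Bar file that does not exist.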
This branch passes every test that passes on current master (most of the failures there are Markdown issues), as well as the new tests added for working with un-canonicalized files. Most of the additions in this branch are new tests.