Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gollum doesn't play nice with un-canonicalized filenames, and breaks offsite links when replacing mediawiki #166

Merged
merged 3 commits into from Apr 10, 2012

Conversation

arr2036
Copy link
Contributor

@arr2036 arr2036 commented Jun 1, 2011

Firstly I know what Readme.md says about file names with whitespace or forward slashes not being displayed. But the code does not reflect the Readme.

def self.cname(name)
  name.respond_to?(:gsub) ?
    name.gsub(%r{[ /<>]}, '-') :
    ''
end
def page_match(name, filename)
  if match = self.class.valid_filename?(filename)
    Page.cname(name).downcase == Page.cname(match).downcase
  else
    false
   end
end

So in fact any spaces forward slashes lt or gt chars in either the request or the filename will be replaced with dashes when the matching happens. meaning its perfectly possible to load all files, even those containing the restricted chars. Unfortunately other page operations still don't work so well, which is something this patch fixes.


  • Gollum uses hyphens (-) as a substitute for spaces in links
  • Mediawiki uses underscores (_) as a substitute for spaces in links
  • Gollum uses hyphens (-) as a substitute for spaces in filenames
  • Mediawiki uses an sql-lite instance, and doesn't substitute spaces in titles

When importing mediawiki data into gollum, flat files are created from the page title. page with title "Foo Bar" becomes Foo Bar.mediawiki.

"Foo Bar" can be accessed with /Foo%20Bar or /Foo-Bar, it cannot however be accessed using /Foo_Bar, which breaks all external links to all the wiki pages. When the wiki gets a few thousand hits a week from external linking, this is a big deal.

There are a couple of issues here:

  • When searching for page matches, Gollum substitutes whitespace in filenames for dashes, inline with its canonicalization scheme. This works fine for wikis originally created in Gollum, but other wikis may use different substitutions for whitespace. Mediawiki for example uses underscores.

In normal operation no pages created by Gollum will contain whitespace in the on disk filenames. If we assume all files that contain whitespace were created by a different wiki then the standards for page matching can be relaxed without breaking any existing Gollum sites.

With default Gollum install:
my-file.md will match 'my file' and 'my-file' # Created internally
my file.md will match 'my file' and 'my-file' # Created externally
my_file.md will match 'my_file' and nothing else # Created internally

With patched Gollum install:
my-file.md will match 'my file' and 'my-file' # Created internally
my file.md will match 'my file', 'my-file' and 'my_file' # Created externally
my_file.md will match 'my_file' and nothing else # Created internally

  • All the page manipulation functions in lib/gollum/committer.rb and lib/gollum/wiki.rb are set up to use a canonicalized version of page.name when manipulating existing pages. This breaks when the on disk filenames are not in the canonicalized format Gollum expects. For example when updating the format of the page "my test.md', "my-test.mediawiki" will be created, but "my test.md" will still be left in the working directory. It's also impossible to delete pages which don't fit with Gollum's canonicalization scheme.

The solution is to use the stripped filename (filename without path and extension) to refer to files in the git repo, not the re canonicalized page name when working with existing pages.

This branch passes all the tests that also passed in current master (most of the failures are Markdown issues), and the additional tests added for working with un-canonicalized files. Most of the additions are defining additional tests.

@arr2036
Copy link
Contributor Author

arr2036 commented Jul 15, 2011

Guys this passes all the standard tests, and the additional ones I added. Is there anything stopping it being pulled? Have I missed something major with the filename cannonicalization which is going to cause issues?

Arran Cudbard-Bell and others added 3 commits November 24, 2011 11:07
…hens, underscores and urlencoded spaces. This is for compatibility with mediawiki conversions.
@evert
Copy link

evert commented Jan 27, 2012

Would love to see this merged in

@arr2036
Copy link
Contributor Author

arr2036 commented Jan 27, 2012

+1 ;)

@semikolon
Copy link

+1!

atmos added a commit that referenced this pull request Apr 10, 2012
Gollum doesn't play nice with un-canonicalized filenames, and breaks offsite links when replacing mediawiki
@atmos atmos merged commit aa544f0 into gollum:master Apr 10, 2012
existme pushed a commit to existme/gollum that referenced this pull request Aug 9, 2018
Gollum doesn't play nice with un-canonicalized filenames, and breaks offsite links when replacing mediawiki
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants