New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WEBrick directory listings take priority over permalinks #6222

Closed
MattSturgeon opened this Issue Jul 15, 2017 · 13 comments

Comments

Projects
None yet
4 participants
@MattSturgeon
Contributor

MattSturgeon commented Jul 15, 2017

Prefixing the default post permalink and setting page permalink to the same value doesn't work.

Seems to only affect the local jekyll serve WEBrick server. GitHub Pages, for example, isn't affected.


  • I believe this to be a bug, not a question about using Jekyll.
  • I updated to the latest Jekyll (or) if on GitHub Pages to the latest github-pages
  • I ran jekyll doctor to check my configuration
  • I read the CONTRIBUTION file at https://jekyllrb.com/docs/contributing/
  • This is a feature request.

  • I am on (or have tested on) macOS 10+
  • I am on (or have tested on) Debian/Ubuntu GNU/Linux
  • I am on (or have tested on) Fedora GNU/Linux
  • I am on (or have tested on) Arch GNU/Linux
  • I am on (or have tested on) Other GNU/Linux
  • I am on (or have tested on) Windows 10+

  • I was trying to install.
  • There is a broken Plugin API.
  • I had an error on GitHub Pages, and I have reproduced it locally.
  • I had an error on GitHub Pages, and GitHub Support said it was a Jekyll Bug.
  • I had an error on GitHub Pages and I did not test it locally.
  • I was trying to build.
  • It was another bug.

My Reproduction Steps

  • in _config.yml, set permalink to /blog/:title
  • create a page and set its permalink to /blog
  • run jekyll serve
  • visit localhost:4000/blog
    • Note that you are seeing the WEBrick server's directory index, not your page
  • visit localhost:4000/blog.html
    • Note that the page does exist
  • in _config.yml, set permalink to /notblog/:title
  • rerun jekyll serve
  • visit localhost:4000/blog
    • Note that it works correctly now

The Output I Wanted

I expected that visiting the permalink defined in the page's front matter would display that page.

Related to #3452

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Jul 15, 2017

Member

So, Webrick doesn't redirect /blog to /blog.html?

Member

pathawks commented Jul 15, 2017

So, Webrick doesn't redirect /blog to /blog.html?

@MattSturgeon

This comment has been minimized.

Show comment
Hide comment
@MattSturgeon

MattSturgeon Jul 15, 2017

Contributor

That's right. If you remove/modify the /blog/ prefix from the /blog/:title permalink, it works fine. I'm guessing WEBrick only redirects blog to blog.html if a blog directory doesn't actually exist.

Contributor

MattSturgeon commented Jul 15, 2017

That's right. If you remove/modify the /blog/ prefix from the /blog/:title permalink, it works fine. I'm guessing WEBrick only redirects blog to blog.html if a blog directory doesn't actually exist.

@MattSturgeon

This comment has been minimized.

Show comment
Hide comment
@MattSturgeon

MattSturgeon Jul 15, 2017

Contributor

Disabling directory indexing by setting :FancyIndexing to false for WEBrick::HTTPServlet::FileHandler (somewhere in here?) may help fix this.

While I know gh-pages probably uses a different http server (maybe nginx or apache), I couldn't reproduce this bug when I tested on github.io.

Using python's http.server module, which shows directory indexes too, I reproduced this.

Contributor

MattSturgeon commented Jul 15, 2017

Disabling directory indexing by setting :FancyIndexing to false for WEBrick::HTTPServlet::FileHandler (somewhere in here?) may help fix this.

While I know gh-pages probably uses a different http server (maybe nginx or apache), I couldn't reproduce this bug when I tested on github.io.

Using python's http.server module, which shows directory indexes too, I reproduced this.

@MattSturgeon MattSturgeon changed the title from permalink cannot be the same as part of another to WEBrick directory listings take priority over permalinks Jul 15, 2017

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Jul 15, 2017

Member

I wonder if there is some way we could configure one to take priority over the other

Member

pathawks commented Jul 15, 2017

I wonder if there is some way we could configure one to take priority over the other

@MattSturgeon

This comment has been minimized.

Show comment
Hide comment
@MattSturgeon

MattSturgeon Jul 18, 2017

Contributor

Ok, looking into WEBrick::HTTPServlet::FileHandler a little more, and learning some ruby as I go. The directory indexing stuff only happens after the normal GET request handling has failed, so disabling :FancyIndexing won't fix this bug anyway.

It may be possible to extend search_index_files(req, res) to fallback to filename + '.html' after its default check of filename + '/', but this may mess with the path_info magic that set_filename(req, res) is doing, so we may end up having to override that too.

Thinking of something along these lines:

# untested code. TODO: setup jekyll dev environment!
def search_index_file(req, res)

  # First try the default check for index files in the directory
  if file = super
      return file
  end
  
  # Otherwise, try a custom check for directory.html in the parent directory
  
  # Extract basename from the end of res.filename
  path_arr = res.filename.scan(%r{/[^/]*})

  # Get the last non-empty section of the filename
  while basename = path_arr.pop
      break unless basename == '/'
  end

  if file = search_file(req, res, basename + '.html')
      res.filename = path_arr.join
      return file
  end
  
  return nil

end
Contributor

MattSturgeon commented Jul 18, 2017

Ok, looking into WEBrick::HTTPServlet::FileHandler a little more, and learning some ruby as I go. The directory indexing stuff only happens after the normal GET request handling has failed, so disabling :FancyIndexing won't fix this bug anyway.

It may be possible to extend search_index_files(req, res) to fallback to filename + '.html' after its default check of filename + '/', but this may mess with the path_info magic that set_filename(req, res) is doing, so we may end up having to override that too.

Thinking of something along these lines:

# untested code. TODO: setup jekyll dev environment!
def search_index_file(req, res)

  # First try the default check for index files in the directory
  if file = super
      return file
  end
  
  # Otherwise, try a custom check for directory.html in the parent directory
  
  # Extract basename from the end of res.filename
  path_arr = res.filename.scan(%r{/[^/]*})

  # Get the last non-empty section of the filename
  while basename = path_arr.pop
      break unless basename == '/'
  end

  if file = search_file(req, res, basename + '.html')
      res.filename = path_arr.join
      return file
  end
  
  return nil

end

MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 18, 2017

Fix serving files that clash with directories
Jekyll extends WEBrick to serve file.html if file is requested but
doesn't exist (see #3452). This doesn't work if file exists as a
directory though.

This commit extends WEBrick further so that if file exists as a
directory, we first try and serve file/index.html (as normal), but
fallback to file.html if no index can be found within the file
directory (reported as #6222).

It is slightly hacky, to avoid having to re-implement the entire
set_filename method, we instead extend search_index_file and temporarily
mutate res.filename. This is all clearly commented.

See base class search_index_file implementation:

https://github.com/ruby/webrick/blob/3fb6a011d620f1c19fce71bc3447740553e4fa80/lib/webrick/httpservlet/filehandler.rb#L365-L385

MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 19, 2017

Fix serving files that clash with directories
Jekyll extends WEBrick to serve file.html if file is requested but
doesn't exist (see #3452). This doesn't work if file exists as a
directory though.

This commit extends WEBrick further so that if file exists as a
directory, we first try and serve file/index.html (as normal), but
fallback to file.html if no index can be found within the file
directory (reported as #6222).

It is slightly hacky, to avoid having to re-implement the entire
set_filename method, we instead extend search_index_file and temporarily
mutate res.filename. This is all clearly commented.

See base class search_index_file implementation:

https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L356-L363

MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 19, 2017

Fix serving files that clash with directories
Jekyll extends WEBrick to serve file.html if file is requested but
doesn't exist (see #3452). This doesn't work if file exists as a
directory though.

This commit extends WEBrick further so that if file exists as a
directory, we first try and serve file/index.html (as normal), but
fallback to file.html if no index can be found within the file
directory (reported as #6222).

It is slightly hacky, to avoid having to re-implement the entire
set_filename method, we instead extend search_index_file and mutate
res.filename, reverting the change if it isn't needed.

See base class search_index_file implementation:

https://github.com/ruby/webrick/blob/3fb6a011/lib/webrick/httpservlet/filehandler.rb#L356-L363

MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 20, 2017

MattSturgeon added a commit to MattSturgeon/jekyll that referenced this issue Jul 20, 2017

@jekyllbot jekyllbot closed this in #6231 Jul 25, 2017

@janpio

This comment has been minimized.

Show comment
Hide comment
@janpio

janpio Oct 21, 2017

Contributor

Awesome that this was fixed in 3.6.x, but I think there still might something slightly broken maybe:

(I hope it is ok just to reply to this issue, parts of the PR was so salty that I am a bit scared...)

I have a Jekyll project with permalink: /:path that has a page.md and a page/second-page.md. This should be served as /page and /page/second-page.

In 3.5.2 going to /page showed the directory listing. Now in 3.6.0 this is fixed thanks to this PR (if I understood correctly). Problem is while the page works, /page is still redirected to /page/ (which is different to expected behaviour and also different to how it works on Github pages).

Here is a screenshot of Firefox devtools of the redirection:
image

Did I find the correct issue connected to this?
What would be an appropriate next step?

Contributor

janpio commented Oct 21, 2017

Awesome that this was fixed in 3.6.x, but I think there still might something slightly broken maybe:

(I hope it is ok just to reply to this issue, parts of the PR was so salty that I am a bit scared...)

I have a Jekyll project with permalink: /:path that has a page.md and a page/second-page.md. This should be served as /page and /page/second-page.

In 3.5.2 going to /page showed the directory listing. Now in 3.6.0 this is fixed thanks to this PR (if I understood correctly). Problem is while the page works, /page is still redirected to /page/ (which is different to expected behaviour and also different to how it works on Github pages).

Here is a screenshot of Firefox devtools of the redirection:
image

Did I find the correct issue connected to this?
What would be an appropriate next step?

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Oct 21, 2017

Member

@janpio So /page should not show a directory listing, and should not redirect to /page/, but should display /page.html, is that correct?

That seems like a regression, if this changed from 3.5.x to 3.6.0.

Could you open a new issue explaining all of this? This deserves it's own conversation.

Member

pathawks commented Oct 21, 2017

@janpio So /page should not show a directory listing, and should not redirect to /page/, but should display /page.html, is that correct?

That seems like a regression, if this changed from 3.5.x to 3.6.0.

Could you open a new issue explaining all of this? This deserves it's own conversation.

@janpio

This comment has been minimized.

Show comment
Hide comment
@janpio

janpio Oct 21, 2017

Contributor

Almost.

/page should show the content.
In 3.5.x it showed the directly listing after redirecting to /page/.
Since 3.6.0 it shows the correct content but still redirects to /page/. Content good, redirect still bad.

But just now I noticed that /page/second-page does actually render the content that should be at /page, which is of course really bad.

I will create a new issue with repro etc.

Contributor

janpio commented Oct 21, 2017

Almost.

/page should show the content.
In 3.5.x it showed the directly listing after redirecting to /page/.
Since 3.6.0 it shows the correct content but still redirects to /page/. Content good, redirect still bad.

But just now I noticed that /page/second-page does actually render the content that should be at /page, which is of course really bad.

I will create a new issue with repro etc.

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Oct 21, 2017

Member

What is on the disk, in the _site directory?

If the content is at /page/index.html then it seems like a redirect from /page to /page/ is acceptable. But yes, do open an issue so we can give this the attention it needs.

Member

pathawks commented Oct 21, 2017

What is on the disk, in the _site directory?

If the content is at /page/index.html then it seems like a redirect from /page to /page/ is acceptable. But yes, do open an issue so we can give this the attention it needs.

@janpio

This comment has been minimized.

Show comment
Hide comment
@janpio

janpio Oct 21, 2017

Contributor

No, content is in page.html. (Only page/index.md should be rendered to /page/index.html, right?)

See this new issue for complete description and reproduction repo:
#6459

Contributor

janpio commented Oct 21, 2017

No, content is in page.html. (Only page/index.md should be rendered to /page/index.html, right?)

See this new issue for complete description and reproduction repo:
#6459

@MattSturgeon

This comment has been minimized.

Show comment
Hide comment
@MattSturgeon

MattSturgeon Oct 21, 2017

Contributor
Contributor

MattSturgeon commented Oct 21, 2017

@janpio

This comment has been minimized.

Show comment
Hide comment
@janpio

janpio Oct 21, 2017

Contributor

That is one part, yes.

Unfortunately there is another, pages inside the folder show the same content as the page clashing with the folder name. This only started with 3.6.x. See #6459 for a proper bug report with repro etc.

Contributor

janpio commented Oct 21, 2017

That is one part, yes.

Unfortunately there is another, pages inside the folder show the same content as the page clashing with the folder name. This only started with 3.6.x. See #6459 for a proper bug report with repro etc.

@janpio

This comment has been minimized.

Show comment
Hide comment
@janpio

janpio Oct 23, 2017

Contributor

To make it less confusing I extracted the both problem (unnecessary redirect, wrong content rendered) into to seperate issues: #6459 + new #6475

Contributor

janpio commented Oct 23, 2017

To make it less confusing I extracted the both problem (unnecessary redirect, wrong content rendered) into to seperate issues: #6459 + new #6475

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment