always include file extension on destination files #3490

Merged
merged 1 commit into from Mar 3, 2015

Conversation

Projects
None yet
7 participants
@willnorris
Contributor

willnorris commented Feb 21, 2015

This is an initial take at always including file extensions on destination files written to disk. This is a continuation of the discussion that's happened on #2294 and #3463, and coincides with the WEBrick changes in #3452.

This should only have any impact when the site or post permalink setting is configured without either a trailing slash or the file extension for that file type. In that case, the file written to disk will contain the correct extension for that type, while the URL will remain as specified in the permalink setting.

Notably, this does break the test that was added in #2925 to support files ending in ".htm" rather than ".html" (cc @pathawks to discuss), because this is using the output_ext value to determine the expected extension for the destination file. I believe this is the correct behavior. I think that any attempt to support alternate extensions (like ".htm") should make sure one way or another that output_ext returns the desired alternate extension. Not doing so seems likely to cause more problems down the road. Either way, we certainly need to resolve this breaking test before this can be safely merged.

@parkr

View changes

lib/jekyll/document.rb
@@ -164,6 +164,8 @@ def destination(base_directory)
dest = site.in_dest_dir(base_directory)
path = site.in_dest_dir(dest, URL.unescape_path(url))
path = File.join(path, "index.html") if url =~ /\/$/
+ output_ext = Jekyll::Renderer.new(site, self).output_ext
+ path += output_ext if path !~ /#{output_ext}$/

This comment has been minimized.

@parkr

parkr Feb 21, 2015

Member

if path.end_with?(output_ext)?

@parkr

parkr Feb 21, 2015

Member

if path.end_with?(output_ext)?

This comment has been minimized.

@willnorris

willnorris Feb 21, 2015

Contributor

done.

@willnorris

willnorris Feb 21, 2015

Contributor

done.

@parkr

View changes

lib/jekyll/document.rb
@@ -164,6 +164,8 @@ def destination(base_directory)
dest = site.in_dest_dir(base_directory)
path = site.in_dest_dir(dest, URL.unescape_path(url))
path = File.join(path, "index.html") if url =~ /\/$/
+ output_ext = Jekyll::Renderer.new(site, self).output_ext

This comment has been minimized.

@parkr

parkr Feb 21, 2015

Member

we should really be caching this value

@parkr

parkr Feb 21, 2015

Member

we should really be caching this value

This comment has been minimized.

@willnorris

willnorris Feb 21, 2015

Contributor

done (I think the right way... double check)

@willnorris

willnorris Feb 21, 2015

Contributor

done (I think the right way... double check)

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 21, 2015

Member

So dope. Thanks for the PR!

Member

parkr commented Feb 21, 2015

So dope. Thanks for the PR!

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 21, 2015

Member

What about checking for the existence of a file extension before adding this one?

>> File.extname("path/to/file.htm")
=> ".htm"
>> File.extname("path/to/file.htm").empty?
=> false
>> File.extname("path/to/file")
=> ""
>> File.extname("path/to/file").empty?
=> true

If it's empty, then add the output extension. Otherwise, pass.

Also, let's use << instead of +=.

Member

parkr commented Feb 21, 2015

What about checking for the existence of a file extension before adding this one?

>> File.extname("path/to/file.htm")
=> ".htm"
>> File.extname("path/to/file.htm").empty?
=> false
>> File.extname("path/to/file")
=> ""
>> File.extname("path/to/file").empty?
=> true

If it's empty, then add the output extension. Otherwise, pass.

Also, let's use << instead of +=.

@parkr parkr added the Enhancement label Feb 21, 2015

@parkr

View changes

test/source/_posts/2015-02-20-extensionless-permalink.markdown
+permalink: /:title
+---
+
+Test

This comment has been minimized.

@parkr

parkr Feb 21, 2015

Member

Can we output {{ page.url }} here to ensure the Liquid URL is being set properly?

@parkr

parkr Feb 21, 2015

Member

Can we output {{ page.url }} here to ensure the Liquid URL is being set properly?

This comment has been minimized.

@willnorris

willnorris Feb 21, 2015

Contributor

Done.

@willnorris

willnorris Feb 21, 2015

Contributor

Done.

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Feb 21, 2015

Contributor

What about checking for the existence of a file extension before adding this one?

So I did think about that. It will certainly work, but there's two things I didn't like about it:

  1. it prevents having a period in your post title if you have an extensionless permalink structure. Perhaps a theoretical argument, as I'm not sure how often people actually use periods in post titles (I know I never do)
  2. it doesn't address the issue of always being able to trust output_ext to accurately identify the extension of the destination file. Based on how it's being used today, I don't actually see any immediate issues, but it just feels wrong to me.

That said, I'm happy to add it in if you'd like.

Also, let's use << instead of +=.

Done.

Contributor

willnorris commented Feb 21, 2015

What about checking for the existence of a file extension before adding this one?

So I did think about that. It will certainly work, but there's two things I didn't like about it:

  1. it prevents having a period in your post title if you have an extensionless permalink structure. Perhaps a theoretical argument, as I'm not sure how often people actually use periods in post titles (I know I never do)
  2. it doesn't address the issue of always being able to trust output_ext to accurately identify the extension of the destination file. Based on how it's being used today, I don't actually see any immediate issues, but it just feels wrong to me.

That said, I'm happy to add it in if you'd like.

Also, let's use << instead of +=.

Done.

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Feb 22, 2015

Contributor

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

Contributor

willnorris commented Feb 22, 2015

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 22, 2015

Member

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

You make good points... people should be able to use dots in their files, and we don't want to limit file extensions... Hm...

Member

parkr commented Feb 22, 2015

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

You make good points... people should be able to use dots in their files, and we don't want to limit file extensions... Hm...

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 22, 2015

Member

Would you mind adding a test for Document and Page as well? Just until they're all one class...

Member

parkr commented Feb 22, 2015

Would you mind adding a test for Document and Page as well? Just until they're all one class...

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Feb 22, 2015

Contributor

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

You make good points... people should be able to use dots in their files, and we don't want to limit file extensions... Hm...

okay, yeah... I now see that there are test cases that specifically focus on files with dots in their name, so this is definitely something we'll want to support. So now I'm back to feeling like we will need to back out the test case from #2925 and find another way to support that (if at all). As @pathawks mentioned on that bug, he could setup redirects for the legacy URLs, which may be the better option here.

Right now, I'm fairly certain that Pages still end up with URLs contain the ".html" extension. That never actually impacted me, so I sort of forgot about it. I'll look into what it will take to fix that, and add tests for that and Document (as well as additional tests for dotted filenames).

Contributor

willnorris commented Feb 22, 2015

okay, thinking more about it, I agree that respecting existing extensions in permalink is probably the way to go for right now at least. This next commit should be safe to merge I think.

You make good points... people should be able to use dots in their files, and we don't want to limit file extensions... Hm...

okay, yeah... I now see that there are test cases that specifically focus on files with dots in their name, so this is definitely something we'll want to support. So now I'm back to feeling like we will need to back out the test case from #2925 and find another way to support that (if at all). As @pathawks mentioned on that bug, he could setup redirects for the legacy URLs, which may be the better option here.

Right now, I'm fairly certain that Pages still end up with URLs contain the ".html" extension. That never actually impacted me, so I sort of forgot about it. I'll look into what it will take to fix that, and add tests for that and Document (as well as additional tests for dotted filenames).

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 25, 2015

Member

I think we should be able to support htm extensions, even if just to maintain backwards compatability with the hundreds of thousands (guessing, maybe more or less) of GitHub Pages sites that are presently out there... Perhaps we do check against a list of possible file extensions? That may slow down this method considerably...

One idea is to release this in a beta, have @pathawks take a look, and come up with a fix for that later.

Right now, I'm fairly certain that Pages still end up with URLs contain the ".html" extension. That never actually impacted me, so I sort of forgot about it. I'll look into what it will take to fix that, and add tests for that and Document (as well as additional tests for dotted filenames).

Yes, I believe they are presently just as flexible as Posts, so it would be good to have them under test. Thank you!!

Member

parkr commented Feb 25, 2015

I think we should be able to support htm extensions, even if just to maintain backwards compatability with the hundreds of thousands (guessing, maybe more or less) of GitHub Pages sites that are presently out there... Perhaps we do check against a list of possible file extensions? That may slow down this method considerably...

One idea is to release this in a beta, have @pathawks take a look, and come up with a fix for that later.

Right now, I'm fairly certain that Pages still end up with URLs contain the ".html" extension. That never actually impacted me, so I sort of forgot about it. I'll look into what it will take to fix that, and add tests for that and Document (as well as additional tests for dotted filenames).

Yes, I believe they are presently just as flexible as Posts, so it would be good to have them under test. Thank you!!

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Feb 25, 2015

Member

I'd be happy to test 👍

You want I should test this pull request?

Member

pathawks commented Feb 25, 2015

I'd be happy to test 👍

You want I should test this pull request?

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 25, 2015

Member

@pathawks Yes please.

Member

parkr commented Feb 25, 2015

@pathawks Yes please.

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Feb 25, 2015

Contributor

Yes, I believe they are presently just as flexible as Posts, so it would be good to have them under test. Thank you!!

Actually pages are the worst in terms of permalink flexibility (from _config.yaml that is). If you're not using :pretty, then pages always use a template of /:path/:basename:output_ext. This is a large part of #2691, which I'm trying to fix here as well (haven't uploaded that commit yet.. that's what prompted #3507)

Contributor

willnorris commented Feb 25, 2015

Yes, I believe they are presently just as flexible as Posts, so it would be good to have them under test. Thank you!!

Actually pages are the worst in terms of permalink flexibility (from _config.yaml that is). If you're not using :pretty, then pages always use a template of /:path/:basename:output_ext. This is a large part of #2691, which I'm trying to fix here as well (haven't uploaded that commit yet.. that's what prompted #3507)

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Feb 25, 2015

Member

Ok then this PR looks good to me.

@pathawks, let me know what you figure out. It's hacky, but we could also check for .htm...

Member

parkr commented Feb 25, 2015

Ok then this PR looks good to me.

@pathawks, let me know what you figure out. It's hacky, but we could also check for .htm...

@parkr

View changes

lib/jekyll/post.rb
@@ -280,6 +280,7 @@ def destination(dest)
# The url needs to be unescaped in order to preserve the correct filename
path = site.in_dest_dir(dest, URL.unescape_path(url))
path = File.join(path, "index.html") if self.url =~ /\/$/

This comment has been minimized.

@parkr

parkr Feb 25, 2015

Member

also this should totally be .end_with?("/")... not sure if that's for this PR but it's an optimization we should take advantage of. i'll write a benchmark to ensure this

@parkr

parkr Feb 25, 2015

Member

also this should totally be .end_with?("/")... not sure if that's for this PR but it's an optimization we should take advantage of. i'll write a benchmark to ensure this

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Mar 2, 2015

Contributor

@parkr: as previously noted, we do have tests that ensure that files with dots in them work, which this method of simply looking for a file extension doesn't work with. As a result, this next commit switches back to enforcing output_ext exactly. This also necessitates removing @pathawks' feature test for .htm permalinks. Supporting non-standard file extensions like this will likely need to be done in a plugin, which I'll take a look at separately, just to show how it would be done for those that are interested.

This next commit also includes tests for posts and documents.

Contributor

willnorris commented Mar 2, 2015

@parkr: as previously noted, we do have tests that ensure that files with dots in them work, which this method of simply looking for a file extension doesn't work with. As a result, this next commit switches back to enforcing output_ext exactly. This also necessitates removing @pathawks' feature test for .htm permalinks. Supporting non-standard file extensions like this will likely need to be done in a plugin, which I'll take a look at separately, just to show how it would be done for those that are interested.

This next commit also includes tests for posts and documents.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Mar 2, 2015

Member

What about allowing configuration of HTML files' file extension? is that crazy?

Member

parkr commented Mar 2, 2015

What about allowing configuration of HTML files' file extension? is that crazy?

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Mar 2, 2015

Contributor

What about allowing configuration of HTML files' file extension? is that crazy?

personally I think so, yes. :)

just make it a setting

Also keep in mind that I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content. I don't think there is any real demand for having .htm versus .html long-term, we just need a solution that is good enough to handle Pat's imported content (which I personally would still argue that just redirecting to the new URLs would be the much simpler approach)

Contributor

willnorris commented Mar 2, 2015

What about allowing configuration of HTML files' file extension? is that crazy?

personally I think so, yes. :)

just make it a setting

Also keep in mind that I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content. I don't think there is any real demand for having .htm versus .html long-term, we just need a solution that is good enough to handle Pat's imported content (which I personally would still argue that just redirecting to the new URLs would be the much simpler approach)

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Mar 2, 2015

Member

Also keep in mind that I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content. I don't think there is any real demand for having .htm versus .html long-term

👍

Member

parkr commented Mar 2, 2015

Also keep in mind that I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content. I don't think there is any real demand for having .htm versus .html long-term

👍

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Mar 2, 2015

Member

also /cc @benbalter for that great image above. ben's always talking about our 🎄 of options.

Member

parkr commented Mar 2, 2015

also /cc @benbalter for that great image above. ben's always talking about our 🎄 of options.

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Mar 2, 2015

Member

I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content.

Yup.

just redirecting to the new URLs would be the much simpler approach

Yup.

Member

pathawks commented Mar 2, 2015

I'm fairly certain @pathawks was only interested in having ".htm" extension for legacy, imported-in content.

Yup.

just redirecting to the new URLs would be the much simpler approach

Yup.

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Mar 3, 2015

Contributor

@pathawks: in my brief testing, this plugin seems to do the trick of preserving the ".htm" extension when a post's custom permalink contains that extension:

module Jekyll
  module Convertible
    def output_ext_with_htm
      ext = output_ext_without_htm
      ext = ".htm" if self.permalink.to_s.end_with?(".htm")
      ext
    end

    alias_method :output_ext_without_htm, :output_ext
    alias_method :output_ext, :output_ext_with_htm
  end
end

Or of course, you could also just redirect from within the web server itself.

Assuming that is okay and we're fine removing this behavior from jekyll core, then this should be ready for final review. @parkr could you take another look (there's a few additional tests now as well)

Contributor

willnorris commented Mar 3, 2015

@pathawks: in my brief testing, this plugin seems to do the trick of preserving the ".htm" extension when a post's custom permalink contains that extension:

module Jekyll
  module Convertible
    def output_ext_with_htm
      ext = output_ext_without_htm
      ext = ".htm" if self.permalink.to_s.end_with?(".htm")
      ext
    end

    alias_method :output_ext_without_htm, :output_ext
    alias_method :output_ext, :output_ext_with_htm
  end
end

Or of course, you could also just redirect from within the web server itself.

Assuming that is okay and we're fine removing this behavior from jekyll core, then this should be ready for final review. @parkr could you take another look (there's a few additional tests now as well)

always include file extension on destination files
This ensures that destination files for HTML posts, pages and
collections always include the proper file extension (as defined by
output_ext) regardless of permalink structure.  This allows for URLs
that contain no extension or trailing slash to still result in proper
destination files with .html extensions.

Because this change relies so heavily on output_ext accurately
identifying the extension of the destination file, this change also
removes the feature test that tested support for permalinks with a .htm
extension.  In order to support alternate file extensions, a future
patch or plugin will need to modify the output_ext value, at which point
everything else should work as expected.
@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Mar 3, 2015

Member

I'm fine with removing this feature. We'll need to include it on our upgrading docs page.

Member

parkr commented Mar 3, 2015

I'm fine with removing this feature. We'll need to include it on our upgrading docs page.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Mar 3, 2015

Member

This looks great to me! :shipit:

Thanks, Will!

Member

parkr commented Mar 3, 2015

This looks great to me! :shipit:

Thanks, Will!

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Mar 3, 2015

Contributor

I don't think it necessarily needs mentioning in the upgrade docs, since this never actually worked until #3014, which has only shipped with 3.0.0.beta1.

(and just to make sure, "ship-it squirrel" == "go ahead and merge"? Or should I wait for you or someone else to merge?)

Contributor

willnorris commented Mar 3, 2015

I don't think it necessarily needs mentioning in the upgrade docs, since this never actually worked until #3014, which has only shipped with 3.0.0.beta1.

(and just to make sure, "ship-it squirrel" == "go ahead and merge"? Or should I wait for you or someone else to merge?)

@pathawks

This comment has been minimized.

Show comment
Hide comment
@pathawks

pathawks Mar 3, 2015

Member

👍

Member

pathawks commented Mar 3, 2015

👍

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Mar 3, 2015

Member

(and just to make sure, "ship-it squirrel" == "go ahead and merge"? Or should I wait for you or someone else to merge?)

Yep, :shipit: means I'm cool merging it. Thanks for checking :)

Member

parkr commented Mar 3, 2015

(and just to make sure, "ship-it squirrel" == "go ahead and merge"? Or should I wait for you or someone else to merge?)

Yep, :shipit: means I'm cool merging it. Thanks for checking :)

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Mar 3, 2015

Member

Looks good! Though this only works for Markdown and converted files, right?

So if I had a a-page.html:

---
layout: default
permalink: /a-page
---
<strong>This is a page</strong>

It would output as /a-page, not /a-page.html?

Member

alfredxing commented Mar 3, 2015

Looks good! Though this only works for Markdown and converted files, right?

So if I had a a-page.html:

---
layout: default
permalink: /a-page
---
<strong>This is a page</strong>

It would output as /a-page, not /a-page.html?

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Mar 3, 2015

Contributor

It works just as well for non-converted files. If there is no converter registered for the file extension, then output_ext returns the extension of the source file. So in your example, the page would have a url of /a-page, but the destination file would be /a-page.html.

Contributor

willnorris commented Mar 3, 2015

It works just as well for non-converted files. If there is no converter registered for the file extension, then output_ext returns the extension of the source file. So in your example, the page would have a url of /a-page, but the destination file would be /a-page.html.

willnorris added a commit that referenced this pull request Mar 3, 2015

Merge pull request #3490 from willnorris/ext
always include file extension on destination files

@willnorris willnorris merged commit 5d1a24e into jekyll:master Mar 3, 2015

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

willnorris added a commit that referenced this pull request Mar 3, 2015

@willnorris willnorris deleted the willnorris:ext branch Mar 3, 2015

@alfredxing

This comment has been minimized.

Show comment
Hide comment
@alfredxing

alfredxing Mar 3, 2015

Member

@willnorris Right, forgot about that. Thanks!

Member

alfredxing commented Mar 3, 2015

@willnorris Right, forgot about that. Thanks!

@snickers0129

This comment has been minimized.

Show comment
Hide comment
@snickers0129

snickers0129 Apr 27, 2015

I like the idea of configuring the html extension for .html files as well... I'm hosting on Amazon S3, which supports serving files that are extension-less and doesn't really offer configuration

I like the idea of configuring the html extension for .html files as well... I'm hosting on Amazon S3, which supports serving files that are extension-less and doesn't really offer configuration

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Apr 27, 2015

Contributor

@snickers0129 you could do a variation of #3490 (comment) to customize the extension for HTML output files.

Contributor

willnorris commented Apr 27, 2015

@snickers0129 you could do a variation of #3490 (comment) to customize the extension for HTML output files.

@willnorris

This comment has been minimized.

Show comment
Hide comment
@willnorris

willnorris Apr 27, 2015

Contributor

err, you might actually need to patch the Markdown converter, but it would still be the output_ext method.

Contributor

willnorris commented Apr 27, 2015

err, you might actually need to patch the Markdown converter, but it would still be the output_ext method.

@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Apr 27, 2015

Contributor

Why not just do folder based navigation? The AWS docs states that's supported?

Contributor

envygeeks commented Apr 27, 2015

Why not just do folder based navigation? The AWS docs states that's supported?

@snickers0129

This comment has been minimized.

Show comment
Hide comment
@snickers0129

snickers0129 Apr 27, 2015

@willnorris Thanks! I saw that... I just thought I'd cast my vote!

@envygeeks You mean putting folders with index.html in them? That causes 302 redirects from url/page to url/page/

@willnorris Thanks! I saw that... I just thought I'd cast my vote!

@envygeeks You mean putting folders with index.html in them? That causes 302 redirects from url/page to url/page/

@envygeeks

This comment has been minimized.

Show comment
Hide comment
@envygeeks

envygeeks Apr 27, 2015

Contributor

Oh fair enough, I didn't notice it would do that crazy mess, but I guess I shouldn't be surprised it is a CDN.

Contributor

envygeeks commented Apr 27, 2015

Oh fair enough, I didn't notice it would do that crazy mess, but I guess I shouldn't be surprised it is a CDN.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.