Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to configure trailing slash redirect behavior #66

Merged
merged 7 commits into from Apr 10, 2015

Conversation

jeremydw
Copy link
Member

@jeremydw jeremydw commented Apr 6, 2015

Here's a pull request that's an alternative implementation to #65.

A few high-level notes:

  • I realized that Grow already has a way to specify whether a resource should be served at a path with a trailing slash or not. This configuration is specified in a content file's frontmatter or blueprint. Here's an example: https://github.com/jeremydw/grow-redirect-slashes-demo/blob/master/content/pages/about.html – this makes the problem in Object store-specific temporary deployment extensions #65 much more about how Grow configures trailing slash behavior for object stores rather than how Grow builds filesets.
  • When a pod is deployed, we should follow this configuration as best we can. This means for object store deployments (S3 and GCS) – ones that allow objects and directories to share the same basename – we can deploy resources with a configuration following exactly what the user has specified. (Local deployments, where objects and directories cannot share the same name, use the "suffix" configuration.)
  • In the current version of Grow, there is no way for the developer to specify trailing slash behavior for object store deployments.

This patch adds a way for the developer to control trailing slash behavior, independently of how resources are built by Grow. This matches werkzeug's strict_slashes parameter (http://werkzeug.pocoo.org/docs/0.10/routing/#maps-rules-and-adapters) or Django's APPEND_SLASH (https://docs.djangoproject.com/en/1.7/ref/settings/#append-slash).

In this patch, when a developer configures an object store deployment with redirect_trailing_slashes: true – any "leaf" resource (that is, any resource whose path doesn't end in a trailing slash) will be redirected using the object store's built-in redirect rule (for GCS, this redirects to $PATH/$MAIN_PAGE_SUFFIX). redirect_trailing_slashes: true is the current default behavior of Grow.

If a developer specifies redirect_trailing_slashes: false – resources will be deployed to the object store exactly as they're specified in the pod's configuration. This means when a user visits a path /foo, that resource will only be served if it exists. /foo/ causes a 404, as does /foo/index.html (for example).

Here's a sample pod that demonstrates this configuration:
https://github.com/jeremydw/grow-redirect-slashes-demo

With redirect_trailing_slashes: false (new behavior):
http://slashes-off.growlaunches.com/about-us (serves)
http://slashes-off.growlaunches.com/about-us/nested (serves)
http://slashes-off.growlaunches.com/about-us/ (404s)
http://slashes-off.growlaunches.com/about-us/index.html (404s)

With redirect_trailing_slashes: true (current/default behavior):
http://slashes-off.growlaunches.com/about-us (redirects)
http://slashes-off.growlaunches.com/about-us/ (serves)
http://slashes-off.growlaunches.com/about-us/index.html (serves)

@vitorio – sorry for sitting on this so long, but I wanted to think this through, and I knew that we already had a way to specify whether or not a path should have a trailing slash or not. This configuration format matches Jinja2's and Django's configuration options, which I'm happy about.

What do you think? Does this meet your needs? Thanks!

@vitorio
Copy link

vitorio commented Apr 6, 2015

This sounds like something that is semantically correct for Grow, and would support the URL structure I'm looking for.

Your grow-redirect-slashes-demo repo seems to generate the right structure, which is great.

However, the codelab pod doesn't seem to like it so much. Where grow-redirect-slashes-demo has hardcoded $path in each page, the codelab (and my test pod) doesn't: they're using the generated slug URLs in _blueprint.yaml, and those seem to have broken completely.

With the stock codelab podspec.yaml, no changes (so redirect_trailing_slashes: true), deployment paths like these show up:

Action                                        Path
====================================================
add      /404.html/index.html
add      /static/css/main.min.css/index.html
add      /static/images/banner1.jpg/index.html

If I set redirect_trailing_slashes: false:

Action                                        Path
====================================================
add      /
add      /about-us/
add      /archive/

These generate and upload no index.html, just an empty folder.

It looks like it's breaking your existing functionality in the codelab demo. (My test pod just has path: /{slug} in _blueprint.yaml and no path in the individual content pages.)

Unfortunately, I don't have time tonight to dig into where it's breaking or why, but perhaps just initing and trying to deploy a new codelab pod is data enough?

@vitorio
Copy link

vitorio commented Apr 6, 2015

I guess the broken situation is this:

In this patch, when a developer configures an object store deployment with redirect_trailing_slashes: true – any "leaf" resource (that is, any resource whose path doesn't end in a trailing slash) will be redirected using the object store's built-in redirect rule (for GCS, this redirects to $PATH/$MAIN_PAGE_SUFFIX). redirect_trailing_slashes: true is the current default behavior of Grow.

That should read: "any 'leaf' resource (that is, any resource whose path doesn't end in either a trailing slash or a filename extension)"

That's the reason for the very specific if cases in lines 236-239 here: https://github.com/grow/pygrow/pull/65/files#diff-ae194024d049ab76b5a2ab62ec64d186R236

(Both #66 and #65 break in the case of a non-HTML leaf resource without an extension, in that both will give it an HTML MIME type, but that can be fixed independently, with better MIME type divination.)

@jeremydw
Copy link
Member Author

jeremydw commented Apr 6, 2015

That's right. I've got a fix incoming, along with a quick unit test to verify.

@jeremydw
Copy link
Member Author

jeremydw commented Apr 6, 2015

Okay – I've pushed up the original check which only adds the suffix for leaf paths that do not have file extensions. Better mimetype divination for leaf paths is a related issue, but unimplemented currently! We currently assume leaf paths are HTML, as you saw.

@vitorio
Copy link

vitorio commented Apr 6, 2015

Cool, I'll test it out locally tonight!

@jeremydw
Copy link
Member Author

jeremydw commented Apr 6, 2015

Thanks, and thanks for working through this with me!

As an aside, we're a dev shop that uses Grow for our customers, and we've got a few big projects coming up in the next few months. Grow is still pretty nascent, but these projects will give us room to enhance the system, make it more mature, and improve the community and documentation. Looking forward to it!

@vitorio
Copy link

vitorio commented Apr 7, 2015

Hi, okay! Really close this time.

In my test pod, I have slugs that have no trailing slash, and also slugs that do have a trailing slash. Even when redirect_trailing_slashes: false, there still needs to be an index document/main page on resources that end in a slash, because otherwise that just creates an empty folder. (There just shouldn't be extra index documents/main pages).

That fails in your current code:

add      /zh/                                                         (wrong)
add      /zh/about-us                                                 (right)
add      /zh/archive                                                  (right)
add      /zh/diam-a-ac-sed-suscipit-netus/                            (wrong)

The fix looks simple, though. In grow/pods/pods.py, line 239, remove the append_slashes condition:

if append_slashes and path.endswith('/') and suffix:

to:

if path.endswith('/') and suffix:

This appears to work on both the stock codelab pod and my test pod, with both values for redirect_trailing_slashes, for both URLs without trailing slashes and with.

add      /zh/index.html                                               (right)
add      /zh/about-us                                                 (right)
add      /zh/archive                                                  (right)
add      /zh/diam-a-ac-sed-suscipit-netus/index.html                  (right)

I really appreciate you considering this use case. Grow was on my short list of static site generators to evaluate (along with Cactus, Hugo, Pelican, PyBlosxom, Handroll and Acrylamid), but I'd have to have a separate, path-rewriting deployment tool for most of them. I'm looking forward to not having to maintain my URLs by hand.

@jeremydw
Copy link
Member Author

jeremydw commented Apr 7, 2015

Interesting. I think you've spotted a difference in serving behavior between GCS and S3. It appears one can actually write branch nodes to GCS that are served as HTML. It seems as if this is allowed by GCS.

Check out this example (it's explicitly configured as a branch node with $path: /contact-us/ and redirect_trailing_slashes: off):
http://slashes-off.growlaunches.com/contact-us/ (works)
http://slashes-off.growlaunches.com/contact-us (404s)
http://slashes-off.growlaunches.com/contact-us/index.html (404)

Peeking into the web Cloud Storage browser:
image

image

Quick question – when you say /zh/diam-a-ac-sed-suscipit-netus/ fails with the current code, how does it fail? Does it fail to deploy to S3?

@vitorio
Copy link

vitorio commented Apr 7, 2015

Oh, how interesting.

No, it says it deploys everything successfully to S3, but http://vitorio-test-grow.s3-website-us-east-1.amazonaws.com/posts/diam-a-ac-sed-suscipit-netus/ 404s. There's no diam-a-ac-sed-suscipit-netus or diam-a-ac-sed-suscipit-netus/index.html resource. Cyberduck says it's a 0 byte folder. It has no properties in the AWS S3 web console.

However, if I don't access it through the S3 website endpoint, I can get to it: http://vitorio-test-grow.s3.amazonaws.com/posts/diam-a-ac-sed-suscipit-netus/ works as a resource named like a branch node.

So, I guess I'm not sure what the correct behavior for S3 is, here, then. If you were really conscious of your URLs enough to use this option, you probably wouldn't have both ones with and without trailing slashes, making my test case not really a realistic condition. And there are use cases for deploying to S3 without using the website endpoint. I guess your current code is correct?

@vitorio
Copy link

vitorio commented Apr 8, 2015

After thinking on it, I feel like your current behavior is correct. It just has some per-destination side effects which may differ from expectations about a given object store v. a filesystem, which can be covered in documentation, something like:

When redirect_trailing_slashes: false is used with paths ending in trailing slashes, both GCS and S3 will successfully deploy a leaf node with a trailing slash, without needing an index document or main page. GCS will let you access that node directly. However, S3 will report a 404 for that node from the website endpoint, which is the typical deployment scenario; it will only return data on the normal S3 endpoints.

Currently, Grow automatically configures your S3 bucket to have a website endpoint, but the normal S3 endpoints will continue to work. If you are deploying to S3 and need the website endpoint to work, either don't use redirect_trailing_slashes: false with paths ending in trailing slashes, or include the index document in those particular paths.

@jeremydw
Copy link
Member Author

Again, sorry for the delay, and thanks for thinking this through with me.

In response to your observation, my guess is that if we configure an entirely separate domain with a CNAME that points to that S3 bucket, the branch node will work as a named resource (as http://vitorio-test-grow.s3.amazonaws.com/posts/diam-a-ac-sed-suscipit-netus/ does).

I have also thought about it more, and I do think the behavior in this PR is correct now. Primarily because, in the frontmatter, the user has explicitly configured the documents to have the canonical URL at /zh/ (for example), not /zh/index.html.

What do you think about merging this in as-is, and then adding new functionality for specifying whether/how to auto-configure website endpoints for buckets (currently, Grow sets the bucket as a website endpoint every time we deploy – let's make this configurable).

Does this current branch node behavior in S3, create a problem for you, and if so, should we revisit the flags so that folks can configure whether the suffix is appended to the document, independent of the slash behavior? I'm afraid to overcomplicate things, but also want to be flexible enough and make sure it works for you. :)

@vitorio
Copy link

vitorio commented Apr 10, 2015

I think merging as-is is fine. Whether or not to auto-configure S3 endpoints can be a separate issue, as can smarter MIME type determination. Neither get in the way of the common use case here.

The current branch node behavior is not a problem, now that I understand it. I think it's fine given a little documentation, and I wouldn't expect most people to turn off the default lightly.

Thanks!

@jeremydw
Copy link
Member Author

Thanks! Filing grow/grow.dev#11 for documentation.

jeremydw added a commit that referenced this pull request Apr 10, 2015
Add ability to configure trailing slash redirect behavior
@jeremydw jeremydw merged commit eced78c into develop Apr 10, 2015
@jeremydw jeremydw deleted the develop-slashes branch August 30, 2015 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants