Use Lambda@Edge Origin Request to rewrite URIs, properly host S3 static sites behind CF #1

Open
strogonoff opened this issue May 7, 2018 · 10 comments
Labels
enhancement New feature or request


@strogonoff
Contributor

strogonoff commented May 7, 2018

Rewriting URIs in CloudFront’s origin request could ensure that:

  • CloudFront gets index.html files where it expects to find them when paths without index.html are requested
  • Requests without a trailing slash get redirected to the same URI with a trailing slash, where possible

Function

'use strict';

const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);
const hasTrailingSlash = uri => uri.endsWith('/');
const needsTrailingSlash = uri => !pointsToFile(uri) && !hasTrailingSlash(uri);

exports.handler = (event, context, callback) => {
    // Extract the request from the CloudFront event that is sent to Lambda@Edge
    const request = event.Records[0].cf.request;

    // Extract the URI and query string from the request
    const olduri = request.uri;
    const qs = request.querystring;

    // If needed, redirect to the same URI with trailing slash, keeping query string
    if (needsTrailingSlash(olduri)) {
        return callback(null, {
            body: '',
            status: '302',
            statusDescription: 'Moved Temporarily',
            headers: {
                location: [{
                    key: 'Location',
                    value: qs ? `${olduri}/?${qs}` : `${olduri}/`,
                }],
            },
        });
    }

    // Match any '/' that occurs at the end of a URI, replace it with a default index
    const newuri = olduri.replace(/\/$/, '/index.html');

    // Useful for test runs
    // console.log("Old URI: " + olduri);
    // console.log("New URI: " + newuri);

    // Replace the received URI with the URI that includes the index page
    request.uri = newuri;

    // Return to CloudFront
    return callback(null, request);
};

Setup

The function can use the Node.js 8.10 runtime and needs to be associated with the Origin Request event in the CloudFront distribution’s cache behavior settings, specifying the exact function version (a numbered version, not $LATEST).

Resources

@strogonoff strogonoff self-assigned this May 7, 2018
@ronaldtse
Contributor

Thanks @strogonoff! The code currently converts "/XXX" to "/XXX/index.html", but in our static Jekyll sites, there are two possibilities:

  1. Convert "/XXX" into "/XXX.html" (for normal pages)
  2. Convert "/XXX" into "/XXX/index.html" (for collections, pagination)

I wonder what's the best way to do so?

cc: @ribose-jeffreylau

@ronaldtse
Contributor

Actually it's easy to do so with the following modified code:

  1. /XXX becomes /XXX.html
  2. /XXX/ becomes /XXX/index.html

'use strict';

const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);
const hasTrailingSlash = uri => uri.endsWith('/');

exports.handler = (event, context, callback) => {
    // Extract the request from the CloudFront event that is sent to Lambda@Edge
    const request = event.Records[0].cf.request;

    // Extract the URI from the request
    const olduri = request.uri;

    // URIs that already point to a file are passed through unchanged
    if (pointsToFile(olduri)) {
        callback(null, request);
        return;
    }

    if (!hasTrailingSlash(olduri)) {
        // Append ".html" extension
        request.uri = olduri + ".html";
    } else {
        // Append "index.html"
        request.uri = olduri + "index.html";
    }

    // Return to CloudFront
    return callback(null, request);
};
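
For a quick local check of the mapping described above, the rules can be pulled into a pure function and run against a few sample paths. This is only a sketch: `rewriteUri` and the sample paths are hypothetical and not part of the function above.

```javascript
'use strict';

// Same file-detection regex as in the handler above.
const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);

// Hypothetical pure helper mirroring the rewrite rules, so they can be
// exercised without a CloudFront event.
function rewriteUri(uri) {
    if (pointsToFile(uri)) return uri;                 // pass files through
    if (uri.endsWith('/')) return uri + 'index.html';  // directory-style URI
    return uri + '.html';                              // extensionless page
}

// Sample paths are made up for illustration.
const samples = ['/about', '/blog/', '/', '/css/site.css', '/releases/v1.2'];
for (const uri of samples) {
    console.log(`${uri} -> ${rewriteUri(uri)}`);
}
```

Note that the last sample is passed through unchanged: any final path segment containing a dot is treated as a file, which may or may not be what a given site wants.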

@strogonoff
Contributor Author

@ronaldtse

You’re right, overall this could indeed cause issues with some Jekyll sites, although it didn’t in mine, which didn’t use collections. Tangentially, I found that hooking into Jekyll’s Ruby plugin architecture and generating pages/paths from a custom YAML structure as needed provides the required flexibility, while collections are limiting and only suitable for blog-like sites.

In the end, it might make more sense not to design a one-size-fits-all function, and instead to leverage Terraform’s architecture to supply the simplest suitable function for each specific site (e.g., Ribose Open might end up using one, and the static site another, if any). I’ll test one for Ribose Open specifically.

The code currently converts "/XXX" to "/XXX/index.html",

It’s a technicality, but the code will not convert /XXX straight to /XXX/index.html. The code is supposed to treat /XXX as a path that is missing a trailing slash, and therefore redirect the user from /XXX to /XXX/. This ensures that the canonical URL is always the one with the slash, so that third parties don’t get a 404 if they forget the slash, and search engines don’t get confused by the same content being available both with and without it. The subsequent request to /XXX/, though, is supposed to get rewritten to /XXX/index.html when CF queries the S3 origin.
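
The two-step flow can be sketched without a CloudFront event. This is a minimal illustration, assuming the same helper predicates as in the issue description; `decide` and its return shape are hypothetical names introduced only for this example.

```javascript
'use strict';

const pointsToFile = uri => /\/[^/]+\.[^/]+$/.test(uri);
const needsTrailingSlash = uri => !pointsToFile(uri) && !uri.endsWith('/');

// Hypothetical helper: reports what the origin-request handler would do
// for a given URI and query string.
function decide(uri, querystring) {
    if (needsTrailingSlash(uri)) {
        // Step 1: redirect the viewer to the canonical URL with a slash,
        // keeping the query string.
        const location = querystring ? `${uri}/?${querystring}` : `${uri}/`;
        return { action: 'redirect', location };
    }
    // Step 2: rewrite a trailing slash to the index document for S3.
    return { action: 'rewrite', uri: uri.replace(/\/$/, '/index.html') };
}

// First request: no trailing slash, so the viewer is redirected.
console.log(decide('/XXX', 'page=2'));

// Follow-up request to the redirect target: rewritten for the origin.
console.log(decide('/XXX/', 'page=2'));
```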

@ronaldtse
Contributor

Indeed, it would be ideal to do it as you described. Hooking in with a Jekyll plugin could work.

The point is that we need to be consistent in naming foo/index.html, because the Jekyll site structure by default uses foo.html, which is ambiguous for foo vs foo/. If we can say with certainty that everything else is foo/index.html, then it is easy. Maybe that is something you can enforce.

@ronaldtse
Contributor

In fact, if Lambda@Edge is able to query S3 to see whether foo.html or foo/index.html exists, the function can point the request to the correct path.
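
A rough sketch of that idea, with the S3 lookup abstracted away: `candidateKeys`, `resolveUri`, and `objectExists` are hypothetical names, and `objectExists` stands in for whatever existence check is used (for example, a wrapper around an S3 HeadObject request). An in-memory set plays the role of the bucket here, so nothing below talks to AWS.

```javascript
'use strict';

// Candidate paths for an extensionless URI, in the order they are tried.
function candidateKeys(uri) {
    if (uri.endsWith('/')) return [uri + 'index.html'];
    return [uri + '.html', uri + '/index.html'];
}

// `objectExists` is a hypothetical async predicate supplied by the caller.
// Returns the first candidate that exists, or the original URI if none do.
async function resolveUri(uri, objectExists) {
    for (const key of candidateKeys(uri)) {
        if (await objectExists(key)) return key;
    }
    return uri;
}

// In-memory stand-in for the bucket contents (made-up paths).
const bucket = new Set(['/about.html', '/blog/index.html']);
const inBucket = async key => bucket.has(key);

resolveUri('/about', inBucket).then(uri => console.log(uri));
resolveUri('/blog', inBucket).then(uri => console.log(uri));
resolveUri('/missing', inBucket).then(uri => console.log(uri));
```

The trade-off is an extra lookup on each origin request, and the function’s role would need read access to the bucket, so this is probably only worth it where the site layout cannot be made consistent.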

@strogonoff
Contributor Author

strogonoff commented May 28, 2018

To clarify, the reason Lambda is well-suited for this is that you probably don’t want to tie site generation logic to any particular hosting. It might be better to keep any adapters required by AWS within AWS itself, on the same abstraction level, if that’s possible.

By the way, this doesn’t seem to be an urgent problem (unless I’m mistaken), so I’ve put this item on hold for now. If anyone’s willing, feel free to implement this.

What I would do when I have the time is test this setup within a concrete, complete Terraform project for a vanilla Jekyll+S3/CF site, and also check whether the configuration is complex enough to warrant a module. (Terraform best practices discourage splitting logic across reusable modules where the stack is simple. If this is only to enable collaboration without sharing credentials, I suspect there may be better ways of doing that than moving everything to modules.) Then I’d iterate on the Lambda code and periodically re-provision everything from scratch to ensure it all works properly.

@ronaldtse
Contributor

@eugenetaranov would you have time to integrate this? Thanks!

@strogonoff strogonoff removed their assignment Jul 15, 2018
@ronaldtse ronaldtse added the enhancement New feature or request label Apr 7, 2019
@gssy-avsk

@ronaldtse I have a Bootstrap HTML website and need to remove the .html file extensions from my site's URLs. I used your code above in Lambda@Edge, but it is not working on the live site. Is there anything to keep in mind while implementing your Lambda code, or is there an update to it for Node 10.x? Please help!

@mtoorop-ximedes

mtoorop-ximedes commented Dec 24, 2021

@strogonoff Thank you for the example!

Btw, that tinyendian link seems to be broken and is pointing to various spam/malicious websites. Consider updating the original issue description to remove that link to keep people safe :)

@ribose-jeffreylau

@mtoorop-ximedes Thanks for noticing. The link has been updated. Cheers!
