Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Static Website] Getting 404 on nested paths, and security headers are not applied #172

Closed
adriencaccia opened this issue Feb 27, 2022 · 8 comments · Fixed by #175
Closed
Labels
bug Something isn't working

Comments

@adriencaccia
Copy link
Contributor

adriencaccia commented Feb 27, 2022

Description

When trying to directly access a nested path of a SPA deployed with the static website construct, we get a 404 response and the security headers are not applied.

Although a 404 is received, the content is the page is still the desired one.

Regarding the 404:

When building a SPA with a modern framework, we want to leave the handling of "bad routes" to the application itself. Since we cannot know in advance which routes are served by the application, I think a better approach would be to always return a 200 on nested routes.

Why we get the 404

The construct is deploying a Cloudfront with a S3 with website hosting as an origin. The default behavior of a S3 with website hosting when trying to request a non-existing file is to return a 404 and display the error document if it is set.
For SPAs, we want this error document to be the same as the root document of our app. Lift is configured to work like this by default.
So, when hitting a nested route, Cloudfront receives a 404 from S3 and then return it to the client.

Why the security headers are not applied

The security headers are applied thanks to a Cloudfront function deployed by Lift, which is set up to be invoked by viewer response events.
Per the Restrictions on edge functions docs, CloudFront does not invoke edge functions for viewer response events when the origin returns HTTP status code 400 or higher.
Thus when trying to get a nested route, Cloudfront will not apply the function because S3 (the origin) is returning a 404.

Proposed solution

Adding a Lambda@Edge triggered by the origin response trigger to rewrite 4xx to 200 will solve the problem.
This is a common use case for Lambda@Edge, per the docs.
I will submit a PR following this issue to implement it !

Kudos to @CorentinDoue and @guillaumem-theodo for finding this bug !

How to Reproduce

A created a single file SPA with react and nested routes to showcase the problem.
Accessing the deployed cloudfront domain with the nested routes /login or /register will result in the bug described above.

Lift config in serverless.yml:

constructs:
  landing:
    type: 'static-website'
    path: 'build'

build/index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <link rel="shortcut icon" type="image/jpg" href="logo.svg" />

    <!-- Note: When deploying to production, replace "development.js"
      with "production.min.js" in each of the following tags -->

    <!-- Load React and React DOM -->
    <!-- See https://reactjs.org/docs/add-react-to-a-website.html to learn more -->
    <script
      src="https://unpkg.com/react@>=17/umd/react.development.js"
      crossorigin
    ></script>
    <script
      src="https://unpkg.com/react-dom@>=17/umd/react-dom.development.js"
      crossorigin
    ></script>

    <!-- Load history -->
    <script
      src="https://unpkg.com/history@5/umd/history.development.js"
      crossorigin
    ></script>

    <!-- Load React Router and React Router DOM -->
    <script
      src="https://unpkg.com/react-router@6/umd/react-router.development.js"
      crossorigin
    ></script>
    <script
      src="https://unpkg.com/react-router-dom@6/umd/react-router-dom.development.js"
      crossorigin
    ></script>

    <!-- Load babel standalone -->
    <script src="https://unpkg.com/@babel/standalone/babel.min.js"></script>
  </head>
  <body>
    <div id="root"></div>
    <script type="text/babel" data-type="module">
      const Link = ReactRouterDOM.Link;
      const Route = ReactRouterDOM.Route;
      const BrowserRouter = ReactRouterDOM.BrowserRouter;
      const Routes = ReactRouterDOM.Routes;

      const App = () => (
        <BrowserRouter>
          <Routes>
            <Route path="/" element={<Home />} />
            <Route path="/login" element={<Login />} />
            <Route path="/register" element={<Register />} />
          </Routes>
          <ul>
            <li>
              <Link to="/">Home</Link>
            </li>
            <li>
              <Link to="/login">Login</Link>
            </li>
            <li>
              <Link to="/register">Register</Link>
            </li>
          </ul>
        </BrowserRouter>
      );

      const Home = () => <h1>Home</h1>;
      const Login = () => <h1>Login</h1>;
      const Register = () => <h1>Register</h1>;

      ReactDOM.render(<App />, document.querySelector("#root"));
    </script>
  </body>
</html>

Additional Information

Versions of npm packages:

  npmPackages:
    serverless: ^3.2.0 => 3.2.0 
    serverless-lift: ^1.12.1 => 1.12.1 
@adriencaccia adriencaccia added the bug Something isn't working label Feb 27, 2022
@mnapoli
Copy link
Member

mnapoli commented Feb 27, 2022

Adding a Lambda@Edge triggered by the origin response trigger to rewrite 4xx to 200 will solve the problem.
This is a common use case for Lambda@Edge, per the docs.

Thanks a lot for all these details!

Lambda@Edge is unfortunately very invasive and unpractical, at least from my original experience (creates Lambda functions in many regions, logs in many different regions). For hosting a static website I'd love to avoid that if possible.

Do you think there is any other option than using Lambda@Edge?

@adriencaccia
Copy link
Contributor Author

adriencaccia commented Feb 27, 2022

If we want to keep the S3 set up with website hosting, thus leaving the configuration of error behaviors on its side, I'm afraid we cannot 😕 .

A solution that does not require Lambda@Edge would be to revert #165, manually generate the OAI to allow s3:ListBucket and confgure the 404 behavior on cloudfront side.
But even then, the security headers will still not be applied, because the error behavior (turning 404 into 200) is applied after the attempt to invoke the viewer response cloudfront function.

I am not 100% sure of this, I'm guessing that it is the problem that @CorentinDoue is facing.
He is using serverless-lift@1.11.0 (a version with error behavior on cloudfront side), and he receives 200 on nested routes, but the security headers are not applied.

@CorentinDoue
Copy link

Yes, if cloudfront gets an error from S3 (a 404 for example), the CloudFront functions are not triggered. Even if this error is explicitly handled (returning a 200 and index.html for example).

With serverless-lift@1.11.0 I don't have problems with 404, the errors are handled by CloudFront. But there are no security headers on nested routes because the CloudFront function responsible for adding them is not triggered

@CorentinDoue
Copy link

We also found another possible solution that doesn't require Lambda@Edge on StackOverflow: https://stackoverflow.com/a/69479703/11788663

It adds a viewer request CloudFront function to transform the request from my-site.com/my-sub-route into mysite.com/index.html for all resources except a list of allowed extensions. The S3 will find the index.html and respond a 200. The CloudFront function will add the security headers.

@adriencaccia suggest we split the static-website construct into static-website and spa-website construct because spa requires special treatments

@adriencaccia
Copy link
Contributor Author

@adriencaccia suggest we split the static-website construct into static-website and spa-website construct because spa requires special treatments

Indeed, we can see that to make SPA work we will require special configurations.

What do you think of adding a SPA construct with the solution specified in @CorentinDoue comment ? ie: using a viewer request cloudfront function to rewrite the URI to index.html ?

@mnapoli
Copy link
Member

mnapoli commented Feb 28, 2022

Yeah, the more we progress on this construct the more it seems like a separating SPA from purely static websites makes sense.

This is a big change though (which I personally still prefer over Lambda@Edge), @fredericbarthelet is that OK with you?

@fredericbarthelet
Copy link
Collaborator

Thanks @adriencaccia for opening this issue and @CorentinDoue for the additional insights.
I took a quick look at what Amplify offers as port of their hosting strategy in order to have a best practice point of comparaison of what we're trying to achieve here.

Amplify provides 2 hosting strategies:

  • Amplify console hosting
  • AWS Cloudfront + S3

The later one, which is closer to our current solution does not implement any lambda@edge. It is not possible to add custom headers as port of Cloudfront response.

The first one however supports the requested feature described above. You can:

The SPA redirect strategy uses a dedicated regex pattern to ensure only SPA paths are redirected to index.html. All other static assets are not rewritten. Would such strategy with the correct regex allow to maintain a single construct ? Static site which are not SPA would not be match by the regex and will allow those kind of app to function as expected. It would still require the implementation of a lambda@edge to enforce such redirect.

What I suggest:

  • not split the construct and implement the lamda@edge at viewer request level to redirect requests not made to assets
  • if this configuration works fine both for SPA and static website, we ship it as is
  • it this configuration does not work with both (regex does not allow distinction in some case), we build 2 constructs, but keep core logic as an abstract construct being extended by 2 specific usable constructs - one for SPA, one for static sites. Allowing most of the logic not to be duplicated.

@adriencaccia @CorentinDoue @mnapoli, WDYT ?

@adriencaccia
Copy link
Contributor Author

adriencaccia commented Mar 1, 2022

It seems to me that using the regex for redirects in static website (not SPAs) is risky. You can see that the regex of amplify does redirect *.html files, which is something we do not want to do for static websites.

I think your recommendation to have 2 constructs is great.

not split the construct and implement the lamda@edge at viewer request level to redirect requests not made to assets

A small correction, it will be a cloudfront function and not a lambda@edge

If we all agree, I'd be happy to implement it 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants