Content Encoding Issue after generating Blitz cache with SSI enabled #564

Closed
MalcolmJohnston opened this issue Oct 3, 2023 · 19 comments
Labels
question Further information is requested

@MalcolmJohnston

I have recently enabled SSI on our staging site and added includeCached directives to pull in the header, footer and a couple of other includes.

This works well, and I can see the pages and includes being generated as I expect.

When the pages and includes are generated organically the site works as expected.

However, if I generate the cache and then visit the site I get content encoding errors.

If I clear the cache, and visit the page again, then the page renders as expected.

Craft version: Craft Pro 4.5.6.1
Blitz version: 4.5.5
Site: https://staging.mountainkingdoms.com

@MalcolmJohnston MalcolmJohnston added the question Further information is requested label Oct 3, 2023
@MalcolmJohnston
Author

Hi @bencroker, I have just seen that you are at the Dot All conference for the next few days. This is not urgent, as we have not deployed these changes to live, so please feel free to get back to me after Dot All.

@bencroker
Collaborator

That’s right, thanks for your understanding.

@bencroker
Collaborator

Which cache generator are you using, @MalcolmJohnston? And can you show me an example of the encoding errors?

@MalcolmJohnston
Author

MalcolmJohnston commented Oct 9, 2023

Apologies for not getting back sooner, I had some other changes to push through approval so had taken the cached includes off of the staging site temporarily.

We are using the HTTP cache generator. I cannot actually show you a content encoding error as the browser doesn't render anything. If you visit https://www.staging.mountainkingdoms.com then you will see what I mean.

However, I think I might have found something interesting.

I curled the home page when it was cached organically, and after the cache had been generated via a queue job/cli.

The difference is that when visiting after the page has been organically cached the cached includes are pulled through correctly. When visiting the page after generating via a queue job or CLI we get the below HTML for each of the includes.

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
<style>
    body {margin: 20px; font-family: helvetica, sans-serif; max-width: 800px;}
    .error {color: #e00;}
    pre {font-size: 16px;}
    h1 {font-size: 28px;}
</style>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
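For anyone comparing the two curl outputs, a minimal Python sketch like the one below can count how many of these error documents ended up embedded in a fetched page. It assumes the error page carries the same `<title>404 Not Found</title>` marker as the HTML quoted above; the function name is made up for illustration and is not part of Blitz.

```python
import re

# Marker taken from the 404 body quoted above. This is an assumption:
# adjust the pattern if your server's error page uses a different title.
NOT_FOUND_TITLE = re.compile(r"<title>\s*404 Not Found\s*</title>", re.IGNORECASE)


def count_embedded_404s(html: str) -> int:
    """Return how many 404 error documents appear inside the given HTML."""
    return len(NOT_FOUND_TITLE.findall(html))


if __name__ == "__main__":
    # Tiny stand-in for a page where two includes resolved to a 404 document.
    error_doc = "<html><head><title>404 Not Found</title></head></html>"
    sample = "<html><body>" + error_doc + "<main>Home</main>" + error_doc + "</body></html>"
    print(count_embedded_404s(sample))
```

Running it against the saved output of each curl request (organic versus generated) gives a quick way to tell whether the includes resolved, without reading the whole file.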

I attach a ZIP file with examples of the index page being generated organically and via a queue job.
index-generated.zip

As an aside to all that, do you think it is worth trying the Local generator? Is there slightly less overhead using that approach as opposed to the HTTP generator?

Thanks in advance for your help with this!

@bencroker
Collaborator

Just to clarify, did you clear the cache after updating to or beyond 4.5.0? That was necessary to ensure that cached includes and pages that contain SSI or ESI includes are not compressed.

If that doesn’t help then perhaps using the Local Generator is worth a try. I’m still not sure what the underlying issue is, however, so can’t be sure that it will make any difference.

@MalcolmJohnston
Author

To be honest I am not 100% sure whether I did clear the cache or not.

So to be sure I have gone back and cleared the cache, and flushed the cache. Then regenerated organically, which works. Generating (still using HTTP generator) gives the same Content Encoding error as before.

I will let the site generate, and then go back and retry the above process with the Local Generator and let you know how I get on.

Thanks again.

@bencroker
Collaborator

Let me know how this went so we can resolve this for you.

@MalcolmJohnston
Author

Thanks for the nudge on this.

I have tried the Local generator, but that doesn't work for me. In the Blitz logs I am getting the following.

[2023-10-12 11:12:26] Unable to verify your data submission.

And no pages are generated. Organic generation still works well.

I have then switched back to the HTTP Generator, but when I run that from the queue then the server locks up, and cannot serve any pages.

I restarted apache, and then ran the HTTP Generator from the CLI which works, but we are back to the Content Encoding issues.

In case it helps shed any light the site is hosted on a VPS using ServerPilot. So we are using Nginx as a reverse proxy with Apache.

The relevant section of the Apache config is below, with the includes bits added as per the docs.

<Directory ${DOCUMENT_ROOT}>
    AllowOverride All
    Require all granted

    #- Enable Server-Side Includes
    #- https://httpd.apache.org/docs/current/howto/ssi.html
    Options +Includes
    AddOutputFilter INCLUDES .html .php


    RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
    RewriteRule \.php$ - [R=404]
</Directory>

I have made these changes within the application configuration files for Apache, but maybe I need to put the includes changes in the global Apache config files? I will give that a go anyway to see if it makes any difference.

@MalcolmJohnston
Author

Hello again @bencroker

Ok so I have been looking at this again this afternoon.

I have noticed that I had a lot of defunct rewrite rules and so on in an .htaccess file on the server.
After reading https://nystudio107.com/blog/stop-using-htaccess-files-no-really linked to from the Blitz docs, I decided this would be a good time to jack that in and have all the Apache configuration in .conf.

That is now done. I have also stripped down my .conf file to just the following, removing a bunch of rewrites that looked off to me (or could be handled more cleanly using Retour).

This leaves me with:

AcceptPathInfo on

DirectoryIndex index.html index.htm index.php

<Directory ${DOCUMENT_ROOT}>
    Options FollowSymLinks

    #- Disallow .htaccess files which are not good for performance
    AllowOverride None
    Require all granted

    #- Enable Server-Side Includes
    #- https://httpd.apache.org/docs/current/howto/ssi.html
    Options +Includes
    AddOutputFilter INCLUDES .html .php

    <IfModule mod_rewrite.c>
        RewriteEngine On

        #- Send 404 requests to PHP
        RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
        RewriteRule \.php$ - [R=404]

        # Redirect http to https
        RewriteCond %{HTTP:X-Forwarded-Proto} !=https
        RewriteCond %{SERVER_PORT} 80
        RewriteCond %{HTTP_HOST} !^local\. [NC]
        RewriteCond %{HTTP_HOST} !\.local$ [NC]
        RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

        # Redirect non www to www
        RewriteCond %{HTTP_HOST} !^www\. [NC]
        RewriteCond %{HTTP_HOST} !^local\. [NC]
        RewriteCond %{HTTP_HOST} !\.local$ [NC]
        RewriteRule ^(.*)$ https://www\.%{HTTP_HOST}/$1 [R=301,L]

        #- Blitz cache rewrite
        RewriteCond %{DOCUMENT_ROOT}/cache/blitz/%{HTTP_HOST}/%{REQUEST_URI}/%{QUERY_STRING}/index.html -s
        RewriteCond %{REQUEST_METHOD} GET
        RewriteCond %{QUERY_STRING} !token= [NC]
        RewriteRule .* /cache/blitz/%{HTTP_HOST}/%{REQUEST_URI}/%{QUERY_STRING}/index.html [L]

        # Send would-be 404 requests to Craft
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteCond %{REQUEST_URI} !^/(favicon\.ico|apple-touch-icon.*\.png)$ [NC]
        RewriteRule (.+) index.php?p=$1 [QSA,L]
    </IfModule>
</Directory>

RewriteEngine On
RewriteCond %{HTTP:Authorization} .+
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

<Files *.php>
    SetHandler proxy:${PHP_PROXY_URL}
</Files>

<Proxy ${PHP_PROXY_URL}>
    ProxySet timeout=3600 retry=0
</Proxy>

Now we are getting somewhere 👍

I am no longer getting content encoding errors after generating the cache via the CLI / queue job. But we still have issues with the cached includes. Good news is that you can now see the issue more easily.

image

See: https://www.staging.mountainkingdoms.com

This was using the HTTP Generator. I will try the Local generator again now that I have tidied this stuff up and see if that works better.

Will let you know.

Cheers,
Malcolm

@MalcolmJohnston
Author

Hi Ben,

Just to let you know I tried again with the Local generator but got the same error message: Unable to verify your data submission.

I then regenerated with the HTTP Generator, expecting to get back to the state in the above comment. However, two things happened.

Firstly, the site crashed; restarting Apache brought it back to life. Secondly, the cached includes are now working on the front end of the site.

So I am at a bit of a loss now.

Reviewing the Blitz log from the last HTTP Generation I had a lot of the following errors:

[2023-10-12 20:36:05] 502 error: Bad Gateway [https://www.staging.mountainkingdoms.com/holiday/everest-base-camp-in-style?token=5cyR02snvlBXNFqM6jDuy0wLQjEd89j3]

However if I visit that page in the browser, with or without the token in the query string then the page returns ok.
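To get a feel for how widespread these errors are, a small Python sketch along these lines could tally the status codes in the log. It assumes every error line follows the same `[timestamp] <status> error: <reason> [<url>]` shape as the line quoted above, which may not hold for every entry Blitz writes; the function name and the example URL are hypothetical.

```python
import re
from collections import Counter

# Shape assumed from the single log line quoted above:
# [2023-10-12 20:36:05] 502 error: Bad Gateway [https://...]
ERROR_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<status>\d{3}) error: (?P<reason>.*?) \[(?P<url>[^\]]+)\]$"
)


def tally_errors(lines):
    """Count log lines per HTTP status code, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.match(line.strip())
        if match:
            counts[match.group("status")] += 1
    return counts


if __name__ == "__main__":
    sample_log = [
        "[2023-10-12 20:36:05] 502 error: Bad Gateway [https://example.com/a?token=abc]",
        "[2023-10-12 20:36:06] 502 error: Bad Gateway [https://example.com/b?token=def]",
        "[2023-10-12 11:12:26] Unable to verify your data submission.",
    ]
    for status, count in tally_errors(sample_log).items():
        print(status, count)
```

Lines that do not fit the pattern, such as the "Unable to verify your data submission." message, are ignored rather than guessed at.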

@bencroker
Collaborator

Thanks for the update @MalcolmJohnston, it sounds like you’ve made some progress. It also sounds like this may be a sporadic, environment-specific issue, which can be difficult to troubleshoot.

My advice going forward is that, if you get an encoding error again, you open up the cached file, which should be at web/cache/blitz/domain.com/<uri>/index.html, and look for the SSI tag, which looks like an HTML comment. The included URI should be a valid URI that you can visit.
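As a rough aid for that check, the Python sketch below pulls the included URIs out of a cached file's contents. It assumes the tags use the standard Apache SSI form `<!--#include virtual="..." -->`; the function name is hypothetical and the sample markup is only a stand-in for a real cached page.

```python
import re
from typing import List

# Standard Apache SSI include directive; the URI sits in the "virtual" attribute.
SSI_INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def find_ssi_includes(html: str) -> List[str]:
    """Return the URI of every SSI include directive found in the HTML."""
    return SSI_INCLUDE.findall(html)


if __name__ == "__main__":
    # Stand-in for the contents of a cached index.html file.
    cached_html = (
        "<html><body>"
        '<!--#include virtual="/_includes?action=example&index=1" -->'
        "<main>Page</main>"
        '<!--#include virtual="/_includes?action=example&index=2" -->'
        "</body></html>"
    )
    for uri in find_ssi_includes(cached_html):
        print(uri)
```

Each printed URI can then be appended to the site's base URL and opened in a browser or requested with curl to confirm that it returns a valid response.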

Can you also ensure that Compress Cached Values is disabled under the Cache Storage tab in the plugin settings, just to be sure that this isn’t playing up? It shouldn’t, but since you’re seeing encoding issues there is a chance.

@MalcolmJohnston
Author

MalcolmJohnston commented Oct 13, 2023

Hello again @bencroker. Sorry, on further inspection my previous comment (now deleted) wasn't quite accurate.

As per your suggestion above I have been looking at the SSI tags. This is what I have found:

Original Include generated by Blitz

<!--#include virtual="/_includes?action=blitz%2Finclude%2Fcached&index=1150487015" -->

This page does exist, and can be accessed in the browser at:

https://www.staging.mountainkingdoms.com/cache/blitz/www.staging.mountainkingdoms.com/_includes/action=blitz%252Finclude%252Fcached&index=1150487015/

Notice that I had to make the following replacements:

  1. /_includes?action becomes /_includes/action
  2. %2F becomes %252F
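The two replacements can be reproduced with a short Python sketch. It is only a reconstruction of what was observed here, not Blitz's actual code: it assumes the query string is stored on disk as a folder whose name literally contains `%2F`, so the `%` has to be percent-encoded again (as `%25`) to reach it through a URL. The function name is hypothetical.

```python
from urllib.parse import quote, urlsplit


def cached_include_path(host: str, include_uri: str) -> str:
    """Map an include URI to the URL path of its cached folder (as observed above)."""
    parts = urlsplit(include_uri)
    # Replacement 1: the "?" becomes "/", so the query string acts as a folder name.
    folder_path = f"{parts.path}/{parts.query}/"
    # Replacement 2: encode the literal "%" in the folder name, so "%2F" becomes "%252F".
    # "/", "=" and "&" are left untouched to match the URL shown above.
    return f"/cache/blitz/{host}" + quote(folder_path, safe="/=&")


if __name__ == "__main__":
    print(
        cached_include_path(
            "www.staging.mountainkingdoms.com",
            "/_includes?action=blitz%2Finclude%2Fcached&index=1150487015",
        )
    )
```

The double encoding is needed because a server would decode `%2F` in a URL path to a slash, which would no longer match a folder that has the three characters `%2F` in its name.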

So far so interesting. Then I took this to the next step and edited the HTML generated by Blitz. If I changed the include tag to the following, then everything starts working!

<!--#include virtual="/cache/blitz/www.staging.mountainkingdoms.com/_includes/action=blitz%252Finclude%252Fcached&index=1150487015/" -->

Notice that I had to:

  1. Prepend /cache/blitz/www.staging.mountainkingdoms.com to the include URL
  2. Add the trailing slash on the end of the URL

So now I think we are getting there. Just as an FYI the /_includes folder on the server looks as follows, in case those %2F were actually meant to be sub-directories?

image

Any thoughts on the above?

@bencroker
Collaborator

The URI should read /_includes?action=blitz%2Finclude%2Fcached&index=1150487015 and https://www.staging.mountainkingdoms.com/_includes?action=blitz%2Finclude%2Fcached&index=1150487015 should return a valid response. My guess is that your rewrite rules are still off. Can you please try removing the following lines? You really want Craft to be handling 404 responses.

        #- Send 404 requests to PHP
        RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI} !-f
        RewriteRule \.php$ - [R=404]

@bencroker
Collaborator

Can you also please try adding the following before the rewrite block?

AllowEncodedSlashes On

@MalcolmJohnston
Author

@bencroker thanks for getting back to me.

I have tried both of the above and it was AllowEncodedSlashes On that did the job.

You'll see that: https://www.staging.mountainkingdoms.com/_includes?action=blitz%2Finclude%2Fcached&index=1150487015 is now available.

Taking out the redirect for PHP caused some issues for me (possibly because I am running Server Pilot which is using Nginx -> Apache?).

I have regenerated the whole site, and it is working well from what I can see, so I think we can leave this one here 👍

Thanks for your patience in getting this one sorted. Much appreciated.

@bencroker
Collaborator

bencroker commented Oct 14, 2023

That’s great to hear! I’d like to move away from having forward slashes in the query string, so that enabling AllowEncodedSlashes isn’t required in future, but SSI can be somewhat fragile in some environments.

@bencroker
Collaborator

FYI I’ve fixed this in ec3c2af for the next release so that going forward, AllowEncodedSlashes can remain off.

@bencroker
Collaborator

bencroker commented Oct 17, 2023

Released in 4.6.0.

@bencroker
Collaborator

bencroker commented Dec 20, 2023

For reference, as of Blitz 4.9.4, enabling AllowEncodedSlashes is required. Since folders cannot contain slashes, I’m not sure there’s any way around this.
