Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config of module.php cache headers #1189

Merged
merged 4 commits into from Sep 11, 2019

Conversation

@melanger
Copy link
Contributor

commented Aug 21, 2019

  • caching in module.php can be configured, but defaults stay the same (as original behavior)
  • new configuration options module.php.cache, module.php.expires, module.php.etag and module.php.directives
  • makes it possible to configure Etags, Cache-control: must-revalidate etc.
  • Last-modified and Content-length are added automatically (by library)
@jornane

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

@melanger Thanks for your update! I'm still a bit confused by this. In #1179 you stated

@jornane You are right, it means a lot of extra requests (even though small).

When hitting module.php, most time is lost on round trip time and running the PHP engine. The actual running of the PHP code is fast, and the transfer of data is also fast (since most resources are small). Etag helps when the file is (1) very large or (2) takes a long time to generate.

Using sane defaults is good (no Etag, reasonable max-age, as you have done), but why make Etag configurable at all? Could you comment on reasons one may have for enabling Etag?

I can only see two cases:

  • Production: use a long max-age and set cache to public
  • Development: use max-age=0 and set cache to must-revalidate; Etag still isn't needed because time used transfering the resources is negligible
@jornane
Copy link
Member

left a comment

I checked the documentation of Symfony, I think this looks pretty good.
Myself, I don't think I'll use Etag for small resources, but I don't see a problem with supporting it.

Just one thing: I think we should call $response->setAutoLastModified(); to add the Last-Modified header.

@tvdijen

This comment has been minimized.

Copy link
Member

commented Aug 21, 2019

I wonder though if this behaviour should be global instead of specific for module.php...

@melanger

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

I think we should call $response->setAutoLastModified(); to add the Last-Modified header.

In the BinaryFileResponse constructor, there is an autoLastModified parameter with true as default, so this already happens automatically.

@melanger

This comment has been minimized.

Copy link
Contributor Author

commented Aug 22, 2019

the transfer of data is also fast (since most resources are small)

Tools like PageSpeed complain when caching is not used, because some people are still/sometimes on 3G data, low signal or slow Wi-Fi and every extra request and every 10 kB is noticable (this can be emulated in dev console by introducing simulated throttling). CSS frameworks and pictures easily have hundreds of kB, which is a pain to re-download on slow networks.
I think a reasonable production setup might also be, for example, public, max-age=86400 with Etag (or Last-modified, but Etag is more accurate). This means that the browser does not re-download the asset every 24 hours if it does not change, it just gets 304 Not modified.
I understand your concerns and the best caching policy may vary between deployments, that is why I changed the PR to be configurable, with the defaults staying the same.

@jornane

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

I think we should call $response->setAutoLastModified(); to add the Last-Modified header.

In the BinaryFileResponse constructor, there is an autoLastModified parameter with true as default, so this already happens automatically.

Ah I had overlooked that. Then it's all good :)

CSS frameworks and pictures easily have hundreds of kB, which is a pain to re-download on slow networks.

Sad, but true.

I think a reasonable production setup might also be, for example, public, max-age=86400 with Etag (or Last-modified, but Etag is more accurate).

I think Last-Modified would be better so that the server doesn't have to shasum the static files all the time (this is how Symfony calculates the Etag). It can just do a stat on the file. But both solve the problem at hand. As long as it's configurable it should be okay. :)

I wonder though if this behaviour should be global instead of specific for module.php...

@tvdijen - is there a place in SimpleSAMLphp where static resources are served outside module.php?

@tvdijen

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

www/assets

@jornane

This comment has been minimized.

Copy link
Member

commented Aug 22, 2019

www/assets

Where is the PHP code reading from there?

@tvdijen

This comment has been minimized.

Copy link
Member

commented Aug 23, 2019

It's not, sorry for the noise!

@melanger

This comment has been minimized.

Copy link
Contributor Author

commented Aug 26, 2019

I think Last-Modified would be better so that the server doesn't have to shasum the static files all the time (this is how Symfony calculates the Etag). It can just do a stat on the file.

@jornane You're right, but in my setup (multiple instances) the timestamps are not guaranteed to be synchronized, so Etag is a better option for me.

@tvdijen

This comment has been minimized.

Copy link
Member

commented Aug 26, 2019

Are we happy with this now? @jaimeperez how do you feel about this?

@jornane jornane self-requested a review Sep 4, 2019

@jornane
Copy link
Member

left a comment

I think we should call $response->setAutoLastModified(); to add the Last-Modified header.

In the BinaryFileResponse constructor, there is an autoLastModified parameter with true as default, so this already happens automatically.

Ah I had overlooked that. Then it's all good :)

On second thought, we don't want to risk the default changing in Symfony. So I think it's better to be explicit and add the function call anyway.

I have some ideas on improving the configuration:

  • It's not really clear that the module.php.* configuration directives have to do with cache, maybe give it a more describing name. Something like module.staticFileCache.* although I'm open for better ideas. ;)

  • Do we really need to be able to configure whether the information is publicly cacheable? We're serving static resources here, and the old code always marked this as public. The difference between public and private is about proxies, so I can't really think of a good reason to make this configurable.

  • If we drop making public configurable, we can merge module.php.cache and module.php.expires to one configuration setting, amount of seconds to cache.

  • Etag as a boolean setting is good I think.

  • The module.php.directives setting requires the admin to know about Symfony, how does HeaderBag::addCacheControlDirective($key,$value) work? What is the use-case for providing this configuration? Is there a way we can do this that is more obvious to configure?

@jaimeperez
Copy link
Member

left a comment

Hi @melanger!

Thanks a lot for your work! I think this is a useful contribution, and we are close to having it merged. I agree with Jørn though that we should make some modifications to the PR to make it a bit more intuitive for our users. In particular, they don't need to know that static assets are served via a module.php script, and therefore the configuration options are very confusing, and I would also like to add a layer of indirection on top of Symfony, to make sure any eventual migration away from it in the future doesn't affect our configuration.

I've attempted to draft an idea on how a configuration for this could look like. Please, don't consider it final, but just a suggestion. We can probably come up with a better naming.

Also, don't hesitate to ask if you have doubts on how you could manage such configuration structure with the SimpleSAML\Configuration class.

*
* Defaults to ['public' => true, 'max_age' => 86400]
*/
// 'module.php.cache' => ['public' => true, 'max_age' => 86400],

This comment has been minimized.

Copy link
@jaimeperez

jaimeperez Sep 4, 2019

Member

I think the configuration could use some refactor and rewording. Naming is always hard. In this case, I don't think "module.php.*" tells anything of what this config options do to someone who's not looking at this PR. Also, our current practice is to try to group together configuration options that are related (e.g. the application option or the debug option). This makes it easier to configure the software, in my opinion, and to document options that are related. I would follow the same approach.

As I said, naming is hard, but since this is related to caching of assets, would it make sense to call it just assets, then add a caching option inside? In my head, this makes sense as we might want to configure other stuff about assets in the future (e.g. the base URL for them, if we want to offload assets to their own server). In this case, and as far as I can see, almost all the available options that can be passed to symfony are set in the Cache-Control HTTP header, and there are directives for ETag and Last-Modified. I think it makes sense to have something like this then:

    /**
     * Explanation of the assets directive
     */
    'assets' => [
        'caching' => [
           /**
            * Explanation of the supported caching-related directives
            */
            'public' => true,
            'max_age' => 86400,
            'etag' => false,
            'last_modified' => ...
        ],
    ],
*
* Defaults to 10 * 60 (ten minutes).
*/
// 'module.php.expires' => 10 * 60,

This comment has been minimized.

Copy link
@jaimeperez

jaimeperez Sep 4, 2019

Member

I think this should go as well inside a general caching group (name again is orientative).

*
* Defaults to false.
*/
// 'module.php.etag' => false,

This comment has been minimized.

Copy link
@jaimeperez

jaimeperez Sep 4, 2019

Member

This one would be covered already by my previous suggestion.

*
* Defaults to [] (empty array).
*/
// 'module.php.directives' => [],

This comment has been minimized.

Copy link
@jaimeperez

jaimeperez Sep 4, 2019

Member

This is very confusing. I think we need to either move every directive we want to support to the general caching configuration group, or remove it. My main problem with this is that it forces you to know Symfony in order to configure it. Flexibility is nice, but I wouldn't depend on any Symfony-specific code.

$response->setCache(['public' => true, 'max_age' => 86400]);
$response->setExpires(new \DateTime(gmdate('D, j M Y H:i:s \G\M\T', time() + 10 * 60)));
$response->setLastModified(new \DateTime(gmdate('D, j M Y H:i:s \G\M\T', filemtime($path))));
$cacheSettings = $config->getArray('module.php.cache', ['public' => true, 'max_age' => 86400]);

This comment has been minimized.

Copy link
@jaimeperez

jaimeperez Sep 4, 2019

Member

Here we could just get the caching config group, process it to validate the supported options, and then go ahead acting accordingly for each of them. Those that cannot be passed to setCache() could be processed first and removed from the array, and then the remaining could be passed to setCache() directly. It might be necessary to have some validation in place, although I believe Symfony would handle that for us.

lib/SimpleSAML/Module.php Show resolved Hide resolved
@jaimeperez

This comment has been minimized.

Copy link
Member

commented Sep 4, 2019

  • If we drop making public configurable, we can merge module.php.cache and module.php.expires to one configuration setting, amount of seconds to cache.

That could be a good idea as well, if we don't see a good reason to give so much flexibility. If we do, though, we could still simplify it for most people like this, and still support an advanced configuration format that allows you to set every detail (e.g. supporting both an integer and an array with the set of options that could be configured). This would make processing the configuration a bit more difficult, though.

@jornane

This comment has been minimized.

Copy link
Member

commented Sep 6, 2019

I've implemented the configuration as @jaimeperez suggested, but due to a bug in Configuration.php I had to trick around with default values. We can fix that later when #1196 is merged.

@jornane
jornane approved these changes Sep 6, 2019
Copy link
Member

left a comment

The configuration is simplified now. What do you think @jaimeperez ?

@jaimeperez jaimeperez added this to the 1.18 milestone Sep 11, 2019

@jaimeperez jaimeperez merged commit ba843ff into simplesamlphp:master Sep 11, 2019

3 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage decreased (-0.02%) to 35.071%
Details
@jaimeperez

This comment has been minimized.

Copy link
Member

commented Sep 11, 2019

Merged! Thanks a lot @melanger and @jornane!

jornane added a commit to jornane/simplesamlphp that referenced this pull request Sep 11, 2019

@melanger melanger deleted the melanger:configurableCache branch Sep 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.