This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

As a Content author I don't want my binary files to be cached for 2 weeks #121

Closed
paulkilla opened this issue Dec 9, 2015 · 9 comments

Comments

@paulkilla
Contributor

Currently our .htaccess file sets ExpiresDefault A1209600

This means we are caching files for two weeks. Many files get updated under the same name (unfortunately), or are crawled by Google with a link directly to the file rather than to /file/, so we need to strike a happy medium between long cache times and happy content authors.

@paulkilla
Contributor Author

Created a branch that caches the following file types for only 30 minutes (pdf|doc|docx|txt|xls|xlsx|ppt|pptx|pps|ppsx|odt|ods|odp|mp3|mov|mp4|m4a|m4v|mpeg|avi|ogg|oga|ogv|weba|webp|webm):
https://github.com/govCMS/govCMS/tree/binary_cache_times

Just needs a PR and approval etc.
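A change along these lines could be sketched as the following `.htaccess` fragment using Apache's mod_expires; the exact directives in the branch may differ, and the 30-minute value (`A1800`, meaning access time plus 1800 seconds) mirrors the comment above rather than quoting the branch:

```apache
# Default: cache most assets for two weeks (access + 1209600 seconds)
ExpiresActive On
ExpiresDefault A1209600

# Binary/document types that authors update in place: cache for 30 minutes only
<FilesMatch "\.(pdf|doc|docx|txt|xls|xlsx|ppt|pptx|pps|ppsx|odt|ods|odp|mp3|mov|mp4|m4a|m4v|mpeg|avi|ogg|oga|ogv|weba|webp|webm)$">
    ExpiresDefault A1800
</FilesMatch>
</IfModule>
```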

@fiasco
Contributor

fiasco commented Dec 9, 2015

Hey @paulkilla , we shouldn't need to have custom branches in the main govCMS repo for changes. Instead, contributors should fork govCMS to their own account and make changes there. That way they can submit back easily as a pull request. This is also a viable workflow for non-maintainer contributors.

You can go ahead and remove that branch from the core repo.

@paulkilla
Contributor Author

See govCMS/GovCMS#122

@fiasco
Contributor

fiasco commented Dec 9, 2015

If files need to be updated then it sounds like a Digital Asset Management (DAM) requirement? We should note that govCMS is a CMS and not a DAM. We do support the use of files such as PDFs, but not the ability to manage them with such advanced capabilities.

It would be great to get some context around why these files must be files and cannot be HTML pages instead (e.g. it's because they're images or audio). Just want to be sure we're meeting a requirement without discouraging the use of structured data that helps make the site more semantic.

@gollyg - probably good to get your insights here too.

@teamglenny

Josh's points around structured data notwithstanding, there is an immediate need to support agencies that publish in file formats such as .pdf that their audiences consume on a regular basis. These files are frequently updated with changing information, and if they are to be an authoritative source of information we can't have technology such as caching impeding their consumption.

@WebProject2015

Our issue with file caching is not the length of time the files are cached for, but our limited ability to remove individual URLs from the cache on demand on a daily basis. In fact, an even longer cache period would be absolutely fine if we could remove individual files from the cache immediately. Our perfect world scenario looks like this:

  1. A user unpublishes a page or file
  2. An automatic call is made to Varnish to remove that object from the Varnish cache.
  3. An automatic call is made to Akamai to remove that object from the Akamai cache

We actually think this scenario is a publisher's expected result from any CMS when unpublishing a document.

The second best scenario would be our admins having the ability to remove specific objects from both caches without having to raise a ticket or clear an entire cache.
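The Varnish half of the scenario above could be sketched as follows, assuming a Varnish (4+) instance configured to accept PURGE requests from trusted hosts; the ACL members, hostname, and file path are illustrative, not govCMS platform configuration:

```vcl
# VCL fragment: allow HTTP PURGE from trusted hosts only
acl purgers {
    "127.0.0.1";     # e.g. the web/app tier issuing purges on unpublish
}

sub vcl_recv {
    if (req.method == "PURGE") {
        if (!client.ip ~ purgers) {
            return (synth(405, "Purging not allowed"));
        }
        return (purge);
    }
}
```

With that in place, the CMS (or an operator) could evict a single object on unpublish with, e.g., `curl -X PURGE https://example.gov.au/sites/default/files/report.pdf`. Evicting the same URL from Akamai would be a separate call to Akamai's purge API.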

@gollyg
Contributor

gollyg commented Jan 19, 2016

I think that the real issue seems to be the ability to purge a file from all levels of cache on demand (or on file update). I don't think that reducing the length of time that files are in cache is necessarily going to provide a solution, and it may be detrimental to the platform as a whole.

At the moment there are workarounds available, but the real solution seems to be providing that purge ability to the site editor.

@aleayr
Contributor

aleayr commented Jan 22, 2016

I agree with @WebProject2015 and @gollyg: a way to purge from the cache, putting that power back into the hands of each site owner, would be ideal.

Either an automated process where a purge request is sent when content is updated (followed by reindexing), or at least allowing content editors to enter an asset URL to purge, would be a good way forward.

Some options to think about:
https://www.drupal.org/project/akamai
https://www.drupal.org/project/acquia_purge
https://www.drupal.org/project/expire

@paulkilla, @invisigoth: Thoughts?

@fiasco And potentially, if we decide to do this: since these modules are specific to the govCMS SaaS environment, should they be left out of the distribution itself and moved into the Acquia build/deploy process for the SaaS environment?

If PaaS users (or those using the distro on other hosting platforms) wanted to add them in, no worries, but it would reduce the bloat of the distro if they weren't the default.

@fiasco
Contributor

fiasco commented Jan 24, 2016

I think there is value in illustrating the shortcomings of a govCMS implementation not on the hosted platform. However, we shouldn't make it hard for non-hosted users to use govCMS.

So in places where we integrate govCMS with the platform (Acquia Search, Acquia Connector, Acquia SF and Akamai), we should wrap these into features that can be turned on or off: off by default for the open source distribution and on by default for use on the hosted platform.

So in non-hosted use of govCMS, the features are available but off. This shows a capability they cannot utilise, but also doesn't make using govCMS outside the hosted platform harder.

@fiasco fiasco closed this as completed Mar 7, 2016