Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Don't update Last-Modified when cache-extending #1085

Closed
stevemayhew opened this issue May 26, 2015 · 7 comments
Closed

Don't update Last-Modified when cache-extending #1085

stevemayhew opened this issue May 26, 2015 · 7 comments

Comments

@stevemayhew
Copy link

When fetching a local resource, pagespeed is adding a Last-Modifed header. This breaks extend-cache function completely

[ec2-user@ip-172-31-6-55 ~]$ curl -s -D - http://localhost/assets/tve/common/js/utils/i18n_manager.js  -o /dev/null 
HTTP/1.1 200 OK
Date: Tue, 26 May 2015 19:45:00 GMT
Server: Apache
Last-Modified: Thu, 14 May 2015 12:01:20 GMT
Accept-Ranges: bytes
Content-Length: 3404
Works: abc
Cache-control: public, max-age=600
Content-Type: text/javascript

[ec2-user@ip-172-31-6-55 ~]$ curl -s -D - http://localhost/assets/tve/common/js/utils/i18n_manager.js  -o /dev/null 
HTTP/1.1 200 OK
Date: Tue, 26 May 2015 19:45:04 GMT
Server: Apache
Content-Length: 3404
Accept-Ranges: bytes
Works: abc
Cache-control: public, max-age=600
Etag: W/"PSA-D7c4hwVIR0"
Last-Modified: Tue, 26 May 2015 19:45:04 GMT
X-Content-Type-Options: nosniff
Content-Type: text/javascript

[ec2-user@ip-172-31-6-55 ~]$ curl -s -D - http://localhost/assets/tve/common/js/utils/i18n_manager.js  -o /dev/null 
HTTP/1.1 200 OK
Date: Tue, 26 May 2015 19:45:09 GMT
Server: Apache
Content-Length: 3404
Accept-Ranges: bytes
Works: abc
Cache-control: public, max-age=600
Etag: W/"PSA-D7c4hwVIR0"
Last-Modified: Tue, 26 May 2015 19:45:09 GMT
X-Content-Type-Options: nosniff
Content-Type: text/javascript

As you can see from the last 2 curl's that pagespeed added a Last-Modified header with date/time as now (it keeps changing)

This faithfully is forwarded along to the service of the hashed file

[ec2-user@ip-172-31-6-55 ~]$ curl -s -D - http://localhost/assets/tve/common/js/utils/i18n_manager.js.pagespeed.ce.D7c4hwVIR0.js  -o /dev/null 
HTTP/1.1 200 OK
Date: Tue, 26 May 2015 19:47:06 GMT
Server: Apache
Accept-Ranges: bytes
Works: abc
Expires: Wed, 25 May 2016 19:47:06 GMT
Cache-Control: max-age=31536000
Etag: W/"0"
Last-Modified: Tue, 26 May 2015 19:47:06 GMT
Content-Length: 3404
Vary: Accept-Encoding
Content-Type: text/javascript

[ec2-user@ip-172-31-6-55 ~]$ curl -s -D - http://localhost/assets/tve/common/js/utils/i18n_manager.js.pagespeed.ce.D7c4hwVIR0.js  -o /dev/null 
HTTP/1.1 200 OK
Date: Tue, 26 May 2015 19:47:08 GMT
Server: Apache
Accept-Ranges: bytes
Works: abc
Expires: Wed, 25 May 2016 19:47:08 GMT
Cache-Control: max-age=31536000
Etag: W/"0"
Last-Modified: Tue, 26 May 2015 19:47:08 GMT
Content-Length: 3404
Vary: Accept-Encoding
Content-Type: text/javascript

Thus rendering the Expires useless.

Looks like it is re-writing the cache file each time a request is made to the hashed URL:

[ec2-user@ip-172-31-6-55 ~]$ ls -l /var/cache/mod_pagespeed/rname/ce_FKCogvqqf3LdXAitqK4V/http,3A/,2Flocalhost/assets/tve/common/js/utils/i18n_manager.js,40,40_,
-rw------- 1 webapp webapp 123 May 26 19:51 /var/cache/mod_pagespeed/rname/ce_FKCogvqqf3LdXAitqK4V/http,3A/,2Flocalhost/assets/tve/common/js/utils/i18n_manager.js,40,40_,

[ec2-user@ip-172-31-6-55 ~]$ ls -l /var/cache/mod_pagespeed/rname/ce_FKCogvqqf3LdXAitqK4V/http,3A/,2Flocalhost/assets/tve/common/js/utils/i18n_manager.js,40,40_,
-rw------- 1 webapp webapp 123 May 26 19:52 /var/cache/mod_pagespeed/rname/ce_FKCogvqqf3LdXAitqK4V/http,3A/,2Flocalhost/assets/tve/common/js/utils/i18n_manager.js,40,40_,

It would be best of PageSpeed completely ignored Last-Modified and relied on the shasum of the file for a check period.

@eldk
Copy link
Contributor

eldk commented Jun 2, 2015

Hello,

I've check this for one JS that push a lot of "waiting ..." #1048 , it's the same : most of call to that js file give the error messages + Last-Modified is modified with the last call time.

This file is not memcached, size > 1024 ko, but in file cache, and is LoadFromFile with IPRO on.

Thanks,

Eric

@jmarantz
Copy link
Contributor

jmarantz commented Jun 2, 2015

you mean 1024 kb, (1M?) That is a large JS file.

What is the exact configuration of your file-cache? Is it on a local disk?
What is the exact configuration of your LoadFromFile directive? Is it on a
local disk?

-Josh

On Tue, Jun 2, 2015 at 8:21 AM, eldk notifications@github.com wrote:

Hello,

I've check this for one JS that push a lot of "waiting ..." #1048
#1048 , it's the same
: most of call to that js file give the error messages + Last-Modified is
modified with the last call time.

This file is not memcached, size > 1024 ko, but in file cache, and is
LoadFromFile with IPRO on.

Thanks,

Eric


Reply to this email directly or view it on GitHub
#1085 (comment)
.

@eldk
Copy link
Contributor

eldk commented Jun 2, 2015

Hello,

you mean 1024 kb, (1M?) That is a large JS file.

Yes, 1 Mbytes. I Know, it's a script based on openlayers : http://api.ign.fr/tech-docs-js/2.1.2/jsdoc/index.html .

What is the exact configuration of your file-cache? Is it on a local disk?

Yes, on local disk, files up to 1024 Mbytes to file cache, others to memcached.

What is the exact configuration of your LoadFromFile directive? Is it on a local disk?

Yes, it's on local disk, only some folders : images which never changed (thumbnails, nav ...), and the folder which is dedicated to "openlayers" that contain script (one), styles, and images.

The concern domain is now set apart on its own server.

Thanks,

Eric

HTTP/1.1 200 OK
Date: Tue, 02 Jun 2015 15:59:38 GMT
Server: Apache/2.2.22
Content-Length: 2500853
Etag: W/"PSA-aj-ukSWKBM8LR"
Expires: Sun, 29 Nov 2015 15:15:45 GMT
Cache-Control: max-age=15549367
Last-Modified: Tue, 02 Jun 2015 15:59:38 GMT
X-Content-Type-Options: nosniff
Vary: Accept-Encoding
Content-Type: application/javascript

[Tue Jun 02 17:59:43 2015] [warn] [mod_pagespeed 1.9.32.3-4448 @7464] Waiting for completion of URL http://domain/folder/GeoportalExtended.js for 5 sec

2501116 juin 2 15:01 GeoportalExtended.js.pagespeed.jm.ukSWKBM8LRJEFBEfqhgO.js,

Metadata cache key:rname/aj_-f4KQ7tMTbqJuUqi_x6E/http://domain/folder/GeoportalExtended.js@@_
cache_ok:true
can_revalidate:false
partitions:partition {
optimizable: true
index: 0
type: FILE_BASED
last_modified_time_ms: 1432475299000
}
}

@stevemayhew
Copy link
Author

Hmm.. How are these comments related to the issue I am seeing? We are instructing pagespeed Serf to fetch using the localhost. Here's the salient parts of our config (with default pagespeed.conf, this comes after):

NOTE MapPagespeedOriginFromFile is not defined.

<IfModule pagespeed_module>
    # Needed because our SLB for the web terminates SSL and we are on a non-SSL port.
    # see this: https://developers.google.com/speed/pagespeed/module/https_support
    #    
    ModPagespeedRespectXForwardedProto on

    # For SSL localhost server, we need to have a file fetch path for pagespeed to use as the origin
    # server, for SSL original request we need a non-SSL localhost access path.  See the description
    # in here: https://developers.google.com/speed/pagespeed/module/domains
    #
    <IfDefine MapPagespeedOriginFromFile>
        ModPagespeedLoadFromFileMatch "^https?://.*?/assets/(.*)" "${sandbox}/assets/\\1"
    </IfDefine>
    <IfDefine !MapPagespeedOriginFromFile>
        ModPagespeedMapOriginDomain http://localhost https://*
    </IfDefine>

    # The LoadFromFile match causes page speed to avoid http fetches from origin webserver,
    # The header mods below set The origin caching policy to 10 minutes.  If we use MapOriginDomain
    # and a CDN then this dictates how frequently mod_pagespeed must re-read the content files and recompute
    # the content-hash.  As long as the content doesn't actually change,
    # the content-hash will remain the same, and the resources stored
    # in browser caches will stay relevant.
    <Location /assets/tve >
        Header unset Etag
        Header set Cache-control "public, max-age=600"
    </Location>

@stevemayhew
Copy link
Author

Here's the bug....

Looking at a pcap analysis of the packets for a pagespeed fetch (numbers are the TCP frame number in the trace)

INITIAL REQUEST - 1846
GET /assets/tve/common/js/utils/i18n_manager.js.pagespeed.ce.D7c4hwVIR0.js HTTP/1.1
Host: 52.8.56.250
User-Agent: curl/7.42.1
Accept: */*

PAGESPEED ISSUES REQUEST - 1851 
GET /assets/tve/common/js/utils/i18n_manager.js HTTP/1.1
Host: 52.8.56.250
If-Modified-Since: Tue, 02 Jun 2015 15:53:22 GMT
PageSpeed: off
User-Agent: Serf/1.1.0 mod_pagespeed/1.9.32.3-4448

PAGESPEED GETS RESPONSE - 1853
HTTP/1.1 304 Not Modified
Date: Thu, 04 Jun 2015 20:56:00 GMT
Server: Apache
Cache-control: public, max-age=600

PAGESPED GENERATES RESPONSE - 1858
HTTP/1.1 200 OK
Date: Thu, 04 Jun 2015 20:56:00 GMT
Server: Apache
Accept-Ranges: bytes
Expires: Fri, 03 Jun 2016 20:56:00 GMT
Cache-Control: max-age=31536000
Etag: W/"0"
Last-Modified: Thu, 04 Jun 2015 20:56:00 GMT
Content-Length: 3404
Vary: Accept-Encoding
Content-Type: text/javascript

Because the 304 response does not have LastModified, this code in pagespeed, ads the "now" Last-Modified header:
https://github.com/pagespeed/mod_pagespeed/blob/baf75289f2da765257c4c5dff34f7fe8466aa74b/net/instaweb/rewriter/server_context.cc

If no such header exists in the request. It should check the 304

@jmarantz
Copy link
Contributor

jmarantz commented Aug 5, 2015

@elkd I think your comment makes a lot more sense in the context of Issue 1048. Please follow up there (to the extent you haven't already).

@stevemayhew I am trying to repro your problem but no success yet. I've managed to repro the problem:

wget -q -O - --save-headers localhost:8080/mod_pagespeed_test/load_from_file/index.html?PageSpeedFilters=extend_cache
...
wget -q -S localhost:8080/mod_pagespeed_test/load_from_file/web_dir/A.example.ssp.css.pagespeed.cf.mFYddG9PMG.css

each time I repeat the second wget I get a new Last-Modified header. The problem is specific to extend_cache, because that filter doesn't bother cache the rewritten file (it is usually identical or trivially transformed from the origin file).

For other filters, the rewritten file and its computed Last-Modified header is saved, so the Last-Modified reflects the timestamp when PageSpeed optimized it, which seems correct to me :) But extend_cache it re-"optimizes" it on every request, which does not really give us what we want.

This is something we should fix, but I'm trying to understand what the symptom is that this exhibits. The filename is already signed and cached for a year, so we don't really need if-modified-since behavior or anything like that.

@jmarantz jmarantz changed the title Pagespeed adding Last-Modified header with "now" extend_cache filter adding Last-Modified header with "now" for Aug 5, 2015
@jmarantz jmarantz self-assigned this Aug 5, 2015
@jmarantz
Copy link
Contributor

jmarantz commented Aug 6, 2015

Sorry; garbled message above; I did get the repro case. I have a candidate fix which needs to be reviewed.

Also in the comments above there is an observation that we rewrite an entry in the cache:

[ec2-user@ip-172-31-6-55 ~]$ ls -l /var/cache/mod_pagespeed/rname/ce_FKCogvqqf3LdXAitqK4V/http,3A/,2Flocalhost/assets/tve/common/js/utils/i18n_manager.js,40,40_,
-rw------- 1 webapp webapp 123 May 26 19:51 /var/cache/mod_pagespeed/rname/ce_FKCogvqqf3LdXAitqK4V/http,3A/,2Flocalhost/assets/tve/common/js/utils/i18n_manager.js,40,40_,

I think this is an inefficiency in our system, but it's separate from the Last-Modified problem. I'll split that off into a separate bug: #1122

@jeffkaufman jeffkaufman changed the title extend_cache filter adding Last-Modified header with "now" for Don't update Last-Modified when cache-extending Nov 13, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants