Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

IPRO with LoadFromFile gives null responses for unknown file extensions #719

Closed
nikolay opened this issue Jun 2, 2014 · 29 comments
Closed

Comments

@nikolay
Copy link
Contributor

nikolay commented Jun 2, 2014

When I turned the feature on, all my web fonts broke having zero length. In messages, I see PageSpeed trying to optimize fonts in-place.

@jeffkaufman
Copy link
Contributor

Do you have an example site I could test this on? Either one running PageSpeed with IPRO enabled or just one where I could copy the html down and test on my own machine.

@nikolay
Copy link
Contributor Author

nikolay commented Jun 3, 2014

@jeffkaufman I will try to create an isolated environment for you - having my config file and WordPress set up in there. In this particular case, it seems that when using LoadFromFile, no content type is returned, and this confuses PageSpeed, I guess. This is from the messages:

[Mon, 02 Jun 2014 17:39:01 GMT] [Info] [140062] Trying to serve rewritten resource in-place: http://www.example.com/content/themes/example/fonts/franklin-gothic/example.woff
[Mon, 02 Jun 2014 17:39:01 GMT] [Error] [140062] Loaded resource with no Content-Type http://www.example.com/content/themes/example/fonts/franklin-gothic/example.woff

@nikolay
Copy link
Contributor Author

nikolay commented Jun 3, 2014

@jeffkaufman Would you prefer a Vagrant or a Docker image?

@jeffkaufman
Copy link
Contributor

Vagrant would probably be easier.

@jeffkaufman
Copy link
Contributor

Before you send me an, actually, let me play around with the combination of IPRO and LoadFromFile. There might be something weird there when a resource has an extension we don't recognize, because in the LoadFromFile flow PageSpeed is responsible for assigning content types.

@jeffkaufman
Copy link
Contributor

I've reproduced this locally; working on it.

@jeffkaufman jeffkaufman changed the title InPlaceResourceOptimization breaking web fonts IPRO with LoadFromFile gives null responses for unknown file extensions Jun 3, 2014
@jeffkaufman
Copy link
Contributor

To reproduce:

test/pagespeed_test.conf.template:

+   server {
+     listen @@SECONDARY_PORT@@;
+    server_name lff-ipro.example.com;
+    pagespeed FileCachePath "@@FILE_CACHE@@";
+
+    pagespeed LoadFromFile
+      "http://lff-ipro.example.com/mod_pagespeed_example/lff_ipro"
+      "@@SERVER_ROOT@@/mod_pagespeed_example/lff_ipro";
+  }

$ echo "this isn't a woff file" > /apache/install/mod_pagespeed_example/lff_ipro/fake.woff

$  http_proxy=localhost:8051 curl -D- -s lff-ipro.example.com/mod_pagespeed_example/lff_ipro/fake.woff
HTTP/1.1 200 OK
Server: nginx/1.7.2
Transfer-Encoding: chunked
Connection: keep-alive
Date: Tue, 03 Jun 2014 21:22:52 GMT
Expires: Tue, 03 Jun 2014 21:27:52 GMT
Cache-Control: max-age=300
Last-Modified: Tue, 03 Jun 2014 21:16:30 GMT

No content is served.

The error log gets:

2014/06/03 17:23:28 [info] 22185#0: [ngx_pagespeed 1.7.0.0-4018] Trying to serve rewritten resource in-place: http://lff-ipro.example.com/mod_pagespeed_example/lff_ipro/fake.woff
2014/06/03 17:23:28 [error] 22185#0: [ngx_pagespeed 1.7.0.0-4018] Loaded resource with no Content-Type http://lff-ipro.example.com/mod_pagespeed_example/lff_ipro/fake.woff
2014/06/03 17:23:28 [info] 22185#0: [ngx_pagespeed 1.7.0.0-4018] [0603/172327:INFO:google_message_handler.cc(35)] Serving rewritten resource in-place: http://lff-ipro.example.com/mod_pagespeed_example/lff_ipro/fake.woff

@jeffkaufman
Copy link
Contributor

Looking at the NgxBaseFetch, I see a call to NgxBaseFetch::HandleHeadersComplete followed by a call to NgxBaseFetch::HandleDone without any calls to NgxBaseFetch::HandleWrite.

@oschaaf
Copy link
Member

oschaaf commented Jun 4, 2014

I also see this in error.log when reproducing:

2014/06/03 20:37:05 [debug] 24839#0: [ngx_pagespeed 1.7.0.0-4018] [0603/203705:VERBOSE2:in_place_rewrite_context.cc(235)] CanInPlaceRewrite false. Content-Type is not defined. Url: http://lffx.example.com/mod_pagespeed_example/test.woff

That makes me think that with this https://code.google.com/p/modpagespeed/source/browse/trunk/src/net/instaweb/rewriter/in_place_rewrite_context.cc#114 - which states in comments:

      // If we are the origin, we do not have to pass through bytes
      // if we aren't rewriting --- the caller is expected to fall back to
      // the server's native method if FetchInPlaceResource fails.

A backtrace of when NgxBaseFetch::HandleHeadersDone() is called confirms this:

#0  net_instaweb::NgxBaseFetch::HandleHeadersComplete (this=0x2ddd810) at /home/oschaaf/Code/google/ngxps-ttpr/ngx_pagespeed/src/ngx_base_fetch.cc:140
#1  0x00000000005d4dc3 in net_instaweb::AsyncFetch::HeadersComplete (this=0x2ddd810) at net/instaweb/http/async_fetch.cc:103
#2  0x00000000005d4e88 in net_instaweb::AsyncFetch::Done (this=0x2ddd810, success=false) at net/instaweb/http/async_fetch.cc:116
#3  0x00000000004b9cbd in net_instaweb::SharedAsyncFetch::HandleDone (this=0x7fcc6c000bb0, success=false)
    at ./net/instaweb/http/public/async_fetch.h:278
#4  0x00000000009208b6 in net_instaweb::RecordingFetch::HandleHeadersComplete (this=0x7fcc6c000bb0)
    at net/instaweb/rewriter/in_place_rewrite_context.cc:133
#5  0x00000000005d4dc3 in net_instaweb::AsyncFetch::HeadersComplete (this=0x7fcc6c000bb0) at net/instaweb/http/async_fetch.cc:103
#6  0x00000000005d4bb8 in net_instaweb::AsyncFetch::Write (this=0x7fcc6c000bb0, sp=..., handler=0x2dd2bb0) at net/instaweb/http/async_fetch.cc:78
#7  0x00000000009231fa in net_instaweb::(anonymous namespace)::NonHttpResourceCallback::Done (this=0x7fcc6c0008c0, lock_failure=false, 
    resource_ok=true) at net/instaweb/rewriter/in_place_rewrite_context.cc:559
#8  0x000000000090b7b2 in net_instaweb::FileInputResource::LoadAndCallback (this=0x2de3b90, 
    not_cacheable_policy=net_instaweb::Resource::kLoadEvenIfNotCacheable, request_context=..., callback=0x7fcc6c0008c0)
    at net/instaweb/rewriter/file_input_resource.cc:150
#9  0x00000000005177b6 in net_instaweb::Resource::LoadAsync (this=0x2de3b90, not_cacheable_policy=net_instaweb::Resource::kLoadEvenIfNotCacheable, 
    request_context=..., callback=0x7fcc6c0008c0) at net/instaweb/rewriter/resource.cc:102
#10 0x00000000009236d3 in net_instaweb::InPlaceRewriteContext::StartFetchReconstruction (this=0x2de3840)
    at net/instaweb/rewriter/in_place_rewrite_context.cc:616
#11 0x000000000093fa4d in net_instaweb::RewriteContext::FetchCacheDone (this=0x2de3840, cache_result=0x7fcc6c000920)
    at net/instaweb/rewriter/rewrite_context.cc:2889
#12 0x000000000094ae75 in net_instaweb::MemberFunction1<net_instaweb::RewriteContext, net_instaweb::RewriteContext::CacheLookupResult*>::Run (
    this=0x7fcc6c000b60) at ./pagespeed/kernel/base/function.h:171
#13 0x000000000061bb9a in net_instaweb::Function::CallRun (this=0x7fcc6c000b60) at pagespeed/kernel/base/function.cc:51
#14 0x00000000006ce495 in net_instaweb::QueuedWorkerPool::Run (this=0x2cf8540, sequence=0x2de0710, worker=0x2de3af0)
    at pagespeed/kernel/thread/queued_worker_pool.cc:157

@oschaaf
Copy link
Member

oschaaf commented Jun 4, 2014

@jeffkaufman
Looking at RecordingFetch::HandleHeadersComplete():
This looks wrong?:

...
      streaming_ = false;
      set_request_headers(NULL);
      set_response_headers(NULL);
      set_extra_response_headers(NULL);
      response_headers()->set_status_code(HttpStatus::kNotFound);
...

response_headers() constructs a new ResponseHeaders object when none exists, so setting the statuscode is still possible after calling set_response_headers(NULL);

The response status code is set to HttpStatus::kNotFound after they are reset to NULL.
Would something like this work?:

...
      streaming_ = false;
      set_request_headers(NULL);
      response_headers()->set_status_code(HttpStatus::kNotFound);
      set_response_headers(NULL);
      set_extra_response_headers(NULL);
...

@jeffkaufman
Copy link
Contributor

@oschaaf That change almost works, but gives memory errors. I saw Cache-Control: ax-age=300 and got suspicious, and then testing under valgrind got:

==30262== Invalid read of size 1
==30262==    at 0x4C2D1A0: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30262==    by 0x441E8C: ngx_http_header_filter (string3.h:52)
==30262==    by 0x455B0E: ngx_http_chunked_header_filter (ngx_http_chunked_filter_module.c:96)
==30262==    by 0x4568AF: ngx_http_range_header_filter (ngx_http_range_filter_module.c:268)
==30262==    by 0x4807AE: net_instaweb::(anonymous namespace)::ps_etag_header_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2077)
==30262==    by 0x45C065: ngx_http_gzip_header_filter (ngx_http_gzip_filter_module.c:257)
==30262==    by 0x483FDA: net_instaweb::(anonymous namespace)::in_place::ps_in_place_check_header_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2259)
==30262==    by 0x482475: net_instaweb::(anonymous namespace)::fix_headers::ps_html_rewrite_fix_headers_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2731)
==30262==    by 0x4865CC: net_instaweb::(anonymous namespace)::html_rewrite::ps_html_rewrite_header_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2120)
==30262==    by 0x45CCA8: ngx_http_ssi_header_filter (ngx_http_ssi_filter_module.c:341)
==30262==    by 0x4621ED: ngx_http_charset_header_filter (ngx_http_charset_filter_module.c:236)
==30262==    by 0x462E09: ngx_http_userid_filter (ngx_http_userid_filter_module.c:209)
==30262==  Address 0xdb157c2 is 34 bytes inside a block of size 36 free'd
==30262==    at 0x4C2A4BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30262==    by 0x5D4563: net_instaweb::NameValue::SharedDtor() (basic_string.h:244)
==30262==    by 0x5D45E6: net_instaweb::NameValue::~NameValue() (http.pb.cc:88)
==30262==    by 0x5D4618: net_instaweb::NameValue::~NameValue() (http.pb.cc:89)
==30262==    by 0x5D55A6: google::protobuf::RepeatedPtrField<net_instaweb::NameValue>::~RepeatedPtrField() (repeated_field.h:336)
==30262==    by 0x5D595F: net_instaweb::HttpResponseHeaders::~HttpResponseHeaders() (http.pb.cc:325)
==30262==    by 0x5D59A8: net_instaweb::HttpResponseHeaders::~HttpResponseHeaders() (http.pb.cc:327)
==30262==    by 0x5B5707: net_instaweb::Headers<net_instaweb::HttpResponseHeaders>::~Headers() (scoped_ptr.h:137)
==30262==    by 0x5BDE98: net_instaweb::ResponseHeaders::~ResponseHeaders() (response_headers.cc:66)
==30262==    by 0x558046: net_instaweb::AsyncFetch::~AsyncFetch() (async_fetch.cc:61)
==30262==    by 0x47D24D: net_instaweb::NgxBaseFetch::~NgxBaseFetch() (ngx_base_fetch.cc:47)
==30262==    by 0x47D25C: net_instaweb::NgxBaseFetch::~NgxBaseFetch() (ngx_base_fetch.cc:49)
==30262==
==30262== Invalid read of size 1
==30262==    at 0x4C2D1AD: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30262==    by 0x441E8C: ngx_http_header_filter (string3.h:52)
==30262==    by 0x455B0E: ngx_http_chunked_header_filter (ngx_http_chunked_filter_module.c:96)
==30262==    by 0x4568AF: ngx_http_range_header_filter (ngx_http_range_filter_module.c:268)
==30262==    by 0x4807AE: net_instaweb::(anonymous namespace)::ps_etag_header_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2077)
==30262==    by 0x45C065: ngx_http_gzip_header_filter (ngx_http_gzip_filter_module.c:257)
==30262==    by 0x483FDA: net_instaweb::(anonymous namespace)::in_place::ps_in_place_check_header_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2259)
==30262==    by 0x482475: net_instaweb::(anonymous namespace)::fix_headers::ps_html_rewrite_fix_headers_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2731)
==30262==    by 0x4865CC: net_instaweb::(anonymous namespace)::html_rewrite::ps_html_rewrite_header_filter(ngx_http_request_s*) (ngx_pagespeed.cc:2120)
==30262==    by 0x45CCA8: ngx_http_ssi_header_filter (ngx_http_ssi_filter_module.c:341)
==30262==    by 0x4621ED: ngx_http_charset_header_filter (ngx_http_charset_filter_module.c:236)
==30262==    by 0x462E09: ngx_http_userid_filter (ngx_http_userid_filter_module.c:209)
==30262==  Address 0xdb157c0 is 32 bytes inside a block of size 36 free'd
==30262==    at 0x4C2A4BC: operator delete(void*) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==30262==    by 0x5D4563: net_instaweb::NameValue::SharedDtor() (basic_string.h:244)
==30262==    by 0x5D45E6: net_instaweb::NameValue::~NameValue() (http.pb.cc:88)
==30262==    by 0x5D4618: net_instaweb::NameValue::~NameValue() (http.pb.cc:89)
==30262==    by 0x5D55A6: google::protobuf::RepeatedPtrField<net_instaweb::NameValue>::~RepeatedPtrField() (repeated_field.h:336)
==30262==    by 0x5D595F: net_instaweb::HttpResponseHeaders::~HttpResponseHeaders() (http.pb.cc:325)
==30262==    by 0x5D59A8: net_instaweb::HttpResponseHeaders::~HttpResponseHeaders() (http.pb.cc:327)
==30262==    by 0x5B5707: net_instaweb::Headers<net_instaweb::HttpResponseHeaders>::~Headers() (scoped_ptr.h:137)
==30262==    by 0x5BDE98: net_instaweb::ResponseHeaders::~ResponseHeaders() (response_headers.cc:66)
==30262==    by 0x558046: net_instaweb::AsyncFetch::~AsyncFetch() (async_fetch.cc:61)
==30262==    by 0x47D24D: net_instaweb::NgxBaseFetch::~NgxBaseFetch() (ngx_base_fetch.cc:47)
==30262==    by 0x47D25C: net_instaweb::NgxBaseFetch::~NgxBaseFetch() (ngx_base_fetch.cc:49)

@jeffkaufman
Copy link
Contributor

With @oschaaf's change ps_in_place_check_header_filter now gets an r->headers_out.status of 404 when before it got a 200. So it decides it can't serve IPRO in place, gives up on handling the content, and tries to fall back to the normal body filter flow.

@jeffkaufman
Copy link
Contributor

Looking at the corrupted Cache-Control header, it's .ax-age=300 where . is the null byte (00).

@jeffkaufman
Copy link
Contributor

This sounds like it might be a conflict between null terminated strings and length-specified ones?

@jeffkaufman
Copy link
Contributor

In copy_response_headers_to_ngx we don't actually make copies of the strings in the ResponseHeaders. This looks wrong. I think perhaps the ResponseHeaders is being freed when the BaseFetch is deleted but the headers haven't made it through the whole system yet?

@jeffkaufman
Copy link
Contributor

Yeah, I think there's a lifetime issue. I see (selected log entries):

Loaded resource with no Content-Type
CanInPlaceRewrite false. Content-Type is not defined
NgxBaseFetch::HandleHeadersComplete
NgxBaseFetch::HandleDone
ps fetch handler for fake.woff
copy_response_headers_to_ngx // pointers into pagespeed headers copied into nginx headers
ps_in_place_check_header_filter
Could not rewrite resource in-place
NgxBaseFetch::~NgxBaseFetch // pagespeed headers freed
ps_html_rewrite_header_filter
ps_html_rewrite_fix_headers_filter
ps_in_place_check_header_filter
ngx_http_header_filter // nginx headers, containing pointers into freed pagespeed headers, read

@jeffkaufman
Copy link
Contributor

A simple fix would be to make copy_response_headers_to_ngx actually copy the strings instead of just setting up ngx_str_ts to point into the existing ones. But do we have to do that? Maybe something is just slightly misconfigured?

Normally we delete the base fetch after everything else is complete, with ps_release_request_context calling ps_release_base_fetch. In this case we're calling ps_release_base_fetch from ps_decline_request, which means we're freeing the headers before sending them out to nginx.

But why do we need to release the base fetch when we decline the request? Can't we just wait and let it be released when nginx cleans us up with ps_release_request_context?

@jeffkaufman
Copy link
Contributor

Ok: I think this fixes it:

  1. @oschaaf's change to reorder response_headers()->set_status_code and set_response_headers(NULL) in RecordingFetch::HandleHeadersComplete
  2. Remove the call to ps_release_base_fetch from ps_decline_request.

I'll make a pull request after running full tests.

@jeffkaufman
Copy link
Contributor

It looks like the reordering change, (1) above, breaks TEST: In-place resource optimization. The size of test_image_dont_reuse.png stays 15131 and doesn't drop below 13000.

@jeffkaufman
Copy link
Contributor

Here's another potential solution: when deciding whether to load something from file, considering LoadFromFileMatch etc, only load things from file if they have an extension in our table of recognized content types.

@nikolay
Copy link
Contributor Author

nikolay commented Jun 4, 2014

This would be nice as it will save me the whitelisting I have in place.
On Jun 4, 2014 11:25 AM, "Jeff Kaufman" notifications@github.com wrote:

Here's another potential solution: when deciding whether to load something
from file, considering LoadFromFileMatch etc, only load things from file
if they have an extension in our table of recognized content types.


Reply to this email directly or view it on GitHub
#719 (comment)
.

@jeffkaufman
Copy link
Contributor

Right; limiting to known file extensions is probably something we should do whether or not we also fix other things in response to this bug.

@oschaaf
Copy link
Member

oschaaf commented Jun 4, 2014

@jeffkaufman "It looks like the reordering change, (1) above, breaks TEST: In-place resource optimization"
Just for my understanding - does changing the order fix an actual problem? Or did I misinterpret the code?

@oschaaf
Copy link
Member

oschaaf commented Jun 4, 2014

@jeffkaufman "Right; limiting to known file extensions is probably something we should do whether or not we also fix other things in response to this bug."
Sounds good to me as well. A thought occurred to me - what happens when we fetch a http response without a content-type? Might be good to (double) check?

@jeffkaufman
Copy link
Contributor

@oschaaf The existing IPRO test passes with the existing code and fails with your change. The new test fails with the existing code and passes with your change.

I haven't gotten into the details of why your change breaks the existing IPRO test; it looks reasonable to me. I'm now going off in a different direction looking at content types, though, if you're interested at pushing your change further.

@oschaaf
Copy link
Member

oschaaf commented Jun 5, 2014

@jeffkaufman I looked at the change I proposed some more: #721 passes all tests

@jeffkaufman
Copy link
Contributor

This bug applies to mod_pagespeed as well:

$ telnet localhost 8083
GET /mod_pagespeed_example/lff_ipro/fake.woff HTTP/1.1
Host: lff-ipro.example.com

HTTP/1.1 403 Forbidden
Date: Thu, 05 Jun 2014 14:07:56 GMT
Server: Apache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/1.0.1 DAV/2 PHP/5.3.10-1ubuntu3.11 with Suhosin-Patch
Expires: Thu, 05 Jun 2014 14:13:05 GMT
Last-Modified: Tue, 03 Jun 2014 21:16:30 GMT
X-Content-Type-Options: nosniff
Cache-Control: max-age=0, no-cache
ETag: "d20df5-1e-4faf5068aed97"
Accept-Ranges: bytes
Content-Length: 80
Content-Type: text/html

Missing Content-Type required for proxied resourceThis isn't really a woff file.

It's kind of unsettling that Missing Content-Type required for proxied resource would be sent out to the body of the html...

Either #721 or limiting LoadFromFile to known extensions is sufficient to fix both nginx and apache.

@jeffkaufman
Copy link
Contributor

It's kind of unsettling that "Missing Content-Type required for proxied resource" would be sent out to the body of the html...

Actually, it looks like that's supposed to happen. The intent of the code that wrote it is to serve a 403 and give the viewer that note about the missing content type. What's going wrong here is without #721 ApacheFetch::HandleHeadersComplete is seeing the response header status as "ok" and doing its content type check. So this is a symptom of a bug #721 fixes, and not an additional problem.

@jeffkaufman
Copy link
Contributor

Fixed in trunk with r4067.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants