Skip to content
This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Improved content_handler #189

Merged
merged 1 commit into from
Apr 3, 2013
Merged

Conversation

chaizhenhua
Copy link
Contributor

ps_content_handler should be called before other content handler.

@jeffkaufman
Copy link
Contributor

What sort of problem were you having with pagespeed? Is this an attempt to fix the problem described in #102?

@chaizhenhua
Copy link
Contributor Author

Is the priority of pagespeed higher then others? If so it is yes.

@jeffkaufman
Copy link
Contributor

@yaoweibin any chance you could look at this?

@jeffkaufman
Copy link
Contributor

This looks very promising, I think it might fix #102, but I should test it more. Thanks for writing it!

Could you sign our CLA? https://developers.google.com/open-source/cla/individual

@chaizhenhua
Copy link
Contributor Author

now the calling sequence is try_files > ps_content_handler > others,
i think you desired is ps_content_handler > try_files > others,
and i will add a new patch to change the priority later, so you can test later.

@yaoweibin
Copy link
Contributor

I had a look at the pull request, It saves the original first content_length handler and replaces it with pagespeed handler. It's the similar solution as Piotr Sikora did.

@chaizhenhua can you look at the Piotr Sikora's patch and test your pull request. The test method can be found in the README of this project.

Thanks.

@chaizhenhua
Copy link
Contributor Author

@yaoweibin where is Piotr Sikora's patch?

@yaoweibin
Copy link
Contributor

@chaizhenhua
Copy link
Contributor Author

the patch(FRiCKLE/ngx_cache_purge@822a28c) looks similar, i think there is following solutions to #102:

  1. in merge_conf phase set clcf->handler: many module can change clcf->handler in this phase, so which handler does clcf->handler point to is undefined.
  2. in postconfigure phase set clcf->handler: few module change clcf->handler in this phase.
  3. in preaccess phase, change r->content_handler and set unprocessed flag pass, then in content_handler check unprocessed and do the right thing. in this phase we can grantee ps handler has highest priority, but we need create pagespeed context in this phase.

ps_create_request_context directly process the request so 3. is abandoned.

I will test it.

@chaizhenhua
Copy link
Contributor Author

hello,
I test it use the commit 46055b4, there is a coredump if add two http block to nginx.conf.

#0  0x0000000000554963 in net_instaweb::RewriteDriver::SetResourceManager(net_instaweb::ServerContext*) ()
#1  0x00000000005937bd in net_instaweb::ServerContext::NewUnmanagedRewriteDriver(net_instaweb::RewriteDriverPool*, net_instaweb::RewriteOptions*, net_instaweb::RefCountedPtr<net_instaweb::RequestContext> const&) ()
#2  0x000000000059475f in net_instaweb::ServerContext::NewRewriteDriverFromPool(net_instaweb::RewriteDriverPool*, net_instaweb::RefCountedPtr<net_instaweb::RequestContext> const&) ()
#3  0x00000000004d520c in ngx_psol::(anonymous namespace)::ps_create_request_context (r=0x27db1d0, is_resource_fetch=false)
    at ../ngx_pagespeed/src/ngx_pagespeed.cc:1286
#4  0x00000000004d3da7 in ngx_psol::(anonymous namespace)::ps_header_filter (r=0x27db1d0) at ../ngx_pagespeed/src/ngx_pagespeed.cc:1500
#5  0x000000000048db2e in ngx_http_not_modified_header_filter (r=0x27db1d0) at src/http/modules/ngx_http_not_modified_filter_module.c:70
#6  0x0000000000453504 in ngx_http_send_header (r=0x27db1d0) at src/http/ngx_http_core_module.c:1897
#7  0x0000000000488a5d in ngx_http_static_handler (r=0x27db1d0) at src/http/modules/ngx_http_static_module.c:245
#8  0x0000000000452b5e in ngx_http_core_content_phase (r=0x27db1d0, ph=0x27cbc08) at src/http/ngx_http_core_module.c:1403
#9  0x000000000044ff14 in ngx_http_core_run_phases (r=0x27db1d0) at src/http/ngx_http_core_module.c:877
#10 0x000000000044fe7a in ngx_http_handler (r=0x27db1d0) at src/http/ngx_http_core_module.c:860
#11 0x00000000004635cc in ngx_http_process_request (r=0x27db1d0) at src/http/ngx_http_request.c:1687
#12 0x0000000000463ddf in ngx_http_process_request_headers (rev=0x27f5f00) at src/http/ngx_http_request.c:1135
#13 0x0000000000462a97 in ngx_http_process_request_line (rev=0x27f5f00) at src/http/ngx_http_request.c:933
#14 0x000000000045e708 in ngx_http_init_request (rev=0x27f5f00) at src/http/ngx_http_request.c:519
#15 0x00000000004492b4 in ngx_epoll_process_events (cycle=0x27a7a60, timer=60000, flags=1) at src/event/modules/ngx_epoll_module.c:683
#16 0x000000000043844f in ngx_process_events_and_timers (cycle=0x27a7a60) at src/event/ngx_event.c:247
#17 0x0000000000447f57 in ngx_worker_process_cycle (cycle=0x27a7a60, data=0x0) at src/os/unix/ngx_process_cycle.c:807
#18 0x00000000004433ce in ngx_spawn_process (cycle=0x27a7a60, proc=0x447d70 <ngx_worker_process_cycle>, data=0x0, name=0xa9eecc "worker process", respawn=0)
    at src/os/unix/ngx_process.c:198
#19 0x0000000000445d09 in ngx_reap_children (cycle=0x27a7a60) at src/os/unix/ngx_process_cycle.c:619
#20 0x00000000004450e7 in ngx_master_process_cycle (cycle=0x27a7a60) at src/os/unix/ngx_process_cycle.c:180
#21 0x000000000040bc7a in main (argc=1, argv=0x7fff5fc185d8) at src/core/nginx.c:412

@oschaaf
Copy link
Member

oschaaf commented Mar 20, 2013

@chaizhenhua: "I test it use the commit 46055b4, there is a coredump if add two http block to nginx.conf."
/cc @jeffkaufman

Confirmed, I can reproduce that, and am looking into it. To be honest, I was not aware that you could do that. Could you explain when is it practical to have two http{} blocks?

@oschaaf
Copy link
Member

oschaaf commented Mar 20, 2013

@jeffkaufman
/cc @chaizhenhua

With two http{} blocks, ps_create_main_conf gets to be called twice, which includes this code:

net_instaweb::NgxRewriteOptions::Initialize();                      
net_instaweb::NgxRewriteDriverFactory::Initialize();                
cfg_m->driver_factory = new net_instaweb::NgxRewriteDriverFactory();

The static initialisers seem straightforward to fix by only calling them once; but I'm not sure how we should handle having two main conf's. For example, ngx_http_conf_get_module_main_conf is used later on, and will only return a single main configuration, while it seems we have created two of them in this case. Any thoughts?

@chaizhenhua
Copy link
Contributor Author

I just want to test pagespeed and do not want to change my nginx.conf, so i just paste testing config in front of nginx.conf.

@chaizhenhua
Copy link
Contributor Author

@yaoweibin is following message means OK?

Image gets rewritten by default.
      Fetching http://localhost:8050/mod_pagespeed_example/styles/W.rewrite_css_images.css.pagespeed.cf.Hash.css --header='X-PSA-Blocking-Rewrite:psatest'  -o /tmp/mod_pagespeed_test.chaizhenhua/fetched_directory/wget_output.txt                                                   -O /tmp/mod_pagespeed_test.chaizhenhua/fetched_directory/fetch_until_output.1615 until $(fgrep -c BikeCrashIcn.png.pagespeed.ic) = 1
wget -q --header='X-PSA-Blocking-Rewrite:psatest'  -o /tmp/mod_pagespeed_test.chaizhenhua/fetched_directory/wget_output.txt                                                   -O /tmp/mod_pagespeed_test.chaizhenhua/fetched_directory/fetch_until_output.1615 http://localhost:8050/mod_pagespeed_example/styles/W.rewrite_css_images.css.pagespeed.cf.Hash.css and checking with fgrep -c BikeCrashIcn.png.pagespeed.ic
................................................................................................................................................................

@chaizhenhua
Copy link
Contributor Author

@yaoweibin

I tried to build mod_pagespeed from source as the README, but failed at gclient runhooks. i can build the latest version follow the page(https://developers.google.com/speed/docs/mod_pagespeed/build_from_source), but can't pass the tests even use 46055b4. i seen some apache config files in mod_pagespeed/src/install, should i add more directives to nginx.conf to run the test? or should I configure something else?

@yaoweibin
Copy link
Contributor

@chaizhenhua It may fail in some of tests. It's OK. You can see the note from README:

Each of these failed tests is a known issue:

compression is enabled for rewritten JS.
If you're running a version of nginx without etag support (pre-1.3.3) you won't see this issue, which is fine.
convert_meta_tags
insert_dns_prefetch

@@ -1919,11 +1985,11 @@ ngx_int_t ps_init(ngx_conf_t* cf) {
ngx_http_core_main_conf_t* cmcf = static_cast<ngx_http_core_main_conf_t*>(
ngx_http_conf_get_module_main_conf(cf, ngx_http_core_module));
ngx_http_handler_pt* h = static_cast<ngx_http_handler_pt*>(
ngx_array_push(&cmcf->phases[NGX_HTTP_CONTENT_PHASE].handlers));
ngx_array_push(&cmcf->phases[NGX_HTTP_PREACCESS_PHASE].handlers));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the NGX_HTTP_ACCESS_PHASE? You may break the access module.

My original idea of this problem was just move this pagepseed fetch content_handler to the last handler of the access_phase. And we don't need modify any of the code in the handler itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may break the access module? I don't know what you mean.
the ps_preaccess_handler just modify cfg_l->loc_conf / clcf and return NGX_DECLINED.

your idea is good and brief if we can determine the order.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen this code?

https://github.com/agentzh/srcache-nginx-module/blob/master/src/ngx_http_srcache_fetch.c#L205

It's tricky.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i seen the code. put to access phase might be better. but i don't know why skip post access phase.
and in access phase, we only need to check r->phase_handler == ph->next - 1 in the handler.

It's a good idea. I think we can insert ps_content_handler with generic_phase checker before TRY_FILES phase.

@chaizhenhua
Copy link
Contributor Author

@yaoweibin the patch has changed use the idea(https://github.com/agentzh/srcache-nginx-module/blob/master/src/ngx_http_srcache_fetch.c#L205
). it's better then before.

@@ -94,7 +94,7 @@ else
$psol_library_dir/libapr.a"
fi

pagespeed_libs="-lstdc++ $psol_library_binaries -lrt -pthread"
pagespeed_libs="-lstdc++ $psol_library_binaries -lrt -pthread -lm"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffkaufman has fixed it in a new pull request ( https://github.com/pagespeed/ngx_pagespeed/pull/192/files), you could delete this commit.

@chaizhenhua
Copy link
Contributor Author

with out -lm configure will fail on my system(
Linux localhost.localdomain 3.8.3-203.fc18.x86_64 #1 SMP Mon Mar 18 12:59:28 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux).
and following is the output of objs/autoconf.err

----------------------------------------
checking for psol

/usr/bin/ld: ../ngx_pagespeed/psol/lib/Release/linux/x64/pagespeed_automatic.a(62.putil.o.o): undefined reference to symbol 'floor@@GLIBC_2.2.5'
/usr/bin/ld: note: 'floor@@GLIBC_2.2.5' is defined in DSO /lib64/libm.so.6 so try adding it to the linker command line
/lib64/libm.so.6: could not read symbols: Invalid operation
collect2: error: ld returned 1 exit status
----------

ngx_http_handler_pt* h = static_cast<ngx_http_handler_pt*>(
ngx_array_push(&cmcf->phases[NGX_HTTP_CONTENT_PHASE].handlers));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the access phase to move the handler? It could reduce the number of movement elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we add handler to access phase, nginx may add extre post_access_phase after access phase(see ngx_http_init_phase_handlers at http.c:460).
preaccess phase use generic checker, generic checker checks less then access_phase_checker, and it's enough for us.
the ps_preaccess_handler will be called only once, so the movements may be acceptable.

@yaoweibin
Copy link
Contributor

@jeffkaufman can you look at the link error with library m?

@jeffkaufman
Copy link
Contributor

@yaoweibin "can you look at the link error with library m?"

There was also just an error on the mailing list about undefined reference to symbol 'log@@GLIBC_2.0'. It looks like on some systems you need an explicit link for the math library to get floor, log etc? This is unrelated to the rest of this change, though, so I'm going to break it out into a separate pull request: #192

while (ph[i + 1].checker != ngx_http_core_try_files_phase
&& ph[i + 1].checker != ngx_http_core_content_phase) {
ph[i] = ph[i + 1];
ph[i].next --;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coding style:

ph[i].next--;

@yaoweibin
Copy link
Contributor

It looks good to me.

i++;
}

// insert content handler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trailing whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@jeffkaufman
Copy link
Contributor

The code is good and looks like it's doing the right thing, but I want to make sure it handles each of the cases properly before I merge it. I tested with proxy_pass, and verified that this configuration doesn't work without the change but does work with it:

## This is a temporary workaround that ensures requests for pagespeed       
## optimized resources go to the pagespeed handler.                         
#location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" { }         
#location ~ "^/ngx_pagespeed_static/" { }                                  
#location ~ "^/ngx_pagespeed_beacon$" { } 
location / {
  proxy_pass http://WEBSITE;
}

What other configurations should I test? Could someone give an example usage of try_files that exercises this change? What else might be a problem?

@yaoweibin
Copy link
Contributor

That's decent. The different process phase test cases should be added in this change:

  • content phase: proxy_pass & static file
  • try_files
  • access phase: access module
  • preaccess phase: limit_conn module
  • rewrite phase: rewrite & if block

All the phases could be simple and primary usage.

@jeffkaufman
Copy link
Contributor

This fails in the rewrite_images system test:

TEST: rewrite_images [system_test] inlines, compresses, and resizes.
...
Fetching http://localhost:8050/mod_pagespeed_example/rewrite_images.html --header=ModPagespeedFilters:rewrite_images  -H -p -S -o /tmp/mod_pagespeed_test.jefftk/fetched_directory/wget_output.txt -nd -P /tmp/mod_pagespeed_test.jefftk/fetched_directory -e robots=off until $(grep -c .pagespeed.ic) = 2

wget -q --header=ModPagespeedFilters:rewrite_images -H -p -S -o /tmp/mod_pagespeed_test.jefftk/fetched_directory/wget_output.txt -nd -P /tmp/mod_pagespeed_test.jefftk/fetched_directory -e robots=off http://localhost:8050/mod_pagespeed_example/rewrite_images.html and checking with grep -c .pagespeed.ic

To reproduce, run this command and note that it hangs:

wget -q --header=ModPagespeedFilters:rewrite_images  -H -p -S -nd -P /tmp/mod_pagespeed_test.jefftk/fetched_directory -e robots=off http://localhost:8050/mod_pagespeed_example/rewrite_images.html

@jeffkaufman
Copy link
Contributor

It's trying and failing to download http://localhost:8050/mod_pagespeed_example/images/xBikeCrashIcn.png.pagespeed.ic.2FihTuVNs_.png, which when run as a separate command downloads fine.

@jeffkaufman
Copy link
Contributor

--2013-03-28 16:58:32--  http://localhost:8050/mod_pagespeed_example/images/xBikeCrashIcn.png.pagespeed.ic.2FihTuVNs_.png
Reusing existing connection to localhost:8050.
HTTP request sent, awaiting response...

I think the problem is probably the Reusing existing connection to localhost:8050. We probably break on connection reuse.

@jeffkaufman
Copy link
Contributor

$ telnet localhost 8050
GET /ngx_pagespeed_static/js_defer.4Wv5zwfokU.js HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
Server: nginx/1.2.4
Date: Thu, 28 Mar 2013 21:04:19 GMT
Content-Type: application/javascript
Content-Length: 11422
Connection: keep-alive
Cache-Control: max-age=31536000

(function(){var e=!0,f=null,g=!1,aa=encodeURICompo...
[much javascript snipped]
...a,Y.M);Y.i(h,Y.M);})();
<html>
<head><title>404 Not Found</title></head>
<body bgcolor="white">
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.2.4</center>
</body>
</html>

Why are we putting an 404 page at the end of the response?

@jeffkaufman
Copy link
Contributor

I just confirmed that the 404 page appending doesn't happen on master.

@jeffkaufman
Copy link
Contributor

Because the 404 page is beyond the content length curl won't print it, and I think that's what's making wget hang.

@yaoweibin
Copy link
Contributor

@chaizhenhua Can you look at this broken test case?

@oschaaf
Copy link
Member

oschaaf commented Mar 29, 2013

Another problem I found is that on my machine the second requests seem to hang when using proxy pass.

server {
    listen       7001;
    pagespeed on;
    location / {
             proxy_pass http://www.foo.com;
    }
}

And then try to fetch a pagespeed resource 2 times from it using keep-alive, for example:

curl -v  http://127.0.0.1:7001/foo.js.pagespeed.jm.UUAZ1YvXPB.js  http://127.0.0.1:7001/foo.js.pagespeed.jm.UUAZ1YvXPB.js

The second request here will hang. After debugging a little, disabling the line at https://github.com/pagespeed/ngx_pagespeed/blob/master/src/ngx_pagespeed.cc#L2005 seems to fix this particular problem. However, this still needs some research as doing that probably breaks something else, plus I'm having trouble understanding the exact sequence of events here.

@jeffkaufman jeffkaufman reopened this Mar 29, 2013
@chaizhenhua
Copy link
Contributor Author

@yaoweibin OK. I'm working on it.

@chaizhenhua
Copy link
Contributor Author

@oschaaf disable (https://github.com/pagespeed/ngx_pagespeed/blob/master/src/ngx_pagespeed.cc#L2005)
these tests failed, are these relate to this patch?

Failing Tests:
  In-place resource optimization
  In-place resource optimization
  In-place resource optimization
  convert_meta_tags
  insert_dns_prefetch
  insert_dns_prefetch
FAIL.

@oschaaf
Copy link
Member

oschaaf commented Mar 29, 2013

@chaizhenhua no, those are expected failures and not related to this. In place resource optimization is not implemented yet, and the other tests fail because we currently send out the response headers too soon to be able to support the related pagespeed filters.

@chaizhenhua
Copy link
Contributor Author

nginx -s quit generated a coredump file.

@jeffkaufman
Copy link
Contributor

Removing the r->count++; does fix the "404 page" issue; great!

@jeffkaufman
Copy link
Contributor

Testing a configuration from #201, it looks like we still don't work with if blocks:

conf:
  # This is a complex if block, that could probably be replaced with a         
  # try_files, but we should support it if we can.                             
  if (!-d $request_filename) { set $rule_0 1$rule_0; }
  if (!-f $request_filename) { set $rule_0 2$rule_0; }
  if ($rule_0 = "21") {
    rewrite ^/(.*)$ /mod_pagespeed_example/index.html last;
  }

$ curl -H Host:ifblock.example.com http://localhost:8051/ngx_pagespeed_static/js_defer.4Wv5zwfokU.js
<html>...

which is the content of /mod_pagespeed_example/index.html.

Without the if we get:

(function(){var e=!0,f=null...

as we should.

Is this expected behavior? Should this patch be able to get in ahead of those ifs? Or do we just need to ask people to use try_files instead?

@chaizhenhua
Copy link
Contributor Author

@jeffkaufman
if has higher priority then other handler (see below). in the configuration if route request /mod_pagespeed_example/index.html and ps_content_handler do nothing with the request.
this verified that the if write module works well, if we put ps_handler before if these if block will fail.
I think we should not change nginx default rewrite logic. or the following config: will be confused:

   if (xxxx) {
      pagespeed on;
   }
typedef enum {
    NGX_HTTP_POST_READ_PHASE = 0,

    NGX_HTTP_SERVER_REWRITE_PHASE,  # if 

    NGX_HTTP_FIND_CONFIG_PHASE,
    NGX_HTTP_REWRITE_PHASE,         # if
    NGX_HTTP_POST_REWRITE_PHASE,    # if

    NGX_HTTP_PREACCESS_PHASE,

    NGX_HTTP_ACCESS_PHASE,
    NGX_HTTP_POST_ACCESS_PHASE,

    /* now we are here before NGX_HTTP_TRY_FILES_PHASE 
       and after NGX_HTTP_POST_ACCESS_PHASE  */
    NGX_HTTP_TRY_FILES_PHASE,
    NGX_HTTP_CONTENT_PHASE,

    NGX_HTTP_LOG_PHASE
} ngx_http_phases;

@chaizhenhua
Copy link
Contributor Author

hi, all, if other nginx module use libapr and call apr_terminate() in exit process before pagespeed, nginx will crash. now i have no idea about how to fix it.

@jeffkaufman
Copy link
Contributor

@chaizhenhua

if other nginx module use libapr and call apr_terminate() in exit process before pagespeed, nginx will crash

Ooh;m good catch! I've moved this to a separate issue: #202

jeffkaufman added a commit that referenced this pull request Apr 3, 2013
@jeffkaufman jeffkaufman merged commit 99e6627 into apache:master Apr 3, 2013
@chaizhenhua chaizhenhua deleted the content_handler branch May 8, 2013 10:20
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants