This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

Client IP is sent to downstream with Cache Purge request, instead of upstream pagespeed ip. #1068

Open
deweydb opened this issue Dec 10, 2015 · 8 comments

Comments

@deweydb

deweydb commented Dec 10, 2015

I have nginx as a downstream caching proxy. When pagespeed emits a cache purge to the downstream, the original client IP is sent with it. Example from my logs:

172.218.???.??? - - [10/Dec/2015:18:38:51 -0500] "GET /purge/storage-auctions/affordable-mini-storage/unit-120 HTTP/1.1" 200 330 "https://dev.bid13.com/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36 mod_pagespeed/1.9.32.10-7423"

Where 172.218.???.??? is the public IP address of my home ISP connection, redacted for obvious reasons.

This forces me to use an insecure purge location block:

location ~ /purge(/.*) {
  allow all;
  proxy_cache_purge main $ps_capability_list$scheme$host$1$is_args$args;
}

Is there a workaround to this?

@oschaaf
Member

oschaaf commented Dec 12, 2015

Are there any X-Forwarded-For or X-Real-IP headers involved? Custom log formatting?
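
One way to check this from the cache layer's side is to log the rewritten client address next to the address of the peer that actually opened the connection, together with the forwarding headers that arrived on the request. The fragment below is only a sketch: the format name `purge_debug` is made up for illustration, the log path is the one from the cache layer in this thread, and `$realip_remote_addr` is assumed to be available (nginx 1.9.7 or later with the realip module built in).

```nginx
# http {} level: log both the realip-rewritten address and the real peer.
# "purge_debug" is a hypothetical format name.
log_format purge_debug '$remote_addr (peer: $realip_remote_addr) [$time_local] '
                       '"$request" $status '
                       'xff="$http_x_forwarded_for" xri="$http_x_real_ip" '
                       '"$http_user_agent"';

# Inside the cache-layer server {} block (the one listening on 127.0.0.1:9996):
access_log /var/www/dev.bid13.com/log/cache-access.log purge_debug;
```

If the purge line then shows a public address in `$remote_addr` but `127.0.0.1` as the peer, the client IP is coming from a copied header rather than from the connection itself.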

@deweydb
Author

deweydb commented Dec 12, 2015

Yes, I am using those because I needed my upstream to see the IP address of the client. Without them the upstream saw 127.0.0.1:port.

My config:

# Define a mapping used to mark HTML as uncacheable.
map $upstream_http_content_type $new_cache_control_header_val {
    default $upstream_http_cache_control;
    "~*text/html" "no-cache, max-age=0";
}

###########################  
#  HTTP->HTTPS REDIRECT   #
###########################
server {
  listen         45.56.100.160:80;
  server_name    dev.bid13.com;
  pagespeed off;
  root /var/www/dev.bid13.com/public_html/drupal;
  location ~* \.(?:jpg|jpeg|gif|png|ico|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc|webp)$ {
    expires 1M;
    add_header Cache-Control "public";
    try_files $uri =404;
  }
  location /{
    return 301 https://dev.bid13.com$request_uri;
  }
}

####################
#  DRUPAL BACKEND  #
####################
server {
    listen 127.0.0.1:9998;
    server_name dev.bid13.com;
    root /var/www/dev.bid13.com/public_html/drupal;
    access_log  /var/www/dev.bid13.com/log/drupal-access.log;
    error_log   /var/www/dev.bid13.com/log/drupal-error.log info;

    set_real_ip_from   127.0.0.1;
    real_ip_header     X-Forwarded-For;
    pagespeed off;
    index index.php;
    charset utf-8;
    include /etc/nginx/global/h5bp/basic.conf;
    include global/drupal-cached.conf;
    # nginx strips this header - but we want it to propagate down to the cache layer first
    fastcgi_pass_header X-Accel-Expires;
    # We don't trust Drupal's Cache-Control; we control caching ourselves with X-Accel-Expires
    fastcgi_hide_header Cache-Control;
}

####################
# PAGE SPEED LAYER #
####################
server {
    listen 127.0.0.1:9997;
    server_name dev.bid13.com;
    access_log  /var/www/dev.bid13.com/log/ps-access.log;
    error_log   /var/www/dev.bid13.com/log/ps-error.log info;

    set_real_ip_from   127.0.0.1;
    real_ip_header     X-Forwarded-For;

    pagespeed on;

    pagespeed FetchHttps enable;
#    pagespeed LowercaseHtmlNames on;
    pagespeed PreserveUrlRelativity off;
    pagespeed Domain bid13.com;
    pagespeed Domain *.bid13.com;
    pagespeed Domain *.cdn.bid13.com;

    # allow pagespeed to bypass https & get files directly from upstream backend
    pagespeed MapOriginDomain "http://localhost:9998" "https://dev.bid13.com";
    pagespeed MapOriginDomain "http://localhost:9998" "https://devkey1.cdn.bid13.com";
    pagespeed MapOriginDomain "http://localhost:9998" "https://devkey2.cdn.bid13.com";

    pagespeed LoadFromFile "https://dev.bid13.com" "/var/www/dev.bid13.com/public_html/drupal/";
    pagespeed LoadFromFile "https://devkey1.cdn.bid13.com" "/var/www/dev.bid13.com/public_html/drupal/";
    pagespeed LoadFromFile "https://devkey2.cdn.bid13.com" "/var/www/dev.bid13.com/public_html/drupal/";

    # Spread resources across our CDNs
    #pagespeed MapRewriteDomain "https://devkey1.cdn.bid13.com" "http://dev.bid13.com"; 
    pagespeed MapRewriteDomain "https://dev.bid13.com" "http://dev.bid13.com"; 
    pagespeed ShardDomain https://dev.bid13.com https://devkey1.cdn.bid13.com,https://devkey2.cdn.bid13.com;
#    pagespeed ShardDomain http://dev.bid13.com https://devkey1.cdn.bid13.com,https://devkey2.cdn.bid13.com;

    # pagespeed proxy cache integration
    pagespeed EnableCachePurge on;
    pagespeed DownstreamCachePurgeMethod "GET";
    pagespeed DownstreamCachePurgeLocationPrefix "http://127.0.0.1:9996/purge";
    pagespeed DownstreamCacheRebeaconingKey "<redacted>";


     # These are enabled by default
    pagespeed EnableFilters lazyload_images;
    pagespeed EnableFilters prioritize_critical_css;
    pagespeed EnableFilters convert_png_to_jpeg;
    pagespeed EnableFilters inline_google_font_css;
    pagespeed EnableFilters local_storage_cache;
    pagespeed EnableFilters convert_to_webp_lossless;
    pagespeed EnableFilters insert_image_dimensions;
    pagespeed EnableFilters inline_preview_images;
    pagespeed EnableFilters resize_mobile_images;
    pagespeed EnableFilters remove_comments;
    pagespeed EnableFilters collapse_whitespace;
    pagespeed EnableFilters add_instrumentation;
    pagespeed EnableFilters dedup_inlined_images;
    pagespeed EnableFilters insert_dns_prefetch;
    pagespeed EnableFilters extend_cache;

    location /ngx_pagespeed_statistics { 
        allow 127.0.0.1; # localhost
        allow 45.56.100.160; # server public ip
        deny all; 
    }
    location /ngx_pagespeed_global_statistics { 
        allow 127.0.0.1; # localhost
        allow 45.56.100.160; # server public ip
        deny all; 
    }
    location /ngx_pagespeed_message { 
        allow 127.0.0.1; # localhost
        allow 45.56.100.160; # server public ip
        deny all; 
    }
    location /pagespeed_console { 
        allow 127.0.0.1; # localhost
        allow 45.56.100.160; # server public ip
        deny all; 
    }
    location ~ ^/pagespeed_admin { 
        allow 127.0.0.1; # localhost
        allow 45.56.100.160; # server public ip
        deny all; 
    }
    location ~ ^/pagespeed_global_admin { 
        allow 127.0.0.1; # localhost
        allow 45.56.100.160; # server public ip
        deny all; 
    }

    location ~ "^/pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon$" { }

    location / {
        proxy_pass http://drupal_backend_dev;
        # Forward Host header to upstream server.
        proxy_set_header Host $host;
        # Pass client information upstream
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        # Strip the crap Expires header drupal gives us back
        # proxy_hide_header Expires;
        # proxy_hide_header Last-Modified;
        proxy_hide_header X-Powered-By;
        proxy_pass_header X-Accel-Expires;
    }
}

#######################   
#  PROXY CACHE LAYER  #
#######################
server {
    listen 127.0.0.1:9996;    
    server_name dev.bid13.com;
    access_log  /var/www/dev.bid13.com/log/cache-access.log;
    error_log   /var/www/dev.bid13.com/log/cache-error.log info;
    pagespeed off;

    set_real_ip_from   127.0.0.1;
    real_ip_header     X-Forwarded-For;
    set $upstream "pagespeed_layer_dev";

    # Block 2: Define prefix for proxy_cache_key based on the UserAgent.
    # Define placeholder PS-CapabilityList header values for large and small
    # screens with no UA dependent optimizations. Note that these placeholder
    # values should not contain any of ll, ii, dj, jw or ws, since these
    # codes will end up representing optimizations to be supported for the
    # request.
    set $default_ps_capability_list_for_large_screens "LargeScreen.SkipUADependentOptimizations";
    set $default_ps_capability_list_for_small_screens "TinyScreen.SkipUADependentOptimizations";

    # As a fallback, the PS-CapabilityList header that is sent to the upstream
    # PageSpeed server should be for a large screen device with no browser
    # specific optimizations.
    set $ps_capability_list $default_ps_capability_list_for_large_screens;

    # Cache-fragment 1: Desktop User-Agents that support lazyload_images (ll),
    # inline_images (ii) and defer_javascript (dj).
    # Note: Wget is added for testing purposes only.
    if ($http_user_agent ~* "Chrome/|Firefox/|MSIE |Safari|Wget") {
      set $ps_capability_list "ll,ii,dj:";
    }
    # Cache-fragment 2: Desktop User-Agents that support lazyload_images (ll),
    # inline_images (ii), defer_javascript (dj), webp (jw) and lossless_webp
    # (ws).
    if ($http_user_agent ~* "Chrome/[2][3-9]+\.|Chrome/[3-9][0-9]+\.|Chrome/[0-9]{3,}\.") {
      set $ps_capability_list "ll,ii,dj,jw,ws:";
    }
    # Cache-fragment 3: This fragment contains (a) Desktop User-Agents that
    # match fragments 1 or 2 but should not because they represent older
    # versions of certain browsers or bots and (b) Tablet User-Agents that
    # correspond to large screens. These will only get optimizations that work
    # on all browsers and use image compression qualities applicable to large
    # screens. Note that even Tablets that are capable of supporting inline or
    # webp images, e.g. Android 4.1.2, will not get these advanced
    # optimizations.
    if ($http_user_agent ~* "Firefox/[1-2]\.|MSIE [5-8]\.|bot|Yahoo!|Ruby|RPT-HTTPClient|(Google \(\+https\:\/\/developers\.google\.com\/\+\/web\/snippet\/\))|Android|iPad|TouchPad|Silk-Accelerated|Kindle Fire") {
      set $ps_capability_list $default_ps_capability_list_for_large_screens;
    }
    # Cache-fragment 4: Mobiles and small screen Tablets will use image compression
    # qualities applicable to small screens, but all other optimizations will be
    # those that work on all browsers.
    if ($http_user_agent ~* "Mozilla.*Android.*Mobile*|iPhone|BlackBerry|Opera Mobi|Opera Mini|SymbianOS|UP.Browser|J-PHONE|Profile/MIDP|portalmmm|DoCoMo|Obigo|Galaxy Nexus|GT-I9300|GT-N7100|HTC One|Nexus [4|7|S]|Xoom|XT907") {
      set $ps_capability_list $default_ps_capability_list_for_small_screens;
    }

    # Initialise the bypass flags first, so the checks below are not reset afterwards.
    set $bypass_cache 0;
    set $bypass_reason "";
    # Block 3a: Bypass the cache for .pagespeed. resource. PageSpeed has its own
    # cache for these, and these could bloat up the caching layer.
    if ($uri ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") {
      set $bypass_cache "1";
      set $bypass_reason "Bypassed for pagespeed asset";
    }

    # PageSpeed's beacon dependent filters need the cache to let some requests
    # through to the backend.  This code below depends on the ngx_set_misc
    # module and randomly passes 5% of traffic to the backend for rebeaconing.
    set $should_beacon_header_val "";
    set_random $rand 0 100;
    if ($rand ~* "^[0-4]$") {
      set $should_beacon_header_val "<redacted>";
      set $bypass_cache 1;
      set $bypass_reason "Bypassed For Rebeaconing"; 
    }
    # Block 3b: Only cache responses to clients that support gzip.  Most clients
    # do, and the cache holds much more if it stores gzipped responses.
    #if ($http_accept_encoding !~* gzip) {
    #  set $bypass_cache "1";
    #  set $bypass_reason "No Gzip Available"; 
    #}

    if ($uri ~ "live_update\.php") {
      set $bypass_cache "1";
      set $bypass_reason "Never cache live update requests";
      # skip pagespeed entirely
      set $upstream "drupal_backend_dev";
    }

    # Block 4: Location block for purge requests.
    location ~ /purge(/.*) {
      # This looks insecure, but we block access to /purge at the security layer.
      #add_header CachePurge "Purging resource from cache:$1";
      #add_header CacheKey $ps_capability_list$scheme$host$1$is_args$args;
      allow all;
      proxy_cache_purge main $ps_capability_list$scheme$host$1$is_args$args;
    }

    # Look for authenticated users by matching the cookies sent
    set $drupal_auth "";
    if ($http_cookie ~* "SESS[a-z,0-9]{32}=([a-z,0-9]+);") {
       set $drupal_auth drupal_logged_in_$1;
       # even if pagespeed isn't rebeaconing, we still want to bypass cache
       set $bypass_cache 1;
       set $bypass_reason "Drupal Authentication Detected"; 

       # I'm not sure if this is correct!!!
       # disable rebeaconing header if user is authenticated
       # set $should_beacon_header_val "";
    }

    location / {
        # upstream pagespeed server
        proxy_pass http://$upstream;
        # Forward Host header to upstream server.
        proxy_set_header Host $host;
        # Pass client information upstream
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        # nginx proxy cache zone
        proxy_cache main;
        # Skip the cache lookup (go to the upstream) for rebeaconing or authenticated users
        proxy_cache_bypass $bypass_cache;

        # Do not store responses for authenticated users in the cache
        proxy_no_cache $drupal_auth;

        # Set the proxy cache key
        proxy_cache_key $ps_capability_list$scheme$host$uri$is_args$args;

        # Set the PS-CapabilityList header for PageSpeed server to respect.
        proxy_set_header PS-CapabilityList $ps_capability_list;

        # Allow rebeaconing
        proxy_set_header PS-ShouldBeacon $should_beacon_header_val;
        proxy_hide_header PS-ShouldBeacon;

        # Debug headers
        # add_header DebugCacheKey $ps_capability_list$scheme$host$uri$is_args$args;
        add_header X-Proxy-Cache $upstream_cache_status;
        add_header ByPassReason $bypass_reason; 

        # Generally this is overridden by drupal with the X-Accel-Expires header
        proxy_cache_valid 10m; # 200, 301 and 302 will be cached.

        # Fallback to stale cache on certain errors.
        proxy_cache_use_stale error timeout invalid_header http_500 http_502 http_504 http_404;

    }
}

##########################
#  SECURITY / SSL LAYER  #
##########################
server {
    listen 45.56.100.160:443 ssl http2;
    server_name dev.bid13.com;
    ssl on;
    ssl_certificate      /etc/ssl/localcerts/dev.bid13.com.crt;
    ssl_certificate_key  /etc/ssl/localcerts/dev.bid13.com.key;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDH+AESGCM:ECDH+AES256:ECDH+AES128:DH+3DES:!ADH:!AECDH:!MD5;
    ssl_dhparam /etc/ssl/localcerts/dhparam.pem;
    ssl_session_cache shared:SSL:20m;
    ssl_session_timeout 180m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    access_log off;
    error_log off;
    root /var/www/dev.bid13.com/public_html/drupal;
    charset utf-8;

    pagespeed off;
    gzip_vary on;

    ####### REDIRECTS #######
    error_page 497 https://$host:443$request_uri;
    location = /facilities {
      return 301 /learnmore;
    }  
    location = /feedback {
      return 301 /contact-us;
    }  

    ##### SECURITY ######
    # deny any requests to cache purge from outside
    # all purge requests happen internally
    location ~ /purge(/.*) {
      deny all;
    }

    location /{
        proxy_pass http://cache_layer_dev;
        # Forward Host header to upstream server.
        proxy_set_header Host $host;
        # Pass client information upstream
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;

        # Prevent browser caching, but allow upstream to use cache control headers for proxy cache
        # Hide the upstream cache control header.
        proxy_hide_header Cache-Control;
        # Add the inferred Cache-Control header.
        add_header Cache-Control $new_cache_control_header_val;
    }
}

@oschaaf
Member

oschaaf commented Dec 12, 2015

Thanks for the configuration, that should help with reproducing this. I suspect that the code that executes the purge requests copies over the incoming X-Forwarded-For header, which would explain the symptoms you are seeing.
I'll have a look and post back here.

@oschaaf
Member

oschaaf commented Dec 13, 2015

I was able to confirm that the purge requests copy over X-Real-IP. I think we need to filter a few headers from purge requests before executing them.

@deweydb
Author

deweydb commented Dec 13, 2015

Thanks for all your help Otto, I really appreciate it.

@jeffkaufman
Contributor

In general you do want the purge request to copy over all the headers from the client request. If you have geocoding dependent content and so have set up a region-specific cache key that's dependent on the client IP, then if pagespeed didn't set X-Real-IP it wouldn't purge properly.

This is causing you a problem because to get purging to work you needed to open up your purge handler to the whole internet. I agree that's bad and not how it should be configured. Can you add a check that $realip_remote_addr (see docs) is localhost before purging?
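
A minimal sketch of what such a check could look like in the cache layer's purge location, assuming nginx 1.9.7 or later (where `$realip_remote_addr` exists) and the same `proxy_cache_purge` syntax and cache key already used in this thread. It has not been tested against this setup.

```nginx
# Block 4: Location block for purge requests.
location ~ /purge(/.*) {
    # $remote_addr has already been rewritten by the realip module from the
    # copied X-Forwarded-For header, so allow/deny would see the end user's IP.
    # $realip_remote_addr still holds the address of the peer that actually
    # connected, which for PageSpeed's purge request should be 127.0.0.1.
    if ($realip_remote_addr != "127.0.0.1") {
        return 403;
    }
    proxy_cache_purge main $ps_capability_list$scheme$host$1$is_args$args;
}
```

The comparison is a plain string test, so the `$1` capture from the location regex is left intact for the cache key. The copied client headers (User-Agent, forwarding headers) still reach this block, so a key that depends on them keeps working.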

@deweydb
Author

deweydb commented Dec 14, 2015

It's actually not that unsafe (I think), because in my main SSL block I drop all traffic to /purge:

##### SECURITY ######
# deny any requests to cache purge from outside
# all purge requests happen internally
location ~ /purge(/.*) {
  deny all;
}

And the upstream only listens on localhost, so purge requests are effectively IP-filtered to come only from the server itself.

@dvershinin
Contributor

dvershinin commented Feb 13, 2017

With Downstream Caching + Varnish + Nginx as SSL terminator, there is no way to tell whether a PURGE request originates from localhost or not. Because pagespeed copies all the client headers onto its purge request, it looks like an external request to Varnish.
