Split-brain when relying on Last-Modified header #22

Closed
trevorparker opened this issue Feb 16, 2014 · 2 comments

Comments

@trevorparker

rss2text relies on the Last-Modified response header when sending an If-Modified-Since request header to a server. This makes sense, but it assumes that servers hosting the same content will reliably send identical Last-Modified headers. If they don't, you get a fun split-brain condition where responses bounce between 200 OK and 304 Not Modified.
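A minimal simulation of that condition, assuming two backends behind DNS round-robin whose copies of rss.xml differ by one second, and a server that only answers 304 when If-Modified-Since matches its own Last-Modified exactly (nginx's default `if_modified_since exact` setting behaves this way). `Backend` and `FeedClient` are hypothetical names, not rss2text code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Backend:
    """Hypothetical static server; last_modified stands in for the file's mtime."""
    last_modified: str

    def get(self, if_modified_since: Optional[str]) -> Tuple[int, str]:
        # Exact-match validation: 304 only when the client echoes our date.
        if if_modified_since == self.last_modified:
            return 304, self.last_modified
        return 200, self.last_modified


class FeedClient:
    """Remembers Last-Modified from each 200 and sends it back verbatim."""

    def __init__(self) -> None:
        self.validator: Optional[str] = None

    def poll(self, backend: Backend) -> int:
        status, last_modified = backend.get(self.validator)
        if status == 200:
            self.validator = last_modified
        return status


# Two containers whose builds finished on either side of a second boundary.
a = Backend("Sun, 16 Feb 2014 02:11:17 GMT")
b = Backend("Sun, 16 Feb 2014 02:11:16 GMT")

client = FeedClient()
print([client.poll(x) for x in (a, b, a, a, b)])  # [200, 200, 200, 304, 200]
```

Against a single backend the same client gets one 200 followed by 304s; in this model, every switch between containers costs a full 200 download.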

It looks like you're stashing last_pulled_dt -- would there be any downsides to basing If-Modified-Since on that instead?
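A sketch of the trade-off in that question, assuming the client sends a date of its own (here, the time of its last fetch) instead of echoing the server's header. Whether that ever yields a 304 depends on how the server compares dates; the two modes below are loosely modelled on nginx's `if_modified_since exact` (the default) and `before` settings. `http_date` and `server_status` are hypothetical helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def http_date(dt: datetime) -> str:
    """Format an aware datetime as an HTTP date in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def server_status(if_modified_since: str, last_modified: str, mode: str) -> int:
    """Simplified server-side validation of If-Modified-Since."""
    ims = parsedate_to_datetime(if_modified_since)
    lm = parsedate_to_datetime(last_modified)
    if mode == "exact":
        # 304 only when the dates match to the second.
        return 304 if ims == lm else 200
    if mode == "before":
        # 304 whenever the client's date is not older than the file.
        return 304 if ims >= lm else 200
    raise ValueError(f"unknown mode: {mode!r}")


# The client last fetched at 03:00:00 UTC, after both containers were built.
ims = http_date(datetime(2014, 2, 16, 3, 0, tzinfo=timezone.utc))
for lm in ("Sun, 16 Feb 2014 02:11:17 GMT", "Sun, 16 Feb 2014 02:11:16 GMT"):
    print(lm, server_status(ims, lm, "exact"), server_status(ims, lm, "before"))
```

Under exact matching, a client-chosen date never validates, so every poll is a 200. Under `before` it hides the one-second skew, but it then depends on the client's clock agreeing with the server's.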

@Stantheman
Owner

> rss2text relies on the Last-Modified response header when sending an If-Modified-Since request header to a server. This makes sense, but it assumes that servers hosting the same content will reliably send identical Last-Modified headers.

For what it's worth, the RFC defining Last-Modified backs me up:

> To get best results when sending an If-Modified-Since header field for cache validation, clients are advised to use the exact date string received in a previous Last-Modified header field whenever possible.

I was originally trying to figure out what you meant by split-brain, which led me to discover exactly what you meant. Is this custom code flipping seconds or Nginx trying to actually make me sad?

```
➜  rss2text git:(master) for i in {1..5}; do curl -sI https://www.trevorparker.com/rss.xml | grep 'Last-Modified'; done
Last-Modified: Sun, 16 Feb 2014 02:11:17 GMT
Last-Modified: Sun, 16 Feb 2014 02:11:16 GMT
Last-Modified: Sun, 16 Feb 2014 02:11:17 GMT
Last-Modified: Sun, 16 Feb 2014 02:11:17 GMT
Last-Modified: Sun, 16 Feb 2014 02:11:16 GMT
```

Actually, I just went on a hunt to figure out who else is doing this and why. It looks like FeedBurner feeds flip even more wildly. I re-ran the command above after hunting, and now it's completely changed and travelled backwards:

```
➜  rss2text git:(master) ✗ for i in {1..5}; do curl -sI https://www.trevorparker.com/rss.xml | grep 'Last-Modified'; done
Last-Modified: Sat, 15 Feb 2014 19:31:13 GMT
Last-Modified: Sat, 15 Feb 2014 19:31:13 GMT
Last-Modified: Sat, 15 Feb 2014 19:31:13 GMT
Last-Modified: Sat, 15 Feb 2014 19:31:13 GMT
Last-Modified: Sat, 15 Feb 2014 19:31:13 GMT
```

last_pulled_dt is based on the data inside the feed, usually the pubDate of the first item in the feed. On your blog, my cached last pull date is "2013-12-23T14:18:09Z"; your Last-Modified is newer than that, so sending it as If-Modified-Since would make every pull a 200.

I'm going on a "stop the 200s" hunt now though.
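To make the "every pull would be 200" point concrete: if If-Modified-Since were taken from the newest item's pubDate, as last_pulled_dt is described above, then any rebuild that rewrites rss.xml without adding a post pushes Last-Modified past that date. A sketch assuming an RSS 2.0 feed and a server that compares dates rather than strings; `newest_item_date` and `is_modified_since` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def newest_item_date(rss_xml: str) -> Optional[datetime]:
    """pubDate of the first <item> in an RSS 2.0 feed, or None if absent."""
    item = ET.fromstring(rss_xml).find("./channel/item")
    pub_date = item.findtext("pubDate") if item is not None else None
    return parsedate_to_datetime(pub_date) if pub_date else None


def is_modified_since(since: datetime, last_modified: str) -> bool:
    """True if a date-comparing server would answer 200 rather than 304."""
    return parsedate_to_datetime(last_modified) > since


feed = """<rss version="2.0"><channel><title>example</title>
<item><title>newest post</title><pubDate>Mon, 23 Dec 2013 14:18:09 +0000</pubDate></item>
</channel></rss>"""
since = newest_item_date(feed)
print(since.isoformat())  # 2013-12-23T14:18:09+00:00
print(is_modified_since(since, "Sat, 15 Feb 2014 19:31:13 GMT"))  # True
```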

@trevorparker
Author

> last_pulled_dt is based on the data inside the feed

Ah, I had wrongly assumed that last_pulled_dt was the last time you requested the feed.

> Is this custom code flipping seconds or Nginx trying to actually make me sad?

This is a result of launching a Jekyll build at the same time across multiple containers: the build takes slightly longer on one or more containers and crosses a second boundary, so the generated files end up with Last-Modified times a second apart. These are the newest in a set of containers, which also means that

> and now it's completely changed and travelled backwards:

is going to happen whenever you get bounced back to an older container.
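A small model of the seconds-boundary effect, assuming each container writes rss.xml when its build finishes and the server reports that file time as Last-Modified. HTTP dates carry only whole seconds, so builds that start together disagree only when their finish times straddle a second boundary. The start time and build durations are made-up values; `last_modified_after_build` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def last_modified_after_build(start: datetime, build_seconds: float) -> str:
    """Last-Modified a server would report for a file written at the end of a build."""
    finished = start + timedelta(seconds=build_seconds)
    # format_datetime works from timetuple(), which drops sub-second precision.
    return format_datetime(finished.astimezone(timezone.utc), usegmt=True)


start = datetime(2014, 2, 16, 2, 11, 15, 800_000, tzinfo=timezone.utc)
for seconds in (1.1, 1.15, 1.3):  # three containers, slightly different build times
    print(seconds, last_modified_after_build(start, seconds))
```

The first two builds finish at 02:11:16.9 and 02:11:16.95 and report the same second; the third finishes at 02:11:17.1 and reports the next one.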

The Right Thing for me to do is to build once, then deploy, which is what I was doing until yesterday. I might end up just moving the balancing out of DNS and letting nginx balance based on ip_hash.
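Balancing on ip_hash amounts to sticky routing: a given client keeps landing on the same container, so it keeps seeing the same Last-Modified. Per the nginx documentation, ip_hash keys on the first three octets of an IPv4 address (or the whole IPv6 address). The sketch below mirrors that keying but uses SHA-256 as a stand-in for nginx's own hash; `pick_backend` and the container names are hypothetical.

```python
import hashlib
import ipaddress
from typing import Sequence


def pick_backend(client_ip: str, backends: Sequence[str]) -> str:
    """Choose a backend deterministically from the client's address."""
    addr = ipaddress.ip_address(client_ip)
    if addr.version == 4:
        # Like nginx ip_hash: clients in the same /24 share a key.
        key = ".".join(str(addr).split(".")[:3])
    else:
        key = addr.exploded
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


containers = ["web1", "web2", "web3"]
print(pick_backend("203.0.113.7", containers))
print(pick_backend("203.0.113.200", containers))  # same /24, same container
```

Stickiness hides the skew from each individual client rather than removing it; two clients on different containers can still see different Last-Modified values.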
