Various TTL fixes #6

Merged
merged 5 commits into from Nov 1, 2013
33 changes: 23 additions & 10 deletions reppy/__init__.py
@@ -49,7 +49,7 @@
 import re
 import time
 import urlparse
-import dateutil.parser
+import email.utils

 #####################################################
 # Import our exceptions at the global level
@@ -71,8 +71,17 @@ def short_user_agent(strng):

     @staticmethod
     def parse_time(strng):
-        '''Parse a human-readable time into a timestamp'''
-        return time.mktime(dateutil.parser.parse(strng).timetuple())
+        '''Parse an HTTP-style (i.e. email-style) time into a timestamp'''
+        v = email.utils.parsedate_tz(strng)
Contributor

Was the reason for this switch a matter of strictness? Or to remove the dateutil dependency?

Contributor Author

Dateutil was ignoring time zones and parsing everything as GMT. This was causing robots.txt data to expire too soon or too late, depending on the time zone.
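
To illustrate the time-zone point, here is a minimal Python 3 sketch of the same approach using only the standard library. The name `parse_http_time` and the sample dates are made up for illustration; the logic follows the new `parse_time` in this diff. On recent Python 3 releases `parsedate_tz` appears to fill in a zero offset itself when none is given, so the `None` branch is kept only as a guard.

```python
import email.utils


def parse_http_time(value):
    """Hypothetical stand-alone version of parse_time from this diff."""
    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        # Reject input that is not an email-style date
        raise ValueError("Invalid time.")
    if parsed[9] is None:
        # No zone offset given: assume GMT/UTC, as the diff does
        parsed = parsed[:9] + (0,)
    # mktime_tz applies the offset in parsed[9] and returns a UTC timestamp
    return email.utils.mktime_tz(parsed)


# Illustrative inputs: the first two describe the same instant
gmt = parse_http_time("Fri, 01 Nov 2013 12:00:00 GMT")
est = parse_http_time("Fri, 01 Nov 2013 07:00:00 -0500")
naive = parse_http_time("Fri, 01 Nov 2013 12:00:00")
```

The two zoned inputs map to the same timestamp because the offset is applied before conversion, and the input with no zone is treated as GMT.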

+        if v is None:
+            # Reject bad data
+            raise ValueError("Invalid time.")
+        if v[9] is None:
+            # Default time zone is GMT/UTC
+            v = list(v) # @$%?? Dutch
+            v[9] = 0
+            v = tuple(v)
+        return email.utils.mktime_tz(v)

     @staticmethod
     def get_ttl(headers, default):
@@ -83,21 +92,25 @@ def get_ttl(headers, default):
         # Expires header, as per RFC2616 Sec. 13.2.4.
         if headers.get('cache-control') is not None:
             for directive in headers['cache-control'].split(','):
-                tokens = directive.lower().strip().partition('=')
+                tokens = directive.lower().partition('=')
+                t_name = tokens[0].strip()
+                t_value = tokens[2].strip()
                 # If we're not allowed to cache, then expires is now
-                if tokens[0].strip() in (
-                    'no-cache', 'no-store', 'must-revalidate'):
+                if t_name in ('no-store', 'must-revalidate'):
                     return 0
-                elif tokens[0].strip() == 's-maxage':
+                elif t_name == 'no-cache' and t_value == '':
+                    # Only honor no-cache if there is no =value after it
+                    return 0
+                elif t_name == 's-maxage':
                     try:
                         # Since s-maxage should override max-age, return
-                        return long(tokens[2])
+                        return long(t_value)
                     except ValueError:
                         # Couldn't parse s-maxage as an integer
                         continue
-                elif tokens[0].strip() == 'max-age':
+                elif t_name == 'max-age':
                     try:
-                        ttl = long(tokens[2])
+                        ttl = long(t_value)
                     except ValueError:
                         # Couldn't parse max-age as an integer
                         continue
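
The directive handling in this hunk can be sketched in isolation as follows. This is a simplified Python 3 illustration, not the library's code: `split_directive`, `forbids_caching`, and the sample header are hypothetical names and data introduced here. It shows how partitioning first and stripping each half afterwards yields a clean name and value, and how `no-cache` with a `=value` is no longer treated as a blanket ban on caching.

```python
def split_directive(directive):
    """Split one Cache-Control directive into a stripped (name, value) pair."""
    name, _, value = directive.lower().partition('=')
    return name.strip(), value.strip()


def forbids_caching(name, value):
    """Mirror the rule in the diff: no-store and must-revalidate always
    expire immediately; no-cache does so only when it has no =value."""
    if name in ('no-store', 'must-revalidate'):
        return True
    return name == 'no-cache' and value == ''


# Illustrative header: whitespace around '=' and a qualified no-cache
header = 'max-age = 60, no-cache="set-cookie"'
pairs = [split_directive(d) for d in header.split(',')]
```

Like the code in the diff, this splits on every comma, so a quoted value that itself contains commas would be cut into pieces; handling that case would need a real tokenizer.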