Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Is_gzipped for application/x-gzip;charset=utf-8 #2050

Merged
merged 7 commits into from Jul 8, 2016
Merged

[MRG+1] Is_gzipped for application/x-gzip;charset=utf-8 #2050

merged 7 commits into from Jul 8, 2016

Conversation

@Tethik
Copy link
Contributor

@Tethik Tethik commented Jun 12, 2016

Fix for #2049

@codecov-io
Copy link

@codecov-io codecov-io commented Jun 12, 2016

Current coverage is 83.34%

Merging #2050 into master will increase coverage by <.01%

Powered by Codecov. Last updated by b7925e4...23f99e9

@@ -54,4 +54,4 @@ def gunzip(data):
def is_gzipped(response):
"""Return True if the response is gzipped, or False otherwise"""
ctype = response.headers.get('Content-Type', b'')
return ctype in (b'application/x-gzip', b'application/gzip')
return b'application/x-gzip' in ctype or b'application/gzip' in ctype

This comment has been minimized.

@robsonpeixoto

robsonpeixoto Jun 12, 2016

Just a suggestion:

from six.moves import map # just if you would like a lazy solution

mimes = (b'application/x-gzip', b'application/gzip')
return any(map(lambda mime: mime in ctype, mimes))

This comment has been minimized.

@Tethik

Tethik Jun 12, 2016
Author Contributor

I did it as an any-statement first, but figured it would be overkill since there are only two comparisons. Another version:

any(mime in ctype for mime in (b'application/x-gzip', b'application/gzip'))
def test_is_gzipped(self):
hdrs = Headers({"Content-Type": "application/x-gzip"})
r1 = Response("http://www.example.com", headers=hdrs)
self.assertTrue(is_gzipped(r1))

This comment has been minimized.

@robsonpeixoto

robsonpeixoto Jun 12, 2016

I do not known about the scrapy code style, but IMO it should be 4 different tests.

Joakim Uddholm
@redapple
Copy link
Contributor

@redapple redapple commented Jun 14, 2016

According to RFC 7231, section-3.1.1.1,

The type, subtype, and parameter name tokens are case-insensitive.

so it's worth fixing that too.

Suggestion:

$ git diff
diff --git a/scrapy/utils/gz.py b/scrapy/utils/gz.py
index f174950..e520b42 100644
--- a/scrapy/utils/gz.py
+++ b/scrapy/utils/gz.py
@@ -1,3 +1,4 @@
+import re
 import struct

 try:
@@ -51,7 +52,9 @@ def gunzip(data):
     return output


+_is_gzipped_re = re.compile(br'^application/(x-)?gzip\b', re.I)
+
 def is_gzipped(response):
     """Return True if the response is gzipped, or False otherwise"""
     ctype = response.headers.get('Content-Type', b'')
-    return b'application/x-gzip' in ctype or b'application/gzip' in ctype
+    return _is_gzipped_re.search(ctype)
diff --git a/tests/test_utils_gz.py b/tests/test_utils_gz.py
index 3648d5c..02bf1d7 100644
--- a/tests/test_utils_gz.py
+++ b/tests/test_utils_gz.py
@@ -37,6 +37,16 @@ class GunzipTest(unittest.TestCase):
         r1 = Response("http://www.example.com", headers=hdrs)
         self.assertTrue(is_gzipped(r1))

+    def test_is_gzipped_case_insensitive(self):
+        hdrs = Headers({"Content-Type": "Application/X-Gzip"})
+        r1 = Response("http://www.example.com", headers=hdrs)
+        self.assertTrue(is_gzipped(r1))
+
+    def test_is_gzipped_not_quite(self):
+        hdrs = Headers({"Content-Type": "application/gzippppp"})
+        r1 = Response("http://www.example.com", headers=hdrs)
+        self.assertFalse(is_gzipped(r1))
+
     def test_is_gzipped_empty(self):
         r1 = Response("http://www.example.com")
         self.assertTrue(not is_gzipped(r1))
@@ -50,3 +60,7 @@ class GunzipTest(unittest.TestCase):
         hdrs = Headers({"Content-Type": "application/x-gzip;charset=utf-8"})
         r1 = Response("http://www.example.com", headers=hdrs)
         self.assertTrue(is_gzipped(r1))
+
+        hdrs = Headers({"Content-Type": "application/X-GZIP ; charset=utf-8"})
+        r1 = Response("http://www.example.com", headers=hdrs)
+        self.assertTrue(is_gzipped(r1))

@Tethik
Copy link
Contributor Author

@Tethik Tethik commented Jun 14, 2016

@redapple Alright, I'll look into it. Good catch on the gzipppp

@Tethik
Copy link
Contributor Author

@Tethik Tethik commented Jun 14, 2016

Done


def is_gzipped(response):
"""Return True if the response is gzipped, or False otherwise"""
ctype = response.headers.get('Content-Type', b'')
return ctype in (b'application/x-gzip', b'application/gzip')
return not _is_gzipped_re.search(ctype) is None

This comment has been minimized.

@redapple

redapple Jun 14, 2016
Contributor

I usually prefer return _is_gzipped_re.search(ctype) is not None

This comment has been minimized.

@Tethik

Tethik Jun 14, 2016
Author Contributor

Zzzz... fine ;)

This comment has been minimized.

@redapple

redapple Jun 14, 2016
Contributor

Sorry for nitpicking, but I first read (quickly) "not is_gzipped".

This comment has been minimized.

@Tethik

Tethik Jun 14, 2016
Author Contributor

Nahh it's fine. I agree is not looks better. I'm just lazy :)

Joakim Uddholm
@redapple
Copy link
Contributor

@redapple redapple commented Jun 14, 2016

Looks good to me.
Thanks @Tethik!

@redapple redapple changed the title Is_gzipped for application/x-gzip;charset=utf-8 [MRG+1] Is_gzipped for application/x-gzip;charset=utf-8 Jun 14, 2016
@@ -27,3 +28,39 @@ def test_gunzip_truncated_short(self):
with open(join(SAMPLEDIR, 'truncated-crc-error-short.gz'), 'rb') as f:
text = gunzip(f.read())
assert text.endswith(b'</html>')

def test_is_gzipped_right(self):

This comment has been minimized.

@robsonpeixoto

robsonpeixoto Jun 14, 2016

IMO should break this test in two different:

def test_is_gzipped_right(self):
    hdrs = Headers({"Content-Type": "application/gzip"})
    r1 = Response("http://www.example.com", headers=hdrs)
    self.assertTrue(is_gzipped(r1))

def test_is_x_gzipped_right(self):
    hdrs = Headers({"Content-Type": "application/x-gzip"})
    r1 = Response("http://www.example.com", headers=hdrs)
    self.assertTrue(is_gzipped(r1))
Joakim Uddholm
@eliasdorneles eliasdorneles merged commit d43a357 into scrapy:master Jul 8, 2016
2 checks passed
2 checks passed
codecov/patch 100% of diff hit (target 100%)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@redapple redapple modified the milestone: v1.1.1 Jul 8, 2016
eliasdorneles added a commit that referenced this pull request Jul 8, 2016
[backport][1.1] Is_gzipped for application/x-gzip;charset=utf-8 (PR #2050)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants