UnicodeEncodeError handling gzipped responses on Google App Engine #2595

Closed
agfor opened this issue May 11, 2015 · 17 comments

Comments

@agfor

agfor commented May 11, 2015

braintree/braintree_python#53 originally reported the issue. It's due to using requests 2.6.1+, which pulled in a version of urllib3 including f21c2a2b73e4256ba2787f8470dbee6872987d2d, which causes the problem.

I am able to reproduce this in App Engine development mode by switching https://github.com/agfor/braintree-python-appengine to requests 2.6.1+ and uncommenting the code in main.py that enables development mode.

@Lukasa
Member

Lukasa commented May 11, 2015

Have you tried requests 2.7.0? That fixed a number of problems associated with that urllib3 release.

@agfor
Author

agfor commented May 11, 2015

@Lukasa The problem happens in requests 2.7.0 as well -- all released versions after 2.6.0. The problem is in urllib3 versions before urllib3/urllib3@22a9713, so would be fixed by merging in a more recent urllib3.

@sigmavirus24
Contributor

@agfor urllib3/urllib3@22a9713 should be in 2.7.0.

@agfor
Author

agfor commented May 11, 2015

@sigmavirus24 @Lukasa It looks like there are actually two different problems, and that commit only fixed one. With requests 2.6.2 / urllib3 before urllib3/urllib3@22a9713 it blows up inside requests:

  File "/Users/agf/projects/appengine/braintree-python/requests/api.py", line 108, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/Users/agf/projects/appengine/braintree-python/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/Users/agf/projects/appengine/braintree-python/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/agf/projects/appengine/braintree-python/requests/sessions.py", line 605, in send
    r.content
  File "/Users/agf/projects/appengine/braintree-python/requests/models.py", line 750, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "/Users/agf/projects/appengine/braintree-python/requests/models.py", line 673, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "/Users/agf/projects/appengine/braintree-python/requests/packages/urllib3/response.py", line 304, in stream
    for line in self.read_chunked(amt):
  File "/Users/agf/projects/appengine/braintree-python/requests/packages/urllib3/response.py", line 401, in read_chunked
    line = line.decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 1: ordinal not in range(128)

but after that commit / in 2.7.0, we blow up later inside the Braintree library. This appears to be because requests has tried to convert gzipped data to Unicode as if it had already been un-gzipped, and so replaced most of it with the unicode replacement character:

  File "/Users/agf/projects/appengine/braintree-python/braintree/util/http.py", line 73, in __http_do
    return XmlUtil.dict_from_xml(response_body)
  File "/Users/agf/projects/appengine/braintree-python/braintree/util/xml_util.py", line 11, in dict_from_xml
    return Parser(xml).parse()
  File "/Users/agf/projects/appengine/braintree-python/braintree/util/parser.py", line 15, in __init__
    self.doc = minidom.parseString("><".join(re.split(">\s+<", xml)).strip())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/minidom.py", line 1924, in parseString
    return expatbuilder.parseString(string)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)

Example Unicode code points for the response body:

[31, 65533, 8, 0, 65533, 51, 81, 85, 0, 3, 65533, 88, 77, 111, 65533, 54, 16, 65533, 65533, 48, 124, 103, 36, 123, 65533, 32, 27, 40, 10, 10, 20, 5, 1698, 64, 65533, 77, 65533, 65533, 94, 2, 90, 28, 91, 108, 40, 65533, 37, 41, 65533, 943, 65533, 65533, 65533, 20, 81, 73, 78, 69, 15, 65533, 602, 121, 28, 14, 103, 65533, 51, 65533, 65533, 58261, 88, 28, 65533, 88, 65533, 65533, 65533, 114, 117, 65533, 46, 23, 32, 11, 376, 65533, 65533, 46, 31, 65533, 127, 34, 65533, 763, 65533, 83, 65533, 12, 65533, 65533, 22, 14, 81, 65533, 65533, 65533, 34, 65533, 44, 65533, 65533, 65533, 65533, 65533, 89, 65533, 63, 65533, 65533, 58, 65533, 106, 65533, 65533, 1685, 65533, 65533, 23, 96, 89, 1162, 65533, 1437, 52, 65533, 10, 530, 65533, 65533, 714, 65533, 24, 65533, 65533, 68, 65533, 85, 4, 65533, 65533, 65533, 65533, 65533, 89, 50, 21, 123, 48, 65533, 84, 45, 93, 65533, 74, 47, 65533, 52, 75, 65533, 47, 65533, 65533, 65533, 20, 37, 65533, 65533, 1058, 65533, 66, 65533, 65533, 84, 87, 5, 65533, 65533, 126, 65533, 47, 65533, 65533, 65533, 65533, 85, 65533, 37, 49, 65533, 95, 65533, 12, 3, 65533, 31, 11, 65533, 65533, 65533, 1177, 26, 65533, 73, 65533, 1, 65533, 17, 65533, 22, 65533, 65533, 65533, 37, 65533, 79, 65533, 43, 88, 65533, 65533, 116, 65533, 33, 37292, 86, 65533, 65533, 65533, 65533, 65533, 65533, 102, 115, 65533, 39, 65533, 65533, 47, 8, 65533, 107, 65533, 62, 65533, 126, 65533, 65533, 65533, 11, 65533, 65533, 88, 65533, 65533, 107, 65533, 17, 65533, 65533, 65533, 67, 20, 60977, 65533, 72, 90, 65, 68, 41, 36654, 80, 65533, 65533, 65533, 20, 65533, 64, 69, 65533, 65533, 543, 97, 107, 65533, 65533, 65533, 1189, 65533, 49, 65533, 65533, 30, 39, 65, 77, 65533, 65533, 694, 92, 8, 44, 65533, 127, 65533, 65533, 65533, 25, 0, 65533, 3, 65533, 12, 88, 27, 11, 65533, 1089, 100, 62, 19, 65533, 16, 65533, 10, 42, 65533, 65533, 65533, 55, 65533, 65533, 59, 18, 65533, 65533, 65533, 65533, 32, 66, 65, 71, 65533, 354, 52, 65533, 65533, 65533, 52, 106, 65533, 65533, 80, 65533, 75, 65533, 
65533, 16, 65533, 65533, 123, 40, 89, 99, 54, 120, 49, 65533, 65533, 32, 57, 120, 65533, 93, 45, 89, 65533, 65533, 26, 65533, 86, 57, 53, 65533, 65533, 70, 74, 12, 65533, 121, 268, 88, 112, 78, 64, 5, 120, 43, 65533, 65533, 21, 101, 20, 83, 114, 65533, 65533, 47, 65533, 65533, 104, 65533, 12, 65533, 65533, 118, 46, 65533, 65533, 32, 65533, 109, 0, 65533, 96, 9, 24, 65533, 12, 65533, 24, 105, 37, 45, 68, 65533, 22, 112, 65533, 65533, 65533, 65533, 65533, 65533, 65533, 65533, 65533, 4, 116, 38, 65533, 89, 65533, 91, 121, 19, 19, 65533, 113, 56, 76, 87, 78, 65533, 30, 65533, 65533, 65533, 65533, 76, 79, 65533, 65533, 11, 65533, 42, 65533, 89, 96, 65533, 65533, 884, 81, 5, 65533, 113, 65533, 65533, 36, 13, 65533, 96, 65533, 65533, 95, 127, 65533, 65533, 37, 75, 65533, 4, 65533, 65533, 65533, 93, 89, 65533, 126, 52, 65533, 105, 103, 86, 58, 65533, 65533, 65533, 7, 65533, 65533, 65533, 31, 65533, 115, 65533, 16, 90, 440, 65533, 4, 65533, 63, 65533, 77, 65533, 122, 80, 65533, 65533, 9, 65533, 97, 65533, 113, 5, 65533, 65533, 22, 65533, 52, 34, 65533, 65533, 760, 75, 51, 65533, 103, 80, 65533, 30, 73, 51, 65533, 42, 56, 66, 65533, 65533, 57, 65533, 85, 74, 0, 65533, 65533, 124, 71, 65533, 65533, 116, 65533, 7, 116, 115, 29, 79, 65, 10, 106, 88, 91, 65533, 78, 61, 65, 65533, 14, 110, 65533, 65533, 47, 87, 65533, 65533, 119, 67, 57, 65533, 35, 65533, 121, 35, 110, 63, 1707, 65533, 38, 73, 32, 55, 127, 112, 75, 65533, 86, 65533, 65533, 85, 104, 110, 65533, 84, 86, 74, 65533, 50, 79, 55, 89, 50, 17, 78, 65533, 39, 65533, 6, 41, 65533, 58, 29, 65533, 65533, 65533, 1911, 65533, 65533, 65533, 55, 65533, 64, 65533, 30, 65533, 65533, 39, 65533, 89, 122, 65533, 65533, 84, 34, 4, 59, 65533, 62, 120, 69, 65533, 64, 106, 35, 65533, 65533, 57, 109, 111, 65533, 65533, 90, 108, 65533, 65533, 98, 107, 40, 65533, 65533, 1716, 65533, 126, 65533, 125, 51, 65533, 65533, 65533, 59, 65533, 99, 5, 88, 65533, 65533, 81, 65533, 65533, 74, 14, 120, 65533, 11, 45, 65533, 119, 32, 15, 65533, 40, 
65533, 1, 65533, 65533, 74, 65533, 85, 71, 17149, 65533, 102, 65533, 90, 65533, 107, 104, 83, 65533, 65533, 31, 65533, 28, 65533, 484, 10, 66, 65533, 65533, 65533, 65533, 65533, 30, 65533, 84, 65533, 1175, 107, 35, 104, 65533, 37, 80, 65533, 74, 60, 33, 65533, 17, 3, 89, 3, 98, 65533, 65533, 65533, 111, 62, 91, 85, 109, 48, 65533, 88, 65533, 65533, 90, 120, 46, 54, 64, 65533, 65533, 65533, 65533, 19, 75, 78, 65533, 25, 58, 65533, 65533, 78, 1235, 81, 98, 65533, 65533, 4, 109, 65533, 65533, 65533, 65533, 117, 65533, 65533, 79, 103, 65533, 72, 58, 110, 65533, 106, 71, 65533, 65533, 65533, 2, 65533, 65533, 78, 65533, 93, 65533, 20, 65533, 65533, 64, 120, 7, 97, 65533, 101, 13, 65533, 0, 65533, 82, 306, 65533, 65533, 75, 65533, 65533, 91, 26, 55, 65533, 68, 45, 65533, 39, 37, 474, 51, 65533, 17, 21, 65533, 7, 105, 64, 0, 26, 106, 65533, 65533, 111, 30, 65533, 97, 84, 84, 65533, 65533, 20, 65533, 65533, 65533, 22, 65533, 124, 116, 4, 65533, 1851, 65533, 41, 61, 65533, 84, 65533, 65533, 119, 65533, 1555, 65533, 87, 66, 65533, 65533, 27, 65533, 22, 65533, 65533, 79, 65, 78, 53, 71, 47, 65533, 65533, 65533, 65533, 65533, 67, 65533, 65533, 54, 48, 77, 65533, 19, 52, 898, 65533, 45, 12, 1523, 44, 105, 65533, 65533, 59, 84, 65533, 104, 68, 65533, 84, 86, 65533, 32, 17, 33, 62, 65533, 65533, 94, 54, 66, 65533, 91, 65533, 69, 65533, 65533, 65533, 125, 124, 65533, 39, 65533, 65533, 35, 4, 65533, 113, 27, 65533, 46, 65533, 65533, 394, 65533, 65533, 65533, 76, 65533, 65533, 123, 62, 96, 127, 65533, 65533, 54, 54, 65533, 65533, 639, 21, 65533, 92, 51, 85, 65533, 65533, 911, 65533, 63, 9, 83, 65533, 65533, 65533, 31, 85, 59, 65533, 65533, 33, 65533, 85, 996, 65533, 65533, 68, 65533, 97, 65533, 65533, 65533, 54, 12, 65533, 65533, 65533, 23, 65533, 65533, 58, 65533, 72, 21, 65533, 832, 65533, 439, 31, 99, 38, 65533, 65533, 15, 65533, 65533, 65533, 3, 65533, 65533, 65533, 65533, 65533, 65533, 92, 29, 43, 21, 73, 91, 65533, 96, 93, 20, 17, 118, 65533, 25, 65533, 57, 65533, 63, 65533, 
65533, 29, 65533, 74, 65533, 29, 25, 65533, 75, 36, 97, 117, 65533, 25, 65533, 100, 65533, 85, 30, 125, 87, 594, 57, 1048, 65533, 12, 14, 58, 102, 59, 67, 38, 51, 11, 122, 65533, 86, 65533, 62, 65533, 65533, 65533, 9, 65533, 65533, 65533, 65533, 103, 65533, 65533, 65533, 65533, 65533, 115, 65533, 65533, 65533, 65533, 65533, 75, 65533, 65533, 65533, 91, 65533, 65533, 65533, 36, 1806, 65533, 65533, 65533, 65533, 116, 65533, 727, 118, 43, 9, 97, 29, 89, 65533, 92, 65533, 67, 65533, 65533, 65533, 65533, 12, 24, 65533, 65533, 26, 103, 98, 1300, 65533, 79, 65533, 0, 0, 0, 65533, 65533, 3, 0, 124, 65533, 65533, 65533, 65533, 17, 0, 0]
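
Both tracebacks are consistent with text decoding being applied to bytes that are still gzip-compressed. A minimal sketch in Python 3 (the thread itself is Python 2.7), assuming UTF-8 as the text encoding and a made-up XML body, neither of which is stated in the thread:

```python
import gzip

# Made-up stand-in for the gateway's XML response (an assumption).
xml_body = b"<transaction><amount>10.00</amount></transaction>"
compressed = gzip.compress(xml_body)

# A gzip stream starts with the bytes 0x1f 0x8b; 0x8b is not ASCII,
# which matches "can't decode byte 0x8b in position 1" above.
try:
    compressed.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with errors="replace" never raises; undecodable bytes are
# replaced with U+FFFD (65533) instead.
garbled = compressed.decode("utf-8", errors="replace")
code_points = [ord(ch) for ch in garbled]
print(code_points[:3])  # [31, 65533, 8], like the start of the list above

# Re-encoding the garbled text as ASCII then fails on U+FFFD.
try:
    garbled.encode("ascii")
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc
```

The leading 31, 65533, 8 in the dump above corresponds to the gzip header bytes 0x1f, 0x8b, 0x08, with the middle byte replaced.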

Versions 2.6.0 and below work.

@sigmavirus24
Contributor

This appears to be because requests has tried to convert gzipped data to Unicode as if it had already been un-gzipped

Can you provide how you're using requests so that we can understand where this comes from?

@agfor
Author

agfor commented May 12, 2015

The Braintree client library calls response.text on a gzip-encoded response from the Braintree gateway to get the decoded response body. It doesn't work properly on Google App Engine, presumably because GAE proxies the request and causes the response to be chunked.

This diff against 2.7.0 causes things to work as they did on 2.6.0 and earlier:

diff --git a/requests/models.py b/requests/models.py
index 45b3ea9..bbe9437 100644
--- a/requests/models.py
+++ b/requests/models.py
@@ -781,9 +781,13 @@ class Response(object):
         if self.encoding is None:
             encoding = self.apparent_encoding

+        import gzip
+        content = gzip.GzipFile(fileobj=StringIO(self.content)).read()
+
+
         # Decode unicode from given encoding.
         try:
-            content = str(self.content, encoding, errors='replace')
+            content = str(content, encoding, errors='replace')
         except (LookupError, TypeError):
             # A LookupError is raised if the encoding was not found which could
             # indicate a misspelling or similar mistake.
@@ -791,7 +795,7 @@ class Response(object):
             # A TypeError can be raised if encoding is None
             #
             # So we try blindly encoding.
-            content = str(self.content, errors='replace')
+            content = str(content, errors='replace')

         return content

But I haven't had time to dig into why the content hasn't been gzip-decoded yet at this point, so I don't have a real fix.
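
The diff above gunzips unconditionally, so it would fail on any response body that arrives already decoded. A slightly more defensive sketch of the same stopgap, written for Python 3, checks for the gzip magic bytes first. `maybe_gunzip` is a hypothetical helper for illustration only; it is not part of requests or urllib3, and it does not address the underlying bug:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data):
    """Return data decompressed if it looks like gzip, else unchanged.

    Hypothetical helper; not part of requests or urllib3.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Made-up body for demonstration (an assumption).
body = b"<transaction><amount>10.00</amount></transaction>"
print(maybe_gunzip(gzip.compress(body)) == body)  # True
print(maybe_gunzip(body) == body)                 # True
```

Application code could then call something like `maybe_gunzip(response.content)` and decode the result itself instead of relying on `response.text`.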

@Lukasa
Member

Lukasa commented May 12, 2015

I really need to find a sample URL, I think.

@agfor
Author

agfor commented May 12, 2015

@sigmavirus24 @Lukasa I haven't been able to reproduce off of Google App Engine, so here is a minimal GAE repo you can test with the development environment that exhibits the problem: https://github.com/agfor/requests-2.7-appengine-fail

The core of it is:

import webapp2
import requests
import gzip
from StringIO import StringIO


URL = "https://api.sandbox.braintreegateway.com:443/merchants/pgd875t7kmgp5q6x/transactions"
HEADERS = {
    'X-ApiVersion': '4',
    'Content-type': 'application/xml',
    'Authorization': 'Basic aHl6a2h4OGRtcDd4Zmc1aDoxZWZiN2YzMWM0ZjM1ODIwNmIxMjk4OTIzYWU0OGE2YQ==',
    'Accept': 'application/xml',
    'User-Agent': 'Braintree Python 3.5.0'
}
BODY = """
<transaction>
    <amount>10.00</amount>
    <type>sale</type>
    <credit_card>
        <expiration_date>05/2020</expiration_date>
        <number>4111111111111111</number>
    </credit_card>
</transaction>
"""

class MainHandler(webapp2.RequestHandler):
    def get(self):
        response = requests.post(URL, headers=HEADERS, data=BODY)
        # this is still gzipped data
        print response.content
        # what response.content should be
        print gzip.GzipFile(fileobj=StringIO(response.content)).read()
        # essentially what is blowing up in the braintree library
        response.text.encode('ascii')

app = webapp2.WSGIApplication([
    ('/', MainHandler)
], debug=True)

It will dump the still-gzipped response body and the un-gzipped response body to the logs, then blow up on the response.text.encode('ascii') line, because text produces gibberish full of Unicode replacement characters from trying to decode still-gzipped data as text.

@Lukasa
Member

Lukasa commented May 12, 2015

Aha, this is interesting. I also cannot reproduce this, so it's actually important that we're on GAE. It seems like this is going to be caused by GAE being different to stdlib Python.

This means, unfortunately, that this is a urllib3 issue (requests is unlikely to be at fault here), and in particular I think GAE is screwing this up. Unfortunately, it's screwing it up silently, which is doubly bad.

@shazow What's your position on GAE support in urllib3? I know requests considers it an unsupported platform...

@shazow
Contributor

shazow commented May 12, 2015

I'd like to support GAE on urllib3. It used to work great, then GAE changed some stuff and now it's not. :/ Needs more investigation.

@shazow
Contributor

shazow commented May 12, 2015

Possibly related to urllib3/urllib3#583

@agfor
Author

agfor commented May 12, 2015

@shazow Just to close the loop, I was able to bisect the issue back to urllib3/urllib3@f21c2a2 specifically.

@Lukasa
Member

Lukasa commented May 12, 2015

Here's a theory: does GAE patch HTTPResponse.read to do special GAE things, and our new read_chunked method break that?
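
To make the shape of that theory concrete, here is a toy Python 3 sketch of how a new code path can bypass a platform's patched `read`. Every name in it (`Response`, `platform_patch`, `read_directly`) is hypothetical; it does not reproduce what GAE or urllib3 actually do, only the general failure mode being proposed:

```python
import io


class Response:
    """Toy response object; NOT httplib's or urllib3's real class."""

    def __init__(self, body):
        self._fp = io.BytesIO(body)

    def read(self):
        return self._fp.read()


def platform_patch(cls):
    """Stand-in for a platform wrapping read() with its own handling."""
    original_read = cls.read

    def patched_read(self):
        return b"[platform-handled]" + original_read(self)

    cls.read = patched_read


def read_directly(response):
    """A newer code path that reads the underlying file object itself,
    so the platform's patched read() never runs."""
    return response._fp.read()


platform_patch(Response)
via_read = Response(b"body").read()
via_direct = read_directly(Response(b"body"))
print(via_read)    # b'[platform-handled]body'
print(via_direct)  # b'body'
```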

@sigmavirus24
Contributor

Here's a suggestion: Since GAE is a for-profit platform, why doesn't GAE provide support for it for OSS projects that attempt to support it? Or better yet, will they give us an environment to test against and pay us to support it? If not, I'm not certain any of us should be really supporting it.

@shazow
Contributor

shazow commented May 12, 2015

@agfor Ah good find, thanks.

Freakin' chunked encoding, breaking errything.

Does this mean GAE still works for non-chunked/non-streaming?

@sigmavirus24 Lovely idea, any interest in reaching out to the GAE team and asking if they'll fund this? :P

@Lukasa
Member

Lukasa commented May 12, 2015

Alrighty, looks like we're moving this to urllib3/urllib3#618.

@Lukasa closed this as completed May 12, 2015

@agfor
Author

agfor commented May 13, 2015

There are a few possible workarounds for this problem:

  1. Use requests 2.6.0 or earlier
  2. Use requests 2.6.1 - 2.7.0 with agfor/requests@da863cc
  3. Use requests 2.7.1+ (when released) with the above patch and agfor/requests@a1847ae
  4. Switch the GAE httplib to use sockets by adding this to app.yaml:
env_variables:
  GAE_USE_SOCKETS_HTTPLIB : 'anyvalue'

That can affect your billing so please be careful before changing it.
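
For the first workaround, an application might want to fail fast when an affected requests version is bundled. A rough Python 3 sketch, assuming plain `X.Y.Z` version strings; `parse_version` and `in_reported_range` are hypothetical helpers, and the range covers only the releases named in this thread (after 2.6.0, up to and including 2.7.0):

```python
def parse_version(version):
    """Turn '2.6.1' into (2, 6, 1). Assumes plain dotted integers."""
    return tuple(int(part) for part in version.split("."))


def in_reported_range(version):
    """True for requests versions after 2.6.0 up to and including 2.7.0,
    the range reported as failing in this thread."""
    return (2, 6, 0) < parse_version(version) <= (2, 7, 0)


for candidate in ("2.6.0", "2.6.1", "2.6.2", "2.7.0"):
    print(candidate, in_reported_range(candidate))
```

In an app this could be called with `requests.__version__`. Whether releases after 2.7.0 are affected is not settled in this thread, so the check deliberately says nothing about them.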

@github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021