urllib2: Content-Encoding #53709

Closed
guest mannequin opened this issue Aug 3, 2010 · 4 comments
Labels
easy, stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)

Comments

guest mannequin commented Aug 3, 2010

BPO 9500
Nosy @orsenthil, @bitdancer
Superseder
  • bpo-1508475: transparent gzip compression in urllib
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-11-20.20:17:57.572>
    created_at = <Date 2010-08-03.22:20:45.713>
    labels = ['easy', 'type-feature', 'library']
    title = 'urllib2: Content-Encoding'
    updated_at = <Date 2010-11-20.20:17:57.571>
    user = 'https://bugs.python.org/guest'

    bugs.python.org fields:

    activity = <Date 2010-11-20.20:17:57.571>
    actor = 'r.david.murray'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-11-20.20:17:57.572>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2010-08-03.22:20:45.713>
    creator = 'guest'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 9500
    keywords = ['easy']
    message_count = 4.0
    messages = ['112707', '112744', '112796', '121754']
    nosy_count = 4.0
    nosy_names = ['orsenthil', 'dstanek', 'r.david.murray', 'guest']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '1508475'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue9500'
    versions = ['Python 3.2']

    guest mannequin commented Aug 3, 2010

    urllib2 doesn't support any real-world Content-Encoding scheme.

    "gzip" and "deflate" are the standard compression schemes for HTTP and are expected to be implemented by all clients. None of the default urllib2 handlers implements them.

    Common workarounds can be found via Google. Due to the lack of library support, many people resort to fixing up HTTP responses within their application logic (not good), and some have written proper urllib2 handlers. Here's one for gzip support, with deflate/zlib support hacked on (HTTP "deflate" is ambiguous in practice between the zlib-wrapped and raw deflate formats, hence some buggy servers):

    # http://techknack.net/python-urllib2-handlers/ (Python 2)
    import urllib2
    import zlib
    from gzip import GzipFile
    from StringIO import StringIO

    # deflate support
    def deflate(data):
        # HTTP "deflate" should be zlib-wrapped data, but some servers send a
        # raw deflate stream instead, so try raw first and fall back to zlib.
        try:
            return zlib.decompress(data, -zlib.MAX_WBITS)
        except zlib.error:
            return zlib.decompress(data)

    class ContentEncodingProcessor(urllib2.BaseHandler):
        """A handler to add gzip/deflate capabilities to urllib2 requests."""

        # add headers to requests
        def http_request(self, req):
            req.add_header("Accept-Encoding", "gzip, deflate")
            return req

        # decode the response body according to Content-Encoding
        def http_response(self, req, resp):
            old_resp = resp
            encoding = resp.headers.get("content-encoding")
            if encoding == "gzip":
                fp = GzipFile(fileobj=StringIO(resp.read()), mode="r")
            elif encoding == "deflate":
                fp = StringIO(deflate(resp.read()))
            else:
                return resp
            # addinfourl wraps a file object and adds info() and geturl()
            resp = urllib2.addinfourl(fp, old_resp.headers, old_resp.url, old_resp.code)
            resp.msg = old_resp.msg
            return resp

        # also handle HTTPS requests and responses
        https_request = http_request
        https_response = http_response
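    To make the zlib-vs-raw distinction concrete, here is a small Python 3 sketch (not part of the original report) that builds a "deflate" body in both formats plus a gzip body, and decodes the deflate variants with the same raw-then-zlib fallback as the deflate() helper in the handler. PAYLOAD, compress_raw_deflate and inflate_any are illustrative names only.

```python
import gzip
import zlib

PAYLOAD = b"Hello, Content-Encoding! " * 4  # illustrative response body


def compress_raw_deflate(data: bytes) -> bytes:
    """Raw DEFLATE stream (RFC 1951): no header, no checksum."""
    # Negative wbits tells zlib to omit the zlib header and Adler-32 trailer.
    compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


def inflate_any(data: bytes) -> bytes:
    """Decode an HTTP "deflate" body in either format (raw first, then zlib)."""
    try:
        return zlib.decompress(data, -zlib.MAX_WBITS)
    except zlib.error:
        return zlib.decompress(data)


zlib_body = zlib.compress(PAYLOAD)        # zlib-wrapped: what HTTP "deflate" means
raw_body = compress_raw_deflate(PAYLOAD)  # raw deflate: what some servers send
gzip_body = gzip.compress(PAYLOAD)        # "gzip" coding (RFC 1952 container)

# The zlib format starts with a CMF byte of 0x78 (deflate, 32 KiB window);
# gzip starts with the magic number 0x1f 0x8b; raw deflate has no header.
print(zlib_body[:2].hex(), gzip_body[:2].hex())

for name, body in (("zlib-wrapped", zlib_body), ("raw", raw_body)):
    assert inflate_any(body) == PAYLOAD, name
assert gzip.decompress(gzip_body) == PAYLOAD
```

    Trying raw deflate first works because zlib-wrapped input almost never parses as a valid raw stream, so the decompressor raises zlib.error and the fallback takes over.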

    guest mannequin added the stdlib label Aug 3, 2010
    @bitdancer
    Member

    Thanks for the suggestion.

    New features can only go into Python 3, where urllib and urllib2 have been merged into the urllib package. So what we would need in order to consider this for acceptance is a patch against the py3k trunk urllib. Please see http://python.org/dev for information about how to develop a patch for submission.
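    Since a feature would have to target Python 3's urllib.request, here is a rough sketch of how the same handler might look there. It is a hypothetical port written for illustration, not the patch attached to bpo-1508475; ContentEncodingHandler and _decode_deflate are invented names, and the sketch assumes only the BaseHandler processor hooks (http_request/http_response) and the urllib.response.addinfourl wrapper class.

```python
import gzip
import io
import urllib.request
import urllib.response
import zlib


def _decode_deflate(data: bytes) -> bytes:
    """HTTP "deflate": try a raw stream first, then the zlib-wrapped format."""
    try:
        return zlib.decompress(data, -zlib.MAX_WBITS)
    except zlib.error:
        return zlib.decompress(data)


class ContentEncodingHandler(urllib.request.BaseHandler):
    """Hypothetical Python 3 port of the urllib2 ContentEncodingProcessor."""

    def http_request(self, req):
        # Advertise the two codings this handler knows how to undo.
        req.add_header("Accept-Encoding", "gzip, deflate")
        return req

    def http_response(self, req, response):
        coding = (response.headers.get("Content-Encoding") or "").strip().lower()
        if coding == "gzip":
            body = gzip.decompress(response.read())
        elif coding == "deflate":
            body = _decode_deflate(response.read())
        else:
            return response  # identity or unsupported coding: pass through

        # Re-wrap the decoded bytes; headers are passed through unchanged,
        # so Content-Length (if present) still refers to the compressed size.
        decoded = urllib.response.addinfourl(
            io.BytesIO(body), response.headers, response.url, response.code)
        decoded.msg = response.msg  # HTTPErrorProcessor reads .msg later
        response.close()
        return decoded

    # Apply the same processing to HTTPS.
    https_request = http_request
    https_response = http_response
```

    It would be installed with urllib.request.build_opener(ContentEncodingHandler()). The sketch handles only a single coding per response (no stacked Content-Encoding values) and decodes the whole body in memory, which keeps it short but is not suitable for very large downloads.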

    bitdancer added the easy and type-feature labels Aug 4, 2010
    guest mannequin commented Aug 4, 2010

    Sorry, I only took the time to report it. Since I don't run Python 3, I can't write a patch anyway, and a patch wouldn't help my current Python 2.x setups either.
    I guess it's sufficient if this is googleable; per-application workarounds are perfectly fine, as Python 2 isn't that widely used for web apps.

    Also, httplib2 supports Content-Encoding. It still has the raw deflate vs. zlib bug, but that can be fixed. And as an externally distributed library, it will remedy the situation for all apps and all Python versions < 2.8.

    However, it might be a better idea to add a note such as "No default handler for Content-Encoding..." to the urllib/urllib2 documentation instead, because many people have stumbled on this before (see Google/Stack Overflow).

    @bitdancer
    Member

    bpo-1508475 has a patch, though it still needs to be updated.

    ezio-melotti transferred this issue from another repository Apr 10, 2022