Fix incorrect content-header assignment and charset bug #249

Closed
wants to merge 2 commits into
from

Projects

None yet

4 participants

mime_magic fix for Issue #247:
Originally mime_magic could assign the charset when it assigned
content_encoding. The change in 85bb3ed removed the ability of
mime_magic to set the character set. This reverts that mistake.

Cleanup fix 85bb3ed for bug #153:
The content-encoding header assignment is removed since the
intent was a mistake as described in bug #153.
adoliver added some commits Nov 28, 2013
@adoliver adoliver Allow mime_magic to assign charset, code cleanup.
mime_magic fix for Issue #247:
Originally mime_magic could assign the charset when it assigned
content_encoding. The change in 85bb3ed removed the ability of
mime_magic to set the character set. This reverts that mistake.

Cleanup fix 85bb3ed for bug #153:
The content-encoding header assignment is removed since the
intent was a mistake as described in bug #153.
d480998
@adoliver adoliver Add check for null content_charset 84480a4

Improvement on #213

My S3-hosted website was returning the header "content-encoding: utf-8" which invalid and confused clients like the W3C validator. I applied this patch, re-uploaded, and the problem was fixed. Thanks @adoliver !

Member
mdomsch commented Feb 2, 2014

I'm going to need some additional review on this one. I don't quite get this series, what's gone right and wrong over it. Some examples of the before and after behavior, and fixup to match current upstream master branch would be appreciated.

Member
mdomsch commented Feb 2, 2014

Why are we reporting a content-encoding at all on gzip'd files? I would expect the web server to add a Content-Encoding header if it gzip's the file itself, so the receiver knows to automatically gunzip it. Files starting out on disk as foo.gz shouldn't get a Content-Enconding: gzip header on it, otherwise, when downloaded, the file will get written to the disk as foo.gz, but will no longer be gzipped. yes?

adoliver commented Feb 4, 2014

I agree that the Content-Encoding should not be added by the upload client. You are correct about how that header should be working. That is why I removed the header from the code, the same conclusion was reached in #153 but the subsequent commit did not clean up the code well.
It has been awhile so I will update my fork and review the specific reasons for my changes again.

Member
mdomsch commented Feb 10, 2014

adoliver - any progress with this please?

Sorry, I had some unexpected demands on my time. I'll be able to devote
time this weekend if not sooner to syncing up my fork.

Member
mdomsch commented Feb 18, 2014

My bug/charset branch now has one patch which I think addresses it. I
simply rebased adoliver's patch. Does this look right now?

From 218d2f1 Mon Sep 17 00:00:00 2001
From: Matt Domsch matt@domsch.com
Date: Mon, 17 Feb 2014 22:51:31 -0600
Subject: [PATCH] set content-type, content-charset, and not content-encoding

Patches originally by adoliver, rebased for current master branch.

Content-Encoding should only be set by the HTTP server, not by us.

S3/Config.py | 1 -
S3/S3.py | 16 ++++++++--------
s3cmd | 1 -
3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/S3/Config.py b/S3/Config.py
index 302b9d0..038514b 100644
--- a/S3/Config.py
+++ b/S3/Config.py
@@ -92,7 +92,6 @@ class Config(object):
debug_exclude = {}
debug_include = {}
encoding = "utf-8"

  • add_content_encoding = True
    urlencoding_mode = "normal"
    log_target_prefix = ""
    reduced_redundancy = False
    diff --git a/S3/S3.py b/S3/S3.py
    index 4dffd6d..60f8512 100644
    --- a/S3/S3.py
    +++ b/S3/S3.py
    @@ -410,22 +410,22 @@ class S3(object):
 ## MIME-type handling
 content_type = self.config.mime_type
  •    content_encoding = None
    
  •    content_charset = None
     if filename != "-" and not content_type and
    

    self.config.guess_mime_type:
    if self.config.use_mime_magic:

  •            (content_type, content_encoding) = mime_magic(filename)
    
  •            (content_type, content_charset) = mime_magic(filename)
         else:
    
  •            (content_type, content_encoding) =
    

    mimetypes.guess_type(filename)

  •            (content_type, content_charset) =
    

    mimetypes.guess_type(filename)
    if not content_type:

  •        content_type = self.config.default_mime_type
    
  •        content_charset = self.config.default_mime_type
    
  •    if not content_charset:
    
  •        content_charset = self.config.encoding.upper()
    
     ## add charset to content type
    
  •    if self.add_encoding(filename, content_type):
    
  •        content_type = content_type + "; charset=" +
    

    self.config.encoding.upper()

  •    if self.add_encoding(filename, content_type) and content_charset
    

    is not None:

  •        content_type = content_type + "; charset=" + content_charset
    
     headers["content-type"] = content_type
    
  •    if content_encoding is not None and
    

    self.config.add_content_encoding:

  •        headers["content-encoding"] = content_encoding
    
     ## Other Amazon S3 attributes
     if self.config.acl_public:
    

    diff --git a/s3cmd b/s3cmd
    index 095ce7b..c8814e4 100755
    --- a/s3cmd
    +++ b/s3cmd
    @@ -2022,7 +2022,6 @@ def main():
    optparser.add_option( "--server-side-encryption",
    dest="server_side_encryption", action="store_true", help="Specifies that
    server-side encryption will be used when putting objects.
    ")

    optparser.add_option( "--encoding", dest="encoding",
    metavar="ENCODING", help="Override autodetected terminal and filesystem
    encoding (character set). Autodetected: %s" % preferred
    _encoding)

  • optparser.add_option( "--disable-content-encoding",
    dest="add_content_encoding", action="store_false", help="Don't include a
    Content-encoding header to the the uploaded objects")
    optparser.add_option( "--add-encoding-exts",
    dest="add_encoding_exts", metavar="EXTENSIONs", help="Add encoding to these
    comma delimited extensions i.e. (css,js,html) when uploadin
    g to S3 )")
    optparser.add_option( "--verbatim", dest="urlencoding_mode",
    action="store_const", const="verbatim", help="Use the S3 name as given on
    the command line. No pre-processing, encoding
    , etc. Use with caution!")

1.8.5.3

Member
mdomsch commented Feb 18, 2014

Oops. One fix on top of that. These are on my bug/charset branch.

diff --git a/S3/S3.py b/S3/S3.py
index 60f8512..f72e10b 100644
--- a/S3/S3.py
+++ b/S3/S3.py
@@ -417,7 +417,7 @@ class S3(object):
else:
(content_type, content_charset) =
mimetypes.guess_type(filename)
if not content_type:

  •        content_charset = self.config.default_mime_type
    
  •        content_type = self.config.default_mime_type
     if not content_charset:
         content_charset = self.config.encoding.upper()
    

On Mon, Feb 17, 2014 at 10:55 PM, Matt Domsch matt@domsch.com wrote:

My bug/charset branch now has one patch which I think addresses it. I
simply rebased adoliver's patch. Does this look right now?

From 218d2f1 Mon Sep 17 00:00:00 2001
From: Matt Domsch matt@domsch.com
Date: Mon, 17 Feb 2014 22:51:31 -0600
Subject: [PATCH] set content-type, content-charset, and not
content-encoding

Patches originally by adoliver, rebased for current master branch.

Content-Encoding should only be set by the HTTP server, not by us.

S3/Config.py | 1 -
S3/S3.py | 16 ++++++++--------
s3cmd | 1 -
3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/S3/Config.py b/S3/Config.py
index 302b9d0..038514b 100644
--- a/S3/Config.py
+++ b/S3/Config.py
@@ -92,7 +92,6 @@ class Config(object):
debug_exclude = {}
debug_include = {}
encoding = "utf-8"

  • add_content_encoding = True
    urlencoding_mode = "normal"
    log_target_prefix = ""
    reduced_redundancy = False
    diff --git a/S3/S3.py b/S3/S3.py
    index 4dffd6d..60f8512 100644
    --- a/S3/S3.py
    +++ b/S3/S3.py
    @@ -410,22 +410,22 @@ class S3(object):
 ## MIME-type handling
 content_type = self.config.mime_type
  •    content_encoding = None
    
  •    content_charset = None
     if filename != "-" and not content_type and
    

    self.config.guess_mime_type:
    if self.config.use_mime_magic:

  •            (content_type, content_encoding) = mime_magic(filename)
    
  •            (content_type, content_charset) = mime_magic(filename)
         else:
    
  •            (content_type, content_encoding) =
    

    mimetypes.guess_type(filename)

  •            (content_type, content_charset) =
    

    mimetypes.guess_type(filename)
    if not content_type:

  •        content_type = self.config.default_mime_type
    
  •        content_charset = self.config.default_mime_type
    
  •    if not content_charset:
    
  •        content_charset = self.config.encoding.upper()
    
     ## add charset to content type
    
  •    if self.add_encoding(filename, content_type):
    
  •        content_type = content_type + "; charset=" +
    

    self.config.encoding.upper()

  •    if self.add_encoding(filename, content_type) and content_charset
    

    is not None:

  •        content_type = content_type + "; charset=" + content_charset
    
     headers["content-type"] = content_type
    
  •    if content_encoding is not None and
    

    self.config.add_content_encoding:

  •        headers["content-encoding"] = content_encoding
    
     ## Other Amazon S3 attributes
     if self.config.acl_public:
    

    diff --git a/s3cmd b/s3cmd
    index 095ce7b..c8814e4 100755
    --- a/s3cmd
    +++ b/s3cmd
    @@ -2022,7 +2022,6 @@ def main():
    optparser.add_option( "--server-side-encryption",
    dest="server_side_encryption", action="store_true", help="Specifies that
    server-side encryption will be used when putting objects.
    ")

    optparser.add_option( "--encoding", dest="encoding",
    metavar="ENCODING", help="Override autodetected terminal and filesystem
    encoding (character set). Autodetected: %s" % preferred
    _encoding)

  • optparser.add_option( "--disable-content-encoding",
    dest="add_content_encoding", action="store_false", help="Don't include a
    Content-encoding header to the the uploaded objects")
    optparser.add_option( "--add-encoding-exts",
    dest="add_encoding_exts", metavar="EXTENSIONs", help="Add encoding to these
    comma delimited extensions i.e. (css,js,html) when uploadin
    g to S3 )")
    optparser.add_option( "--verbatim", dest="urlencoding_mode",
    action="store_const", const="verbatim", help="Use the S3 name as given on
    the command line. No pre-processing, encoding
    , etc. Use with caution!")

1.8.5.3

My apologies for not getting my fork ready for you. Those changes look correct.

Member
mdomsch commented Feb 22, 2014

I merged my change which was the same as adoliver's.

@mdomsch mdomsch closed this Feb 22, 2014
mietek commented Aug 30, 2014

I agree that the Content-Encoding should not be added by the upload client.

Should it not? Consider Amazon advice regarding “Serving Compressed Files from Amazon S3”:

Add a Content-Encoding header field for each compressed file and set the field value to gzip.

It would be nice if s3cmd sync had an option to do this automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment