Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.Sign up
content_encoding != charset, bug #153 #157
Thus spake Castedo Ellerman:
I just got bitten by this because I used a recent s3cmd to upload a
The problem is that the code you added is confounding the "charset"
The data member "self.config.encoding" of class S3 in S3.py is a
In contrast the variable "content_encoding" originally added by
It is with your merge that you incorrectly assign a charset
This patch from Castedo fixes the mistake.
#153 Thus spake Castedo Ellerman: I believe the commit 64eb484 you did on 2012-07-31 04:04:24Z added incorrect code that sets the HTTP Content-Encoding header to be a character encoding (typically "UTF-8"). I just got bitten by this because I used a recent s3cmd to upload a file and now a Perl script that uses the "get" function from LWP::Simple to download that file fails because that file uploaded by s3cmd got the meta-tagged to have "Content-Encoding" as "UTF-8" which is an invalid value. This is the same root cause for this bug affecting Richard Harrington: https://sourceforge.net/mailarchive/message.php?msg_id=30653510 The problem is that the code you added is confounding the "charset" with "content encoding". Here is the definition of HTTP "content encoding": http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5 http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11 which is different and independent from charset (e.g. UTF-8). The data member "self.config.encoding" of class S3 in S3.py is a charset (looks like by default set to "utf-8"). In contrast the variable "content_encoding" originally added by Karsten Sperling <firstname.lastname@example.org> is supposed to be an HTTP content encoding. It is with your merge that you incorrectly assign a charset (self.config.encoding.upper()) to the variable content_encoding which is intended to be used at an HTTP Contend-Encoding header and use content_encoding as a charset for the content_type. self.config.encoding.upper() and content_encoding should remain completely independent of each other, they are two different things. This patch from Castedo fixes the mistake.