content_encoding != charset, bug #153 #157

Merged
merged 1 commit into from May 23, 2013

Conversation

Projects
None yet
3 participants
@mdomsch
Member

mdomsch commented Apr 24, 2013

#153

Thus spake Castedo Ellerman:
I believe the commit 64eb484 you did on 2012-07-31 04:04:24Z added
incorrect code that sets the HTTP Content-Encoding header to be a
character encoding (typically "UTF-8").

I just got bitten by this because I used a recent s3cmd to upload a
file and now a Perl script that uses the "get" function from
LWP::Simple to download that file fails because that file uploaded by
s3cmd got the meta-tagged to have "Content-Encoding" as "UTF-8" which
is an invalid value. This is the same root cause for this bug
affecting Richard Harrington:
https://sourceforge.net/mailarchive/message.php?msg_id=30653510

The problem is that the code you added is confounding the "charset"
with "content encoding". Here is the definition of HTTP "content
encoding":
http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11
which is different and independent from charset (e.g. UTF-8).

The data member "self.config.encoding" of class S3 in S3.py is a
charset (looks like by default set to "utf-8").

In contrast the variable "content_encoding" originally added by
Karsten Sperling mail@ksperling.net is supposed to be an HTTP
content encoding.

It is with your merge that you incorrectly assign a charset
(self.config.encoding.upper()) to the variable content_encoding which
is intended to be used at an HTTP Contend-Encoding header and use
content_encoding as a charset for the
content_type. self.config.encoding.upper() and content_encoding should
remain completely independent of each other, they are two different
things.

This patch from Castedo fixes the mistake.

content_encoding != charset, bug #153
#153

Thus spake Castedo Ellerman:
I believe the commit 64eb484 you did on 2012-07-31 04:04:24Z added
incorrect code that sets the HTTP Content-Encoding header to be a
character encoding (typically "UTF-8").

I just got bitten by this because I used a recent s3cmd to upload a
file and now a Perl script that uses the "get" function from
LWP::Simple to download that file fails because that file uploaded by
s3cmd got the meta-tagged to have "Content-Encoding" as "UTF-8" which
is an invalid value. This is the same root cause for this bug
affecting Richard Harrington:
  https://sourceforge.net/mailarchive/message.php?msg_id=30653510

The problem is that the code you added is confounding the "charset"
with "content encoding". Here is the definition of HTTP "content
encoding":
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.5
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11
which is different and independent from charset (e.g. UTF-8).

The data member "self.config.encoding" of class S3 in S3.py is a
charset (looks like by default set to "utf-8").

In contrast the variable "content_encoding" originally added by
Karsten Sperling <mail@ksperling.net> is supposed to be an HTTP
content encoding.

It is with your merge that you incorrectly assign a charset
(self.config.encoding.upper()) to the variable content_encoding which
is intended to be used at an HTTP Contend-Encoding header and use
content_encoding as a charset for the
content_type. self.config.encoding.upper() and content_encoding should
remain completely independent of each other, they are two different
things.

This patch from Castedo fixes the mistake.

mdomsch added a commit to mdomsch/s3cmd that referenced this pull request May 11, 2013

@mludvig mludvig merged commit 85bb3ed into s3tools:master May 23, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment