
Understand Content Type, Mime type guessing and Magic

Florent Viard edited this page Mar 19, 2020 · 3 revisions

What is a MIME type and why use the Content-Type header?

When a file is uploaded to an S3 server, the client can indicate its media type (MIME type) with the Content-Type header.
(More info about media types can be found here: MIME types definition (Mozilla docs))

This type info is not strictly needed, but it helps a web browser render the content correctly when the resource is accessed directly through its URL.


Why is the guessed MIME type sometimes wrong?

Users often report issues when the automatically selected "media type" is not the one they expect.
This is the case, for example, when a user expects "application/javascript" to be used, but the file is uploaded with the "text/plain" type instead.

However, s3cmd is not directly responsible for these incorrect guesses.

By default, the guess_mime_type and mime-magic options are enabled.
This means that the detection is performed using 2 different methods:

  • guess_mime_type: Will try to guess the right type to use based on the "file extension".
  • mime-magic: The content of the file itself will be inspected to determine its type.

For both of these operations, s3cmd will only use external libraries and simply ask them for their opinion on the type to associate with any given file.
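The extension-based method can be illustrated with a minimal sketch that uses Python's standard "mimetypes" module. This is not s3cmd's actual code; the helper name guess_from_extension is hypothetical, and the answer for a given extension depends on the Python version and on the rules installed on the machine.

```python
import mimetypes


def guess_from_extension(filename):
    """Return the MIME type suggested by the file extension, or None."""
    # guess_type() returns a (type, encoding) tuple; only the type is needed here.
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type


if __name__ == "__main__":
    # Print what the local rule database answers for a few file names.
    for name in ("index.html", "app.js", "notes.unknownext"):
        print(name, "->", guess_from_extension(name))
```

Running this on the machine that performs the upload shows which type the library would suggest before any content inspection takes place.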

So, when the guess is wrong, it means that all the external libraries that were consulted are wrong about the type of your file.

In such a case, there are 3 solutions:

  • Use the command-line options to force s3cmd to use a specific MIME type for all the uploaded files.
  • Report the issue upstream to the external libraries so that they can fix their detection rules.
  • Create or fix the rule yourself, locally on your machine.

Configure a "guess_mime_type" file extension rule locally

The "guess_mime_type" function uses the "mimetypes" module of the Python standard library. The rule database of this library is hardcoded but extensible.

On Windows, additional rules will come from the Windows registry.

On Linux (and OSX?), additional rules will be read from every file in the following list that exists:

knownfiles = [
    "/etc/mime.types",
    "/etc/httpd/mime.types",                    # Mac OS X
    "/etc/httpd/conf/mime.types",               # Apache
    "/etc/apache/mime.types",                   # Apache 1
    "/etc/apache2/mime.types",                  # Apache 2
    "/usr/local/etc/httpd/conf/mime.types",
    "/usr/local/lib/netscape/mime.types",
    "/usr/local/etc/httpd/conf/mime.types",     # Apache 1.2
    "/usr/local/etc/mime.types",                # Apache 1.3
    ]
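To find out which of these files are actually present on a given machine, a short check along the following lines may help. It relies on the knownfiles attribute of the "mimetypes" module; the helper name existing_mime_files is hypothetical.

```python
import mimetypes
import os


def existing_mime_files():
    """Return the entries of mimetypes.knownfiles that exist on this machine."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


if __name__ == "__main__":
    # An empty list means only the hardcoded rules are available.
    print(existing_mime_files())
```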

The format of a line is:

MIME/TYPE  [...TAB...]   extensions.

For example:

application/x-yaml                                yml yaml
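Before editing a system file, a rule in this format can be tried out in isolation. The sketch below feeds the example line to a private mimetypes.MimeTypes instance instead of a real mime.types file; the RULE constant and the guess_with_rule helper are hypothetical names used only for this illustration.

```python
import io
import mimetypes

# One rule in mime.types format: the type, whitespace, then the extensions.
RULE = "application/x-yaml\tyml yaml\n"


def guess_with_rule(filename, rule_text=RULE):
    """Guess the type of filename after loading rule_text into a private database."""
    db = mimetypes.MimeTypes()
    # readfp() parses mime.types-formatted text from any file-like object.
    db.readfp(io.StringIO(rule_text))
    return db.guess_type(filename)[0]


if __name__ == "__main__":
    print(guess_with_rule("config.yml"))
    print(guess_with_rule("config.yaml"))
```

If both file names resolve to the expected type, the same line should behave the same way once it is added to one of the files listed above.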

In most distributions, the file /etc/mime.types is already provided by a package. For example, on Debian Linux it is mime-support, and on Fedora it is mailcap.


Configure a "mime-magic" pattern recognition rule locally

The "Magic" file recognition rules are based on "content" patterns that have to be found inside files.
For example, if the string "PNG" is found in the first bytes of a file, it will be identified as a PNG image, and so it will be associated with the "image/png" MIME type.
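The idea behind such a rule can be sketched by hand. The example below checks for the 8-byte PNG signature (the byte 0x89, the ASCII string "PNG", then CR, LF, 0x1A, LF) at the start of the data. It only illustrates the principle and is not how the magic libraries are implemented; sniff_png is a hypothetical helper.

```python
# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_png(data):
    """Return "image/png" if data starts with the PNG signature, else None."""
    return "image/png" if data.startswith(PNG_SIGNATURE) else None


if __name__ == "__main__":
    print(sniff_png(PNG_SIGNATURE + b"...rest of the image..."))
    print(sniff_png(b"plain text content"))
```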

For this task, s3cmd uses one of several magic Python libraries that may be installed on the machine (python-magic, filemagic, ...), but most of them will use the rule database of the "shared-mime-info" project/package.
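To see what a content-based library answers for a particular file, a check along these lines can be used. It assumes the python-magic binding and its from_file(path, mime=True) function; other bindings such as filemagic expose a different API, so the sketch returns None when that function is not available. This is not s3cmd's own detection code, and sniff_with_python_magic is a hypothetical helper.

```python
import sys


def sniff_with_python_magic(path):
    """Return the MIME type reported by python-magic, or None if it is unavailable."""
    try:
        import magic  # third-party module, may be missing or another binding
    except ImportError:
        return None
    from_file = getattr(magic, "from_file", None)
    if from_file is None:
        # A different "magic" binding is installed; its API is not covered here.
        return None
    return from_file(path, mime=True)


if __name__ == "__main__":
    # Usage: python sniff.py FILE [FILE ...]
    for path in sys.argv[1:]:
        print(path, "->", sniff_with_python_magic(path))
```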

In addition to the default rules, custom rules can be added in the file /etc/magic.
More info about the syntax (which is a little complex) can be found here:
magic - Format of the /etc/magic file (IBM)

The s3cmd source includes a sample magic file that can be used to improve the detection of the media types of commonly used web content files.