MIME type detection does not work with python-magic (Ubuntu 13.10) #198

Closed
e-t-u opened this Issue · 14 comments

8 participants

@e-t-u

s3cmd does not work with Ubuntu 13.10 python-magic. I assume that the problem is also in other systems but I have not tried.

The MIME types of uploaded files are wrong in seemingly random ways. None of my CSS files gets the right MIME type, and a large share of the HTML files get the wrong one too. Typically, HTML files are detected as text/x-c++, and CSS files as text/x-asm or text/troff.

This is naturally a showstopper when uploading large web sites.

As a workaround, if you remove package python-magic, s3cmd falls back to simple detection that seems to work.

@praveenmarkandu

I can confirm this on Ubuntu 12.04. Removing python-magic fixes it.

@e-t-u

I verified that the problem is within package python-magic in Ubuntu. The following test script gives the same useless interpretation for HTML and CSS files:

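#!/usr/bin/python
# Print the MIME type that python-magic (Ubuntu package) detects for each file argument.
import magic
import sys

m = magic.open(magic.MIME_TYPE)  # ask for MIME types rather than textual descriptions
m.load()  # load the default magic database
for f in sys.argv[1:]:
    print(f, m.file(f))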

The problem with s3cmd is that there is no option to force the fallback to mimetypes.guess_type(file). That fallback should be the default in the Ubuntu distribution. Filename-based guessing gives much better estimates of MIME types than this version of python-magic.
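For reference, the filename-based fallback mentioned above can be sketched in a few lines with the standard library's mimetypes module. This is only an illustration, not s3cmd's actual code: the helper name guess_by_name is made up here, and the binary/octet-stream default is an assumption.

```python
import mimetypes


def guess_by_name(path, default="binary/octet-stream"):
    """Guess a MIME type from the file name alone; the file is never read."""
    mime_type, _encoding = mimetypes.guess_type(path)
    # guess_type returns None for unknown extensions, so fall back to a default.
    return mime_type or default


if __name__ == "__main__":
    import sys

    for f in sys.argv[1:]:
        print(f, guess_by_name(f))
```

Because only the name is inspected, the result does not depend on the file's contents, which is exactly what makes it predictable for a static web site.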

@e-t-u

Next try:

pip install python-magic

Result is the same. The detection is not good enough to upload any decent sized web site. It made about the same errors.

The test program changes into:

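#!/usr/bin/python
# Same test against the pip-installed python-magic, which has a different API.
import magic
import sys

m = magic.Magic(mime=True)  # report MIME types rather than textual descriptions
for f in sys.argv[1:]:
    print(f, m.from_file(f))

@e-t-u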

I copied the magic file from the project to /etc/magic. I use a brand new pip install of python-magic. I have a small project with 27 html files, 5 css files and 5 js files. These are detected as:

27 html files:

  • 21 text/html
  • 2 text/x-c++
  • 4 application/javascript (every document contains the same couple of lines of Google Analytics and similar code)

5 CSS files:

  • 2 text/css
  • 2 text/plain
  • 1 text/troff

3 Javascript files:

  • all text/plain
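
A tally like the one above could be produced with a short script along these lines. It is a sketch, not the script that was actually used: tally_types and its detect parameter are hypothetical names, and by default it uses filename-based guessing from the standard library. Passing a content-based detector, such as m.from_file from the test program above, would show what python-magic reports instead.

```python
import mimetypes
import os
from collections import Counter


def tally_types(paths, detect=None):
    """Count (extension, detected MIME type) pairs for the given paths."""
    if detect is None:
        # Default detector: guess from the file name only.
        detect = lambda p: mimetypes.guess_type(p)[0] or "unknown"
    counts = Counter()
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        counts[(ext, detect(path))] += 1
    return counts


if __name__ == "__main__":
    import sys

    for (ext, mime), n in sorted(tally_types(sys.argv[1:]).items()):
        print(ext, mime, n)
```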

No browser works with a file that has the wrong MIME type.

It is not practical to upload a 500-page static web site without a fallback to reliable filename-based detection. In S3 you have to correct the MIME type of each file individually, which takes forever.

@FedericoCeratto referenced this issue from a commit (the commit has since been removed from the repository and is no longer available).

@FedericoCeratto referenced this issue in getpelican/pelican: "Workaround for s3cmd mimetype detection" #1282 (closed).

@mdomsch
Owner

--no-mime-magic appears to be the solution here. Closing.

@mdomsch closed this

@jamshid

Do you mean "--no-mime-magic --guess-mime-type" will consistently upload foo.js with "Content-type: application/javascript"? Perfect. Unfortunately I didn't know about the latter option, so I switched from "s3cmd sync" to http://www.jets3t.org/ "synchronize". It has the advantage that it can do many uploads in parallel (with the settings below). An example, in case this helps someone else:

# Synchronize runs fast (parallel) and sets correct Content-type
cat > ./synchronize.properties << EOF
s3service.https-only=false
s3service.s3-endpoint=${S3DOMAIN_HOST}
s3service.s3-endpoint-http-port=8085
accesskey=${S3TOKEN}
secretkey=secret
httpclient.max-connections=25
storage-service.admin-max-thread-count=25
threaded-service.max-thread-count=25
EOF
sh /opt/jets3t-0.9.0/bin/synchronize.sh --properties ./synchronize.properties up mybucket/app ./app/*

@mpybkk

This is marked as resolved, but when using a command like "s3cmd --recursive modify" it always sets the MIME type to "binary/octet-stream".

I have tried numerous options, like --no-mime-magic --guess-mime-type, removing values from the config. It continuously overrides any mime settings already set for my file on S3.

@gf3

I'm also experiencing the same issue as @mpybkk

@gf3

Even after I explicitly set --mime-type="xxx"

@jamiesonbecker

This problem continues in Ubuntu 14.04 LTS; be sure to sudo apt-get remove python-magic and sudo pip uninstall python-magic. @mdomsch still an open issue - should be left open?

@traviscollins

Use the --no-mime-magic option, and s3cmd will not use python-magic to guess a mime type.

@jamiesonbecker

Good advice, @traviscollins. @mpybkk indicated that this problem seems to persist even with --no-mime-magic set, so --no-mime-magic doesn't seem to be doing what it's supposed to. Maybe the module merely being importable is causing a side effect.

Incidentally, this seems to be a problem with python-magic upstream, which may imply a libmagic regression; new bug: ahupp/python-magic#75

@traviscollins

I'm using a CentOS 6.6 machine to recursively deploy files from a Jenkins build output to S3 using s3cmd. Without --no-mime-magic, a .css file was being set as text/x-c++. I confirmed the detection using python-magic (a simple Python script to dump the value) and then learned that python-magic is just a wrapper for libmagic, which is exposed through the Unix file command. I found that file -i style.css was the culprit (it returns text/x-c++ for one very long CSS file, and text/plain for another).

When I use the --no-mime-magic and -M flags together, the S3 file is set with the correct text/css MIME type.
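
To make the flag combination concrete, here is a minimal sketch that builds such an s3cmd command line from Python. The sync subcommand, the local directory, and the bucket URL are assumptions for illustration; only --no-mime-magic and --guess-mime-type (the long form of -M) come from this thread. The helper name build_sync_command is hypothetical.

```python
def build_sync_command(local_dir, bucket_url):
    """Return an s3cmd argument list that skips python-magic when picking MIME types."""
    return [
        "s3cmd", "sync",
        "--no-mime-magic",    # do not consult python-magic / libmagic
        "--guess-mime-type",  # long form of -M: let s3cmd guess the MIME type
        local_dir, bucket_url,
    ]


if __name__ == "__main__":
    # Hypothetical paths; the command is printed here, not executed.
    print(" ".join(build_sync_command("./site/", "s3://example-bucket/")))
```

To actually run it, the list could be passed to subprocess.run(cmd, check=True) on a machine where s3cmd is installed and configured.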

@jamiesonbecker

-M, --guess-mime-type, is not available in older versions of s3cmd. There might be some interactions due to recent changes to s3cmd. Perhaps @e-t-u (or @gf3 /@mpybkk) can confirm if it's still a problem w/ pip installed s3cmd.
