MIME type detection does not work with python-magic (Ubuntu 13.10) #198

Closed
e-t-u opened this Issue Jul 17, 2013 · 18 comments

@e-t-u
e-t-u commented Jul 17, 2013

s3cmd does not work with Ubuntu 13.10 python-magic. I assume that the problem is also in other systems but I have not tried.

The MIME type of uploaded files is effectively random. None of my CSS files gets the right MIME type, and a large share of the HTML files get the wrong one as well. HTML files are typically misdetected as text/x-c++, and CSS files as text/x-asm or text/troff.

This is naturally a showstopper when uploading large web servers.

As a workaround, if you remove package python-magic, s3cmd falls back to simple detection that seems to work.

@praveenmarkandu

I can confirm this on Ubuntu 12.04. Removing python-magic fixes it.

@e-t-u
e-t-u commented Jul 31, 2013

I verified that the problem is within package python-magic in Ubuntu. The following test script gives the same useless interpretation for HTML and CSS files:

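#!/usr/bin/python
# Print the MIME type that Ubuntu's python-magic (libmagic bindings) reports
# for each file named on the command line.
import magic
import sys

m = magic.open(magic.MIME_TYPE)  # ask libmagic for a bare MIME type
m.load()                         # load the default magic database
for f in sys.argv[1:]:
    print(f, m.file(f))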

The problem with s3cmd is that there is no option to force the fallback to mimetypes.guess_type(file). This option should be on by default in the Ubuntu distribution. Filename-based guessing gives much better estimates for MIME types than this version of python-magic.
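For reference, a minimal sketch of what the filename-based fallback looks like with the standard library's `mimetypes.guess_type`. The helper name `guess_by_name` and the file names are made up for illustration; this is not s3cmd's own code.

```python
import mimetypes


def guess_by_name(path):
    """Guess a MIME type from the file extension only; the file is never opened."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type


if __name__ == "__main__":
    # Hypothetical file names; they do not need to exist on disk.
    for name in ("style.css", "index.html"):
        print(name, guess_by_name(name))
```

Because only the extension is inspected, the result does not depend on the file's contents, which is why it avoids the misdetections described above.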

@e-t-u
e-t-u commented Jul 31, 2013

Next try:

pip install python-magic

Result is the same. The detection is not good enough to upload any decent sized web site. It made about the same errors.

The test program changes into:

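#!/usr/bin/python
# Same check using the pip-installed python-magic API.
import magic
import sys

m = magic.Magic(mime=True)  # return MIME types instead of text descriptions
for f in sys.argv[1:]:
    print(f, m.from_file(f))

@e-t-u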
e-t-u commented Jul 31, 2013

I copied the magic file from the project to /etc/magic, and I am using a brand new pip install python-magic. I have a small project with 27 html files, 5 css files and 5 js files. These are detected as:

27 html files:

  • 21 text/html
  • 2 text/x-c++
  • 4 application/javascript (every document contains the same few lines of Google Analytics and similar script code)

5 CSS files:

  • 2 text/css
  • 2 text/plain
  • 1 text/troff

3 Javascript files:

  • all text/plain

No browser handles a file correctly when it is served with the wrong MIME type.

It is not practical to upload a 500-page static web site without a fallback to reliable file-name-based detection. In S3 you have to correct the MIME type of each file individually, and that takes forever.
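A tally like the one above can be produced with a short script. This is only a sketch: the function names `guess_by_name` and `tally` are invented here, and the default detector is the extension-based `mimetypes.guess_type`, not python-magic.

```python
import mimetypes
import os
import sys
from collections import Counter, defaultdict


def guess_by_name(path):
    """Extension-based guess from the standard library."""
    return mimetypes.guess_type(path)[0]


def tally(paths, detect=guess_by_name):
    """Return {extension: Counter({mime_type: count})} for the given paths."""
    result = defaultdict(Counter)
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        result[ext][detect(path)] += 1
    return result


if __name__ == "__main__":
    # Usage: python tally.py site/*.html site/*.css site/*.js
    for ext, counts in sorted(tally(sys.argv[1:]).items()):
        for mime_type, count in counts.most_common():
            print(ext, count, mime_type)
```

To tally what python-magic reports instead, pass a content-based detector as `detect`, for example `magic.Magic(mime=True).from_file` from the pip-installed package used in the test script earlier in this thread.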

@mdomsch
Member
mdomsch commented Mar 16, 2014

--no-mime-magic appears to be the solution here. Closing.

@mdomsch mdomsch closed this Mar 16, 2014
@jamshid
jamshid commented Apr 30, 2014

Do you mean "--no-mime-magic --guess-mime-type" will consistently upload foo.js with "Content-type: application/javascript"? Perfect. Unfortunately I didn't know about the latter option, so I switched from "s3cmd sync" to http://www.jets3t.org/ "synchronize". It has the advantage that it can do many uploads in parallel (with below settings). Example, in case this helps someone else:

# Synchronize runs fast (parallel) and sets correct Content-type
cat > ./synchronize.properties << EOF
s3service.https-only=false
s3service.s3-endpoint=${S3DOMAIN_HOST}
s3service.s3-endpoint-http-port=8085
accesskey=${S3TOKEN}
secretkey=secret
httpclient.max-connections=25
storage-service.admin-max-thread-count=25
threaded-service.max-thread-count=25
EOF
sh /opt/jets3t-0.9.0/bin/synchronize.sh --properties ./synchronize.properties up mybucket/app ./app/*

@mpybkk
mpybkk commented Sep 18, 2014

This is marked as resolved, but when using a command like "s3cmd --recursive modify" it always sets the MIME type to "binary/octet-stream".

I have tried numerous options, like --no-mime-magic --guess-mime-type, removing values from the config. It continuously overrides any mime settings already set for my file on S3.

@gf3
Contributor
gf3 commented Oct 23, 2014

I'm also experiencing the same issue as @mpybkk

@gf3
Contributor
gf3 commented Oct 23, 2014

Even after I explicitly set --mime-type="xxx"

@jamiesonbecker

This problem continues in Ubuntu 14.04 LTS; be sure to sudo apt-get remove python-magic and sudo pip uninstall python-magic. @mdomsch still an open issue - should be left open?

@traviscollins

Use the --no-mime-magic option, and s3cmd will not use python-magic to guess a mime type.

@jamiesonbecker

good advice @traviscollins. @mpybkk indicated that this problem seems to persist even with --no-mime-magic set, so --no-mime-magic doesn't seem to be doing what it's supposed to. Maybe just being available via import is causing a side effect.

Incidentally, this seems to be a problem with python-magic upstream, which may imply a libmagic regression; new bug: ahupp/python-magic#75

@traviscollins

I'm using a Centos 6.6 machine to recursively deploy files from a Jenkins build output to S3 using s3cmd. Without --no-mime-magic, a .css file was being set as text/x-c++. I confirmed the detection using python-magic (a simple Python script to dump the value) and then learned that python-magic is just a wrapper for libmagic, which is exposed through the unix file command. I found that file -i style.css was the culprit (it returns text/x-c++ for one very long css file, and text/plain for another).

When I use the --no-mime-magic and -M flags together, the S3 file is set with the correct text/css MIME type.
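A rough sketch of how the two flags might be wired into a deploy script. The helper name `build_sync_command`, the local directory, and the bucket URI are placeholders invented for this example; only `s3cmd sync`, `--no-mime-magic`, and `--guess-mime-type` come from this thread.

```python
import subprocess


def build_sync_command(local_dir, s3_uri):
    """Build an s3cmd sync invocation that skips python-magic and
    guesses the MIME type from the file extension instead."""
    return [
        "s3cmd", "sync",
        "--no-mime-magic",    # do not use python-magic / libmagic
        "--guess-mime-type",  # long form of -M
        local_dir, s3_uri,
    ]


def sync(local_dir, s3_uri, dry_run=True):
    """Print the command; run it only when dry_run is False."""
    cmd = build_sync_command(local_dir, s3_uri)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace with your own build output and bucket.
    sync("./build/", "s3://example-bucket/")
```

The dry-run default is a deliberate choice so the command can be inspected in the Jenkins log before anything is uploaded.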

@jamiesonbecker

-M, --guess-mime-type, is not available in older versions of s3cmd. There might be some interactions due to recent changes to s3cmd. Perhaps @e-t-u (or @gf3 /@mpybkk) can confirm if it's still a problem w/ pip installed s3cmd.

@monikabhadauria

I am facing the same issue as @mpybkk: when I use --no-mime-magic and -M together, it overrides the MIME type to "binary/octet-stream" for all files except js and css.

Any update?

@gf3
Contributor
gf3 commented May 1, 2015

@monikabhadauria i've switched to manually setting the mime-type for files. there was a bug related to that, but it's fixed now.
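One way to set the type manually per file, as a sketch only: guess from the extension with `mimetypes` and pass the result through the `--mime-type` option mentioned earlier in this thread. The helper name `put_command`, the paths, and the bucket are hypothetical, and the `binary/octet-stream` fallback is an assumption chosen to match the default reported above.

```python
import mimetypes


def put_command(path, s3_uri, default="binary/octet-stream"):
    """Build an `s3cmd put` call with an explicit MIME type for one file."""
    mime_type = mimetypes.guess_type(path)[0] or default
    return ["s3cmd", "put", "--mime-type=" + mime_type, path, s3_uri]


if __name__ == "__main__":
    # Hypothetical file and bucket, for illustration only.
    print(" ".join(put_command("css/main-style.css",
                               "s3://example-bucket/css/main-style.css")))
```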

@jamiesonbecker

@monikabhadauria @gf3 That seems like it might be a slightly different issue. Run these tests and mention if you see the same behavior.

[root@localhost css]# file -i main-style.css
main-style.css: text/x-c++; charset=us-ascii
[root@localhost css]# cat /etc/redhat-release
CentOS release 6.6 (Final)
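To run the same check over many files from a script, the `file -i` output can be parsed. A minimal sketch, assuming the Linux `file` command and the `name: type; charset=...` output format shown above; the function names `parse_file_i` and `detect_with_file` are made up for this example.

```python
import subprocess


def parse_file_i(line):
    """Split one line of `file -i` output into (name, mime_type, charset)."""
    name, _, rest = line.strip().rpartition(": ")
    mime_type, _, params = rest.partition(";")
    params = params.strip()
    charset = params[len("charset="):] if params.startswith("charset=") else None
    return name, mime_type.strip(), charset


def detect_with_file(path):
    """Ask libmagic (via the `file` command) for the MIME type of one file."""
    output = subprocess.check_output(["file", "-i", path]).decode()
    return parse_file_i(output)[1]


if __name__ == "__main__":
    # Sample line copied from the session above.
    print(parse_file_i("main-style.css: text/x-c++; charset=us-ascii"))
```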

See bug reports in downstream:
ahupp/python-magic#75 (comment)
s3tools/s3cmd#198 (comment)

Another option might be aws s3 cp.

@cmc333333 cmc333333 referenced this issue in forumone/letgirlslearn Jun 4, 2015
Merged

Specify no-magic so building works on linux machines #8

@sebfindling

I'm having this issue too, and haven't been able to solve it with any command suggested here. Also using Ubuntu on AWS. Of 2 CSS files, one gets text/css and the other text/plain. It's a pain, since Chrome doesn't render CSS served with the wrong MIME type.

@thraxil thraxil added a commit to ccnmtl/flgstatic that referenced this issue Jun 24, 2015
@thraxil thraxil disable s3cmd's broken 'mime magic'
instead just use simple file extension based mime type guessing.

see: s3tools/s3cmd#198
2e67370
@liljenstolpe liljenstolpe pushed a commit to liljenstolpe/www.asgaard.org that referenced this issue Sep 10, 2015
Christopher LILJENSTOLPE added no-mime-magic to get css detection working again: s3tools/s3cmd… 702f078
@marvinpinto marvinpinto added a commit to marvinpinto/disjoint.ca that referenced this issue Nov 22, 2015
@marvinpinto marvinpinto Use the s3_website gem to sync the contents of the website
Essentially ran into this problem: s3tools/s3cmd#198

This manifested in css files being transmitted with a mime type of
text/plain, and which resulted in browsers not rendering these CSS files
properly, with a final result of the site looking like 💩
80555fd
@marvinpinto marvinpinto added a commit to marvinpinto/disjoint.ca that referenced this issue Nov 22, 2015
@marvinpinto marvinpinto Use the s3_website gem to sync the contents of the website
The problem I was running into was that css files were being downloaded
as text/plain files, and browsers did what one would expect them to do
with a text/plain file, which is download and save it.

Essentially this: s3tools/s3cmd#198

This manifested in css files being transmitted with a mime type of
text/plain, and which resulted in browsers not rendering these CSS files
properly, with a final result of the site looking like 💩
dd28d16
@3xp0n3nt 3xp0n3nt added a commit to forerunnergames/peril that referenced this issue Mar 25, 2016
@3xp0n3nt 3xp0n3nt Travis: Disable broken s3cmd mime type magic.
- Use simple file extension based mime type guessing as a workaround for
  s3tools/s3cmd#198

- This will cause *.css & other file types to be properly recognized.
  The original issue causes html pages to be displayed without any css
  styling, among other problems.
92635e7