MIME type detection does not work with python-magic (Ubuntu 13.10) #198

Closed
e-t-u opened this Issue Jul 17, 2013 · 19 comments

e-t-u commented Jul 17, 2013

s3cmd's MIME type detection does not work with python-magic on Ubuntu 13.10. I assume the problem also exists on other systems, but I have not tried them.

The MIME types of uploaded files are effectively random. None of my CSS files gets the right MIME type, and a large share of the HTML files get the wrong one as well. A typical wrong type for HTML is text/x-c++, and CSS gets text/x-asm or text/troff.

This is naturally a showstopper when uploading large websites.

As a workaround, if you remove the python-magic package, s3cmd falls back to simple detection that seems to work.

praveenmarkandu commented Jul 23, 2013

I can confirm this on Ubuntu 12.04. Removing python-magic fixes it.

e-t-u commented Jul 31, 2013

I verified that the problem lies within the python-magic package in Ubuntu. The following test script gives the same useless results for HTML and CSS files:

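#!/usr/bin/env python
# Print the MIME type libmagic reports for each file named on the command line,
# using the libmagic bindings Ubuntu packaged as python-magic at the time.
import sys

import magic

m = magic.open(magic.MIME_TYPE)  # report only the MIME type, e.g. "text/css"
m.load()                         # load the system's default magic database
for f in sys.argv[1:]:
    print("%s: %s" % (f, m.file(f)))
m.close()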

The problem with s3cmd is that there is no option to force a fallback to mimetypes.guess_type(file). Such an option should be on by default in the Ubuntu distribution: filename-based guessing gives much better estimates of MIME types than this version of python-magic.
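
The requested option could look roughly like the Python helper below, which trusts the filename first and only consults libmagic when the extension is unknown. This is a hypothetical sketch, not s3cmd's code: `guess_content_type`, the `use_magic` switch and the `binary/octet-stream` fallback (the default reported later in this thread) are assumptions, and the optional branch assumes the pip python-magic package's `magic.Magic(mime=True)` interface.

```python
import mimetypes

# Assumed fallback; "binary/octet-stream" is the default type reported in this thread.
DEFAULT_TYPE = "binary/octet-stream"


def guess_content_type(path, use_magic=False):
    """Hypothetical helper: guess from the file name first, libmagic only as a last resort."""
    # Filename-based guess, e.g. "style.css" -> "text/css".
    mime, _encoding = mimetypes.guess_type(path)
    if mime is not None:
        return mime
    if use_magic:
        try:
            import magic  # pip "python-magic" package, assumed to be installed
            return magic.Magic(mime=True).from_file(path)
        except Exception:
            # Missing package, missing libmagic, or unreadable file: fall through.
            pass
    return DEFAULT_TYPE
```

With this ordering, a CSS file is labeled by its `.css` extension regardless of what libmagic makes of its contents; content sniffing only matters for files without a recognizable extension.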

e-t-u commented Jul 31, 2013

Next I tried the PyPI package:

pip install python-magic

The result is the same: it made about the same errors, so the detection is still not good enough to upload a website of any decent size.

With that package, the test program becomes:

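#!/usr/bin/env python
# Same check using the pip python-magic package (magic.Magic API).
import sys

import magic

m = magic.Magic(mime=True)  # return MIME types such as "text/html"
for f in sys.argv[1:]:
    print("%s: %s" % (f, m.from_file(f)))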

e-t-u commented Jul 31, 2013

I copied the magic file from the project to /etc/magic and used a freshly installed python-magic from pip. I have a small project with 27 HTML files, 5 CSS files and 5 JS files, and they are detected as follows:

27 html files:

  • 21 text/html
  • 2 text/x-c++
  • 4 application/javascript (every document contains the same few lines of Google Analytics and similar code)

5 CSS files:

  • 2 text/css
  • 2 text/plain
  • 1 text/troff

3 Javascript files:

  • all text/plain

No browser handles a file served with the wrong MIME type correctly.

It is not practical to upload a 500-page static website without a fallback to reliable filename-based detection. In S3 you have to correct the MIME type of each file individually, and that takes forever.
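
A tally like the one above can be reproduced with a short script that groups files by extension and counts what a detector reports for each. This is a sketch: `iter_files`, `tally_types` and `by_extension` are hypothetical names, and the detector is passed in, so the same code can be run against libmagic or against filename-based guessing for comparison.

```python
import mimetypes
import os
from collections import Counter, defaultdict


def iter_files(root):
    """Yield every file path below `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def tally_types(paths, detect):
    """Map each file extension to a Counter of the MIME types `detect` returns."""
    tallies = defaultdict(Counter)
    for path in paths:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        tallies[ext][detect(path)] += 1
    return tallies


def by_extension(path):
    """Extension-based detector, for comparison with a libmagic-based one."""
    return mimetypes.guess_type(path)[0] or "binary/octet-stream"


# Example (assumes the pip python-magic package and a site in ./site):
#   import magic
#   m = magic.Magic(mime=True)
#   for ext, counts in sorted(tally_types(iter_files("site"), m.from_file).items()):
#       print(ext, dict(counts))
```

Running it once with `by_extension` and once with the libmagic detector shows at a glance which extensions are being misclassified.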

Member

mdomsch commented Mar 16, 2014

--no-mime-magic appears to be the solution here. Closing.

mdomsch closed this Mar 16, 2014

jamshid commented Apr 30, 2014

Do you mean "--no-mime-magic --guess-mime-type" will consistently upload foo.js with "Content-type: application/javascript"? Perfect. Unfortunately I didn't know about the latter option, so I switched from "s3cmd sync" to the "synchronize" tool from http://www.jets3t.org/. It has the advantage that it can do many uploads in parallel (with the settings below). Here is an example, in case it helps someone else:

# Synchronize runs fast (parallel) and sets correct Content-type
cat > ./synchronize.properties << EOF
s3service.https-only=false
s3service.s3-endpoint=${S3DOMAIN_HOST}
s3service.s3-endpoint-http-port=8085
accesskey=${S3TOKEN}
secretkey=secret
httpclient.max-connections=25
storage-service.admin-max-thread-count=25
threaded-service.max-thread-count=25
EOF
sh /opt/jets3t-0.9.0/bin/synchronize.sh  --properties ./synchronize.properties up mybucket/app ./app/*

mpybkk commented Sep 18, 2014

This is marked as resolved, but when I use a command like "s3cmd --recursive modify", it always sets the MIME type to "binary/octet-stream".

I have tried numerous options, such as --no-mime-magic and --guess-mime-type, as well as removing values from the config. It keeps overriding any MIME type already set on my files in S3.
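
For context on why a metadata-only change can clobber the type: S3 has no in-place metadata edit, so changing an object's metadata means copying the object onto itself with a replacement metadata set, and if the Content-Type is not restated the object may end up with S3's default type. The sketch below uses boto3's `copy_object` rather than s3cmd (an assumption; the thread does not show how s3cmd's modify is implemented), and `replace_metadata_request` is a hypothetical helper that always restates a Content-Type.

```python
import mimetypes

# Assumed fallback, matching the S3 default type reported in this thread.
DEFAULT_TYPE = "binary/octet-stream"


def replace_metadata_request(bucket, key, content_type=None):
    """Build keyword arguments for boto3's S3 `copy_object` that rewrite an
    object's metadata in place while explicitly carrying a Content-Type.

    With MetadataDirective="REPLACE", metadata comes from the request rather
    than the source object, so anything not restated here (Content-Type,
    Cache-Control, x-amz-meta-* headers) is not carried over.
    """
    if content_type is None:
        content_type = mimetypes.guess_type(key)[0] or DEFAULT_TYPE
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},
        "MetadataDirective": "REPLACE",
        "ContentType": content_type,
    }


# Usage (requires boto3 and credentials; "mybucket" is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   s3.copy_object(**replace_metadata_request("mybucket", "app/style.css"))
```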

Contributor

gf3 commented Oct 23, 2014

I'm also experiencing the same issue as @mpybkk.

Contributor

gf3 commented Oct 23, 2014

It happens even after I explicitly set --mime-type="xxx".

jamiesonbecker commented Oct 27, 2014

This problem persists in Ubuntu 14.04 LTS; be sure to run both sudo apt-get remove python-magic and sudo pip uninstall python-magic. @mdomsch, this is still a live problem; should the issue be left open?
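
Because a leftover copy of either package is enough to put magic-based detection back in play, it may help to check which `magic` module Python can still import after removing them. A hedged sketch: the function name and return labels are made up, and it tells the two packages apart by assuming the pip package exposes a module-level `from_file()` while the bindings shipped with `file` expose `open()`.

```python
import importlib.util


def magic_binding():
    """Report which `magic` module, if any, this Python would import.

    Returns None when no module is found, "pypi" for the pip python-magic
    package, "libmagic-bindings" for the bindings built from the file project,
    "broken" if a module exists but fails to import, else "unknown".
    """
    if importlib.util.find_spec("magic") is None:
        return None
    try:
        import magic
    except (ImportError, OSError):
        # e.g. the pip package is installed but the libmagic shared library is missing
        return "broken"
    # Check the pip package's module-level helper first; newer pip releases
    # may also provide open() for compatibility with the file bindings.
    if hasattr(magic, "from_file"):
        return "pypi"
    if hasattr(magic, "open"):
        return "libmagic-bindings"
    return "unknown"


if __name__ == "__main__":
    print(magic_binding() or "no magic module found")
```

Run it with the same interpreter that runs s3cmd; a system Python and a pip-managed Python can see different packages.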

traviscollins commented Mar 11, 2015

Use the --no-mime-magic option, and s3cmd will not use python-magic to guess a mime type.

jamiesonbecker commented Mar 11, 2015

Good advice, @traviscollins. @mpybkk indicated that the problem seems to persist even with --no-mime-magic set, so --no-mime-magic doesn't seem to be doing what it's supposed to. Perhaps merely having the module importable causes a side effect.

Incidentally, this seems to be a problem with python-magic upstream, which may imply a libmagic regression; new bug: ahupp/python-magic#75

traviscollins commented Mar 11, 2015

I'm using a CentOS 6.6 machine to recursively deploy files from a Jenkins build output to S3 using s3cmd. Without --no-mime-magic, a .css file was being set as text/x-c++. I confirmed the detection with python-magic (a simple Python script that dumps the value) and then learned that python-magic is just a wrapper for libmagic, the same library the Unix file command uses. I found that file -i style.css was the culprit (it returns text/x-c++ for one very long CSS file, and text/plain for another).

When I use the --no-mime-magic and -M flags together, the S3 file is set with the correct text/css MIME type.
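
The same diagnosis can be scripted without python-magic by asking the `file` utility directly and comparing its answer with the extension-based guess. A sketch, assuming a `file` build that supports `--brief` and `--mime-type` and Python 3.7+ for `capture_output`; `file_command_type` and `disagreements` are hypothetical names.

```python
import mimetypes
import shutil
import subprocess
import sys


def file_command_type(path):
    """Ask the `file` utility (libmagic) for a MIME type, as `file -b --mime-type` prints it."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def disagreements(paths, detect):
    """Yield (path, extension_type, detected_type) wherever `detect` differs from the extension guess."""
    for path in paths:
        by_ext = mimetypes.guess_type(path)[0]
        if by_ext is None:
            continue  # no extension-based opinion to compare against
        detected = detect(path)
        if detected != by_ext:
            yield path, by_ext, detected


def main(paths):
    if not paths:
        return
    if shutil.which("file") is None:
        sys.exit("the `file` command is not installed")
    for path, by_ext, detected in disagreements(paths, file_command_type):
        print(f"{path}: extension says {by_ext}, libmagic says {detected}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Pass the files to check as arguments; every line it prints is a file whose uploaded type depends on which detection method is used.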

jamiesonbecker commented Mar 11, 2015

-M (--guess-mime-type) is not available in older versions of s3cmd, and recent changes to s3cmd might interact with this. Perhaps @e-t-u (or @gf3/@mpybkk) can confirm whether it's still a problem with a pip-installed s3cmd.

monikabhadauria commented May 1, 2015

I am facing the same issue as @mpybkk: when I use --no-mime-magic and -M together, the MIME type is overridden to "binary/octet-stream" for all files except JS and CSS.

Any update?

Contributor

gf3 commented May 1, 2015

@monikabhadauria I've switched to setting the MIME type for files manually. There was a bug related to that, but it's fixed now.

jamiesonbecker commented May 1, 2015

@monikabhadauria @gf3 That seems like it might be a slightly different issue. Run these tests and report whether you see the same behavior.

[root@localhost css]# file -i main-style.css
main-style.css: text/x-c++; charset=us-ascii
[root@localhost css]# cat /etc/redhat-release
CentOS release 6.6 (Final)

See the related bug reports:
ahupp/python-magic#75 (comment)
s3tools/s3cmd#198 (comment)

Another option might be the AWS CLI's aws s3 cp.

sebolio commented Jun 8, 2015

I'm having this issue too, and haven't been able to solve it with any of the commands suggested here. I'm also using Ubuntu on AWS. Of two CSS files, one gets text/css and the other text/plain. It's a pain, since Chrome doesn't render CSS served with the wrong MIME type.

thraxil added a commit to ccnmtl/flgstatic that referenced this issue Jun 24, 2015

disable s3cmd's broken 'mime magic'
instead just use simple file extension based mime type guessing.

see: s3tools/s3cmd#198

liljenstolpe added a commit to liljenstolpe/www.asgaard.org that referenced this issue Sep 10, 2015

marvinpinto added a commit to marvinpinto/disjoint.ca that referenced this issue Nov 22, 2015

Use the s3_website gem to sync the contents of the website
Essentially ran into this problem: s3tools/s3cmd#198

This manifested in css files being transmitted with a mime type of
text/plain, and which resulted in browsers not rendering these CSS files
properly, with a final result of the site looking like 💩

marvinpinto added a commit to marvinpinto/disjoint.ca that referenced this issue Nov 22, 2015

Use the s3_website gem to sync the contents of the website
The problem I was running into was that css files were being downloaded
as text/plain files, and browsers did what one would expect them to do
with a text/plain file, which is download and save it.

Essentially this: s3tools/s3cmd#198

This manifested in css files being transmitted with a mime type of
text/plain, and which resulted in browsers not rendering these CSS files
properly, with a final result of the site looking like 💩

3xp0n3nt added a commit to forerunnergames/peril that referenced this issue Mar 25, 2016

Travis: Disable broken s3cmd mime type magic.
- Use simple file extension based mime type guessing as a workaround for
  s3tools/s3cmd#198

- This will cause *.css & other file types to be properly recognized.
  The original issue causes html pages to be displayed without any css
  styling, among other problems.

idmontie commented Sep 2, 2017

I decided to install python-magic in my Docker image so that s3cmd would stop printing its warning, but it marked most of my CSS, HTML, and JS files as text/plain.
