can't open a .docx file #229

Closed
song147liang opened this issue Nov 25, 2015 · 9 comments

@song147liang

The code is like this:

from docx import Document
from docx.shared import Inches
document=Document('123.docx')

but I get an error like this:

Traceback (most recent call last):
  File "C:/Python27/myDocx.py", line 5, in <module>
    document=Document('123.docx')
  File "C:\Python27\lib\site-packages\python_docx-0.8.5-py2.7.egg\docx\api.py", line 28, in Document
    raise ValueError(tmpl % (docx, document_part.content_type))
ValueError: file '123.docx' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml'

How can I handle this problem?

@scanny
Contributor

scanny commented Nov 25, 2015

What application did you use to create 123.docx?

And does it open up okay in Microsoft Word?

That's an odd content type for it to have. It sounds like a Microsoft Office theme file, one that only contains color palettes and perhaps certain styles.

@kmarcello

I have the same issue, and I used Microsoft Office to create name.docx.

@lieuzhenghong

lieuzhenghong commented Jan 24, 2017

@scanny, I have the same issue and the file opens perfectly fine in LibreOffice Writer, Microsoft Word online and Google Docs (after conversion).

ValueError: file './form_letter.doc' is not a Word file, content type is 'application/vnd.openxmlformats-officedocument.themeManager+xml'

I tried to view the XML file with opc browse file.doc core.xml but this gives KeyError: No item with name 'core.xml'.

Could it be because the version of Microsoft Word that was used to build the file was too old?

@scanny
Contributor

scanny commented Jan 24, 2017

The opc command should be:

opc browse file.docx \[Content_Types\].xml

It's odd for the file to end with .doc. Usually that means it's saved in the legacy pre-Word 2007 format.

@lieuzhenghong

lieuzhenghong commented Jan 24, 2017

> It's odd for the file to end with .doc. Usually that means it's saved in the legacy pre-Word 2007 format.

Does this library not work for the legacy .doc format?

Below are the results of running opc browse file.docx \[Content_Types\].xml:

opc browse form_letter.doc \[Content_Types\].xml
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/theme/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/theme/theme/themeManager.xml" ContentType="application/vnd.openxmlformats-officedocument.themeManager+xml"/>
</Types>
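
For readers without the opc tool, the same check can be sketched with the Python standard library. This is a minimal sketch, not how python-docx does it internally: the function names `override_content_types` and `has_main_document_part` are made up for illustration, and it assumes Python's `zipfile` can read the file as a package at all (the opc output above suggests it could for this file, but that may not hold for every legacy .doc).

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by [Content_Types].xml, as seen in the output above.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Content type of the main part of a regular .docx document.
# Templates and macro-enabled files use other types; this sketch ignores them.
WORD_MAIN = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document.main+xml"
)


def override_content_types(path):
    """Return {part_name: content_type} for each <Override> in the package."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    return {
        override.get("PartName"): override.get("ContentType")
        for override in root.iter(CT_NS + "Override")
    }


def has_main_document_part(path):
    """True if the package declares a Word main document part."""
    return WORD_MAIN in override_content_types(path).values()
```

For the listing above, `has_main_document_part` would return False, because only theme parts are declared, which matches the content type named in the ValueError.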

@lieuzhenghong

@scanny Sorry, I RTFM-d and found this in the docs:

> You can open any Word 2007 or later file this way (.doc files from Word 2003 and earlier won’t work). While you might not be able to manipulate all the contents yet, whatever is already in there will load and save just fine.

I converted the .doc file to .docx and now it works. Thank you!
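
One way to catch this before calling Document() is to look at the first bytes of the file instead of trusting the extension. The sketch below is only an illustration: `sniff_word_format` and its return labels are hypothetical names, and the signatures identify the container, not Word specifically (other legacy Office files share the OLE2 signature, and any zip archive starts with the zip signature).

```python
# Signature of an OLE2 compound file, the container used by legacy .doc files.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"
# Local file header signature at the start of a zip archive such as a .docx.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(path):
    """Guess the container format from the first 8 bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        return "legacy-doc"   # needs converting before python-docx can open it
    if head.startswith(ZIP_MAGIC):
        return "zip-package"  # possibly a .docx; the content types decide
    return "unknown"
```

A script could call this first and only pass files reported as "zip-package" to Document(), sending "legacy-doc" files to a converter instead.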

@scanny
Contributor

scanny commented Jan 24, 2017

Glad you got it working :)

@scanny scanny closed this as completed Jan 24, 2017
@mohammedyunus009

This helps convert .doc to .docx. You may include this code in your script:

import glob
import subprocess

for doc in glob.iglob("*.doc"):
    subprocess.call(['soffice', '--headless', '--convert-to', 'docx', doc])

DISCLAIMER: it will work only on Linux (Ubuntu).
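
A slightly more defensive variant of the loop above might look like the following. It is a sketch under assumptions: it needs Python 3 (for `shutil.which` and `subprocess.run`), the helper names `find_soffice`, `build_convert_command` and `convert_all` are invented here, and it assumes the LibreOffice launcher is on PATH under the name `soffice` or `libreoffice`. The `--outdir` option is a LibreOffice command-line flag that was not used in the original snippet.

```python
import glob
import shutil
import subprocess


def find_soffice():
    """Return the path of the LibreOffice launcher on PATH, or None."""
    return shutil.which("soffice") or shutil.which("libreoffice")


def build_convert_command(soffice, doc_path, outdir="."):
    """Build the argument list for a headless .doc to .docx conversion."""
    return [
        soffice, "--headless",
        "--convert-to", "docx",
        "--outdir", outdir,
        doc_path,
    ]


def convert_all(pattern="*.doc", outdir="."):
    """Convert every file matching pattern; raise if a conversion fails."""
    soffice = find_soffice()
    if soffice is None:
        raise RuntimeError("LibreOffice launcher not found on PATH")
    for doc in glob.glob(pattern):
        subprocess.run(build_convert_command(soffice, doc, outdir), check=True)
```

Looking the launcher up on PATH instead of hard-coding `soffice` means the script fails with a clear message when LibreOffice is missing. Since `soffice` is part of LibreOffice, installing LibreOffice and adding its program directory to PATH should be enough on other platforms too, though that has not been confirmed in this thread.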

@shwangdev

Where can I download the "soffice" binary for Windows?
