Extract embedded attachments from Microsoft Office documents #4861

Closed
garethrees opened this issue Sep 21, 2018 · 2 comments
Labels
f:request-analysis improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) x:uk

Comments

@garethrees
Member

We've seen a couple of cases (example) where authorities send a Word document (.docx) with embedded attachments that seem impossible to open, presumably unless you have a specific version of Office.

It turns out you can extract them:

  • Change the extension from .docx to .zip
  • Run unzip PATH_TO_FILE.zip
  • Attachments are in word/embeddings
  • PDFs get extracted with a .bin extension; rename them to .pdf

$ tree .
.
├── 20180027329releasedwithredactions.zip
├── [Content_Types].xml
├── _rels
│   └── .rels
├── docProps
│   ├── app.xml
│   └── core.xml
└── word
    ├── _rels
    │   └── document.xml.rels
    ├── document.xml
    ├── embeddings
    │   ├── Microsoft_Word_Document.docx
    │   ├── Microsoft_Word_Document1.docx
    │   ├── Microsoft_Word_Document2.docx
    │   ├── Microsoft_Word_Document3.docx
    │   ├── oleObject1.bin
    │   ├── oleObject2.bin
    │   └── oleObject3.bin
    ├── fontTable.xml
    ├── media
    │   ├── image1.emf
    │   ├── image2.emf
    │   ├── image3.emf
    │   ├── image4.emf
    │   ├── image5.emf
    │   ├── image6.emf
    │   └── image7.emf
    ├── settings.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    └── webSettings.xml

7 directories, 26 files
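
The manual steps above could also be scripted. Below is a minimal Python sketch, not anything Alaveteli itself does: the function name extract_embeddings and the output directory are hypothetical, and treating a .bin as a PDF when it contains a "%PDF" marker is an assumption layered on top of the "rename .bin to .pdf" step. As with the manual rename, the file contents are left untouched, so whether the result opens depends on the PDF viewer.

```python
import zipfile
from pathlib import Path


def extract_embeddings(docx_path, out_dir):
    """Copy every file under word/embeddings out of a .docx into out_dir.

    Hypothetical helper for illustration; returns the list of paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # A .docx is already a zip archive, so there is no need to rename it.
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip everything outside word/embeddings, and directory entries.
            if not name.startswith("word/embeddings/") or name.endswith("/"):
                continue
            data = archive.read(name)
            target = out_dir / Path(name).name
            # Assumption: a .bin that contains a "%PDF" marker is an
            # embedded PDF, so give it a .pdf extension (bytes unchanged).
            if target.suffix == ".bin" and b"%PDF" in data:
                target = target.with_suffix(".pdf")
            target.write_bytes(data)
            written.append(target)
    return written


if __name__ == "__main__":
    import sys

    # Usage: python extract_embeddings.py PATH_TO_FILE.docx OUTPUT_DIR
    for path in extract_embeddings(sys.argv[1], sys.argv[2]):
        print(path)
```

Embedded files that are not recognised as PDFs keep their original names, so nothing is lost if the guess is wrong.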

You can do this without the command line, but I ran into the problem of the .zip extracting to a .cpgz. To get around this you can try downloading the file with a different browser (didn't work for me) or install The Unarchiver. (Via http://osxdaily.com/2013/02/13/open-zip-cpgz-file/)

@garethrees garethrees added x:uk f:request-analysis improvement Improves existing functionality (UI tweaks, refactoring, performance, etc) 0 - backlog labels Sep 21, 2018
@garethrees
Member Author

Duplicate of #62

@aravindkathiroju

This doesn’t work for some reason. I have an Outlook oleObject, and when I change its extension to .msg, the Outlook COM interface doesn’t recognise or read it.
