officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.

fbernhart/officeextractor

officeextractor

About

officeextractor is a Python library to extract media files like images, audio and video from office documents (Microsoft Office & LibreOffice).


Supported File Types

| Application | Supported File Types | Supported Media Formats |
| --- | --- | --- |
| Microsoft Word | docx, docm, dotm, dotx | images |
| Microsoft Excel | xlsx, xlsb, xlsm, xltm, xltx | images |
| Microsoft PowerPoint | potx, ppsm, ppsx, pptm, pptx, potm | images, video & audio |
| LibreOffice Writer | odt, ott | images |
| LibreOffice Calc | ods, ots | images |
| LibreOffice Impress | odp, otp, odg | images |

NOTE: Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
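To avoid passing an unsupported file, it can help to check the extension first. The following is a minimal sketch that only mirrors the list of supported file types above. `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names used for illustration and are not part of officeextractor.

```python
from pathlib import Path

# Extensions copied from the list of supported file types.
# Legacy binary formats (doc, xls, ppt, ...) are deliberately absent.
SUPPORTED_EXTENSIONS = {
    "docx", "docm", "dotm", "dotx",                    # Microsoft Word
    "xlsx", "xlsb", "xlsm", "xltm", "xltx",            # Microsoft Excel
    "potx", "ppsm", "ppsx", "pptm", "pptx", "potm",    # Microsoft PowerPoint
    "odt", "ott",                                      # LibreOffice Writer
    "ods", "ots",                                      # LibreOffice Calc
    "odp", "otp", "odg",                               # LibreOffice Impress
}


def is_supported(path):
    """Hypothetical helper: True if the file extension is in the list above."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS


print(is_supported("Report.DOCX"))  # True
print(is_supported("legacy.doc"))   # False
```

The check is case-insensitive and looks only at the file name, not at the file contents.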

Installation

pip install officeextractor

Usage

>>> import officeextractor

>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")

4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png

1 media file extracted from Folder/File2.xlsx:
- 1 png
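
Since `src` accepts a list of paths, a whole folder can be processed in one call. The sketch below assumes the `extract(src, dest, log)` signature documented in this README. `collect_office_files` and `extract_folder` are hypothetical helpers, and the default suffixes are just an example selection.

```python
from pathlib import Path


def collect_office_files(folder, suffixes=(".docx", ".xlsx", ".pptx")):
    """Hypothetical helper: sorted path strings of matching files in folder."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )


def extract_folder(folder, dest):
    """Hypothetical wrapper: extract media from every matching file in folder."""
    # Imported lazily; requires `pip install officeextractor`.
    import officeextractor

    files = collect_office_files(folder)
    if files:  # skip the call when there is nothing to process
        officeextractor.extract(src=files, dest=dest, log=False)
    return files
```

A call such as `extract_folder("Documents", "Path/To/Output/Folder")` would then hand all matching files to `officeextractor.extract` without printing the summary.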
Parameters

officeextractor.extract(src, dest, log=True)

src : str, list of str or tuple of str

Either a single file (string) or several files (list or tuple of strings), given as relative or absolute paths.

dest : str

Output directory as a relative or absolute path. If the directory doesn't exist, it will be created.

log : bool, optional

Whether logging should be activated. If True, a summary of the extraction is printed. Default is True.
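
The parameter descriptions above can be read as a small contract: `src` is one path or a list/tuple of paths, and `dest` is created when missing. The sketch below illustrates that contract with the standard library only. It is not the library's implementation, and `normalize_src` and `ensure_dest` are hypothetical names.

```python
from pathlib import Path


def normalize_src(src):
    """Hypothetical helper: return src as a list of path strings."""
    if isinstance(src, str):
        return [src]  # a single file becomes a one-element list
    if isinstance(src, (list, tuple)):
        return list(src)
    raise TypeError("src must be a str, or a list/tuple of str")


def ensure_dest(dest):
    """Hypothetical helper: create the output directory if it is missing."""
    path = Path(dest)
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path


print(normalize_src("File1.docx"))
print(normalize_src(("File1.docx", "Folder/File2.xlsx")))
```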


Release Notes

The release notes can be found on GitHub.


Licence

GNU General Public License v3.0
