Converts presentation files (PowerPoint/Impress, or anything else that LibreOffice can read as a presentation) to Beamer-friendly, Pandoc-style markdown.
It's a two-part conversion: one script (pptx-to-yaml.py) converts from pptx (or ppt, odp etc.) into an intermediate YAML format, then another (yaml-to-md.py) converts the YAML to Pandoc-style markdown.
As a convenience, a Bash wrapper script (convert.sh) is provided which calls both of these, and handles a few other graphics-conversion tasks (such as converting SVG files, which LaTeX can't natively read, to encapsulated PostScript files, which it can).
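If you'd rather drive the two steps from Python than from the shell, here is a minimal sketch that just builds the two command lines for one presentation. The helper name and the output naming (deriving talk.yaml and talk.md from talk.pptx) are hypothetical, not necessarily what convert.sh does.

```python
from pathlib import Path
from typing import List


def pipeline_commands(input_file: str, image_dir: str) -> List[List[str]]:
    """Hypothetical helper: the two commands that turn one presentation into markdown."""
    src = Path(input_file)
    yaml_file = src.with_suffix(".yaml")  # assumed naming: talk.pptx -> talk.yaml
    md_file = src.with_suffix(".md")      # assumed naming: talk.pptx -> talk.md
    return [
        # Step 1: presentation -> intermediate YAML (images extracted into image_dir)
        ["./pptx-to-yaml.py", str(src), str(yaml_file), image_dir],
        # Step 2: intermediate YAML -> Pandoc-style markdown
        ["./yaml-to-md.py", str(yaml_file), str(md_file)],
    ]
```

Each list can then be handed to subprocess.run(cmd, check=True), so the second step only runs if the first one succeeded.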
$ ./pptx-to-yaml.py [--use-server HOST:PORT] INPUT_FILE OUTPUT_FILE IMAGE_DIR
INPUT_FILE
is the path to some ppt, pptx or Impress file.
OUTPUT_FILE
is the name of the YAML file to be written.
IMAGE_DIR
is a directory where images will be extracted to.
It will be created if it doesn't exist.
See the notes on the soffice server below for details of what --use-server is for.
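To make the command-line interface concrete, here is a sketch of how that signature could be parsed with argparse. It's an illustration of the usage line above, not the script's actual code; the function names are hypothetical, and splitting HOST:PORT at the last colon is an assumption.

```python
import argparse
import os
from typing import Tuple


def parse_server(value: str) -> Tuple[str, int]:
    """Split "HOST:PORT" at the last colon and check the port is numeric."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise argparse.ArgumentTypeError("expected HOST:PORT, got %r" % value)
    return host, int(port)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pptx-to-yaml.py")
    # Omitted -> None, meaning "start our own soffice on localhost:2002"
    parser.add_argument("--use-server", metavar="HOST:PORT",
                        type=parse_server, default=None)
    parser.add_argument("input_file", metavar="INPUT_FILE")
    parser.add_argument("output_file", metavar="OUTPUT_FILE")
    parser.add_argument("image_dir", metavar="IMAGE_DIR")
    return parser


def prepare_image_dir(path: str) -> None:
    # Create IMAGE_DIR, plus any missing parent directories, if it doesn't exist
    os.makedirs(path, exist_ok=True)
```

A malformed value such as --use-server localhost makes argparse print an error and exit, rather than failing later when connecting.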
Title text, "outline" text (i.e. bullets) and embedded graphics like JPEGs or PNGs are handled reasonably well. Formatting such as italics, bold or colouring of text is not preserved. Nor are numbered lists: they're converted into bulleted lists.
Embedded "metafiles" (EMF or WMF vector graphics) should get converted to SVG (and thence to EPS, if you use convert.sh).
If it finds any tables or drawing shapes (arrows, boxes etc.), pptx-to-yaml.py tries to collect them all together and export them as an SVG.
pptx-to-yaml.py attempts to start an soffice process and communicate with it over port 2002 on the local host; it's the soffice process that knows how to read and manipulate PowerPoint files.
However, the --use-server HOST:PORT option can be supplied if you prefer to run your own instance of soffice as a separate process.
You might want to do that because:
a. If you have a lot of files to convert, you can just keep one soffice process running and re-use it, avoiding the time taken to start a new process for each document.
b. Sometimes pptx-to-yaml.py just doesn't seem to start the soffice process up correctly; I have no idea why.
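For the batch case in (a), a minimal sketch: build one pptx-to-yaml.py command per file, all pointing at the same already-running server with --use-server. The helper name and the output naming (talks/b.yaml and talks/b-images for talks/b.odp) are hypothetical.

```python
import os
from typing import Iterable, List


def batch_commands(inputs: Iterable[str],
                   server: str = "localhost:2002") -> List[List[str]]:
    """One pptx-to-yaml.py command per input file, all sharing one soffice server."""
    commands = []
    for src in inputs:
        stem = os.path.splitext(src)[0]  # "talks/b.odp" -> "talks/b"
        commands.append([
            "./pptx-to-yaml.py", "--use-server", server,
            src,               # INPUT_FILE
            stem + ".yaml",    # OUTPUT_FILE (hypothetical naming scheme)
            stem + "-images",  # IMAGE_DIR, one per deck so image names can't clash
        ])
    return commands
```

Running each with subprocess.run(cmd, check=True) reuses the one soffice process for every document.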
So you could start the server process using something like the following:
$ xterm -e 'soffice --accept="socket,host=localhost,port=2002;urp;" \
--norestore --nologo --nodefault --headless' &
... which will open an soffice instance running in its own terminal window; then pass HOST and PORT to pptx-to-yaml.py (for the command above, --use-server localhost:2002).
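Here is a sketch of doing the same from Python: start soffice with the same flags as the xterm command above (minus the xterm), and connect to it through the python3-uno bridge. The function names are hypothetical and this isn't necessarily how pptx-to-yaml.py does it; the connection uses the standard com.sun.star.bridge.UnoUrlResolver service.

```python
import subprocess
from typing import List


def soffice_args(host: str = "localhost", port: int = 2002) -> List[str]:
    # Passing a list to subprocess means no shell quoting around --accept
    return ["soffice",
            "--accept=socket,host={},port={};urp;".format(host, port),
            "--norestore", "--nologo", "--nodefault", "--headless"]


def uno_url(host: str = "localhost", port: int = 2002) -> str:
    # URL naming the remote component context exported by soffice
    return "uno:socket,host={},port={};urp;StarOffice.ComponentContext".format(host, port)


def start_server(host: str = "localhost", port: int = 2002) -> subprocess.Popen:
    return subprocess.Popen(soffice_args(host, port))


def connect(host: str = "localhost", port: int = 2002):
    import uno  # from python3-uno; imported here so the helpers above work without it
    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    return resolver.resolve(uno_url(host, port))
```

connect() raises an exception if nothing is listening yet, so give a freshly started server a few seconds first.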
$ ./yaml-to-md.py INPUT_FILE OUTPUT_FILE
It just takes an input (YAML) file and an output (markdown) file.
usage:
$ ./convert.sh [INPUT_FILE..]
Convenience wrapper around pptx-to-yaml.py and yaml-to-md.py. It also converts SVG files to encapsulated PostScript (EPS) for use by LaTeX, and attempts to use Pandoc to create LaTeX and PDF files. (If Pandoc fails, that means the .md file needs some tidying, and the PDF file just isn't produced.)
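The extra steps convert.sh performs can be sketched as command lines built in Python. These use real Inkscape and Pandoc options, but they're an assumption about what convert.sh runs, not a copy of it. Inkscape's EPS flag changed between versions: 0.9x uses --export-eps, while 1.x infers the format from --export-filename.

```python
import os
from typing import List


def svg_to_eps_cmd(svg: str, inkscape_1x: bool = False) -> List[str]:
    eps = os.path.splitext(svg)[0] + ".eps"
    if inkscape_1x:
        # Inkscape 1.x picks the output type from the file extension
        return ["inkscape", svg, "--export-filename=" + eps]
    # Inkscape 0.9x has a dedicated EPS export flag
    return ["inkscape", svg, "--export-eps=" + eps]


def pandoc_cmds(md: str) -> List[List[str]]:
    base = os.path.splitext(md)[0]
    return [
        # Standalone Beamer .tex source, handy for fixing things by hand
        ["pandoc", "-s", "-t", "beamer", md, "-o", base + ".tex"],
        # PDF directly (Pandoc runs LaTeX itself)
        ["pandoc", "-t", "beamer", md, "-o", base + ".pdf"],
    ]
```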
For pptx-to-yaml.py and yaml-to-md.py:
- Python 3.5 or greater
- LibreOffice 5.1.6. On Ubuntu 16.04 (xenial), this can be installed with sudo apt-get install libreoffice.
- python3-uno. On Ubuntu, this can be installed with sudo apt-get install python3-uno.
- pyyaml. Most easily installed with something like pip3 install --user pyyaml.
For convert.sh:
- bash, sed, Inkscape (for converting SVG to EPS) and Pandoc (for converting .md to .tex or .pdf).
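A quick sanity check for the requirements above might look like the following sketch (function and constant names are hypothetical). It only checks the Python version, that the uno and yaml modules can be found, and that soffice, inkscape and pandoc are on the PATH; it does not check program versions, so it can't confirm LibreOffice 5.1.6.

```python
import importlib.util
import shutil
import sys
from typing import List

REQUIRED_MODULES = ["uno", "yaml"]                     # python3-uno, pyyaml
REQUIRED_PROGRAMS = ["soffice", "inkscape", "pandoc"]  # convert.sh also needs inkscape/pandoc


def missing_modules(names: List[str]) -> List[str]:
    """Return the module names that can't be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_requirements() -> List[str]:
    problems = []
    if sys.version_info < (3, 5):
        problems.append("Python 3.5 or greater is required")
    for name in missing_modules(REQUIRED_MODULES):
        problems.append("missing Python module: " + name)
    for prog in REQUIRED_PROGRAMS:
        if shutil.which(prog) is None:
            problems.append("program not found on PATH: " + prog)
    return problems
```

An empty list from check_requirements() means nothing obvious is missing.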
Exported/graphics files are all referred to by absolute pathname, so if you want to move your generated files around, you'll have to edit any references to them in the YAML/markdown, as appropriate.
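Editing those references can be scripted. Here's a minimal sketch, assuming the references appear in the .md/.yaml as plain absolute paths that start with the old image directory; the function names are hypothetical. A trailing "/" is added to both prefixes so /home/me/imgs doesn't also match /home/me/imgs2.

```python
from pathlib import Path


def relocate_paths(text: str, old_dir: str, new_dir: str) -> str:
    """Replace the directory prefix old_dir with new_dir throughout text."""
    old = old_dir.rstrip("/") + "/"
    new = new_dir.rstrip("/") + "/"
    return text.replace(old, new)


def relocate_file(path: str, old_dir: str, new_dir: str) -> None:
    # Rewrite a generated .md or .yaml file in place
    p = Path(path)
    p.write_text(relocate_paths(p.read_text(encoding="utf-8"), old_dir, new_dir),
                 encoding="utf-8")
```

Passing a relative new_dir (e.g. "img") makes the references relative, so the files can then move together.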
Not at all portable, and not tested on any platform other than Ubuntu 16.04, nor with any version of LibreOffice other than 5.1.6.
You can report bugs if you want, but there's no guarantee I'll fix them. The scripts are really just offered as a starting point for anyone else who wants to improve them.
This software is in the public domain. Do with it what you will.
If you manage to improve it, it would be nice to hear from you. Try contacting me on Twitter, handle @phlummox.
If you get an error saying pptx-to-yaml.py couldn't connect to the server, kill any stray soffice process and try again.
If it still fails, try adding a longer time.sleep in the script, or just run your own server process.
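Instead of guessing a bigger time.sleep, you could poll until the server's port accepts connections. A minimal sketch (hypothetical helper name) using only the standard library; note that an open port is a good sign soffice is up, but not a strict guarantee that UNO is ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out: not up yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("localhost", 2002) after starting soffice, and only try to connect if it returns True.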
Try opening them in lodraw (LibreOffice Draw).
Lots of things could have gone wrong. By default, Pandoc uses pdflatex, which will choke on many Unicode symbols. Graphics might not have been converted, and so on.
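To find Unicode symbols that might be tripping up pdflatex, a sketch like this lists every non-ASCII character in the generated markdown with its position and Unicode name. It's a rough heuristic: pdflatex handles some non-ASCII input (e.g. common accented letters), so not every hit is a real problem. Alternatively, Pandoc can be told to use xelatex instead (--pdf-engine=xelatex in Pandoc 2.x and later, --latex-engine=xelatex in 1.x).

```python
import unicodedata
from typing import Iterator, Tuple


def non_ascii_chars(text: str) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (line, column, char, Unicode name) for each non-ASCII character (1-based)."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch, unicodedata.name(ch, "UNKNOWN")
```

Run it over the .md file's contents and check each reported symbol against the original slide.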
The only thing to do is take a look at the original PowerPoint file, and the generated markdown, and see if you can fix whatever went wrong.