Palabra

Tools for leaving Microsoft Word behind, among other things.

Note: I'm just a writer looking for tools to help other writers do more powerful things with their text. This is my first GitHub project. I'm new to all of this and I could use your help! I've blogged about my thoughts here: http://nocategories.net/tag/text-processing/ and http://nocategories.net/ephemera/leaving-word-behind/

##Tools to do the Work

Pandoc is a powerful command-line tool for converting written documents from one file format to another. (Pandoc runs on Windows, Mac, or Linux.) So I'll just use Pandoc for this, right? There's a catch. Pandoc has no reader for Word's old binary .doc format, and I still have many .doc files. Before I can use Pandoc to convert all my files into a text-based format, I'll need another tool to convert everything into a format that Pandoc can read.

Textutil is a command-line utility baked into OS X. (Windows users, please chime in with any tools for Windows that might help!) Textutil is similar to Pandoc in that it can convert written documents between several formats. Although it can convert to .txt, it doesn't understand the Markdown formatting syntax, so I can't use Textutil to create the final product unless I want to lose all my formatting. I don't have much formatting to lose, but still, that's not an option.

The trick, then, is to use Textutil to convert .doc files into .html files, and then to use Pandoc to convert .html files into .txt files with Markdown.
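As a rough sketch of that two-hop idea for a single file, something like the following might work on OS X with Pandoc installed. The function name `doc_to_markdown` and the file name `poems.doc` are made up for illustration; the `textutil` and `pandoc` options are the standard ones for choosing formats and output paths.

```shell
# Convert one Word file to Markdown-flavoured text in two hops.
# doc_to_markdown is a hypothetical helper name, not part of Textutil or Pandoc.
doc_to_markdown() {
    src=$1
    base=${src%.*}    # strip the extension: poems.doc -> poems
    # Hop 1: Textutil (OS X only) writes poems.html
    textutil -convert html "$src" -output "$base.html" || return 1
    # Hop 2: Pandoc reads the HTML and writes Markdown into poems.txt
    pandoc -f html -t markdown -o "$base.txt" "$base.html"
}
```

After defining the function in a terminal session, `doc_to_markdown poems.doc` should leave `poems.html` and `poems.txt` next to the original file.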

I saw some forum posts that suggest that the following Terminal command might work (on OS X) as a way to combine Textutil with Pandoc:

    find . -name '*.doc' -print0 | xargs -0 sh -c 'textutil -convert html "$0" -stdout | pandoc -f html -t markdown -o "${0%.*}.md"'

…but I couldn't get that to work for one file, let alone for dozens. I saw another post that said that bash loops might do the trick, and they did, but the examples weren't written for Pandoc so I wrote some code...
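As an aside on the one-liner above, one thing worth checking in a command of that shape is how `xargs` hands file names to `sh -c`. By default `xargs` passes many names to a single `sh`, and the script inside only ever looks at `$0`, so at best the first file in each batch is converted. This may or may not be what went wrong in this case. The sketch below shows the behaviour with `echo` standing in for the real conversion; the file names and function names are made up.

```shell
# Feed three made-up file names to xargs, NUL-separated like find -print0.
list_names() { printf '%s\0' a.doc b.doc c.doc; }

# As in the one-liner: every name goes to a single sh, which only looks at $0.
first_only() { list_names | xargs -0 sh -c 'echo "converting $0"'; }

# With -n 1, xargs starts one sh per name, so each file gets its turn as $0.
one_per_call() { list_names | xargs -0 -n 1 sh -c 'echo "converting $0"'; }
```

Running `first_only` reports only `a.doc`, while `one_per_call` reports all three names. Adding `-n 1` to the original command is therefore one change that might be worth trying.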

##Text Conversion Workflow

1. Install Pandoc.
2. Grab the two shell scripts that I wrote. I've posted them to GitHub and cleverly named them "Palabra".
3. Copy the two files, convert1.sh and convert2.sh, into a directory full of .doc files that you would like to convert into Markdown-flavoured .txt files.
4. Point your terminal to that directory and type "sh convert1.sh". This will convert all the .doc files in the directory into .html files. (Edit the file to say .docx if that's the kind you want to convert.)
5. Then type "sh convert2.sh". This will convert all the .html files you just made in the previous step into Markdown-flavoured .txt files.
6. Done!

… well, almost done. At this point, you've converted all your .doc files into Markdown-flavoured .txt files. To handle the .docx files, just edit line 2 of convert1.sh to read .docx instead of .doc and repeat the steps. (You should be able to do the same thing with .rtf but I haven't tried it yet.)
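For readers who want a feel for what the two passes do, here is a minimal sketch of the loop approach. It is not the actual contents of convert1.sh and convert2.sh; the function names `docs_to_html` and `html_to_markdown` are hypothetical, and the extension is taken as an argument instead of being edited in the file.

```shell
# Pass 1: turn every Word file in the current directory into HTML.
# textutil is OS X only; by default it writes name.html next to name.doc.
docs_to_html() {
    ext=${1:-doc}                  # doc unless another extension is given
    for f in *."$ext"; do
        [ -e "$f" ] || continue    # nothing matched, so skip the literal pattern
        textutil -convert html "$f"
    done
}

# Pass 2: turn every HTML file into a Markdown-flavoured .txt file.
html_to_markdown() {
    for f in *.html; do
        [ -e "$f" ] || continue
        pandoc -f html -t markdown -o "${f%.html}.txt" "$f"
    done
}
```

With both functions defined, `docs_to_html` followed by `html_to_markdown` mirrors the two steps above, and `docs_to_html docx` would cover the .docx files. Quoting `"$f"` keeps file names with spaces intact.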

##One Word Document, Many Texts

One of my Word .docx files was special in that it contained a copy of every one of my poems (a couple hundred, maybe?). That got cumbersome after a few years, so I decided to convert it into a set of text files, one per poem. For extra credit, I wanted to name each file according to its first line of text, which in my case happened to be the title of the poem. I've lost the order that the poems were in, for now, but that wasn't so important to me anyway.
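One way to sketch that split, under a big assumption: the document has already been converted to a single text file, and each poem is separated from the next by a line containing only `-----`. That separator, the function name `split_poems`, and the file names in the usage note are all hypothetical, so adjust them to whatever actually divides the poems in your file.

```shell
# Split one text file into one file per poem, named after each first line.
# $1: the combined text file   $2: directory to write the pieces into
# Assumes a line containing only "-----" between poems (an assumption).
split_poems() {
    mkdir -p "$2"
    awk -v outdir="$2" '
        # Separator line: finish the current poem and wait for the next title.
        /^-----$/ { if (out != "") close(out); out = ""; next }
        out == "" {
            if ($0 ~ /^[ \t]*$/) next      # skip blank lines before a title
            name = $0
            sub(/^[ \t]+/, "", name)       # trim leading whitespace
            sub(/[ \t]+$/, "", name)       # trim trailing whitespace
            gsub(/\//, "-", name)          # a slash cannot appear in a file name
            out = outdir "/" name ".txt"
        }
        { print > out }                    # the title line is kept in the file too
    ' "$1"
}
```

Calling `split_poems all-poems.txt poems` would write files such as `poems/Some Title.txt`. Two poems with the same first line would overwrite each other, and because files are named by title, the original order is not preserved.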

Here are some tips and tricks that helped me along the way

jenguiliano/Palabra
