process-pst: Command-line program for turning PSTs into email+loadfiles

Copyright (c) 2010 Aranetic LLC.

process-pst is a command-line tool that does basic, first-pass processing for PST files.

process-pst custodian1.pst custodian1

This will create the directory custodian1 (if it doesn't already exist) and an EDRM XML loadfile, custodian1/edrm-loadfile.xml. Any emails found in custodian1.pst will be converted to RFC822 format and stored in custodian1, and any attachments will be extracted.
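If you have several custodians, the same invocation can be wrapped in a small loop. The following is only a sketch: the function name process_all and the PROCESS_PST variable are made up for illustration, and it assumes each PST should get an output directory next to it, named after the file. It relies on nothing beyond the two-argument form shown above.

```shell
#!/bin/sh
# Sketch: run process-pst once per PST file in a directory.
# PROCESS_PST (hypothetical name) lets you point at a locally built binary;
# it defaults to whatever "process-pst" is on the PATH.
PROCESS_PST="${PROCESS_PST:-process-pst}"

# process_all is a hypothetical helper, not part of process-pst itself.
process_all() {
    pst_dir="$1"
    for pst in "$pst_dir"/*.pst; do
        # Skip the unexpanded pattern when the directory has no PSTs.
        [ -e "$pst" ] || continue
        # custodian1.pst -> custodian1
        out_dir="$pst_dir/$(basename "$pst" .pst)"
        "$PROCESS_PST" "$pst" "$out_dir" || echo "failed: $pst" >&2
        # The loadfile name comes from the description above.
        [ -f "$out_dir/edrm-loadfile.xml" ] || echo "no loadfile for: $pst" >&2
    done
}
```

After sourcing the script, call process_all with the directory that holds the PST files. Failures are reported on standard error and the loop moves on to the next file, so one bad PST does not stop the rest of the batch.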

We are also interested in supporting simple text extraction and other loadfile formats, including Concordance- and Summation-compatible loadfiles. Your patches are extremely welcome!

Note that process-pst is distributed under a "share and share alike" license: If you distribute copies of process-pst, you must make the source code available under the terms of the GNU Affero General Public License, version 3. And if you modify process-pst and allow other people to interact with it over a network, you have certain obligations to provide them with the modified source code. You may have certain other obligations. See the full text of the license for details. You can think of this as the vendor version of pro bono work on behalf of the larger legal community.


This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License (the "License") as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but it is provided on an "AS-IS" basis and WITHOUT ANY WARRANTY; without even the implied warranties of MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see

Downloading the source code

git clone
cd process-pst
git submodule update --init


process-pst uses CMake to manage the build process.


First, use MacPorts to install Boost 1.42, GCC 4.4, CMake 2.8 and iconv:

sudo port install boost @1.42.0
sudo port install gcc44 cmake libiconv

To run the unit tests, you will also want to install Ruby, rubygems, and bundler.

On Mac OS X 10.6, gcc44 (as of 13 Jul 2010) inappropriately builds and installs libgcc_s. Move these libraries out of the way:

sudo mkdir /opt/local/lib/gcc44/hidden
sudo mv /opt/local/lib/gcc44/libgcc_s* /opt/local/lib/gcc44/hidden/

Then, install the necessary Ruby gems and build using CMake:

bundle install
CC=gcc-mp-4.4 CXX=g++-mp-4.4 cmake .

cmake may also find the system libiconv instead of the MacPorts one, and fail to link. If so, replace the ICONV_LIBRARY line in CMakeCache.txt with one pointing at MacPorts, as follows:


and rerun cmake to regenerate the build files, then run make again.
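Before editing the cache, it can help to see which libiconv cmake actually recorded. The helper below is a minimal sketch; show_iconv_entry is a hypothetical name, and it assumes the cache file is CMakeCache.txt in the build directory unless another path is given.

```shell
# Sketch: print the ICONV_LIBRARY entry recorded in a CMake cache file.
# show_iconv_entry is a hypothetical helper name; the cache path defaults
# to CMakeCache.txt in the current directory.
show_iconv_entry() {
    cache="${1:-CMakeCache.txt}"
    grep '^ICONV_LIBRARY' "$cache"
}
```

If the printed path is not under /opt/local (where the MacPorts libraries used above live), cmake has probably picked up the system libiconv.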


These instructions have been tested on a pristine Ubuntu 10.04 system created using Amazon's EC2 service and the 32-bit ami-2d4aa444.

First, set up your system with the necessary compilers, libraries and gems:

sudo apt-get install cmake g++-4.4 ruby ruby-dev build-essential \
  libxml2-dev libxslt-dev git-core
tar xzf rubygems-1.3.7.tgz
(cd rubygems-1.3.7 && sudo ruby setup.rb)
sudo gem1.8 install bundler -v 0.9.26

Boost must be installed manually, using GCC 4.4 as described here:

tar xzf boost_1_42_0.tar.gz
cd boost_1_42_0
echo "using gcc : 4.4 : /usr/bin/g++-4.4 ; " >> tools/build/v2/user-config.jam
./ --prefix=/opt/boost --without-libraries=python
./bjam --toolset=gcc-4.4
sudo ./bjam install
cd ..

Next, check out process-pst:

git clone git://
cd process-pst
git submodule update --init

Then, install the necessary Ruby gems and build using CMake:

bundle install
CC=gcc-4.4 CXX=g++-4.4 cmake -D BOOST_ROOT=/opt/boost .

Running the unit tests

You can run the unit tests using CMake:


All the tests should pass.


Convert PST files to RFC822 *.eml files and generate electronic discovery load files.







