diff --git a/README.md b/README.md
index 64ad6fd..434c02a 100644
--- a/README.md
+++ b/README.md
@@ -112,7 +112,12 @@ There are also other algorithms for building the BWT for large read collections
 * `BlockArray` uses now 8 MB blocks instead of 1 MB blocks, changing the native file format.
 * More space-efficient rank/select construction for the BWT.
 * Formats: RopeBWT (new), faster writing in SGA format.
-* `bwt_merge`: Multiple input files, faster RA/BWT merging, multithreaded verification, adjustable input/output formats and temp directory, better default parameters.
+* `bwt_merge`: Several improvements:
+  * Multiple input files in different formats.
+  * Faster RA/BWT merging.
+  * Multithreaded verification.
+  * Adjustable temp directory.
+  * Better default parameters.
 
 ### Version 0.2.1
 
@@ -145,6 +150,10 @@ There are also other algorithms for building the BWT for large read collections
   * in reverse lexicographic order
   * by position in the reference
 * `bwt_merge`: Option to remove duplicate sequences.
+* `bwt_merge`: Option to write intermediate merge results to temporary files.
+  * the latest result to avoid restarting after crashes
+  * all intermediate results
+* `bwt_merge`: Option to use different merge parameters for each merge.
 * `bwt_convert`: Build rank/select only when necessary.
 * Documentation in the wiki.
 
diff --git a/bwt_convert.cpp b/bwt_convert.cpp
index b3f36e9..090b71b 100644
--- a/bwt_convert.cpp
+++ b/bwt_convert.cpp
@@ -56,9 +56,19 @@ main(int argc, char** argv)
     {
     case 'i':
       input_tag = optarg;
+      if(!formatExists(input_tag))
+      {
+        std::cerr << "bwt_convert: Invalid input format: " << input_tag << std::endl;
+        std::exit(EXIT_FAILURE);
+      }
       break;
     case 'o':
       output_tag = optarg;
+      if(!formatExists(output_tag))
+      {
+        std::cerr << "bwt_convert: Invalid output format: " << output_tag << std::endl;
+        std::exit(EXIT_FAILURE);
+      }
       break;
     case '?':
     default:
@@ -66,17 +76,13 @@ main(int argc, char** argv)
     }
   }
 
-  if(optind < argc) { input_name = argv[optind]; }
-  else
-  {
-    std::cerr << "bwt_convert: Input file not specified" << std::endl;
-  }
-  if(optind + 1 < argc) { output_name = argv[optind + 1]; }
-  else
+  if(optind + 1 >= argc)
   {
-    std::cerr << "bwt_convert: Output file not specified" << std::endl;
+    std::cerr << "bwt_convert: Input and output files must be specified" << std::endl;
     std::exit(EXIT_FAILURE);
   }
+  input_name = argv[optind];
+  output_name = argv[optind + 1];
 
   std::cout << "Input:   " << input_name << " (" << input_tag << ")" << std::endl;
   std::cout << "Output:  " << output_name << " (" << output_tag << ")" << std::endl;
diff --git a/bwt_merge.cpp b/bwt_merge.cpp
index 97f4c41..e42e394 100644
--- a/bwt_merge.cpp
+++ b/bwt_merge.cpp
@@ -89,9 +89,22 @@ main(int argc, char** argv)
       break;
     case 'i':
       tokenize(optarg, input_formats, ',');
+      for(size_type i = 0; i < input_formats.size(); i++)
+      {
+        if(!formatExists(input_formats[i]))
+        {
+          std::cerr << "bwt_merge: Invalid input format: " << input_formats[i] << std::endl;
+          std::exit(EXIT_FAILURE);
+        }
+      }
       break;
     case 'o':
       output_format = optarg;
+      if(!formatExists(output_format))
+      {
+        std::cerr << "bwt_merge: Invalid output format: " << output_format << std::endl;
+        std::exit(EXIT_FAILURE);
+      }
       break;
     case '?':
     default:
diff --git a/formats.cpp b/formats.cpp
index 9cb3f2b..9d74533 100644
--- a/formats.cpp
+++ b/formats.cpp
@@ -446,6 +446,18 @@ SGAFormat::write(std::ofstream& out, const BlockArray& data, const NativeHeader&
 
 //------------------------------------------------------------------------------
 
+bool
+formatExists(const std::string& format)
+{
+  return (format == NativeFormat::tag)
+      || (format == PlainFormatD::tag)
+      || (format == PlainFormatS::tag)
+      || (format == RFMFormat::tag)
+      || (format == SDSLFormat::tag)
+      || (format == RopeFormat::tag)
+      || (format == SGAFormat::tag);
+}
+
 void
 printFormats(std::ostream& stream)
 {
diff --git a/formats.h b/formats.h
index 231e5cb..ac827f6 100644
--- a/formats.h
+++ b/formats.h
@@ -157,6 +157,8 @@ struct SGAFormat
 
 //------------------------------------------------------------------------------
 
+bool formatExists(const std::string& format);
+
 void printFormats(std::ostream& stream);
 
 template<class Format>
diff --git a/paper/paper.tex b/paper/paper.tex
index 133695b..a25d750 100644
--- a/paper/paper.tex
+++ b/paper/paper.tex
@@ -211,7 +211,7 @@
 
 \Section{Implementation}
 
-We have implemented the proposed enhancements to the \BWT{} merging algorithm in a tool (\BWTmerge) intended for merging the \BWT{}s of large collections of short reads. \BWTmerge{} is written in C++, and the source code is available at GitHub.\footnote{\url{https://github.com/jltsiren/bwt-merge}} The implementation is built on top of the \emph{SDSL library} \cite{Gog2014b} and uses the features of C++11 extensively. As a result, it needs a fairly recent C++11 compiler to compile. We have built \BWTmerge{} on Linux and OS~X using g++.
+We have implemented the improved \BWT{} merging algorithm as a tool for merging the \BWT{}s of large collections of short reads. The tool, \BWTmerge{}, is written in C++, and the source code is available on GitHub.\footnote{\url{https://github.com/jltsiren/bwt-merge}} The implementation is built on top of the \emph{SDSL library} \cite{Gog2014b} and uses the features of C++11 extensively. As a result, it requires a fairly recent compiler with C++11 support. We have successfully built \BWTmerge{} on Linux and OS~X using g++.
 
 The target environment of \BWTmerge{} is a \emph{single node} of a \emph{computer cluster}. The system should have tens of CPU cores and hundreds of gigabytes of memory. The amount of local disk space might not be much larger than memory size, while there can be plenty of shared disk space available. The number of search threads is equal to the number of CPU cores, while the merge phase uses just one producer thread and one consumer thread. By adjusting the sizes of run buffers and thread buffers and the number of merge buffers, \BWTmerge{} should work reasonably well in different environments.
 
@@ -261,9 +261,9 @@
 \item \RS{} is from the \emph{ReadServer project}, which uses all low-coverage and exome data from the phase 3. After error correction, trimming the reads to 73 or 100~bp, and merging the duplicates, there are 53.0~billion unique reads for a total of 4.88~Tbp. The reads are in 16 run-length encoded \BWT{}s built using the \emph{String Graph Assembler} \cite{Simpson2012}, distributed according to the last two bases.
 
 \end{itemize}
-See Table~\ref{table:datasets} for further details on the datasets.
+See Table~\ref{table:datasets} for further details on the datasets. For the experiments, we used a development version of \BWTmerge{} that was essentially equivalent to v0.3. For the other tools, we used the versions available on GitHub in October~2015.
 
-\smallbreak\noindent\textbf{Benchmarking.} For benchmarking with different parameter values, we converted four \BWT{} files (AA, TT, AT, and TA) containing a total of 1.49~Tbp from the \RS{} dataset to the native format used by \BWTmerge. Then we merged the \BWT{}s (in the given order). We used 128~MB or 256~MB run buffers and 256~MB or 512~MB thread buffers. The number of merge buffers was 4 or 5 with 512~MB thread buffers and 5 or 6 with 256~MB thread buffers, so that the files on disk were always merged from either 8~GB or 16~GB of thread buffers.
+\smallbreak\noindent\textbf{Benchmarking.} For benchmarking with different parameter values, we converted four \BWT{} files (AA, TT, AT, and TA) containing a total of 1.49~Tbp from the \RS{} dataset to the \emph{native format} of \BWTmerge. This format includes the \BWT{} and the \rank/\select{} structures required by the FM-index. We then merged the \BWT{}s (in the given order). We used 128~MB or 256~MB run buffers and 256~MB or 512~MB thread buffers. The number of merge buffers was 4 or 5 with 512~MB thread buffers and 5 or 6 with 256~MB thread buffers, so that the files on disk were always merged from either 8~GB or 16~GB of thread buffers.
 
 \begin{figure}[t!]
 \begin{center}